Approachability of Documentation Generation

Chris Johns chrisj at
Mon Oct 5 23:24:27 UTC 2020

On 5/10/20 7:25 pm, Sebastian Huber wrote:
> On 02/10/2020 20:11, Gedare Bloom wrote:
>> On Fri, Oct 2, 2020 at 8:57 AM Joel Sherrill <joel at> wrote:
>>> Hi
>>> The other thread has a mix of detailed "review this" versus philosophy on how
>>> to make the documentation generation process approachable so Sebastian isn't
>>> the only human capable of maintaining it. I want to talk at a high level
>>> about making the process approachable. The major factor in the push from
>>> texinfo to Sphinx was to increase the number of people who knew the format
>>> markup language and tools. We are now moving in a direction where we have a
>>> very unique system that we are responsible for making approachable.
>>> Otherwise, we are stuck with a system even fewer people know than texinfo.
>> Thanks for pulling this out. I think it is an important discussion to
>> separate for posterity too.
> Yes, it is good to discuss this, however, we could have done it some months ago.

It was not clear to me how this was going to work and a number of parts are
still not clear. I have no issue with this approach, as something of this scale
and complexity would die before it started if we insisted all the pieces be
clearly understood and fully worked out.

We have a base but there is more that needs to be done to make it easier to work
with in the project and to lower the bar for others to access. This will only
come with time.

>> From a historical perspective, I would also say that texinfo was
>> technically dying/dead. Sphinx provided a popular approach that
>> supported our requirements for HTML and PDF generation with latex-like
>> quality. So I wouldn't necessarily say that lowering the bar was the
>> primary decision to use Sphinx. If we just wanted simplicity we would
>> have picked one of the alternatives we discussed such as markdown and
>> asciidoc.

I used and spent time with each of those tools and they failed to meet the
demands we have. I felt neither was suitable.

>> The use of ReST introduced a learning curve for some of us.
>> Any change in formatting is going to have some learning curve.
> Selecting Sphinx for the documentation framework was a good decision. I am quite
> happy with this framework and generating the documentation code for Sphinx was
> really easy.

I agree Sphinx is a great documentation platform.

>> I think one issue to consider is that tooling can help a lot with
>> formatting concerns. The specification items use a custom format to
>> encode them. Perhaps there is a better syntax (syntactic sugar) to
>> simplify how we refer to them, if you find the ${.*:/.*} format hard
>> to understand/parse. Once it is explained though, if it makes sense,
>> then maybe that is not so relevant; just part of the learning curve.
>> I do agree that we have to consider the "debt" we incur to make our
>> documentation system approachable. This is nothing new however in the
>> past the burden for the learning curve has been shifted on to the
>> formatting tools (e.g., texinfo or ReST/Sphinx documentation). If we
>> have to generate our own tutorials/HOWTOs we should be aware of that,
>> and make those easy to find.
>> Sebastian has started this here:
>> Certainly, this needs to be expanded especially to facilitate
>> documentation patches from new contributors or even non-coders
>> (technical writers).
> Yes, I think we need more tutorials and how-tos. We have a lot of documentation,
> but it is hard to find and sometimes quite detailed. We need good starting points
> for newcomers.
> If you are already familiar with Doxygen/Sphinx I don't think it is hard to
> learn working with the YAML files. That a ${...} kind of thing has something to
> do with variables should be clear to most programmers.

YAML and Sphinx (ReST) are easy and we have some good examples of how we use
them that can be copied. It is what we are doing with both that is complex and
intimidating. And this is understandable, as the goal is to capture a lot of
complexity in a compact, efficient and machine-readable structure. I see what we
have as low level and we need to look to grow the supporting structures around
it to make it work. I have no idea what this means or how it will look but we
need to consider it and not accept the merging as the end of the effort. It is
just the start.
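To make the ${.*:/.*} reference format discussed above concrete, here is a
minimal sketch of how such a substitution might be resolved. The item UID, the
attribute name, and the in-memory table are all invented for illustration; the
real resolver and item schema live in the rtems-central tooling:

```python
import re

# Hypothetical specification items; the real items are YAML files in
# rtems-central with a much richer schema.
SPEC_ITEMS = {
    "/rtems/event/if/send": {"name": "rtems_event_send"},
}

# Matches references of the form ${<item-uid>:/<attribute>}.
REFERENCE = re.compile(r"\$\{([^:}]*):/([^}]*)\}")

def resolve(text, items):
    """Replace each ${uid:/attribute} reference with the item's value."""
    def substitute(match):
        uid, attribute = match.group(1), match.group(2)
        return items[uid][attribute]
    return REFERENCE.sub(substitute, text)

print(resolve("Sends events to a task, see ${/rtems/event/if/send:/name}().",
              SPEC_ITEMS))
# -> Sends events to a task, see rtems_event_send().
```

Once the variable-substitution idea is seen this way, the format itself is a
small part of the learning curve; the item schema behind it is the larger part.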

>>> These are random thoughts about different ways to address this problem:
>>> Do not generate the documentation
>>> Although this is an option, I think we all agree that there is value to
>>> generating multiple artifacts from the same sources for consistency. This
>>> unfortunately puts the burden on us to avoid having the generation process
>> I lost your rich formatting/bullets. I'm going to respond by 1st level
>> groups. Looks like your thought trailed off at the end there. Anyway,
>> I think this is a non-starter. We want to avoid repetition and
>> copy-paste (as a project-level unofficial goal).
> Yes, we should really try to avoid large scale manual copy and paste. The
> directive and application configuration option documentation is large scale
> manual copy and paste. With a generator tool it is very easy to tweak the format
> of structured content.

The ability to machine read and handle the content is key to having the qual
effort work. It is not the specifics of a generator tool or format output logic
that is important; those are implementation details. The removal of redundant
information is a feature but at what cost? We need to understand this. Do we
accept a simple patch to the generated ReST documentation from a user? Do we
respond with a link to another repo, Python source code, YAML config files,
scripts and more? Or does a core developer translate the patch into a YAML file
change and then regenerate the ReST? As things stand today, accepting
the ReST patch seems the best alternative because it makes a user feel welcome
and productive in our community. If we select one of the other paths we need to
make sure the user is considered and engaged in some manner.

>>> Clear way to identify generated sections
>> I think what you want here is to understand the relationship between
>> the Sections, which are determined by the generator tool, and the
>> items that go in them, which comes from the specification. Probably
>> the biggest gap is understanding the mapping between the YAML
>> key-value pairs that get parsed for generating the documentation
>> content.
> I think this is quite well documented in the source code of the generator tool.

I would hope we could do better than this. I have found that being able to
visually report and then inspect the data is important.

> The Python module used to generate the directive documentation has just 200
> lines of code (including the file header):

The complexity lies in the underlying data structure, not in a fragment of code.

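To illustrate that split between data and code, here is a hypothetical,
heavily simplified specification item and a tiny renderer. The item is modeled
as a plain dict and its keys are invented for illustration; the real YAML items
carry far more structure (links, constraints, parameter descriptions, and so
on), and that schema is where the complexity lives:

```python
# A hypothetical, heavily simplified specification item. Real items are
# YAML files in rtems-central with a much richer, interlinked schema.
item = {
    "name": "rtems_event_send",
    "brief": "Sends the event set to the task.",
}

def render_directive(item):
    """Render one directive as a ReST section with an underlined title."""
    title = item["name"] + "()"
    lines = [title, "=" * len(title), "", item["brief"], ""]
    return "\n".join(lines)

print(render_directive(item))
```

The rendering code stays small precisely because the structure has been pushed
into the data; understanding the data is the real learning curve.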
>>> Clear way to trace generated sections to yaml file source
>>> All sections of generated documentation could be "fenced" with "start
>>> generated from XXX" and "end generated from XXX".  The XXX should be very
>>> specific.
>>> Another approach which I don't think I like as much is to have a roadmap
>>> sister document which follows the outline and shows where each part came
>>> from. This would have to be automatically generated to be accurate.  This
>>> seems to just create burden and yet another document which a human would have
>>> to look at and correlate to figure out what they should edit.
>> I agree that the generator should generate comments to embed in the
>> documentation to help with this relationship. My guess is that
>> actually it won't be that hard to describe where each part comes from,
>> e.g., a log of the generator could be made that provides a mechanical
>> trace (matrix). I wouldn't use this as a documentation
>> writer/developer tool though.
> Yes, this would be easy to add. For example before the rtems_event_send()
> directive documentation we could place a:
> .. Generated from spec:/rtems/event/if/send

Nice idea.

>>> Exception documentation on writing RTEMS documentation (tests, headers, etc)
>>> including extremely accurate and detailed information on the YAML files.
>>> This is mandatory. We are heading to a very project specific workflow and
>>> toolchain.
>> By "exception" I guess you mean the directions/tutorials that go
>> beyond what someone can find out from the public/search of the
>> wider-used formatting tools. This would be about documentation for the
>> RTEMS Project specific tooling and custom formats. I think actually a
>> lot of this has been creeping in to
>> for
>> example your question about what ${.*:/.*} means is described already
>> in
>> So it is mostly a matter of making sure people know where to find the
>> documentation, and to tell them to read the docs ;)
>>> Chris' thought that in the case manually generated content  is added, we need
>>> a way to know and check that so it isn't lost.
>>> Possible solution. Generation tool adds checksum and checks that generated
>>> file matches before replacing it.
>> Manual content should not be added to generated files. We should
>> instead separate manually added sections into their own files, I
>> suspect.
> Yes, I tried to separate generated from hand-written content at a file level.
> For this I split up the manager documentation into multiple files recently.
> Mixing things in a file would be possible, but I think this makes things just
> more complicated and contradicts the approach to have structured information.

Saying do not edit is easy but the effects can be hard to see. A user spots an
error and decides to help out, so checks the section's ReST source, and at the
top is "Generated, do not edit". That is harder than they expected, so they
walk away.

Also "not adding manual content" contradicts the base requirement that the RTEMS
Project does not depend on the qual efforts and data. I need time to see how
this all works and what supporting tools make it easier before I support a
change in policy like this. I hope we can make a change like this; it would mean
the generation side is working well.
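Joel's checksum suggestion from earlier in the thread could look something like
this sketch. The stamp comment format is invented for illustration: the
generator prefixes each file with a hash of its body and refuses to overwrite a
file whose body no longer matches its stamp, which flags manual edits before
they are silently lost:

```python
import hashlib

STAMP = ".. generated-checksum: "

def stamp(body):
    """Prefix generated ReST with a checksum of its content."""
    digest = hashlib.sha256(body.encode()).hexdigest()
    return f"{STAMP}{digest}\n{body}"

def is_unmodified(text):
    """True if the body still matches the checksum it was stamped with."""
    header, _, body = text.partition("\n")
    if not header.startswith(STAMP):
        return False  # no stamp: treat the file as hand-written
    return header[len(STAMP):] == hashlib.sha256(body.encode()).hexdigest()

original = stamp("Generated directive documentation.")
assert is_unmodified(original)
# A manual edit to the body breaks the stamp, so regeneration can stop
# and ask a human instead of discarding the edit.
assert not is_unmodified(original.replace("directive", "edited"))
```

A guard like this would not stop users editing generated files, but it would
stop the generator from quietly destroying their work.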

>>> I'm sure there are other challenges but this particular generation process
>>> flies in the face of what drove many of the RTEMS process changes over the
>>> past decade.

I agree.

>>> We went from CVS to git, texinfo to Sphinx, and autoconf to waf
>>> because we wanted to move to more modern tools that more people knew.  With
>>> texinfo, we also had our own texinfo to html converter and had to switch to
>>> the official one before the Sphinx conversion to avoid owning a tool not core
>>> to our mission. I believe this generation process is good for the RTEMS
>>> mission but we need to be very cognizant of the impact this has. How can we
>>> avoid mistakes in the workflow? How do we bring new users up to speed?


>> The one thing I'll agree with here is that we should identify what
>> "debt" we are taking on here. We should consider the balance between
>> maintenance costs and onboarding, and the benefits we are gaining. We
>> knew some things will get harder because of pre-qual, and other things
>> will get easier. We can't just reject anything that gets harder if on
>> the sum things are better overall for us.

Nicely said, thanks. The other part of this is understanding what the qual
effort is going to provide to help support the onboarding and the "debt". This
is where my thoughts are heading with this discussion. It is unfair to expect
everything to be resolved in a few commits and it is hard for us to know what
the qual effort is looking to do.

>>   I'm not convinced we should call our documentation "not core to our
>> mission" but that's a philosophical question, really. ;)
> In the worst case you can still throw away the stuff in rtems-central and
> maintain the generated files manually. The generated files could have been
> written by a human. It is not some generated spaghetti code.

Yes, and my wish is for rtems-central to be central and part of the normal
workflow. This happening depends on how easy it is to use and so on the cost
overhead we have maintaining it.

> What you also have to take into account is the rate of change in the
> documentation. The generated content is for API elements. These items have a
> very slow rate of change. Once the conversion is done, we will mostly have typo
> fixes and some wording improvements here and there. People will notice this in
> the Sphinx documentation or the Doxygen. They will probably first ask on the
> mailing list if they find something. Then we can guide them and point to a
> tutorial/how-to or just fix it directly.

Would adding comments in the generated ReST source about what a user should do
help with this?

>>> The idea that very few people have contributed to our documentation is now a
>>> good thing and unless we are very careful, the promise of Sphinx opening that
>>> group up is not going to happen. Worse, it could easily shrink the set of
>>> people who contribute to the documentation.
>> Maybe. OTOH, if someone just wants to contribute to a Sphinx
>> documentation project they will probably look to Linux first. We do
>> have a possibility here to innovate in this space, which may draw
>> creative contributors anyway.
>>> Thanks for doing this but it needs to be approachable to a high school
>>> student. We need to remember that.
>> There's a difference between generating new documentation and fixing
>> existing documentation. Our goal with onboarding has always been
>> fixing existing work, e.g., typos or formatting. While the new
>> approach adds some complexity to fix a typo in the generated content,
>> because it requires to run through the generator and then build the
>> docs to confirm, it also has the advantage of fixing multiple typos at
>> once such as in generated doxygen too. So, there are pros and cons.
>> And students would gain more experience by tackling something with a
>> slightly higher learning curve. There are always tradeoffs to
>> consider. If the tutorials are clear, students will figure it out.
> Yes.


Thanks for this thread and the comments. It is helping me come to terms with the
changes that are happening.

