Approachability of Documentation Generation

Sebastian Huber sebastian.huber at embedded-brains.de
Mon Oct 5 08:25:39 UTC 2020


On 02/10/2020 20:11, Gedare Bloom wrote:

> On Fri, Oct 2, 2020 at 8:57 AM Joel Sherrill <joel at rtems.org> wrote:
>> Hi
>>
>> The other thread has a mix of detailed "review this" versus philosophy on how to make the documentation generation process approachable so Sebastian isn't the only human capable of maintaining it. I want to talk at a high level about making the process approachable. The major factor in the push from texinfo to Sphinx was to increase the number of people who knew the markup language and tools. We are now moving in a direction where we have a unique system that we are responsible for making approachable. Otherwise, we are stuck with a system even fewer people know than texinfo.
>>
> Thanks for pulling this out. I think it is an important discussion to
> separate for posterity too.
Yes, it is good to discuss this; however, we could have done it some
months ago.
>
> From an historical perspective, I would also say that texinfo was
> technically dying/dead. Sphinx provided a popular approach that
> supported our requirements for HTML and PDF generation with LaTeX-like
> quality. So I wouldn't necessarily say that lowering the bar was the
> primary reason for the decision to use Sphinx. If we just wanted
> simplicity, we would have picked one of the alternatives we discussed,
> such as Markdown and AsciiDoc. The use of ReST introduced a learning
> curve for some of us. Any change in formatting is going to have some
> learning curve.
Selecting Sphinx for the documentation framework was a good decision. I
am quite happy with this framework, and generating the documentation
sources for Sphinx was really easy.
>
> I think one issue to consider is that tooling can help a lot with
> formatting concerns. The specification items use a custom format to
> encode them. Perhaps there is a better syntax (syntactic sugar) to
> simplify how we refer to them, if you find the ${.*:/.*} format hard
> to understand/parse. Once it is explained though, if it makes sense,
> then maybe that is not so relevant; just part of the learning curve.
>
> I do agree that we have to consider the "debt" we incur to make our
> documentation system approachable. This is nothing new; however, in
> the past the burden of the learning curve has been shifted onto the
> formatting tools (e.g., the texinfo or ReST/Sphinx documentation). If
> we have to write our own tutorials/HOWTOs, we should be aware of that
> and make those easy to find.
>
> Sebastian has started this here:
> https://docs.rtems.org/branches/master/eng/req/howto.html#interface-specification
>
> Certainly, this needs to be expanded especially to facilitate
> documentation patches from new contributors or even non-coders
> (technical writers).

Yes, I think we need more tutorials and how-tos. We have a lot of
documentation, but it is hard to find and sometimes quite detailed. We
need good starting points for newcomers.

If you are already familiar with Doxygen/Sphinx, I don't think it is
hard to learn to work with the YAML files. That a ${...} kind of
construct has something to do with variable substitution should be
clear to most programmers.
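
Just to illustrate the idea (this is only a sketch, not the actual
rtemsspec code), such a substitution boils down to a few lines of
Python; the item database here is a toy example:

import re

# Toy item database mapping an item UID to its attributes.
items = {
    "/rtems/event/if/send": {"name": "rtems_event_send"},
}

def resolve(text, items):
    # Replace each ${<uid>:/<attribute>} with the corresponding
    # attribute value of the referenced item.
    def repl(match):
        uid, attribute = match.group(1), match.group(2)
        return str(items[uid][attribute])
    return re.sub(r"\$\{([^:}]+):/([^}]+)\}", repl, text)

print(resolve("See ${/rtems/event/if/send:/name}.", items))
# Prints: See rtems_event_send.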

>
>> These are random thoughts about different ways to address this problem:
>>
>> Do not generate the documentation
>>
>> Although this is an option, I think we all agree that there is value to generating multiple artifacts from the same sources for consistency. This unfortunately puts the burden on us to avoid having the generation process
>>
> I lost your rich formatting/bullets. I'm going to respond by 1st level
> groups. Looks like your thought trailed off at the end there. Anyway,
> I think this is a non-starter. We want to avoid repetition and
> copy-paste (as a project-level unofficial goal).
Yes, we should really try to avoid large-scale manual copy and paste.
The directive and application configuration option documentation is
currently large-scale manual copy and paste. With a generator tool it
is very easy to tweak the format of structured content.
>
>> Clear way to identify generated sections
> I think what you want here is to understand the relationship between
> the sections, which are determined by the generator tool, and the
> items that go in them, which come from the specification. Probably
> the biggest gap is understanding the mapping between the YAML
> key-value pairs that get parsed and the documentation content that is
> generated from them.

I think this is quite well documented in the source code of the 
generator tool. The Python module used to generate the directive 
documentation has just 200 lines of code (including the file header):

https://git.rtems.org/rtems-central/tree/rtemsspec/interfacedoc.py#n135
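
The general shape is roughly the following (a simplified sketch with
made-up attribute keys, not the real interfacedoc.py):

def generate_directive_section(item):
    # "item" stands for the data loaded from one specification YAML
    # file; the keys used here are made up for illustration.
    name = item["name"]
    return "\n".join([
        name,
        "=" * len(name),
        "",
        item["brief"],
        "",
        item["description"],
        "",
    ])

item = {
    "name": "rtems_event_send()",
    "brief": "Sends the event set to the task.",
    "description": "The directive sends the event set to the task "
                   "specified by the identifier.",
}
print(generate_directive_section(item))

Tweaking the layout of all directive sections then means changing such
a function in one place.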

>
>> Clear way to trace generated sections to yaml file source
>>
>> All sections of generated documentation could be "fenced" with "start generated from XXX" and "end generated from XXX".  The XXX should be very specific.
>> Another approach which I don't think I like as much is to have a roadmap sister document which follows the outline and shows where each part came from. This would have to be automatically generated to be accurate.  This seems to just create burden and yet another document which a human would have to look at and correlate to figure out what they should edit.
>>
> I agree that the generator should generate comments to embed in the
> documentation to help with this relationship. My guess is that
> actually it won't be that hard to describe where each part comes from,
> e.g., a log of the generator could be made that provides a mechanical
> trace (matrix). I wouldn't use this as a documentation
> writer/developer tool though.

Yes, this would be easy to add. For example, before the
rtems_event_send() directive documentation we could place a comment
like:

.. Generated from spec:/rtems/event/if/send
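
Adding the "start"/"end" fences Joel suggested would be a small
extension, for example (the marker text is made up, there is no such
convention yet):

def fence(uid, content):
    # Wrap the generated reST content in comments which identify the
    # specification item it was generated from.
    begin = ".. Generated from spec:" + uid + " - begin"
    end = ".. Generated from spec:" + uid + " - end"
    return begin + "\n\n" + content + "\n\n" + end + "\n"

print(fence("/rtems/event/if/send",
            "The directive documentation goes here."))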

>
>> Exception documentation on writing RTEMS documentation (tests, headers, etc.) including extremely accurate and detailed information on the YAML files.
>>
>> This is mandatory. We are heading to a very project specific workflow and toolchain.
>>
> By "exception" I guess you mean the directions/tutorials that go
> beyond what someone can find out from the public docs and searches
> for the wider-used formatting tools. This would be about
> documentation for the RTEMS Project specific tooling and custom
> formats. I think actually a lot of this has been creeping into
> https://docs.rtems.org/branches/master/eng/req/howto.html#how-to; for
> example, your question about what ${.*:/.*} means is already
> described in
> https://docs.rtems.org/branches/master/eng/req/howto.html#application-configuration-options
>
> So it is mostly a matter of making sure people know where to find
> the documentation, and of telling them to read the docs ;)
>
>> Chris' thought that in the case where manually written content is added, we need a way to know and check it so it isn't lost.
>>
>> Possible solution. Generation tool adds checksum and checks that generated file matches before replacing it.
>>
> Manual content should not be added to generated files. We should
> instead separate manually added sections into their own files, I
> suspect.
Yes, I tried to separate generated from hand-written content at the
file level. For this, I recently split up the manager documentation
into multiple files. Mixing things in a file would be possible, but I
think this would just make things more complicated and contradict the
approach of having structured information.
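
For the checksum idea mentioned above, a rough sketch of such a guard
could look like this (the ".sha256" sidecar file is just an assumption
for illustration, it is not existing tooling):

import hashlib
import os

def write_generated(path, new_content):
    # Refuse to overwrite a generated file which was modified by hand
    # since the last run of the generator.
    data = new_content.encode("utf-8")
    checksum_path = path + ".sha256"
    if os.path.exists(path) and os.path.exists(checksum_path):
        with open(path, "rb") as src:
            current = hashlib.sha256(src.read()).hexdigest()
        with open(checksum_path, "r", encoding="utf-8") as rec:
            recorded = rec.read().strip()
        if current != recorded:
            raise RuntimeError(path + " was modified by hand")
    with open(path, "wb") as dst:
        dst.write(data)
    with open(checksum_path, "w", encoding="utf-8") as rec:
        rec.write(hashlib.sha256(data).hexdigest())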
>
>> I'm sure there are other challenges but this particular generation process flies in the face of what drove many of the RTEMS process changes over the past decade. We went from CVS to git, texinfo to Sphinx, and autoconf to waf because we wanted to move to more modern tools that more people knew.  With texinfo, we also had our own texinfo to html converter and had to switch to the official one before the Sphinx conversion to avoid owning a tool not core to our mission. I believe this generation process is good for the RTEMS mission but we need to be very cognizant of the impact this has. How can we avoid mistakes in the workflow? How do we bring new users up to speed?
>>
> The one thing I'll agree with here is that we should identify what
> "debt" we are taking on here. We should consider the balance between
> maintenance costs and onboarding, and the benefits we are gaining. We
> knew some things would get harder because of pre-qual, and other
> things would get easier. We can't just reject anything that gets
> harder if, on balance, things are better overall for us.
>
>   I'm not convinced we should call our documentation "not core to our
> mission", but that's a philosophical question, really. ;)

In the worst case, you can still throw away the stuff in rtems-central
and maintain the generated files manually. The generated files could
have been written by a human; they are not some generated spaghetti
code.

What you also have to take into account is the rate of change of the
documentation. The generated content is for API elements, and these
items have a very slow rate of change. Once the conversion is done, we
will mostly have typo fixes and some wording improvements here and
there. People will notice such issues in the Sphinx documentation or in
the Doxygen output. They will probably first ask on the mailing list if
they find something. Then we can guide them and point them to a
tutorial/how-to or just fix it directly.

>
>> The idea that very few people have contributed to our documentation is not a good thing, and unless we are very careful, the promise of Sphinx opening that group up is not going to happen. Worse, it could easily shrink the set of people who contribute to the documentation.
>>
> Maybe. OTOH, if someone just wants to contribute to a Sphinx
> documentation project they will probably look to Linux first. We do
> have a possibility here to innovate in this space, which may draw
> creative contributors anyway.
>
>> Thanks for doing this but it needs to be approachable to a high school student. We need to remember that.
>>
> There's a difference between generating new documentation and fixing
> existing documentation. Our goal with onboarding has always been
> fixing existing work, e.g., typos or formatting. While the new
> approach adds some complexity to fixing a typo in the generated
> content, because it requires running the generator and then building
> the docs to confirm, it also has the advantage of fixing multiple
> typos at once, for example in the generated Doxygen as well. So,
> there are pros and cons. And students would gain more experience by
> tackling something with a slightly higher learning curve. There are
> always tradeoffs to consider. If the tutorials are clear, students
> will figure it out.
Yes.

