[PATCH 2/5] build: Use CSafeLoader if available

Thu May 4 23:38:18 UTC 2023

On 4/5/2023 4:16 pm, Sebastian Huber wrote:
> On 04.05.23 05:35, Chris Johns wrote:
>> On 3/5/2023 7:40 pm, Sebastian Huber wrote:
>>> On 03.05.23 05:30, Chris Johns wrote:
>>>> On 28/4/2023 3:38 pm, Sebastian Huber wrote:
>>>>> On 27.04.23 20:27, Gedare Bloom wrote:
>>>>>> On Wed, Apr 26, 2023 at 11:46 PM Sebastian Huber
>>>>>> <sebastian.huber at embedded-brains.de>  wrote:
>>>>>>> On 27.04.23 02:11, Chris Johns wrote:
>>>>>>>> On 26/4/2023 6:04 pm, Sebastian Huber wrote:
>>>>>>>>> The CSafeLoader uses the C libyaml library to considerably speed up the
>>>>>>>>> loading of YAML files.
>>>>>>>> No from me.
>>>>>>> What do you mean by "No from me"? You have the CSafeLoader available and
>>>>>>> it is slow? Do you have some timings before and after the patch set for
>>>>>>> a "./waf configure" and "./waf build"? On my systems the configure needs
>>>>>>> less than a second with the CSafeLoader and the waf build setup time is
>>>>>>> less than 100ms.
>>>>>>>
>>>>>>>> I do not agree with conditional states of operation in the build system that
>>>>>>>> depend on packages a host has installed. If speed is an important factor for
>>>>>>>> all users then I suggest you find a means to have it available automatically
>>>>>>>> on the hosts we support (Linux, FreeBSD, MacOS, Windows MINGW64 and Cygwin).
>>>>>>> I am not sure if we should automatically install system Python packages
>>>>>>> on user machines.
>>>>>>>
>>>>>>> The fallback is the Python PyYAML package available through the RTEMS
>>>>>>> sources. This is what we use currently. For RTEMS users, this is
>>>>>>> acceptable since they are not supposed to touch the YAML files. For
>>>>>>> RTEMS maintainers, not having the cache makes working with the build
>>>>>>> system more efficient.
>>>>>>>
>>>>>>> If the system PyYAML package is not installed, then you now get a hint
>>>>>>> to install it:
>>>>>>>
>>>>>>> Setting top to                           : /home/EB/sebastian_h/src/rtems
>>>>>>> Setting out to                           : /home/EB/sebastian_h/src/rtems/build
>>>>>>> Regenerate the build specification cache.  Install the PyYAML Python package to avoid this.  The cache regeneration needs a couple of seconds...
>>>>>>> Configure board support package (BSP)    : arm/realview_pbx_a9_qemu
>>>>>>>
>>>>>> I have two questions, which are related to Chris's concern I think.
>>>>>> 1. Are the output of PyYAML and C libyaml guaranteed to be consistent?
>>>>>
>>>>> I trust the PyYAML maintainers to ensure that the SafeLoader and CSafeLoader
>>>>> produce the same results. With respect to the alternative ItemCache class
>>>>> implementation in the wscript, I am quite confident that it produces the same
>>>>> results. This part just has to load the item data from the files. The
>>>>> CSafeLoader-based ItemCache has 53 lines of code.
>>>>>
>>>>>>
>>>>>> 2. Why not make C libyaml part of the RTEMS toolchain?
>>>>>>
>>>>>> Any dependencies that exist in the build system are (by definition)
>>>>>> suitable to be checked/provided by the tool buildset.
>>>>>
>>>>> Yes, this is an option. If we remove the pickle cache, then we force everyone
>>>>> to use the libyaml-based PyYAML module. Is this really necessary right now?
>>>>
>>>> If we leave it who would do it? I would like to understand the next question
>>>> before we decide if this is important. The key objective is to have consistent
>>>> performance for everyone. If the package is easy to build, then we should do it
>>>> when we build the tools and the questions we are having go away.
>>>
>>> The PyYAML package had some security issues in the past. If we ship this
>>> package, who will monitor this package, update it, and write security
>>> advisories?
>>
>> The same way we would handle any security issue. When we become aware of one, we
>> update what we provide.
> 
> This is a problem from my point of view. Maintenance activities (including
> security-related topics) happen by accident in the RTEMS Project. In general,
> each mandatory host tool makes it harder to install RTEMS in certain environments.
> 
>>
>> Is PyYAML a pip package or is it provided by a distro package when using Linux?
>> My assumption, which may be wrong, is that building libyaml (the C part) is all
>> we need to do?
> 
> You can install it through pip, conda, or whatever your host provides as
> packages. I guess you also need to build some Python bindings for libyaml to be
> able to use it.
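Whether the SafeLoader and CSafeLoader agree, and how much faster the C loader is on a given host, can be spot-checked with a few lines of Python. This is a sketch assuming PyYAML is installed; the sample document is made up and only loosely resembles a spec item, and one small document proves nothing about all spec files.

```python
import time

try:
    import yaml  # PyYAML, assumed to be installed (for example with pip)
except ImportError:
    yaml = None

# Made-up YAML document used only for the comparison; not a real RTEMS spec item.
SAMPLE = """
enabled-by: true
includes: []
source:
- start.c
- init.c
links:
- role: build-dependency
  uid: ../obj
"""


def load_with(loader, text, repeat=200):
    """Parse text `repeat` times with `loader`; return (last result, seconds)."""
    data = None
    start = time.perf_counter()
    for _ in range(repeat):
        data = yaml.load(text, Loader=loader)
    return data, time.perf_counter() - start


def compare_loaders(text=SAMPLE):
    """Compare SafeLoader and CSafeLoader on `text`, or return None if not possible."""
    if yaml is None or not hasattr(yaml, "CSafeLoader"):
        return None  # no PyYAML, or PyYAML without the libyaml bindings
    py_data, py_seconds = load_with(yaml.SafeLoader, text)
    c_data, c_seconds = load_with(yaml.CSafeLoader, text)
    return {"same": py_data == c_data,
            "SafeLoader_s": py_seconds,
            "CSafeLoader_s": c_seconds}


if __name__ == "__main__":
    print(compare_loaders() or "CSafeLoader not available; nothing to compare")
```

The timings are only meaningful relative to each other on the same machine.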

Using pip with a virtual env is the path I think we should document for users.
With python3 it is easy to do and safe because the packages are contained and
localised to the RTEMS environment. Other solutions can be used by those who are
across Python and its packages, so they can manage things themselves.

When I say safe I mean the results are controlled and we are able to provide
support if something is not working as it should.
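If the documented path is a virtual env, a tool could report which environment it runs in, which helps when providing support. A minimal sketch using the standard check for a `venv` (`sys.prefix` differs from `sys.base_prefix`); the helper names are hypothetical, and environments made by very old `virtualenv` releases may not be detected this way.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def environment_hint():
    """Hypothetical one-line message a tool could print in its log for support."""
    if in_virtualenv():
        return "Python environment: virtual env at " + sys.prefix
    return "Python environment: system install at " + sys.prefix
```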

>>>>> For
>>>>> most use cases the Python only solution works fine. If you spend your time
>>>>> developing BSPs, then the CSafeLoader pays off.
>>>>
>>>> Maybe I am not understanding how this works. Why is there a difference for
>>>> developers vs a user who does not have this package installed? Does the
>>>> difference scale?
>>>
>>> A user typically just uses a certain version of RTEMS. Then the BSPs of interest
>>> are configured and built. A user is not supposed to touch the spec files.
>>
>> My experience is different.
>>
>> I do not agree with different levels of performance and build experience based
>> on the host operating system being used. We need to support all hosts in the
>> same way and this seems to favour users who have an OS that can provide the
>> package. We have had host biases in other places in RTEMS and they take a long
>> time to remove. The policy I work to is that RTEMS developers and users use the same
>> tools and processes and this has been working well through my time with this
>> project. I see no reason to move away from this.
> 
> I don't see the problem here, PyYAML is a widely used package.

I am sorry, but I did not know if the package was a distro one and so specific to
Linux, as it is based around a C code library. Using pip changes this but it
raises new issues. We have avoided dependent packages in the python tools we
provide so the tools are easy to install and use. They just work. Python package
dependencies add something new to using RTEMS, so the popularity of a package is
not the concern.

We need to manage this process to make sure what we do is easy and documented
for those who have no idea about python. We should indicate to users, maybe with
a warning, that a fallback solution is being used so they know they need to do
something to improve what is happening. Or we make it a hard rule to have the
CSafeLoader available and move on to resolving how users install it.
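A minimal sketch of both options, assuming the loader is chosen once at start-up: prefer the CSafeLoader, otherwise warn and fall back to the pure Python SafeLoader (which may come from the PyYAML copy in the RTEMS sources), or refuse to fall back when `strict` is set. `select_loader` is hypothetical and simplified; in the patch the fallback is the pickle cache rather than parsing with the SafeLoader on every run.

```python
import warnings

try:
    import yaml  # system PyYAML or the pure Python copy on sys.path
except ImportError:
    yaml = None


def select_loader(strict=False):
    """Return (loader_class, name), preferring the libyaml-based CSafeLoader.

    strict=True models the 'hard rule' option: no fallback is allowed.
    """
    if yaml is None:
        raise RuntimeError(
            "PyYAML is not installed; try 'pip install PyYAML' in a virtual env")
    loader = getattr(yaml, "CSafeLoader", None)
    if loader is not None:
        return loader, "CSafeLoader"
    if strict:
        raise RuntimeError("PyYAML has no libyaml support; CSafeLoader is required")
    # Tell the user a slower fallback is in use so they know to act on it.
    warnings.warn("CSafeLoader not available, falling back to the slower "
                  "pure Python SafeLoader; install PyYAML to avoid this")
    return yaml.SafeLoader, "SafeLoader"
```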

We have already decided to maintain python2 support in our user tools for RTEMS
6; however, users now need python3 to build gdb for RTEMS 6, so the python2
rule no longer makes sense. These issues may be linked as Python3 has the easy
`python3 -m venv rtemspy` virtual environment support and that makes using pip
easy on any host.
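For reference, the `python3 -m venv rtemspy` step can also be done from Python with the standard `venv` module, after which PyYAML is installed with the environment's own pip. A sketch only; the helper names are hypothetical, and the `Scripts`/`bin` split assumes the usual CPython layout (MINGW64 or Cygwin Pythons may differ).

```python
import os
import sys
import venv


def create_rtems_env(path="rtemspy", with_pip=True):
    """Create a virtual environment like `python3 -m venv rtemspy`.

    Returns the path of the environment's Python interpreter.
    """
    venv.create(path, with_pip=with_pip)
    bindir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return os.path.join(path, bindir, exe)


def pip_install_command(python, packages=("PyYAML",)):
    """Command line that installs `packages` with the environment's own pip."""
    return [python, "-m", "pip", "install", *packages]
```

For example, `subprocess.run(pip_install_command(create_rtems_env()), check=True)` would create `rtemspy` and install PyYAML into it; the install step needs network access.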

> When I install it
> through pip, I get the CSafeLoader on my machine. I don't have a libyaml
> development package installed.

This is what I was not understanding. If it can be loaded in a python.org
install on a N2 MacOS with a virtual env then I think we are close to a resolution.
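A quick way to answer that on a given host is to ask the PyYAML installed in the virtual env whether it was built with libyaml. A sketch that relies on PyYAML's `__with_libyaml__` flag; the `yaml_report` helper is hypothetical.

```python
import sys


def yaml_report():
    """Describe the PyYAML installation seen by this interpreter."""
    python = sys.version.split()[0]
    try:
        import yaml
    except ImportError:
        return {"python": python, "pyyaml": None, "libyaml": False}
    return {
        "python": python,
        "pyyaml": getattr(yaml, "__version__", "unknown"),
        # True when PyYAML's C extension, and so the CSafeLoader, is usable
        "libyaml": bool(getattr(yaml, "__with_libyaml__", False)),
    }


if __name__ == "__main__":
    for key, value in yaml_report().items():
        print(f"{key}: {value}")
```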
> 
> The pickle cache approach is not that bad, it just doesn't support some use
> cases well.

Sure.

> 
>>
>>> A maintainer adds, modifies, and removes spec files during development. With the
>>> item cache, this always involves a wait of several seconds. The time to
>>> wait depends on the total number of spec files. With the CSafeLoader this time
>>> is reduced to a fraction of a second.
>>
>> If a user downloads a release, is the intermediate data present or do they need
>> to wait while it is parsed using whatever system they have?
>>
>> I am sorry if I am not understanding something in how this all works. I cannot
>> tell if your statement implies we are holding intermediate data in the repo or
>> that releases need some extra processing before being packaged.
> 
> The pickle cache is not in the repository. We could add it to the release
> archive, but I am not sure if this is a good idea. The pickle format is Python
> version dependent.

Committing a pickled file to a repo is not a good idea.
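Since the cache is generated locally, its Python version dependence can be handled by tagging it with the interpreter version and rebuilding on a mismatch. A minimal sketch; `save_cache`, `load_cache`, and the file layout are hypothetical and are not the wscript's actual code.

```python
import os
import pickle
import sys


def save_cache(path, items):
    """Write `items` to a pickle cache tagged with the current Python version."""
    with open(path, "wb") as f:
        pickle.dump({"python": tuple(sys.version_info[:2]), "items": items}, f)


def load_cache(path):
    """Return the cached items, or None if the cache must be regenerated."""
    if not os.path.exists(path):
        return None
    try:
        # Only unpickle files this build wrote itself: pickle is not
        # safe against untrusted input.
        with open(path, "rb") as f:
            data = pickle.load(f)
    except Exception:
        return None  # unreadable or incompatible cache: rebuild it
    if not isinstance(data, dict) or data.get("python") != tuple(sys.version_info[:2]):
        return None  # written by a different Python version: rebuild it
    return data.get("items")
```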

Chris