What to do with rtems_cache_disable_data()?

Chris Johns chrisj at rtems.org
Tue Jun 18 05:20:27 UTC 2024


On 18/6/2024 12:02 am, Sebastian Huber wrote:
> On 17.06.24 03:35, Chris Johns wrote:
>> On 14/6/2024 10:42 pm, Peter Dufault wrote:
>>>
>>>
>>>> On Jun 14, 2024, at 5:47 AM, Sebastian Huber
>>>> <sebastian.huber at embedded-brains.de> wrote:
>>>>
>>>> Hello,
>>>>
>>>> an user noticed that for example on the Xilinx Zynq 7000 BSP, the
>>>> rtems_cache_disable_data() doesn't work.
>>>>
>>>> I had a look at this and managed to disable the L1 and L2 caches, however,
>>>> afterwards I got not that far. On the Cortex-A cores it seems at least the
>>>> L1 data cache is required to provide support for atomic operations:
>>>>
>>>> https://stackoverflow.com/questions/76207164/disabled-dcache-will-prevent-atomic-flag-from-being-set
>>>>
>>>> I guess we have this situation on most modern chips with caches since the
>>>> reservation granule is usually a cache line. How do we want to deal with
>>>> rtems_cache_disable_data() in this case? From my point of view, this
>>>> function as no real use case and adding it in the first place was a mistake.
>>>>
>>>> We have a couple of options:
>>>>
>>>> * Make the rtems_cache_disable_data() directive a no-operation. Afterwards
>>>> the cache is still enabled, and an user may get confused.
>>>>
>>>> * Issue a fatal error if someone calls rtems_cache_disable_data().
>>>>
>>>> * Really disable the cache and let the user figure out that the atomic
>>>> operations no longer work. Disabling the data cache can be a quite complex
>>>> thing to do.
>>>>
>>>> * Add a return status code to rtems_cache_disable_data() and let it return
>>>> RTEMS_UNSATISFIED for example.
>>>
>>> Assuming "this function has no real use case and adding it in the first place
>>> was a mistake" how about:
>>>
>>> - In the active release continue to disable the data cache but add a warning
>>> attribute 'warning ("Disabling data cache breaks atomic functionality")';
>>> - In the next release change it to an error attribute and change the function
>>> behavior to do nothing except return RTEMS_UNSATISFIED (in case someone
>>> somehow still calls it), or better change it to call an RTEMS fatal function.
>>>
>>
>> I am not so sure it can be easily simplified. I know of a Zynq or ZynqMP
>> application in a critical system (bare metal, no OS) with the caches disabled
>> for a specific reason. I cannot go into or remember the specific detail.
> 
> Even in such a scenario you would probably not enable the data cache and then
> disable it with a function call? You would simply configure the system to not
> enable the cache at all.

Yes this would be correct, it held off.

>> We cannot silently disable functionality. A call must do what it says or return
>> or raise an error. If disabling the data cache is not implement there should be
>> an error. The error would draw the user to the documentation. It is not yet
>> clear we can disable the code in that function.
>>
>> If our kernel code uses atomics then the data cache cannot be disabled. However
>> we should look into the use of atomics and if it is SMP specific and then handle
>> the specific cases.
> 
> The atomic operations are defined by the C/C++ standard. This is not an
> SMP-specific issue. It is also not about the usage of the cache in general. It
> is about the rtems_cache_disable_data() directive.

My comments about not having silent things happens is common to all things we
do. We need an error if what we have is not workable on a platform.

>> A user can use atomics even if the kernel does not but leaving that to the user
>> to figure out and deal is at best unhelpful and as an OS provider we should help
>> users in these complex corner cases. At a minimum our documentation should cover
>> this and we could update rtems-exeinfo to report a potential issue.
> 
> Which of the above four options would be your favourite? Would there be another
> option to deal with rtems_cache_disable_data()?
> 
> Currently, calling this function crashes the system on a Zynq 7000.

The return code but is it realistic to make such a change for such a long
standing interface? I suppose it is then the fatal error. It is a major thing to
do disabling the cache.

Out of interest why do we have the interface? What is the use case?

Chris


More information about the devel mailing list