Question regarding ARMV7M FPU context switch and ISRs

Thu Feb 8 10:43:33 UTC 2024

Hello Sebastian,

On 08.02.2024 11:09, Sebastian Huber wrote:
> Hello Cedric,
>
> On 08.02.24 10:53, Cedric Berger wrote:
>> Hello,
>>
>> I've a question: does RTEMS really want to support FPU operations in 
>> ISRs?
>>
>> Because if the answer is "no", then I believe that we could simplify 
>> the RTEMS code (and for me the mental model of the whole thing) by 
>> running the FPU with both FPCCR.ASPEN and FPCCR.LSPEN = 0.
>>
>> This means that during IRQ/exception, the simple exception frame (32 
>> bytes) would always be used instead of a combination of the simple 
>> and extended frame (32 or 116 bytes).
>>
>> This would then improve the real-time guarantees of the system, by 
>> having a shorter and more deterministic IRQ response time.
>
> if you don't use the FPU in ISRs, then the overhead is just a space 
> overhead with the lazy FPU save/restore.

Yes, but it still means more cache lines, right?

And functions like _ARMV7M_Pendable_service_call and 
_ARMV7M_Supervisor_call now have to save/restore 116 bytes instead of 32 
bytes, right?

> If you switch from an ISR to another thread you have to save/restore 
> the volatile FPU context anyway.

Yes, my idea was to simply move d0-d7 out of struct 
ARMV7M_Exception_frame and into struct Context_Control.

And removing functions like 
_ARMV7M_Trigger_lazy_floating_point_context_save().
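
A minimal sketch of that layout change, to make the idea concrete. The type and field names below are hypothetical and are not the actual RTEMS definitions of ARMV7M_Exception_frame or Context_Control; the only assumption is that the hardware then stacks just the basic 8-word frame, and the volatile FPU registers (d0-d7, FPSCR) are saved by software next to the callee-saved ones:

```c
#include <stddef.h>
#include <stdint.h>

/* Basic frame stacked by hardware on exception entry: 8 words. */
typedef struct {
  uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} Basic_exception_frame;

/*
 * Hypothetical thread context (illustration only, not the RTEMS struct):
 * callee-saved core registers plus the complete FPU register file,
 * saved and restored by software in the context switch.
 */
typedef struct {
  uint32_t r4_to_r11[8];
  uint32_t lr;
  uint32_t sp;
  uint32_t fpscr;
  uint64_t d[16]; /* d0-d15; d0-d7 would move here from the frame */
} Thread_context_sketch;

/* The basic frame is 8 words, i.e. the 32 bytes mentioned above. */
_Static_assert(sizeof(Basic_exception_frame) == 32, "basic frame size");
```

With something like this, the exception entry/exit paths would only ever see the 32-byte frame, at the price of a larger software-managed context.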

>>
>> This would also simplify the context switching code, by centralizing 
>> the saving of the FPU context in RTEMS only, and enabling 
>> optimisation like only saving/restoring the FPU when switching 
>> between tasks defined with RTEMS_FLOATING_POINT.
>>
>> What do you think? Am I missing something? Would it be a good idea?
>
> From experience, working with the RTEMS_FLOATING_POINT in applications 
> is quite annoying. Is there really a measurable and significant 
> performance improvement if you enable the deferred FPU switching? Can 
> you guarantee that the compiler will not generate FPU or vector 
> instructions for integer operations? In this version or a GCC release 
> in the future?

Obviously, since I'm not God, I won't be able to provide any guarantee 
regarding the future :)

But I believe that if GCC started to use FPU for integer operations, 
many people would complain:

FreeBSD requires fpu_kern_enter/fpu_kern_leave to use FPU in the kernel, 
and Linux requires kernel_fpu_begin/kernel_fpu_end to use FPU ops in the 
kernel.

I'm pretty sure Linus will give GCC developers a hard time if they 
start to use the FPU for integer operations anytime soon...

>
>>
>> I would be willing to work on that if there is some kind of agreement 
>> here.
>
> If you change the ARMv7-M CPU port to use the deferred FPU switching, 
> then you surely break existing applications which then have to use 
> RTEMS_FLOATING_POINT. I would do this only if there would be a clear 
> and measurable performance improvement. For the measurements we would 
> need a benchmark.

Ok, so no to using deferred FPU switching for the moment, at least 
without a benchmark.

But what about just running with FPCCR.ASPEN and FPCCR.LSPEN = 0, and 
always saving the FPU context in _CPU_Context_switch when switching tasks?
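
A minimal sketch of that configuration step, assuming the architectural FPCCR layout (ASPEN is bit 31, LSPEN is bit 30, register at 0xE000EF34). The helper name is made up for illustration and is not an existing RTEMS function; it takes the register address as a parameter so it can also be exercised on a host:

```c
#include <stdint.h>

/* Bit positions in FPCCR (Floating-Point Context Control Register). */
#define FPCCR_ASPEN (UINT32_C(1) << 31) /* automatic state preservation */
#define FPCCR_LSPEN (UINT32_C(1) << 30) /* lazy state preservation */

/* Architectural address of FPCCR on ARMv7-M. */
#define FPCCR_ADDRESS UINT32_C(0xE000EF34)

/*
 * Hypothetical helper: clear ASPEN and LSPEN so that exception entry
 * never reserves or lazily saves FPU state on the stack.
 */
static inline void fpu_disable_automatic_stacking(volatile uint32_t *fpccr)
{
  *fpccr &= ~(FPCCR_ASPEN | FPCCR_LSPEN);
}
```

On a target this would be called once during early startup, e.g. fpu_disable_automatic_stacking((volatile uint32_t *) FPCCR_ADDRESS), probably followed by a DSB/ISB pair. After that, any FPU use in an ISR would silently corrupt the interrupted thread's FPU state, which is exactly the restriction in question.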

It would only break existing applications which use the FPU inside ISRs. 
Is that a problem? Are the ISR FPU behaviour/requirements documented 
somewhere in RTEMS?

To be honest, my main motivation here is trying to simplify the code and, 
more importantly, improve debuggability and my understanding of how the 
system works. My head kind of hurts trying to understand exactly at which 
point in the code a lazy FPU save can occur.

Thanks for your quick answer.

Cedric