[PATCH 1/2] cpukit/aarch64: Keep state across context switch

Kinsey Moore kinsey.moore at oarcorp.com
Mon Feb 28 19:18:44 UTC 2022


On 2/28/2022 12:19, Sebastian Huber wrote:
> On 26/02/2022 08:03, Kinsey Moore wrote:
>> On 2/26/2022 00:53, Sebastian Huber wrote:
>>> On 26/02/2022 00:41, Kinsey Moore wrote:
>>>> This may also be an issue for ARM, RISC-V and others as it doesn't 
>>>> appear that ARM saves CPSR during context switch and I couldn't 
>>>> tell that RISC-V does this either, though I'm less familiar with it.
>>>
>>> This doesn't look like the right way to fix this issue.
>>>
>>> There is currently the assumption that all processors start 
>>> multitasking with a context switch to _Thread_Handler() which sets 
>>> the interrupt level. It is possible to construct a scenario in which 
>>> we start multitasking with a migration of a thread which already 
>>> executed the _Thread_Handler() prologue. This would result in an 
>>> execution with disabled interrupts. I think the proper fix for this 
>>> scenario is to enable interrupts in 
>>> _CPU_SMP_Prepare_start_multitasking().
>>>
>>> Doing a context switch with interrupts disabled is a fatal 
>>> application error on all architectures with
>>>
>>> #define CPU_ENABLE_ROBUST_THREAD_DISPATCH TRUE
>>>
>>> or enabled SMP support.
>>>
>> Ok, great. I was wondering if that was the case and this is 
>> definitely the kind of feedback I was looking for. I'll adjust the 
>> patch set to reflect that. I still wonder if this is an issue on 
>> other SMP CPU ports, though, since most of them don't implement that 
>> hook, either.
>
> I would like to have a closer look at this next week when I am back 
> from holidays.
>
> Enabling interrupts in _CPU_SMP_Prepare_start_multitasking() would not 
> work since we use the interrupt stack at this point. We should add a 
> ticket and a test case for this (I can do this next week). How did you 
> observe this bug?
>
I was only able to observe this bug once the 2/2 patch was applied. That 
optimization opens a race condition in the sppercpudata01 test on SMP 
configurations, since the task migrates across CPUs while the CPUs are 
coming online. Adding a few no-ops to the Per_CPU_Control accessor 
prevents the race from appearing. The race resolves nominally in 90% of 
cases, so while the failure is not frequent, it is reproducible.
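
For reference, the scenario described above (a CPU starting multitasking 
with a migrated thread that has already run the _Thread_Handler() 
prologue) can be sketched as a toy model in C. This is only an 
illustration under simplifying assumptions: every model_* name is 
hypothetical and not an RTEMS API, the interrupt state is reduced to a 
single per-CPU flag, and the context switch is assumed not to save or 
restore that flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the model_* types and functions are hypothetical
 * and do not correspond to RTEMS APIs. */

typedef struct {
  /* True once the thread has run its entry prologue, which is where
   * the interrupt level gets set (as _Thread_Handler() does). */
  bool prologue_done;
} model_thread;

typedef struct {
  /* Interrupt state lives in the CPU, not in the thread context. */
  bool interrupts_enabled;
} model_cpu;

/* The entry prologue sets the interrupt level only once per thread. */
static void model_thread_entry(model_cpu *cpu, model_thread *thread)
{
  if (!thread->prologue_done) {
    cpu->interrupts_enabled = true;
    thread->prologue_done = true;
  }
}

/* A context switch that leaves the CPU interrupt state untouched. */
static void model_context_switch(model_cpu *cpu, model_thread *heir)
{
  assert(cpu != NULL && heir != NULL);
  model_thread_entry(cpu, heir);
}

/* Returns whether interrupts are enabled after the first dispatch on
 * a CPU that comes online with interrupts disabled. */
bool model_first_dispatch(bool heir_is_fresh)
{
  model_cpu cpu = { .interrupts_enabled = false };
  model_thread heir = { .prologue_done = !heir_is_fresh };

  model_context_switch(&cpu, &heir);
  return cpu.interrupts_enabled;
}
```

In this model, model_first_dispatch(true) covers the usual case in which 
the first thread on a CPU is a fresh one, and model_first_dispatch(false) 
covers the migrated thread, which keeps running with interrupts disabled.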


Kinsey


