[PATCH 1/2] cpukit/aarch64: Keep state across context switch

Kinsey Moore kinsey.moore at oarcorp.com
Tue Mar 8 14:06:35 UTC 2022


On 3/8/2022 02:52, Sebastian Huber wrote:
> On 28/02/2022 20:18, Kinsey Moore wrote:
>>
>> On 2/28/2022 12:19, Sebastian Huber wrote:
>>> On 26/02/2022 08:03, Kinsey Moore wrote:
>>>> On 2/26/2022 00:53, Sebastian Huber wrote:
>>>>> On 26/02/2022 00:41, Kinsey Moore wrote:
>>>>>> This may also be an issue for ARM, RISC-V and others, as it
>>>>>> doesn't appear that ARM saves CPSR during context switch and I
>>>>>> couldn't tell whether RISC-V does this either, though I'm less
>>>>>> familiar with it.
>>>>>
>>>>> This doesn't look like the right way to fix this issue.
>>>>>
>>>>> There is currently the assumption that all processors start 
>>>>> multitasking with a context switch to _Thread_Handler() which sets 
>>>>> the interrupt level. It is possible to construct a scenario in 
>>>>> which we start multitasking with a migration of a thread which 
>>>>> already executed the _Thread_Handler() prologue. This would result 
>>>>> in an execution with disabled interrupts. I think the proper fix 
>>>>> for this scenario is to enable interrupts in 
>>>>> _CPU_SMP_Prepare_start_multitasking().
>>>>>
>>>>> Doing a context switch with interrupts disabled is a fatal 
>>>>> application error on all architectures with
>>>>>
>>>>> #define CPU_ENABLE_ROBUST_THREAD_DISPATCH TRUE
>>>>>
>>>>> or enabled SMP support.
>>>>>
>>>> Ok, great. I was wondering if that was the case and this is 
>>>> definitely the kind of feedback I was looking for. I'll adjust the 
>>>> patch set to reflect that. I still wonder if this is an issue on 
>>>> other SMP CPU ports, though, since most of them don't implement 
>>>> that hook, either.
>>>
>>> I would like to have a closer look at this next week when I am back
>>> from holidays.
>>>
>>> Enabling interrupts in _CPU_SMP_Prepare_start_multitasking() would 
>>> not work since we use the interrupt stack at this point. We should 
>>> add a ticket and a test case for this (I can do this next week). How 
>>> did you observe this bug?
>>>
>> I was only able to observe this bug once the 2/2 patch was applied.
>> That optimization opens a race condition in the sppercpudata01 test
>> on SMP configurations, since the task migrates across CPUs as the
>> CPUs are coming online (adding a few no-ops to the Per_CPU_Control
>> accessor prevents it from appearing). The race condition resolves
>> nominally in 90% of cases, so while the failure is not frequent, it
>> is reproducible.
>
> I added a ticket and a test case:
>
> http://devel.rtems.org/ticket/4627
>
> Could you please check if the test case fails currently on your 
> aarch64 target?

I have verified that this test case fails under QEMU and on the hardware 
target.
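
As a rough illustration of the scenario Sebastian described earlier in the
thread (a processor starting multitasking by switching to a thread that has
already run its entry prologue), here is a minimal stand-alone model. It is
not RTEMS code: the types and functions (model_cpu, model_thread,
model_switch_to, model_start_multitasking) are hypothetical, and the model
assumes a context switch that does not save or restore the interrupt state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not RTEMS code: every name here is illustrative. */
typedef struct {
  bool interrupts_enabled; /* models the interrupt mask of one processor */
} model_cpu;

typedef struct {
  bool prologue_done; /* true once the thread ran its entry prologue */
} model_thread;

/*
 * Models a context switch that does NOT save or restore the interrupt
 * state (the assumption under discussion in this thread).
 */
static void model_switch_to(model_cpu *cpu, model_thread *thread)
{
  assert(cpu != NULL);
  assert(thread != NULL);

  if (!thread->prologue_done) {
    /* A fresh thread starts in its entry prologue, which sets the
     * interrupt level; here that simply means enabling interrupts. */
    cpu->interrupts_enabled = true;
    thread->prologue_done = true;
  }

  /* A thread that is already past its prologue just resumes, so the
   * interrupt state of the processor is left as it was. */
}

/*
 * Models a processor that starts multitasking with interrupts disabled
 * and switches to its first thread. Returns the resulting interrupt
 * state of that processor.
 */
static bool model_start_multitasking(model_thread *first)
{
  model_cpu cpu = { .interrupts_enabled = false };

  model_switch_to(&cpu, first);
  return cpu.interrupts_enabled;
}
```

In this model, a fresh first thread leaves the processor with interrupts
enabled, while a migrated thread whose prologue already ran leaves it
executing with interrupts disabled, which is the case the new test is meant
to catch.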


