Potential SIS or RTEMS/libbsd problem

Jiri Gaisler jiri at gaisler.se
Wed May 22 20:34:11 UTC 2019


On 5/22/19 7:43 PM, Jiri Gaisler wrote:
> On 5/22/19 9:49 AM, Sebastian Huber wrote:
>> On 22/05/2019 09:39, Jiri Gaisler wrote:
>>> On 5/22/19 8:03 AM, Sebastian Huber wrote:
>>>> Hello,
>>>>
>>>> in the libbsd there is a test for the Epoch Based Reclamation:
>>>>
>>>> https://git.rtems.org/rtems-libbsd/tree/testsuite/epoch01/test_main.c
>>>>
>>>> When I run this test using the leon3 BSP on real hardware (150MHz
>>>> NGMP FP) the test completes successfully.
>>>>
>>>> If I run the test on the SIS, it is stuck at some point (using "-m
>>>> 1" works):
>>>>
>>>> sparc-rtems5-sis -leon3 -nouartrx -r -tlim 200 s -m 2
>>>> build/sparc-rtems5-leon3-everything/epoch01.exe
>>>>
>>>>
>>> This test needs a shorter time-slice in the simulator to succeed (-d
>>> option). The more cpus, the lower number of clocks in the slice is
>>> needed. Through trial-and-error, these values seem to work:
>>>
>>> 2 CPUs: -m 2 -d 25
>>>
>>> 3 CPUs: -m 3 -d 10
>>>
>>> 4 CPUs will not work, even if -d 1 is set. This is most likely a
>>> simulator problem, I will try to find time to look at it in more
>>> detail. A quick trace shows that all CPUs are stuck in a loop
>>> checking for a lock or similar:
>>>
>> It seems cpu 2 and 3 are in _SMP_barrier_Wait(). The cpu 0 and 1 still
>> do some stuff in the EBR algorithm (ck_* functions). Maybe the
>> algorithm works only in case some random timing fluctuations occur.
> Either that or there is a hidden race condition in the test that does
> not show up on real hardware. I noticed that increasing the time slice
> actually makes the test succeed even on 4 cpus ..!
>
> -m 2 -d 200    PASS
>
> -m 3 -d 200    PASS
>
> -m 4 -d 200    FAIL
>
> -m 4 -d 400    PASS!
>
> BUT
>
> -m 3 -d 400    FAIL!
>
> I will try to add random delays to the interrupt response time to see if
> that will make a difference. That is more in line with the real hardware ...

Adding a pseudo-random delay of 0 to 15 clocks to each trap/interrupt makes the test pass on all CPU configurations with the default time slice (50)! I am not sure what this means: it could be a hidden race condition, the algorithm might need some jitter to work, or it could still be a simulator issue.
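
For illustration, the kind of jitter described above could look roughly like the sketch below. This is not the actual SIS change; the function names, the xorshift generator and the seed are all assumptions made for the example. It only shows the idea of adding 0 to 15 extra clocks to the time at which a trap/interrupt is taken.

```c
#include <stdint.h>

/* Hypothetical sketch, not taken from the SIS sources. */

/* Generator state; any non-zero seed works for xorshift32. */
static uint32_t jitter_state = 0x12345678u;

/* Return a pseudo-random delay in the range 0..15 clocks. */
static uint32_t trap_jitter_clocks(void)
{
    /* One xorshift32 step (Marsaglia), then keep the low 4 bits. */
    jitter_state ^= jitter_state << 13;
    jitter_state ^= jitter_state >> 17;
    jitter_state ^= jitter_state << 5;
    return jitter_state & 0xFu;
}

/*
 * Compute the simulated time at which a trap/interrupt is taken:
 * the current time plus a fixed base latency plus the jitter.
 * Both parameters are assumptions made for this example.
 */
static uint64_t trap_delivery_time(uint64_t now, uint32_t base_latency)
{
    return now + base_latency + trap_jitter_clocks();
}
```

Because the generator is seeded with a fixed value, a run stays reproducible while the per-CPU interrupt timing is no longer in lock-step.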

Is there any chance that you could compile this test for sis-riscv? RISC-V has different atomic operations and trap handlers, so it would be interesting to see if the test behaves differently.

Jiri.


