Potential SIS or RTEMS/libbsd problem

Jiri Gaisler jiri at gaisler.se
Thu May 23 22:37:29 UTC 2019


On 5/23/19 7:35 AM, Sebastian Huber wrote:
> On 22/05/2019 22:34, Jiri Gaisler wrote:
>> On 5/22/19 7:43 PM, Jiri Gaisler wrote:
>>> On 5/22/19 9:49 AM, Sebastian Huber wrote:
>>>> On 22/05/2019 09:39, Jiri Gaisler wrote:
>>>>> On 5/22/19 8:03 AM, Sebastian Huber wrote:
>>>>>> Hello,
>>>>>>
>>>>>> in the libbsd there is a test for the Epoch Based Reclamation:
>>>>>>
>>>>>> https://git.rtems.org/rtems-libbsd/tree/testsuite/epoch01/test_main.c
>>>>>>
>>>>>>
>>>>>> When I run this test using the leon3 BSP on real hardware (150MHz
>>>>>> NGMP FP) the test completes successfully.
>>>>>>
>>>>>> If I run the test on the SIS, it is stuck at some point (using "-m
>>>>>> 1" works):
>>>>>>
>>>>>> sparc-rtems5-sis -leon3 -nouartrx -r -tlim 200 s -m 2
>>>>>> build/sparc-rtems5-leon3-everything/epoch01.exe
>>>>>>
>>>>>>
>>>>> This test needs a shorter time slice in the simulator to succeed (-d
>>>>> option). The more CPUs, the fewer clocks per slice are needed.
>>>>> Through trial and error, these values seem to work:
>>>>>
>>>>> 2 CPUs: -m 2 -d 25
>>>>>
>>>>> 3 CPUs: -m 3 -d 10
>>>>>
>>>>> 4 CPUs will not work, even if -d 1 is set. This is most likely a
>>>>> simulator problem; I will try to find time to look at it in more
>>>>> detail. A quick trace shows that all CPUs are stuck in a loop
>>>>> checking for a lock or similar:
>>>>>
>>>> It seems CPUs 2 and 3 are in _SMP_barrier_Wait(), while CPUs 0 and 1
>>>> still do some stuff in the EBR algorithm (ck_* functions). Maybe the
>>>> algorithm only works when some random timing fluctuations occur.
>>> Either that or there is a hidden race condition in the test that does
>>> not show up on real hardware. I noticed that increasing the time slice
>>> actually makes the test succeed even on 4 CPUs ..!
>>>
>>> -m 2 -d 200    PASS
>>>
>>> -m 3 -d 200    PASS
>>>
>>> -m 4 -d 200    FAIL
>>>
>>> -m 4 -d 400    PASS!
>>>
>>> BUT
>>>
>>> -m 3 -d 400    FAIL!
>>>
>>> I will try to add random delays to the interrupt response time to see
>>> if that makes a difference. That is more in line with the real
>>> hardware ...
>> Adding a pseudo-random delay of 0-15 clocks to each trap/interrupt
>> causes the test to pass on all CPU configurations with the default
>> time slice (50)..! I am not sure what this means - it could be a
>> hidden race condition, the algorithm might need some jitter to work,
>> or it could still be a simulator issue.
>>
>> Is there any chance that you could compile this test for sis-riscv?
>> RISC-V has different atomic operations and trap handlers, so it would
>> be interesting to see if the test behaves differently.
>
> It locks up at the same spot:
>
> riscv-rtems5-sis -m 4 build/riscv-rtems5-griscv-default/epoch01.exe
>
>  SIS - SPARC/RISCV instruction simulator 2.13, copyright Jiri Gaisler 2019
>  Bug-reports to jiri at gaisler.se
>
>  RISCV emulation enabled, 4 cpus online, delta 50 clocks
>
> cpu0> run
> *** LIBBSD EPOCH 1 TEST ***
> nexus0: <RTEMS Nexus device>
> <TestEpoch01>
>   <EnterExit activeWorker="1">
>     <Counter worker="0">1059417</Counter>
>   </EnterExit>
>   <EnterExit activeWorker="2">
>     <Counter worker="0">1059303</Counter>
>     <Counter worker="1">1049390</Counter>
>   </EnterExit>
>   <EnterExit activeWorker="3">
>     <Counter worker="0">1058922</Counter>
>     <Counter worker="1">1049008</Counter>
>     <Counter worker="2">1061640</Counter>
>   </EnterExit>
>   <EnterExit activeWorker="4">
>     <Counter worker="0">1058540</Counter>
>     <Counter worker="1">1048679</Counter>
>     <Counter worker="2">1061258</Counter>
>     <Counter worker="3">1061258</Counter>
>   </EnterExit>
>   <EnterListOpExit activeWorker="1">
>     <Counter worker="0">925414</Counter>
>     <Removals worker="0">100</Removals>
>   </EnterListOpExit>
>   <EnterListOpExit activeWorker="2">
>     <Counter worker="0">704898</Counter>
>     <Counter worker="1">704835</Counter>
>     <Removals worker="0">46</Removals>
>     <Removals worker="1">45</Removals>
>   </EnterListOpExit>
>   <EnterListOpExit activeWorker="3">
>     <Counter worker="0">589977</Counter>
>     <Counter worker="1">585688</Counter>
>     <Counter worker="2">592200</Counter>
>     <Removals worker="0">23</Removals>
>     <Removals worker="1">23</Removals>
>     <Removals worker="2">23</Removals>
>   </EnterListOpExit>
>   <EnterListOpExit activeWorker="4">
>     <Counter worker="0">505834</Counter>
>     <Counter worker="1">501869</Counter>
>     <Counter worker="2">507615</Counter>
>     <Counter worker="3">507614</Counter>
>     <Removals worker="0">19</Removals>
>     <Removals worker="1">18</Removals>
>     <Removals worker="2">18</Removals>
>     <Removals worker="3">18</Removals>
>   </EnterListOpExit>
>   <EnterExitPreempt activeWorker="1">
>     <Counter worker="0">275348</Counter>
>   </EnterExitPreempt>
>   <EnterExitPreempt activeWorker="2">
>     <Counter worker="0">275971</Counter>
>     <Counter worker="1">280381</Counter>
>   </EnterExitPreempt>
>   <EnterExitPreempt activeWorker="3">
>     <Counter worker="0">275956</Counter>
>     <Counter worker="1">280283</Counter>
>     <Counter worker="2">280283</Counter>
>   </EnterExitPreempt>
>   <EnterExitPreempt activeWorker="4">
>     <Counter worker="0">275800</Counter>
>     <Counter worker="1">280185</Counter>
>     <Counter worker="2">280185</Counter>
>     <Counter worker="3">280185</Counter>
>   </EnterExitPreempt>
>   <EnterListOpExitPreempt activeWorker="1">
>     <Counter worker="0">266212</Counter>
>     <Removals worker="0">68</Removals>
>   </EnterListOpExitPreempt>
> Interrupt!
>  Stopped at time 975738600 (19514.772 ms)
> cpu0>
>
> The EBR is a core synchronization primitive in libbsd. It makes me a
> bit nervous to have this dependency on random fluctuations to make
> progress. I don't know the algorithm well enough to say whether this
> is the expected behaviour. A real machine with such exact relative
> instruction timing probably does not exist.
>
> In general, you can lock up an SMP system quite easily if you perform
> the right LL/SC pair on two processors so that they endlessly steal
> each other's reservation.
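
To illustrate that reservation-stealing scenario, here is a minimal
sketch in C (a hypothetical example, not code from the epoch01 test; on
RISC-V each retry loop below compiles to an lr.w/sc.w pair):

    /*
     * Two harts increment adjacent words that share one reservation
     * granule: hart 0 calls increment(&c.a), hart 1 increment(&c.b).
     * If both execute in exact lockstep, hart 0's lr.w can cancel
     * hart 1's pending reservation and vice versa, so every sc.w
     * fails and neither counter ever advances. Timing jitter breaks
     * the lockstep and lets one sc.w succeed.
     */
    #include <stdint.h>

    struct counters {
        volatile uint32_t a;    /* incremented by hart 0 */
        volatile uint32_t b;    /* incremented by hart 1 */
    };

    static void
    increment(volatile uint32_t *word)
    {
        uint32_t old = __atomic_load_n(word, __ATOMIC_RELAXED);

        /* CAS retry loop; lr.w/sc.w on RISC-V. On failure the
         * builtin reloads "old" with the current value. */
        while (!__atomic_compare_exchange_n(word, &old, old + 1, 0,
            __ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
            /* retry */
        }
    }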

So I added an emulated 4 Kbyte L1 instruction cache to each core and a
shared 256 Kbyte L2 instruction cache to get some instruction timing
jitter. The cache refill time also has a pseudo-random element to
emulate the shared AHB bus latency. This makes the epoch01 test pass,
but I can still make it fail by playing with the time-slice length or
the cache size. I wonder if it was pure luck that the test worked on
the real SPARC hardware. Is it possible to run the test using only
three or two cores on the hardware? In my experiments, three cores
seemed to fail more often than the other configurations ...
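
In outline, the fetch timing now works roughly like this (illustrative
C only; the names and constants are made up and do not match the actual
sis sources):

    #include <stdint.h>
    #include <stdlib.h>

    #define L1_HIT_CYCLES  1
    #define L2_HIT_CYCLES  8
    #define MEM_CYCLES     24
    #define AHB_JITTER     7    /* 0-7 extra clocks, assumed range */

    /* hypothetical lookup helpers: one L1 per core, one shared L2 */
    extern int l1_lookup(int core, uint32_t addr);
    extern int l2_lookup(uint32_t addr);

    /* clocks charged for one instruction fetch */
    static uint32_t
    fetch_latency(int core, uint32_t addr)
    {
        if (l1_lookup(core, addr))
            return L1_HIT_CYCLES;
        if (l2_lookup(addr))        /* L1 miss, L2 hit */
            return L2_HIT_CYCLES + (rand() & AHB_JITTER);
        return MEM_CYCLES + (rand() & AHB_JITTER);   /* L2 miss */
    }

The pseudo-random term models the arbitration latency of the shared AHB
bus, which seems to be what provides the necessary jitter on the real
hardware.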

Jiri.




