[PATCH] aarch64: Add tests that are failing intermittently

Kinsey Moore kinsey.moore at oarcorp.com
Sat Aug 28 01:46:59 UTC 2021


On 8/27/2021 19:01, Chris Johns wrote:
> On 27/8/21 9:36 am, Kinsey Moore wrote:
>> Since I'm working on SMP and I've had some of those tests failing sporadically
>> as well, I took a dive into smpschededf01.exe on AArch64 and the issue that
>> particular test seems to be encountering is a mismatch between the busy wait
>> delay using rtems_test_busy_cpu_usage() and the number of kernel ticks that have
>> been experienced. My hypothesis is that QEMU is prone to dumping a pile of timer
>> ticks into the virtual CPU all at once to catch up to wall time after returning
>> from a context switch on the host OS. This would support the observation that
>> failures are sporadic and increase under system load.  I instrumented the code
>> and can see that the loop in rtems_test_busy_cpu_usage() isn't running
>> substantially between these tick interrupts if at all.
> Oh that would confuse things.

I bumped RSB QEMU locally from 5.2-rc1 to the 5.2.0 release and the
behavior got better, but it's still not great: my stripped-down,
instrumented test still fails approximately 30% of the time. At least
that's better than the 90+% failure rate with 4.1.0 or 5.2-rc1. I
previously had QEMU 3.1.0 installed from the Debian buster package repo
and it behaved even better than the 5.2.0 release, so there was
definitely some kind of regression in the interim that got partially
fixed.

>> I guess my next step is seeing if QEMU has an option to run its timers closer to
>> the illusion of metal instead of being based on the wall clock.
> QEMU would need to handle instruction or a CPU timer to manage this.

I haven't found any options to manipulate this, but QEMU does have a
couple of internal timer types. It looks like the QEMU virtual timers
fall back to a QEMU realtime timer if the virtual timer hooks aren't
available. I didn't see many of the virtual timer hooks defined in the
QEMU codebase, and the timer definitions in QEMU for the ARM Generic
Timers are of the virtual variety, so I assume that fallback is what's
happening here.

Short of spending an inordinate amount of time in the bowels of QEMU,
I'm not sure what can be done from this point beyond updating RSB QEMU
from 5.2-rc1 to the 5.2.0 release.
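
For anyone who wants to reason about the tick-burst hypothesis quoted
above without running QEMU, here is a rough toy model of it. This is
not QEMU's timer code and not the RTEMS test: the function name
tick_gaps() and its parameters are made up for illustration, and a
"stall" simply stands in for the host descheduling the vCPU and then
delivering the overdue ticks back to back. It records how many busy
loop iterations ran between consecutive ticks, which is roughly what my
instrumentation was looking at.

```python
def tick_gaps(work_units, units_per_tick, stalls=()):
    """Return the busy-loop iterations executed between consecutive ticks.

    work_units     -- total busy-loop iterations the guest performs
    units_per_tick -- iterations that fit in one tick when time flows evenly
    stalls         -- (at_unit, missed_ticks) pairs: after `at_unit`
                      iterations the host pauses the vCPU for `missed_ticks`
                      tick periods, then delivers those ticks in one burst
    """
    # Merge stalls that land on the same iteration.
    burst_at = {}
    for at_unit, missed in stalls:
        burst_at[at_unit] = burst_at.get(at_unit, 0) + missed

    gaps = []
    since_last = 0
    for done in range(1, work_units + 1):
        since_last += 1
        due = burst_at.get(done, 0)      # catch-up ticks after a stall
        if done % units_per_tick == 0:   # the regularly scheduled tick
            due += 1
        for _ in range(due):
            gaps.append(since_last)
            since_last = 0               # burst ticks see no loop progress
    return gaps


if __name__ == "__main__":
    steady = tick_gaps(1000, 100)
    bursty = tick_gaps(1000, 100, stalls=[(250, 7)])
    print("steady ticks:", len(steady), "gaps:", steady)
    print("bursty ticks:", len(bursty), "gaps:", bursty)
```

In this model the same amount of busy-loop work sees 10 ticks when
time flows evenly and 17 ticks with a single stall, with most of the
extra ticks arriving after zero loop iterations. A test that assumes a
fixed ratio between busy-wait work and kernel ticks would fail in the
second case.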


Kinsey


