[PATCH] aarch64: Add tests that are failing intermittently
Kinsey Moore
kinsey.moore at oarcorp.com
Sat Aug 28 01:46:59 UTC 2021
On 8/27/2021 19:01, Chris Johns wrote:
> On 27/8/21 9:36 am, Kinsey Moore wrote:
>> Since I'm working on SMP and I've had some of those tests failing sporadically
>> as well, I took a dive into smpschededf01.exe on AArch64 and the issue that
>> particular test seems to be encountering is a mismatch between the busy wait
>> delay using rtems_test_busy_cpu_usage() and the number of kernel ticks that have
>> been experienced. My hypothesis is that QEMU is prone to dumping a pile of timer
>> ticks into the virtual CPU all at once to catch up to wall time after returning
>> from a context switch on the host OS. This would support the observation that
>> failures are sporadic and increase under system load. I instrumented the code
>> and can see that the loop in rtems_test_busy_cpu_usage() isn't running
>> substantially, if at all, between these tick interrupts.
> Oh, that would confuse things.
I bumped the RSB QEMU locally from 5.2-rc1 to the 5.2.0 release, and the
behavior got better, but it's still not great: my stripped-down,
instrumented test still fails roughly 30% of the time. At least that's
better than the 90+% failure rate I saw with 4.1.0 and 5.2-rc1. I
previously had QEMU 3.1.0 installed from the Debian buster package
repository, and it behaved even better than the 5.2.0 release, so there
was definitely some kind of regression in between that has since been
partially fixed.
>> I guess my next step is to see whether QEMU has an option to run its timers
>> closer to the illusion of bare metal instead of basing them on the wall clock.
> QEMU would need to handle an instruction-count or CPU timer to manage this.
I haven't found any options to manipulate this, but QEMU does have a
couple of internal timer types. It looks like QEMU's virtual timers fall
back to a QEMU realtime timer if the virtual timer hooks aren't
available. I didn't see many of those hooks defined in the QEMU
codebase, and QEMU defines the timers for the ARM Generic Timers as the
virtual variety, so I assume that fallback is what's happening here.
Short of spending an inordinate amount of time in the bowels of QEMU,
I'm not sure what can be done from here beyond updating the RSB QEMU
from 5.2-rc1 to the 5.2.0 release.
Kinsey