[PATCH] aarch64: Add tests that are failing intermittently

Kinsey Moore kinsey.moore at oarcorp.com
Thu Aug 26 23:36:33 UTC 2021


On 8/20/2021 22:06, Chris Johns wrote:
> On 21/8/21 2:38 am, Kinsey Moore wrote:
>> On 8/19/2021 18:03, Chris Johns wrote:
>>> On 20/8/21 4:55 am, Kinsey Moore wrote:
>>>> On 8/19/2021 13:32, Gedare Bloom wrote:
>>>>> On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore <kinsey.moore at oarcorp.com> wrote:
>>>>>> I've seen these failures on my local system, in our CI, and on a build
>>>>>> server that I sometimes use for development/testing, so if it's a
>>>>>> configuration issue we're being pretty consistent about misconfiguration
>>>>>> across some pretty different environments (docker, bare-metal, VM,
>>>>>> different OSs, different QEMU versions). I've seen enough of the
>>>>>> spintrcritical tests fail sporadically on QEMU to lump them all into
>>>>>> this category. These are also tests that I have seen behave badly on
>>>>>> ARMv7 QEMU on my local system (which doesn't rule out misconfiguration,
>>>>>> but it's another data point).
>>>>>>
>>>>> Yes, for example, it may be a matter of qemu process counts spawned by
>>>>> rtems-test, and the order in which tests get invoked could be a cause
>>>>> for which ones don't work. I could easily see this happening, since
>>>>> each test runtime will be fairly consistent, so you'll often see the
>>>>> same tests running concurrently with each other. But, if you change
>>>>> the order (e.g., by adding new tests), then we may see a new set of
>>>>> sporadically failing testcases. Do we just add those, or do we need
>>>>> to re-examine this indeterminate set periodically? Who will maintain
>>>>> this list? That's kind of the root of my concern here.
>>>> I understand your concern about maintenance of the failure list and I don't
>>>> have a good answer for you. I imagine going forward it would be a combination
>>>> of the current stakeholders for a given BSP and anyone who watches the
>>>> automated build output from Joel's runs for these kinds of issues.
>>>>
>>>> On the other hand, if we don't mark those tests, people will get fatigued
>>>> looking at the spurious failures and assume any new ones just fall into the
>>>> same category as others. At that point is it even worth running the
>>>> automated tests for that platform?
>>>>
>>>>>> As far as your worry about marking these indeterminate, they're only
>>>>>> being marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't
>>>>>> have these testing carve-outs and runs all these tests flawlessly.
>>> Great, this is important.
>>>
>>>>>> These failures become much more common when there is other load on the
>>>>>> system, and a lot of them disappear when you limit the tester to a
>>>>>> single QEMU instance at a time.
>>>>>>
>>>>> I'm wondering if we should sacrifice testing speed for
>>>>> coverage/quality. If throttling rtems-test leads to more reliable test
>>>>> results, then it may be a better option than basically ignoring a
>>>>> swath of our testsuite.
>>>> That would certainly mitigate some of the failures, but you'd also have to
>>>> guarantee nothing else is running on the system which could cause the same
>>>> problem. I know at least some of the current automated runs operate on a
>>>> shared system which can and does often have other intensive processes
>>>> running on it. There are also the tests that are sporadic on QEMU even
>>>> without additional load.
>>> What is it in these tests when combined with qemu that causes the tests to fail?
>>> Is there some relation to a real clock, some shared host resource or a bug in
>>> qemu? I am concerned a simulator can vary like this based on the host's load,
>>> and it makes me wonder how people use it on machines that host a number of VMs.
>> I experienced very similar results on an ARMv7 BSP (not Zynq) and assumed that this
>> was a known/accepted problem with QEMU when the same issues popped up on
>> AArch64.
> I think we have just ignored the issue. I know I have ignored it because of the
> rabbit hole it is.
>
>> My local system under no other load produces these failures for the Zynq A9
>> QEMU BSP:
>>
>>          "failed": [
>>              "spcpucounter01.exe",
>>              "psxtimes01.exe",
>>              "sp69.exe",
>>              "psx12.exe",
>>              "minimum.exe",
>>              "dl06.exe",
>>              "sptimecounter02.exe"
>>          ],
>>
>> minimum.exe
> We have discussed this test in the past and I think the end result from Joel
> was that an exit code of 0 meant it had passed, but I am not sure the exit
> code is printed because the test is minimal. Maybe it should be changed to be
> a `no-run` type test?
>
>> and dl06.exe are probably unrelated,
> Yeap and that is one I should fix when I can find the time.
>
>> but the remainder are in my problem set for AArch64 on QEMU.
> OK.
>
>> A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the same
>> conditions with all the test carve-outs removed:
>>
>>          "failed": [
>>              "psx12.exe",
>>              "spcpucounter01.exe",
>>              "sptimecounter01.exe",
>>              "sptimecounter02.exe",
>>              "sp04.exe"
>>          ],
>>
>> Because of my experience with the aforementioned ARMv7 BSP and the lack of
>> failures on hardware, I chose not to weed out the root cause of the failures under
>> QEMU.
> Sure. However, it leaves open the underlying problem of why these fail with
> QEMU, and so we are caught either way.
>
>> More than anything else, this patch is documentation of our observations
>> across multiple architectures and BSPs running on QEMU.
> And it also affects the results.
>
>>> I feel with this volume of tests being tagged this way we should have a better
>>> understanding of the problem and so a means to track or not track how to resolve
>>> it. As Gedare has kindly stated, once pushed this change disappears into a dark
>>> corner and we have no means to track it.
>>>
>>> The other solution is to set `jobs` to `1` in this BSP's tester config, again
>>> something Gedare has raised. It means we get better or even valid results. What
>>> is more important, valid results or running the testsuite as fast as possible?
>> I fully support dropping the number of jobs to "half" or 1 for better results on
>> QEMU runs that display these problems.
> OK, then maybe this is the way to go.
I submitted a patch to the mailing list to set jobs=1 on all ARM and 
AArch64 QEMU tester configurations.
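
For reference, that change amounts to roughly the following shape in the
rtems-tools per-BSP tester configuration (an illustrative snippet only; the
actual file names and the surrounding keys vary per BSP):

    # tester/rtems/testing/bsps/xilinx_zynq_a9_qemu.ini (illustrative path)
    [xilinx_zynq_a9_qemu]
    ...
    # Run one QEMU instance at a time so host load does not distort the
    # guest's timer behaviour.
    jobs = 1
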
>> My comment in that regard was that other system loading (or multiple
>> simultaneous test runs) can also cause the same problem, and so this is only
>> a partial solution. Barring a fix for RTEMS or QEMU for these load-dependent
>> and sporadic failures, this at least still needs to be documented in some
>> form.
> Yes and the failures should highlight an issue on the host that needs to be
> looked into.

Since I'm working on SMP and I've had some of those tests failing
sporadically as well, I took a dive into smpschededf01.exe on AArch64.
The issue that particular test seems to be encountering is a mismatch
between the busy wait delay produced by rtems_test_busy_cpu_usage()
and the number of kernel ticks that have actually elapsed. My
hypothesis is that QEMU is prone to dumping a pile of timer ticks into
the virtual CPU all at once to catch up to wall time after returning
from a context switch on the host OS. This would be consistent with
the observation that failures are sporadic and increase under system
load. I instrumented the code and can see that the loop in
rtems_test_busy_cpu_usage() barely runs between these tick interrupts,
if at all.
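
To make that concrete, the instrumentation is conceptually nothing more than
the sketch below. This is a hand-written illustration rather than the actual
test or support code; busy_work() is a stand-in for the CPU-bound loop inside
rtems_test_busy_cpu_usage(), and the tick counts come from the classic API.

    #include <rtems.h>
    #include <stdio.h>

    /* Stand-in for the CPU-bound loop inside rtems_test_busy_cpu_usage(). */
    static void busy_work( void )
    {
      for ( volatile int i = 0; i < 100000; ++i ) {
        /* burn cycles */
      }
    }

    /*
     * Count how many clock ticks elapse across each fixed chunk of busy
     * work.  On real hardware the delta would normally be 0 or 1; a large
     * delta means a burst of catch-up ticks was delivered back to back with
     * almost no busy-loop progress in between.
     */
    static void check_tick_bunching( void )
    {
      rtems_interval last = rtems_clock_get_ticks_since_boot();
      int i;

      for ( i = 0; i < 100; ++i ) {
        rtems_interval now;

        busy_work();
        now = rtems_clock_get_ticks_since_boot();

        if ( now - last > 1 ) {
          printf( "iteration %d: %u ticks at once\n",
                  i, (unsigned) ( now - last ) );
        }

        last = now;
      }
    }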

I guess my next step is to see whether QEMU has an option to run its
timers closer to the illusion of real hardware instead of basing them
on the wall clock.
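
If it does, the obvious candidate to try is QEMU's -icount mode, which derives
guest time from the number of executed instructions instead of the host clock.
Something along these lines (an untested idea rather than a known fix; the
exact option spelling should be checked against the QEMU version in use, and
rtems-test would pass it through the BSP's QEMU options):

    # Untested: tie guest time to executed instructions rather than the
    # host clock, and don't try to catch up to real time.
    qemu-system-aarch64 ... -icount shift=auto,sleep=off ...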


Kinsey


