[PATCH] aarch64: Add tests that are failing intermittently

Chris Johns chrisj at rtems.org
Sat Aug 21 03:06:06 UTC 2021


On 21/8/21 2:38 am, Kinsey Moore wrote:
> On 8/19/2021 18:03, Chris Johns wrote:
>> On 20/8/21 4:55 am, Kinsey Moore wrote:
>>> On 8/19/2021 13:32, Gedare Bloom wrote:
>>>> On Thu, Aug 19, 2021 at 11:43 AM Kinsey Moore <kinsey.moore at oarcorp.com> wrote:
>>>>> I've seen these failures on my local system, in our CI, and on a build
>>>>> server that I sometimes use for development/testing, so if it's a
>>>>> configuration issue we're being pretty consistent about misconfiguration
>>>>> across some pretty different environments (Docker, bare metal, VMs,
>>>>> different OSs, different QEMU versions). I've seen enough of the
>>>>> spintrcritical tests fail sporadically on QEMU to lump them all into
>>>>> this category. These are also tests that I have seen behave badly on
>>>>> ARMv7 QEMU on my local system (which doesn't rule out misconfiguration,
>>>>> but it's another data point).
>>>>>
>>>> Yes, for example, it may be a matter of the number of qemu processes
>>>> spawned by rtems-test, and the order in which tests get invoked could
>>>> determine which ones fail. I could easily see this happening: each
>>>> test's runtime will be fairly consistent, so you'll often see the same
>>>> tests running concurrently with each other. But if you change the
>>>> order (e.g., by adding new tests), then we may see a new set of
>>>> sporadically failing testcases. Will we just add those, or do we need
>>>> to re-examine this indeterminate set periodically? Who will maintain
>>>> this list? That's kind of the root of my concern here.
>>> I understand your concern about maintenance of the failure list and I
>>> don't have a good answer for you. I imagine going forward it would be a
>>> combination of the current stakeholders for a given BSP and anyone who
>>> watches the automated build output from Joel's runs for these kinds of
>>> issues.
>>>
>>> On the other hand, if we don't mark those tests, people will get
>>> fatigued looking at the spurious failures and assume any new ones just
>>> fall into the same category as the others. At that point, is it even
>>> worth running the automated tests for that platform?
>>>
>>>>> As for your worry about marking these indeterminate: they're only
>>>>> being marked as such for QEMU BSPs. The ZynqMP hardware BSP doesn't
>>>>> have these testing carve-outs and runs all these tests flawlessly.
>> Great, this is important.
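
For reference, the carve-outs under discussion are entries in the BSP's
test configuration (.tcfg) files. From memory the format is one state per
line; a sketch (the file path here is made up, check the actual BSP
config for the real one):

    # bsps/aarch64/xilinx-zynqmp/config/zynqmp-testsuite.tcfg (hypothetical)
    indeterminate: psx12
    indeterminate: spcpucounter01
    indeterminate: sptimecounter02

The tester then reports these tests in their own bucket rather than
counting them as failures.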
>>
>>>>> These failures become much more common when there is other load on
>>>>> the system, and a lot of them disappear when you limit the tester to
>>>>> a single QEMU instance at a time.
>>>>>
>>>> I'm wondering if we should sacrifice testing speed for
>>>> coverage/quality. If throttling rtems-test leads to more reliable test
>>>> results, then it may be a better option than basically ignoring a
>>>> swath of our testsuite.
>>> That would certainly mitigate some of the failures, but you'd also have
>>> to guarantee that nothing else is running on the system that could cause
>>> the same problem. I know at least some of the current automated runs
>>> operate on a shared system that can, and often does, have other intensive
>>> processes running on it. There are also the tests that are sporadic on
>>> QEMU even without additional load.
>> What is it in these tests, when combined with qemu, that causes them to
>> fail? Is there some relation to a real clock, some shared host resource,
>> or a bug in qemu? I am concerned that a simulator can vary like this
>> based on the host's load, and it makes me wonder how people use it on
>> machines that host a number of VMs.
> I experienced very similar results on an ARMv7 BSP (not Zynq) and assumed that this
> was a known/accepted problem with QEMU when the same issues popped up on
> AArch64.

I think we have just ignored the issue. I know I have ignored it because
of the rabbit hole it is.

> My local system under no other load produces these failures for the
> Zynq A9 QEMU
> BSP:
> 
>         "failed": [
>             "spcpucounter01.exe",
>             "psxtimes01.exe",
>             "sp69.exe",
>             "psx12.exe",
>             "minimum.exe",
>             "dl06.exe",
>             "sptimecounter02.exe"
>         ],
> 
> minimum.exe 

We have discussed this test in the past, and I think the conclusion from
Joel was that an exit code of 0 meant it had passed, but I am not sure the
exit code is printed because the test is minimal. Maybe it should be
changed to be a `no-run` type test?
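
From memory, the tester decides pass or fail by watching the console
output for the test banners, roughly:

    *** BEGIN OF TEST MINIMUM ***
    ...
    *** END OF TEST MINIMUM ***

and minimum.exe deliberately prints nothing, so the tester never sees a
pass and has to time it out. If I remember correctly the tester also
skips executables with `.norun` in the name, so building it as something
like minimum.norun.exe (the name is a guess) would take it out of the run
set cleanly.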

> and dl06.exe are probably unrelated,

Yeap, and that is one I should fix when I can find the time.

> but the remainder are in my problem set for AArch64 on QEMU.

OK.

> A run of the AArch64 ZynqMP ILP32 BSP produced these failures under the same
> conditions with all the test carve-outs removed:
> 
>         "failed": [
>             "psx12.exe",
>             "spcpucounter01.exe",
>             "sptimecounter01.exe",
>             "sptimecounter02.exe",
>             "sp04.exe"
>         ],
> 
> Because of my experience with the aforementioned ARMv7 BSP and the lack
> of failures on hardware, I chose not to chase down the root cause of the
> failures under QEMU.

Sure. It does, however, leave open the underlying question of why these
fail with QEMU, so we are caught either way.

> This patch is documentation of our observations across multiple
> architectures and BSPs running on QEMU more than anything else.

And it also affects the results.

>> I feel that, with this volume of tests being tagged this way, we should
>> have a better understanding of the problem and so a means to track how
>> to resolve it. As Gedare has kindly stated, once pushed, this change
>> disappears into a dark corner and we have no means to track it.
>>
>> The other solution is to set `jobs` to `1` in this BSP's tester config, again
>> something Gedare has raised. It means we get better or even valid results. What
>> is more important, valid results or running the testsuite as fast as possible?
> I fully support dropping the number of jobs to "half" or 1 for better results on
> QEMU runs that display these problems. 

OK, then maybe this is the way to go.
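
A sketch of what that could look like in the tester's BSP ini stanza (the
key names are from memory, so treat this as approximate):

    [xilinx_zynqmp_ilp32_qemu]
    bsp    = xilinx_zynqmp_ilp32_qemu
    arch   = aarch64
    tester = %{_rtscripts}/qemu.cfg
    jobs   = 1

Or the same thing from the command line without touching the config:

    rtems-test --rtems-bsp=xilinx_zynqmp_ilp32_qemu --jobs=1 \
        path/to/build/testsuites

It trades speed for results we can actually trust.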

> My comment in that regard was that other system loading (or multiple
> simultaneous test runs) can also cause the same problem, and so this is
> only a partial solution. Barring a fix for RTEMS or QEMU for these
> load-dependent and sporadic failures, this at least still needs to be
> documented in some form.

Yes, and the failures should highlight an issue on the host that needs to
be looked into.

Chris

