How to Classify Intermittent Test Failures

Tue Feb 2 14:40:34 UTC 2021

On Mon, Feb 1, 2021 at 6:50 PM Chris Johns <chrisj at rtems.org> wrote:

>
>
> On 2/2/21 9:12 am, Joel Sherrill wrote:
> > On Mon, Feb 1, 2021 at 3:50 PM Chris Johns <chrisj at rtems.org> wrote:
> >     On 2/2/21 3:42 am, Joel Sherrill wrote:
> >     > Hi
> >     >
> >     > On the aarch64 qemu testing, we are seeing some tests which seem to pass
> >     > most of the time but fail intermittently. It appears to be based somewhat
> >     > on host load but there may be other factors.
> >     >
> >     > There does not appear to be a good test results state for these. Marking
> >     > them expected pass or fail means they will get flagged incorrectly
> >     > sometimes.
> >
> >     We have the test state 'indeterminate' ...
> >
> >
> >     https://docs.rtems.org/branches/master/user/testing/tests.html#expected-test-states
> >
> >     It is for this type of test result.
> >
> >     > I don't see not running them as a good option. Beyond adding a new state
> >     > to reflect this oddity, any suggestions?
> >
> >     I would prefer we use the already defined and documented state.
> >
> > +1
> >
> > Kinsey had already marked them as indeterminate and the guys were in the
> > process of documenting why. I interpreted the question of what to do more
> > broadly than it needed to be but the discussion was good.
>
> A discussion is needed and welcome. Handling these intermittent simulator
> failures is hard. I once looked into some gdb simulator cases when I first put
> rtems-test together and found myself quickly heading into a deep dark hole. I
> have not been back since.
>

Agreed, it is ugly.

If the BSP has a simulator variant, then using the test configuration is
appropriate.

But for the PC and leon3, we don't have separate sim builds of the BSP, so if
there are intermittent failures there, we would have to mark them in the
set shared with hardware test runs. That's bad.

It's almost like we might need a conditional like "sp04: intermittent sim=qemu"
or something. That means the test is still built, but the tester ini could know
the simulator type and adjust its expectations. We may have to account for
multiple simulators in the sim=XXX value though. Just a thought.
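
To make the idea concrete, here is a rough sketch of how a tester could read
such a conditional line and pick the expected state per simulator. Everything
in it is an assumption for illustration only: the line syntax is just the
example above with a comma-separated simulator list, the function names
parse_rules and expected_state are made up, and "pass" is a placeholder for
the default expectation. None of this is how rtems-test works today.

```python
import re

# Hypothetical rule syntax, based on the example in this message:
#   <test>: <state> [sim=<name>[,<name>...]]
_RULE = re.compile(
    r"^\s*(?P<test>[\w-]+)\s*:\s*(?P<state>[\w-]+)"
    r"(?:\s+sim=(?P<sims>[\w,.-]+))?\s*$"
)


def parse_rules(text):
    """Parse rule lines into (test, state, simulators) tuples.

    simulators is None for an unconditional rule, otherwise a frozenset
    of the simulator names listed after sim=.
    """
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = _RULE.match(line)
        if match is None:
            raise ValueError("cannot parse rule: %r" % raw)
        sims = match.group("sims")
        rules.append((
            match.group("test"),
            match.group("state"),
            frozenset(sims.split(",")) if sims else None,
        ))
    return rules


def expected_state(rules, test, simulator=None, default="pass"):
    """Return the expected state for a test on a given simulator.

    simulator=None stands for a hardware run, so rules carrying a sim=
    condition are ignored. When several rules match, the last one wins.
    """
    state = default
    for name, rule_state, sims in rules:
        if name != test:
            continue
        if sims is None or (simulator is not None and simulator in sims):
            state = rule_state
    return state
```

With the rule "sp04: intermittent sim=qemu,tsim", a run on qemu or tsim would
expect "intermittent" for sp04, while a hardware run of the same build would
keep the default expectation. That keeps one shared test set for the BSP
without flagging the hardware results.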

This is a dark hole.

>
> Chris
>