How to Classify Intermittent Test Failures

Tue Feb 2 23:28:33 UTC 2021

On 3/2/21 3:13 am, Gedare Bloom wrote:
> On Tue, Feb 2, 2021 at 7:40 AM Joel Sherrill <joel at rtems.org> wrote:
>     On Mon, Feb 1, 2021 at 6:50 PM Chris Johns <chrisj at rtems.org> wrote:
>         On 2/2/21 9:12 am, Joel Sherrill wrote:
>         >  On Mon, Feb 1, 2021 at 3:50 PM Chris Johns <chrisj at rtems.org> wrote:
>         >     On 2/2/21 3:42 am, Joel Sherrill wrote:
>         >     > Hi
>         >     >
>         >     > On the aarch64 qemu testing, we are seeing some tests which seem to
>         >     > pass most of the time but fail intermittently. It appears to be
>         >     > based somewhat on host load but there may be other factors.
>         >     >
>         >     > There does not appear to be a good test results state for these.
>         >     > Marking them expected pass or fail means they will get flagged
>         >     > incorrectly sometimes.
>         >
>         >     We have the test state 'indeterminate' ...
>         >
>         >     https://docs.rtems.org/branches/master/user/testing/tests.html#expected-test-states
>         >
>         >     It is for this type of test result.
>         >
>         >     > I don't see not running them as a good option. Beyond adding a new
>         >     > state to reflect this oddity, any suggestions?
>         >
>         >     I prefer we used the already defined and documented state.
>         >
>         > +1 
>         >
>         > Kinsey had already marked them as indeterminate and the guys were in the 
>         > process of documenting why. I interpreted the question of what to do more 
>         > broadly than it needed to be but the discussion was good.
> 
>         A discussion is needed and welcome. Handling these intermittent simulator
>         failures is hard. I once looked into some gdb simulator cases when I first
>         put rtems-test together and found myself quickly heading into a deep dark
>         hole. I have not been back since.
> 
> 
>     Agreed it is ugly.
> 
>     If the BSP has a simulator variant, then using the test configuration is
>     appropriate.
> 
>     But for the PC and leon3, we don't have separate sim builds of the BSP so if 
>     there are intermittent failures there, we would have to mark them in the 
>     set shared with hardware test runs. That's bad.
> 
> yeah, don't do that.

Agreed. Simulation is nice and important, but it is a second-tier test and
development framework; real hardware is tier 1. I am not in favour of cloning
a BSP to indicate that the intended platform is a simulator in order to
categorise these types of failures.

As I stated before, this is a deep hole. For a BSP like the PC, LEON3, ARM and
RISCV, setting _any_ test state based on a result from a simulator requires
you to perform extensive testing on hardware to determine that the test result
is specific to simulation and not a general failure. If it is specific to
simulation, you then need to ask: does the simulator have an issue, or does
the test itself have issues that get exposed on a loaded server running
multiple simulations? For example, does the simulated timer's clock track the
CPU cycles simulated when the host is loaded?
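
To make that decision order concrete, here is a rough sketch in Python. It is
not part of rtems-test; the function names, the labels it returns and the
sample data are all hypothetical. It only shows the order of the questions:
hardware results first, simulator results second.

```python
def pass_rate(results):
    """Fraction of passing runs; results is a list of booleans."""
    if not results:
        raise ValueError("need at least one run")
    return sum(results) / len(results)


def triage(hw_results, sim_results):
    """Suggest where to look, given repeated runs of one test.

    The returned labels are hypothetical and are not RTEMS test states.
    """
    if pass_rate(hw_results) < 1.0:
        # Fails on real hardware too: not specific to simulation.
        return "general-failure"
    sim = pass_rate(sim_results)
    if sim == 1.0:
        return "pass"
    if sim == 0.0:
        # Always fails on the simulator: likely a simulator limitation.
        return "sim-consistent-failure"
    # Mixed results on the simulator only: check host load and timing.
    return "sim-intermittent"


# Hypothetical data: three hardware runs, five runs on a loaded host.
print(triage([True, True, True], [True, False, True, True, False]))
```

A "sim-intermittent" answer only narrows things down. It still does not say
whether the simulator or the test is at fault.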

>     It's almost like we might need a conditional like "sp04: intermittent  sim=qemu"
>     or something. Which means build it but the tester ini could know the simulator
>     type and adjust its expectations. May have to account for multiple simulators
>     on the sim=XXX though. Just a thought. 
> 
> maybe pass a sim.tcfg file to tester that is different for the sim.cfg file than
> it is for the hw.cfg file?

Testing does not work this way. A test executable by design contains all the
information about the test and the outcome. Managing external files with state
information is something I decided was too difficult and fragile at best.
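
As a small illustration of a self-describing test, the sketch below pulls the
expected state out of captured console output. It assumes the test prints a
banner line of the form "*** TEST STATE: <STATE>"; the sample output and the
helper name are made up for the example, and this is not the rtems-test code.

```python
import re

# Assumed banner format: "*** TEST STATE: EXPECTED_PASS" on its own line.
STATE_RE = re.compile(r"^\*\*\* TEST STATE: (\w+)\s*$", re.MULTILINE)


def embedded_state(output):
    """Return the state named in the test's own output, or None."""
    match = STATE_RE.search(output)
    return match.group(1) if match else None


# Hypothetical captured console output from one test run.
sample = """*** BEGIN OF TEST SP 4 ***
*** TEST STATE: EXPECTED_PASS
...
*** END OF TEST SP 4 ***
"""

print(embedded_state(sample))  # prints EXPECTED_PASS
```

Because the state travels with the executable, there is no side file that can
drift out of step with the test.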

Chris