[PATCH 4/6] testsuite: Add expected-fail to psim

Chris Johns chrisj at rtems.org
Sat May 9 08:09:02 UTC 2020


On 9/5/20 11:30 am, Gedare Bloom wrote:
> On Wed, May 6, 2020 at 5:12 AM Chris Johns <chrisj at rtems.org> wrote:
>>
>>
>>> On 6 May 2020, at 8:15 pm, Sebastian Huber <sebastian.huber at embedded-brains.de> wrote:
>>>
>>> On 06/05/2020 12:00, Chris Johns wrote:
>>>
>>>>> On 6/5/20 7:35 pm, Sebastian Huber wrote:
>>>>>> On 06/05/2020 10:41, chrisj at rtems.org wrote:
>>>>>
>>>>>> From: Chris Johns <chrisj at rtems.org>
>>>>>>
>>>>>> Updates #2962
>>>>>> ---
>>>>>>    bsps/powerpc/psim/config/psim-testsuite.tcfg | 22 ++++++++++++++++++++
>>>>>>    1 file changed, 22 insertions(+)
>>>>>>    create mode 100644 bsps/powerpc/psim/config/psim-testsuite.tcfg
>>>>>>
>>>>>> diff --git a/bsps/powerpc/psim/config/psim-testsuite.tcfg b/bsps/powerpc/psim/config/psim-testsuite.tcfg
>>>>>> new file mode 100644
>>>>>> index 0000000000..b0d2a05086
>>>>>> --- /dev/null
>>>>>> +++ b/bsps/powerpc/psim/config/psim-testsuite.tcfg
>>>>>> @@ -0,0 +1,22 @@
>>>>>> +#
>>>>>> +# PSIM RTEMS Test Database.
>>>>>> +#
>>>>>> +# Format is one line per test that is _NOT_ built.
>>>>>> +#
>>>>>> +
>>>>>> +expected-fail: fsimfsgeneric01
>>>>>> +expected-fail: block11
>>>>>> +expected-fail: rbheap01
>>>>>> +expected-fail: termios01
>>>>>> +expected-fail: ttest01
>>>>>> +expected-fail: psx12
>>>>>> +expected-fail: psxchroot01
>>>>>> +expected-fail: psxfenv01
>>>>>> +expected-fail: psximfs02
>>>>>> +expected-fail: psxpipe01
>>>>>> +expected-fail: spextensions01
>>>>>> +expected-fail: spfatal31
>>>>>> +expected-fail: spfifo02
>>>>>> +expected-fail: spmountmgr01
>>>>>> +expected-fail: spprivenv01
>>>>>> +expected-fail: spstdthreads01
>>>>>
>>>>> I don't think these tests are expected to fail. If they fail, then there is a bug somewhere.
>>>>
>>>> Yes, we hope no tests fail, but they can and do. Excluding tests because they fail would be incorrect. In the 5.1 release these bugs are present, so we expect, or perhaps more accurately know, that the test will fail. With this change anything that appears in the failure column is "unexpected", which means the user's build of the release does not match the state we "expect" and is worth investigation by the user.
>>>>
>>>> Without these tests being tagged this way the user would have no idea where they stand after a build and test run, and that would mean we would have to make sure a release has no failures. I consider that neither practical nor realistic.
>>> Maybe we need another state, e.g. something-is-broken-please-fix-it.
>>
>> I do not think so; that something is broken is implicit in the failure, or the test itself is broken. The only change is to add unexpected-pass, which will be on master after the 5 branch.
>>
> 
> I disagree with this in principle, 

I did not invent this; it is borrowed from gcc, whose mature test model
I considered reasonable to follow. Look for "How to interpret test
results" in https://gcc.gnu.org/install/test.html.

We have ...

https://docs.rtems.org/branches/master/user/testing/tests.html#test-controls

Is the principle you are referring to covered by the two points below?

> and it should be reverted after we branch 5. 

I would like to understand how regressions are to be tracked before we
revert the change. Until this change you could not track them. We need
to capture the state somehow, and I view capturing it in the tests
themselves as the best method.

> It's fine for now to get the release state sync'd, but we

I am not following why we would only track regressions on a release
branch.

> should find a long-term solution that distinguishes the cases:
> 1. we don't expect this test to pass on this bsp

If a test cannot pass on a BSP for a specific reason, e.g. not enough
memory or only a single core, it is excluded and not built. A test is
expected to fail because of a bug or a missing feature that we are not
going to, or cannot, fix or implement, so we tag it as expected-fail;
by default a test is tagged as expected-pass. If a test may or may not
pass because of some edge case in a BSP it can be tagged 'indeterminate'.
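
As a sketch of how the states fit together, a BSP testsuite .tcfg could
look roughly like this (expected-fail is the directive used in this
patch; the exact spelling of the other directives should be checked
against the test controls documentation linked above, and the test
names are placeholders only):

  #
  # Example BSP Test Database (illustrative only).
  #
  # Not built at all: the BSP cannot run the test, e.g. not enough
  # memory or only a single core.
  exclude: sometest01
  #
  # Built, but known to fail because of a bug or a missing feature.
  expected-fail: sometest02
  #
  # Built, may pass or fail because of a BSP edge case.
  indeterminate: sometest03

Anything not listed defaults to expected-pass, so a clean run should
report no unexpected results.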

> 2. we expect this test to pass, but know it doesn't currently

This depends on the point in time. If it happens after a change I make,
I would consider it a regression and I would need to see what in my
change caused it. For that to work we need a baseline, where the tests
that fail because of a known bug or missing feature at the time I add
my change are tagged as expected to fail.

An example is dl06 on the beagleboneblack:

https://lists.rtems.org/pipermail/build/2020-May/014695.html

The RAP needs to support trampolines and it does not, so the test is
expected to fail.

An example of a regression is a test that passes in one build
configuration and fails in another. These recent psim results from Joel
show this: the build without RTEMS_DEBUG passes and the build with
RTEMS_DEBUG fails. Here there are 2 regressions:

https://lists.rtems.org/pipermail/build/2020-May/014943.html
https://lists.rtems.org/pipermail/build/2020-May/014946.html

The regression in fsrfsbitmap01.exe with RTEMS_DEBUG explains the
timeout in the non-RTEMS_DEBUG version. I had not noticed this before.
Such failures are hard to notice without a baseline for each BSP, and
expecting 100% passes on all BSPs in all testing configurations,
especially under simulation, is too hard.

My hope is a simple rule: "If you do not see 0 failures you need to
check your changes".

> They are two very different things, and I don't like conflating them
> into one "expected-fail" case

Sorry, I am not following. Would you be able to provide some examples 
for 1. and 2. that may help me understand the issue?

Chris

