[PATCH 4/6] testsuite: Add expected-fail to psim

Chris Johns chrisj at rtems.org
Wed May 13 01:57:09 UTC 2020


On 13/5/20 12:11 am, Gedare Bloom wrote:
> On Tue, May 12, 2020 at 3:11 AM Chris Johns <chrisj at rtems.org> wrote:
>>
>> On 12/5/20 5:15 pm, Sebastian Huber wrote:
>>> Hello,
>>>
>>> On 09/05/2020 03:30, Gedare Bloom wrote:
>>>>>>> Without these tests being tagged this way the user would have no
>>>>>>> idea where they stand after a build and test run, and that would
>>>>>>> mean we would have to make sure a release has no failures. I
>>>>>>> consider that neither practical nor realistic.
>>>>>> Maybe we need another state, e.g. something-is-broken-please-fix-it.
>>>>> I do not think so; it is implicit in the failure, or the test is
>>>>> broken. The only change is to add unexpected-pass, which will be on
>>>>> master after the 5 branch.
>>>>>
>>>> I disagree with this in principle, and it should be reverted after we
>>>> branch 5. It's fine for now to get the release state sync'd, but we
>>>> should find a long-term solution that distinguishes the cases:
>>>> 1. we don't expect this test to pass on this bsp
>>>> 2. we expect this test to pass, but know it doesn't currently
>>>>
>>>> They are two very different things, and I don't like conflating them
>>>> into one "expected-fail" case
>>> originally, I had the same point of view. What I didn't take into
>>> account was the perspective of the tester. Now, I think it is perfectly
>>> fine to flag these tests as expected failure test states. Because right
>>> now, due to some known bugs such as https://devel.rtems.org/ticket/3982
>>> and probably also some more issues, these tests fail. On this BSP and
>>> this RTEMS version, they will always fail. This is not some sort of
>>> random failure. When we change test states to expected failure I think
>>> we should make sure that a ticket exists which captures that there are
>>> test results indicating issues (the expected failure test state). The
>>> ticket system is the better place to manage this; we should not use
>>> the test states for it. The test states should be used to figure out
>>> changes between different test runs. They should also make it possible
>>> to quickly check whether the outcome of a test run yields the expected
>>> results for a certain RTEMS version and BSP.
>>
>> Thanks. It is clear to me we lack documentation on this topic and this
>> is an oversight on my part which I will attempt to correct.
>>
> This makes enough sense to me.
> 
>> I have reviewed Dejagnu and considered other things, such as the
>> withdrawn IEEE 1003.3 standard. There are states we have that need to
>> change, but I think the original intent is the right path.
>>
>> The Dejagnu states are documented here:
>>
>> https://www.gnu.org/software/dejagnu/manual/A-POSIX-Conforming-Test-Framework.html#A-POSIX-Conforming-Test-Framework
>>
>> And the exit codes are:
>>
>> https://www.gnu.org/software/dejagnu/manual/Runtest.html#Runtest
>>
>> For me they define the goal and intent.
>>
>> The test states are metadata for the tester so it can determine the
>> result of any given set of tests in relation to the expected state of
>> the test when it was built. You need to detach yourself from being a
>> developer and put yourself in the position of a tester whose task is to
>> give an overall pass or fail for a specific build of RTEMS without
>> needing to consider the specifics of any test, bug or feature.
>>
>> The primary requirement is to allow machine check of the results to
>> determine regressions. A regression is a failure, pass or unresolved
>> result that was not expected.
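
To make the machine check concrete, here is a minimal sketch of the
predicate I have in mind. The state names, outcomes and data layout here
are illustrative assumptions only, not the current rtems-test code:

# Expected-state metadata -> the set of acceptable raw outcomes.
# Illustrative sketch; the names and layout are assumptions.
ACCEPTABLE = {
    'pass':                {'pass'},
    'expected-fail':       {'fail'},
    'expected-unresolved': {'unresolved'},
    'indeterminate':       {'pass', 'fail', 'unresolved'},
}

def is_regression(expected_state, outcome):
    """True if the raw outcome ('pass', 'fail' or 'unresolved') was
    not expected for the state the test was built with."""
    return outcome not in ACCEPTABLE.get(expected_state, set())

An unexpected pass is then is_regression('expected-fail', 'pass') and a
plain failure is is_regression('pass', 'fail'); both count as
regressions for the machine check.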
>>
>> My current thinking for test states is:
>>
>> PASS:
>> The test has succeeded and passed without a failure.
>>
>> UNEXPECTED-PASS:
> in case you copy-paste, there are a few of these typos for EXPECTED

Brain to fingers wiring issue ... I will sort that out.

> 
>> The test has succeeded when it was expected to fail.
>>
>> FAIL:
>> The test has not succeeded and has failed when it was expected to pass.
>> The failure can be a failed assert, unhandled exception, resource
>> constraint, or a faulty test.
>>
>> EXPECTED-FAIL:
>> The test has not succeeded and has failed, and this is expected.
>>
>> UNRESOLVED:
>> The test has not completed and the result cannot be determined. The
>> result can be unresolved because the test did not start or end, because
>> of a test harness failure, or because there were insufficient computing
>> resources for the test harness to function correctly.
>>
>> EXPECTED-UNRESOLVED:
>> The test has not completed, the result cannot be determined, and this
>> is expected.
>>
>> INDETERMINATE:
>> The test has succeeded, has failed, or is unresolved. The test is an
>> edge case where it can pass, fail, or be unresolved, and this is
>> expected.
>>
>> USER-INPUT:
>> The test has not completed and the result is unresolved because it
>> requires user intervention that cannot be provided.
>>
> This USER-INPUT could also be EXPECTED-UNRESOLVED?

Yes. These are states, and what we get rtems-test to report is something
we can work on. For example, I think we present the passes, fails and
unresolved as gross values. That means user-input is counted in the gross
unresolved count because it is unresolved. Under unresolved we can
detail its composition, and the net unresolved value should be 0 for
tier 1 or specific BSPs.
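
As a sketch of that counting (the result records here are a hypothetical
layout, not the current rtems-test report format):

def unresolved_counts(results):
    # Each record is assumed to carry the raw outcome and the
    # expected state from the test's metadata (hypothetical layout).
    gross = sum(1 for r in results if r['outcome'] == 'unresolved')
    # Net excludes the unresolved results we said to expect, such as
    # user-input and expected-unresolved tests.
    expected = {'user-input', 'expected-unresolved'}
    net = sum(1 for r in results
              if r['outcome'] == 'unresolved'
              and r['expected'] not in expected)
    return gross, net

For tier 1 or specific BSPs the net value returned here should be 0.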

>> BENCHMARK:
>> The test is a performance benchmark. These are currently not
>> supported.
>>
>> UNTESTED:
>> The test has not run and is a placeholder for a real test that is not
>> yet provided.
>>
>> UNSUPPORTED:
>> The test is not supported for this build of RTEMS, BSP or architecture.
>>
>> Note:
>>
>> 1. Any expected failure, expected unresolved, or indeterminate test
>> results are considered faults and require fixing.
>>
>> 2. The nature of a failure cannot be inferred from the test's metadata
>> state.
>>
>> 3. The timeout and invalid states will be merged into UNRESOLVED.
>>
>> 4. The excluded state will be changed to UNSUPPORTED (this folding,
>> with note 3, is sketched after these notes).
>>
>> 5. The metadata is placed in each test because it is an effective way to
>> capture the state. Tests can be run as a group, stand-alone, or at a
>> different location, and the test results can still determine a
>> regression. The version of the test harness does not need to match the
>> RTEMS build.
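
For notes 3 and 4 the change, when reading existing results, amounts to
a simple folding of states. A sketch; treat the exact lower-case names
as assumptions about the current reporter:

# Fold the current reporter states into the proposed set (notes 3
# and 4 above); anything else passes through unchanged. The names
# are assumptions about the existing reporter output.
FOLD = {
    'timeout':  'UNRESOLVED',
    'invalid':  'UNRESOLVED',
    'excluded': 'UNSUPPORTED',
}

def fold_state(state):
    return FOLD.get(state, state)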
>>
>> This list of test states accounts for some missing states. It also adds
>> some states I do not see being available until we move to a new build
>> system. For UNTESTED and UNSUPPORTED I see a template test being built
>> and run that does nothing. This is important because it means we get a
>> set of test results that is complete and consistent for all BSPs.
>>
>> I can attend to this change before releasing 5.1 or it can be done on
>> master and we can determine if it is backported to 5.2[34..].
>>
> I'm with Joel, this should be deferred until after the release

OK.

>> The change will come with documentation to explain things a little better.
>>
>> I hope this addresses the issues we have and I am sorry for creating a
>> disturbance so close to a release.
>>
> Thanks for bringing this out; it is an important area to keep improving.

I also think it is.

Chris

