<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 12, 2020 at 4:11 AM Chris Johns <<a href="mailto:chrisj@rtems.org">chrisj@rtems.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 12/5/20 5:15 pm, Sebastian Huber wrote:<br>

> Hello,<br>

> <br>

> On 09/05/2020 03:30, Gedare Bloom wrote:<br>

>>>>> Without these tests being tagged this way the user would have no <br>

>>>>> idea where they stand after a build and test run and that would mean <br>

>>>>> we would have to make sure a release has no failures. I consider <br>

>>>>> that as not practical or realistic.<br>

>>>> Maybe we need another state, e.g. something-is-broken-please-fix-it.<br>

>>> I do not think so; it is implicit in the failure or the test is <br>

>>> broken. The only change is to add unexpected-pass, that will be on <br>

>>> master after the 5 branch.<br>

>>><br>

>> I disagree with this in principle, and it should be reverted after we<br>

>> branch 5. It's fine for now to get the release state sync'd, but we<br>

>> should find a long-term solution that distinguishes the cases:<br>

>> 1. we don't expect this test to pass on this bsp<br>

>> 2. we expect this test to pass, but know it doesn't currently<br>

>><br>

>> They are two very different things, and I don't like conflating them<br>

>> into one "expected-fail" case.<br>

> Originally, I had the same point of view. What I didn't take into <br>

> account was the perspective of the tester. Now, I think it is perfectly <br>

> fine to flag these tests with the expected failure test state, because right <br>

> now, due to some known bugs such as <a href="https://devel.rtems.org/ticket/3982" rel="noreferrer" target="_blank">https://devel.rtems.org/ticket/3982</a> <br>

> and probably some other issues, these tests fail. On this BSP and <br>

> this RTEMS version, they will always fail. This is not some sort of <br>

> random failure. When we change test states to expected failure, I think <br>

> we should make sure that a ticket exists which records that there are <br>

> some test results which indicate issues (expected failure test state). <br>

> The ticket system is the better place to manage this. We should not use <br>

> the test states for this. The test states should be used to figure out <br>

> changes between different test runs. They should also make it possible to quickly <br>

> check whether the outcome of a test run yields the expected results for a <br>

> certain RTEMS version and BSP.<br>

<br>

Thanks. It is clear to me we lack documentation on this topic and this <br>

is an oversight on my part which I will attempt to correct.<br>

<br>

I have reviewed DejaGnu and considered other things, like the withdrawn <br>

IEEE 1003.3 standard. There are states we have that need to change, <br>

but I think the original intent is the right path.<br>

<br>

The DejaGnu states are documented here:<br>

<br>

<a href="https://www.gnu.org/software/dejagnu/manual/A-POSIX-Conforming-Test-Framework.html#A-POSIX-Conforming-Test-Framework" rel="noreferrer" target="_blank">https://www.gnu.org/software/dejagnu/manual/A-POSIX-Conforming-Test-Framework.html#A-POSIX-Conforming-Test-Framework</a><br>

<br>

And the exit codes are documented here:<br>

<br>

<a href="https://www.gnu.org/software/dejagnu/manual/Runtest.html#Runtest" rel="noreferrer" target="_blank">https://www.gnu.org/software/dejagnu/manual/Runtest.html#Runtest</a><br>

<br>

For me they define the goal and intent.<br>

<br>

The test states are metadata for the tester so it can determine the <br>

result of any given set of tests in relation to the expected state of <br>

the test when it was built. You need to detach yourself from being a <br>

developer and put yourself in the position of a tester whose task is to <br>

give an overall pass or fail for a specific build of RTEMS without <br>

needing to consider the specifics of any test, bug or feature.<br>

<br>

The primary requirement is to allow machine check of the results to <br>

determine regressions. A regression is a failure, pass or unresolved <br>

result that was not expected.<br>

<br>

My current thinking on test states is:<br>

<br>

PASS:<br>

The test has succeeded and passed without a failure.<br>

<br>

UNEXPECTED-PASS:<br>

The test has succeeded when it was expected to fail.<br>

<br>

FAIL:<br>

The test has not succeeded and has failed when it was expected to pass. <br>

The failure can be a failed assert, unhandled exception, resource <br>

constraint, or a faulty test.<br>

<br>

EXPECTED-FAIL:<br>

The test has not succeeded and has failed and this is expected.<br>

<br>

UNRESOLVED:<br>

The test has not completed and the result cannot be determined. The <br>

result can be unresolved because the test did not start or did not end, the test <br>

harness failed, or there were insufficient computing resources for the test harness <br>

to function correctly.<br>

<br>

EXPECTED-UNRESOLVED:<br>

The test has not completed and the result cannot be determined and this <br>

is expected.<br>

<br>

INDETERMINATE:<br>

The test has succeeded, has failed or is unresolved. The test is an edge <br>

case where the test can pass, fail or be unresolved, and this is <br>

expected.<br>

<br>

USER-INPUT:<br>

The test has not completed and the result is unresolved because it <br>

requires user intervention that cannot be provided.<br>

<br>

BENCHMARK:<br>

The test is a performance test. These are currently not <br>

supported.<br>

<br>

UNTESTED:<br>

The test has not run and is a placeholder for a real test that is not <br>

yet provided.<br>

<br>

UNSUPPORTED:<br>

The test is not supported for this build of RTEMS, BSP or architecture.<br>

<br>
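A minimal sketch of how a tester could machine-check results against these states, assuming each test's metadata declares one expected state and the harness observes one raw outcome (passed, failed or unresolved). The names Outcome, classify and run_verdict are hypothetical and are not part of the RTEMS tester; USER-INPUT, BENCHMARK, UNTESTED and UNSUPPORTED are left out for brevity.<br>

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical raw result observed by the harness."""
    PASSED = "passed"
    FAILED = "failed"
    UNRESOLVED = "unresolved"  # did not start or end, harness failure, etc.


# Expected states a test's metadata may declare (subset of the list above).
EXPECTED_STATES = {"PASS", "EXPECTED-FAIL", "EXPECTED-UNRESOLVED", "INDETERMINATE"}

# Reported states that are regressions: a pass, failure or unresolved
# result that was not expected.
REGRESSIONS = {"UNEXPECTED-PASS", "FAIL", "UNRESOLVED"}


def classify(expected: str, outcome: Outcome) -> str:
    """Map a test's expected state and its observed outcome to a reported state."""
    if expected not in EXPECTED_STATES:
        raise ValueError(f"unknown expected state: {expected}")
    if expected == "INDETERMINATE":
        return "INDETERMINATE"  # any outcome is acceptable for an edge case
    if outcome is Outcome.PASSED:
        return "PASS" if expected == "PASS" else "UNEXPECTED-PASS"
    if outcome is Outcome.FAILED:
        return "EXPECTED-FAIL" if expected == "EXPECTED-FAIL" else "FAIL"
    return "EXPECTED-UNRESOLVED" if expected == "EXPECTED-UNRESOLVED" else "UNRESOLVED"


def run_verdict(results: list[tuple[str, Outcome]]) -> str:
    """Overall pass or fail for a build: FAIL if any test result is a regression."""
    regressed = any(classify(exp, out) in REGRESSIONS for exp, out in results)
    return "FAIL" if regressed else "PASS"


if __name__ == "__main__":
    results = [("PASS", Outcome.PASSED), ("EXPECTED-FAIL", Outcome.FAILED)]
    print(run_verdict(results))  # prints PASS
```

In this sketch an expected failure does not fail the run, which lets the tester give a verdict without knowing any test's details; the known fault is meant to be tracked elsewhere, for example in a ticket.<br>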

Note:<br>

<br>

1. Any expected failures, unresolved, or indeterminate test results are <br>

considered faults and require fixing.<br>

<br>

2. The nature of a failure cannot be inferred from the test's metadata <br>

state.<br>

<br>

3. The timeout and invalid states will be merged into UNRESOLVED.<br>

<br>

4. The excluded state will be changed to UNSUPPORTED.<br>

<br>

5. The metadata is placed in each test because it is an effective way to <br>

capture the state. Tests can be run as a group, stand alone or at <br>

different locations, and the test results can still determine a regression. The <br>

version of the test harness does not need to match the RTEMS build.<br></blockquote><div><br></div><div>Not to be dense, but which state do tests that fail but have not yet been investigated</div><div>go into? GCC just leaves those as FAIL, and releases happen with those on </div><div>secondary targets. I know FAILs are undesirable for primary targets with</div><div>GCC. </div><div><br></div><div>I don't want to see a test that fails for an unknown reason binned somewhere </div><div>where it will never get investigated. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

<br>

<br>

This list of test states accounts for some missing states. It also adds <br>

some states I do not see being available until we move to a new build <br>

system. For UNTESTED and UNSUPPORTED I see a template test being built <br>

and run that does nothing. This is important because it means we get a <br>

set of test results that is complete and consistent for all BSPs.<br>

<br>

I can attend to this change before releasing 5.1, or it can be done on <br>

master and we can determine whether it is backported to 5.2[34..].<br></blockquote><div><br></div><div>I have previously stated that this is a good goal, but it moves the goal line</div><div>for the 5.x releases. I would propose we be happy with the fact that we have </div><div>reported test results at all for the first time a release happens. Let's not</div><div>let the perfect be the enemy of the good. In this case, the good is quite</div><div>a bit better than previous releases. We need to be more conscious of this.</div><div><br></div><div>I'm also concerned this task is bigger than you think, based solely on the number</div><div>of BSPs we have and the number we can execute tests for on simulators. My</div><div>build sweep tests at least 21 BSPs (hand count) on simulators, and </div><div>that count excludes the handful of QEMU-based ones.</div><div><br></div><div>To get an accurate assessment, I think you would have to temporarily</div><div>let all tests build for a BSP so you would know which are disabled because</div><div>they don't fit. Then the rest of the tests in the .tcfg which are not ld overflow</div><div>issues would have to be examined and categorized. I don't think the </div><div>current list of "don't build" tests fits nicely into one of the new categories.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

The change will come with documentation to explain things a little better.<br></blockquote><div><br></div><div>+1 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

I hope this addresses the issues we have and I am sorry for creating a <br>

disturbance so close to a release.<br></blockquote><div><br></div><div>It's a good goal but I think the timing is wrong.</div><div><br></div><div>--joel </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

Chris<br>

_______________________________________________<br>

devel mailing list<br>

<a href="mailto:devel@rtems.org" target="_blank">devel@rtems.org</a><br>

<a href="http://lists.rtems.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.rtems.org/mailman/listinfo/devel</a><br>

</blockquote></div></div>