<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 12, 2020 at 4:11 AM Chris Johns <<a href="mailto:chrisj@rtems.org">chrisj@rtems.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 12/5/20 5:15 pm, Sebastian Huber wrote:<br>
> Hello,<br>
> <br>
> On 09/05/2020 03:30, Gedare Bloom wrote:<br>
>>>>> Without these tests being tagged this way the user would have no <br>
>>>>> idea where they stand after a build and test run, and that would mean <br>
>>>>> we would have to make sure a release has no failures. I consider <br>
>>>>> that neither practical nor realistic.<br>
>>>> Maybe we need another state, e.g. something-is-broken-please-fix-it.<br>
>>> I do not think so; it is implicit in the failure or the test is <br>
>>> broken. The only change is to add unexpected-pass, which will be on <br>
>>> master after the 5 branch.<br>
>>><br>
>> I disagree with this in principle, and it should be reverted after we<br>
>> branch 5. It's fine for now to get the release state sync'd, but we<br>
>> should find a long-term solution that distinguishes the cases:<br>
>> 1. we don't expect this test to pass on this bsp<br>
>> 2. we expect this test to pass, but know it doesn't currently<br>
>><br>
>> They are two very different things, and I don't like conflating them<br>
>> into one "expected-fail" case<br>
> originally, I had the same point of view. What I didn't take into <br>
> account was the perspective of the tester. Now, I think it is perfectly <br>
> fine to flag these tests as expected failure test states. Because right <br>
> now, due to some known bugs such as <a href="https://devel.rtems.org/ticket/3982" rel="noreferrer" target="_blank">https://devel.rtems.org/ticket/3982</a> <br>
> and probably also some more issues, these tests fail. On this BSP and <br>
> this RTEMS version, they will always fail. This is not some sort of <br>
> random failure. When we change test states to expected failure, I think <br>
> we should make sure that a ticket exists which captures that there are <br>
> test results indicating issues (the expected failure test state). <br>
> The ticket system is the better place to manage this. We should not use <br>
> the test states for this. The test states should be used to figure out <br>
> changes between different test runs. They should also enable us to quickly <br>
> check whether the outcome of a test run yields the expected results for a <br>
> certain RTEMS version and BSP.<br>
<br>
Thanks. It is clear to me that we lack documentation on this topic; this <br>
is an oversight on my part which I will attempt to correct.<br>
<br>
I have reviewed Dejagnu and considered other things like the withdrawn <br>
IEEE 1003.3 standard. There are states we have that need to change, <br>
but I think the original intent is the right path.<br>
<br>
The Dejagnu states are documented here:<br>
<br>
<a href="https://www.gnu.org/software/dejagnu/manual/A-POSIX-Conforming-Test-Framework.html#A-POSIX-Conforming-Test-Framework" rel="noreferrer" target="_blank">https://www.gnu.org/software/dejagnu/manual/A-POSIX-Conforming-Test-Framework.html#A-POSIX-Conforming-Test-Framework</a><br>
<br>
And the exit codes are:<br>
<br>
<a href="https://www.gnu.org/software/dejagnu/manual/Runtest.html#Runtest" rel="noreferrer" target="_blank">https://www.gnu.org/software/dejagnu/manual/Runtest.html#Runtest</a><br>
<br>
For me they define the goal and intent.<br>
<br>
The test states are metadata for the tester so it can determine the <br>
result of any given set of tests in relation to the expected state of <br>
the test when it was built. You need to detach yourself from being a <br>
developer and put yourself in the position of a tester whose task is to <br>
give an overall pass or fail for a specific build of RTEMS without <br>
needing to consider the specifics of any test, bug or feature.<br>
<br>
The primary requirement is to allow a machine check of the results to <br>
determine regressions. A regression is a failure, pass or unresolved <br>
result that was not expected.<br>
<br>
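As a sketch, a machine check along these lines could be a small piece of <br>
Python over the report data; the state names, the dictionary and the <br>
is_regression helper here are only illustrative, not the rtems-test <br>
implementation:<br>
<br>
# Map an expected state, taken from the test's metadata, to the run<br>
# outcome that matches it. (Edge-case states, listed below, would need<br>
# extra handling.)<br>
EXPECTED_OUTCOME = {<br>
    "pass": "pass",<br>
    "expected-fail": "fail",<br>
    "expected-unresolved": "unresolved",<br>
}<br>
<br>
def is_regression(expected, actual):<br>
    # Only a pass, fail or unresolved outcome can be a regression, and<br>
    # only when it is not the outcome the metadata said to expect.<br>
    if actual not in ("pass", "fail", "unresolved"):<br>
        return False<br>
    return EXPECTED_OUTCOME.get(expected) != actual<br>
<br>
# A test tagged expected-fail that passes is flagged; one that fails is not.<br>
assert is_regression("expected-fail", "pass")<br>
assert not is_regression("expected-fail", "fail")<br>
<br>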
My current thinking for the test states is as follows (a sketch of these <br>
as an enumeration follows the list):<br>
<br>
PASS:<br>
The test has succeeded and passed without a failure.<br>
<br>
UNEXPECTED-PASS:<br>
The test has succeeded when it was expected to fail.<br>
<br>
FAIL:<br>
The test has not succeeded and has failed when it was expected to pass. <br>
The failure can be a failed assert, an unhandled exception, a resource <br>
constraint, or a faulty test.<br>
<br>
EXPECTED-FAIL:<br>
The test has not succeeded and has failed, and this is expected.<br>
<br>
UNRESOLVED:<br>
The test has not completed and the result cannot be determined. The <br>
result can be unresolved because the test did not start or end, because <br>
of a test harness failure, or because of insufficient computing resources <br>
for the test harness to function correctly.<br>
<br>
EXPECTED-UNRESOLVED:<br>
The test has not completed, the result cannot be determined, and this <br>
is expected.<br>
<br>
INDETERMINATE:<br>
The test has succeeded, has failed, or is unresolved. The test is an edge <br>
case where it can pass, can fail, or can be unresolved, and this is <br>
expected.<br>
<br>
USER-INPUT:<br>
The test has not completed and the result is unresolved because it <br>
requires user intervention that cannot be provided.<br>
<br>
BENCHMARK:<br>
The test performs a performance measurement. These are currently not <br>
supported.<br>
<br>
UNTESTED:<br>
The test has not run and is a placeholder for a real test that is not <br>
yet provided.<br>
<br>
UNSUPPORTED:<br>
The test is not supported for this build of RTEMS, BSP or architecture.<br>
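<br>
Written down, the proposal could look like the following Python sketch; <br>
the spellings and the NEEDS_ATTENTION set are only my illustration, not <br>
an agreed interface:<br>
<br>
import enum<br>
<br>
class TestState(enum.Enum):<br>
    # Proposed states as described above; the value strings are illustrative.<br>
    PASS = "pass"<br>
    UNEXPECTED_PASS = "unexpected-pass"<br>
    FAIL = "fail"<br>
    EXPECTED_FAIL = "expected-fail"<br>
    UNRESOLVED = "unresolved"<br>
    EXPECTED_UNRESOLVED = "expected-unresolved"<br>
    INDETERMINATE = "indeterminate"<br>
    USER_INPUT = "user-input"<br>
    BENCHMARK = "benchmark"<br>
    UNTESTED = "untested"<br>
    UNSUPPORTED = "unsupported"<br>
<br>
# States that still point at something to fix (see note 1 below).<br>
NEEDS_ATTENTION = {TestState.EXPECTED_FAIL, TestState.EXPECTED_UNRESOLVED,<br>
                   TestState.INDETERMINATE}<br>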
<br>
Note:<br>
<br>
1. Any expected failure, unresolved, or indeterminate test result is <br>
considered a fault and requires fixing.<br>
<br>
2. The nature of a failure cannot be inferred from the test's metadata <br>
state.<br>
<br>
3. The timeout and invalid states will be merged into UNRESOLVED.<br>
<br>
4. The excluded state will be changed to UNSUPPORTED.<br>
<br>
5. The metadata is placed in each test because it is an effective way to <br>
capture the state. Tests can be run as a group, stand-alone, or at a <br>
different location, and the test results can determine a regression. The <br>
version of the test harness does not need to match the RTEMS build.<br></blockquote><div><br></div><div>Not to be dense but what state do tests which fail but have not been investigated</div><div>yet go? GCC just leaves those as FAIL and releases happen with those on </div><div>secondary targets. I know FAILs are undesirable for primary targets with</div><div>GCC. </div><div><br></div><div>I don't want to see a test that fails but we don't know why binned somewhere </div><div>it will never get investigated. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
<br>
This list of test states accounts for some missing states. It also adds <br>
some states I do not see being available until we move to a new build <br>
system. For UNTESTED and UNSUPPORTED I see a template test being built <br>
and run that does nothing. This is important because it means we get a <br>
set of test results that is complete and consistent for all BSPs.<br>
<br>
I can attend to this change before releasing 5.1 or it can be done on <br>
master and we can determine if it is back ported to 5.2[34..].<br></blockquote><div><br></div><div>I have previously stated that this is a good goal but it is moving the goal line</div><div>for the 5.x releases. I would propose we be happy with the fact we have </div><div>reported test results at all for the first time a release happens. Let's not</div><div>let the perfect be the enemy of the good. In this case, the good is quite</div><div>a bit better than previous releases. We need to be more conscious of this</div><div><br></div><div>I'm also concerned this task is bigger than you think based solely on the number</div><div>of BSPs we have and the number we can execute tests on simulators. My</div><div>build sweep has at least 21 BSPs (hand count) it is testing on simulator and </div><div>I didn't count the handful of qemu based ones.</div><div><br></div><div>To get an accurate assessment, I think you would have to temporarily</div><div>let all tests build for a BSP so you would know which are disabled because</div><div>they don't fit. Then the rest of the tests in the .tcfg which are not ld overflow</div><div>issues would have to be examined and categorized. I don't think the </div><div>current list of "don't build" tests fits nicely into one of the new categories.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
The change will come with documentation to explain thing a little better.<br></blockquote><div><br></div><div>+1 </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I hope this addresses the issues we have and I am sorry for creating a <br>
disturbance so close to a release.<br></blockquote><div><br></div><div>It's a good goal but I think the timing is wrong.</div><div><br></div><div>--joel </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Chris<br>
_______________________________________________<br>
devel mailing list<br>
<a href="mailto:devel@rtems.org" target="_blank">devel@rtems.org</a><br>
<a href="http://lists.rtems.org/mailman/listinfo/devel" rel="noreferrer" target="_blank">http://lists.rtems.org/mailman/listinfo/devel</a><br>
</blockquote></div></div>