<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 9, 2020, 6:56 PM Chris Johns <<a href="mailto:chrisj@rtems.org">chrisj@rtems.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/5/20 9:24 am, Joel Sherrill wrote:<br>
> <br>
> <br>
> On Sat, May 9, 2020, 6:18 PM Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>> wrote:<br>
> <br>
> On 10/5/20 4:57 am, Joel Sherrill wrote:<br>
> ><br>
> ><br>
> > On Sat, May 9, 2020, 1:02 PM Gedare Bloom <<a href="mailto:gedare@rtems.org" target="_blank" rel="noreferrer">gedare@rtems.org</a>> wrote:<br>
> ><br>
> > On Sat, May 9, 2020 at 2:09 AM Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>> wrote:<br>
> > ><br>
> > > On 9/5/20 11:30 am, Gedare Bloom wrote:<br>
> > > > On Wed, May 6, 2020 at 5:12 AM Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>> wrote:<br>
> > > >><br>
> > > >><br>
> > > >>> On 6 May 2020, at 8:15 pm, Sebastian Huber <<a href="mailto:sebastian.huber@embedded-brains.de" target="_blank" rel="noreferrer">sebastian.huber@embedded-brains.de</a>> wrote:<br>
> > > >>><br>
> > > >>> On 06/05/2020 12:00, Chris Johns wrote:<br>
> > > >>><br>
> > > >>>>> On 6/5/20 7:35 pm, Sebastian Huber wrote:<br>
> > > >>>>>> On 06/05/2020 10:41, <a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a> wrote:<br>
> > > >>>>><br>
> > > >>>>>> From: Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>><br>
> > > >>>>>><br>
> > > >>>>>> Updates #2962<br>
> > > >>>>>> ---<br>
> > > >>>>>> bsps/powerpc/psim/config/psim-testsuite.tcfg | 22 ++++++++++++++++++++<br>
> > > >>>>>> 1 file changed, 22 insertions(+)<br>
> > > >>>>>> create mode 100644 bsps/powerpc/psim/config/psim-testsuite.tcfg<br>
> > > >>>>>><br>
> > > >>>>>> diff --git a/bsps/powerpc/psim/config/psim-testsuite.tcfg b/bsps/powerpc/psim/config/psim-testsuite.tcfg<br>
> > > >>>>>> new file mode 100644<br>
> > > >>>>>> index 0000000000..b0d2a05086<br>
> > > >>>>>> --- /dev/null<br>
> > > >>>>>> +++ b/bsps/powerpc/psim/config/psim-testsuite.tcfg<br>
> > > >>>>>> @@ -0,0 +1,22 @@<br>
> > > >>>>>> +#<br>
> > > >>>>>> +# PSIM RTEMS Test Database.<br>
> > > >>>>>> +#<br>
> > > >>>>>> +# Format is one line per test that is_NOT_ built.<br>
> > > >>>>>> +#<br>
> > > >>>>>> +<br>
> > > >>>>>> +expected-fail: fsimfsgeneric01<br>
> > > >>>>>> +expected-fail: block11<br>
> > > >>>>>> +expected-fail: rbheap01<br>
> > > >>>>>> +expected-fail: termios01<br>
> > > >>>>>> +expected-fail: ttest01<br>
> > > >>>>>> +expected-fail: psx12<br>
> > > >>>>>> +expected-fail: psxchroot01<br>
> > > >>>>>> +expected-fail: psxfenv01<br>
> > > >>>>>> +expected-fail: psximfs02<br>
> > > >>>>>> +expected-fail: psxpipe01<br>
> > > >>>>>> +expected-fail: spextensions01<br>
> > > >>>>>> +expected-fail: spfatal31<br>
> > > >>>>>> +expected-fail: spfifo02<br>
> > > >>>>>> +expected-fail: spmountmgr01<br>
> > > >>>>>> +expected-fail: spprivenv01<br>
> > > >>>>>> +expected-fail: spstdthreads01<br>
> > > >>>>><br>
> > > >>>>> I don't think these tests are expected to fail. If they<br>
> > fail, then there is a bug somewhere.<br>
> > > >>>><br>
> > > >>>> Yes, we hope no tests fail, but they can and do. Excluding<br>
> > > >>>> tests because they fail would be incorrect. In the 5.1 release these<br>
> > > >>>> bugs are present, so we expect, or maybe it should say, we know the<br>
> > > >>>> test will fail. With this change anything that appears in the<br>
> > > >>>> failure column is "unexpected", and that means the user's build of the<br>
> > > >>>> release does not match the state we "expect" and it is worth<br>
> > > >>>> investigation by the user.<br>
> > > >>>><br>
> > > >>>> Without these tests being tagged this way the user would<br>
> > > >>>> have no idea where they stand after a build and test run, and that<br>
> > > >>>> would mean we would have to make sure a release has no failures. I<br>
> > > >>>> do not consider that practical or realistic.<br>
> > > >>> Maybe we need another state, e.g.<br>
> > something-is-broken-please-fix-it.<br>
> > > >><br>
> > > >> I do not think so; it is implicit in the failure, or the test<br>
> > > >> is broken. The only change is to add unexpected-pass, which will be<br>
> > > >> on master after the 5 branch.<br>
> > > >><br>
> > > ><br>
> > > > I disagree with this in principle,<br>
> > ><br>
> > > I did not invent this, it is borrowed from gcc. I<br>
> considered their<br>
> > > mature test model as OK to follow. Look for "How to<br>
> interpret test<br>
> > > results" in <a href="https://gcc.gnu.org/install/test.html" rel="noreferrer noreferrer" target="_blank">https://gcc.gnu.org/install/test.html</a>.<br>
> > ><br>
> > > We have ...<br>
> > ><br>
> > ><br>
> ><br>
> <a href="https://docs.rtems.org/branches/master/user/testing/tests.html#test-controls" rel="noreferrer noreferrer" target="_blank">https://docs.rtems.org/branches/master/user/testing/tests.html#test-controls</a><br>
> > ><br>
> > > Is the principle the two points below?<br>
> > ><br>
> > > > and it should be reverted after we branch 5.<br>
> > ><br>
> > > I would like to understand how regressions are to be tracked<br>
> > before we<br>
> > > revert the change. Until this change you could not track<br>
> them. We<br>
> > need<br>
> > > to capture the state somehow and I view capturing the state in<br>
> > the tests<br>
> > > themselves as the best method.<br>
> > ><br>
> > > > It's fine for now to get the release state sync'd, but we<br>
> > ><br>
> > > I am not following why we would only track regressions on a release<br>
> > > branch?<br>
> > ><br>
> > > > should find a long-term solution that distinguishes the<br>
> cases:<br>
> > > > 1. we don't expect this test to pass on this bsp<br>
> > ><br>
> > > If a test cannot pass on a BSP for a specific reason it is excluded and<br>
> > > not built, e.g. not enough memory, single core. A test is expected to<br>
> > > fail because of a bug or missing feature we are not or cannot fix or<br>
> > > implement, so we tag it as expected-fail; by default a test is tagged<br>
> > > as expected-pass. If a test may or may not pass because of some edge<br>
> > > case in a BSP it can be tagged 'indeterminate'.<br>
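> > ><br>
> > > As a sketch of how these states could sit together in a BSP's tcfg file<br>
> > > (the test names are only illustrative and, apart from expected-fail, the<br>
> > > exact directive spellings are an assumption):<br>
> > ><br>
> > > # Not built on this BSP, e.g. single core.<br>
> > > exclude: smp01<br>
> > > # Known bug or missing feature at the time of the release.<br>
> > > expected-fail: psxfenv01<br>
> > > # May pass or fail because of an edge case in the BSP.<br>
> > > indeterminate: some-edge-case-test<br>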
> > ><br>
> > > > 2. we expect this test to pass, but know it doesn't<br>
> currently<br>
> > ><br>
> > > This depends on a point in time. After a change I make I would<br>
> > consider<br>
> > > this a regression and I would need to see what I have done<br>
> in my<br>
> > change<br>
> > > to cause it. For this to happen we need a baseline where the<br>
> > tests that<br>
> > > fail because of a known bug or missing feature at the time<br>
> I add my<br>
> > > change are tagged as expected to fail.<br>
> > ><br>
> > > An example is dl06 on the beagleboneblack:<br>
> > ><br>
> > > <a href="https://lists.rtems.org/pipermail/build/2020-May/014695.html" rel="noreferrer noreferrer" target="_blank">https://lists.rtems.org/pipermail/build/2020-May/014695.html</a><br>
> > ><br>
> > > The RAP needs to support trampolines and it does not, so the test is<br>
> > > expected to fail.<br>
> > ><br>
> > > An example of a regression is a test that passes in a<br>
> specific build<br>
> > > configuration and fails in another. These recent psim results<br>
> > from Joel<br>
> > > show this where the build without RTEMS_DEBUG passes and with<br>
> > > RTEMS_DEBUG fails. Here there are 2 regressions:<br>
> > ><br>
> > > <a href="https://lists.rtems.org/pipermail/build/2020-May/014943.html" rel="noreferrer noreferrer" target="_blank">https://lists.rtems.org/pipermail/build/2020-May/014943.html</a><br>
> > > <a href="https://lists.rtems.org/pipermail/build/2020-May/014946.html" rel="noreferrer noreferrer" target="_blank">https://lists.rtems.org/pipermail/build/2020-May/014946.html</a><br>
> > ><br>
> > > The regression in fsrfsbitmap01.exe with RTEMS_DEBUG<br>
> explains the<br>
> > > timeout in the no RTEMS_DEBUG version. I had not noticed<br>
> this before.<br>
> > > They are hard to notice without a baseline in each BSP and<br>
> > expecting us<br>
> > > to have 100% pass on all BSPs in all testing configurations,<br>
> > especially<br>
> > > simulation, is too hard.<br>
> > ><br>
> > > My hope is a simple rule: "If you do not see 0 fails, you need to check<br>
> > > your changes".<br>
> > ><br>
> > > > They are two very different things, and I don't like<br>
> conflating<br>
> > them<br>
> > > > into one "expected-fail" case<br>
> > ><br>
> > > Sorry, I am not following. Would you be able to provide<br>
> some examples<br>
> > > for 1. and 2. that may help me understand the issue?<br>
> > ><br>
> ><br>
> > Yes. There are tests that "pass" by failing, such as the<br>
> intrcritical<br>
> > tests. These are tests that are expected to fail, always and<br>
> forever,<br>
> > and are not worth looking at further if they are failing. An<br>
> > expected-fail that passes is, then, a bug/regression.<br>
> ><br>
> > Then there are tests we have triaged and identified as bugs, which<br>
> > could be tagged with something such as "known-failure": a failure that is not<br>
> > expected but that we know happens. This would be like the spfenv tests, where<br>
> > the support doesn't exist yet, or like dl06. These are tests that<br>
> > should be passing some day, but they are not right now. Yes,<br>
> > "known-failure" encodes a notion of time, but we must have a notion of<br>
> > time, because a regression is time-sensitive as well. The idea of<br>
> > "known-failure" is just a subset of what you have added to the<br>
> > "expected-failure" column. It would just be another reported statistic<br>
> > to add, just like Timeouts or Benchmarks.<br>
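> ><br>
> > A sketch of the idea (the directive does not exist today; dl06 is just the<br>
> > example from this thread):<br>
> ><br>
> > known-failure: dl06<br>
> ><br>
> > The report summary would then count these separately, just as Timeouts and<br>
> > Benchmarks are counted now.<br>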
> ><br>
> ><br>
> > I'm concerned that we are not making a distinction between investigated<br>
> > and known failures, and deficiencies which have tickets and should work<br>
> > if X is fixed. The Beagle issue and many of the jmr3904 failures are in<br>
> > this category. A known failure should indicate a certainty that it can't<br>
> > be made to work, per someone who investigated: you can't add memory, the<br>
> > simulator catches an invalid access before the trap handler, etc. As<br>
> > opposed to all the TLS tests, which fail because TLS isn't supported on an<br>
> > architecture.<br>
> ><br>
> > Can we make a distinction between those conditions? Something like<br>
> > "failure accepted pending investigation" versus "fails and is explained" versus<br>
> > "known failure"?<br>
> ><br>
> > A known failure has a comment explaining it.<br>
> <br>
> Comment where? This is the thread that pulls the design.<br>
> <br>
> > Maybe a known issue which has a comment and ticket.<br>
> <br>
> A ticket is a good place.<br>
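> <br>
> For example, the tcfg entry could carry a short comment pointing at the<br>
> ticket (the ticket number here is only a placeholder):<br>
> <br>
> # fenv support is missing for this target, see ticket #NNNN.<br>
> expected-fail: psxfenv01<br>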
> <br>
> > "Pending investigation" for these you are flagging: noted as known, but with no<br>
> > explanation. They can serve as a pool of future tasks.<br>
> <br>
> This seems like process management or resolution management. The purpose<br>
> here is to automate regression detection. Separate databases or files<br>
> containing values break a number of other tester requirements, or they<br>
> complicate the management and the tester.<br>
> <br>
> > Is that all that's needed? We don't want to lose the information that we<br>
> > think these likely should pass but they haven't been investigated.<br>
> ><br>
> > Otherwise we have lost the fact that no one has explained the situation.<br>
> <br>
> For me the states are from the tester's point of view and the state is<br>
> only metadata for the tester and nothing more. The tester is simpler<br>
> when the states we deal with are simpler.<br>
> <br>
> Can you please explain how I determine if my build of a BSP has any<br>
> regressions over what the release has? I am fine with whatever labels<br>
> you think should be used, or even more if you want them, but there<br>
> needs to be a base requirement that a new user can build a BSP, test it,<br>
> and know whether it meets the same standard as the release.<br>
> <br>
> <br>
> We don't have a magic database.<br>
<br>
We have the tcfg files. A database is hard for a range of reasons.<br>
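<br>
As a rough sketch of how the tcfg state lets a test run be checked by<br>
machine (this is not the tester code, and the file names are made up):<br>
<br>
# Sketch: compare observed failures against a BSP's tcfg expectations.<br>
def read_expected_fail(tcfg):<br>
    expected = set()<br>
    with open(tcfg) as f:<br>
        for line in f:<br>
            line = line.split('#', 1)[0].strip()   # drop comments<br>
            if line.startswith('expected-fail:'):<br>
                expected.add(line.split(':', 1)[1].strip())<br>
    return expected<br>
<br>
expected = read_expected_fail('psim-testsuite.tcfg')      # release state<br>
observed = set(open('failed-tests.txt').read().split())   # my test run<br>
print('regressions:', sorted(observed - expected))<br>
print('unexpected passes:', sorted(expected - observed))<br>
<br>
Anything in the regressions list means my build does not match the state<br>
the release expects and is worth investigating.<br>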
<br>
> We have mailing list archives with <br>
> results. I would have to check if my bsp has some results and compare <br>
> them by hand. If my results match those posted, that's it. You couldn't <br>
> do that before this release.<br>
<br>
That assumes those results are golden, and they may not be.<br>
<br>
> If you want no unexplained failures, then you are raising the bar. <br>
> Marking all as expected is wrong. They are just not investigated yet. <br>
> Adding that state is the realistic answer if you don't want any <br>
> unexpected failures.<br>
> <br>
> And I said to comment in the tcfg for explained expected failures. For unexplained <br>
> ones there is nothing to say.<br>
<br>
The states are from the tester's point of view to allow us to machine <br>
check for regressions. There was never any intention to characterise the <br>
tests with these labels, but this seems to be what is happening. It seems <br>
the label and the state are confusing, so I can add another state to catch <br>
unexplained failures. Is unexplained-fail OK?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">If it is just to move things from unexpected failure to another category, I'm okay with it. I just don't want tests written off as expected failures when they haven't been investigated. We have a fair number of those.</div><div dir="auto"><br></div><div dir="auto">We also have the situation where some BSPs run on multiple simulators and real hardware and the results don't always align. The tcfg file doesn't capture that either.</div><div dir="auto"><br></div><div dir="auto">But at least we aren't putting tests in a bin where they will be ignored forever. That's a step forward and explainable as a state and a work activity.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
We need to move forward from what we have and I think reverting the <br>
patch after a release is a step backwards.<br>
<br>
Chris<br>
</blockquote></div></div></div>