<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 9, 2020, 6:56 PM Chris Johns <<a href="mailto:chrisj@rtems.org">chrisj@rtems.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 10/5/20 9:24 am, Joel Sherrill wrote:<br>
> <br>
> <br>
> On Sat, May 9, 2020, 6:18 PM Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>> wrote:<br>
> <br>
> On 10/5/20 4:57 am, Joel Sherrill wrote:<br>
> ><br>
> ><br>
> > On Sat, May 9, 2020, 1:02 PM Gedare Bloom <<a href="mailto:gedare@rtems.org" target="_blank" rel="noreferrer">gedare@rtems.org</a>> wrote:<br>
> ><br>
> > On Sat, May 9, 2020 at 2:09 AM Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>> wrote:<br>
> > ><br>
> > > On 9/5/20 11:30 am, Gedare Bloom wrote:<br>
> > > > On Wed, May 6, 2020 at 5:12 AM Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>> wrote:<br>
> > > >><br>
> > > >><br>
> > > >>> On 6 May 2020, at 8:15 pm, Sebastian Huber <<a href="mailto:sebastian.huber@embedded-brains.de" target="_blank" rel="noreferrer">sebastian.huber@embedded-brains.de</a>> wrote:<br>
> > > >>><br>
> > > >>> On 06/05/2020 12:00, Chris Johns wrote:<br>
> > > >>><br>
> > > >>>>> On 6/5/20 7:35 pm, Sebastian Huber wrote:<br>
> > > >>>>>> On 06/05/2020 10:41, <a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a> wrote:<br>
> > > >>>>><br>
> > > >>>>>> From: Chris Johns <<a href="mailto:chrisj@rtems.org" target="_blank" rel="noreferrer">chrisj@rtems.org</a>><br>
> > > >>>>>><br>
> > > >>>>>> Updates #2962<br>
> > > >>>>>> ---<br>
> > > >>>>>> bsps/powerpc/psim/config/psim-testsuite.tcfg | 22 ++++++++++++++++++++<br>
> > > >>>>>> 1 file changed, 22 insertions(+)<br>
> > > >>>>>> create mode 100644 bsps/powerpc/psim/config/psim-testsuite.tcfg<br>
> > > >>>>>><br>
> > > >>>>>> diff --git a/bsps/powerpc/psim/config/psim-testsuite.tcfg b/bsps/powerpc/psim/config/psim-testsuite.tcfg<br>
> > > >>>>>> new file mode 100644<br>
> > > >>>>>> index 0000000000..b0d2a05086<br>
> > > >>>>>> --- /dev/null<br>
> > > >>>>>> +++ b/bsps/powerpc/psim/config/psim-testsuite.tcfg<br>
> > > >>>>>> @@ -0,0 +1,22 @@<br>
> > > >>>>>> +#<br>
> > > >>>>>> +# PSIM RTEMS Test Database.<br>
> > > >>>>>> +#<br>
> > > >>>>>> +# Format is one line per test that is_NOT_ built.<br>
> > > >>>>>> +#<br>
> > > >>>>>> +<br>
> > > >>>>>> +expected-fail: fsimfsgeneric01<br>
> > > >>>>>> +expected-fail: block11<br>
> > > >>>>>> +expected-fail: rbheap01<br>
> > > >>>>>> +expected-fail: termios01<br>
> > > >>>>>> +expected-fail: ttest01<br>
> > > >>>>>> +expected-fail: psx12<br>
> > > >>>>>> +expected-fail: psxchroot01<br>
> > > >>>>>> +expected-fail: psxfenv01<br>
> > > >>>>>> +expected-fail: psximfs02<br>
> > > >>>>>> +expected-fail: psxpipe01<br>
> > > >>>>>> +expected-fail: spextensions01<br>
> > > >>>>>> +expected-fail: spfatal31<br>
> > > >>>>>> +expected-fail: spfifo02<br>
> > > >>>>>> +expected-fail: spmountmgr01<br>
> > > >>>>>> +expected-fail: spprivenv01<br>
> > > >>>>>> +expected-fail: spstdthreads01<br>
> > > >>>>><br>
> > > >>>>> I don't think these tests are expected to fail. If they<br>
> > fail, then there is a bug somewhere.<br>
> > > >>>><br>
> > > >>>> Yes, we hope no tests fail, but they can and do. Excluding<br>
> > > >>>> tests because they fail would be incorrect. In the 5.1 release these<br>
> > > >>>> bugs are present, so we expect, or maybe it should say, we know the<br>
> > > >>>> test will fail. With this change anything that appears in the<br>
> > > >>>> failure column is "unexpected", and that means the user's build of the<br>
> > > >>>> release does not match the state we "expect" and it is worth<br>
> > > >>>> investigation by the user.<br>
> > > >>>><br>
> > > >>>> Without these tests being tagged this way the user would<br>
> > > >>>> have no idea where they stand after a build and test run, and that<br>
> > > >>>> would mean we would have to make sure a release has no failures. I<br>
> > > >>>> do not consider that practical or realistic.<br>
> > > >>> Maybe we need another state, e.g.<br>
> > something-is-broken-please-fix-it.<br>
> > > >><br>
> > > >> I do not think so; it is implicit in the failure, or the test<br>
> > > >> is broken. The only change is to add unexpected-pass, which will be<br>
> > > >> on master after the 5 branch.<br>
> > > >><br>
> > > ><br>
> > > > I disagree with this in principle,<br>
> > ><br>
> > > I did not invent this, it is borrowed from gcc. I<br>
> considered their<br>
> > > mature test model as OK to follow. Look for "How to<br>
> interpret test<br>
> > > results" in <a href="https://gcc.gnu.org/install/test.html" rel="noreferrer noreferrer" target="_blank">https://gcc.gnu.org/install/test.html</a>.<br>
> > ><br>
> > > We have ...<br>
> > ><br>
> > ><br>
> ><br>
> <a href="https://docs.rtems.org/branches/master/user/testing/tests.html#test-controls" rel="noreferrer noreferrer" target="_blank">https://docs.rtems.org/branches/master/user/testing/tests.html#test-controls</a><br>
> > ><br>
> > > Is the principle the two points below?<br>
> > ><br>
> > > > and it should be reverted after we branch 5.<br>
> > ><br>
> > > I would like to understand how regressions are to be tracked<br>
> > before we<br>
> > > revert the change. Until this change you could not track<br>
> them. We<br>
> > need<br>
> > > to capture the state somehow and I view capturing the state in<br>
> > the tests<br>
> > > themselves as the best method.<br>
> > ><br>
> > > > It's fine for now to get the release state sync'd, but we<br>
> > ><br>
> > > I am not following why we would only track regressions on a release<br>
> > > branch?<br>
> > ><br>
> > > > should find a long-term solution that distinguishes the<br>
> cases:<br>
> > > > 1. we don't expect this test to pass on this bsp<br>
> > ><br>
> > > If a test cannot pass on a BSP for a specific reason it is excluded and<br>
> > > not built, e.g. not enough memory, single core. A test is expected to<br>
> > > fail because of a bug or missing feature we are not or cannot fix or<br>
> > > implement, so we tag it as expected-fail; by default a test is tagged<br>
> > > as expected-pass. If a test may or may not pass because of some edge<br>
> > > case in a BSP it can be tagged 'indeterminate'.<br>
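> > ><br>
> > > As a sketch of how these states could sit together in a BSP's tcfg file<br>
> > > (the test names are only illustrative and, apart from expected-fail, the<br>
> > > exact directive spellings are an assumption):<br>
> > ><br>
> > > # Not built on this BSP, e.g. single core.<br>
> > > exclude: smp01<br>
> > > # Known bug or missing feature at the time of the release.<br>
> > > expected-fail: psxfenv01<br>
> > > # May pass or fail because of an edge case in the BSP.<br>
> > > indeterminate: some-edge-case-test<br>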
> > ><br>
> > > > 2. we expect this test to pass, but know it doesn't<br>
> currently<br>
> > ><br>
> > > This depends on a point in time. After a change I make I would<br>
> > consider<br>
> > > this a regression and I would need to see what I have done<br>
> in my<br>
> > change<br>
> > > to cause it. For this to happen we need a baseline where the<br>
> > tests that<br>
> > > fail because of a known bug or missing feature at the time<br>
> I add my<br>
> > > change are tagged as expected to fail.<br>
> > ><br>
> > > An example is dl06 on the beagleboneblack:<br>
> > ><br>
> > > <a href="https://lists.rtems.org/pipermail/build/2020-May/014695.html" rel="noreferrer noreferrer" target="_blank">https://lists.rtems.org/pipermail/build/2020-May/014695.html</a><br>
> > ><br>
> > > The RAP needs to support trampolines and it does not, so the test is<br>
> > > expected to fail.<br>
> > ><br>
> > > An example of a regression is a test that passes in a<br>
> specific build<br>
> > > configuration and fails in another. These recent psim results<br>
> > from Joel<br>
> > > show this where the build without RTEMS_DEBUG passes and with<br>
> > > RTEMS_DEBUG fails. Here there are 2 regressions:<br>
> > ><br>
> > > <a href="https://lists.rtems.org/pipermail/build/2020-May/014943.html" rel="noreferrer noreferrer" target="_blank">https://lists.rtems.org/pipermail/build/2020-May/014943.html</a><br>
> > > <a href="https://lists.rtems.org/pipermail/build/2020-May/014946.html" rel="noreferrer noreferrer" target="_blank">https://lists.rtems.org/pipermail/build/2020-May/014946.html</a><br>
> > ><br>
> > > The regression in fsrfsbitmap01.exe with RTEMS_DEBUG<br>
> explains the<br>
> > > timeout in the no RTEMS_DEBUG version. I had not noticed<br>
> this before.<br>
> > > They are hard to notice without a baseline in each BSP and<br>
> > expecting us<br>
> > > to have 100% pass on all BSPs in all testing configurations,<br>
> > especially<br>
> > > simulation, is too hard.<br>
> > ><br>
> > > My hope is a simple rule: "If you do not see 0 fails, you need to check<br>
> > > your changes".<br>
> > ><br>
> > > > They are two very different things, and I don't like<br>
> conflating<br>
> > them<br>
> > > > into one "expected-fail" case<br>
> > ><br>
> > > Sorry, I am not following. Would you be able to provide<br>
> some examples<br>
> > > for 1. and 2. that may help me understand the issue?<br>
> > ><br>
> ><br>
> > Yes. There are tests that "pass" by failing, such as the<br>
> intrcritical<br>
> > tests. These are tests that are expected to fail, always and<br>
> forever,<br>
> > and are not worth looking at further if they are failing. An<br>
> > expected-fail that passes is, then, a bug/regression.<br>
> ><br>
> > Then there are tests we have triaged and identified as bugs, which<br>
> > could be tagged with something such as "known-failure": a failure that is not<br>
> > expected but that we know happens. This would be like the spfenv tests, where<br>
> > the support doesn't exist yet, or like dl06. These are tests that<br>
> > should be passing some day, but they are not right now. Yes,<br>
> > "known-failure" encodes a notion of time, but we must have a notion of<br>
> > time, because a regression is time-sensitive as well. The idea of<br>
> > "known-failure" is just a subset of what you have added to the<br>
> > "expected-failure" column. It would just be another reported statistic<br>
> > to add, just like Timeouts or Benchmarks.<br>
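> ><br>
> > A sketch of the idea (the directive does not exist today; dl06 is just the<br>
> > example from this thread):<br>
> ><br>
> > known-failure: dl06<br>
> ><br>
> > The report summary would then count these separately, just as Timeouts and<br>
> > Benchmarks are counted now.<br>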
> ><br>
> ><br>
> > I'm concerned that we are not making a distinction between investigated<br>
> > and known failures, and deficiencies which have tickets and should work<br>
> > if X is fixed. The Beagle issue and many of the jmr3904 failures are in<br>
> > this category. A known failure should indicate a certainty that it can't<br>
> > be made to work, per someone who investigated: you can't add memory, the<br>
> > simulator catches an invalid access before the trap handler, etc. As<br>
> > opposed to all the TLS tests, which fail because TLS isn't supported on an<br>
> > architecture.<br>
> ><br>
> > Can we make a distinction between those conditions? Something like<br>
> > "failure accepted pending investigation" versus "fails and is explained" versus<br>
> > "known failure"?<br>
> ><br>
> > A known failure has a comment explaining it.<br>
> <br>
> Comment where? This is the thread that pulls the design.<br>
> <br>
> > Maybe a known issue which has a comment and ticket.<br>
> <br>
> A ticket is a good place.<br>
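> <br>
> For example, the tcfg entry could carry a short comment pointing at the<br>
> ticket (the ticket number here is only a placeholder):<br>
> <br>
> # fenv support is missing for this target, see ticket #NNNN.<br>
> expected-fail: psxfenv01<br>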
> <br>
> > "Pending investigation" for these you are flagging: noted as known, but with no<br>
> > explanation. They can serve as a pool of future tasks.<br>
> <br>
> This seems like process management or resolution management. The purpose<br>
> here is to automate regression detection. Separate databases or files<br>
> containing values break a number of other tester requirements, or they<br>
> complicate the management and the tester.<br>
> <br>
> > Is that all that's needed? We don't want to lose the information that we<br>
> > think these likely should pass but they haven't been investigated.<br>
> ><br>
> > Otherwise we have lost the fact that no one has explained the situation.<br>
> <br>
> For me the states are from the tester's point of view and the state is<br>
> only metadata for the tester and nothing more. The tester is simpler<br>
> when the states we deal with are simpler.<br>
> <br>
> Can you please explain how I determine if my build of a BSP has any<br>
> regressions over what the release has? I am fine with whatever labels<br>
> you think should be used, or even more if you want them, but there<br>
> needs to be a base requirement that a new user can build a BSP, test it,<br>
> and know whether it meets the same standard as the release.<br>
> <br>
> <br>
> We don't have a magic database.<br>
<br>
We have the tcfg files. A database is hard for a range of reasons.<br>
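<br>
As a rough sketch of how the tcfg state lets a test run be checked by<br>
machine (this is not the tester code, and the file names are made up):<br>
<br>
# Sketch: compare observed failures against a BSP's tcfg expectations.<br>
def read_expected_fail(tcfg):<br>
    expected = set()<br>
    with open(tcfg) as f:<br>
        for line in f:<br>
            line = line.split('#', 1)[0].strip()   # drop comments<br>
            if line.startswith('expected-fail:'):<br>
                expected.add(line.split(':', 1)[1].strip())<br>
    return expected<br>
<br>
expected = read_expected_fail('psim-testsuite.tcfg')      # release state<br>
observed = set(open('failed-tests.txt').read().split())   # my test run<br>
print('regressions:', sorted(observed - expected))<br>
print('unexpected passes:', sorted(expected - observed))<br>
<br>
Anything in the regressions list means my build does not match the state<br>
the release expects and is worth investigating.<br>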
<br>
> We have mailing list archives with <br>
> results. I would have to check if my bsp has some results and compare <br>
> them by hand. If my results match those posted, that's it. You couldn't <br>
> do that before this release.<br>
<br>
That assumes those results are golden, and they may not be.<br>
<br>
> If you want no unexplained failures, then you are raising the bar. <br>
> Marking all as expected is wrong. They are just not investigated yet. <br>
> Adding that state is the realistic answer if you don't want any <br>
> unexpected failures.<br>
> <br>
> And I said to comment in the tcfg for explained expected failures. For unexplained <br>
> ones there is nothing to say.<br>
<br>
The states are from the tester's point of view to allow us to machine <br>
check for regressions. There was never any intention to characterise the <br>
tests with these labels, but this seems to be what is happening. It seems <br>
the label and the state are confusing, so I can add another state to catch <br>
unexplained failures. Is unexplained-fail OK?<br></blockquote></div></div><div dir="auto"><br></div><div dir="auto">If it is just to move things from unexpected failure to another category, I'm okay with it. I just don't want tests written off as expected failures when they haven't been investigated. We have a fair number of those.</div><div dir="auto"><br></div><div dir="auto">We also have the situation where some BSPs run on multiple simulators and real hardware and the results don't always align. The tcfg file doesn't capture that either.</div><div dir="auto"><br></div><div dir="auto">But at least we aren't putting tests in a bin where they will be ignored forever. That's a step forward and explainable as a state and a work activity.</div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
We need to move forward from what we have and I think reverting the <br>
patch after a release is a step backwards.<br>
<br>
Chris<br>
</blockquote></div></div></div>