[PATCH] tester: Limit simultaneous QEMU jobs to 1

Kinsey Moore kinsey.moore at oarcorp.com
Wed Sep 1 02:25:10 UTC 2021


On 8/31/2021 18:00, Chris Johns wrote:
> On 31/8/21 6:30 pm, Sebastian Huber wrote:
>> On 31/08/2021 09:00, Chris Johns wrote:
>>> On 31/8/21 3:20 pm, Sebastian Huber wrote:
>>>> On 30/08/2021 20:32, Kinsey Moore wrote:
>>>>> On 8/30/2021 12:12, Sebastian Huber wrote:
>>>>>> On 24/08/2021 20:45, Kinsey Moore wrote:
>>>>>>> diff --git a/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
>>>>>>> b/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
>>>>>>> index 3beba06..581c59c 100644
>>>>>>> --- a/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
>>>>>>> +++ b/tester/rtems/testing/bsps/a53_ilp32_qemu.ini
>>>>>>> @@ -36,3 +36,4 @@ bsp           = a53_ilp32_qemu
>>>>>>>     arch          = aarch64
>>>>>>>     tester        = %{_rtscripts}/qemu.cfg
>>>>>>>     bsp_qemu_opts = %{qemu_opts_base} -serial mon:stdio -machine
>>>>>>> virt,gic-version=3 -cpu cortex-a53 -m 4096
>>>>>>> +jobs          = 1
>>>>>> Does this overwrite the command line option or is this a default value?
>>>>>>
>>>>> When this is set in the tester configuration, the command line switch has no
>>>>> effect but it can be overridden in the user-config.
>>>> Overruling the command line option is not that great. I have a vastly different
>>>> test run duration with --jobs=1 vs. --jobs=48 with more or less the same test
>>>> results.
>>> What does more or less mean?
>> On Qemu some tests have no reliable outcome. If I run with --jobs=48 only two of
>> these tests fail compared to --jobs=1.
> It seems the experience varies between archs and hosts. It is the origin of this
> patch series.
>
>>> I appreciate the efforts Kinsey has gone to looking into why we have this
>>> happening and I also believe we need to keep pushing towards repeatable results.
>>> If limiting to 1 gives us repeatable results on qemu then I prefer this over
>>> tainted test results with intermittent tags.
>> During development waiting one minute is much better than waiting 13 minutes.
>> Repeatable tests are one aspect, but there are other aspects too. Overruling
>> command line options is not that great. If you run with default values, it is
>> all right to trade off repeatable results against a fast test run. However, if I
>> want to run with --jobs=N, I want to run with N jobs and not just one.
> Yes I agree. How we manage this so it is apparent seems to be the key issue here.
>
>>>> I think this option should be split into a "force-jobs" and
>>>> "default-jobs" option.
>>> I am sorry I do not understand these options?
>> force-jobs forces the jobs to N regardless of what is specified on the command
>> line. Maybe a warning or error should be emitted if the command line option
>> conflicts with the configuration option.
>>
>> default-jobs selects the job count if no --jobs command line option is specified.
> What about adding a `max-jobs` field which is 0 for no limit? This cannot be
> exceeded?
>
> Then `default-jobs` can be used as the default, again 0 means no limit?
>
>>> The command line is ignored because the value is fixed on purpose and I am
>>> not seeing a reason to change this.
>> Ignoring command line options is not really a pleasant user experience.
> Yes it is not. It was added in a hurry without much thought when I added the TFTP
> support.
>
>>> When specified in a config it is a physical limit. A user being able to change
>>> the number of TFTP jobs on the command line does not make sense.
>> Yes, for physical limits this makes sense.
> We need to manage this case for new users.
>
>>> This tool's focus is testing on hardware and I see that as more important. And
>>> as I have said before if we have problematic tests maybe the test or the tool
>>> generating the results needs to be investigated.
>>>
>>> I see this issue as something specific to the design of qemu and a few of our
>>> tests. I can guess at some of the reasons qemu does this but also being able to
>>> have the tick timer's clock be sync'ed with the CPU clock is important in some
>>> types of simulation, i.e. our case and these problematic tests. We are a real-time
>>> operating system so needing this to be right, or closer to right, in simulation
>>> does not seem unreasonable.
>>>
>>> This discussion sends a clear message, tier 1 archs and BSPs are very important
>>> to this project.
>> There are several ways to address the sporadic test failures on Qemu. You could
>> for example also change the tests to make them independent of the simulator
>> timing. For now, my approach would be to change the default jobs count for the
>> Qemu BSPs and still let the user overrule the default with a custom value to get
>> for example a faster test run.
> This is sensible. In summary:
>
> 1. Add `max-jobs` as a config-file-only setting with a default of 0
>
> 2. Change the config `jobs` to `default-jobs`, again with 0 as the default.
>
> 3. Let the command line override the default jobs and raise an error if over the
> maximum jobs allowed.
>
> 4. Provide a clear notice at the start and end of a run if the jobs used do not
> match the default.

I'll work toward this solution.
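
As a rough sketch of how I read the four points above (not the actual
patch): the function name `resolve_jobs`, its parameters, and the notice
text are all hypothetical and are not existing rtems-tools APIs. It also
assumes that a `default-jobs` of 0 falls back to the host CPU count and
that no notice is needed in that case, neither of which was settled in
this thread.

```python
def resolve_jobs(cli_jobs=None, default_jobs=0, max_jobs=0, host_cpus=1):
    """Pick the job count from the command line and the BSP config.

    Hypothetical helper. cli_jobs is the --jobs value or None if it was
    not given; default_jobs and max_jobs come from the BSP config, where
    0 means "no limit". Returns (jobs, notice); notice is None or a
    message to print at the start and end of the run.
    """
    if default_jobs < 0 or max_jobs < 0:
        raise ValueError('default-jobs and max-jobs must be >= 0')

    if cli_jobs is not None:
        # Point 3: the command line overrides the default, but going
        # over the configured maximum is an error, not a silent clamp.
        if cli_jobs < 1:
            raise ValueError('--jobs must be at least 1')
        if max_jobs > 0 and cli_jobs > max_jobs:
            raise ValueError('--jobs=%d exceeds max-jobs=%d'
                             % (cli_jobs, max_jobs))
        jobs = cli_jobs
    else:
        # Point 2: use the config default. Assumption: 0 means use
        # every host CPU, still capped by max-jobs when that is set.
        jobs = default_jobs if default_jobs > 0 else host_cpus
        if max_jobs > 0:
            jobs = min(jobs, max_jobs)

    # Point 4: tell the user when the run does not use the default.
    notice = None
    if default_jobs > 0 and jobs != default_jobs:
        notice = ('notice: running %d jobs, BSP default is %d; '
                  'results may not be repeatable' % (jobs, default_jobs))
    return jobs, notice
```

With `default_jobs=1` in a Qemu BSP config, a plain run would use one
job, while `--jobs=48` would still run 48 jobs and print the notice. A
BSP with a physical limit such as TFTP would set `max-jobs` instead, and
a larger `--jobs` would be rejected.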


Kinsey


