Proposal for hardware configuration dependent performance limits

Thu Nov 19 08:26:44 UTC 2020

Hello Chris,

On 17/11/2020 22:43, Chris Johns wrote:

>
> On 17/11/20 6:14 pm, Sebastian Huber wrote:
>> On 16/11/2020 23:42, Chris Johns wrote:
>>> On 16/11/20 5:40 pm, Sebastian Huber wrote:
>>>> On 16/11/2020 00:33, Chris Johns wrote:
>>>>
>>>>>>>> In the proposal, limits are specified like this:
>>>>>>>>
>>>>>>>>
>>>>>>>> limits:
>>>>>>>>        sparc/gr712rc:
>>>>>>>>          DirtyCache:
>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>          FullCache:
>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>          HotCache:
>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>          Load/1:
>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>          Load/2:
>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>          Load/3:
>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>          Load/4:
>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>
>>>>>>>> This neglects that the limits are subject to a board configuration. One
>>>>>>>> approach to cover this is the addition of a new BSP provided function:
>>>>>>>>
>>>>>>>> const char *rtems_get_hardware_performance_hash();
>>>>>>>>
>>>>>>>> The BSP feeds all performance related data into a hash function and
>>>>>>> "data" here means configuration?
>>>>>> Yes, hardware configuration.
>>>>> Why not make these values part of the BSP configuration? The defaults for the
>>>>> BSP can have a set of suitable values. Different boards have different
>>>>> configurations to match and a separate kernel build.
>>>>>
>>>> This doesn't work on BSPs which support configuration via a hardware
>>>> enumeration, boot loader settings, or device trees. Also changes in the BSP
>>>> options have no influence on the BSP name. Not only does the BSP configuration
>>>> influence performance; the CPU options play a role too, for example RTEMS_SMP. In order to
>>>> compare performance values over time we have to obtain the values under the same
>>>> conditions.
>>> Maybe I am not understanding the context.
>>>
>>> A BSP, whichever one, has a set of options that configure it. An example is the
>>> xilinx_zynq_zc702 and the `ZYNQ_RAM_LENGTH = 0x40000000`. If I have 2 Zynq
>>> circuits one with 256M and one with 1G I need to build and maintain 2 RTEMS
>>> builds and from a purist's point of view I need to maintain 2 builds of the exact
>>> same application.
>>>
>>> I asked about the fixed memory and your answer was to use the BSP options, the
>>> size is fixed in the linker command files via the BSP option. That is what I
>>> have done.
>>>
>>> I would expect there exists a set of values for the xilinx_zynq_zc702 with no
>>> SMP and with SMP as this BSP supports SMP. Those values would match all the
>>> other settings for the BSP such as ZYNQ_CLOCK_CPU_1X, BSP_ARM_A9MPCORE_PERIPHCLK
>>> etc. If my clock is different (and they are) I would need to supply a suitable
>>> set of performance values if I wanted to pass those tests.
>>>
>>> I am not questioning the need for the values or the tests. I am suggesting the
>>> values form part of the BSP settings so a user can adjust them to suit their
>>> specific set up in the same way they adjust other BSP settings. I do not think
>>> we should attempt to hold or manage endless sets of possible values and I do
>>> not see the need for complex encapsulation methods such as base64 hashes. The
>>> systems we interact with are too complex and the list is endless.
>> I think it will be highly BSP-specific what parameters are relevant to the
>> performance limits. This is why I suggested to add a function which can be
>> implemented by each BSP.
>>
>> const char *rtems_get_hardware_performance_something();
>>
>> It should return a string which changes if a performance relevant parameter
>> changed. If it is only SMP/no-SMP, ZYNQ_CLOCK_CPU_1X, and
>> BSP_ARM_A9MPCORE_PERIPHCLK, then fine, just return "SMP/800MHz/400MHz" or whatever.
> I suggest you avoid heading down a path of specific strings, ie avoid something
> meaningful a human can read. Also performance characteristics are a part of a
> wider configuration topic. Maybe considering that would solve the performance
> specific parts as well.
>
> A label for a build of RTEMS is a good idea (see below) that could serve the
> human readable part. I would consider computing a hash for the config.ini file,
> ie the build, and embedding it. If you wanted to capture the state of the RTEMS
> source built optionally compute a hash for the entire source tree and embed that
> as well. You can then have calls such as:
>
> const char* rtems_config_build_hash(void);
> const char* rtems_config_source_hash(void);
>
>   [ the last one could return "NOT-AVAILABLE" if not enabled ]
>
> The key point is defining markers, with defaults if optional, then wrapping your
> configuration management system round them. Strings with a meaning such as
> "SMP/800MHz/400MHz" are fragile because cosmetic changes break dependent
> configuration management systems. A hash implies nothing specific, that task is
> left to your CM systems.
>
> For a BSP specific case of runtime values what about:
>
> const char* rtems_config_bsp_hash(void);
>
> with a default returning "DEFAULT". A BSP could override a weak function to
> provide a hash computed in a specific way.
>
> When I said a build label I was considering ...
>
> [arm/beagleboneblack]
> RTEMS_BUILD_LABEL = "...---..."
>
> with a function 'rtems_config_build_label' to fetch it. The default could be
> "RTEMS" if not set in config.ini. This would be useful when tracking deployed
> builds of RTEMS. Consider this as labelling the config.ini file in a human
> readable way that suits my CM processes.

Thanks for broadening the perspective. Maybe just focusing on the
performance limits was a bit too specific. However, if we put things
into a hash which only weakly influence the performance characteristics,
then it will be hard to get comparable performance test runs over time.

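
To illustrate the concern, here is a minimal Python sketch of a key that hashes only the parameters selected as performance relevant, so it stays stable when unrelated options change. The helper name, the option values, the unrelated option, and the choice of SHA-256 are assumptions for illustration only and not part of the proposal:

```python
import hashlib

def performance_key(params, relevant):
    """Hash only the performance-relevant parameters (hypothetical helper)."""
    # Sort the names so the key does not depend on dictionary order.
    items = [f"{name}={params[name]}" for name in sorted(relevant)]
    digest = hashlib.sha256("\n".join(items).encode("utf-8")).hexdigest()
    # A shortened digest is enough for a lookup key in this sketch.
    return digest[:16]

# Hypothetical configuration values, not taken from a real BSP.
config = {
    "RTEMS_SMP": True,
    "ZYNQ_CLOCK_CPU_1X": 111111111,
    "SOME_UNRELATED_OPTION": 1,
}
relevant = ["RTEMS_SMP", "ZYNQ_CLOCK_CPU_1X"]
key = performance_key(config, relevant)
```

Changing SOME_UNRELATED_OPTION leaves the key untouched, while changing RTEMS_SMP produces a new key. Hashing the whole config.ini would instead change the key on every edit, which is what makes long term comparisons hard.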
>
> Can environment variables affect a build of RTEMS? If so you either need to
> include them somehow or have waf ignore them.

I don't know waf well enough. If some environment variables are set 
during ./waf configure, a warning is printed. I don't know if 
environment variables are used during ./waf build.

>
>> My point is that we need a key reported by the BSP and then some performance
>> limits which can be found by arch/bsp/key to check if there are performance
>> regressions.
> I am missing the place where the performance limits are held. Do the tests
> report timing values and the checks against the limits happen on a host?

Yes, this is what I proposed. An alternative would be to generate tables 
with performance limits and excessive C preprocessor conditionals and 
let the tests check the limits. Another option is to let the build 
system generate the tables. This would require that the performance 
limits are a part of the build specification.
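
As a rough idea of the build system option, the following Python sketch renders a limits block as C source text for a table. The table layout, the type name perf_limit, and the function name are hypothetical; nothing like this exists in the build system today:

```python
def render_limit_table(name, limits):
    """Render limits as C source text (hypothetical table layout)."""
    lines = [f"static const perf_limit {name}[] = {{"]
    # Sort the variants so the generated file is stable between runs.
    for variant, bounds in sorted(limits.items()):
        lines.append(
            f'  {{ "{variant}", {bounds["max-upper-bound"]!r}, '
            f'{bounds["mean-upper-bound"]!r} }},'
        )
    lines.append("};")
    return "\n".join(lines)

# Values copied from the limits example in this thread.
limits = {
    "DirtyCache": {"max-upper-bound": 0.000005, "mean-upper-bound": 0.000005},
    "Load/1": {"max-upper-bound": 0.00001, "mean-upper-bound": 0.00001},
}
table = render_limit_table("gr712rc_limits", limits)
```

The generated text would still have to be selected per configuration at build time, which is why the limits would need to be part of the build specification.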

The proposed workflow would be something like this:

1. You select a board to use for long term performance tests.

2. You define a set of configurations you want to test.

3. You do an initial run of the test suite for each configuration. The 
RTEMS Tester provides you with machine-readable output (test data) of 
the test run, with the raw test output per test executable and some meta 
information (TODO).

4. A tool reads the test data and the RTEMS specification and updates 
the specification with the performance limits obtained from the test run 
(maybe with some simple transformation, for example increasing the 
maximum by 10% and rounding to microseconds).

5. You review the performance limits and then commit them.

6. Later you run the tests with a new RTEMS commit, get the performance 
values, compare them against the limits stored in the specification, and 
generate a report.
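
Steps 4 and 6 can be sketched in Python as follows. This is only an illustration under assumptions: the function names, the shape of the measured data, and the sample values are hypothetical, and rounding up (instead of to nearest) is one possible reading of "round to microseconds":

```python
import math

def derive_limit(max_seconds, margin=0.10):
    """Increase the measured maximum by a margin and round up to microseconds."""
    microseconds = max_seconds * (1.0 + margin) * 1e6
    # The small epsilon guards against floating-point noise before rounding up.
    return math.ceil(microseconds - 1e-9) / 1e6

def check(measured, limits):
    """Return a list of (variant, bound, value, limit) for each violated limit."""
    violations = []
    for variant, bounds in limits.items():
        sample = measured[variant]
        for name, stat in (("max-upper-bound", "max"), ("mean-upper-bound", "mean")):
            if sample[stat] > bounds[name]:
                violations.append((variant, name, sample[stat], bounds[name]))
    return violations

# Step 4: a hypothetical measured maximum of 4.2 us becomes a 5 us limit.
limit = derive_limit(4.2e-6)

# Step 6: compare hypothetical values of a later run against stored limits.
limits = {"DirtyCache": {"max-upper-bound": 0.000005, "mean-upper-bound": 0.000005}}
measured = {"DirtyCache": {"max": 0.0000062, "mean": 0.0000041}}
report = check(measured, limits)
```

Here the report contains one entry, since the measured maximum exceeds its limit while the mean does not.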

In the specification items the limits are stored like this:

limits:
       sparc/gr712rc:
         DirtyCache:
           max-upper-bound: 0.000005
           mean-upper-bound: 0.000005

So each BSP has a separate block of lines. This avoids trouble with merge conflicts.

As discussed above, using arch/bsp as a key is not enough. We need to include other things, so it should really be:

limits:
       sparc/gr712rc/something-in-addition:
         DirtyCache:
           max-upper-bound: 0.000005
           mean-upper-bound: 0.000005
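
On the host side, the lookup by arch/bsp/key could look like the Python sketch below. The function name and the dictionary mirroring the YAML are assumptions, and "something-in-addition" stands for whatever key the BSP ends up reporting:

```python
def find_limits(spec_limits, arch, bsp, key):
    """Look up the limits block for arch/bsp/key, or None if there is none."""
    return spec_limits.get(f"{arch}/{bsp}/{key}")

# Mirrors the YAML above; the key part is a placeholder.
spec_limits = {
    "sparc/gr712rc/something-in-addition": {
        "DirtyCache": {"max-upper-bound": 0.000005, "mean-upper-bound": 0.000005},
    },
}

known = find_limits(spec_limits, "sparc", "gr712rc", "something-in-addition")
unknown = find_limits(spec_limits, "sparc", "gr712rc", "other-key")
```

A missing entry (None) would mean that no limits were recorded for this configuration, so the tool could report the values without judging them.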

-- 
embedded brains GmbH
Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.huber at embedded-brains.de
Phone: +49-89-18 94 741 - 16
Fax:   +49-89-18 94 741 - 08
PGP: Public key available on request.
