Proposal for hardware configuration dependent performance limits

Thu Nov 19 23:50:58 UTC 2020

On 19/11/20 7:26 pm, Sebastian Huber wrote:
> Hello Chris,
> 
> On 17/11/2020 22:43, Chris Johns wrote:
> 
>>
>> On 17/11/20 6:14 pm, Sebastian Huber wrote:
>>> On 16/11/2020 23:42, Chris Johns wrote:
>>>> On 16/11/20 5:40 pm, Sebastian Huber wrote:
>>>>> On 16/11/2020 00:33, Chris Johns wrote:
>>>>>
>>>>>>>>> In the proposal, limits are specified like this:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> limits:
>>>>>>>>>        sparc/gr712rc:
>>>>>>>>>          DirtyCache:
>>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>>          FullCache:
>>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>>          HotCache:
>>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>>          Load/1:
>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>          Load/2:
>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>          Load/3:
>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>          Load/4:
>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>
>>>>>>>>> This neglects that the limits are subject to a board configuration. One
>>>>>>>>> approach to cover this is the addition of a new BSP provided function:
>>>>>>>>>
>>>>>>>>> const char *rtems_get_hardware_performance_hash(void);
>>>>>>>>>
>>>>>>>>> The BSP feeds all performance related data into a hash function and
>>>>>>>> "data" here means configuration?
>>>>>>> Yes, hardware configuration.
>>>>>> Why not make these values part of the BSP configuration? The defaults for the
>>>>>> BSP can have a set of suitable values. Different boards have different
>>>>>> configurations to match and a separate kernel build.
>>>>>>
>>>>> This doesn't work on BSPs which support configuration via a hardware
>>>>> enumeration, boot loader settings, or device trees. Also, changes in the
>>>>> BSP options have no influence on the BSP name. Not only does the BSP
>>>>> configuration influence performance; the CPU options play a role too,
>>>>> for example RTEMS_SMP. In order to compare performance values over time
>>>>> we have to obtain the values under the same conditions.
>>>> Maybe I am not understanding the context.
>>>>
>>>> A BSP, whichever one, has a set of options that configure it. An example
>>>> is the xilinx_zynq_zc702 and `ZYNQ_RAM_LENGTH = 0x40000000`. If I have 2
>>>> Zynq circuits, one with 256M and one with 1G, I need to build and maintain
>>>> 2 RTEMS builds, and from a purist's point of view I need to maintain 2
>>>> builds of the exact same application.
>>>>
>>>> I asked about the fixed memory and your answer was to use the BSP options, the
>>>> size is fixed in the linker command files via the BSP option. That is what I
>>>> have done.
>>>>
>>>> I would expect there exists a set of values for the xilinx_zynq_zc702 with
>>>> no SMP and with SMP, as this BSP supports SMP. Those values would match all
>>>> the other settings for the BSP such as ZYNQ_CLOCK_CPU_1X,
>>>> BSP_ARM_A9MPCORE_PERIPHCLK etc. If my clock is different (and they are) I
>>>> would need to supply a suitable set of performance values if I wanted to
>>>> pass those tests.
>>>>
>>>> I am not questioning the need for the values or the tests. I am suggesting
>>>> the values form part of the BSP settings so a user can adjust them to suit
>>>> their specific setup in the same way they adjust other BSP settings. I do
>>>> not think we should attempt to hold or manage an endless set of possible
>>>> values and I do not see the need for complex encapsulation methods such as
>>>> base64 hashes. The systems we interact with are too complex and the list
>>>> is endless.
>>> I think it will be highly BSP-specific what parameters are relevant to the
>>> performance limits. This is why I suggested to add a function which can be
>>> implemented by each BSP.
>>>
>>> const char *rtems_get_hardware_performance_something(void);
>>>
>>> It should return a string which changes if a performance relevant parameter
>>> changed. If it is only SMP/no-SMP, ZYNQ_CLOCK_CPU_1X, and
>>> BSP_ARM_A9MPCORE_PERIPHCLK, then fine, just return "SMP/800MHz/400MHz" or
>>> whatever.
>> I suggest you avoid heading down a path of specific strings, i.e. avoid
>> something meaningful a human can read. Also, performance characteristics are
>> part of a wider configuration topic. Maybe considering that would solve the
>> performance specific parts as well.
>>
>> A label for a build of RTEMS is a good idea (see below) that could serve the
>> human readable part. I would consider computing a hash for the config.ini
>> file, i.e. the build, and embedding it. If you wanted to capture the state of
>> the RTEMS source that was built, you could optionally compute a hash for the
>> entire source tree and embed that as well. You can then have calls such as:
>>
>> const char* rtems_config_build_hash(void);
>> const char* rtems_config_source_hash(void);
>>
>>   [ the last one could return "NOT-AVAILABLE" if not enabled ]
>>
>> The key point is defining markers, with defaults if optional, then wrapping
>> your configuration management system around them. Strings with a meaning such
>> as "SMP/800MHz/400MHz" are fragile because cosmetic changes break dependent
>> configuration management systems. A hash implies nothing specific; that task
>> is left to your CM systems.
>>
>> For a BSP specific case of runtime values what about:
>>
>> const char* rtems_config_bsp_hash(void);
>>
>> with a default returning "DEFAULT". A BSP could override a weak function to
>> provide a hash computed in a specific way.
>>
>> When I said a build label I was considering ...
>>
>> [arm/beagleboneblack]
>> RTEMS_BUILD_LABEL = "...---..."
>>
>> with a function 'rtems_config_build_label' to fetch it. The default could be
>> "RTEMS" if not set in config.ini. This would be useful when tracking deployed
>> builds of RTEMS. Consider this as labelling the config.ini file in a human
>> readable way that suits my CM processes.
> Thanks for broadening the perspective. Maybe just focusing on the performance
> limits was a bit too specific. However, if we put things into a hash which only
> weakly influence the performance characteristics, then comparable performance
> test runs will be hard to obtain over time.

A hash provides nothing more than a unique data point. How it is used qualifies
what it means, so whether an influence is weak or strong is relative. The path I
have put forward simply says that if the hash is not what you expect, something
has changed. I like this because it is simple and clear at the origin. Exposing
internal components of a board's configuration so you can determine the reason
adds complexity to RTEMS, and it is not clear to me what the advantages are when
considering whether something is fit for purpose.

Note, there is nothing stopping additional ad hoc interfaces being added to a
specific BSP that can be queried in a BSP-specific manner to report extra
detail. This would be outside the formal RTEMS interfaces and could change. An
example of this is bootloader and boot ROM output.

Also, I am not sure we need a hash sized for security. Something simple, small
and fast may be suitable.
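
As a rough sketch of what "simple, small and fast" could look like, here is a
32-bit FNV-1a hash folded over a list of configuration strings. The choice of
FNV-1a and the names config_hash_update and config_hash_to_string are my
assumptions for illustration only; nothing like this exists in RTEMS under
these names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a offset basis and prime (FNV-1a is an assumed choice). */
#define CONFIG_HASH_INIT UINT32_C(2166136261)
#define CONFIG_HASH_PRIME UINT32_C(16777619)

/* Hypothetical helper: fold one NUL-terminated item into the running hash. */
uint32_t config_hash_update(uint32_t hash, const char *item)
{
  assert(item != NULL);

  for (; *item != '\0'; ++item) {
    hash ^= (uint8_t) *item;
    hash *= CONFIG_HASH_PRIME;
  }

  /*
   * Fold in a zero byte as an item separator so that the items "ab", "c"
   * hash differently from "a", "bc".  XOR with zero is a no-op, so only
   * the multiplication remains.
   */
  hash *= CONFIG_HASH_PRIME;

  return hash;
}

/* Hypothetical helper: render the hash as 8 hex digits plus terminator. */
void config_hash_to_string(uint32_t hash, char out[9])
{
  (void) snprintf(out, 9, "%08lx", (unsigned long) hash);
}
```

A BSP or the build system would start from CONFIG_HASH_INIT, call
config_hash_update once per option value, and embed the resulting string.
Which values go in is exactly the policy question discussed above.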

>> Can environment variables affect a build of RTEMS? If so you either need to
>> include them somehow or have waf ignore them.
> 
> I don't know waf well enough. If some environment variables are set during ./waf
> configure a warning is printed. I don't know if environment variables are used
> during ./waf build.

I am the same. I noted it as a matter of being complete while we discuss this
topic. Would something in the documentation in relation to configuration
management be suitable?

>>> My point is that we need a key reported by the BSP and then some performance
>>> limits which can be found by arch/bsp/key to check if there are performance
>>> regressions.
>> I am missing the place where the performance limits are held. Do the tests
>> report timing values and the checks against the limits happen on a host?
> 
> Yes, this is what I proposed.

Thanks and sorry for not picking up on this before now. It makes sense to do it
this way.
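
To check my understanding of the host-side comparison, a minimal sketch
follows. The type and function names (perf_limits, perf_sample, perf_check)
are made up for illustration, and I am assuming the values are in seconds, as
the limits in the proposal suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Limits as stored in the specification (assumed to be in seconds). */
typedef struct {
  double max_upper_bound;
  double mean_upper_bound;
} perf_limits;

/* Values reported by one test run for one environment, e.g. HotCache. */
typedef struct {
  double max;
  double mean;
} perf_sample;

/* Result bits; zero means all limits were met. */
enum {
  PERF_OK = 0,
  PERF_MAX_EXCEEDED = 1,
  PERF_MEAN_EXCEEDED = 2
};

/* Hypothetical host-side check of one sample against its limits. */
unsigned perf_check(const perf_limits *limits, const perf_sample *sample)
{
  unsigned result = PERF_OK;

  assert(limits != NULL && sample != NULL);

  if (sample->max > limits->max_upper_bound) {
    result |= PERF_MAX_EXCEEDED;
  }

  if (sample->mean > limits->mean_upper_bound) {
    result |= PERF_MEAN_EXCEEDED;
  }

  return result;
}
```

The target only reports raw timing values; everything about limits stays on
the host, which keeps the tests free of per-board tables.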

> An alternative would be to generate tables with
> performance limits and excessive C preprocessor conditionals and let the tests
> check the limits. Another option is to let the build system generate the tables.
> This would require that the performance limits are a part of the build
> specification.
> 
> The proposed workflow would be something like this:
> 
> 1. You select a board to use for long term performance tests.
> 
> 2. You define a set of configurations you want to test.
> 
> 3. You do an initial run of the test suite for each configuration. The RTEMS
> Tester provides you with a machine readable output (test data) of the test run
> with the raw test output per test executable and some meta information (TODO).
> 
> 4. A tool reads the test data and the RTEMS specification and updates the
> specification with the performance limits obtained from the test run (maybe with
> some simple transformation, for example increase maximum by 10% and round to
> microseconds).
> 
> 5. You review the performance limits and then commit them.
> 
> 6. Later you run the tests with a new RTEMS commit, get the performance values,
> compare them against the limits stored in the specification, and generate a report.
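
The transformation in step 4 could be as small as the sketch below. I am
assuming the test data carries measured values as integer nanoseconds and
that "round to microseconds" means rounding up, since the result is an upper
bound; both are my assumptions, as is the name derive_upper_bound_us.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: take a measured maximum in nanoseconds, increase it
 * by 10%, and round up to whole microseconds.
 *
 * measured_ns * 1.1 / 1000 equals measured_ns * 11 / 10000; adding 9999
 * before the integer division rounds the result up.
 */
uint64_t derive_upper_bound_us(uint64_t measured_ns)
{
  /* Guard against overflow of the intermediate product. */
  assert(measured_ns <= (UINT64_MAX - 9999) / 11);

  return (measured_ns * 11 + 9999) / 10000;
}
```

For example, a measured maximum of 4000 ns gives a 5 microsecond limit, which
would be written to the specification as 0.000005.
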
> 
> In the specification items the limits are stored like this:
> 
> limits:
>       sparc/gr712rc:
>         DirtyCache:
>           max-upper-bound: 0.000005
>           mean-upper-bound: 0.000005
> 
> So each BSP has a separate block of lines. This avoids trouble with merge
> conflicts.
> 
> As discussed above, using arch/bsp as a key is not enough. We need to include
> other things, so it should be really:
> 
> limits:
>       sparc/gr712rc/something-in-addition:
          configs:
            - 1727638abd7188282ef
            - 19292efab87ade8928e
            - etc
>         DirtyCache:
>           max-upper-bound: 0.000005
>           mean-upper-bound: 0.000005
> 

Nice. I think a hash still works. I would use it to raise an "alert" if it does
not match any listed value. I say "alert" rather than "error" or "warning"
because the severity depends on the context. A qualified system may want this to
be an error, while for me a warning is OK if the timing figures are being
achieved.
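
A minimal sketch of that check, with the severity left to the caller. The
names (alert_policy, check_result, check_config_hash, listed_configs) are
hypothetical and the hash values are just the placeholders from the configs
list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* How a mismatch is treated is a policy of the user, not of RTEMS. */
typedef enum { ALERT_AS_WARNING, ALERT_AS_ERROR } alert_policy;

typedef enum { CHECK_PASS, CHECK_WARNING, CHECK_ERROR } check_result;

/* Placeholder hashes taken from the example configs list. */
const char *const listed_configs[] = {
  "1727638abd7188282ef",
  "19292efab87ade8928e"
};

#define LISTED_CONFIG_COUNT \
  (sizeof(listed_configs) / sizeof(listed_configs[0]))

/*
 * Hypothetical check: pass if the hash reported by the target matches any
 * listed value, otherwise raise an alert with the configured severity.
 */
check_result check_config_hash(
  const char *reported,
  const char *const *configs,
  size_t count,
  alert_policy policy
)
{
  size_t i;

  assert(reported != NULL && (configs != NULL || count == 0));

  for (i = 0; i < count; ++i) {
    if (strcmp(reported, configs[i]) == 0) {
      return CHECK_PASS;
    }
  }

  return policy == ALERT_AS_ERROR ? CHECK_ERROR : CHECK_WARNING;
}
```

This deliberately says nothing about why the hash differs, only that it does.
Whether a warning is acceptable when the timing figures are still met would be
decided by whatever wraps this check.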

Chris