Proposal for hardware configuration dependent performance limits

Tue Nov 17 21:43:09 UTC 2020

On 17/11/20 6:14 pm, Sebastian Huber wrote:
> 
> On 16/11/2020 23:42, Chris Johns wrote:
>> On 16/11/20 5:40 pm, Sebastian Huber wrote:
>>> On 16/11/2020 00:33, Chris Johns wrote:
>>>
>>>>>>> In the proposal, limits are specified like this:
>>>>>>>
>>>>>>>
>>>>>>> limits:
>>>>>>>       sparc/gr712rc:
>>>>>>>         DirtyCache:
>>>>>>>           max-upper-bound: 0.000005
>>>>>>>           mean-upper-bound: 0.000005
>>>>>>>         FullCache:
>>>>>>>           max-upper-bound: 0.000005
>>>>>>>           mean-upper-bound: 0.000005
>>>>>>>         HotCache:
>>>>>>>           max-upper-bound: 0.000005
>>>>>>>           mean-upper-bound: 0.000005
>>>>>>>         Load/1:
>>>>>>>           max-upper-bound: 0.00001
>>>>>>>           mean-upper-bound: 0.00001
>>>>>>>         Load/2:
>>>>>>>           max-upper-bound: 0.00001
>>>>>>>           mean-upper-bound: 0.00001
>>>>>>>         Load/3:
>>>>>>>           max-upper-bound: 0.00001
>>>>>>>           mean-upper-bound: 0.00001
>>>>>>>         Load/4:
>>>>>>>           max-upper-bound: 0.00001
>>>>>>>           mean-upper-bound: 0.00001
>>>>>>>
>>>>>>> This neglects that the limits are subject to a board configuration. One
>>>>>>> approach to cover this is the addition of a new BSP provided function:
>>>>>>>
>>>>>>> const char *rtems_get_hardware_performance_hash();
>>>>>>>
>>>>>>> The BSP feeds all performance related data into a hash function and
>>>>>> "data" here means configuration?
>>>>> Yes, hardware configuration.
>>>> Why not make these values part of the BSP configuration? The defaults for the
>>>> BSP can have a set of suitable values. Different boards have different
>>>> configurations to match and a separate kernel build.
>>>>
>>> This doesn't work on BSPs which support configuration via a hardware
>>> enumeration, boot loader settings, or device trees. Also changes in the BSP
>>> options have no influence on the BSP name. It is not only the BSP configuration
>>> that influences performance; the CPU options play a role too, for example RTEMS_SMP. In order to
>>> compare performance values over time we have to obtain the values under the same
>>> conditions.
>> Maybe I am not understanding the context.
>>
>> A BSP, whichever one, has a set of options that configure it. An example is the
>> xilinx_zynq_zc702 and the `ZYNQ_RAM_LENGTH = 0x40000000`. If I have 2 Zynq
>> circuits, one with 256M and one with 1G, I need to build and maintain 2 RTEMS
>> builds and from a purist's point of view I need to maintain 2 builds of the exact
>> same application.
>>
>> I asked about the fixed memory and your answer was to use the BSP options, the
>> size is fixed in the linker command files via the BSP option. That is what I
>> have done.
>>
>> I would expect there exists a set of values for the xilinx_zynq_zc702 with no
>> SMP and with SMP as this BSP supports SMP. Those values would match all the
>> other settings for the BSP such as ZYNQ_CLOCK_CPU_1X, BSP_ARM_A9MPCORE_PERIPHCLK
>> etc. If my clock is different (and they are) I would need to supply a suitable
>> set of performance values if I wanted to pass those tests.
>>
>> I am not questioning the need for the values or the tests. I am suggesting the
>> values form part of the BSP settings so a user can adjust them to suit their
>> specific set up in the same way they adjust other BSP settings. I do not think
>> we should attempt to hold or manage endless sets of possible values and I do
>> not see the need for complex encapsulation methods such as base64 hashes. The
>> systems we interact with are too complex and the list is endless.
> 
> I think it will be highly BSP-specific what parameters are relevant to the
> performance limits. This is why I suggested to add a function which can be
> implemented by each BSP.
> 
> const char *rtems_get_hardware_performance_something();
> 
> It should return a string which changes if a performance relevant parameter
> changed. If it is only SMP/no-SMP, ZYNQ_CLOCK_CPU_1X, and
> BSP_ARM_A9MPCORE_PERIPHCLK, then fine, just return "SMP/800MHz/400MHz" or whatever.

I suggest you avoid heading down a path of specific strings, i.e. avoid something
meaningful a human can read. Also, performance characteristics are part of a
wider configuration topic. Maybe considering that would solve the
performance-specific parts as well.

A label for a build of RTEMS is a good idea (see below) and could serve as the
human readable part. I would consider computing a hash for the config.ini file,
i.e. the build, and embedding it. If you wanted to capture the state of the RTEMS
source that was built, optionally compute a hash for the entire source tree and
embed that as well. You can then have calls such as:

const char* rtems_config_build_hash(void);
const char* rtems_config_source_hash(void);

 [ the last one could return "NOT-AVAILABLE" if not enabled ]

The key point is defining markers, with defaults if optional, then wrapping your
configuration management system round them. Strings with a meaning such as
"SMP/800MHz/400MHz" are fragile because cosmetic changes break dependent
configuration management systems. A hash implies nothing specific; assigning a
meaning to it is left to your CM systems.
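
As a rough illustration of turning the config.ini text into an opaque marker,
here is a minimal sketch. It assumes 64-bit FNV-1a purely for brevity (nothing
in this thread picks a hash function), and fnv1a_64() and config_hash_hex() are
made-up helper names, not existing RTEMS or waf functions. In practice the hash
would more likely be computed by the build system on the host and embedded as a
string constant.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a NUL-terminated text (assumed hash, chosen for brevity). */
uint64_t fnv1a_64(const char *text)
{
  uint64_t hash = UINT64_C(0xcbf29ce484222325); /* FNV-1a 64-bit offset basis */

  for (const unsigned char *p = (const unsigned char *) text; *p != '\0'; ++p) {
    hash ^= *p;                      /* mix in the next byte */
    hash *= UINT64_C(0x100000001b3); /* multiply by the 64-bit FNV prime */
  }

  return hash;
}

/* Hypothetical helper: store the hash as 16 hex digits plus NUL in out. */
void config_hash_hex(const char *config_text, char out[17])
{
  snprintf(out, 17, "%016" PRIx64, fnv1a_64(config_text));
}
```

A string produced this way says nothing about SMP or clock rates by itself; it
only changes when the hashed text changes, which is the property the CM system
needs.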

For the BSP-specific case of runtime values, what about:

const char* rtems_config_bsp_hash(void);

with a default returning "DEFAULT". A BSP could override a weak function to
provide a hash computed in a specific way.
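
A minimal sketch of that default, assuming a GCC or Clang toolchain where
__attribute__((weak)) is available. The function name is the one proposed
above; it is not an existing RTEMS API.

```c
/*
 * Weak default for the proposed function. A BSP that wants its own value
 * defines a normal (strong) function with the same name in one of its
 * source files, and the linker then uses that definition instead.
 */
__attribute__((weak)) const char *rtems_config_bsp_hash(void)
{
  return "DEFAULT";
}
```

With no BSP override linked in, a call to rtems_config_bsp_hash() returns
"DEFAULT".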

When I said a build label I was considering ...

[arm/beagleboneblack]
RTEMS_BUILD_LABEL = "...---..."

with a function 'rtems_config_build_label' to fetch it. The default could be
"RTEMS" if not set in config.ini. This would be useful when tracking deployed
builds of RTEMS. Consider this as labelling the config.ini file in a human
readable way that suits my CM processes.
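
A minimal sketch of the label with its default, assuming the build system
passes the config.ini value to the compiler as a RTEMS_BUILD_LABEL macro. That
mechanism is my assumption for illustration; neither the option nor the
function exists today.

```c
/* Assumed: the build system defines RTEMS_BUILD_LABEL when config.ini sets it. */
#ifndef RTEMS_BUILD_LABEL
#define RTEMS_BUILD_LABEL "RTEMS" /* default when config.ini does not set a label */
#endif

/* Proposed accessor for the human readable build label. */
const char *rtems_config_build_label(void)
{
  return RTEMS_BUILD_LABEL;
}
```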

Can environment variables affect a build of RTEMS? If so, you either need to
include them somehow or have waf ignore them.

> My point is that we need a key reported by the BSP and then some performance
> limits which can be found by arch/bsp/key to check if there are performance
> regressions.

I am missing the place where the performance limits are held. Do the tests
report timing values and the checks against the limits happen on a host?
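
To make the question concrete, this is how I picture an arch/bsp/key lookup if
the check were to run on a host. It is only a sketch under that assumption: the
perf_limit type, the table, find_limit() and within_limits() are all
hypothetical, the key "DEFAULT" is just the default suggested earlier, and the
numbers are copied from the limits example quoted at the top of this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: one set of limits per "arch/bsp/key" and environment. */
typedef struct {
  const char *key;         /* for example "sparc/gr712rc/DEFAULT" */
  const char *environment; /* for example "HotCache" or "Load/1" */
  double max_upper_bound;  /* same unit as the reported measurements */
  double mean_upper_bound;
} perf_limit;

/* Values taken from the quoted limits example; the key part is assumed. */
static const perf_limit limits[] = {
  { "sparc/gr712rc/DEFAULT", "HotCache", 0.000005, 0.000005 },
  { "sparc/gr712rc/DEFAULT", "Load/1", 0.00001, 0.00001 }
};

/* Return the matching limits, or NULL if nothing is recorded for this key. */
const perf_limit *find_limit(const char *key, const char *environment)
{
  for (size_t i = 0; i < sizeof(limits) / sizeof(limits[0]); ++i) {
    if (strcmp(limits[i].key, key) == 0 &&
        strcmp(limits[i].environment, environment) == 0) {
      return &limits[i];
    }
  }

  return NULL;
}

/* A measurement passes if both reported values stay within their bounds. */
bool within_limits(const perf_limit *limit, double max_value, double mean_value)
{
  return max_value <= limit->max_upper_bound &&
         mean_value <= limit->mean_upper_bound;
}
```

A missing entry (NULL) would then mean "no limits known for this hardware
configuration" rather than a failed test.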

Chris