Proposal for hardware configuration dependent performance limits

Chris Johns chrisj at rtems.org
Sun Nov 15 23:33:03 UTC 2020


On 14/11/20 11:20 pm, Sebastian Huber wrote:
> On 13/11/2020 20:01, Gedare Bloom wrote:
> 
>> On Fri, Nov 13, 2020 at 3:48 AM Sebastian Huber
>> <sebastian.huber at embedded-brains.de>  wrote:
>>> Hello,
>>>
>>> there is one aspect with respect to performance limits which is
>>> currently not considered in this proposal:
>>>
>>> https://lists.rtems.org/pipermail/devel/2020-November/063213.html
>>>
>>> You can run some BSPs such as sparc/gr712rc on several boards.
>>> However, each board may have different settings which affect the system
>>> performance, for example the CPU frequency, memory controller settings,
>>> memory chips, etc.

Yes, it is common to see this across a number of boards and systems. Another BSP
that has this is the Zynq, where the same BSP could run on a number of boards.
However, RTEMS does not support runtime configuration, so we need separate
builds for settings such as memory size.

>>> In the proposal, limits are specified like this:
>>>
>>>
>>> limits:
>>>     sparc/gr712rc:
>>>       DirtyCache:
>>>         max-upper-bound: 0.000005
>>>         mean-upper-bound: 0.000005
>>>       FullCache:
>>>         max-upper-bound: 0.000005
>>>         mean-upper-bound: 0.000005
>>>       HotCache:
>>>         max-upper-bound: 0.000005
>>>         mean-upper-bound: 0.000005
>>>       Load/1:
>>>         max-upper-bound: 0.00001
>>>         mean-upper-bound: 0.00001
>>>       Load/2:
>>>         max-upper-bound: 0.00001
>>>         mean-upper-bound: 0.00001
>>>       Load/3:
>>>         max-upper-bound: 0.00001
>>>         mean-upper-bound: 0.00001
>>>       Load/4:
>>>         max-upper-bound: 0.00001
>>>         mean-upper-bound: 0.00001
>>>
>>> This neglects that the limits are subject to a board configuration. One
>>> approach to cover this is the addition of a new BSP provided function:
>>>
>>> const char *rtems_get_hardware_performance_hash();
>>>
>>> The BSP feeds all performance related data into a hash function and
>> "data" here means configuration?
> Yes, hardware configuration.
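
As a rough sketch of what feeding the hardware configuration into a hash could
look like: the struct perf_config type, its fields and the function names below
are made up for illustration and are not RTEMS APIs, and FNV-1a with hex output
stands in for the MD5 digest in base64 encoding mentioned in the proposal.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical set of performance-relevant settings; not an RTEMS type. */
struct perf_config {
  uint32_t cpu_hz;
  uint32_t ram_bytes;
  uint32_t mem_wait_states;
};

/* FNV-1a, 64-bit: an assumed stand-in for the MD5 digest in the proposal. */
static uint64_t fnv1a_u32(uint64_t h, uint32_t v)
{
  for (int i = 0; i < 4; ++i) {
    h ^= (uint8_t)(v >> (8 * i)); /* feed bytes in a fixed order */
    h *= UINT64_C(0x100000001b3); /* 64-bit FNV prime */
  }
  return h;
}

/* Writes 16 hex digits plus a terminating NUL into out (17 bytes). */
static void perf_config_hash(const struct perf_config *c, char out[17])
{
  uint64_t h = UINT64_C(0xcbf29ce484222325); /* FNV offset basis */

  /* Hash field by field so struct padding never leaks into the result. */
  h = fnv1a_u32(h, c->cpu_hz);
  h = fnv1a_u32(h, c->ram_bytes);
  h = fnv1a_u32(h, c->mem_wait_states);
  snprintf(out, 17, "%016llx", (unsigned long long)h);
}
```

With something like this, two configurations that differ only in RAM size
(say 2 MiB and 3 MiB) produce unrelated strings, so the string alone tells you
nothing about which known configuration is closest.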

Why not make these values part of the BSP configuration? The defaults for the
BSP can have a set of suitable values. Different boards then have different
configurations to match, each with a separate kernel build.
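
A minimal sketch of what I mean, assuming the limits become build-time options
with BSP defaults. The option names and the struct are hypothetical, not
existing BSP options, and the default values are just the ones from the example
above converted to nanoseconds.

```c
/* Hypothetical BSP options; a board-specific build would override these. */
#ifndef BSP_PERF_DIRTY_CACHE_MAX_NS
#define BSP_PERF_DIRTY_CACHE_MAX_NS 5000u /* 0.000005 s in the example */
#endif

#ifndef BSP_PERF_LOAD_MAX_NS
#define BSP_PERF_LOAD_MAX_NS 10000u /* 0.00001 s in the example */
#endif

/* Hypothetical container for the limits selected by this build. */
struct bsp_perf_limits {
  unsigned dirty_cache_max_ns;
  unsigned load_max_ns;
};

/* The limits are fixed at build time, one kernel build per board. */
static const struct bsp_perf_limits bsp_perf_limits = {
  BSP_PERF_DIRTY_CACHE_MAX_NS,
  BSP_PERF_LOAD_MAX_NS
};
```

A build for a different board would set the options to its own values, and the
names stay readable in the test output.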

>>> returns a string encoding (for example a MD5 digest in base64 encoding).
>>> The example from above with the performance hash:
>>>
>>> limits:
>>>     sparc/gr712rc/XrY7u+Ae7tCTyyK7j1rNww==:
>>>       DirtyCache:
>>>         max-upper-bound: 0.000005
>>>         mean-upper-bound: 0.000005
>>>
>>> The test suite could report the performance hash and a test output analyser
>>> could check that the reported values are in the specified bounds.
>>>
>> This will work for fixed sets of configurations. I wonder if it is
>> reasonable instead to define ranges over configuration values? Or to
>> use board variant mnemonic names instead?  How many possible
>> configurations are we talking about?
>>
>> The hash output is fairly opaque, and really small configuration
>> changes would result in totally different hashes, so if I change a
>> board from 2 MiB RAM to 3 MiB RAM, I might like to know which
>> configuration this is closest to in case there is no matching hash.
>>
>> To invert the hash we will need to keep the table of all the
>> configurations mapped to hash values, which is fine I suppose.
>>
>> I can see that the advantage of a hash is that we don't need to create
>> a namespace for configuration options that may influence performance.
>> This can give some flexibility for boards that have different sets of
>> configuration options that matter.
> Maybe we should name it rtems_get_performance_magic_value(). On some BSPs it
> might be possible to return something with less entropy, but I think this stuff
> can get very quickly very complicated if you want to use explicit hardware
> configuration parameters to approximate expected limits for a new hardware
> configuration. The hash is a crude simplification, but it works well for known
> hardware configurations. Just using arch/bsp is definitely not enough to record
> performance limits for later regression testing.

Magic works by hiding how something works.

Chris

