Proposal for hardware configuration dependent performance limits

Sun Nov 22 21:45:33 UTC 2020

On 21/11/20 3:43 am, Gedare Bloom wrote:
> On Thu, Nov 19, 2020 at 4:51 PM Chris Johns <chrisj at rtems.org> wrote:
>>
>> On 19/11/20 7:26 pm, Sebastian Huber wrote:
>>> Hello Chris,
>>>
>>> On 17/11/2020 22:43, Chris Johns wrote:
>>>
>>>>
>>>> On 17/11/20 6:14 pm, Sebastian Huber wrote:
>>>>> On 16/11/2020 23:42, Chris Johns wrote:
>>>>>> On 16/11/20 5:40 pm, Sebastian Huber wrote:
>>>>>>> On 16/11/2020 00:33, Chris Johns wrote:
>>>>>>>
>>>>>>>>>>> In the proposal, limits are specified like this:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> limits:
>>>>>>>>>>>        sparc/gr712rc:
>>>>>>>>>>>          DirtyCache:
>>>>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>>>>          FullCache:
>>>>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>>>>          HotCache:
>>>>>>>>>>>            max-upper-bound: 0.000005
>>>>>>>>>>>            mean-upper-bound: 0.000005
>>>>>>>>>>>          Load/1:
>>>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>>>          Load/2:
>>>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>>>          Load/3:
>>>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>>>          Load/4:
>>>>>>>>>>>            max-upper-bound: 0.00001
>>>>>>>>>>>            mean-upper-bound: 0.00001
>>>>>>>>>>>
>>>>>>>>>>> This neglects that the limits are subject to a board configuration. One
>>>>>>>>>>> approach to cover this is the addition of a new BSP provided function:
>>>>>>>>>>>
>>>>>>>>>>> const char *rtems_get_hardware_performance_hash();
>>>>>>>>>>>
>>>>>>>>>>> The BSP feeds all performance related data into a hash function and
>>>>>>>>>> "data" here means configuration?
>>>>>>>>> Yes, hardware configuration.
>>>>>>>> Why not make these values part of the BSP configuration? The defaults for the
>>>>>>>> BSP can have a set of suitable values. Different boards have different
>>>>>>>> configurations to match and a separate kernel build.
>>>>>>>>
>>>>>>> This doesn't work on BSPs which support configuration via a hardware
>>>>>>> enumeration, boot loader settings, or device trees. Also changes in the BSP
>>>>>>> options have no influence on the BSP name. Not only does the BSP
>>>>>>> configuration influence performance, the CPU options play a role too, for
>>>>>>> example RTEMS_SMP. In order to compare performance values over time we have
>>>>>>> to obtain the values under the same conditions.
>>>>>> Maybe I am not understanding the context.
>>>>>>
>>>>>> A BSP, whichever one, has a set of options that configure it. An example is
>>>>>> the xilinx_zynq_zc702 and the `ZYNQ_RAM_LENGTH = 0x40000000`. If I have 2
>>>>>> Zynq circuits, one with 256M and one with 1G, I need to build and maintain 2
>>>>>> RTEMS builds and from a purist's point of view I need to maintain 2 builds of
>>>>>> the exact same application.
>>>>>>
>>>>>> I asked about the fixed memory and your answer was to use the BSP options, the
>>>>>> size is fixed in the linker command files via the BSP option. That is what I
>>>>>> have done.
>>>>>>
>>>>>> I would expect there exists a set of values for the xilinx_zynq_zc702 with no
>>>>>> SMP and with SMP as this BSP supports SMP. Those values would match all the
>>>>>> other settings for the BSP such as ZYNQ_CLOCK_CPU_1X,
>>>>>> BSP_ARM_A9MPCORE_PERIPHCLK etc. If my clock is different (and they are) I
>>>>>> would need to supply a suitable set of performance values if I wanted to
>>>>>> pass those tests.
>>>>>>
>>>>>> I am not questioning the need for the values or the tests. I am suggesting the
>>>>>> values form part of the BSP settings so a user can adjust them to suit their
>>>>>> specific set up in the same way they adjust other BSP settings. I do not think
>>>>>> we should attempt to hold or manage an endless set of possible values and I do
>>>>>> not see the need for complex encapsulation methods such as base64 hashes. The
>>>>>> systems we interact with are too complex and the list is endless.
>>>>> I think it will be highly BSP-specific what parameters are relevant to the
>>>>> performance limits. This is why I suggested to add a function which can be
>>>>> implemented by each BSP.
>>>>>
>>>>> const char *rtems_get_hardware_performance_something();
>>>>>
>>>>> It should return a string which changes if a performance relevant parameter
>>>>> changed. If it is only SMP/no-SMP, ZYNQ_CLOCK_CPU_1X, and
>>>>> BSP_ARM_A9MPCORE_PERIPHCLK, then fine, just return "SMP/800MHz/400MHz" or
>>>>> whatever.
>>>> I suggest you avoid heading down a path of specific strings, ie avoid something
>>>> meaningful a human can read. Also performance characteristics are a part of a
>>>> wider configuration topic. Maybe considering that would solve the performance
>>>> specific parts as well.
>>>>
>>>> A label for a build of RTEMS is a good idea (see below) that could serve the
>>>> human readable part. I would consider computing a hash for the config.ini file,
>>>> i.e. the build, and embedding it. If you wanted to capture the state of the RTEMS
>>>> source built, optionally compute a hash for the entire source tree and embed that
>>>> as well. You can then have calls such as:
>>>>
>>>> const char* rtems_config_build_hash(void);
>>>> const char* rtems_config_source_hash(void);
>>>>
>>>>   [ the last one could return "NOT-AVAILABLE" if not enabled ]
>>>>
>>>> The key point is defining markers, with defaults if optional, then wrapping your
>>>> configuration management system round them. Strings with a meaning such as
>>>> "SMP/800MHz/400MHz" are fragile because cosmetic changes break dependent
>>>> configuration management systems. A hash implies nothing specific, that task is
>>>> left to your CM systems.
>>>>
>>>> For a BSP specific case of runtime values what about:
>>>>
>>>> const char* rtems_config_bsp_hash(void);
>>>>
>>>> with a default returning "DEFAULT". A BSP could override a weak function to
>>>> provide a hash computed in a specific way.
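
A minimal sketch of the weak default described above. It assumes a GCC or
Clang toolchain (the weak attribute is a compiler extension), and
rtems_config_bsp_hash() is only the name proposed in this thread, not an
existing RTEMS interface. The helper config_bsp_hash_is_default() is
hypothetical and exists only to show how a caller might use the result.

```c
#include <assert.h>
#include <string.h>

/*
 * Default for the proposed (not existing) rtems_config_bsp_hash().
 * A BSP that defines a non-weak function with the same name replaces
 * this definition at link time.
 */
__attribute__((weak)) const char *rtems_config_bsp_hash(void)
{
  return "DEFAULT";
}

/* Hypothetical helper: non-zero if the BSP did not provide its own hash. */
int config_bsp_hash_is_default(void)
{
  const char *hash = rtems_config_bsp_hash();

  assert(hash != NULL);
  return strcmp(hash, "DEFAULT") == 0;
}
```

With no BSP override linked in, config_bsp_hash_is_default() reports that
the default is in use.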
>>>>
>>>> When I said a build label I was considering ...
>>>>
>>>> [arm/beagleboneblack]
>>>> RTEMS_BUILD_LABEL = "...---..."
>>>>
>>>> with a function 'rtems_config_build_label' to fetch it. The default could be
>>>> "RTEMS" if not set in config.ini. This would be useful when tracking deployed
>>>> builds of RTEMS. Consider this as labelling the config.ini file in a human
>>>> readable way that suits my CM processes.
>>> thanks for broadening the perspective. Maybe just focusing on the performance
>>> limits was a bit too specific. However, if we put things into a hash which only
>>> weakly influence the performance characteristics, then comparable performance
>>> test runs will be hard over time.
>>
>> A hash provides nothing more than a unique data point. How it is used qualifies
>> what it means and so weak or hard is relative. The path I have put forward
>> simply says if the hash is not what you expect something has changed. I like
>> this because it is simple and clear at the origin. Exposing internal components
>> of a board's configuration so you can determine the reason adds complexity to
>> RTEMS and it is not clear to me what the advantages are when considering
>> something is fit for purpose.
>>
>> Note, there is nothing stopping additional ad hoc interfaces being added to a
>> specific BSP that can be queried in a BSP specific manner to report extra
>> detail. This would be outside the formal RTEMS interfaces and could change. An
>> example of this is bootloader and boot rom output.
>>
>> Also I am not sure we need a secure sized hash. Something simple, small and fast
>> may be suitable.
>>
> This is true. The collision resistance of this hash is not too
> important, as long as small changes in configuration are not likely to
> have a hash collision. If two completely different configurations have
> the same hash, this is not likely a problem for a user, but the
> tooling does need to be robust to the possibility.

I agree. I am not sure what could work here.
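
As one possible candidate for "simple, small and fast", here is a sketch
of the 32-bit FNV-1a hash. It is not cryptographic, and nothing in this
thread settles on it. The function names are hypothetical, and what bytes
a BSP would feed in is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit FNV-1a over a byte buffer: small, fast, not cryptographic. */
uint32_t fnv1a32(const void *data, size_t len)
{
  const unsigned char *p = data;
  uint32_t hash = UINT32_C(2166136261);  /* FNV offset basis */

  assert(data != NULL || len == 0);
  for (size_t i = 0; i < len; ++i) {
    hash ^= p[i];                        /* mix in one byte */
    hash *= UINT32_C(16777619);          /* FNV prime, wraps modulo 2^32 */
  }
  return hash;
}

/* Hypothetical wrapper for a NUL-terminated configuration string. */
uint32_t fnv1a32_str(const char *s)
{
  assert(s != NULL);
  return fnv1a32(s, strlen(s));
}
```

A 32-bit value can collide, so any tooling built on it would still have
to cope with two configurations sharing a hash.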

>>>> Can environment variables affect a build of RTEMS? If so you either need to
>>>> include them somehow or have waf ignore them.
>>>
>>> I don't know waf well enough. If some environment variables are set during ./waf
>>> configure a warning is printed. I don't know if environment variables are used
>>> during ./waf build.
>>
>> I am the same. I noted it as a matter of being complete while we discuss this
>> topic. Would something in the documentation in relation to configuration
>> management be suitable?
>>
>>>>> My point is that we need a key reported by the BSP and then some performance
>>>>> limits which can be found by arch/bsp/key to check if there are performance
>>>>> regressions.
>>>> I am missing the place where the performance limits are held. Do the tests
>>>> report timing values and the checks against the limits happen on a host?
>>>
>>> Yes, this is what I proposed.
>>
>> Thanks and sorry for not picking up on this before now. It makes sense to do it
>> this way.
>>
> I chimed in on the idea of not using a hash, because of the opaqueness
> of the specification and difficulty to derive what should be
> reasonable performance based on small configuration changes from a
> standard set. In that case, we do punt some responsibility to the end
> user to start from a configuration with a known hash and performance
> bounds before defining their own. Otherwise, the best they can do is
> like what we do: run it, record the measurements, and use those as the
> bounds moving forward.
> 
> When a user sends us a report saying my configuration
> a/lwjeVQ:H#TIHFOAH doesn't match the performance of
> z./hleg.khEHWIWEHFWHFE then we can have this conversation again. :)

If the user is basing their figures on a set of results we publish would
providing a description in the YAML be sufficient? This moves the burden of
maintenance from being internal to RTEMS to outside. And I am fine if there are
mandatory informational fields.

Chris

>>> An alternative would be to generate tables with
>>> performance limits and excessive C preprocessor conditionals and let the tests
>>> check the limits. Another option is to let the build system generate the tables.
>>> This would require that the performance limits are a part of the build
>>> specification.
>>>
>>> The proposed work flow would be something like this:
>>>
>>> 1. You select a board to use for long term performance tests.
>>>
>>> 2. You define a set of configurations you want to test.
>>>
>>> 3. You do an initial run of the test suite for each configuration. The RTEMS
>>> Tester provides you with a machine readable output (test data) of the test run
>>> with the raw test output per test executable and some meta information (TODO).
>>>
>>> 4. A tool reads the test data and the RTEMS specification and updates the
>>> specification with the performance limits obtained from the test run (maybe with
>>> some simple transformation, for example increase maximum by 10% and round to
>>> microseconds).
>>>
>>> 5. You review the performance limits and then commit them.
>>>
>>> 6. Later you run the tests with a new RTEMS commit, get the performance values,
>>> compare them against the limits stored in the specification, and generate a report.
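
The transformation in step 4 and the comparison in step 6 can be sketched
as follows. The function names are hypothetical. Rounding up to the next
microsecond, rather than to the nearest, is an assumption of this sketch;
the proposal only says "round to microseconds".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Derive an upper bound in whole microseconds from a measured maximum in
 * seconds: add 10% headroom, then round up to the next microsecond.
 */
uint64_t derive_upper_bound_us(double measured_max_s)
{
  assert(measured_max_s >= 0.0);

  double scaled_us = measured_max_s * 1.1 * 1e6;
  uint64_t bound_us = (uint64_t)scaled_us;  /* truncates toward zero */

  if ((double)bound_us < scaled_us) {
    ++bound_us;                             /* round up any fraction */
  }
  return bound_us;
}

/* A later measurement passes if it does not exceed the stored bound. */
int within_bound(double measured_s, uint64_t bound_us)
{
  return measured_s * 1e6 <= (double)bound_us;
}
```

For example, a measured maximum of 4.2 microseconds becomes 4.62 with the
headroom and is stored as a bound of 5 microseconds.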
>>>
>>> In the specification items the limits are stored like this:
>>>
>>> limits:
>>>       sparc/gr712rc:
>>>         DirtyCache:
>>>           max-upper-bound: 0.000005
>>>           mean-upper-bound: 0.000005
>>>
>>> So each BSP has a separate block of lines. This avoids trouble with merge
>>> conflicts.
>>>
>>> As discussed above, using arch/bsp as a key is not enough. We need to include
>>> other things, so it should be really:
>>>
>>> limits:
>>>       sparc/gr712rc/something-in-addition:
>>           configs:
>>             - 1727638abd7188282ef
>>             - 19292efab87ade8928e
>>             - etc
>>>         DirtyCache:
>>>           max-upper-bound: 0.000005
>>>           mean-upper-bound: 0.000005
>>>
>>
>> Nice. I think a hash still works. I would use it to raise an "alert" if it does
>> not match any listed value. By an "alert" I am attempting to avoid error or
>> warning because this depends on the context. A qualified system may want this to
>> be an error while a warning for me is OK if the timing figures are being achieved.
>>
> +1
> 
>> Chris
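
A sketch of the "alert" idea: the hash reported by the target is compared
against the values listed under configs, and a caller-chosen policy
decides whether a mismatch is a warning or an error. All type and function
names here are hypothetical. In practice this check would more likely live
in a host-side tool than in C on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { CHECK_OK, CHECK_WARNING, CHECK_ERROR } check_result;

/* Hypothetical policy: how an unknown configuration hash is treated. */
typedef enum { POLICY_WARN, POLICY_FAIL } alert_policy;

check_result check_config_hash(const char *reported,
                               const char *const known[], size_t count,
                               alert_policy policy)
{
  assert(reported != NULL);
  for (size_t i = 0; i < count; ++i) {
    if (strcmp(reported, known[i]) == 0) {
      return CHECK_OK;  /* hash is one of the listed configs */
    }
  }
  /* No match: the context decides how serious the alert is. */
  return policy == POLICY_FAIL ? CHECK_ERROR : CHECK_WARNING;
}
```

A qualified system would pass POLICY_FAIL, while a developer who only
cares that the timing figures are met could pass POLICY_WARN.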