[PATCH 04/11] sapi: Add SMP lock profiling app. level data

Tue Mar 11 12:53:40 UTC 2014

On Tue, Mar 11, 2014 at 4:13 AM, Sebastian Huber
<sebastian.huber at embedded-brains.de> wrote:
> On 2014-03-10 16:52, Gedare Bloom wrote:
>>
>> On Mon, Mar 10, 2014 at 9:28 AM, Sebastian Huber
>> <sebastian.huber at embedded-brains.de> wrote:
>>>
>>> ---
>>>   cpukit/sapi/include/rtems/profiling.h |   88
>>> ++++++++++++++++++++++++++++++++-
>>>   cpukit/sapi/src/profilingreportxml.c  |   70 ++++++++++++++++++++++++++
>>>   2 files changed, 157 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/cpukit/sapi/include/rtems/profiling.h
>>> b/cpukit/sapi/include/rtems/profiling.h
>>> index ecb3ff7..2a66015 100644
>>> --- a/cpukit/sapi/include/rtems/profiling.h
>>> +++ b/cpukit/sapi/include/rtems/profiling.h
>>> @@ -66,7 +66,14 @@ typedef enum {
>>>      *
>>>      * @see rtems_profiling_per_cpu.
>>>      */
>>> -  RTEMS_PROFILING_PER_CPU
>>> +  RTEMS_PROFILING_PER_CPU,
>>> +
>>> +  /**
>>> +   * @brief Type of SMP lock profiling data.
>>> +   *
>>> +   * @see rtems_profiling_smp_lock.
>>> +   */
>>> +  RTEMS_PROFILING_SMP_LOCK
>>>   } rtems_profiling_type;
>>>
>>>   /**
>>> @@ -152,6 +159,80 @@ typedef struct {
>>>   } rtems_profiling_per_cpu;
>>>
>>>   /**
>>> + * @brief Count of lock contention counters for SMP lock profiling.
>>> + */
>>> +#define RTEMS_PROFILING_SMP_LOCK_CONTENTION_COUNTS 4
>>> +
>>> +/**
>>> + * @brief SMP lock profiling data.
>>> + *
>>> + * The lock acquire attempt instant is the point in time right after the
>>> + * interrupt disable action in the lock acquire sequence.
>>> + *
>>> + * The lock acquire instant is the point in time right after the lock
>>> + * acquisition.  This is the begin of the critical section code
>>> execution.
>>> + *
>>> + * The lock release instant is the point in time right before the
>>> interrupt
>>> + * enable action in the lock release sequence.
>>> + *
>>> + * The lock section time is the time elapsed between the lock acquire
>>> instant
>>> + * and the lock release instant.
>>> + *
>>> + * The lock acquire time is the time elapsed between the lock acquire
>>> attempt
>>> + * instant and the lock acquire instant.
>>
>> Move this last point up to where you define lock acquire attempt
>> instant and lock acquire instant.
>
>
> Ok.
>
>
>>
>>> + */
>>> +typedef struct {
>>> +  /**
>>> +   * @brief The profiling data header.
>>> +   */
>>> +  rtems_profiling_header header;
>>> +
>>> +  /**
>>> +   * @brief The lock name.
>>> +   */
>>> +  const char *name;
>>> +
>>> +  /**
>>> +   * @brief The maximum lock section time in nanoseconds.
>>> +   */
>>> +  uint32_t max_section_time;
>>> +
>>> +  /**
>>> +   * @brief The maximum lock acquire time in nanoseconds.
>>> +   */
>>> +  uint32_t max_acquire_time;
>>> +
>>
>> Any possible issues with 32-bit values here?
>
>
> It would be a major bug if you spend 4 seconds in an SMP lock.
>
>
>>
>>> +  /**
>>> +   * @brief The count of lock uses.
>>> +   *
>>> +   * This value may overflow.
>>> +   */
>>> +  uint64_t usage_count;
>>> +
>>> +  /**
>>> +   * @brief Total lock section time in nanoseconds.
>>> +   *
>>> +   * The average lock section time is the total section time divided by
>>> the
>>> +   * lock usage count.
>>> +   *
>>> +   * This value may overflow.
>>> +   */
>>> +  uint64_t total_section_time;
>>> +
>>> +  /**
>>> +   * @brief The counts of lock acquire operations by contention.
>>> +   *
>>> +   * The contention count for index N corresponds to a lock acquire
>>> attempt
>>> +   * with an initial queue length of N.  The last index corresponds to
>>> all
>>> +   * lock acquire attempts with an initial queue length greater than or
>>> equal
>>> +   * to RTEMS_PROFILING_SMP_LOCK_CONTENTION_COUNTS minus one.
>>> +   *
>>> +   * The values may overflow.
>>> +   */
>>> +  uint64_t
>>> contention_counts[RTEMS_PROFILING_SMP_LOCK_CONTENTION_COUNTS];
>>
>> contention_counts[0] is when there is no contention hence negligible
>> acquire_time?
>
>
> Yes, optimally you get the high numbers for contention_counts[0].  In case
> of the Giant lock it looks like this (smptests/smpload01):
>
>     <SMPLockProfilingReport name="Giant">
>       <MaxSectionTime unit="ns">233360</MaxSectionTime>
>       <MaxAcquireTime unit="ns">192160</MaxAcquireTime>
>       <UsageCount>9281</UsageCount>
>       <TotalSectionTime unit="ns">464815740</TotalSectionTime>
>       <ContentionCount initialQueueLength="0">2480</ContentionCount>
>       <ContentionCount initialQueueLength="1">3893</ContentionCount>
>       <ContentionCount initialQueueLength="2">2201</ContentionCount>
>       <ContentionCount initialQueueLength="3">707</ContentionCount>
>     </SMPLockProfilingReport>
>
>
>
>>
>> Is total_acquire_time not feasible or interesting to profile?
>
> [...]
>
> Hm, a good question.  I think the contention counter and the total acquire
> time are a measure for lock contention.  The ratio of total_acquire_time /
> total_section_time might be interesting.
>
> Should I add it?
>
Yes, it should be relatively cheap to do and gives a better idea
whether the contention is leading long delays or just a lot of small
ones.

> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax     : +49 89 189 47 41-09
> E-Mail  : sebastian.huber at embedded-brains.de
> PGP     : Public key available on request.
>
> Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.