[PATCH 4/4] SPARC: optimize IRQ enable & disable
Sebastian Huber
sebastian.huber at embedded-brains.de
Thu Nov 20 14:09:51 UTC 2014
On 20/11/14 12:36, Daniel Hellstrom wrote:
> On 11/20/2014 12:08 PM, Sebastian Huber wrote:
>> On 20/11/14 10:22, Daniel Hellstrom wrote:
>>>
>>> I will fix this. I missed it since I never enabled RTEMS_PROFILING.
>>
>> I did a test suite run on NGMP with profiling enabled and your
>> patches with a local fix. So overall they don't make things worse
>> and the median of all maximum thread dispatch disabled times drops a
>> bit (Std = Git master, Opt = with your patches).
> How should I interpret this? Did you run the whole RTEMS test suite
> with profiling enabled?
Most tests (448) print an XML profiling report, for example:
<ProfilingReport name="CLOCK TICK">
<PerCPUProfilingReport processorIndex="0">
<MaxThreadDispatchDisabledTime unit="ns">49200</MaxThreadDispatchDisabledTime>
<MeanThreadDispatchDisabledTime unit="ns">3484</MeanThreadDispatchDisabledTime>
<TotalThreadDispatchDisabledTime unit="ns">4366444</TotalThreadDispatchDisabledTime>
<ThreadDispatchDisabledCount>1253</ThreadDispatchDisabledCount>
<MaxInterruptDelay unit="ns">7555</MaxInterruptDelay>
<MaxInterruptTime unit="ns">32822</MaxInterruptTime>
<MeanInterruptTime unit="ns">16377</MeanInterruptTime>
<TotalInterruptTime unit="ns">57485400</TotalInterruptTime>
<InterruptCount>3510</InterruptCount>
</PerCPUProfilingReport>
<SMPLockProfilingReport name="SMP lock stats">
<MaxAcquireTime unit="ns">1666</MaxAcquireTime>
<MaxSectionTime unit="ns">3022</MaxSectionTime>
<MeanAcquireTime unit="ns">1411</MeanAcquireTime>
<MeanSectionTime unit="ns">2393</MeanSectionTime>
<TotalAcquireTime unit="ns">14111</TotalAcquireTime>
<TotalSectionTime unit="ns">23933</TotalSectionTime>
<UsageCount>10</UsageCount>
<ContentionCount initialQueueLength="0">10</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="Giant">
<MaxAcquireTime unit="ns">1488</MaxAcquireTime>
<MaxSectionTime unit="ns">49711</MaxSectionTime>
<MeanAcquireTime unit="ns">800</MeanAcquireTime>
<MeanSectionTime unit="ns">9627</MeanSectionTime>
<TotalAcquireTime unit="ns">3409800</TotalAcquireTime>
<TotalSectionTime unit="ns">41013688</TotalSectionTime>
<UsageCount>4260</UsageCount>
<ContentionCount initialQueueLength="0">4260</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="chains">
<MaxAcquireTime unit="ns">1466</MaxAcquireTime>
<MaxSectionTime unit="ns">2466</MaxSectionTime>
<MeanAcquireTime unit="ns">929</MeanAcquireTime>
<MeanSectionTime unit="ns">2017</MeanSectionTime>
<TotalAcquireTime unit="ns">12088</TotalAcquireTime>
<TotalSectionTime unit="ns">26222</TotalSectionTime>
<UsageCount>13</UsageCount>
<ContentionCount initialQueueLength="0">13</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="per-CPU">
<MaxAcquireTime unit="ns">1888</MaxAcquireTime>
<MaxSectionTime unit="ns">3000</MaxSectionTime>
<MeanAcquireTime unit="ns">880</MeanAcquireTime>
<MeanSectionTime unit="ns">1378</MeanSectionTime>
<TotalAcquireTime unit="ns">610288</TotalAcquireTime>
<TotalSectionTime unit="ns">955266</TotalSectionTime>
<UsageCount>693</UsageCount>
<ContentionCount initialQueueLength="0">693</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="TOD">
<MaxAcquireTime unit="ns">1666</MaxAcquireTime>
<MaxSectionTime unit="ns">4400</MaxSectionTime>
<MeanAcquireTime unit="ns">803</MeanAcquireTime>
<MeanSectionTime unit="ns">1376</MeanSectionTime>
<TotalAcquireTime unit="ns">2858733</TotalAcquireTime>
<TotalSectionTime unit="ns">4894755</TotalSectionTime>
<UsageCount>3556</UsageCount>
<ContentionCount initialQueueLength="0">3556</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="mount table entry">
<MaxAcquireTime unit="ns">1800</MaxAcquireTime>
<MaxSectionTime unit="ns">4088</MaxSectionTime>
<MeanAcquireTime unit="ns">1217</MeanAcquireTime>
<MeanSectionTime unit="ns">2001</MeanSectionTime>
<TotalAcquireTime unit="ns">56022</TotalAcquireTime>
<TotalSectionTime unit="ns">92088</TotalSectionTime>
<UsageCount>46</UsageCount>
<ContentionCount initialQueueLength="0">46</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="LEON3 IrqCtrl">
<MaxAcquireTime unit="ns">1377</MaxAcquireTime>
<MaxSectionTime unit="ns">3022</MaxSectionTime>
<MeanAcquireTime unit="ns">1344</MeanAcquireTime>
<MeanSectionTime unit="ns">2877</MeanSectionTime>
<TotalAcquireTime unit="ns">2688</TotalAcquireTime>
<TotalSectionTime unit="ns">5755</TotalSectionTime>
<UsageCount>2</UsageCount>
<ContentionCount initialQueueLength="0">2</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="thread zombies">
<MaxAcquireTime unit="ns">1866</MaxAcquireTime>
<MaxSectionTime unit="ns">3466</MaxSectionTime>
<MeanAcquireTime unit="ns">1395</MeanAcquireTime>
<MeanSectionTime unit="ns">2435</MeanSectionTime>
<TotalAcquireTime unit="ns">6977</TotalAcquireTime>
<TotalSectionTime unit="ns">12177</TotalSectionTime>
<UsageCount>5</UsageCount>
<ContentionCount initialQueueLength="0">5</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
<SMPLockProfilingReport name="per-CPU state">
<MaxAcquireTime unit="ns">1644</MaxAcquireTime>
<MaxSectionTime unit="ns">4866</MaxSectionTime>
<MeanAcquireTime unit="ns">1222</MeanAcquireTime>
<MeanSectionTime unit="ns">2944</MeanSectionTime>
<TotalAcquireTime unit="ns">4888</TotalAcquireTime>
<TotalSectionTime unit="ns">11777</TotalSectionTime>
<UsageCount>4</UsageCount>
<ContentionCount initialQueueLength="0">4</ContentionCount>
<ContentionCount initialQueueLength="1">0</ContentionCount>
<ContentionCount initialQueueLength="2">0</ContentionCount>
<ContentionCount initialQueueLength="3">0</ContentionCount>
</SMPLockProfilingReport>
</ProfilingReport>
The image is a boxplot of all the MaxThreadDispatchDisabledTime samples.
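
For anyone who wants to reproduce this kind of summary, here is a minimal sketch of how the samples could be collected. It assumes each profiling report has already been captured from the test output as a string; the helper names (max_dispatch_disabled_ns, median_of_maxima) and the trimmed SAMPLE report are made up for illustration, and this is not necessarily the script behind the plot.

```python
import statistics
import xml.etree.ElementTree as ET


def max_dispatch_disabled_ns(report_xml):
    """Return all MaxThreadDispatchDisabledTime values (in ns) found in one report."""
    root = ET.fromstring(report_xml)
    return [
        int(node.text)
        for node in root.iter("MaxThreadDispatchDisabledTime")
        if node.get("unit") == "ns"  # skip values reported in another unit
    ]


def median_of_maxima(reports):
    """Median over the per-processor maxima of all given reports."""
    samples = []
    for report_xml in reports:
        samples.extend(max_dispatch_disabled_ns(report_xml))
    return statistics.median(samples)


# Trimmed-down report using the same element names as the example report.
SAMPLE = """
<ProfilingReport name="CLOCK TICK">
  <PerCPUProfilingReport processorIndex="0">
    <MaxThreadDispatchDisabledTime unit="ns">49200</MaxThreadDispatchDisabledTime>
    <MeanThreadDispatchDisabledTime unit="ns">3484</MeanThreadDispatchDisabledTime>
  </PerCPUProfilingReport>
</ProfilingReport>
"""

if __name__ == "__main__":
    print(max_dispatch_disabled_ns(SAMPLE))  # [49200]
```

Running median_of_maxima once over the reports from the Std run and once over those from the Opt run gives the two medians to compare.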
>
> Turning on/off interrupts should be faster, but the code calling the
> on/off routines should also be faster. I used GRMON to extract
> instruction traces for window overflow/underflow handling to benchmark.
It is a nice improvement.
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.