[PATCH 4/4] SPARC: optimize IRQ enable & disable

Daniel Hellstrom daniel at gaisler.com
Thu Nov 20 14:23:40 UTC 2014


On 11/20/2014 03:09 PM, Sebastian Huber wrote:
>
> On 20/11/14 12:36, Daniel Hellstrom wrote:
>> On 11/20/2014 12:08 PM, Sebastian Huber wrote:
>>> On 20/11/14 10:22, Daniel Hellstrom wrote:
>>>>
>>>> I will fix this. I missed it since it I never enabled RTEMS_PROFILING. 
>>>
>>> I did a test suite run on NGMP with profiling enabled and your patches with a local fix.  So overall they don't make things worse and the median of all maximum thread dispatch disabled times drops 
>>> a bit (Std = Git master, Opt = with your patches).
>> How should I interpret this, is it all RTEMS test-suite you have run performing the profiling?
>
> Most tests (448) print an XML profiling report, e.g. something like this
>
>   <ProfilingReport name="CLOCK TICK">
>     <PerCPUProfilingReport processorIndex="0">
>       <MaxThreadDispatchDisabledTime unit="ns">49200</MaxThreadDispatchDisabledTime>
>       <MeanThreadDispatchDisabledTime unit="ns">3484</MeanThreadDispatchDisabledTime>
>       <TotalThreadDispatchDisabledTime unit="ns">4366444</TotalThreadDispatchDisabledTime>
> <ThreadDispatchDisabledCount>1253</ThreadDispatchDisabledCount>
>       <MaxInterruptDelay unit="ns">7555</MaxInterruptDelay>
>       <MaxInterruptTime unit="ns">32822</MaxInterruptTime>
>       <MeanInterruptTime unit="ns">16377</MeanInterruptTime>
>       <TotalInterruptTime unit="ns">57485400</TotalInterruptTime>
>       <InterruptCount>3510</InterruptCount>
>     </PerCPUProfilingReport>
>     <SMPLockProfilingReport name="SMP lock stats">
>       <MaxAcquireTime unit="ns">1666</MaxAcquireTime>
>       <MaxSectionTime unit="ns">3022</MaxSectionTime>
>       <MeanAcquireTime unit="ns">1411</MeanAcquireTime>
>       <MeanSectionTime unit="ns">2393</MeanSectionTime>
>       <TotalAcquireTime unit="ns">14111</TotalAcquireTime>
>       <TotalSectionTime unit="ns">23933</TotalSectionTime>
>       <UsageCount>10</UsageCount>
>       <ContentionCount initialQueueLength="0">10</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="Giant">
>       <MaxAcquireTime unit="ns">1488</MaxAcquireTime>
>       <MaxSectionTime unit="ns">49711</MaxSectionTime>
>       <MeanAcquireTime unit="ns">800</MeanAcquireTime>
>       <MeanSectionTime unit="ns">9627</MeanSectionTime>
>       <TotalAcquireTime unit="ns">3409800</TotalAcquireTime>
>       <TotalSectionTime unit="ns">41013688</TotalSectionTime>
>       <UsageCount>4260</UsageCount>
>       <ContentionCount initialQueueLength="0">4260</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="chains">
>       <MaxAcquireTime unit="ns">1466</MaxAcquireTime>
>       <MaxSectionTime unit="ns">2466</MaxSectionTime>
>       <MeanAcquireTime unit="ns">929</MeanAcquireTime>
>       <MeanSectionTime unit="ns">2017</MeanSectionTime>
>       <TotalAcquireTime unit="ns">12088</TotalAcquireTime>
>       <TotalSectionTime unit="ns">26222</TotalSectionTime>
>       <UsageCount>13</UsageCount>
>       <ContentionCount initialQueueLength="0">13</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="per-CPU">
>       <MaxAcquireTime unit="ns">1888</MaxAcquireTime>
>       <MaxSectionTime unit="ns">3000</MaxSectionTime>
>       <MeanAcquireTime unit="ns">880</MeanAcquireTime>
>       <MeanSectionTime unit="ns">1378</MeanSectionTime>
>       <TotalAcquireTime unit="ns">610288</TotalAcquireTime>
>       <TotalSectionTime unit="ns">955266</TotalSectionTime>
>       <UsageCount>693</UsageCount>
>       <ContentionCount initialQueueLength="0">693</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="TOD">
>       <MaxAcquireTime unit="ns">1666</MaxAcquireTime>
>       <MaxSectionTime unit="ns">4400</MaxSectionTime>
>       <MeanAcquireTime unit="ns">803</MeanAcquireTime>
>       <MeanSectionTime unit="ns">1376</MeanSectionTime>
>       <TotalAcquireTime unit="ns">2858733</TotalAcquireTime>
>       <TotalSectionTime unit="ns">4894755</TotalSectionTime>
>       <UsageCount>3556</UsageCount>
>       <ContentionCount initialQueueLength="0">3556</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="mount table entry">
>       <MaxAcquireTime unit="ns">1800</MaxAcquireTime>
>       <MaxSectionTime unit="ns">4088</MaxSectionTime>
>       <MeanAcquireTime unit="ns">1217</MeanAcquireTime>
>       <MeanSectionTime unit="ns">2001</MeanSectionTime>
>       <TotalAcquireTime unit="ns">56022</TotalAcquireTime>
>       <TotalSectionTime unit="ns">92088</TotalSectionTime>
>       <UsageCount>46</UsageCount>
>       <ContentionCount initialQueueLength="0">46</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="LEON3 IrqCtrl">
>       <MaxAcquireTime unit="ns">1377</MaxAcquireTime>
>       <MaxSectionTime unit="ns">3022</MaxSectionTime>
>       <MeanAcquireTime unit="ns">1344</MeanAcquireTime>
>       <MeanSectionTime unit="ns">2877</MeanSectionTime>
>       <TotalAcquireTime unit="ns">2688</TotalAcquireTime>
>       <TotalSectionTime unit="ns">5755</TotalSectionTime>
>       <UsageCount>2</UsageCount>
>       <ContentionCount initialQueueLength="0">2</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="thread zombies">
>       <MaxAcquireTime unit="ns">1866</MaxAcquireTime>
>       <MaxSectionTime unit="ns">3466</MaxSectionTime>
>       <MeanAcquireTime unit="ns">1395</MeanAcquireTime>
>       <MeanSectionTime unit="ns">2435</MeanSectionTime>
>       <TotalAcquireTime unit="ns">6977</TotalAcquireTime>
>       <TotalSectionTime unit="ns">12177</TotalSectionTime>
>       <UsageCount>5</UsageCount>
>       <ContentionCount initialQueueLength="0">5</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>     <SMPLockProfilingReport name="per-CPU state">
>       <MaxAcquireTime unit="ns">1644</MaxAcquireTime>
>       <MaxSectionTime unit="ns">4866</MaxSectionTime>
>       <MeanAcquireTime unit="ns">1222</MeanAcquireTime>
>       <MeanSectionTime unit="ns">2944</MeanSectionTime>
>       <TotalAcquireTime unit="ns">4888</TotalAcquireTime>
>       <TotalSectionTime unit="ns">11777</TotalSectionTime>
>       <UsageCount>4</UsageCount>
>       <ContentionCount initialQueueLength="0">4</ContentionCount>
>       <ContentionCount initialQueueLength="1">0</ContentionCount>
>       <ContentionCount initialQueueLength="2">0</ContentionCount>
>       <ContentionCount initialQueueLength="3">0</ContentionCount>
>     </SMPLockProfilingReport>
>   </ProfilingReport>
>
> The image is a boxplot of all the MaxThreadDispatchDisabledTime samples.

I think this is a nice utility and you seem to be able to update the results quickly too. Thanks. Since it is the test-suite you run how can we be sure that a test is not badly written, I mean 
disabling interrupts for too long time since the test itself is not testing interrupt latencies etc.? Perhaps the mean is most interesting, or that some of the max cases should be ignored? Have you 
analysed the worst tests to see where the effects comes from?

>
>>
>> Turning on/off interrupts should be faster, but the code calling the on/off routines should also be faster. I used GRMON to extract instruction traces for window overflow/underflow handling to 
>> benchmark. 
>
> It is a nice improvement.
I think it is now easier to add interrupt on/off performance tests and see where it was called from now that it is inlined. We can now from the trap handler (register %l1) see which code is 
responsible for the interrupt on/off call.

DanielH



More information about the devel mailing list