[PATCH 4/4] SPARC: optimize IRQ enable & disable

Sebastian Huber sebastian.huber at embedded-brains.de
Thu Nov 20 14:09:51 UTC 2014


On 20/11/14 12:36, Daniel Hellstrom wrote:
> On 11/20/2014 12:08 PM, Sebastian Huber wrote:
>> On 20/11/14 10:22, Daniel Hellstrom wrote:
>>>
>>> I will fix this. I missed it since it I never enabled RTEMS_PROFILING. 
>>
>> I did a test suite run on NGMP with profiling enabled and your 
>> patches with a local fix.  So overall they don't make things worse 
>> and the median of all maximum thread dispatch disabled times drops a 
>> bit (Std = Git master, Opt = with your patches).
> How should I interpret this, is it all RTEMS test-suite you have run 
> performing the profiling?

Most tests (448) print an XML profiling report, e.g. something like this

   <ProfilingReport name="CLOCK TICK">
     <PerCPUProfilingReport processorIndex="0">
       <MaxThreadDispatchDisabledTime 
unit="ns">49200</MaxThreadDispatchDisabledTime>
       <MeanThreadDispatchDisabledTime 
unit="ns">3484</MeanThreadDispatchDisabledTime>
       <TotalThreadDispatchDisabledTime 
unit="ns">4366444</TotalThreadDispatchDisabledTime>
<ThreadDispatchDisabledCount>1253</ThreadDispatchDisabledCount>
       <MaxInterruptDelay unit="ns">7555</MaxInterruptDelay>
       <MaxInterruptTime unit="ns">32822</MaxInterruptTime>
       <MeanInterruptTime unit="ns">16377</MeanInterruptTime>
       <TotalInterruptTime unit="ns">57485400</TotalInterruptTime>
       <InterruptCount>3510</InterruptCount>
     </PerCPUProfilingReport>
     <SMPLockProfilingReport name="SMP lock stats">
       <MaxAcquireTime unit="ns">1666</MaxAcquireTime>
       <MaxSectionTime unit="ns">3022</MaxSectionTime>
       <MeanAcquireTime unit="ns">1411</MeanAcquireTime>
       <MeanSectionTime unit="ns">2393</MeanSectionTime>
       <TotalAcquireTime unit="ns">14111</TotalAcquireTime>
       <TotalSectionTime unit="ns">23933</TotalSectionTime>
       <UsageCount>10</UsageCount>
       <ContentionCount initialQueueLength="0">10</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="Giant">
       <MaxAcquireTime unit="ns">1488</MaxAcquireTime>
       <MaxSectionTime unit="ns">49711</MaxSectionTime>
       <MeanAcquireTime unit="ns">800</MeanAcquireTime>
       <MeanSectionTime unit="ns">9627</MeanSectionTime>
       <TotalAcquireTime unit="ns">3409800</TotalAcquireTime>
       <TotalSectionTime unit="ns">41013688</TotalSectionTime>
       <UsageCount>4260</UsageCount>
       <ContentionCount initialQueueLength="0">4260</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="chains">
       <MaxAcquireTime unit="ns">1466</MaxAcquireTime>
       <MaxSectionTime unit="ns">2466</MaxSectionTime>
       <MeanAcquireTime unit="ns">929</MeanAcquireTime>
       <MeanSectionTime unit="ns">2017</MeanSectionTime>
       <TotalAcquireTime unit="ns">12088</TotalAcquireTime>
       <TotalSectionTime unit="ns">26222</TotalSectionTime>
       <UsageCount>13</UsageCount>
       <ContentionCount initialQueueLength="0">13</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="per-CPU">
       <MaxAcquireTime unit="ns">1888</MaxAcquireTime>
       <MaxSectionTime unit="ns">3000</MaxSectionTime>
       <MeanAcquireTime unit="ns">880</MeanAcquireTime>
       <MeanSectionTime unit="ns">1378</MeanSectionTime>
       <TotalAcquireTime unit="ns">610288</TotalAcquireTime>
       <TotalSectionTime unit="ns">955266</TotalSectionTime>
       <UsageCount>693</UsageCount>
       <ContentionCount initialQueueLength="0">693</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="TOD">
       <MaxAcquireTime unit="ns">1666</MaxAcquireTime>
       <MaxSectionTime unit="ns">4400</MaxSectionTime>
       <MeanAcquireTime unit="ns">803</MeanAcquireTime>
       <MeanSectionTime unit="ns">1376</MeanSectionTime>
       <TotalAcquireTime unit="ns">2858733</TotalAcquireTime>
       <TotalSectionTime unit="ns">4894755</TotalSectionTime>
       <UsageCount>3556</UsageCount>
       <ContentionCount initialQueueLength="0">3556</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="mount table entry">
       <MaxAcquireTime unit="ns">1800</MaxAcquireTime>
       <MaxSectionTime unit="ns">4088</MaxSectionTime>
       <MeanAcquireTime unit="ns">1217</MeanAcquireTime>
       <MeanSectionTime unit="ns">2001</MeanSectionTime>
       <TotalAcquireTime unit="ns">56022</TotalAcquireTime>
       <TotalSectionTime unit="ns">92088</TotalSectionTime>
       <UsageCount>46</UsageCount>
       <ContentionCount initialQueueLength="0">46</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="LEON3 IrqCtrl">
       <MaxAcquireTime unit="ns">1377</MaxAcquireTime>
       <MaxSectionTime unit="ns">3022</MaxSectionTime>
       <MeanAcquireTime unit="ns">1344</MeanAcquireTime>
       <MeanSectionTime unit="ns">2877</MeanSectionTime>
       <TotalAcquireTime unit="ns">2688</TotalAcquireTime>
       <TotalSectionTime unit="ns">5755</TotalSectionTime>
       <UsageCount>2</UsageCount>
       <ContentionCount initialQueueLength="0">2</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="thread zombies">
       <MaxAcquireTime unit="ns">1866</MaxAcquireTime>
       <MaxSectionTime unit="ns">3466</MaxSectionTime>
       <MeanAcquireTime unit="ns">1395</MeanAcquireTime>
       <MeanSectionTime unit="ns">2435</MeanSectionTime>
       <TotalAcquireTime unit="ns">6977</TotalAcquireTime>
       <TotalSectionTime unit="ns">12177</TotalSectionTime>
       <UsageCount>5</UsageCount>
       <ContentionCount initialQueueLength="0">5</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
     <SMPLockProfilingReport name="per-CPU state">
       <MaxAcquireTime unit="ns">1644</MaxAcquireTime>
       <MaxSectionTime unit="ns">4866</MaxSectionTime>
       <MeanAcquireTime unit="ns">1222</MeanAcquireTime>
       <MeanSectionTime unit="ns">2944</MeanSectionTime>
       <TotalAcquireTime unit="ns">4888</TotalAcquireTime>
       <TotalSectionTime unit="ns">11777</TotalSectionTime>
       <UsageCount>4</UsageCount>
       <ContentionCount initialQueueLength="0">4</ContentionCount>
       <ContentionCount initialQueueLength="1">0</ContentionCount>
       <ContentionCount initialQueueLength="2">0</ContentionCount>
       <ContentionCount initialQueueLength="3">0</ContentionCount>
     </SMPLockProfilingReport>
   </ProfilingReport>

The image is a boxplot of all the MaxThreadDispatchDisabledTime samples.

>
> Turning on/off interrupts should be faster, but the code calling the 
> on/off routines should also be faster. I used GRMON to extract 
> instruction traces for window overflow/underflow handling to benchmark. 

It is a nice improvement.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



More information about the devel mailing list