Time spent in ticks...
jakob.viketoft at aacmicrotec.com
Thu Oct 13 16:21:05 UTC 2016
From: Joel Sherrill [joel at rtems.org]
Sent: Thursday, October 13, 2016 17:38
To: Jakob Viketoft
Cc: devel at rtems.org
Subject: Re: Time spent in ticks...
>I don't have an or1k handy, so I ran on a sparc/erc32 simulator.
>It is a SPARC v7 at 15 MHz.
>These times are in microseconds and based on the tmtests,
>specifically tm08 and tm27.
>(1) rtems_clock_tick: only case - 52
>(2) rtems interrupt: entry overhead returns to interrupted task - 12
>(3) rtems interrupt: exit overhead returns to interrupted task - 4
>(4) rtems interrupt: entry overhead returns to nested interrupt - 11
>(5) rtems interrupt: exit overhead returns to nested interrupt - 3
>The clock tick test has 100 tasks, but it looks like they are blocked on a semaphore.
>Your times look WAY too high. Maybe the interrupt is stuck on and
>not being cleared.
>On the erc32, a nominal "nothing to do clock tick" would be (1)+(2)+(3) from
>above, or 52+12+4 = 68 microseconds. 68 * 15 = 1020 machine cycles.
>So at a higher clock rate, it should be even less time.
>My gut feeling is that something is wrong with the ISR handler
>and it is stuck. But the times are definitely way too high.
(Sorry if the format got somewhat garbled; anything but top-posting has to be done manually...)
I re-tested my case using -O3 optimization (we have been using -O0 during development for debugging purposes) and got a good performance boost, but I'm still nowhere near your numbers. I can vouch that the interrupt (an exception, really) isn't stuck; the code unfortunately just takes a long time to execute. I have a subsecond counter (1/16 of a second) which I'm sampling at various places in the code, storing its values in a buffer in memory so as to interfere with the program as little as possible.
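To make the measurement method concrete, here is a minimal sketch of that kind of probe, not the actual instrumentation. Every name in it (trace_probe, trace_buf, read_counter, fake_counter) is hypothetical, and read_counter() is a stand-in for reading the real hardware counter register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 64u

/* Hypothetical trace record: which probe point fired, and the raw counter value. */
typedef struct {
  uint32_t probe_id;
  uint32_t counter;
} trace_sample;

static trace_sample trace_buf[TRACE_CAPACITY];
static size_t trace_len;

/* Stand-in for the hardware counter (assumption): on a real target this
 * would read a memory-mapped register instead of a variable. */
static uint32_t fake_counter;

static uint32_t read_counter(void)
{
  return fake_counter;
}

/* Record one sample. Only a bounds check and two stores, so the probe
 * disturbs the timing of the code under test as little as possible. */
static inline void trace_probe(uint32_t probe_id)
{
  if (trace_len < TRACE_CAPACITY) {
    trace_buf[trace_len].probe_id = probe_id;
    trace_buf[trace_len].counter = read_counter();
    ++trace_len;
  }
}

/* Elapsed counter ticks between two recorded samples. Unsigned
 * subtraction gives the right answer across a single counter wrap. */
static uint32_t trace_delta(size_t first, size_t second)
{
  assert(first < trace_len && second < trace_len);
  return trace_buf[second].counter - trace_buf[first].counter;
}
```

The buffer is only read out after the run, so no printing or formatting happens inside the code being timed.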
With -O3, handling a tick still takes ~320 us, but the weight has now shifted. tc_windup takes ~214 us and the rest is obviously _Watchdog_Tick(). When breaking the tc_windup function into pieces to find the worst speed bumps, the biggest contribution (~122 us) seems to come from the scale factor recalculation. Since it's 64-bit arithmetic, it's turned into a software routine, which apparently can be quite time-consuming.
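For illustration, the scale factor recalculation has roughly the following shape. This is a simplified sketch based on my reading of the FreeBSD-derived timecounter code, not a copy of it: the adjustment term is left out, and compute_scale and delta_to_frac are made-up names. The point is the 64-bit division, which on a 32-bit target without a hardware 64-bit divider is typically compiled by GCC into a call to a libgcc helper such as __udivdi3.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate 2^64 / frequency_hz, computed as (2^63 / f) * 2 so that
 * the intermediate value fits in 64 bits. The result is the number of
 * 2^-64 second units per counter tick. */
static uint64_t compute_scale(uint32_t frequency_hz)
{
  assert(frequency_hz != 0);

  uint64_t scale = (uint64_t)1 << 63;

  /* 64-bit division: the expensive step on a 32-bit CPU. */
  scale /= frequency_hz;

  return scale * 2;
}

/* Convert a counter delta into a 64-bit binary fraction of a second.
 * Only a multiplication is needed once the scale is known. */
static uint64_t delta_to_frac(uint32_t delta, uint64_t scale)
{
  return (uint64_t)delta * scale;
}
```

With a 50 MHz counter, compute_scale(50000000) gives 368934881474. If the frequency and the adjustment do not change between ticks, the result could in principle be cached rather than recomputed, but whether that is safe in the real code is an open question here.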
Even though _Watchdog_Tick() "only" takes ~100 us now, it still sounds much higher than your total tick time on a slower system (we're running at 50 MHz).
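To compare the two systems independently of clock rate, the times can be converted to machine cycles the same way as the 68 * 15 = 1020 calculation quoted above. A small sketch of that arithmetic (us_to_cycles is just an illustrative helper name):

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds times MHz gives machine cycles, since 1 MHz is one
 * cycle per microsecond. */
static uint64_t us_to_cycles(uint32_t microseconds, uint32_t clock_mhz)
{
  assert(clock_mhz != 0); /* a zero clock rate would be a configuration error */
  return (uint64_t)microseconds * clock_mhz;
}
```

For the numbers in this thread: us_to_cycles(68, 15) is 1020 cycles for the whole erc32 tick, while us_to_cycles(320, 50) is 16000 cycles for ours, of which us_to_cycles(122, 50) = 6100 cycles go to the scale factor recalculation alone.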
Is there anything we can do to improve these numbers? Is Clock_isr intended to be run uninterrupted as it is now? I can't see that much of the BSP patch code has anything to do with the speed of what I'm looking at right now...
Senior Engineer in RTL and embedded software
ÅAC Microtec AB
Dag Hammarskjölds väg 48
SE-751 83 Uppsala, Sweden
T: +46 702 80 95 97