Time spent in ticks...

Pavel Pisa ppisa4lists at pikron.com
Thu Oct 13 23:28:14 UTC 2016


Hello Joel,

On Friday 14 of October 2016 00:56:21 Joel Sherrill wrote:
> On Thu, Oct 13, 2016 at 1:37 PM, Joel Sherrill <joel at rtems.org> wrote:
> > On Thu, Oct 13, 2016 at 11:21 AM, Jakob Viketoft <
> >
> > jakob.viketoft at aacmicrotec.com> wrote:
> >> *From:* Joel Sherrill [joel at rtems.org]
> >> *Sent:* Thursday, October 13, 2016 17:38
> >> *To:* Jakob Viketoft
> >> *Cc:* devel at rtems.org
> >> *Subject:* Re: Time spent in ticks...
> >>
> >> >I don't have an or1k handy, so I ran on a sparc/erc32 simulator.
> >> >It is a SPARC v7 at 15 MHz.
> >> >
> >> >These times are in microseconds and based on the tmtests.
> >> >Specifically tm08 and tm27.
> >> >
> >> >(1) rtems_clock_tick: only case - 52
> >> >(2) rtems interrupt: entry overhead returns to interrupted task - 12
> >> >(3) rtems interrupt: exit overhead returns to interrupted task - 4
> >> >(4) rtems interrupt: entry overhead returns to nested interrupt - 11
> >> >(5) rtems interrupt: exit overhead returns to nested interrupt - 3
> >
> > The above was from the master with SMP enabled. I repeated it with
> > SMP disabled and it had no impact.
> >
> > Since the timing change is post 4.11, I decided to try 4.11 with SMP
> > disabled:
> >
> > rtems_clock_tick: only case - 42
> > rtems interrupt: entry overhead returns to interrupted task - 11
> > rtems interrupt: exit overhead returns to interrupted task - 4
> > rtems interrupt: entry overhead returns to nested interrupt - 11
> > rtems interrupt: exit overhead returns to nested interrupt - 3
> >
> > So 42 + 11 + 4 = 57 microseconds, 57 * 15 = 855 cycles
> >
> > So the overhead has gone up some, but as Pavel says, it is quite likely
> > that some mathematical operation on 64-bit types is slow on your CPU.
> >
> > HINT: If you can write a benchmark for 64-bit operations,
> > it would be a good comparison between CPUs and might
> > highlight where the software implementation needs improvement.
>
> I decided that another good point of reference was the powerpc/psim BSP. It
> reports the benchmarks in instructions:
>
> (1) rtems_clock_tick: only case - 229
> (2) rtems interrupt: entry overhead returns to interrupted task - 102
> (3) rtems interrupt: exit overhead returns to interrupted task - 95
> (4) rtems interrupt: entry overhead returns to nested interrupt - 105
> (5) rtems interrupt: exit overhead returns to nested interrupt - 85
>
> 229 + 102 + 95 = 426 instructions.
>
> That seems roughly in line with the erc32, where all instructions take
> 1 cycle except loads (3 cycles) and stores (2 cycles). And the SPARC has
> register windows, so entering and exiting an ISR can potentially save
> and restore a lot of registers.
>
> So I am still leaning toward Pavel's explanation that some primitive
> operation is really inefficient.

These numbers look good.

I would expect that in the case of or1k there can be a real penalty
if it is synthesized without a hardware multiplier or barrel shifter,
or if the CPU has them but the compiler is configured not to use them.
If that cannot be corrected (for example, because a hardware multiplier
or shifter would make the design no longer fit in the FPGA), then there
is a real problem and a mismatch between RTEMS and the CPU target
area. This could be solved by a configurable time measurement
data type, for example using only 32-bit tick counts
and changing even the timer queues to that type. It cannot be unconditional,
because today's RTEMS users expect better time resolution
and a time that does not overflow over a long range, ideally with 2100
or later supported.
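
Following Joel's hint above, a minimal sketch of such a 64-bit operation
micro-benchmark could look like the code below. The iteration count, the
choice of operations, and the use of the tick counter for timing are only
illustrative assumptions; a cycle counter would give finer resolution where
the CPU provides one.

  #include <stdint.h>
  #include <stdio.h>
  #include <rtems.h>

  /* volatile defeats constant folding so the 64-bit ops really execute */
  volatile uint64_t a = 0x123456789abcdef0ULL;
  volatile uint64_t b = 0x00000000cafef00dULL;
  volatile uint64_t r;

  static uint32_t bench(uint64_t (*op)(uint64_t, uint64_t), int loops)
  {
    rtems_interval start = rtems_clock_get_ticks_since_boot();
    for (int i = 0; i < loops; ++i)
      r = op(a, b);
    return rtems_clock_get_ticks_since_boot() - start;
  }

  static uint64_t op_mul(uint64_t x, uint64_t y) { return x * y; }
  static uint64_t op_div(uint64_t x, uint64_t y) { return x / (y | 1); }
  static uint64_t op_shr(uint64_t x, uint64_t y) { return x >> (y & 63); }

  /* e.g. printf("mul: %lu ticks\n", (unsigned long) bench(op_mul, 100000)); */

Comparing the mul/div/shift timings on an or1k built with and without the
hardware units (or with the compiler's software multiply/divide options,
whose exact names depend on the toolchain) should show quickly whether the
penalty is where we suspect.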

As for the actual code, if I remember correctly, I did not like the
conversions from monotonic time to ticks in nanosleep, where there was some
division. But the division is not in the tick code (at least I think so),
so that should be OK. The packed sec-and-fractions format of timespec used
for one of the queues has some interesting properties, but on the other hand
its repacking adds some overhead even in the tick processing.
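
For illustration, the packed format I mean stores the seconds in the upper
bits of a 64-bit word and the nanosecond fraction in the lower 30 bits. A
rough sketch (the constant and function names here are mine; the real
definitions live in the score watchdog code and may differ):

  #include <stdint.h>
  #include <time.h>

  #define BITS_FOR_1E9_NS 30  /* 2^30 > 10^9, so nanoseconds fit */

  static uint64_t pack_timespec(const struct timespec *ts)
  {
    return ((uint64_t) ts->tv_sec << BITS_FOR_1E9_NS)
        | (uint32_t) ts->tv_nsec;
  }

  static void unpack_timespec(uint64_t packed, struct timespec *ts)
  {
    ts->tv_sec = (time_t) (packed >> BITS_FOR_1E9_NS);
    ts->tv_nsec = (long) (packed & ((UINT64_C(1) << BITS_FOR_1E9_NS) - 1));
  }

The nice property is that packed values compare as plain 64-bit integers in
the timer queue; the cost is that each pack/unpack needs a 30-bit shift of a
64-bit value, which is cheap with a barrel shifter and a long instruction
sequence without one.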

If the time spent in the tick is, for some CPU, for example 50 usec, then
that is not a problem as long as there are no deadlines in a similar range.
For example, with tolerated latencies of 500 or 1000 usec and critical task
execution times of 300 usec, it is OK. But if the tick rate is set to
1 kHz, then 5% of the CPU time consumed by timekeeping looks like quite
a lot. If the application timing can tolerate a tick period of 0.1 sec (10 Hz),
then the load contribution of tick processing is negligible.
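
To make the arithmetic explicit, the load is simply the tick processing
time divided by the tick period:

  overhead = t_tick / T_period
  1 kHz tick:  50 usec / 1000 usec   = 5 %
  10 Hz tick:  50 usec / 100000 usec = 0.05 %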

So all these numbers are relative to the needs of the planned target
application.

Best wishes,

                Pavel



