Time spent in ticks...

Tue Oct 18 16:47:10 UTC 2016

----- On 18 Oct 2016 at 18:03, Jakob Viketoft jakob.viketoft at aacmicrotec.com wrote:

> Hello Pavel, Joel, Sebastian,
> 
> From: Pavel Pisa [ppisa4lists at pikron.com]
> Sent: Thursday, October 13, 2016 19:09
> To: devel at rtems.org
> Cc: Jakob Viketoft; joel at rtems.org
> Subject: Re: Time spent in ticks...
> 
>> Hello Jakob,
> 
>> ...
> 
>> the time is measured and the timer queue uses 64-bit types for time
>> representation. When a higher time measurement resolution than the tick
>> is requested, this is a reasonable (optimal) choice, but it can be a problem
>> for 16-bit CPUs and some 32-bit ones as well.
> 
>> How have you configured the or1k CPU? Do you have a hardware multiplier
>> and barrel shifter available, or only shift-by-one and multiplication in SW?
>> Do the CFLAGS match the available instructions?
> 
>> I am not sure whether there is a 64-bit division in the time computation
>> as well. This would be a killer for your CPU. The high resolution
>> time sources and even tickless timer support can be implemented
>> with full scaling and adjustment using only shifts, additions and
>> multiplications in hot paths.
> 
>> I tried to understand the actual RTEMS time-keeping code
>> some time ago when nanosleep was introduced;
>> I analyzed it, proposed some changes and compared
>> it to Linux. See the thread following these messages
> 
>>  https://lists.rtems.org/pipermail/devel/2016-August/015720.html
> 
>>  https://lists.rtems.org/pipermail/devel/2016-August/015721.html
> 
>> Some of the discussed changes to nanosleep have been implemented
>> already.
> 
>> Generally, try to measure how many times multiplication
>> and division are called in the ISR.
>> I think that I am capable of designing an implementation which
>> is restricted to mul, add and shr and minimizes the number
>> of transformations, but if it is found that the RTEMS implementation
>> needs to be optimized/changed then it can be a task counted
>> in man-months.
> 
>> Generally, if the tick interrupt lasts more than 10 (maybe 20) usec then
>> there is a problem. One source can be inefficiency of the SW implementation;
>> another is that the selected OS and possibly the features required by the
>> application are beyond the capabilities of the selected CPU.
> 
> Sorry for my late response, I got caught on another hook for a couple of days
> but have now been able to wriggle free and delve deeper into the problem. First
> off, let me say that our or1k is configured to have both multiplier and
> division units and I can see that the toolchain matches, as these instructions
> show up in the code (I can search for the generated instructions in a dump). However,
> for 64-bit multiplication and division, there is no matching hardware and these
> are implemented in software. The problematic code in our case is part of the
> tick code, in function tc_windup() in file cpukit/score/src/kern_tc.c.
> 
> Going from Joel's clues about the erc32 and its timing, I looked into this a bit
> more and compared the assembler level to see what it made of the same
> Clock_isr. I found that in the erc32 case there is an overriding definition of
> Clock_driver_timecounter_tick() which ultimately leads it to use
> _Timecounter_Tick_simple where we were using the default _Timecounter_Tick.
> Now, this obviously won't hit the same speed bump and I believe going this way
> makes more sense for our CPU.
> 
> I just wanted to make sure that we don't lose any functionality or limit
> ourselves too much by going this route. Any comments or thoughts on this?
> Regarding CPU-features, the erc32 and or1k seem to be quite similar and should
> perhaps also have a more similar BSP implementation. Please let me know if I'm
> dead wrong... :)

The simple timecounter tick is there to support badly designed hardware and is less efficient than the normal timecounter tick.
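
For illustration, the division-free scaling Pavel mentions above (only shifts, additions and multiplications in hot paths) might look roughly like the sketch below. It is not taken from RTEMS: the names tick_scale, tick_scale_init and ticks_to_ns are hypothetical, and the approach is the general multiplier/shift technique, not the code in kern_tc.c. The one division is done at setup time, so the per-tick path needs only a 32x32->64 multiply and a shift.

```c
#include <stdint.h>

/* Hypothetical helper type, not an RTEMS API: a precomputed scale factor
 * such that ns = (counts * mult) >> shift. */
typedef struct {
    uint32_t mult;
    uint32_t shift;
} tick_scale;

/* Setup path, run once: a 64-bit division is acceptable here.
 * Assumption: (1e9 / freq_hz) * 2^shift fits in 32 bits, which the
 * caller must ensure by choosing a suitable shift. */
static tick_scale tick_scale_init(uint32_t freq_hz, uint32_t shift)
{
    tick_scale s;
    uint64_t scaled = (uint64_t)1000000000u << shift;

    s.shift = shift;
    /* Round to nearest instead of truncating. */
    s.mult = (uint32_t)((scaled + freq_hz / 2) / freq_hz);
    return s;
}

/* Hot path, e.g. called from the tick ISR: one widening multiply and
 * one shift, no division. */
static uint64_t ticks_to_ns(const tick_scale *s, uint32_t counts)
{
    return ((uint64_t)counts * s->mult) >> s->shift;
}
```

Assuming a 50 MHz counter and a shift of 24, tick_scale_init(50000000, 24) gives a multiplier of 20 * 2^24, and ticks_to_ns() then maps 50000 counts to 1000000 ns. Whether this is cheaper in practice on a given or1k configuration would still have to be measured.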