Time spent in ticks...
jakob.viketoft at aacmicrotec.com
Tue Oct 18 16:03:22 UTC 2016
Hello Pavel, Joel, Sebastian,
From: Pavel Pisa [ppisa4lists at pikron.com]
Sent: Thursday, October 13, 2016 19:09
To: devel at rtems.org
Cc: Jakob Viketoft; joel at rtems.org
Subject: Re: Time spent in ticks...
> Hello Jakob,
> time measurement and the timer queues use 64-bit types for time
> representation. When a time resolution finer than one tick is
> required, this is a reasonable (optimal) choice, but it can be a problem
> for 16-bit CPUs and for some 32-bit ones as well.
> How have you configured the or1k CPU? Do you have a hardware multiplier
> and barrel shifter available, or only shift-by-one and a multiplier in SW?
> Do the CFLAGS match the available instructions?
> I am not sure whether there is a 64-bit division in the time computation
> as well. That would be a killer for your CPU. High-resolution
> time sources and even tickless timer support can be implemented
> with full scaling and adjustment using only shifts, additions and
> multiplications in the hot paths.
> I tried to understand the actual RTEMS time-keeping code
> some time ago, when nanosleep was introduced. I analyzed it,
> proposed some changes and compared
> it to Linux. See the thread following next messages
> Some of the discussed changes to nanosleep have been implemented.
> Generally, try to measure how many times multiplication
> and division are called in the ISR.
> I think I am capable of designing an implementation which is
> restricted to mul, add and shr and minimizes the number
> of transformations, but if it is found that the RTEMS implementation
> needs to be optimized/changed, then it can be a task counted
> in man-months.
> Generally, if the tick interrupt lasts more than 10 (maybe 20) usec, then
> there is a problem. One source can be an inefficient SW implementation;
> another is that the features selected by the OS, and possibly required by
> the application, are beyond the capabilities of the selected CPU.
Sorry for my late response. I got caught on another hook for a couple of days, but have now been able to wriggle free and delve deeper into the problem. First off, let me say that our or1k is configured with both multiplier and divider units, and I can see that the toolchain matches, since these instructions show up in the generated code (I can search for them in a dump). However, there is no matching hardware for 64-bit multiplication and division, so these are implemented in software. The problematic code in our case is part of the tick code, in the function tc_windup() in the file cpukit/score/src/kern_tc.c.
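As a rough illustration of the approach Pavel describes (only multiply, add and shift in the hot path), a minimal sketch could look like the following. This is not RTEMS code: tick_scale, tick_scale_init() and ticks_to_ns() are hypothetical names, and the 50 MHz counter frequency and 24-bit shift used in the note afterwards are assumed values, not our actual configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point scale: ns = (ticks * mult) >> shift. */
typedef struct {
  uint32_t mult;   /* nanoseconds per counter tick, scaled by 2^shift */
  uint32_t shift;  /* number of fractional bits in mult */
} tick_scale;

/*
 * Cold path (run once at initialization): the 64-bit division is
 * acceptable here because it is never executed from the tick ISR.
 */
static tick_scale tick_scale_init(uint32_t freq_hz, uint32_t shift)
{
  tick_scale s;
  uint64_t m;

  assert(freq_hz != 0 && shift < 32);

  /* Round to nearest: (10^9 * 2^shift) / freq_hz. */
  m = (((uint64_t)1000000000u << shift) + freq_hz / 2) / freq_hz;

  /* The factor must fit in 32 bits so the hot path needs only 32x32->64. */
  assert(m <= UINT32_MAX);

  s.mult = (uint32_t)m;
  s.shift = shift;
  return s;
}

/* Hot path: one 32x32->64 multiplication and one shift, no division. */
static uint64_t ticks_to_ns(const tick_scale *s, uint32_t delta_ticks)
{
  return ((uint64_t)delta_ticks * s->mult) >> s->shift;
}
```

With an assumed 50 MHz counter and a shift of 24, tick_scale_init() yields mult = 20 << 24, and ticks_to_ns() for 500000 ticks returns 10000000 ns (10 ms). Whether the widening multiplication ends up as hardware instructions or as a library call still depends on the CPU configuration and the CFLAGS, so it would need to be checked in the disassembly.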
Going from Joel's clues about the erc32 and its timing, I looked into this a bit more and compared the generated assembly to see what the compiler made of the same Clock_isr. I found that in the erc32 case there is an overriding definition of Clock_driver_timecounter_tick() which ultimately leads it to use _Timecounter_Tick_simple(), whereas we were using the default _Timecounter_Tick(). The simple variant obviously won't hit the same speed bump, and I believe going this way makes more sense for our CPU.
I just wanted to make sure that we don't lose any functionality or limit ourselves too much by going this route. Any comments or thoughts on this? Regarding CPU features, the erc32 and or1k seem to be quite similar, so perhaps their BSP implementations should be more alike as well. Please let me know if I'm dead wrong... :)