tms570 Cortex-R performance counters and some ideas related to RTEMS timekeeping code

Joel Sherrill joel.sherrill at oarcorp.com
Fri Aug 22 17:45:02 UTC 2014



On August 22, 2014 11:44:11 AM CDT, Pavel Pisa <pisa at cmp.felk.cvut.cz> wrote:
>Hello Joel,
>
>On Friday 22 of August 2014 17:25:24 Joel Sherrill wrote:
>> Pushed.
>>
>> Followups can just be subsequent patches.
>
>thanks, you are faster than light ...

Just trying to wrap up things on a Friday. :)

>As for the RTEMS timekeeping code, I can imagine how it could
>look better. I do not like Clock_driver_nanoseconds_since_last_tick.
>I am not even sure if it is really used by the TOD (i.e. the ticker
>test seems to print rounded values on our board).

The Classic API get-time method used there returns the TOD in a format with seconds and ticks since the last second. That test only prints the seconds. There is a nanoseconds sample which prints at higher granularity.

>
>In fact I would like to see RTEMS work completely tickless
>on hardware with a modern free-running timebase and easily updated
>compare event hardware. That would allow implementing all POSIX
>time-related functions with resolution limited only by the hardware.
>

Agreed. 


>The scheduler is a question. When more than one task of the same
>priority is ready to run then the tick is easiest, but even in that
>case the slice time can be computed and only a timer event for its
>expiration is set.
>

We just need to take that into account as an input when calculating when the next tick occurs. I think the time slice is the only factor other than the watchdog timers.
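
Roughly, with hypothetical helper names (none of these are existing RTEMS interfaces), the calculation would just be:

#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers, named only for illustration; these are not
 * existing RTEMS interfaces. */
uint64_t next_watchdog_expiry(void);  /* earliest watchdog timeout           */
uint64_t timeslice_end(void);         /* end of the executing task's slice   */
bool     needs_timeslicing(void);     /* another ready task at same priority */

/*
 * In a tickless configuration the next compare-match interrupt is the
 * earlier of the first watchdog timeout and, only when round-robin
 * slicing is actually in effect, the end of the current time slice.
 */
uint64_t next_timer_event(void)
{
  uint64_t next = next_watchdog_expiry();

  if (needs_timeslicing() && timeslice_end() < next)
    next = timeslice_end();

  return next;
}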

>But all that is huge amount of work.

Yep. But we can identify a development path where it is a sequence of smallish steps.

>I would start with the easier side now. It is necessary to have a
>reliable timebase. Consider a 64-bit value running at some clock
>source speed. It is really hard to have that reliable on PC hardware;
>the common base 8254 can be used for it but access is horribly slow.
>All other mechanisms (HPET, TSC) are problematic - they need probing
>and checking that they are correct and synchronous between cores, do
>not change with sleep modes, etc. It is a really difficult task which
>is solved by thousands of lines of code in the Linux kernel.
>
>But ARM and PowerPC based systems usually provide a reasonable
>timer source register which is synchronized over all cores.
>Unfortunately, the ARM ones usually provide only a 32-bit wide
>register. I have solved the problem of how to extend that 32-bit
>counter to 64 bits for a friend of mine who worked at BlackBerry.
>Their phone platform uses Cortex-A and QNX. The design constraints
>were given by the use case - userspace event timestamping in a QML
>profiler. This adds the constraint that the code can be called on
>multiple cores concurrently, a mutex would degrade performance
>horribly, privileged instructions cannot be used, and the value
>available from the core was only 32 bits.
>
>I have designed the attached code fragments for him and he has
>written some Qt derived code which was used in Q10 phone debugging
>builds.
>
>The main idea is to extend the counter to more than 60 bits without
>locking and to use GCC builtin atomic support to ensure that a
>counter overflow results in only a single increment of the higher
>part of the value.
>
>The only requirement for correct function is that clockCyclesExt()
>is called at least once per half of the counter overflow period
>and that its execution is not interrupted for longer than the
>equivalent time. The code even minimizes cache write contention
>cases.
>
>What do you think about use of this approach in RTEMS?

Sounds reasonable. The counter overflow period should be relatively long, and this calling requirement needs to be considered when determining the maximum length of time allowed between ticks.
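
For anyone following along, the general shape of that lock-free extension looks roughly like the sketch below. This is only my reading of the idea, not Pavel's attached fragments; read_cycle_counter_32() stands in for whatever 32-bit free-running counter the hardware provides.

#include <stdint.h>

/* Last extended value published by any core; updated only with CAS. */
static uint64_t last_cycles_ext;

/* Stand-in for reading the hardware's 32-bit free-running counter. */
extern uint32_t read_cycle_counter_32(void);

uint64_t clock_cycles_ext(void)
{
  uint64_t last = __atomic_load_n(&last_cycles_ext, __ATOMIC_RELAXED);
  uint32_t now  = read_cycle_counter_32();
  uint64_t high = last & ~UINT64_C(0xffffffff);

  /*
   * If the hardware counter is below the low half of the last published
   * value, it wrapped since the previous call, so bump the high half.
   * This is only correct if this function runs at least once per half
   * of the counter overflow period.
   */
  if (now < (uint32_t)last)
    high += UINT64_C(1) << 32;

  uint64_t ext = high | now;

  /*
   * Publish the newer value.  Concurrent callers that observe the same
   * wrap compute the same high half, so even if this CAS races and
   * fails, the high part advances only once.
   */
  if (ext > last)
    __atomic_compare_exchange_n(&last_cycles_ext, &last, ext,
                                0, __ATOMIC_RELAXED, __ATOMIC_RELAXED);

  return ext;
}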

>Then the next step is to base timing on values which are not based
>on ticks. I have seen the discussion about the NTP time format
>(integer seconds + 1/2^32 fractions). The other option is 64-bit
>nanoseconds, which is better with regard to the 2038 overflow
>problem. The priority queue for fine-grained timer ordering is a
>tough task. It would be worth having all operations take an
>additional parameter giving the required precision for each
>interval/time event etc ...

I have discussed offline converting the delta chains in the watchdog to use timestamps. This would also let us support higher granularity absolute time events. Right now the TOD chain has second granularity.

I don't have any good solution for that yet, but it could be a discrete unit of work.
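
Just to make the two formats under discussion concrete, something along these lines (illustrative types and conversions only, not tied to any existing RTEMS type):

#include <stdint.h>

/* 64-bit signed nanoseconds: direct arithmetic, orders timers with a
 * plain integer compare, and does not wrap until roughly the year 2262. */
typedef int64_t ns_time_t;

/* NTP-style fixed point: upper 32 bits are whole seconds, lower 32 bits
 * are fractions of a second in units of 1/2^32 (about 233 ps). */
typedef uint64_t ntp_time_t;

static inline ntp_time_t ns_to_ntp(ns_time_t ns)
{
  uint64_t sec  = (uint64_t)(ns / 1000000000);
  uint64_t frac = ((uint64_t)(ns % 1000000000) << 32) / 1000000000;
  return (sec << 32) | frac;
}

static inline ns_time_t ntp_to_ns(ntp_time_t t)
{
  uint64_t sec  = t >> 32;
  uint64_t frac = t & UINT64_C(0xffffffff);
  return (ns_time_t)(sec * 1000000000 + ((frac * 1000000000) >> 32));
}

Either representation lets the chains be ordered with a single 64-bit compare, which is the property needed for higher granularity absolute time events.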

>But that is for a longer discussion and an incremental solution.
>
>I cannot provide my full time for such enhancements anyway.

None of us can. We have to have a plan with the steps and nibble away at them.


>But it could be a nice project if funding is found. I have a friend
>who has grants from ESA to develop theory for the fusion of precise
>time sources (atomic clocks etc.) and who works on real hardware for
>satellite based clock synchronization too. We spoke about Linux
>kernel NTP time synchronization and the PLL loop a long time ago and
>both came to the same conclusion about how it should be done the
>right way. It would be interesting to have this solution in RTEMS as
>well. But to do it right it would require some agency/company funded
>project. We even have networking cards with full IEEE-1588 HW support
>from Intel there, and some articles about our findings regarding the
>problem of synchronizing time, where the most problematic part is the
>latency between the ETHERNET card hardware and the CPU core. That is
>even more problematic than precise time over the local ETHERNET LAN
>... So I think that there are enough competent people to come up with
>something useful. But most of them cannot afford to work on it only
>for their pleasure.

This is likely not an area where volunteer effort will push it all the way through. 

>OK, that is a dump of some of my ideas.
>
>I need to switch to other HW testing now to keep our company
>and university projects above sea level.
  
+1 I was reviewing a document while letting all the BSPs build :)

>Best wishes,
>
>                  Pavel



