tms570 Cortex-R performance counters and some ideas related to RTEMS timekeeping code

Fri Aug 22 16:44:11 UTC 2014

Hello Joel,

On Friday 22 of August 2014 17:25:24 Joel Sherrill wrote:
> Pushed.
>
> Followups can just be subsequent patches.

thanks, you are faster than light ...

As for the RTEMS timekeeping code, I can imagine how it could
look better. I do not like Clock_driver_nanoseconds_since_last_tick.
I am not even sure if it is really used by TOD (i.e. in ticker test
seems to print rounded values on our board).

In the fact I would like to see RTEMS to work completely tickless
on hardware with modern free runing timebase and easily updated
compare event hardware. That would allow to implement all POSIX time
related functions with resolution limited only by hardware.

Scheduler is a question. Wen more than one task of same priority
are ready to run then tick is easiest but even in such case
slice time can be computed and only event for its overflow timer event
is set.

But all that is huge amount of work.

I would start with easier side now. It is necessary to have reliable
timebase. Consider 64 bit value running at some clock source speed.
It is really hard to have that reliable on PC hardware, the common
base 8254 can be used for that but access is horribly slow. All other
mechanisms (HPET, TSC) are problematic - need probe and check that
they are correct and synchronous between cores, do not change with
sleep modes etc. Really difficult task which is solved by thousands
lines of code by Linux kernel.

But ARM and PowerPC based systems usually provide reasonable
timer source register which is synchronized over all cores.
Unfortuantelly, ARM ones provide usually only 32 bits wide register.
I have solved problem how to extend that 32 bit counter to 64
bit for one my friend who worked at BlackBerry. Their phones platform
uses Cortex-A and QNX. The design constrains has been given
by usecase - userspace events timestamping in QML profiller.
This adds constrain that code can be called on more cores concurrently,
using mutex would degrade performance horribly, privileged instructions
cannot be used and value available from core was only 32 bit.

I have designed for him attached code fragments and he has written
some Qt derived code which is was used in Q10 phone debugging builds.

The main ideas is to write extension to more than 60 bits without
locking and use GCC builtin atomic support to ensure that counter overflow
results only in single increment of higher value part.

The only requirement for correct function is that clockCyclesExt()
is called at least once per half of the counter overflow period
and its execution is not interrupted for longer than equivalent time.
Code even minimizes cache write contention cases.

What do you think about use of this approach in RTEMS?

Then next step is to base timing on values which are not based on
the ticks. I have seen that discussion about NTP time format
(integer seconds + 1/2^32 fractions). Other option is 64bit nsec
which is better regard 2038 overflow problem. The priority queue
for finegrained timers ordering is tough task. It would worth
to have all operations with additional paremeter about required precision
for each interval/time event etc ...

But that is for longer discussion and incremental solution.

I cannot provide my full time for such enhancements anyway.

But it could be nice project if funding is found. I have friend
who has grants from ESA to develop theory for precise time sources
fussion (atomic clocks etc) and works on real hardware for satelite
based clock synchronization too. We have spoken about Linux kernel
NTP time synchronization and PLL loop long time ago and both gone
to same conclusion how it should be done right way. I would be interresting
to have this solution in RTEMS as well. But to do it right it would
require some agency/company funded project. We have even networking
cards with full IEEE-1588 HW support there for Intel and some articles
about our findings regarding problem to synchronize time where most
problematic part are latencies between ETHERNET card hardware and
CPU core. They are even more problematic than precise time over
local ETHERNET LAN ... So I think that there is enough competent
people to come with something usesfull. But most of them cannot
afford to work on it only for their pleassure.

OK, that some dump of my ideas.

I need to switch to other HW testing now to sustain our company
and university project above sea level.

Best wishes,

                  Pavel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: atomic-extend-cc.c
Type: text/x-csrc
Size: 4481 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/devel/attachments/20140822/211108f0/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: atomic-extend-cc.cpp
Type: text/x-c++src
Size: 3615 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/devel/attachments/20140822/211108f0/attachment-0005.bin>