nanosleep.c remarks

Pavel Pisa pisa at cmp.felk.cvut.cz
Sat Jul 30 17:40:25 UTC 2016


Hello Gedare and Sebastian,

now that clock_nanosleep is in place, I am trying to
analyze the consequences and I have some questions.

The first one: why is _Nanosleep_Pseudo_queue required
there? nanosleep is a critical function for realtime use and
it is quite possible that many threads on several CPUs
use it concurrently. But _Thread_queue_Enqueue calls
_Thread_queue_Acquire( the_thread_queue, &queue_context.Lock_context );
unconditionally. This leads to _SMP_ticket_lock_Acquire on SMP.
So all calls are serialized and contend
for cache lines. I do not understand why a queue
normally used for wakeup request distribution is required
for nanosleep. The original code only selected the appropriate
state and activated the scheduler.

Original nanosleep

  /*
   *  Block for the desired amount of time
   */
  _Thread_Disable_dispatch();
    executing = _Thread_Executing;
    _Thread_Set_state(
      executing,
      STATES_DELAYING | STATES_INTERRUPTIBLE_BY_SIGNAL
    );
    _Watchdog_Initialize(
      &executing->Timer,
      _Thread_Delay_ended,
      0,
      executing
    );
    _Watchdog_Insert_ticks( &executing->Timer, ticks );
  _Thread_Enable_dispatch();

Current nanosleep

  /*
   *  Block for the desired amount of time
   */
  _Thread_queue_Enqueue(
    &_Nanosleep_Pseudo_queue,
    &_Thread_queue_Operations_FIFO,
    executing,
    STATES_DELAYING | STATES_INTERRUPTIBLE_BY_SIGNAL,
    ticks,
    discipline,
    1
  );

But if the simple _Thread_Set_state approach is not supported, then
how is it that it is still used in rtems_task_wake_after?

rtems_status_code rtems_task_wake_after(
  rtems_interval ticks
)
{
  /*
   * It is critical to obtain the executing thread after thread dispatching is
   * disabled on SMP configurations.
   */
  Thread_Control  *executing;
  Per_CPU_Control *cpu_self;

  cpu_self = _Thread_Dispatch_disable();
    executing = _Thread_Executing;

    if ( ticks == 0 ) {
      _Thread_Yield( executing );
    } else {
      _Thread_Set_state( executing, STATES_DELAYING );
      _Thread_Wait_flags_set( executing, THREAD_WAIT_STATE_BLOCKED );
      _Thread_Timer_insert_relative(
        executing,
        cpu_self,
        _Thread_Timeout,
        ticks
      );
    }
  _Thread_Dispatch_enable( cpu_self );
  return RTEMS_SUCCESSFUL;
}

Then the time stuff. 

nanosleep_helper() does not distinguish between CLOCK_REALTIME
and CLOCK_MONOTONIC when it computes the remaining time (rmtp).
But the intention of this field is that if you call
nanosleep/clock_nanosleep again with the same parameters and rmtp
as the time to wait (when TIMER_ABSTIME is not set), then
the final wake time should be approximately the same as if there had
been no interruption. If we consider the difference POSIX requires
between CLOCK_REALTIME and CLOCK_MONOTONIC and the possibility
of adjusting the realtime clock, then it would not work as expected.
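
A minimal sketch of one way to make the remaining time independent of
realtime clock adjustments for a relative sleep: record the absolute
deadline on the monotonic clock at entry and, on interruption, report
the deadline minus the current monotonic time. The helper name
timespec_remaining is hypothetical and not part of RTEMS; both inputs
are assumed to be normalized (0 <= tv_nsec < 1000000000).

```c
#include <time.h>

/* Hypothetical helper: rem = deadline - now, clamped to zero. */
static void timespec_remaining(
  const struct timespec *deadline,
  const struct timespec *now,
  struct timespec       *rem
)
{
  time_t sec  = deadline->tv_sec - now->tv_sec;
  long   nsec = deadline->tv_nsec - now->tv_nsec;

  /* Borrow one second if the nanosecond part went negative. */
  if ( nsec < 0 ) {
    nsec += 1000000000L;
    --sec;
  }

  /* A deadline that has already passed leaves nothing to wait for. */
  if ( sec < 0 ) {
    sec  = 0;
    nsec = 0;
  }

  rem->tv_sec  = sec;
  rem->tv_nsec = nsec;
}
```

Sleeping again for the returned value then ends at roughly the same
monotonic deadline, however many times the sleep is interrupted.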

By the way, _Timespec_From_ticks works as expected only for the
first 1.19 hours after boot if used for absolute time (it is not used
that way in nanosleep), because the 32-bit microsecond count wraps
at 2^32 us.
For relative time, if nanosleep is used for a delay longer
than 4294 seconds, then the rmtp result is complete garbage.

void _Timespec_From_ticks(
  uint32_t         ticks,
  struct timespec *time
)
{
  uint32_t    usecs;

  usecs = ticks * rtems_configuration_get_microseconds_per_tick();

  time->tv_sec  = usecs / TOD_MICROSECONDS_PER_SECOND;
  time->tv_nsec = (usecs % TOD_MICROSECONDS_PER_SECOND) *
                    TOD_NANOSECONDS_PER_MICROSECOND;
}
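
A minimal sketch of how the wrap could be avoided by widening the
product to 64 bits before the division. The function name
timespec_from_ticks_64 is hypothetical, and the usec_per_tick parameter
stands in for rtems_configuration_get_microseconds_per_tick() so that
the example is self-contained.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical variant of the conversion above; usec_per_tick stands in
 * for rtems_configuration_get_microseconds_per_tick().
 */
static void timespec_from_ticks_64(
  uint32_t         ticks,
  uint32_t         usec_per_tick,
  struct timespec *time
)
{
  /* Widen before multiplying so the product cannot wrap at 2^32 us. */
  uint64_t usecs = (uint64_t) ticks * usec_per_tick;

  time->tv_sec  = (time_t) ( usecs / 1000000u );
  time->tv_nsec = (long) ( usecs % 1000000u ) * 1000L;
}
```

With a 1000 us tick, 5000000 ticks convert to 5000 seconds here, while
the 32-bit product in the original would wrap past 4294 seconds.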

If we consider that the crystal oscillator is not perfect, then the
value of rtems_configuration_get_microseconds_per_tick has to be
tuned at runtime. The problem is how to avoid shifting the time when
the scale is changed at some point other than ticks == 0. That means
using y = a * x + b there and, each time a is changed from a1 to a2,
changing b such that a2 * x + b2 = a1 * x + b1,
to ensure tick to usec monotonicity in the conversion of
monotonic time from ticks to timespec.
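
The rebasing described above can be sketched as follows. Instead of
storing b directly, this sketch stores the point (base_ticks, base_usec)
at which the rate last changed, which is the same line written as
y = a * (x - x0) + y0. All names are hypothetical and nothing here is
existing RTEMS API.

```c
#include <stdint.h>

/* Hypothetical state for a piecewise linear tick to microsecond mapping. */
typedef struct {
  uint64_t base_ticks;    /* tick count at the last rate change */
  uint64_t base_usec;     /* microseconds accumulated up to base_ticks */
  uint32_t usec_per_tick; /* current scale factor (the "a" above) */
} tick_scale;

/* Convert a tick count at or after base_ticks to microseconds. */
static uint64_t tick_scale_to_usec( const tick_scale *s, uint64_t ticks )
{
  return s->base_usec + ( ticks - s->base_ticks ) * s->usec_per_tick;
}

/* Change the rate at tick "now" without a jump in the converted time. */
static void tick_scale_set_rate(
  tick_scale *s,
  uint64_t    now,
  uint32_t    new_usec_per_tick
)
{
  s->base_usec     = tick_scale_to_usec( s, now );
  s->base_ticks    = now;
  s->usec_per_tick = new_usec_per_tick;
}
```

The converted value at the moment of the change is identical before and
after the call, so the mapping stays continuous and monotonic.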

Another problem is that for a higher frequency tick or timing
source the value of rtems_configuration_get_microseconds_per_tick
is small, so its relative precision is insufficient.

For clock_nanosleep we get to _TOD_Absolute_timeout_to_ticks
which calls for CLOCK_MONOTONIC in

I have mostly lost track of the call chain there.
bintime2timespec is provided by Newlib as part of the introduction
of the BSD time framework

https://devel.rtems.org/ticket/2271
https://www.daemon-systems.org/man/timecounter.9.html

The struct timecounter structure seems almost sane according to
the documentation. But u_int64_t tc_frequency without
a right shift requires an unnecessarily wide multiplication
or, even worse, a division, and the relative resolution can be
low in some cases.
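
To illustrate what is meant by shifting right, here is a minimal sketch
of a precomputed multiplier and shift, a technique used by other
kernels and not taken from the timecounter code. The names cycle_scale,
cycle_scale_init and cycles_to_ns are hypothetical. The caller is
assumed to pick a shift for which the multiplier still fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical precomputed scale: ns = (delta * mult) >> shift. */
typedef struct {
  uint32_t mult;
  uint32_t shift;
} cycle_scale;

/*
 * Compute mult once for a given frequency.  freq_hz must be non-zero
 * and shift must be small enough that mult fits in 32 bits.
 */
static cycle_scale cycle_scale_init( uint32_t freq_hz, uint32_t shift )
{
  cycle_scale s;

  s.shift = shift;
  s.mult  = (uint32_t) ( ( (uint64_t) 1000000000u << shift ) / freq_hz );
  return s;
}

/* One 32 x 32 -> 64 bit multiplication and a shift, no division. */
static uint64_t cycles_to_ns( const cycle_scale *s, uint32_t delta )
{
  return ( (uint64_t) delta * s->mult ) >> s->shift;
}
```

The division happens only when the frequency is set, and a larger shift
gives the multiplier more significant bits and therefore finer relative
resolution.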

I am trying to study the code

static inline void _TOD_Get_zero_based_uptime_as_timespec(
  struct timespec *time
)
{
  _Timecounter_Nanouptime( time );
  --time->tv_sec;
}

where the seconds decrement seems suspicious to me.

There seem to be data structures for precise time computation
and synchronization (sys/timeffc.h, etc.) but I am not sure
whether any of them are used.

The general rule for POSIX systems is that CLOCK_MONOTONIC and
CLOCK_REALTIME are scaled in sync; base and
step corrections are applied to CLOCK_REALTIME only.
But there seem to be two relatively independent paths
in the current sources.

Another strict requirement for nanosleep is that it has
to suspend the task for at least the specified time. But I am
not sure whether there is such rounding up in the current code.
This is critical if users build their own timer
queue: a premature wakeup leads to repeated redundant nanosleep
calls and (in the case of rounding down) it can even result in a busy
loop for the last tick, for example.
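
A minimal sketch of a conversion that rounds up, so that the sleep
cannot end early. The function name interval_to_ticks_round_up and the
nsec_per_tick parameter are hypothetical. The interval is assumed to be
non-negative, normalized, and small enough that its nanosecond count
fits in 64 bits.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: convert a relative interval to ticks so that the
 * sleep is never shorter than requested.  nsec_per_tick must be non-zero.
 */
static uint64_t interval_to_ticks_round_up(
  const struct timespec *interval,
  uint32_t               nsec_per_tick
)
{
  uint64_t nsec = (uint64_t) interval->tv_sec * 1000000000u
                + (uint64_t) interval->tv_nsec;

  /* Ceiling division: a partial tick counts as a whole one. */
  uint64_t ticks = ( nsec + nsec_per_tick - 1 ) / nsec_per_tick;

  /*
   * The current tick is already partly elapsed, so add one more tick
   * to cover it; a zero interval needs no wait at all.
   */
  return ticks == 0 ? 0 : ticks + 1;
}
```

The extra tick is needed because the first tick interrupt after the
call may arrive almost immediately, which would otherwise shorten the
sleep by up to one tick period.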

Generally there seem to be many multiplications, divisions,
etc., at least in the clock_nanosleep path.

I do not have the full picture yet. But my feeling is
that there are at least some problematic things,
which I have tried to analyze.

But generally, it is great that clock_nanosleep
is supported, the same as some other POSIX timed IPC
variants.

Best wishes,

               Pavel


