[RTEMS Project] #2307: Improved watchdog implementation

RTEMS trac trac at rtems.org
Wed May 20 07:13:01 UTC 2015


#2307: Improved watchdog implementation
-----------------------------+------------------------------
 Reporter:  sebastian.huber  |       Owner:  sebastian.huber
     Type:  enhancement      |      Status:  accepted
 Priority:  normal           |   Milestone:  4.11
Component:  cpukit           |     Version:  4.11
 Severity:  normal           |  Resolution:
 Keywords:                   |
-----------------------------+------------------------------
Description changed by sebastian.huber:

Old description:

> = Benefit =
>
> Improved average-case and worst-case performance.  Uni-processor
> configurations will also benefit from some changes, e.g. the watchdog
> insert without restarts.
>
> = Problem Description =
>
> The timekeeping is an important part of an operating system.  It includes
>
> * timer services,
> * timeout options for operating system operations.
>
> On RTEMS the timekeeping is implemented using
>
> * watchdog delta chains for timer and timeout services, and
> * a clock tick function {{{rtems_clock_tick()}}}.
>
> == Global Watchdog Delta Chain ==
>
> RTEMS uses two global watchdog delta chains.  One for clock tick based
> timers and one for seconds based timers.  This approach is not scalable,
> since each additional processor will add more load to the watchdog
> handler making it a bottleneck in the system.  Inter-processor interrupts
> must be issued to propagate scheduling decisions from the processor
> serving the clock tick interrupt to the processor assigned to execute a
> thread.  The insert operation has O(n) time complexity so it is desirable
> to have short watchdog delta chains.  Also removal of watchdog controls
> leads to a restart of the insert procedure.
>
> == Giant Lock for Watchdog Delta Chain ==
>
> The watchdog handler disables interrupts to protect critical sections.
> Since this is insufficient on SMP configurations the complete clock tick
> function is executed under Giant lock protection on SMP.  Thus the Giant
> lock section time depends on the execution time of all timer services in
> one clock tick, which can be arbitrarily long and is out of control of
> the operating system.
>
> = Problem Solution =
>
> Move the global watchdog state variables
>
> * {{{_Watchdog_Sync_level}}},
> * {{{_Watchdog_Sync_count}}},
> * {{{_Watchdog_Ticks_chain}}}, and
> * {{{_Watchdog_Seconds_chain}}}
>
> into a watchdog context structure and modify the watchdog operations to
> use a watchdog context instead of global variables directly.
>
> Add an
> [http://www.rtems.org/onlinedocs/doxygen/cpukit/html/group__ClassicINTRLocks.html
> interrupt lock] to protect the watchdog state changes.
>
> Replace the watchdog synchronization level and count, because the current
> approach does not work on SMP, since the interrupts can happen not only
> on the local processor.  The watchdog operations are
>
> * watchdog insert (requires a forward iteration of the delta chain, O(n);
> due to the possibility of restarts, it has an essentially unbounded
> execution time with the current implementation),
> * watchdog removal (constant time operation, O(1)), and
> * watchdog adjust (requires a forward iteration of the delta chain in the
> worst-case, O(n)).
>
> The watchdog synchronization level and count is used to detect the
> removal of watchdogs during a watchdog insert procedure.  In case a
> removal is detected the iteration restarts.  This can be avoided using a
> technique similar to the SMP lock statistics iteration
> ({{{SMP_lock_Stats_iteration_context}}}).  This would turn all watchdog
> operations into worst-case time O(n) operations.  For insert and adjust n
> is the count of watchdogs in the chain.  For removal n is the count of
> threads performing an insert operation.
>
> Move the watchdog context into the scheduler context to use one watchdog
> context per scheduler instance.  Take care that active watchdogs move in
> case of a scheduler change of a thread.

New description:

 = Benefit =

 Improved average-case and worst-case performance.  Uni-processor
 configurations will also benefit from some changes, e.g. the watchdog
 insert without restarts.

 = Problem Description =

 Timekeeping is an important part of an operating system.  It includes

 * timer services, and
 * timeout options for operating system operations.

 In RTEMS, timekeeping is implemented using

 * watchdog delta chains for timer and timeout services, and
 * a clock tick function {{{rtems_clock_tick()}}}.

 == Global Watchdog Delta Chain ==

 RTEMS uses two global watchdog delta chains: one for clock tick based
 timers and one for seconds based timers.  This approach is not scalable,
 since each additional processor adds more load to the watchdog handler,
 making it a bottleneck in the system.  Inter-processor interrupts must be
 issued to propagate scheduling decisions from the processor serving the
 clock tick interrupt to the processor assigned to execute a thread.  The
 insert operation has O(n) time complexity, so it is desirable to have short
 watchdog delta chains.  In addition, the removal of a watchdog control
 during an insert leads to a restart of the insert procedure.
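
 As an illustration of why the insert is O(n), here is a minimal sketch of a
 delta chain in C.  All names ({{{demo_watchdog}}}, {{{demo_delta_chain}}},
 {{{demo_chain_insert()}}}, {{{demo_chain_tick()}}}) are hypothetical and do
 not correspond to the RTEMS implementation; locking and restarts are left
 out on purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only, not the RTEMS API. */
typedef struct demo_watchdog {
  struct demo_watchdog *next;
  uint32_t delta; /* ticks to wait after the predecessor has fired */
} demo_watchdog;

typedef struct {
  demo_watchdog *head;
} demo_delta_chain;

/* Insert wd so that it fires `ticks` ticks from now: O(n) forward walk. */
static void demo_chain_insert(
  demo_delta_chain *chain,
  demo_watchdog    *wd,
  uint32_t          ticks
)
{
  demo_watchdog **link = &chain->head;

  assert(wd != NULL);

  /* Skip every watchdog that fires no later than the new one. */
  while (*link != NULL && (*link)->delta <= ticks) {
    ticks -= (*link)->delta;
    link = &(*link)->next;
  }

  wd->delta = ticks;
  wd->next = *link;

  /* The successor now waits relative to the new watchdog. */
  if (wd->next != NULL) {
    wd->next->delta -= ticks;
  }

  *link = wd;
}

/* Advance time by one tick and return how many watchdogs expired. */
static unsigned demo_chain_tick(demo_delta_chain *chain)
{
  unsigned fired = 0;

  if (chain->head != NULL && chain->head->delta > 0) {
    --chain->head->delta;
  }

  while (chain->head != NULL && chain->head->delta == 0) {
    demo_watchdog *wd = chain->head;

    chain->head = wd->next;
    wd->next = NULL;
    ++fired;
  }

  return fired;
}
```

 Only the head is touched on each tick, which is what makes the delta
 encoding attractive; the price is the linear walk on insert.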

 == Giant Lock for Watchdog Delta Chain ==

 The watchdog handler disables interrupts to protect critical sections.
 Since this is insufficient on SMP configurations, the complete clock tick
 function is executed under Giant lock protection on SMP.  Thus the Giant
 lock section time depends on the execution time of all timer services in
 one clock tick, which can be arbitrarily long and is outside the control
 of the operating system.

 = Problem Solution =

 Move the global watchdog state variables

 * {{{_Watchdog_Sync_level}}},
 * {{{_Watchdog_Sync_count}}},
 * {{{_Watchdog_Ticks_chain}}}, and
 * {{{_Watchdog_Seconds_chain}}}

 into a watchdog context structure and modify the watchdog operations to
 use a watchdog context instead of global variables directly.

 Add an
 [http://www.rtems.org/onlinedocs/doxygen/cpukit/html/group__ClassicINTRLocks.html
 interrupt lock] to protect the watchdog state changes.
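
 A rough sketch of what such a context could look like follows.  The type
 and function names are hypothetical, and the spin lock built on C11
 {{{atomic_flag}}} is only a stand-in for an RTEMS interrupt lock so that
 the example compiles outside of RTEMS.  The removal shown is the O(1) case
 on a doubly linked delta chain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only, not the RTEMS API. */
typedef struct demo_watchdog {
  struct demo_watchdog *prev;
  struct demo_watchdog *next;
  uint32_t delta; /* ticks to wait after the predecessor has fired */
} demo_watchdog;

/*
 * Stand-in for an interrupt lock.  A real interrupt lock would also
 * disable interrupts on the local processor; this one only spins.
 */
typedef struct {
  atomic_flag locked;
} demo_lock;

/* State that used to be global, gathered into one context. */
typedef struct {
  demo_lock      lock;
  demo_watchdog *ticks_chain;
  demo_watchdog *seconds_chain;
} demo_watchdog_context;

#define DEMO_WATCHDOG_CONTEXT_INITIALIZER { { ATOMIC_FLAG_INIT }, NULL, NULL }

static void demo_lock_acquire(demo_lock *lock)
{
  while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire)) {
    /* Busy wait until the current owner releases the lock. */
  }
}

static void demo_lock_release(demo_lock *lock)
{
  atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Remove wd from the given chain of the context in constant time. */
static void demo_watchdog_remove(
  demo_watchdog_context *ctx,
  demo_watchdog        **chain,
  demo_watchdog         *wd
)
{
  assert(wd != NULL);

  demo_lock_acquire(&ctx->lock);

  /* The successor inherits the waiting time of the removed watchdog. */
  if (wd->next != NULL) {
    wd->next->delta += wd->delta;
    wd->next->prev = wd->prev;
  }

  if (wd->prev != NULL) {
    wd->prev->next = wd->next;
  } else {
    *chain = wd->next;
  }

  wd->prev = NULL;
  wd->next = NULL;

  demo_lock_release(&ctx->lock);
}
```

 Passing the context explicitly is what later allows more than one
 instance to exist, instead of every operation reaching for the same
 global chains.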

 Replace the watchdog synchronization level and count, because the current
 approach does not work on SMP, since interrupts can occur on processors
 other than the local one.  The watchdog operations are

 * watchdog insert (requires a forward iteration of the delta chain, O(n);
 due to the possibility of restarts, it has an essentially unbounded
 execution time with the current implementation),
 * watchdog removal (constant time operation, O(1)), and
 * watchdog adjust (requires a forward iteration of the delta chain in the
 worst-case, O(n)).

 The watchdog synchronization level and count are used to detect the removal
 of watchdogs during a watchdog insert procedure.  If a removal is detected,
 the iteration restarts.  These restarts can be avoided using a technique
 similar to the SMP lock statistics iteration
 ({{{SMP_lock_Stats_iteration_context}}}).  This would turn all watchdog
 operations into worst-case time O(n) operations.  For insert and adjust, n
 is the count of watchdogs in the chain.  For removal, n is the count of
 threads performing an insert operation.
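
 One possible reading of this technique, sketched in C under several
 assumptions: each inserting thread registers an iterator with the chain,
 and a removal walks the registered iterators and moves any iterator that
 points at the removed watchdog on to its successor.  All names are
 hypothetical.  Locking, the final linking of the new watchdog, and the
 fix-up needed for concurrent inserts are omitted, so this shows only the
 interaction between one insert iteration and a removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only, not the RTEMS API. */
typedef struct demo_watchdog {
  struct demo_watchdog *prev;
  struct demo_watchdog *next;
  uint32_t delta; /* ticks to wait after the predecessor has fired */
} demo_watchdog;

typedef struct demo_iterator {
  struct demo_iterator *next; /* next registered iterator */
  demo_watchdog *current;     /* next watchdog to examine, NULL at the end */
  uint32_t remaining;         /* ticks left relative to current's predecessor */
} demo_iterator;

typedef struct {
  demo_watchdog *head;
  demo_iterator *iterators; /* one entry per insert in progress */
} demo_chain;

/* Register an insert iteration for a watchdog firing in `ticks` ticks. */
static void demo_iterator_begin(demo_chain *chain, demo_iterator *it, uint32_t ticks)
{
  assert(it != NULL);
  it->current = chain->head;
  it->remaining = ticks;
  it->next = chain->iterators;
  chain->iterators = it;
}

/* One bounded step; returns 1 once the new watchdog belongs before current. */
static int demo_iterator_step(demo_iterator *it)
{
  if (it->current == NULL || it->current->delta > it->remaining) {
    return 1;
  }

  it->remaining -= it->current->delta;
  it->current = it->current->next;
  return 0;
}

/* Unregister the iteration once the insert is done. */
static void demo_iterator_end(demo_chain *chain, demo_iterator *it)
{
  demo_iterator **link = &chain->iterators;

  while (*link != NULL && *link != it) {
    link = &(*link)->next;
  }

  if (*link != NULL) {
    *link = it->next;
  }
}

/* Removal: O(number of registered iterators), no restart of any insert. */
static void demo_watchdog_remove(demo_chain *chain, demo_watchdog *wd)
{
  demo_iterator *it;

  /* Move iterators off the watchdog that is about to disappear. */
  for (it = chain->iterators; it != NULL; it = it->next) {
    if (it->current == wd) {
      it->current = wd->next;
    }
  }

  /* The successor inherits the waiting time of the removed watchdog. */
  if (wd->next != NULL) {
    wd->next->delta += wd->delta;
    wd->next->prev = wd->prev;
  }

  if (wd->prev != NULL) {
    wd->prev->next = wd->next;
  } else {
    chain->head = wd->next;
  }

  wd->prev = NULL;
  wd->next = NULL;
}
```

 Because an iterator's remaining ticks are kept relative to the predecessor
 of its current watchdog, moving it to the successor (which has absorbed
 the removed delta) keeps the iteration consistent, so the insert can
 continue where it was.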

--
Ticket URL: <http://devel.rtems.org/ticket/2307#comment:6>
RTEMS Project <http://www.rtems.org/>