chrisj at rtems.org
Wed Jan 14 01:40:53 UTC 2015
I have moved this to the user list as this is not about RTEMS development.
On 14/01/2015 2:40 am, Daniel Gutson wrote:
> We are thinking about a "supervisor" watchdog, which runs in a high
> priority task, and
> has the following characteristics:
> a) tasks that "want" to be supervised are registered in the supervisor watchdog
> b) each supervised task is in one of the following modes:
> - automatic supervision
> - manual supervision
> - sleeping
> c) in "automatic supervision" mode, the supervisor watchdog keeps
> track of the program counter of the task.
> When the PC is the same after N cycles, the watchdog performs a
> predefined action (e.g. reset).
> d) supervised tasks in "manual supervision" have to kick the watchdog
> explicitly (e.g. by invoking a function of the API).
> e) the watchdog leaves alone the tasks in sleeping mode.
> The idea of the "automatic supervision" mode is to avoid polluting the
> task code by spreading calls to the kick function through it, which is
> especially difficult when having to estimate the "distance" between
> these function calls.
> The idea of the "manual supervision" mode, which is rather
> traditional, is when the task executes tight inner loops.
> In this scheme, tasks should be in automatic mode as much as possible
> and switch to manual just in small bounded
> places of the code.
> Before entering into the discussion of the implementation, I'd like
> feedback about the general idea please.
I have not done anything clever that attempts to monitor a task's
program counter. Apart from the possible complexity, I would be concerned
about the processing time required in a high priority task.
In systems such as air traffic control voice switching I have opted for
a simple check-in type system. A module of code registers as an
important service. When the important module runs it checks out with a
time-out. The module must check in before the time-out expires.
Typically watchdog hardware needs to be hit more often than the time a
module may check out for, so I have a low priority interrupt that
decrements a counter and, while it is not zero, hits the watchdog hardware.
The watchdog supervisor updates this counter as long as none of the
checked out users has timed out.
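
To make the scheme above concrete, here is a rough sketch in C. It is not
code from any of the systems mentioned and it does not use the RTEMS API.
Every name in it (wd_register, wd_check_out, wd_check_in, wd_supervise,
wd_tick_isr, hw_watchdog_hit, MAX_USERS, REFILL_TICKS) is made up for
illustration, the hardware hit is a stub that only counts calls, and the
tick source and locking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_USERS    8   /* assumed upper bound on registered modules */
#define REFILL_TICKS 5   /* assumed hardware hits granted per healthy pass */

typedef struct {
    bool     registered;
    bool     checked_out;
    uint32_t deadline;   /* tick at which the current check-out expires */
} wd_user;

static wd_user users[MAX_USERS];
static volatile uint32_t hw_hits_left;  /* counter shared with the interrupt */
static unsigned hw_hit_count;           /* stub bookkeeping only */

/* Stub: a real system would write to the watchdog hardware here. */
static void hw_watchdog_hit(void) { hw_hit_count++; }

/* A module registers as an important service; returns its id or -1. */
int wd_register(void)
{
    for (int i = 0; i < MAX_USERS; i++) {
        if (!users[i].registered) {
            users[i].registered = true;
            users[i].checked_out = false;
            return i;
        }
    }
    return -1;
}

/* The module checks out with a time-out when it starts its work. */
void wd_check_out(int id, uint32_t now, uint32_t timeout)
{
    assert(id >= 0 && id < MAX_USERS && users[id].registered);
    users[id].deadline = now + timeout;
    users[id].checked_out = true;
}

/* The module must check in before the time-out expires. */
void wd_check_in(int id)
{
    assert(id >= 0 && id < MAX_USERS && users[id].registered);
    users[id].checked_out = false;
}

/* Supervisor pass: update the counter only if no checked out user
 * has timed out. Returns false when a time-out was found. */
bool wd_supervise(uint32_t now)
{
    for (int i = 0; i < MAX_USERS; i++) {
        if (users[i].registered && users[i].checked_out &&
            (int32_t)(now - users[i].deadline) >= 0) {
            return false;  /* counter is left to run down */
        }
    }
    hw_hits_left = REFILL_TICKS;
    return true;
}

/* Low priority interrupt: decrement the counter and, while it is
 * not zero, hit the watchdog hardware. */
void wd_tick_isr(void)
{
    if (hw_hits_left > 0) {
        hw_hits_left--;
        hw_watchdog_hit();
    }
}
```

Once a module overruns its time-out the supervisor stops updating the
counter, the interrupt runs it down to zero, the hardware is no longer hit
and the hardware watchdog expires by itself. REFILL_TICKS would have to be
sized against the supervisor period and the hardware time-out.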