[RTEMS Project] #2556: Implement the O(m) Independence-Preserving Protocol (OMIP)
RTEMS trac
trac at rtems.org
Wed Jan 27 08:45:28 UTC 2016
#2556: Implement the O(m) Independence-Preserving Protocol (OMIP)
-----------------------------+-----------------------------
Reporter: sebastian.huber | Owner: sebastian.huber
Type: enhancement | Status: new
Priority: normal | Milestone: 4.12
Component: cpukit | Version: 4.12
Severity: normal | Keywords:
-----------------------------+-----------------------------
= Background =
The O(m) Independence-Preserving Protocol
([http://www.mpi-sws.org/~bbb/papers/pdf/ecrts13b.pdf OMIP]) is a
generalization of the priority inheritance protocol to clustered
scheduling which avoids the non-preemptive sections present with priority
boosting. The m denotes the number of processors in the system. Its
implementation requires an extension of the scheduler helping protocol
already used for the
[http://www-users.cs.york.ac.uk/~burns/MRSPpaper.pdf MrsP] semaphores.
However, the current implementation of the scheduler helping protocol has
two major issues, see Catellani, Sebastiano, Luca Bonato, Sebastian Huber,
and Enrico Mezzetti: Challenges in the Implementation of MrsP. In
Reliable Software Technologies - Ada-Europe 2015, pages 179–195, 2015.
Firstly, the run-time of some scheduler operations depends on the size of
the resource dependency tree. Secondly, the scheduler operations of
threads which don't use shared resources must deal with the scheduler
helping protocol in case an owner of a shared resource is somehow
involved.
To illustrate the second issue, let us look at the following example. We
have a system with eight processors and two L2 caches. We assign
processor 0 to a partition P for latency-sensitive real-time tasks (e.g.
sensor and actuator handling), processors 1, 2 and 3 to a cluster C,,A,,,
and the remaining processors to a cluster C,,B,, for soft real-time
worker tasks. The worker tasks use a shared resource, e.g. a file system
for data storage. Let us suppose a task R of partition P sends a message
to the workers. This may make a waiting worker ready, which in turn
pre-empts the owner of a shared resource. In this case the scheduler
helping protocol takes action and is carried out by the task R. This
contradicts the intended isolation of the scheduler instances.
The reason for this unfortunate coupling is a design issue of the
scheduler helping protocol implementation. Some scheduler operations may
return a thread in need of help. For example, if a thread is unblocked
which pre-empts an owner of a shared resource, then the pre-empted thread
is returned. Once a thread in need of help is returned, the ask for help
operation of the scheduler is executed. An alternative to this
return-value-based approach is the introduction of a pre-emption
intervention
during thread dispatching. Threads taking part in the scheduler helping
protocol indicate this with a positive resource count value. In case a
thread dispatch occurs and pre-empts an owner of a shared resource, the
scheduler ask for help operation is invoked. So, the work is carried out
on behalf of the thread which takes part in the scheduler helping
protocol.
To overcome the first issue, an improved resource dependency tracking is
required. One approach is to use a recursive red-black tree based data
structure, see #2412.
= Implementation =
There are several steps necessary to implement OMIP.
* Introduce per-scheduler locks.
* Enable context switches with interrupts enabled.
* Add a pre-emption intervention to the thread dispatch.
* Add a table for priority nodes to the thread control block. For each
scheduler instance there is one priority node.
* Update the table in case the thread blocks on a resource, a timeout
while waiting for a resource occurs, or ownership of a resource is
transferred to the thread.
* Use this table in the pre-emption intervention.
* Update the MrsP implementation to the new infrastructure.
Currently, only one scheduler lock for all scheduler instances is used.
This simplified the MrsP implementation, and due to the presence of the
Giant lock it was not an issue. With the elimination of the Giant lock,
however, we need one scheduler lock per scheduler instance to really
profit from the decoupling offered by clustered scheduling.
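The following is a minimal sketch of such a per-scheduler lock, using C11
atomics; the `Scheduler_Instance` type and the acquire/release helpers are
illustrative names only and not the existing RTEMS API.
{{{
#include <stdatomic.h>

/* One lock per scheduler instance instead of one global scheduler lock,
 * so that the clusters do not contend with each other. */
typedef struct Scheduler_Instance {
  atomic_flag lock;
  /* ... ready queues and other per-instance scheduler state ... */
} Scheduler_Instance;

static inline void scheduler_instance_acquire( Scheduler_Instance *s )
{
  while ( atomic_flag_test_and_set_explicit( &s->lock, memory_order_acquire ) ) {
    /* busy wait until the lock is released */
  }
}

static inline void scheduler_instance_release( Scheduler_Instance *s )
{
  atomic_flag_clear_explicit( &s->lock, memory_order_release );
}
}}}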
The current implementation of thread dispatching has some implications
with respect to the interrupt latency. It is crucial to preserve the
system invariant that a thread can execute on at most one processor in the
system at a time. This is accomplished with a boolean indicator in the
thread context. The processor architecture-specific context switch code
marks that a thread context is no longer executing and waits until the
heir context has stopped executing before it restores the heir context
and resumes execution of the heir thread (the boolean indicator is
basically a TTAS lock). So, there is one point in time in which a
processor is
without a thread. This is essential to avoid cyclic dependencies in case
multiple threads migrate at once. Otherwise some supervising entity is
necessary to prevent deadlocks. Such a global supervisor would lead to
scalability problems so this approach is not used. Currently the context
switch is performed with interrupts disabled. Thus in case the heir
thread is currently executing on another processor, the time of disabled
interrupts is prolonged since one processor has to wait for another
processor to make progress.
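The hand-over described above can be sketched as follows; the
`is_executing` flag and the function names are illustrative, the real code
lives in the architecture-specific context switch.
{{{
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
  atomic_bool is_executing; /* true while some processor runs this context */
  /* ... saved registers ... */
} Thread_Context;

static void context_switch( Thread_Context *executing, Thread_Context *heir )
{
  /* ... save the registers of the executing context ... */

  /* From now on this processor owns no thread context. */
  atomic_store_explicit( &executing->is_executing, false, memory_order_release );

  /* TTAS-style wait until the heir context stopped executing elsewhere.
   * With interrupts disabled this wait prolongs the interrupt latency. */
  while ( atomic_load_explicit( &heir->is_executing, memory_order_acquire ) ) {
    /* busy wait */
  }
  atomic_store_explicit( &heir->is_executing, true, memory_order_relaxed );

  /* ... restore the registers of the heir context and resume it ... */
}
}}}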
If we add pre-emption intervention to the thread dispatch sequence, then
there is an even greater need to avoid this issue with the interrupt
latency. Interrupts normally store the context of the interrupted thread
on its stack. In case a thread is marked as not executing, we must not
use its thread stack to store such an interrupt context. We cannot use
the heir's stack before the heir has stopped executing on another
processor. If we enable interrupts during this transition, then we have
to provide an alternative, thread-independent stack for interrupts in
this time frame.
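One way to provide such a stack is a per-processor interrupt stack area
that belongs to no thread; the following declaration is only an
illustration, the sizes and names are assumptions.
{{{
#include <stdint.h>

#define PROCESSOR_COUNT      8          /* assumption for this example */
#define INTERRUPT_STACK_SIZE (4 * 1024) /* assumption for this example */

/* Per-processor interrupt stack that is independent of any thread stack. */
typedef struct {
  uint8_t stack_area[ INTERRUPT_STACK_SIZE ];
} Per_CPU_Interrupt_Stack;

static Per_CPU_Interrupt_Stack interrupt_stacks[ PROCESSOR_COUNT ];

/* The architecture-specific interrupt prologue would switch the stack
 * pointer to the top of interrupt_stacks[ cpu_index ] before saving the
 * interrupted context, so a context marked as not executing never has its
 * thread stack used for an interrupt frame. */
}}}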
The pre-emption intervention should be added to `_Thread_Do_dispatch()`
before the heir is read, and it should perform the following pseudo-code
actions.
{{{
pre_emption_intervention(executing):
  if executing.resource_count > 0:
    executing.lock()
    if executing.is_ready():
      for scheduler in executing.schedulers:
        scheduler.lock()
      if !executing.is_scheduled():
        for scheduler in executing.schedulers:
          scheduler.ask_for_help(executing)
      for scheduler in executing.schedulers:
        scheduler.unlock()
    else if executing.active_help_level > 0:
      idle.use(executing.scheduler_node)
    executing.unlock()
}}}
The scheduler help operation affects multiple scheduler instances. In
terms of locking we have only two options:
* use a global scheduler lock, or
* obtain multiple per-scheduler locks at once.
A global scheduler lock is not an option. To avoid deadlocks, the
per-scheduler locks must be obtained in a fixed order. However, in this
case the per-scheduler locks will observe different worst-case and
average-case acquire times (depending on the order).
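A sketch of the fixed-order acquisition, reusing the illustrative
per-scheduler lock helpers from the sketch above:
{{{
#include <stddef.h>

/* Illustrative per-scheduler lock operations, see the sketch above. */
typedef struct Scheduler_Instance Scheduler_Instance;
void scheduler_instance_acquire( Scheduler_Instance *s );
void scheduler_instance_release( Scheduler_Instance *s );

/* Acquire several per-scheduler locks without deadlock by always taking
 * them in one global, fixed order, e.g. ascending scheduler instance
 * index.  Every code path that needs more than one of these locks must
 * use the same order. */
static void acquire_all_in_fixed_order( Scheduler_Instance **sorted, size_t count )
{
  for ( size_t i = 0; i < count; ++i ) {
    scheduler_instance_acquire( sorted[ i ] );
  }
}

/* The release order does not matter for deadlock freedom. */
static void release_all_locks( Scheduler_Instance **sorted, size_t count )
{
  for ( size_t i = 0; i < count; ++i ) {
    scheduler_instance_release( sorted[ i ] );
  }
}
}}}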
Use a recursive data structure to determine the highest priority available
to a thread for each scheduler instance, e.g.
{{{
typedef struct Thread_Priority_node {
  Priority_Control current_priority;
  Priority_Control real_priority;
  struct Thread_Priority_node *owner;
  RBTree_Node Node;
  RBTree_Control Inherited_priorities;
} Thread_Priority_node;

typedef struct {
  ...
  Thread_Priority_node *priority_nodes; /* One per scheduler instance */
  ...
} Thread_Control;
}}}
Initially a thread has a priority node reflecting its real priority. The
`Thread_Priority_node::owner` is `NULL`. The
`Thread_Priority_node::current_priority` is set to the real priority. The
`Thread_Priority_node::Inherited_priorities` is empty.
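A sketch of this initial state; `rbtree_initialize_empty()` stands in for
whatever operation empties the red-black tree and is an illustrative name
only.
{{{
static void thread_priority_node_initialize(
  Thread_Priority_node *node,
  Priority_Control      real_priority
)
{
  node->real_priority = real_priority;
  node->current_priority = real_priority; /* equals the real priority */
  node->owner = NULL;                     /* not waiting for any mutex owner */
  rbtree_initialize_empty( &node->Inherited_priorities );
}
}}}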
In case the thread must wait for ownership of a mutex, then it enqueues
its priority node in `Thread_Priority_node::Inherited_priorities` of the
mutex owner.
In case the thread is dequeued from the wait queue of a mutex, then it
dequeues its priority node in `Thread_Priority_node::Inherited_priorities`
of the previous mutex owner (ownership transfer) or the current mutex
owner (acquire timeout).
In case the minimum of the `Thread_Priority_node::real_priority` and the
`Thread_Priority_node::Inherited_priorities` changes, then
`Thread_Priority_node::current_priority` is updated. In case the
`Thread_Priority_node::owner` is not `NULL`, the priority change
propagates to the owner, and so on. In case
`Thread_Priority_node::current_priority` changes, the corresponding
scheduler is notified.
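A sketch of this propagation along the owner chain, using the
`Thread_Priority_node` structure above; `inherited_priorities_minimum()`
and `scheduler_notify_priority_change()` are illustrative helpers, not
existing functions.
{{{
/* Illustrative helpers: the minimum priority stored in the
 * Inherited_priorities tree (or the lowest possible priority if the tree
 * is empty) and the notification of the corresponding scheduler. */
Priority_Control inherited_priorities_minimum( const RBTree_Control *tree );
void scheduler_notify_priority_change( Thread_Priority_node *node );

/* Recompute current_priority as the minimum of the real priority and the
 * inherited priorities and propagate a change along the owner chain. */
static void thread_priority_node_update( Thread_Priority_node *node )
{
  while ( node != NULL ) {
    Priority_Control new_priority = node->real_priority;
    Priority_Control inherited =
      inherited_priorities_minimum( &node->Inherited_priorities );

    if ( inherited < new_priority ) {
      new_priority = inherited; /* a lower value means a higher priority */
    }

    if ( new_priority == node->current_priority ) {
      break; /* nothing changed, stop the propagation */
    }

    node->current_priority = new_priority;
    scheduler_notify_priority_change( node );

    node = node->owner; /* continue with the current mutex owner, if any */
  }
}
}}}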
Use the thread lock to protect the priority nodes.
--
Ticket URL: <http://devel.rtems.org/ticket/2556>
RTEMS Project <http://www.rtems.org/>