Thread Life-Cycle Changes

Sebastian Huber sebastian.huber at embedded-brains.de
Fri Jul 5 11:01:09 UTC 2013


Hello,

the current implementation of thread life-cycle management in RTEMS has some
weaknesses that turn into severe problems on SMP.  In some cases it also leads
to POSIX and C++ standard conformance defects.  Currently, thread life-cycle
changes are protected by the thread dispatch disable level and, in some parts,
by the allocator mutex.  Since the thread dispatch disable level is actually a
giant mutex on SMP, this leads in combination with the allocator mutex to lock
order reversal problems.

One problematic path is the destruction of threads.  Currently we have the
following sequence:

1. Obtain the allocator mutex.

2. Disable thread dispatching.

3. Invalidate the object identifier.

4. Enable thread dispatching.

5. Call the thread delete extensions in the context of the deleting thread (not
necessarily the deleted thread).  The POSIX cleanup handlers are called here
from the POSIX delete extension.  POSIX mandates that the cleanup handlers are
executed in the context of the corresponding thread, so this is a POSIX
violation.

http://pubs.opengroup.org/onlinepubs/000095399/functions/xsh_chap02_09.html#tag_02_09_05_03

6. Remove the thread from the scheduling and watchdog resources.

7. Delete scheduling, floating-point, stack and extensions resources.  Now the
deleted thread may execute on a freed thread stack!

8. Free the object.  Now the object (thread control block) is available for
re-use, but it is still used by the thread!  Only the disabled thread
dispatching prevents chaos.

9. Release the allocator mutex.  Now we have a lock order reversal (see steps 1
and 2).

10. Enable thread dispatching.  Here a deleted executing thread disappears.
On SMP we also have a race condition here.  In detail, this step looks like
this:

   if ( _Thread_Dispatch_decrement_disable_level() == 0 ) {
     /*
      * Here another processor may re-use resources of a deleted executing
      * thread, e.g. the stack.
      */
     _Thread_Dispatch();
   }

To overcome the issues we need considerable implementation changes in Score.
The thread life-cycle state must be explicit and independent of the thread
dispatch disable level and allocator mutex protection.

The thread life-cycle is determined by the following actions:

   CREATE - A thread is created.

   START - Starts a thread.  The thread must be dormant to get started.

   RESTART - Restarts a thread.  The thread must not be dormant to get
   restarted.

   SUSPEND - Suspends a thread.

   RESUME - Resumes a thread.

   DELETE - Deletes a thread.

   SET_PROTECTION - Sets the new protection state and returns the previous.
   This action is new.
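
The two preconditions stated above for START and RESTART can be written as a
small validity check.  This is only a minimal sketch: the names
life_cycle_action and action_is_valid() are hypothetical and not part of the
RTEMS score, and treating all other actions as always valid is an assumption
made for brevity.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only, not the RTEMS score API. */
typedef enum {
  ACTION_START,
  ACTION_RESTART,
  ACTION_SUSPEND,
  ACTION_RESUME,
  ACTION_DELETE
} life_cycle_action;

/*
 * Checks only the two preconditions given in the list above: START requires
 * a dormant thread and RESTART requires a thread that is not dormant.  All
 * other actions are accepted unconditionally (an assumption of this sketch).
 */
static bool action_is_valid( life_cycle_action action, bool is_dormant )
{
  switch ( action ) {
    case ACTION_START:
      return is_dormant;
    case ACTION_RESTART:
      return !is_dormant;
    default:
      return true;
  }
}
```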

The following thread life-cycle states are proposed.  These states are
orthogonal to the blocking states, e.g. DORMANT, SUSPENDED etc.:

   PROTECTED - The thread is protected from immediate restart, delete and
   suspend actions.  Can be controlled by pthread_setcancelstate() for example.

   RESTART_REQUESTED - The thread was PROTECTED and a valid restart action was
   performed.  The new life-cycle state is determined once the PROTECTED state
   is cleared.

   SUSPEND_REQUESTED - The thread was PROTECTED and a valid suspend action was
   performed.  The new life-cycle state is determined once the PROTECTED state
   is cleared.

   DELETE_REQUESTED - The thread was PROTECTED and a valid delete action was
   performed.  The new life-cycle state is determined once the PROTECTED state
   is cleared.

If several requests are pending when the PROTECTED state is cleared, then
DELETE has the highest priority, followed by SUSPEND and RESTART.
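
The deferred requests and their priority could be sketched as follows.  This
is not an implementation proposal, just an illustration under assumptions: the
life-cycle state is kept as a bit set, all names (thread_sketch, request(),
set_protection() and the LIFE_* flags) are hypothetical, requests with lower
priority are simply dropped once a pending request is selected, and any
locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names; the life-cycle state is modelled as a bit set. */
enum {
  LIFE_PROTECTED         = 1u << 0,
  LIFE_RESTART_REQUESTED = 1u << 1,
  LIFE_SUSPEND_REQUESTED = 1u << 2,
  LIFE_DELETE_REQUESTED  = 1u << 3
};

typedef enum { DO_NOTHING, DO_RESTART, DO_SUSPEND, DO_DELETE } life_action;

typedef struct {
  unsigned life_state;
} thread_sketch;

/*
 * Returns the action to perform immediately.  If the thread is PROTECTED,
 * the request is only recorded and DO_NOTHING is returned.
 */
static life_action request( thread_sketch *t, life_action action )
{
  assert( action != DO_NOTHING );

  if ( ( t->life_state & LIFE_PROTECTED ) == 0 ) {
    return action;
  }

  switch ( action ) {
    case DO_RESTART: t->life_state |= LIFE_RESTART_REQUESTED; break;
    case DO_SUSPEND: t->life_state |= LIFE_SUSPEND_REQUESTED; break;
    case DO_DELETE:  t->life_state |= LIFE_DELETE_REQUESTED;  break;
    case DO_NOTHING: break;
  }

  return DO_NOTHING;
}

/*
 * SET_PROTECTION: sets the new protection state and returns the previous
 * one.  When the protection is cleared, the pending request with the highest
 * priority (DELETE, then SUSPEND, then RESTART) is reported via *pending.
 */
static bool set_protection(
  thread_sketch *t,
  bool           protect,
  life_action   *pending
)
{
  bool previous = ( t->life_state & LIFE_PROTECTED ) != 0;

  *pending = DO_NOTHING;

  if ( protect ) {
    t->life_state |= LIFE_PROTECTED;
  } else {
    unsigned state = t->life_state;

    if ( state & LIFE_DELETE_REQUESTED ) {
      *pending = DO_DELETE;
    } else if ( state & LIFE_SUSPEND_REQUESTED ) {
      *pending = DO_SUSPEND;
    } else if ( state & LIFE_RESTART_REQUESTED ) {
      *pending = DO_RESTART;
    }

    /* Assumption: clears PROTECTED and drops all recorded requests. */
    t->life_state = 0;
  }

  return previous;
}
```

With this sketch, a restart and a delete request issued while the thread is
PROTECTED result in a single pending DO_DELETE once the protection is cleared.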

The cleanup handlers must be invoked in the context of the corresponding
thread.  If a thread deletes itself, this is trivial.  If a thread is deleted
by another thread, we can restart the thread with a special reaper function
that performs the delete procedure.  For this special-case restart we must use
the current stack pointer of the deleted thread, since cleanup buffers may
reside on the thread stack.

Restarting threads executing on a remote processor requires further
investigation.

The release of vital thread resources, e.g. the thread control block and the
thread stack, must be split into two steps.  The first step places the
resources on a garbage list.  In the second step, when new threads are
allocated, the garbage list is checked for resources that can be released.
Each resource on the garbage list provides a release function that tries to
release the resource depending on its state (e.g. whether the corresponding
thread has stopped execution).
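
A garbage list of this kind might look roughly like the following sketch.  All
names (garbage_item, garbage_list, garbage_add(), garbage_collect() and the
stack_garbage example resource) are hypothetical, the list is a plain singly
linked list, and the synchronization needed on SMP is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
typedef struct garbage_item {
  struct garbage_item *next;
  /* Tries to release the resource, returns true if it was released. */
  bool ( *try_release )( struct garbage_item *item );
} garbage_item;

typedef struct {
  garbage_item *head;
} garbage_list;

/* Step one: place the resource on the garbage list. */
static void garbage_add( garbage_list *list, garbage_item *item )
{
  assert( item->try_release != NULL );
  item->next = list->head;
  list->head = item;
}

/*
 * Step two: intended to be called when a new thread is allocated.  Every
 * item gets a chance to release itself, released items are unlinked.
 * Returns the number of released items.
 */
static size_t garbage_collect( garbage_list *list )
{
  size_t         released = 0;
  garbage_item **link = &list->head;

  while ( *link != NULL ) {
    garbage_item *item = *link;
    garbage_item *next = item->next; /* Read before the item may go away */

    if ( ( *item->try_release )( item ) ) {
      *link = next;
      ++released;
    } else {
      link = &item->next;
    }
  }

  return released;
}

/* Example resource: a stack which is releasable once its thread stopped. */
typedef struct {
  garbage_item item; /* Must be the first member for the cast below */
  bool         thread_stopped;
  bool         freed;
} stack_garbage;

static bool stack_try_release( garbage_item *item )
{
  stack_garbage *stack = (stack_garbage *) item;

  if ( !stack->thread_stopped ) {
    return false;
  }

  stack->freed = true; /* A real implementation would free the memory here */
  return true;
}
```

A resource whose thread is still executing simply stays on the list and is
checked again during the next allocation.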

I would be happy to get some feedback on the proposed changes.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.


