[RTEMS Project] #2054: Race condition in FIFO/pipe implementation

Thu Dec 18 11:13:05 UTC 2014

#2054: Race condition in FIFO/pipe implementation
-----------------------------+----------------------------
 Reporter:  sebastian.huber  |       Owner:  joel.sherrill
     Type:  defect           |      Status:  new
 Priority:  low              |   Milestone:  5.0
Component:  cpukit           |     Version:  4.11
 Severity:  critical         |  Resolution:
 Keywords:                   |
-----------------------------+----------------------------
Changes (by sebastian.huber):

 * priority:  normal => low
 * severity:  normal => critical
 * milestone:  4.11 => 5.0

Old description:

> Does anyone know why the FIFO/pipe implementation uses barriers for
> synchronization?  For me the usage of the barrier looks like a hand
> crafted condition variable.
>
> I think there is also a race condition in its use.  The
> rtems_barrier_release() releases only the threads currently waiting at
> the barrier.  Now we have several code sections like this:
>
>       PIPE_UNLOCK(pipe);
>       if (! PIPE_WRITEWAIT(pipe))
>         ret = -EINTR;
>
> Thus the release may have no effect in case it happens after the
> PIPE_UNLOCK and before the PIPE_WRITEWAIT (which is a
> rtems_barrier_wait() with no time out).
>
> The bdbuf code uses a similar approach with semaphores.  It disables the
> preemption to avoid this race condition:
>
> static void
> rtems_bdbuf_anonymous_wait (rtems_bdbuf_waiters *waiters)
> {
>   rtems_status_code sc;
>   rtems_mode        prev_mode;
>
>   /*
>    * Indicate we are waiting.
>    */
>   ++waiters->count;
>
>   /*
>    * Disable preemption then unlock the cache and block.  There is no
> POSIX
>    * condition variable in the core API so this is a work around.
>    *
>    * The issue is a task could preempt after the cache is unlocked
> because it is
>    * blocking or just hits that window, and before this task has blocked
> on the
>    * semaphore. If the preempting task flushes the queue this task will
> not see
>    * the flush and may block for ever or until another transaction
> flushes this
>    * semaphore.
>    */
>   prev_mode = rtems_bdbuf_disable_preemption ();
>
>   /*
>    * Unlock the cache, wait, and lock the cache when we return.
>    */
>   rtems_bdbuf_unlock_cache ();
>
>   sc = rtems_semaphore_obtain (waiters->sema, RTEMS_WAIT,
> RTEMS_BDBUF_WAIT_TIMEOUT);
>
>   if (sc == RTEMS_TIMEOUT)
>     rtems_fatal_error_occurred (RTEMS_BLKDEV_FATAL_BDBUF_CACHE_WAIT_TO);
>
>   if (sc != RTEMS_UNSATISFIED)
>     rtems_fatal_error_occurred (RTEMS_BLKDEV_FATAL_BDBUF_CACHE_WAIT_2);
>
>   rtems_bdbuf_lock_cache ();
>
>   rtems_bdbuf_restore_preemption (prev_mode);
>
>   --waiters->count;
> }
>
> Please note that this approach is not SMP proof.  I think we need
> condition variables in the RTEMS Classic API.

New description:

 Does anyone know why the FIFO/pipe implementation uses barriers for
 synchronization?  For me the usage of the barrier looks like a hand
 crafted condition variable.

 I think there is also a race condition in its use.  The
 rtems_barrier_release() releases only the threads currently waiting at the
 barrier.  Now we have several code sections like this:

       PIPE_UNLOCK(pipe);
       if (! PIPE_WRITEWAIT(pipe))
         ret = -EINTR;

 Thus the release may have no effect in case it happens after the
 PIPE_UNLOCK and before the PIPE_WRITEWAIT (which is a rtems_barrier_wait()
 with no time out).

 The bdbuf code uses a similar approach with semaphores.  It disables the
 preemption to avoid this race condition:

 static void
 rtems_bdbuf_anonymous_wait (rtems_bdbuf_waiters *waiters)
 {
   rtems_status_code sc;
   rtems_mode        prev_mode;

   /*
    * Indicate we are waiting.
    */
   ++waiters->count;

   /*
    * Disable preemption then unlock the cache and block.  There is no
 POSIX
    * condition variable in the core API so this is a work around.
    *
    * The issue is a task could preempt after the cache is unlocked because
 it is
    * blocking or just hits that window, and before this task has blocked
 on the
    * semaphore. If the preempting task flushes the queue this task will
 not see
    * the flush and may block for ever or until another transaction flushes
 this
    * semaphore.
    */
   prev_mode = rtems_bdbuf_disable_preemption ();

   /*
    * Unlock the cache, wait, and lock the cache when we return.
    */
   rtems_bdbuf_unlock_cache ();

   sc = rtems_semaphore_obtain (waiters->sema, RTEMS_WAIT,
 RTEMS_BDBUF_WAIT_TIMEOUT);

   if (sc == RTEMS_TIMEOUT)
     rtems_fatal_error_occurred (RTEMS_BLKDEV_FATAL_BDBUF_CACHE_WAIT_TO);

   if (sc != RTEMS_UNSATISFIED)
     rtems_fatal_error_occurred (RTEMS_BLKDEV_FATAL_BDBUF_CACHE_WAIT_2);

   rtems_bdbuf_lock_cache ();

   rtems_bdbuf_restore_preemption (prev_mode);

   --waiters->count;
 }

 Please note that this approach is not SMP proof.  I think we need
 condition variables in the RTEMS Classic API.

--

--
Ticket URL: <http://devel.rtems.org/ticket/2054#comment:3>
RTEMS Project <http://www.rtems.org/>
RTEMS Project