Possible bug in _CORE_mutex_Seize()

Sat Sep 27 13:35:21 UTC 2003

Phil Torre wrote:
> 
> As unlikely as it sounds, I think we have found a bug in _CORE_mutex_Seize()
> which violates mutual exclusion.
> 
> This pertains to rtems-4.6.0pre4 running on MPC860 with an unsubmitted BSP.
> The sequence of events goes like this:
> 
> 1.      Thread 1 (Init) is running at priority 1.  It creates and starts
>         thread 2 (notification_task) at priority 196.  Since thread 2 is
>         at a lower priority, it doesn't start executing yet.
> 
> 2.      Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
>         external hardware to do something.  As soon as it goes to sleep,
>         thread 2 is now runnable and starts executing.
> 
> 3.      Thread 2 does some stuff, and then calls malloc().  Halfway through
>         rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
>         We do a context switch and thread 1 is now running.
> 
>         ** Before it lost the CPU, thread 2 had successfully called    **
>         ** _RTEMS_Lock_allocator().  _RTEMS_Allocator_Mutex is held by  **
>         ** thread 2 when the context switch back to thread 1 occurs.    **
> 
> 4.      Thread 1 now calls rtems_task_start(), which invokes malloc(),
>         which calls rtems_region_get_segment(), which calls
>         _RTEMS_Lock_allocator().
> 
>         _RTEMS_Lock_allocator() returns, *without blocking*.  The allocator
>         mutex is still held by thread 2, yet thread 1 proceeds in the belief
>         that it has the mutex.
> 
>         More detail:
>         When thread 1 calls rtems_task_start() in step #4, that function
>         calls _Thread_Get() on the task we want to start.  As a side effect,
>         _Thread_Get() increments _Thread_Dispatch_disable_level to 1.
> 
>         Shortly thereafter, _User_extensions_Thread_start() is called, which
>         calls libc_start_hook(), which calls calloc()->malloc()->
>         rtems_region_get_segment()->_RTEMS_Lock_allocator()->
>         _CORE_mutex_Seize().
>         (Note that _Thread_Dispatch_disable_level is still 1.)
>         _CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should), so
>         we call _Thread_Disable_dispatch() (disable level is now 2!)
>         followed by _CORE_mutex_Seize_interrupt_blocking() to block on the
>         mutex.
> 
>         Because _Thread_Dispatch_disable_level is 2, the call to
>         _Thread_Enable_dispatch() just decrements it to 1 and returns
>         without calling _Thread_Dispatch().
>         Thread 1 now happily proceeds to corrupt the heap free block chain.
> 
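
The nesting described above can be reproduced in a few lines of plain C.
This is only a toy model under stated assumptions, not the RTEMS sources:
the counter, the flag, and every function name below are hypothetical
stand-ins, and the single assumption taken from the report is that a
dispatch can happen only when the disable level drops back to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; NOT the RTEMS implementation. */
static unsigned dispatch_disable_level = 0;
static bool     dispatch_ran           = false;

static void model_reset(void)
{
  dispatch_disable_level = 0;
  dispatch_ran           = false;
}

static void disable_dispatch(void)
{
  ++dispatch_disable_level;
}

/* A context switch is possible only when the level returns to 0. */
static void enable_dispatch(void)
{
  assert(dispatch_disable_level > 0);
  if (--dispatch_disable_level == 0)
    dispatch_ran = true;          /* stands in for _Thread_Dispatch() */
}

/*
 * Models the blocking path of a seize on a mutex that is already held.
 * Returns true if the caller really gave up the CPU.
 */
static bool seize_held_mutex(void)
{
  dispatch_ran = false;
  disable_dispatch();             /* done before blocking on the mutex */
  /* ... the caller would be queued on the mutex here ... */
  enable_dispatch();              /* end of the blocking path */
  return dispatch_ran;
}

/* Seize from ordinary task context: the level starts at 0. */
bool seize_from_task_context(void)
{
  model_reset();
  return seize_held_mutex();
}

/* Seize from inside a start extension: the level is already 1. */
bool seize_from_start_extension(void)
{
  model_reset();
  disable_dispatch();             /* side effect attributed to _Thread_Get() */
  bool blocked = seize_held_mutex();
  enable_dispatch();              /* the outer call unwinding afterwards */
  return blocked;
}
```

In this model seize_from_task_context() reports that the caller blocked,
while seize_from_start_extension() reports that it did not, which matches
the behaviour described in step 4.
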
> 
> I don't understand the semantics of _Thread_Dispatch_disable_level well
> enough to provide a patch.  For now we will work around it by making sure
> our tasks don't call malloc() at the same time.  Hopefully those with deep
> kernel understanding can take a look at this and tell me if I'm smoking
> crack.  :)

Who is calling malloc() as a side-effect of rtems_task_start?  That was
never an allowed operation.  If there is a user extension set that is doing
malloc() during start, then it should do it during create.

You have to be EXTREMELY careful what you do in user extensions as they
are called at very fragile moments in the system.
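
As a rough illustration of moving the allocation from start to create, here
is a sketch in plain C.  It deliberately does not use the RTEMS extension
API: task_model, task_ext_data, and the three hook functions are
hypothetical names, and freeing in the delete hook is an assumption of the
sketch rather than something stated above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-task data an extension wants to attach to each task. */
typedef struct {
  int  errno_value;
  char scratch[64];
} task_ext_data;

/* Hypothetical stand-in for a task control block. */
typedef struct {
  task_ext_data *ext;
} task_model;

/*
 * Create hook: the only place that allocates.
 * Returns 0 on failure so that task creation can be rejected.
 */
int ext_on_create(task_model *t)
{
  t->ext = malloc(sizeof *t->ext);
  return t->ext != NULL;
}

/* Start hook: no allocation, it only resets memory that already exists. */
void ext_on_start(task_model *t)
{
  assert(t->ext != NULL);
  memset(t->ext, 0, sizeof *t->ext);
}

/* Delete hook: releases what the create hook allocated (assumed safe). */
void ext_on_delete(task_model *t)
{
  free(t->ext);
  t->ext = NULL;
}
```

The point of the split is that the start hook never reaches the allocator,
so it never needs the allocator mutex while dispatching is disabled.
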

> -Phil
> 
> --
> 
> =====================================================================
> Phil Torre                               phone: 425-820-6363 x234
> Design Engineer                          email: ptorre at zetron.com
> Switching Systems Group                    fax: 425-820-7031
> Zetron, Inc.                               web: http://www.zetron.com
> 
> 

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel at OARcorp.com                 On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
   Support Available             (256) 722-9985