Possible bug in _CORE_mutex_Seize()
Till Straumann
strauman at SLAC.Stanford.EDU
Sat Sep 27 00:52:08 UTC 2003
At first glance, this doesn't look too good.
It seems to be a general issue with mixing
protection by dispatch disabling with blocking
synchronization.
From a dispatching-disabled section, blocking
primitives must not be used (very much like
they must not be used from an ISR) -- obviously, the
blocking is effectively delayed until leaving
the outermost dispatching-disabled scope
-- with possibly disastrous consequences!
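To illustrate the pattern (a condensed, hypothetical sketch --
not actual RTEMS code; seize_possibly_held_mutex() stands in
for any blocking primitive):

    void hazardous_pattern( void )
    {
      _Thread_Disable_dispatch();   /* dispatch disable level: 0 -> 1 */

      /* A blocking primitive used in here can enqueue the thread
       * on a wait queue, but the actual context switch is deferred
       * until the disable level drops back to 0 ...                */
      seize_possibly_held_mutex();  /* "blocks" -- or so we think   */

      /* ... so execution falls straight through the "blocking"
       * call, still inside the would-be critical section.          */

      _Thread_Enable_dispatch();    /* level: 1 -> 0; only now does
                                       the deferred switch happen   */
    }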
However, I still cannot see how 'malloc()' fits
into the picture -- AFAIK, no kernel code calls
C-library code??? How does 'rtems_start_task()'
(do you mean rtems_task_start()?) call 'malloc()'???
You didn't call malloc() from a user extension, did you?
-- Till
Phil Torre wrote:
> As unlikely as it sounds, I think we have found a bug in _CORE_mutex_Seize()
> which violates mutual exclusion.
>
> This pertains to rtems-4.6.0pre4 running on MPC860 with an unsubmitted BSP.
> The sequence of events goes like this:
>
>
> 1. Thread 1 (Init) is running at priority 1. It creates and starts
>    thread 2 (notification_task) at priority 196. Since thread 2 is
>    at a lower priority, it doesn't start executing yet.
>
> 2. Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
>    external hardware to do something. As soon as it goes to sleep,
>    thread 2 is now runnable and starts executing.
>
> 3. Thread 2 does some stuff, and then calls malloc(). Halfway through
>    rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
>    We do a context switch and thread 1 is now running.
>
>    ** Before it lost the CPU, thread 2 had successfully called   **
>    ** _RTEMS_Lock_allocator(). _RTEMS_Allocator_Mutex is held by **
>    ** thread 2 when the context switch back to thread 1 occurs.  **
>
> 4. Thread 1 now calls rtems_start_task(), which invokes malloc(), which
>    calls rtems_region_get_segment(), which calls _RTEMS_Lock_allocator().
>
>    _RTEMS_Lock_allocator() returns, *without blocking*. The allocator
>    mutex is still held by thread 2, yet thread 1 proceeds in the belief
>    that it has the mutex.
>
> More detail:
> When thread 1 calls rtems_task_start() in step #4, that function
> calls _Thread_Get() on the task we want to start. As a side effect,
> _Thread_Get() increments _Thread_Dispatch_disable_level to 1.
>
> Shortly thereafter, _User_extensions_Thread_start() is called, which
> calls libc_start_hook(), which calls calloc() -> malloc() ->
> rtems_region_get_segment() -> _RTEMS_Lock_allocator() ->
> _CORE_mutex_Seize(). (Note that _Thread_Dispatch_disable_level is
> still 1.) _CORE_mutex_Seize_interrupt_trylock() returns 1 (as it
> should), so we call _Thread_Disable_dispatch() (disable level is now
> 2!) followed by _CORE_mutex_Seize_interrupt_blocking() to block on
> the mutex.
>
> Because _Thread_Dispatch_disable_level is 2, the call to
> _Thread_Enable_dispatch() just decrements it to 1 and returns without
> calling _Thread_Dispatch(). Thread 1 now happily proceeds to corrupt
> the heap free block chain.
>
>
> I don't understand the semantics of _Thread_Dispatch_disable_level well
> enough to provide a patch. For now we will work around it by making sure
> our tasks don't call malloc() at the same time. Hopefully those with
> deep kernel understanding can take a look at this and tell me if I'm
> smoking crack. :)
>
> -Phil
>
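P.S.: Condensed, the seize path described above comes down to
something like the following. This is a simplified sketch only --
the names follow the score, but the declarations are condensed and
the bodies paraphrased for illustration; it is not the literal
4.6.0pre4 source:

    /* Condensed declarations, paraphrased from the score: */
    volatile unsigned long _Thread_Dispatch_disable_level;
    void _Thread_Dispatch( void );

    void _Thread_Disable_dispatch( void )
    {
      _Thread_Dispatch_disable_level += 1;
    }

    void _Thread_Enable_dispatch( void )
    {
      /* Only the outermost enable performs a dispatch; a nested
       * enable merely decrements the counter and returns.        */
      if ( --_Thread_Dispatch_disable_level == 0 )
        _Thread_Dispatch();
    }

    /* The seize path (paraphrased). The caller -- here
     * rtems_task_start() via _Thread_Get() -- already holds the
     * dispatch disable level at 1 when this runs:                */
    void seize_path( CORE_mutex_Control *the_mutex )
    {
      if ( _CORE_mutex_Seize_interrupt_trylock( the_mutex ) ) {
        /* The mutex is held by thread 2: prepare to block.       */
        _Thread_Disable_dispatch();   /* level: 1 -> 2            */
        _CORE_mutex_Seize_interrupt_blocking( the_mutex );
        /* ..._blocking() enqueues the thread on the mutex's wait
         * queue and ends in _Thread_Enable_dispatch(), which at
         * level 2 only decrements to 1 -- no _Thread_Dispatch(),
         * so no real block: the caller returns believing it owns
         * the mutex.                                             */
      }
    }

That is, the "block" is deferred until the outermost
_Thread_Enable_dispatch(), by which time the heap has already been
modified under a mutex that was never actually acquired.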