Possible bug in _CORE_mutex_Seize()
Till Straumann
strauman at SLAC.Stanford.EDU
Tue Sep 30 00:33:47 UTC 2003
I gave this a first shot:
- added paranoia to _Core_mutex_Seize()
- allocate the reent struct from the workspace (should be legal
_because_ the hooks are 'dispatch disabled' protected).
- T.
Till Straumann wrote:
> Joel Sherrill wrote:
>
>> Following up on my own post since I didn't notice the bottom where you
>> actually explained
>> where malloc() was called when I read this early this morning.
>>
>> Phil Torre wrote:
>>
>>> As unlikely as it sounds, I think we have found a bug in
>>> _CORE_mutex_Seize()
>>> which violates mutual exclusion.
>>>
>>>
>> It simply assumes that _Thread_Dispatch_disable_level is 0. So
>> calling malloc() from within
>> a directive which uses dispatch disable locking is dangerous.
>> I think I have a solution. Move the body of the libc start hook to
>> the create hook.
>
>
>
> This is not gonna work :-( -- rtems_task_create() also uses dispatch
> disabling...
> How about allocating the reent structure from the workspace ?
>
>
> -- Till
>
>> It is probably
>> also necessary to change cpukit/score/include/rtems/score/apimutex.h
>> so that _API_Mutex_Allocate()
>> creates the Allocator mutex as nestable rather than
>> CORE_MUTEX_NESTING_IS_ERROR.
>>
>> Finally, it might not be a bad idea for it to be considered a fatal
>> RTEMS error if _API_Mutex_Locks
>> wants to block when _Thread_Dispatch_disable is non-zero. That would
>> be easier than this
>> happening again and debugging it.
>>
>> It might also be valid to consider it a fatal error in a memory
>> allocation is attempted when
>> _Thread_Dispatch_disable is zero.
>>
>>> This pertains to rtems-4.6.0pre4 running on MPC860 with an
>>> unsubmitted BSP.
>>> The sequence of events goes like this:
>>>
>>>
>>> 1. Thread 1 (Init) is running at priority 1. It creates and
>>> starts thread 2 (notification_task) at priority 196. Since
>>> thread 2 is
>>> at a lower priority, it doesn't start executing yet.
>>>
>>> 2. Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
>>> external hardware to do something. As soon as it goes to sleep,
>>> thread 2 is now runnable and starts executing.
>>>
>>> 3. Thread 2 does some stuff, and then calls malloc(). Halfway
>>> through
>>> rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
>>> We do a context switch and thread 1 is now running.
>>>
>>> ** Before it lost the CPU, thread 2 had successfully called
>>> **
>>> ** _RTEMS_Lock_allocator(). _RTEMS_Allocator_Mutex is held by **
>>> ** thread 2 when the context switch back to thread 1 occurs. **
>>>
>>> 4. Thread 1 now calls rtems_start_task(), which invokes malloc(),
>>> which
>>> calls
>>> rtems_region_get_segment(), which calls _RTEMS_Lock_allocator().
>>>
>>> _RTEMS_Lock_allocator() returns, *without blocking*. The allocator
>>> mutex is still held by thread 2, yet thread 1 proceeds in the belief
>>> that it has the mutex.
>>>
>>> More detail:
>>> When thread 1 calls rtems_task_start() in step #4, that function
>>> calls _Thread_Get() on the task we want to start. As a side effect,
>>> _Thread_Get() increments _Thread_Dispatch_disable_level to 1.
>>>
>>> Shortly thereafter, _User_extensions_Thread_start() is called, which
>>> calls libc_start_hook(), which calls calloc()->malloc()->
>>>
>>> rtems_region_get_segment()->_RTEMS_Lock_allocator()->_CORE_mutex_Seize().
>>>
>>> (Note that _Thread_Dispatch_disable_level is stil 1.)
>>> _CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should), so
>>> we
>>> call _Thread_Disable_dispatch() (disable level is now 2!) followed
>>> by
>>> _CORE_mutex_Seize_interrupt_blocking() to block on the mutex.
>>> Because _Thread_Dispatch_disable_level is 2, the call to
>>> _Thread_Enable_dispatch()
>>> just decrements it to 1 and returns without calling
>>> _Thread_Dispatch().
>>> Thread 1 now happily proceeds to corrupt the heap free block chain.
>>>
>>> I don't understand the semantics of _Thread_Dispatch_disable_level well
>>> enough to
>>> provide a patch. For now we will work around it by making sure our
>>> tasks
>>> don't call
>>> malloc() at the same time. Hopefully those with deep kernel
>>> understanding
>>> can
>>> take a look at this and tell me if I'm smoking crack. :)
>>>
>>> -Phil
>>>
>>>
>>>
>>
>>
>>
>
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: coremutex_seize_paranoia.diff
URL: <http://lists.rtems.org/pipermail/users/attachments/20030929/7e69ebdb/attachment-0001.ksh>
More information about the users
mailing list