Possible bug in _CORE_mutex_Seize()
Till Straumann
strauman at slac.stanford.edu
Tue Sep 30 03:56:39 UTC 2003
Joel Sherrill wrote:
> Till Straumann wrote:
>
>> Joel Sherrill wrote:
>>
>>> Till Straumann wrote:
>>>
>>>> Joel Sherrill wrote:
>>>>
>>>>> Following up on my own post: when I read this early this morning,
>>>>> I didn't notice the bottom, where you actually explained where
>>>>> malloc() was called.
>>>>>
>>>>> Phil Torre wrote:
>>>>>
>>>>>> As unlikely as it sounds, I think we have found a bug in
>>>>>> _CORE_mutex_Seize()
>>>>>> which violates mutual exclusion.
>>>>>>
>>>>>>
>>>>> It simply assumes that _Thread_Dispatch_disable_level is 0. So
>>>>> calling malloc() from within a directive which uses dispatch
>>>>> disable locking is dangerous. I think I have a solution: move the
>>>>> body of the libc start hook to the create hook.
>>>>
>>>> This is not gonna work :-( -- rtems_task_create() also uses dispatch
>>>> disabling...
>>>
>>> It will work if the allocation mutex allows nesting, so the
>>> initialization call in cpukit/score/.../apimutex.h needs to change
>>> from CORE_MUTEX_NESTING_IS_ERROR. See my earlier comment below your
>>> '-- Till'. :)
>>>
>>
>> Hmm - unless I'm missing something, I can't see what this has to do
>> with nesting. Could you explain how this should work, please?
>>
>> (Scenario: thread 2 holds the allocator mutex and is suspended; thread 1
>> tries to acquire the allocator mutex (nesting allowed) from a
>> dispatching-disabled section -> blocking of thread 1 is effectively
>> postponed until it leaves the dispatching-disabled section.)
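
To make that scenario concrete, here is a tiny standalone C model (not
RTEMS code; the names are illustrative). Nesting only matters when the
*same* thread re-acquires a mutex it already holds; here the holder is a
different thread, so thread 1 must block -- and blocking is exactly what
a dispatch-disabled section defeats:

    #include <stdio.h>

    /* Standalone model of "nesting allowed" seize semantics. */
    struct mutex_model {
      int holder;        /* thread id of current owner, 0 = free */
      int nest_count;    /* how many times the owner has seized  */
    };

    static int seize( struct mutex_model *m, int thread_id )
    {
      if ( m->holder == 0 || m->holder == thread_id ) {
        m->holder = thread_id;
        m->nest_count++;
        return 1;                 /* acquired (possibly nested)   */
      }
      return 0;                   /* held by someone else: block  */
    }

    int main( void )
    {
      struct mutex_model alloc_mutex = { 0, 0 };

      seize( &alloc_mutex, 2 );              /* thread 2 wins     */
      if ( !seize( &alloc_mutex, 1 ) )       /* thread 1 loses    */
        puts( "thread 1 must block: nesting cannot help here" );
      return 0;
    }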
>
> It doesn't unless you move the malloc to the create hook. Also, your
> patch doesn't check for a NULL pointer from Workspace_Allocate, which
> is a distinct possibility.
You're right - 'memset()' should be called after checking the ptr for
NULL, of course.
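
For reference, the checked-allocation pattern being agreed on might look
roughly like this (a sketch only; the hook name and surrounding code are
illustrative, not the actual patch):

    #include <string.h>               /* memset()                  */
    #include <reent.h>                /* newlib's struct _reent    */
    #include <rtems/score/wkspace.h>  /* _Workspace_Allocate()     */

    /* Sketch: allocate the reent structure from the workspace and
     * only zero it after the NULL check; returning FALSE fails the
     * task create. */
    static boolean libc_create_hook_sketch( void )
    {
      void *ptr = _Workspace_Allocate( sizeof( struct _reent ) );

      if ( ptr == NULL )
        return FALSE;           /* don't memset() a NULL pointer   */

      memset( ptr, 0, sizeof( struct _reent ) );
      /* ... attach ptr to the new task's control block ... */
      return TRUE;
    }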
>
>
> I generally like using the workspace instead of the main heap. But all
> code allocating memory should be in create hooks -- not start hooks.
> Then the nesting will work since it is inside another allocator
> critical section.
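
For readers unfamiliar with the terminology: "nesting" means the owning
thread may seize the mutex again without erroring or deadlocking, and
must release it the same number of times. A standalone POSIX model of
that behavior (unrelated to the RTEMS internals, purely for
illustration):

    #define _XOPEN_SOURCE 600   /* for PTHREAD_MUTEX_RECURSIVE     */
    #include <pthread.h>
    #include <stdio.h>

    int main( void )
    {
      pthread_mutex_t m;
      pthread_mutexattr_t attr;

      pthread_mutexattr_init( &attr );
      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
      pthread_mutex_init( &m, &attr );

      pthread_mutex_lock( &m );    /* outer critical section        */
      pthread_mutex_lock( &m );    /* nested acquire by the owner:
                                      succeeds instead of erroring  */
      pthread_mutex_unlock( &m );
      pthread_mutex_unlock( &m );  /* only now is the mutex free    */

      puts( "nested acquire/release completed" );
      return 0;
    }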
??? Sorry for being a pain - I still don't understand. Would you mind
explaining this in a little more detail, please? When I look at
rtems_task_create(), I don't see where the allocator mutex is acquired
outside of the _Thread_Dispatch_disable() protected section (as a
matter of fact, I don't see any nested _RTEMS_Lock_allocator during
'create' at all)...
>
>
>>
>> -- Till
>>
>>
>>> The error checking I proposed won't work without tinkering, though.
>>> Let's ignore that for the first pass.
>>>
>>>> How about allocating the reent structure from the workspace?
>>>
>>> That's not necessarily a bad idea either. The workspace is designed
>>> for memory allocation inside a directive. But if the allocation
>>> mutex allows nesting, it isn't necessary.
>>>
>>>>
>>>>
>>>> -- Till
>>>>
>>>>> It is probably also necessary to change
>>>>> cpukit/score/include/rtems/score/apimutex.h so that
>>>>> _API_Mutex_Allocate() creates the Allocator mutex as nestable
>>>>> rather than CORE_MUTEX_NESTING_IS_ERROR.
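
The change being suggested is essentially one line when the attributes
are filled in inside _API_Mutex_Allocate(); roughly (exact 4.6 code may
differ slightly):

    /* in _API_Mutex_Allocate(), in the CORE_mutex_Attributes: */
    attributes.lock_nesting_behavior = CORE_MUTEX_NESTING_ACQUIRES;
                                /* was: CORE_MUTEX_NESTING_IS_ERROR */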
>>>>>
>>>>> Finally, it might not be a bad idea to consider it a fatal RTEMS
>>>>> error if _API_Mutex_Lock wants to block when
>>>>> _Thread_Dispatch_disable_level is non-zero. That would be easier
>>>>> than having this happen again and debugging it.
>>>>>
>>>>> It might also be valid to consider it a fatal error if a memory
>>>>> allocation is attempted when _Thread_Dispatch_disable_level is
>>>>> non-zero.
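
A rough sketch of such a check at the top of the blocking path (not a
tested patch; the error code is made up for illustration):

    /* before committing to block on an API mutex: */
    if ( _Thread_Dispatch_disable_level != 0 ) {
      /* Blocking cannot work here: _Thread_Enable_dispatch() would
       * only decrement the level and return without dispatching. */
      _Internal_error_Occurred(
        INTERNAL_ERROR_CORE,
        TRUE,                   /* internal error                  */
        99                      /* hypothetical error code         */
      );
    }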
>>>>>
>>>>>> This pertains to rtems-4.6.0pre4 running on MPC860 with an
>>>>>> unsubmitted BSP.
>>>>>> The sequence of events goes like this:
>>>>>>
>>>>>>
>>>>>> 1. Thread 1 (Init) is running at priority 1. It creates and
>>>>>>    starts thread 2 (notification_task) at priority 196. Since
>>>>>>    thread 2 is at a lower priority, it doesn't start executing
>>>>>>    yet.
>>>>>>
>>>>>> 2. Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for
>>>>>>    some external hardware to do something. As soon as it goes to
>>>>>>    sleep, thread 2 is now runnable and starts executing.
>>>>>>
>>>>>> 3. Thread 2 does some stuff, and then calls malloc(). Halfway
>>>>>>    through rtems_region_get_segment(), the 10 ms timer set by
>>>>>>    thread 1 expires. We do a context switch and thread 1 is now
>>>>>>    running.
>>>>>>
>>>>>> ** Before it lost the CPU, thread 2 had successfully called **
>>>>>> ** _RTEMS_Lock_allocator(). _RTEMS_Allocator_Mutex is held  **
>>>>>> ** by thread 2 when the context switch back to thread 1     **
>>>>>> ** occurs.                                                  **
>>>>>>
>>>>>> 4. Thread 1 now calls rtems_task_start(), which invokes malloc(),
>>>>>>    which calls rtems_region_get_segment(), which calls
>>>>>>    _RTEMS_Lock_allocator().
>>>>>>
>>>>>>    _RTEMS_Lock_allocator() returns, *without blocking*. The
>>>>>>    allocator mutex is still held by thread 2, yet thread 1
>>>>>>    proceeds in the belief that it has the mutex.
>>>>>>
>>>>>> More detail:
>>>>>> When thread 1 calls rtems_task_start() in step #4, that function
>>>>>> calls _Thread_Get() on the task we want to start. As a side
>>>>>> effect, _Thread_Get() increments _Thread_Dispatch_disable_level
>>>>>> to 1.
>>>>>>
>>>>>> Shortly thereafter, _User_extensions_Thread_start() is called,
>>>>>> which calls libc_start_hook(), which calls
>>>>>> calloc() -> malloc() -> rtems_region_get_segment() ->
>>>>>> _RTEMS_Lock_allocator() -> _CORE_mutex_Seize().
>>>>>>
>>>>>> (Note that _Thread_Dispatch_disable_level is still 1.)
>>>>>> _CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should),
>>>>>> so we call _Thread_Disable_dispatch() (disable level is now 2!)
>>>>>> followed by _CORE_mutex_Seize_interrupt_blocking() to block on
>>>>>> the mutex. Because _Thread_Dispatch_disable_level is 2, the call
>>>>>> to _Thread_Enable_dispatch() just decrements it to 1 and returns
>>>>>> without calling _Thread_Dispatch(). Thread 1 now happily proceeds
>>>>>> to corrupt the heap free block chain.
>>>>>> I don't understand the semantics of _Thread_Dispatch_disable_level
>>>>>> well enough to provide a patch. For now we will work around it by
>>>>>> making sure our tasks don't call malloc() at the same time.
>>>>>> Hopefully those with deep kernel understanding can take a look at
>>>>>> this and tell me if I'm smoking crack. :)
>>>>>>
>>>>>> -Phil
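
The counter arithmetic Phil walks through can be reproduced in a few
lines of standalone C (a model of the accounting only, not RTEMS
source; the function names mirror the post):

    #include <stdio.h>

    static int dispatch_disable_level;     /* models _Thread_Dispatch_disable_level */
    static int mutex_held_by_other_thread = 1;  /* thread 2 owns the mutex */

    static void thread_disable_dispatch( void )
    {
      ++dispatch_disable_level;
    }

    static void thread_enable_dispatch( void )
    {
      if ( --dispatch_disable_level == 0 )
        printf( "level hit 0: _Thread_Dispatch() runs, caller really blocks\n" );
      else
        printf( "level still %d: returns, caller keeps running!\n",
                dispatch_disable_level );
    }

    static void core_mutex_seize( void )
    {
      if ( mutex_held_by_other_thread ) {  /* trylock fails            */
        thread_disable_dispatch();         /* 1 -> 2 in the bug case   */
        /* _CORE_mutex_Seize_interrupt_blocking() enqueues the thread
         * and re-enables dispatch, expecting a context switch: */
        thread_enable_dispatch();          /* 2 -> 1: no dispatch!     */
      }
    }

    int main( void )
    {
      dispatch_disable_level = 1;          /* e.g. after _Thread_Get() */
      core_mutex_seize();
      return 0;
    }

Running the model with dispatch_disable_level starting at 0 instead
takes the "really blocks" branch, which is the case _CORE_mutex_Seize()
implicitly assumes.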