Possible bug in _CORE_mutex_Seize()
Till Straumann
strauman at slac.stanford.edu
Tue Sep 30 03:56:39 UTC 2003
Joel Sherrill wrote:
> Till Straumann wrote:
>
>> Joel Sherrill wrote:
>>
>>> Till Straumann wrote:
>>>
>>>> Joel Sherrill wrote:
>>>>
>>>>> Following up on my own post: when I read this early this morning,
>>>>> I didn't notice the bottom, where you actually explained where
>>>>> malloc() was called.
>>>>>
>>>>> Phil Torre wrote:
>>>>>
>>>>>> As unlikely as it sounds, I think we have found a bug in
>>>>>> _CORE_mutex_Seize()
>>>>>> which violates mutual exclusion.
>>>>>>
>>>>>>
>>>>> It simply assumes that _Thread_Dispatch_disable_level is 0. So
>>>>> calling malloc() from within a directive which uses dispatch
>>>>> disable locking is dangerous. I think I have a solution: move the
>>>>> body of the libc start hook to the create hook.
>>>>
>>>> This is not gonna work :-( -- rtems_task_create() also uses dispatch
>>>> disabling...
>>>
>>> It will work if the allocation mutex allows nesting, so the
>>> initialization call in cpukit/score/.../apimutex.h needs to change
>>> from CORE_MUTEX_NESTING_IS_ERROR. See my earlier comment below your
>>> '-- Till'. :)
>>>
>>
>> Hmm - unless I'm missing something, I can't see what this has to do
>> with nesting. Could you explain how this should work, please?
>>
>> (Scenario: thread 2 holds the allocator mutex and is suspended; thread 1
>> tries to acquire the allocator mutex (nesting allowed) from a
>> dispatching-disabled section -> blocking of thread 1 is effectively
>> postponed until it leaves the dispatching-disabled section.)
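
To make that scenario concrete, here is a tiny standalone C model (not
RTEMS code; the names are illustrative). Nesting only matters when the
*same* thread re-acquires a mutex it already holds; here the holder is a
different thread, so thread 1 must block -- and blocking is exactly what
a dispatch-disabled section defeats:

    #include <stdio.h>

    /* Standalone model of "nesting allowed" seize semantics. */
    struct mutex_model {
      int holder;        /* thread id of current owner, 0 = free */
      int nest_count;    /* how many times the owner has seized  */
    };

    static int seize( struct mutex_model *m, int thread_id )
    {
      if ( m->holder == 0 || m->holder == thread_id ) {
        m->holder = thread_id;
        m->nest_count++;
        return 1;                 /* acquired (possibly nested)   */
      }
      return 0;                   /* held by someone else: block  */
    }

    int main( void )
    {
      struct mutex_model alloc_mutex = { 0, 0 };

      seize( &alloc_mutex, 2 );              /* thread 2 wins     */
      if ( !seize( &alloc_mutex, 1 ) )       /* thread 1 loses    */
        puts( "thread 1 must block: nesting cannot help here" );
      return 0;
    }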
>
> It doesn't unless you move the malloc to the create hook. Also, your
> patch doesn't check for a NULL pointer from Workspace_Allocate, which
> is a distinct possibility.
You're right - 'memset()' should be called after checking the ptr for
NULL, of course.
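
For reference, the checked-allocation pattern being agreed on might look
roughly like this (a sketch only; the hook name and surrounding code are
illustrative, not the actual patch):

    #include <string.h>               /* memset()                  */
    #include <reent.h>                /* newlib's struct _reent    */
    #include <rtems/score/wkspace.h>  /* _Workspace_Allocate()     */

    /* Sketch: allocate the reent structure from the workspace and
     * only zero it after the NULL check; returning FALSE fails the
     * task create. */
    static boolean libc_create_hook_sketch( void )
    {
      void *ptr = _Workspace_Allocate( sizeof( struct _reent ) );

      if ( ptr == NULL )
        return FALSE;           /* don't memset() a NULL pointer   */

      memset( ptr, 0, sizeof( struct _reent ) );
      /* ... attach ptr to the new task's control block ... */
      return TRUE;
    }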
>
>
> I generally like using the workspace instead of the main heap. But all
> code allocating memory should be in create hooks -- not start hooks.
> Then the nesting will work since it is inside another allocator
> critical section.
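
For readers unfamiliar with the terminology: "nesting" means the owning
thread may seize the mutex again without erroring or deadlocking, and
must release it the same number of times. A standalone POSIX model of
that behavior (unrelated to the RTEMS internals, purely for
illustration):

    #define _XOPEN_SOURCE 600   /* for PTHREAD_MUTEX_RECURSIVE     */
    #include <pthread.h>
    #include <stdio.h>

    int main( void )
    {
      pthread_mutex_t m;
      pthread_mutexattr_t attr;

      pthread_mutexattr_init( &attr );
      pthread_mutexattr_settype( &attr, PTHREAD_MUTEX_RECURSIVE );
      pthread_mutex_init( &m, &attr );

      pthread_mutex_lock( &m );    /* outer critical section        */
      pthread_mutex_lock( &m );    /* nested acquire by the owner:
                                      succeeds instead of erroring  */
      pthread_mutex_unlock( &m );
      pthread_mutex_unlock( &m );  /* only now is the mutex free    */

      puts( "nested acquire/release completed" );
      return 0;
    }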
??? Sorry for being a pain - I still don't understand. Would you mind
explaining this in a little more detail, please? When I look at
rtems_task_create(), I don't see where the allocator mutex is acquired
outside of the _Thread_Dispatch_disable() protected section (as a
matter of fact, I don't see any nested _RTEMS_Lock_allocator during
'create' at all)...
>
>
>>
>> -- Till
>>
>>
>>> The error checking I proposed won't work without tinkering, though.
>>> Let's ignore that for the first pass.
>>>
>>>> How about allocating the reent structure from the workspace?
>>>
>>> That's not necessarily a bad idea either. The workspace is designed
>>> for memory allocation inside a directive. But if the allocation
>>> mutex allows nesting, it isn't necessary.
>>>
>>>>
>>>>
>>>> -- Till
>>>>
>>>>> It is probably also necessary to change
>>>>> cpukit/score/include/rtems/score/apimutex.h so that
>>>>> _API_Mutex_Allocate() creates the Allocator mutex as nestable
>>>>> rather than CORE_MUTEX_NESTING_IS_ERROR.
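
The change being suggested is essentially one line when the attributes
are filled in inside _API_Mutex_Allocate(); roughly (exact 4.6 code may
differ slightly):

    /* in _API_Mutex_Allocate(), in the CORE_mutex_Attributes: */
    attributes.lock_nesting_behavior = CORE_MUTEX_NESTING_ACQUIRES;
                                /* was: CORE_MUTEX_NESTING_IS_ERROR */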
>>>>>
>>>>> Finally, it might not be a bad idea to consider it a fatal RTEMS
>>>>> error if _API_Mutex_Lock wants to block when
>>>>> _Thread_Dispatch_disable_level is non-zero. That would be easier
>>>>> than having this happen again and debugging it.
>>>>>
>>>>> It might also be valid to consider it a fatal error if a memory
>>>>> allocation is attempted when _Thread_Dispatch_disable_level is
>>>>> non-zero.
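
A rough sketch of such a check at the top of the blocking path (not a
tested patch; the error code is made up for illustration):

    /* before committing to block on an API mutex: */
    if ( _Thread_Dispatch_disable_level != 0 ) {
      /* Blocking cannot work here: _Thread_Enable_dispatch() would
       * only decrement the level and return without dispatching. */
      _Internal_error_Occurred(
        INTERNAL_ERROR_CORE,
        TRUE,                   /* internal error                  */
        99                      /* hypothetical error code         */
      );
    }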
>>>>>
>>>>>> This pertains to rtems-4.6.0pre4 running on MPC860 with an
>>>>>> unsubmitted BSP.
>>>>>> The sequence of events goes like this:
>>>>>>
>>>>>>
>>>>>> 1. Thread 1 (Init) is running at priority 1. It creates and
>>>>>>    starts thread 2 (notification_task) at priority 196. Since
>>>>>>    thread 2 is at a lower priority, it doesn't start executing
>>>>>>    yet.
>>>>>>
>>>>>> 2. Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for
>>>>>>    some external hardware to do something. As soon as it goes to
>>>>>>    sleep, thread 2 is now runnable and starts executing.
>>>>>>
>>>>>> 3. Thread 2 does some stuff, and then calls malloc(). Halfway
>>>>>>    through rtems_region_get_segment(), the 10 ms timer set by
>>>>>>    thread 1 expires. We do a context switch and thread 1 is now
>>>>>>    running.
>>>>>>
>>>>>> ** Before it lost the CPU, thread 2 had successfully called **
>>>>>> ** _RTEMS_Lock_allocator(). _RTEMS_Allocator_Mutex is held  **
>>>>>> ** by thread 2 when the context switch back to thread 1     **
>>>>>> ** occurs.                                                  **
>>>>>>
>>>>>> 4. Thread 1 now calls rtems_task_start(), which invokes malloc(),
>>>>>>    which calls rtems_region_get_segment(), which calls
>>>>>>    _RTEMS_Lock_allocator().
>>>>>>
>>>>>>    _RTEMS_Lock_allocator() returns, *without blocking*. The
>>>>>>    allocator mutex is still held by thread 2, yet thread 1
>>>>>>    proceeds in the belief that it has the mutex.
>>>>>>
>>>>>> More detail:
>>>>>> When thread 1 calls rtems_task_start() in step #4, that function
>>>>>> calls _Thread_Get() on the task we want to start. As a side
>>>>>> effect, _Thread_Get() increments _Thread_Dispatch_disable_level
>>>>>> to 1.
>>>>>>
>>>>>> Shortly thereafter, _User_extensions_Thread_start() is called,
>>>>>> which calls libc_start_hook(), which calls
>>>>>> calloc() -> malloc() -> rtems_region_get_segment() ->
>>>>>> _RTEMS_Lock_allocator() -> _CORE_mutex_Seize().
>>>>>>
>>>>>> (Note that _Thread_Dispatch_disable_level is still 1.)
>>>>>> _CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should),
>>>>>> so we call _Thread_Disable_dispatch() (disable level is now 2!)
>>>>>> followed by _CORE_mutex_Seize_interrupt_blocking() to block on
>>>>>> the mutex. Because _Thread_Dispatch_disable_level is 2, the call
>>>>>> to _Thread_Enable_dispatch() just decrements it to 1 and returns
>>>>>> without calling _Thread_Dispatch(). Thread 1 now happily proceeds
>>>>>> to corrupt the heap free block chain.
>>>>>> I don't understand the semantics of _Thread_Dispatch_disable_level
>>>>>> well enough to provide a patch. For now we will work around it by
>>>>>> making sure our tasks don't call malloc() at the same time.
>>>>>> Hopefully those with deep kernel understanding can take a look at
>>>>>> this and tell me if I'm smoking crack. :)
>>>>>>
>>>>>> -Phil
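
The counter arithmetic Phil walks through can be reproduced in a few
lines of standalone C (a model of the accounting only, not RTEMS
source; the function names mirror the post):

    #include <stdio.h>

    static int dispatch_disable_level;     /* models _Thread_Dispatch_disable_level */
    static int mutex_held_by_other_thread = 1;  /* thread 2 owns the mutex */

    static void thread_disable_dispatch( void )
    {
      ++dispatch_disable_level;
    }

    static void thread_enable_dispatch( void )
    {
      if ( --dispatch_disable_level == 0 )
        printf( "level hit 0: _Thread_Dispatch() runs, caller really blocks\n" );
      else
        printf( "level still %d: returns, caller keeps running!\n",
                dispatch_disable_level );
    }

    static void core_mutex_seize( void )
    {
      if ( mutex_held_by_other_thread ) {  /* trylock fails            */
        thread_disable_dispatch();         /* 1 -> 2 in the bug case   */
        /* _CORE_mutex_Seize_interrupt_blocking() enqueues the thread
         * and re-enables dispatch, expecting a context switch: */
        thread_enable_dispatch();          /* 2 -> 1: no dispatch!     */
      }
    }

    int main( void )
    {
      dispatch_disable_level = 1;          /* e.g. after _Thread_Get() */
      core_mutex_seize();
      return 0;
    }

Running the model with dispatch_disable_level starting at 0 instead
takes the "really blocks" branch, which is the case _CORE_mutex_Seize()
implicitly assumes.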