Possible bug in _CORE_mutex_Seize()

Mon Sep 29 18:37:52 UTC 2003

Joel Sherrill wrote:
> Following up on my own post since I didn't notice the bottom where you 
> actually explained
> where malloc() was called when I read this early this morning.
> 

Funny - I had missed that, too...

> Phil Torre wrote:
> 
>> As unlikely as it sounds, I think we have found a bug in 
>> _CORE_mutex_Seize()
>> which violates mutual exclusion.
>>  
>>
> It simply assumes that _Thread_Dispatch_disable_level is 0.  So calling 
> malloc() from within
> a directive which uses dispatch disable locking is dangerous.
> I think I have a solution.  Move the body of  the libc start hook to the 
> create hook.   It is probably
> also necessary to change cpukit/score/include/rtems/score/apimutex.h so 
> that _API_Mutex_Allocate()
> creates the Allocator mutex as nestable rather than 
> CORE_MUTEX_NESTING_IS_ERROR.
> 
> Finally, it might not be a bad idea for it to be considered a fatal 
> RTEMS error if _API_Mutex_Locks
> wants to block when _Thread_Dispatch_disable is non-zero.  That would be 
> easier than this
> happening again and debugging it.
>
> It might also be valid to consider it a fatal error in a memory 
> allocation is attempted when
> _Thread_Dispatch_disable is zero.
>

you mean _Thread_Dispatch_disable_level is non-zero?

Joel - are you going to put out a patch for this? We soon are
going into production and issues like this one make me nervous ;-)

IMHO, it would be nice to have pre5 out ASAP, as there are a
few critical fixes which have been closed since pre4

-- Till

>> This pertains to rtems-4.6.0pre4 running on MPC860 with an unsubmitted 
>> BSP.
>> The sequence of events goes like this:
>>
>>
>> 1.    Thread 1 (Init) is running at priority 1.  It creates and starts 
>>     thread 2 (notification_task) at priority 196.  Since thread 2 is
>>     at a lower priority, it doesn't start executing yet.
>>
>> 2.    Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
>>     external hardware to do something.  As soon as it goes to sleep,
>>     thread 2 is now runnable and starts executing.
>>
>> 3.    Thread 2 does some stuff, and then calls malloc().  Halfway through
>>     rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
>>     We do a context switch and thread 1 is now running.
>>
>>     ** Before it lost the CPU, thread 2 had successfully called
>> **
>>     ** _RTEMS_Lock_allocator().  _RTEMS_Allocator_Mutex is held by    **
>>     ** thread 2 when the context switch back to thread 1 occurs.    **
>>
>> 4.    Thread 1 now calls rtems_start_task(), which invokes malloc(), 
>> which
>> calls
>>     rtems_region_get_segment(), which calls _RTEMS_Lock_allocator().
>>
>>     _RTEMS_Lock_allocator() returns, *without blocking*.  The allocator
>>     mutex is still held by thread 2, yet thread 1 proceeds in the belief
>>     that it has the mutex.
>>
>>     More detail:
>>     When thread 1 calls rtems_task_start() in step #4, that function
>>     calls _Thread_Get() on the task we want to start.  As a side effect,
>>     _Thread_Get() increments _Thread_Dispatch_disable_level to 1.
>>
>>     Shortly thereafter, _User_extensions_Thread_start() is called, which
>>     calls libc_start_hook(), which calls calloc()->malloc()->
>>     
>> rtems_region_get_segment()->_RTEMS_Lock_allocator()->_CORE_mutex_Seize().
>>     (Note that _Thread_Dispatch_disable_level is stil 1.)
>>     _CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should), so
>> we
>>     call _Thread_Disable_dispatch() (disable level is now 2!) followed
>> by
>>     _CORE_mutex_Seize_interrupt_blocking() to block on the mutex.
>>     
>>     Because _Thread_Dispatch_disable_level is 2, the call to
>> _Thread_Enable_dispatch()
>>     just decrements it to 1 and returns without calling
>> _Thread_Dispatch().
>>     Thread 1 now happily proceeds to corrupt the heap free block chain.
>>     
>>
>> I don't understand the semantics of _Thread_Dispatch_disable_level well
>> enough to
>> provide a patch.  For now we will work around it by making sure our tasks
>> don't call
>> malloc() at the same time.  Hopefully those with deep kernel 
>> understanding
>> can
>> take a look at this and tell me if I'm smoking crack.  :)
>>
>> -Phil
>>
>>  
>>
> 
> 
>