Possible bug in _CORE_mutex_Seize()

Sat Sep 27 18:52:37 UTC 2003

Following up on my own post since I didn't notice the bottom where you 
actually explained
where malloc() was called when I read this early this morning.

Phil Torre wrote:

>As unlikely as it sounds, I think we have found a bug in _CORE_mutex_Seize()
>which violates mutual exclusion.
>  
>
It simply assumes that _Thread_Dispatch_disable_level is 0.  So calling 
malloc() from within
a directive which uses dispatch disable locking is dangerous. 

I think I have a solution.  Move the body of  the libc start hook to the 
create hook.   It is probably
also necessary to change cpukit/score/include/rtems/score/apimutex.h so 
that _API_Mutex_Allocate()
creates the Allocator mutex as nestable rather than 
CORE_MUTEX_NESTING_IS_ERROR.

Finally, it might not be a bad idea for it to be considered a fatal 
RTEMS error if _API_Mutex_Locks
wants to block when _Thread_Dispatch_disable is non-zero.  That would be 
easier than this
happening again and debugging it.

It might also be valid to consider it a fatal error in a memory 
allocation is attempted when
_Thread_Dispatch_disable is zero.

>This pertains to rtems-4.6.0pre4 running on MPC860 with an unsubmitted BSP.
>The sequence of events goes like this:
>
>
>1.	Thread 1 (Init) is running at priority 1.  It creates and starts 
>	thread 2 (notification_task) at priority 196.  Since thread 2 is
>	at a lower priority, it doesn't start executing yet.
>
>2.	Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
>	external hardware to do something.  As soon as it goes to sleep,
>	thread 2 is now runnable and starts executing.
>
>3.	Thread 2 does some stuff, and then calls malloc().  Halfway through
>	rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
>	We do a context switch and thread 1 is now running.
>
>	** Before it lost the CPU, thread 2 had successfully called
>**
>	** _RTEMS_Lock_allocator().  _RTEMS_Allocator_Mutex is held by	**
>	** thread 2 when the context switch back to thread 1 occurs.	**
>
>4.	Thread 1 now calls rtems_start_task(), which invokes malloc(), which
>calls
>	rtems_region_get_segment(), which calls _RTEMS_Lock_allocator().
>
>	_RTEMS_Lock_allocator() returns, *without blocking*.  The allocator
>	mutex is still held by thread 2, yet thread 1 proceeds in the belief
>	that it has the mutex.
>
>	More detail:
>	When thread 1 calls rtems_task_start() in step #4, that function
>	calls _Thread_Get() on the task we want to start.  As a side effect,
>	_Thread_Get() increments _Thread_Dispatch_disable_level to 1.
>
>	Shortly thereafter, _User_extensions_Thread_start() is called, which
>	calls libc_start_hook(), which calls calloc()->malloc()->
>	
>rtems_region_get_segment()->_RTEMS_Lock_allocator()->_CORE_mutex_Seize().
>	(Note that _Thread_Dispatch_disable_level is stil 1.)
>	_CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should), so
>we
>	call _Thread_Disable_dispatch() (disable level is now 2!) followed
>by
>	_CORE_mutex_Seize_interrupt_blocking() to block on the mutex.
>	
>	Because _Thread_Dispatch_disable_level is 2, the call to
>_Thread_Enable_dispatch()
>	just decrements it to 1 and returns without calling
>_Thread_Dispatch().
>	Thread 1 now happily proceeds to corrupt the heap free block chain.
>	
>
>I don't understand the semantics of _Thread_Dispatch_disable_level well
>enough to
>provide a patch.  For now we will work around it by making sure our tasks
>don't call
>malloc() at the same time.  Hopefully those with deep kernel understanding
>can
>take a look at this and tell me if I'm smoking crack.  :)
>
>-Phil
>
>  
>