Possible bug in _CORE_mutex_Seize()

Till Straumann strauman at SLAC.Stanford.EDU
Sat Sep 27 00:52:08 UTC 2003


At first glance, this doesn't look good.

It seems to be a general issue with mixing
dispatch-disabling as a protection mechanism
with blocking synchronization.

From a dispatch-disabled section, blocking
primitives must not be used (very much like
they must not be used from an ISR) -- obviously, the
blocking is effectively delayed until leaving
the outermost dispatch-disabled scope
-- with possibly disastrous consequences!
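
To illustrate, a hypothetical sketch (not actual kernel code;
'some_blocking_call' is made up):

    _Thread_Disable_dispatch();    /* level: 0 -> 1                  */

    /* ... critical section ...                                      */

    some_blocking_call();          /* internally disables dispatching
                                    * again (1 -> 2), enqueues the
                                    * caller, then calls
                                    * _Thread_Enable_dispatch(): the
                                    * level drops 2 -> 1, stays
                                    * nonzero, so no dispatch happens
                                    * and the caller just keeps
                                    * running!                       */

    _Thread_Enable_dispatch();     /* level: 1 -> 0; only here could
                                    * the deferred block/switch
                                    * actually take place            */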

However, I still cannot see how 'malloc()' fits
into the picture -- AFAIK, no kernel code calls
C-library code. How does 'rtems_start_task()'
(do you mean rtems_task_start()?) end up calling 'malloc()'?

You didn't call malloc from a user extension, did you?

-- Till

Phil Torre wrote:
> As unlikely as it sounds, I think we have found a bug in _CORE_mutex_Seize()
> which violates mutual exclusion.
> 
> This pertains to rtems-4.6.0pre4 running on MPC860 with an unsubmitted BSP.
> The sequence of events goes like this:
> 
> 
> 1.	Thread 1 (Init) is running at priority 1.  It creates and starts 
> 	thread 2 (notification_task) at priority 196.  Since thread 2 is
> 	at a lower priority, it doesn't start executing yet.
> 
> 2.	Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
> 	external hardware to do something.  As soon as it goes to sleep,
> 	thread 2 is now runnable and starts executing.
> 
> 3.	Thread 2 does some stuff, and then calls malloc().  Halfway through
> 	rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
> 	We do a context switch and thread 1 is now running.
> 
> 	** Before it lost the CPU, thread 2 had successfully called	**
> 	** _RTEMS_Lock_allocator().  _RTEMS_Allocator_Mutex is held by	**
> 	** thread 2 when the context switch back to thread 1 occurs.	**
> 
> 4.	Thread 1 now calls rtems_start_task(), which invokes malloc(),
> 	which calls rtems_region_get_segment(), which calls
> 	_RTEMS_Lock_allocator().
> 
> 	_RTEMS_Lock_allocator() returns, *without blocking*.  The allocator
> 	mutex is still held by thread 2, yet thread 1 proceeds in the belief
> 	that it has the mutex.
> 
> 	More detail:
> 	When thread 1 calls rtems_task_start() in step #4, that function
> 	calls _Thread_Get() on the task we want to start.  As a side effect,
> 	_Thread_Get() increments _Thread_Dispatch_disable_level to 1.
> 
> 	Shortly thereafter, _User_extensions_Thread_start() is called,
> 	which calls libc_start_hook(), which calls calloc() -> malloc() ->
> 	rtems_region_get_segment() -> _RTEMS_Lock_allocator() ->
> 	_CORE_mutex_Seize().
> 	(Note that _Thread_Dispatch_disable_level is still 1.)
> 	_CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should), so
> 	we call _Thread_Disable_dispatch() (disable level is now 2!)
> 	followed by _CORE_mutex_Seize_interrupt_blocking() to block on the
> 	mutex.
> 	
> 	Because _Thread_Dispatch_disable_level is 2, the call to
> 	_Thread_Enable_dispatch() just decrements it to 1 and returns
> 	without calling _Thread_Dispatch().  Thread 1 now happily
> 	proceeds to corrupt the heap free block chain.
> 	
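> 	In pseudo-C, the failing path looks roughly like this (a sketch
> 	of the sequence described above, not the actual RTEMS source):
> 
> 	/* On entry, _Thread_Dispatch_disable_level == 1, courtesy of */
> 	/* _Thread_Get() in rtems_task_start().                       */
> 	if ( _CORE_mutex_Seize_interrupt_trylock( ... ) ) {
> 	  /* the mutex is already held (by thread 2)                  */
> 	  _Thread_Disable_dispatch();            /* level: 1 -> 2     */
> 	  _CORE_mutex_Seize_interrupt_blocking( ... );
> 	  /* ... which enqueues thread 1 on the wait queue and then   */
> 	  /* calls _Thread_Enable_dispatch(): level 2 -> 1, still     */
> 	  /* nonzero, so _Thread_Dispatch() never runs and thread 1   */
> 	  /* returns as if it owned the mutex.                        */
> 	}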
> 
> I don't understand the semantics of _Thread_Dispatch_disable_level
> well enough to provide a patch.  For now we will work around it by
> making sure our tasks don't call malloc() at the same time.
> Hopefully those with deep kernel understanding can take a look at
> this and tell me if I'm smoking crack.  :)
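> 
> For direct task-level calls, the workaround amounts to something
> like this hypothetical sketch ('alloc_lock_id' is a binary semaphore
> created at init; this obviously cannot cover allocations made from
> start hooks):
> 
> 	#include <rtems.h>
> 	#include <stdlib.h>
> 
> 	extern rtems_id alloc_lock_id;  /* from rtems_semaphore_create() */
> 
> 	void *safe_malloc( size_t size )
> 	{
> 	  void *p;
> 
> 	  /* serialize all task-level allocations behind one semaphore */
> 	  rtems_semaphore_obtain( alloc_lock_id, RTEMS_WAIT,
> 	                          RTEMS_NO_TIMEOUT );
> 	  p = malloc( size );
> 	  rtems_semaphore_release( alloc_lock_id );
> 	  return p;
> 	}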
> 
> -Phil
> 




