Possible bug in _CORE_mutex_Seize()

Fri Sep 26 22:32:42 UTC 2003

As unlikely as it sounds, I think we have found a bug in _CORE_mutex_Seize()
which violates mutual exclusion.

This pertains to rtems-4.6.0pre4 running on MPC860 with an unsubmitted BSP.
The sequence of events goes like this:

1.	Thread 1 (Init) is running at priority 1.  It creates and starts 
	thread 2 (notification_task) at priority 196.  Since thread 2 is
	at a lower priority, it doesn't start executing yet.

2.	Thread 1 sleeps with rtems_task_wake_after(10 ms) to wait for some
	external hardware to do something.  As soon as it goes to sleep,
	thread 2 is now runnable and starts executing.

3.	Thread 2 does some stuff, and then calls malloc().  Halfway through
	rtems_region_get_segment(), the 10ms timer set by thread 1 expires.
	We do a context switch and thread 1 is now running.

	** Before it lost the CPU, thread 2 had successfully called
**
	** _RTEMS_Lock_allocator().  _RTEMS_Allocator_Mutex is held by	**
	** thread 2 when the context switch back to thread 1 occurs.	**

4.	Thread 1 now calls rtems_start_task(), which invokes malloc(), which
calls
	rtems_region_get_segment(), which calls _RTEMS_Lock_allocator().

	_RTEMS_Lock_allocator() returns, *without blocking*.  The allocator
	mutex is still held by thread 2, yet thread 1 proceeds in the belief
	that it has the mutex.

	More detail:
	When thread 1 calls rtems_task_start() in step #4, that function
	calls _Thread_Get() on the task we want to start.  As a side effect,
	_Thread_Get() increments _Thread_Dispatch_disable_level to 1.

	Shortly thereafter, _User_extensions_Thread_start() is called, which
	calls libc_start_hook(), which calls calloc()->malloc()->

rtems_region_get_segment()->_RTEMS_Lock_allocator()->_CORE_mutex_Seize().
	(Note that _Thread_Dispatch_disable_level is stil 1.)
	_CORE_mutex_Seize_interrupt_trylock() returns 1 (as it should), so
we
	call _Thread_Disable_dispatch() (disable level is now 2!) followed
by
	_CORE_mutex_Seize_interrupt_blocking() to block on the mutex.

	Because _Thread_Dispatch_disable_level is 2, the call to
_Thread_Enable_dispatch()
	just decrements it to 1 and returns without calling
_Thread_Dispatch().
	Thread 1 now happily proceeds to corrupt the heap free block chain.

I don't understand the semantics of _Thread_Dispatch_disable_level well
enough to
provide a patch.  For now we will work around it by making sure our tasks
don't call
malloc() at the same time.  Hopefully those with deep kernel understanding
can
take a look at this and tell me if I'm smoking crack.  :)

-Phil

-- 

=====================================================================
Phil Torre                               phone: 425-820-6363 x234
Design Engineer                          email: ptorre at zetron.com
Switching Systems Group                    fax: 425-820-7031
Zetron, Inc.                               web: http://www.zetron.com