BUG? stack corruption in posix thread.
Till Straumann
strauman at slac.stanford.edu
Tue Jun 12 21:32:46 UTC 2012
One additional question arises naturally (pthreadcreate.c):
Why is the return status of _Thread_Start() checked only after
(possibly) executing _Watchdog_Insert_ticks()?
The way I would have coded it is:
/*** BEGIN ***/
  _Thread_Disable_dispatch();    /* currently not present */

  status = _Thread_Start(...);

  if ( !status ) {
    _Thread_Enable_dispatch();   /* currently not present */
    _POSIX_Threads_Free( the_thread );
    _RTEMS_Unlock_allocator();
    return EINVAL;
  }

  /* the following 'if' statement is currently coded *before* the
     status check */
  if ( schedpolicy == SCHED_SPORADIC ) {
    _Watchdog_Insert_ticks(
      &api->Sporadic_timer,
      _Timespec_To_ticks( &api->schedparam.ss_replenish_period )
    );
  }

  _Thread_Enable_dispatch();     /* currently not present */
/*** END ***/
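
To make the hazard concrete outside of SCORE internals, here is a tiny,
deliberately unsynchronized illustration in plain POSIX C (not RTEMS
code; all names are made up and the sleep() calls merely stand in for
scheduling order):

#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Buffer standing in for the new thread's stack. */
static char fake_stack[ 64 ];

static void *new_thread( void *arg )
{
  (void) arg;
  /* The new thread is already live and has put data on "its" stack... */
  strcpy( fake_stack, "return address" );
  sleep( 2 );
  /* ...but by now the creator's late "start extension" has clobbered it. */
  printf( "thread sees fake_stack[0] = 0x%02x (expected 'r' = 0x72)\n",
          (unsigned char) fake_stack[ 0 ] );
  return NULL;
}

int main( void )
{
  pthread_t t;

  pthread_create( &t, NULL, new_thread, NULL );

  /* The creator runs its "start extension" only after the new thread
   * is already executing, the analogue of the capture engine filling
   * a live stack with 0xdeaddead (steps 5/6 of the scenario quoted
   * below). */
  sleep( 1 );
  memset( fake_stack, 0xdd, sizeof( fake_stack ) );

  pthread_join( t, NULL );
  return 0;
}

Keeping dispatching disabled from _Thread_Start() through the start
extensions closes exactly this window: the new thread cannot run until
everything the creator must do on its behalf has completed.
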
- T.
On 06/12/2012 04:15 PM, Joel Sherrill wrote:
> On 06/12/2012 04:06 PM, Till Straumann wrote:
>> On 06/12/2012 03:47 PM, Joel Sherrill wrote:
>>> I agree with your analysis. Dispatching should be disabled
>>> once all resources are allocated and before _Thread_Start
>>> is called.
>>>
>>> But is it the capture thread or stack overflow checker
>>> that is writing the pattern?
>> It is the rtems_capture_start_task() extension callback
>> which fills the stack with 0xdeaddead when a task is started.
> OK.
>>> I assume you have a patch for this and it is helping?
>> I haven't tried yet. Mostly because I was unsure
>> if the critical section in pthreadcreate.c should include
>> _Watchdog_Insert_ticks() or if that is not necessary.
> I think it should be. If it were a SCHED_SPORADIC server
> then I don't think we would want a tick ISR to operate on
> its replenishment information until it was really running.
>> - T.
>>> On 06/12/2012 03:39 PM, Till Straumann wrote:
>>>> I observe pretty reproducible crashes when using pthreads under rtems-4.9.
>>>>
>>>> It seems that a function running in a new pthread context finds '0xdeaddead'
>>>> on the stack upon returning to its caller and jumps into the wild.
>>>>
>>>> Apparently, the capture engine overwrites the task's stack with 0xdeaddead
>>>> *after* the task is already running.
>>>>
>>>> I believe that 'pthread_create()' is the culprit. It creates a SCORE thread
>>>> and then calls
>>>>
>>>> _Thread_Start( )
>>>>
>>>> *without* disabling thread-dispatching.
>>>>
>>>> However, _Thread_Start() marks the thread as 'runnable' *before* calling
>>>> user extensions (_Thread_Start() body):
>>>>
>>>> {
>>>>   if ( _States_Is_dormant( the_thread->current_state ) ) {
>>>>
>>>>     the_thread->Start.entry_point      = (Thread_Entry) entry_point;
>>>>
>>>>     the_thread->Start.prototype        = the_prototype;
>>>>     the_thread->Start.pointer_argument = pointer_argument;
>>>>     the_thread->Start.numeric_argument = numeric_argument;
>>>>
>>>>     _Thread_Load_environment( the_thread );
>>>>
>>>>     _Thread_Ready( the_thread );
>>>>
>>>>     _User_extensions_Thread_start( the_thread );
>>>>
>>>>     return true;
>>>>   }
>>>>
>>>>   return false;
>>>> }
>>>>
>>>> Therefore, could it not be that the thread is already scheduled *before*
>>>> user extensions are executed? In this scenario, the following race condition
>>>> could occur:
>>>>
>>>> 1. thread X calls pthread_create
>>>> 2. _Thread_Start() marks new thread Y 'ready'
>>>> 3. 'Y' is scheduled, calls stuff and blocks
>>>> 4. 'X' runs again and executes user extensions for 'Y'
>>>> 5. capture engine's 'thread_start' extension fills 'Y's stack with
>>>> 0xdeaddead
>>>> 6. 'Y' is scheduled again, when popping a return address from the stack
>>>> it jumps to 0xdeaddead and crashes the system.
>>>>
>>>> NOTES:
>>>> - other APIs (rtems, itron) *have* thread-dispatching disabled around
>>>> _Thread_Start()
>>>> - the current 'master' branch seems to still suffer from this
>>>> - I consider this a serious bug.
>>>>
>>>> -- Till
>