BUG? stack corruption in posix thread.

Till Straumann strauman at slac.stanford.edu
Tue Jun 12 21:32:46 UTC 2012


One additional question comes naturally (pthreadcreate.c):

Why is the return status of _Thread_Start() checked only after
(possibly) executing _Watchdog_Insert_ticks()?

The way I would have coded it is:

/*** BEGIN ***/

   _Thread_Disable_dispatch(); /* currently not present */

   status = _Thread_Start(...);

   if ( !status ) {
     _Thread_Enable_dispatch(); /* currently not present */
     _POSIX_Threads_Free( the_thread );
     _RTEMS_Unlock_allocator();
     return EINVAL;
   }

   /* the following 'if' statement is currently coded *before* the
      status check */
   if ( schedpolicy == SCHED_SPORADIC ) {
     _Watchdog_Insert_ticks(
       &api->Sporadic_timer,
       _Timespec_To_ticks( &api->schedparam.ss_replenish_period )
     );
   }

   _Thread_Enable_dispatch(); /* currently not present */

/*** END ***/
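
For reference, this is roughly what the code looks like today (paraphrased
from the description above, not a verbatim copy of pthreadcreate.c): there
is no dispatch-disable at all, and the SCHED_SPORADIC watchdog is armed
before the return status of _Thread_Start() is even examined.

/*** CURRENT (approximate) ***/

   /* no _Thread_Disable_dispatch() here */

   status = _Thread_Start(...);

   /* sporadic-server timer is armed before the status check */
   if ( schedpolicy == SCHED_SPORADIC ) {
     _Watchdog_Insert_ticks(
       &api->Sporadic_timer,
       _Timespec_To_ticks( &api->schedparam.ss_replenish_period )
     );
   }

   if ( !status ) {
     _POSIX_Threads_Free( the_thread );
     _RTEMS_Unlock_allocator();
     return EINVAL;
   }

   /* no _Thread_Enable_dispatch() here */

/*** END ***/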

- T.


On 06/12/2012 04:15 PM, Joel Sherrill wrote:
> On 06/12/2012 04:06 PM, Till Straumann wrote:


>> On 06/12/2012 03:47 PM, Joel Sherrill wrote:
>>> I agree with your analysis. Dispatching should be disabled
>>> once all resources are allocated and before _Thread_Start
>>> is called.
>>>
>>> But is it the capture thread or stack overflow checker
>>> that is writing the pattern?
>> It is the rtems_capture_start_task() extension callback
>> which fills the stack with 0xdeaddead when a task is started.
> OK.
>>> I assume you have a patch for this and it is helping?
>> I haven't tried yet. Mostly because I was unsure
>> if the critical section in pthreadcreate.c should include
>>     _Watchdog_Insert_ticks() or if that is not necessary.
> I think it should be.  If it were a SCHED_SPORADIC server
> then I don't think we would want a tick ISR to operate on
> its replenishment information until it was really running.
>> - T.
>>> On 06/12/2012 03:39 PM, Till Straumann wrote:
>>>> I observe pretty reproducible crashes when using pthreads under rtems-4.9.
>>>>
>>>> It seems that a function running in a new pthread context finds '0xdeaddead'
>>>> on the stack upon returning to its caller and jumps into the wild.
>>>>
>>>> Apparently, the capture engine overwrites the task's stack with 0xdeaddead
>>>> *after* the task is already running.
>>>>
>>>> I believe that 'pthread_create()' is the culprit. It creates a SCORE thread
>>>> and then calls
>>>>
>>>> _Thread_Start( )
>>>>
>>>> *without* disabling thread-dispatching.
>>>>
>>>> However, _Thread_Start() marks the thread as 'runnable' *before* calling
>>>> user extensions (_Thread_Start() body):
>>>>
>>>> {
>>>>        if ( _States_Is_dormant( the_thread->current_state ) ) {
>>>>
>>>>          the_thread->Start.entry_point      = (Thread_Entry) entry_point;
>>>>
>>>>          the_thread->Start.prototype        = the_prototype;
>>>>          the_thread->Start.pointer_argument = pointer_argument;
>>>>          the_thread->Start.numeric_argument = numeric_argument;
>>>>
>>>>          _Thread_Load_environment( the_thread );
>>>>
>>>>          _Thread_Ready( the_thread );
>>>>
>>>>          _User_extensions_Thread_start( the_thread );
>>>>
>>>>          return true;
>>>>        }
>>>>
>>>>        return false;
>>>> }
>>>>
>>>> Therefore, could it not be that the thread is already scheduled *before*
>>>> user extensions are executed? In this scenario, the following race condition
>>>> could occur:
>>>>
>>>> 1. thread X calls pthread_create
>>>> 2. _Thread_Start() marks new thread Y 'ready'
>>>> 3. 'Y' is scheduled, calls stuff and blocks
>>>> 4. 'X' runs again and executes user extensions for 'Y'
>>>> 5. capture engine's 'thread_start' extension fills 'Y's stack with
>>>> 0xdeaddead
>>>> 6. 'Y' is scheduled again; when popping a return address from the stack
>>>>         it jumps to 0xdeaddead and crashes the system.
>>>>
>>>> NOTES:
>>>>       - other APIs (rtems, itron) *have* thread-dispatching disabled around
>>>> _Thread_Start()
>>>>       - the current 'master' branch seems to still suffer from this
>>>>       - I consider this a serious bug.
>>>>
>>>> -- Till
>



