[Bug 1814] SMP race condition between stack free and dispatch

Wed Jun 15 18:02:57 UTC 2011

https://www.rtems.org/bugzilla/show_bug.cgi?id=1814

--- Comment #4 from Joel Sherrill <joel.sherrill at oarcorp.com> 2011-06-15 13:02:55 CDT ---
(In reply to comment #3)
> (In reply to comment #2)
> > (In reply to comment #1)
> > > I think this patch is not quite right.
> > > 
> > > Thread_Close executes with dispatch disabled when it is called from
> > > rtems_task_delete. Before freeing the stack, the thread is blocked by setting
> > > its state to dormant, which eventually calls Scheduler_Block.
> > > 
> > > I don't see how the thread's stack is used between when it is closed and any
> > > further dispatch.
> > > 
> > > Do you have a bug that is causing this behavior to be seen?
> > 
> > This is the code from rtems_task_delete() so I lean to agreeing with Gedare.
> > 
> >       _Thread_Close( the_information, the_thread );
> > 
> >       _RTEMS_tasks_Free( the_thread );
> > 
> >       _RTEMS_Unlock_allocator();
> >       _Thread_Enable_dispatch();
> 
> I'm talking about *SMP*, guys. :)

I know and I was concerned that it may not be 100% safe in non-SMP
situations. :)

Another idea I have had is to always enter _Thread_Dispatch with
a critical section level of 1.  I worry (but am not sure) there
may be a window on single-core where stack is freed, dispatch is enabled, 
then <bad?>ISR and ISR dispatch</bad?>.  We would never return to the
interrupted task again though so that may be OK.

> We have processor 1 that is finishing a thread and processor 2 waiting on the
> allocator mutex. As soon as processor 1 unlocks the allocator mutex, processor
> 2 grabs it and starts allocating memory as allocation is not protected by ISR
> disable nor the dispatch lock. It may allocate the stack of the thread on
> processor 1. Of course, processor 2 may immediately start using the memory
> overwriting the values on processor 1 stack.
> 
> In the mean-time, processor 2 arrives to Thread_Dispatch, but it is using the
> already freed stack.

I see the window.  No doubt.  It is strictly a delete(SELF) thing and 
unfortunately, that is probably the common use case.

> To Gedare: the actual situation where the bug showed itself was using 3
> processors and the memory got re-allocated as a stack of process 3, which even
> managed to start before process 1 got finally descheduled...
> 
> > 
> > Now on the other hand, looking at this pattern, I see that the lock allocator
> > and _Thread_Get (implicit disable dispatching) are in the wrong order with the
> > undoing at the other end of the method.
> > 
> > Lock allocator
> >   thread get disable dispatch
> >    ...
> >   Unlock allocator
> > Enable dispatch
> > 
> > That seems wrong but offhand, I don't know which side to change.  The
> > side-effects of disabling dispatching before getting the allocator mutex
> > worry me.
> > 
> 
> I was looking at it seriously, becase "Why it works on non-SMP?" This is in
> fact explained in your comment in the function...

But also may point out an issue with obtaining/releasing locks in the wrong
order.  In a non-SMP system, it is OK.  But for SMP may not be.

> > If you change the unlock order, then you do have the issue.
> > 
> > malloc() has a deferred free queue.  I wonder if this capability should be
> > moved to the heap.  Then we can just defer the free of the stack memory in
> > this case.  The next memory alloc/dealloc can perform the operation just like
> > we defer free's from ISRs.
> 
> Yes, this is another solution.

If you have been following the ARM unlimited discussion, I think it is another
side-effect of freeing memory for a task while it is running.

I worry that adding code to _Thread_Dispatch is making a critical performance
path worse and only handling one case.  Deferring all memory frees while
deleting SELF may be safer solution and not impact dispatch times.

Sebastian.. is this in the heap now?  Or would we need to merge it from 
the malloc family?

-- 
Configure bugmail: https://www.rtems.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.