[Bug 1814] SMP race condition between stack free and dispatch

bugzilla-daemon at rtems.org
Wed Jun 15 18:34:22 UTC 2011


https://www.rtems.org/bugzilla/show_bug.cgi?id=1814

--- Comment #5 from Marta Rybczynska <marta.rybczynska at kalray.eu> 2011-06-15 13:34:20 CDT ---
(In reply to comment #4)
> (In reply to comment #3)
> > (In reply to comment #2)
> > > (In reply to comment #1)
> > > > I think this patch is not quite right.
> > > > 
> > > > Thread_Close executes with dispatch disabled when it is called from
> > > > rtems_task_delete. Before freeing the stack, the thread is blocked by setting
> > > > its state to dormant, which eventually calls Scheduler_Block.
> > > > 
> > > > I don't see how the thread's stack is used between when it is closed and any
> > > > further dispatch.
> > > > 
> > > > Do you have a bug that is causing this behavior to be seen?
> > > 
> > > This is the code from rtems_task_delete(), so I lean toward agreeing with Gedare.
> > > 
> > >       _Thread_Close( the_information, the_thread );
> > > 
> > >       _RTEMS_tasks_Free( the_thread );
> > > 
> > >       _RTEMS_Unlock_allocator();
> > >       _Thread_Enable_dispatch();
> > 
> > I'm talking about *SMP*, guys. :)
> 
> I know and I was concerned that it may not be 100% safe in non-SMP
> situations. :)
> 
> Another idea I have had is to always enter _Thread_Dispatch with
> a critical section level of 1.  I worry (but am not sure) there
> may be a window on single-core where stack is freed, dispatch is enabled, 
> then an ISR occurs and an ISR-level dispatch happens (the bad case?).  We would
> never return to the interrupted task again though, so that may be OK.

In the current code it should work, but such an ISR is in a very strange
situation. If the handler(s) in the ISR start using _Thread_executing, for
instance (and especially its freed fields, which are set to NULL but are
normally non-NULL), they may need some special checks (is that realistic? is it
done anywhere?).

At the very least, if it is left like that, IMO it deserves a clear warning in
the documentation.

I agree that this situation is somewhat fuzzy in non-SMP as well. Generally,
use-after-free is a bad idea, and that is exactly what is happening here.
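
To make the window concrete, here is a minimal, self-contained sketch in plain
C; tcb_t, executing, close_thread and isr_handler are invented stand-ins for
the RTEMS internals, not the real code. The point is only that the TCB is still
reachable through the executing pointer after its stack has been freed and the
field set to NULL, so a handler running in that window has to check for it.

  /*
   * Minimal sketch of the window discussed above, using made-up
   * stand-ins (tcb_t, executing, close_thread, isr_handler) for the
   * real RTEMS internals -- illustration only, not the actual API.
   */
  #include <stdio.h>
  #include <stdlib.h>

  typedef struct {
    void *stack_area;   /* freed by the close path and then set to NULL */
    int   id;
  } tcb_t;

  static tcb_t *executing;   /* plays the role of _Thread_executing */

  static void close_thread(tcb_t *tcb)
  {
    free(tcb->stack_area);    /* stack memory returned to the allocator */
    tcb->stack_area = NULL;   /* field is NULL, but 'executing' still points here */
  }

  static void isr_handler(void)
  {
    /* A handler running in the window sees the half-torn-down thread:
     * anything that expects executing->stack_area != NULL must now check. */
    if (executing->stack_area == NULL)
      printf("ISR ran against a thread whose stack is already freed\n");
  }

  int main(void)
  {
    tcb_t task = { malloc(1024), 1 };
    executing = &task;

    close_thread(&task);   /* stack freed, next dispatch not performed yet */
    isr_handler();         /* the window: an ISR fires before that dispatch */
    return 0;
  }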

> > > 
> > > That seems wrong but offhand, I don't know which side to change.  The
> > > side-effects of disabling dispatching before getting the allocator mutex
> > > worry me.
> > > 
> > 
> > I was looking at it seriously, because of the question "Why does it work on
> > non-SMP?"  This is in fact explained in your comment in the function...
> 
> But it may also point out an issue with obtaining/releasing locks in the wrong
> order.  In a non-SMP system it is OK, but for SMP it may not be.
>

For now I guess that in all SMP builds there is just one main lock, so it does
not cause problems. In general I agree that the order should be the same. If
the free is in fact delayed, there will be no problem, because the
use-after-free disappears. I think that, combined with bringing the locks into
the right order, would be a good solution.

The most natural order of the locks in this case would be:

lock_allocator
dispatch_disable

thread_close

dispatch_enable
unlock_allocator

However, it would never reach unlock_allocator, because dispatch_enable
performs the dispatch and, when the task deletes itself, control never comes
back. The reverse order may cause deadlocks IMO, as the dispatch lock is the
more "internal" one.
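
As a toy illustration of why this ordering breaks down when a task deletes
itself, here is a small self-contained model; the function names only mirror
the pseudo-sequence above, they are not the real RTEMS primitives.

  /*
   * Toy model of the sequence above: once dispatch_enable switches away
   * from the deleted task, the unlock_allocator call after it is dead
   * code and the allocator mutex stays held.
   */
  #include <stdio.h>
  #include <stdbool.h>
  #include <stdlib.h>

  static bool deleting_self = true;   /* the "delete SELF" case */

  static void lock_allocator(void)   { printf("allocator locked\n"); }
  static void unlock_allocator(void) { printf("allocator unlocked\n"); }
  static void dispatch_disable(void) { printf("dispatch disabled\n"); }
  static void thread_close(void)     { printf("thread closed, stack freed\n"); }

  static void dispatch_enable(void)
  {
    printf("dispatch enabled\n");
    if (deleting_self) {
      /* The dispatch switches away from the deleted task and never
       * returns here, so the caller's next statement is never reached. */
      printf("dispatched away, never returning to the deleted task\n");
      exit(0);
    }
  }

  int main(void)
  {
    lock_allocator();
    dispatch_disable();

    thread_close();

    dispatch_enable();
    unlock_allocator();   /* unreachable when deleting self */
    return 0;
  }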

> > > If you change the unlock order, then you do have the issue.
> > > 
> > > malloc() has a deferred free queue.  I wonder if this capability should be
> > > moved to the heap.  Then we can just defer the free of the stack memory in
> > > this case.  The next memory alloc/dealloc can perform the operation just like
> > > we defer free's from ISRs.
> > 
> > Yes, this is another solution.
> 
> If you have been following the ARM unlimited discussion, I think it is another
> side-effect of freeing memory for a task while it is running.

I can't say for sure, but we seem to have a similar problem with unlimited on
SMP. I have not looked for the reason yet, but this patch (stack/unlock) does
not help in the unlimited case.

> 
> I worry that adding code to _Thread_Dispatch is making a critical performance
> path worse and only handling one case.  Deferring all memory frees while
> deleting SELF may be a safer solution and not impact dispatch times.

I agree. I just have not seen such a mechanism in the heap.
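
For reference, here is a rough sketch of what I understand the deferred-free
pattern to be, written against plain malloc/free with invented names
(heap_defer_free, heap_flush_deferred); it is not the existing malloc-family or
heap code, just the shape of the idea: chain the block onto a list instead of
freeing it, and drain the list on the next allocator operation.

  /*
   * Rough sketch of a deferred-free queue.  It assumes the freed area is
   * at least as large as one list node, which is true for a task stack.
   * Names are invented for illustration, not the RTEMS heap API.
   */
  #include <stdlib.h>

  struct deferred_block {
    struct deferred_block *next;
  };

  static struct deferred_block *deferred_head;

  /* Called instead of free() when freeing right now is unsafe,
   * e.g. while the owning task is still executing on that stack. */
  static void heap_defer_free(void *area)
  {
    struct deferred_block *block = area;   /* reuse the block as the link */
    block->next   = deferred_head;
    deferred_head = block;
  }

  /* Called on the next alloc/free, when it is safe to release the memory. */
  static void heap_flush_deferred(void)
  {
    while (deferred_head != NULL) {
      struct deferred_block *block = deferred_head;
      deferred_head = block->next;
      free(block);
    }
  }

The stack free in _Thread_Close could then go through something like
heap_defer_free, and the next heap operation would release the memory once the
deleted task is no longer executing.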
> 
> Sebastian.. is this in the heap now?  Or would we need to merge it from 
> the malloc family?

-- 
Configure bugmail: https://www.rtems.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
