RTEMS message queues and interrupt safety

Thomas Doerfler Thomas.Doerfler at imd-systems.de
Tue Apr 18 18:43:37 UTC 2006


Phil,

We had a similar leak at critical section boundaries some weeks ago; it
was filed in the bug database as PR904. Maybe the fix for that would
help you too?

http://www.rtems.com/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=default&pr=904

wkr,
Thomas.

Phil Torre wrote:
> Greetings All,
> 
> We've got a weird bug that may interest someone.  I'd be interested
> in hearing from anyone who has seen something similar.
> 
> The setup:  rtems-4.6.0 with various patches merged in from CVS,
> running on MPC855T, with an unsubmitted BSP.
> 
> The PowerPC host processor receives interrupts from an external DSP
> at a fairly high rate.  The ISR that services that interrupt sends
> messages to a queue that is read by a "classic API" task.  Several
> other foreground tasks also send messages to that queue.  When 
> running at high load (lots of interrupts firing) for extended periods
> of time, we sometimes see messages that have already been read from
> the queue "reappear", as much as tens of milliseconds later.  This
> seems to be happening because the number_of_pending_messages member 
> of the CORE_message_queue_Control struct is zero, but the chain of
> pending messages is non-empty.  When a new message is submitted, it
> goes to the end of the chain, and number_of_pending_messages becomes
> 1.  The next time the queue is read, the count is decremented back
> to zero, but the wrong message is returned.
> 
> I don't know exactly how the "count is zero but list is not empty"
> condition comes about.  I put in a bunch of instrumentation to try
> and catch it in the act of happening.  But interrupts firing in
> the middle of my debug code were causing it to false-trigger.
> So, I resorted to turning interrupts off for the entire duration
> of both _CORE_message_queue_Submit() and _CORE_message_queue_Seize().
> Now my debug code doesn't false-trigger, but the actual bug doesn't
> happen any more.  We got pretty good at reproducing it, but with
> interrupts disabled in those two functions, we can't make the bug
> manifest any more.  I don't know if I have actually fixed something,
> or just forced the bug into hiding, biding its time.
> 
> Looking at the queue insert/remove code, I don't see a window.  I may
> be missing it, or there may not be one there and I've just changed 
> the timing enough with my interrupt disabling that we can't make
> the bug show itself the same way.
> 
> Any comments would be welcome.
> 
> Thanks,
> -Phil
>  
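
For readers following the thread, here is a minimal host-side sketch of
the symptom Phil describes: a count of zero with a non-empty chain, and
the stale message coming out on the next read. It is a toy model under
stated assumptions, not RTEMS code. Every name in it (toy_queue,
toy_submit, toy_seize and so on) is hypothetical, and the inconsistent
state is injected by hand because the thread does not establish how it
actually arises.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the RTEMS internals. */
typedef struct toy_node {
    struct toy_node *next;
    int              payload;
} toy_node;

typedef struct {
    toy_node *head;
    toy_node *tail;
    unsigned  pending;  /* stands in for number_of_pending_messages */
} toy_queue;

/* Link a node at the tail of the chain WITHOUT touching the count. */
static void toy_chain_append(toy_queue *q, toy_node *n)
{
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
}

/* Well-behaved submit: chain and count are updated together. */
static void toy_submit(toy_queue *q, toy_node *n)
{
    toy_chain_append(q, n);
    q->pending++;
}

/* Read that trusts the count and takes whatever heads the chain. */
static toy_node *toy_seize(toy_queue *q)
{
    if (q->pending == 0)
        return NULL;
    toy_node *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->pending--;
    return n;
}

/* Debug check: does the count agree with the actual chain length? */
static int toy_is_consistent(const toy_queue *q)
{
    unsigned len = 0;
    for (const toy_node *n = q->head; n != NULL; n = n->next)
        len++;
    return len == q->pending;
}

/* Replays the symptom; returns the payload handed out by the last read. */
int toy_replay_symptom(void)
{
    toy_queue q = { NULL, NULL, 0 };
    toy_node a = { NULL, 1 }, c = { NULL, 3 };

    toy_submit(&q, &a);
    toy_node *first = toy_seize(&q);        /* A is read normally */
    assert(first == &a && toy_is_consistent(&q));
    (void)first;

    /* Injected by hand (cause unknown): A is on the chain, count is 0. */
    toy_chain_append(&q, &a);
    assert(!toy_is_consistent(&q));

    toy_submit(&q, &c);                     /* chain: A, C; count: 1 */
    toy_node *second = toy_seize(&q);       /* count back to 0 */
    return second->payload;                 /* A's payload, not C's */
}
```

In this model toy_replay_symptom() hands back message A a second time
while C stays on the chain with the count at zero, which matches the
"reappearing" behaviour in the report. A check like toy_is_consistent()
only gives trustworthy answers if nothing can modify the queue while it
walks the chain, which is consistent with the false triggers Phil saw
before he disabled interrupts around his instrumentation.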


-- 
--------------------------------------------
IMD Ingenieurbuero fuer Microcomputertechnik
Thomas Doerfler           Herbststrasse 8
D-82178 Puchheim          Germany
email:    Thomas.Doerfler at imd-systems.de
PGP public key available at:
      http://www.imd-systems.de/pgpkey_en.html
