RTEMS message queues and interrupt safety
Thomas Doerfler
Thomas.Doerfler at imd-systems.de
Tue Apr 18 18:43:37 UTC 2006
Phil,
We had a similar leak in critical code boundaries some weeks ago; it was
filed in the bug database as PR904. Maybe the fix for that would help
you too?
http://www.rtems.com/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&database=default&pr=904
wkr,
Thomas.
Phil Torre wrote:
> Greetings All,
>
> We've got a weird bug that may interest someone. I'd be interested
> in hearing from anyone who has seen anything similar.
>
> The setup: rtems-4.6.0 with various patches merged in from CVS,
> running on MPC855T, with an unsubmitted BSP.
>
> The PowerPC host processor receives interrupts from an external DSP
> at a fairly high rate. The ISR that services that interrupt sends
> messages to a queue that is read by a "classic API" task. Several
> other foreground tasks also send messages to that queue. When
> running at high load (lots of interrupts firing) for extended periods
> of time, we sometimes see messages that have already been read from
> the queue "reappear", as much as tens of milliseconds later. This
> seems to be happening because the number_of_pending_messages member
> of the CORE_message_queue_Control struct is zero, but the chain of
> pending messages is non-empty. When a new message is submitted, it
> goes to the end of the chain, and number_of_pending_messages becomes
> 1. The next time the queue is read, the count is decremented back
> to zero, but the wrong message is returned.
>
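To make the symptom above concrete, here is a minimal host-side sketch of how a "count is zero but chain is non-empty" state would produce a stale read. It is a toy model, not the RTEMS implementation: every `toy_` name is hypothetical, and the inconsistent state is forced by hand because the thread does not establish how it actually arises.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical, not RTEMS internals. */
typedef struct toy_msg {
    struct toy_msg *next;
    int id;
} toy_msg;

typedef struct {
    toy_msg *head;
    toy_msg *tail;
    unsigned pending;   /* plays the role of number_of_pending_messages */
} toy_queue;

/* Link a node at the end of the chain without touching the count. */
static void toy_append(toy_queue *q, toy_msg *m)
{
    m->next = NULL;
    if (q->tail != NULL)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
}

/* Normal submit: append to the chain, then bump the count. */
static void toy_submit(toy_queue *q, toy_msg *m)
{
    toy_append(q, m);
    q->pending++;
}

/* Normal read: trust the count, then take the first node on the chain. */
static toy_msg *toy_seize(toy_queue *q)
{
    if (q->pending == 0)
        return NULL;
    toy_msg *m = q->head;
    q->head = m->next;
    if (q->head == NULL)
        q->tail = NULL;
    q->pending--;
    return m;
}

/* Returns the id of the message the reader receives after a fresh submit. */
int toy_demo_stale_read(void)
{
    toy_queue q = { NULL, NULL, 0 };
    toy_msg old = { NULL, 1 };
    toy_msg fresh = { NULL, 2 };

    /* Force the inconsistent state: one node on the chain, count zero. */
    toy_append(&q, &old);
    assert(toy_seize(&q) == NULL);   /* the reader believes the queue is empty */

    toy_submit(&q, &fresh);          /* chain: old -> fresh, pending == 1 */
    toy_msg *got = toy_seize(&q);    /* hands back "old"; pending == 0 again */
    return got->id;
}
```

In this model the stale node is returned in place of the new one, and the new one is left on the chain with the count back at zero, so the mismatch persists across later reads.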
> I don't know exactly how the "count is zero but list is not empty"
> condition comes about. I put in a bunch of instrumentation to try
> and catch it in the act of happening. But interrupts firing in
> the middle of my debug code were causing it to false-trigger.
> So, I resorted to turning interrupts off for the entire duration
> of both _CORE_message_queue_Submit() and _CORE_message_queue_Seize().
> Now my debug code doesn't false-trigger, but the actual bug doesn't
> happen any more. We got pretty good at reproducing it, but with
> interrupts disabled in those two functions, we can't make the bug
> manifest any more. I don't know if I have actually fixed something,
> or just forced the bug into hiding, biding its time.
>
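The effect of masking interrupts around a two-step update can be sketched on a host machine as below. This is only an illustration under stated assumptions: the `sim_` names are hypothetical, the "interrupt" is an ordinary function call placed between the two updates, and nothing here shows that the real queue code has a window of this shape. On a target, the masking would typically be done with `rtems_interrupt_disable(level)` and `rtems_interrupt_enable(level)` rather than a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Host-side simulation; all sim_ names are hypothetical. */
static bool sim_irq_masked = false;  /* stands in for the CPU interrupt mask */
static int  sim_chain_len = 0;       /* nodes currently on the chain */
static int  sim_count = 0;           /* pending-message count */
static int  sim_mismatches = 0;      /* times the "ISR" saw count != chain length */

/* Simulated ISR: it only runs when "interrupts" are not masked. */
static void sim_isr(void)
{
    if (sim_irq_masked)
        return;
    if (sim_chain_len != sim_count)
        sim_mismatches++;
}

/* Two-step update with an interrupt arriving between the steps. */
static void sim_submit(bool mask)
{
    bool saved = sim_irq_masked;
    if (mask)
        sim_irq_masked = true;   /* enter the critical section */

    sim_chain_len++;             /* step 1: link the node */
    sim_isr();                   /* interrupt fires mid-update */
    sim_count++;                 /* step 2: bump the count */

    sim_irq_masked = saved;      /* leave the critical section */
    sim_isr();                   /* a held-off interrupt is serviced now */
}

/* Run 1000 submits and report how often the ISR saw a mismatch. */
int sim_run(bool mask)
{
    sim_irq_masked = false;
    sim_chain_len = 0;
    sim_count = 0;
    sim_mismatches = 0;
    for (int i = 0; i < 1000; i++)
        sim_submit(mask);
    return sim_mismatches;
}
```

With masking enabled the simulated ISR never observes the half-updated state. That matches the observation that the bug stopped appearing, but it cannot distinguish a real fix from a timing change.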
> Looking at the queue insert/remove code, I don't see a window. I may
> be missing it, or there may not be one there and I've just changed
> the timing enough with my interrupt disabling that we can't make
> the bug show itself the same way.
>
> Any comments would be welcome.
>
> Thanks,
> -Phil
>
--
--------------------------------------------
IMD Ingenieurbuero fuer Microcomputertechnik
Thomas Doerfler Herbststrasse 8
D-82178 Puchheim Germany
email: Thomas.Doerfler at imd-systems.de
PGP public key available at:
http://www.imd-systems.de/pgpkey_en.html