rtems_message_queue_send from ISR?

Thu Nov 10 10:22:35 UTC 2011

Hi,

is it really OK to use rtems_message_queue_send() from an ISR, as 
http://www.rtems.com/onlinedocs/doc-current/share/rtems/html/c_user/c_user00114.html 
suggests?

See the message below... and 
http://lists.milkymist.org/pipermail/devel-milkymist.org/2011-November/002100.html 
for the previous e-mail.

Does the BSP/CPU code need to do anything to ensure that this race 
condition doesn't occur?

Sébastien

-------- Original Message --------
Subject: Re: [Milkymist-devel] [PATCH, tentative] RTEMS: don't rely on 
number_of_pending_messages in _CORE_message_queue_Submit
Date: Wed, 9 Nov 2011 21:27:16 -0300
From: Werner Almesberger <werner at almesberger.net>
Reply-To: Milkymist One, Milkymist SoC and Flickernoise developers' list 
<devel at lists.milkymist.org>
To: Milkymist One, Milkymist SoC and Flickernoise developers' list 
<devel at lists.milkymist.org>

Hmm, maybe I should rephrase the problem description so that it
becomes a bit clearer. There's also an enqueue-enqueue race I
hadn't mentioned. Here's the improved version:

If it's permissible to call rtems_message_queue_send from an
interrupt, then there is at least one race condition in the core
message subsystem.

This created the MIDI/mouse hang we love so much on M1.

The problem is as follows: RTEMS queues use pre-allocated message
buffers that are kept on an "inactive" (free) list. When enqueuing
a message, a buffer is first removed from the inactive list, data
is copied to it, and it is then added to the pending list.

The reverse happens when dequeuing. Besides these two lists, there
is also a counter called number_of_pending_messages keeping track,
as the name suggests, of the number of pending messages. It is
updated atomically together with changes to the pending buffers
list.

From the above it is clear that the counter will be out of sync with
the inactive list between the beginning and the end of an enqueue or
dequeue operation.
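
As a rough illustration of that window, here is a minimal C sketch. It
is a simplified model, not RTEMS code: struct model_queue, the model_*
functions and MAX_MESSAGES are hypothetical names, and the two lists
are reduced to plain counts. Only number_of_pending_messages is taken
from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MESSAGES 2   /* hypothetical queue capacity */

/* Simplified stand-in for the core message queue; not the real layout. */
struct model_queue {
    int inactive_count;              /* buffers on the inactive (free) list */
    int number_of_pending_messages;  /* counter tracking the pending list */
};

static void model_init(struct model_queue *q)
{
    q->inactive_count = MAX_MESSAGES;
    q->number_of_pending_messages = 0;
}

/* Step 1 of an enqueue: take a buffer off the inactive list.
 * Returns 1 on success, 0 if no free buffer is left. */
static int model_take_buffer(struct model_queue *q)
{
    if (q->inactive_count == 0)
        return 0;
    q->inactive_count--;
    return 1;
}

/* Last step of an enqueue: append to the pending list and bump the
 * counter (done together, as described above). */
static void model_append_pending(struct model_queue *q)
{
    assert(q->number_of_pending_messages < MAX_MESSAGES);
    q->number_of_pending_messages++;
}

/* Free slots as a check based on the counter would compute them. */
static int model_room_by_counter(const struct model_queue *q)
{
    return MAX_MESSAGES - q->number_of_pending_messages;
}
```

Between model_take_buffer() and model_append_pending(), the value
returned by model_room_by_counter() is one higher than the number of
buffers actually left on the inactive list. The two agree again only
once the enqueue has completed.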

In order to minimize interrupt latency, RTEMS disables interrupts
only when adding and removing buffers from lists, but not throughout
the whole enqueuing/dequeuing operation. Instead, it disables the
scheduler during these operations, but this doesn't prevent
interrupts.

This means that the inconsistency between number_of_pending_messages
and the inactive list can be observed from an interrupt handler if
enqueuing or dequeuing is in progress.

_CORE_message_queue_Submit checks whether there is still room in the
queue by reading number_of_pending_messages. If there is room, it
then calls _CORE_message_queue_Allocate_message_buffer to obtain a
free buffer.

Given that number_of_pending_messages and the list of inactive
buffers can disagree, e.g., if _CORE_message_queue_Seize or another
_CORE_message_queue_Submit is executing concurrently,
_CORE_message_queue_Allocate_message_buffer may fail to obtain a
free buffer despite the prior test.
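
The sequence can be replayed in a single-threaded C sketch, with the
interrupt simulated as a nested function call placed inside the
window. Everything here is a hypothetical model (struct model_queue,
the model_* functions, a capacity of one message); it only mirrors the
check-then-allocate order described above and is not the RTEMS
implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MESSAGES 1   /* hypothetical capacity: a single buffer */

struct buffer { int payload; };

struct model_queue {
    struct buffer  storage[MAX_MESSAGES];
    struct buffer *inactive[MAX_MESSAGES];  /* stack of free buffers */
    int            inactive_count;
    int            number_of_pending_messages;
};

static void model_init(struct model_queue *q)
{
    for (int i = 0; i < MAX_MESSAGES; i++)
        q->inactive[i] = &q->storage[i];
    q->inactive_count = MAX_MESSAGES;
    q->number_of_pending_messages = 0;
}

/* Returns a free buffer, or NULL if the inactive list is empty. */
static struct buffer *model_allocate(struct model_queue *q)
{
    if (q->inactive_count == 0)
        return NULL;
    return q->inactive[--q->inactive_count];
}

/* Submit as run from the simulated interrupt: test the counter first,
 * then allocate. *room_seen records what the counter test concluded. */
static struct buffer *model_isr_submit(struct model_queue *q, int *room_seen)
{
    *room_seen = q->number_of_pending_messages < MAX_MESSAGES;
    if (!*room_seen)
        return NULL;
    return model_allocate(q);  /* may still be NULL: this is the race */
}

/* Task-level submit with the "interrupt" firing inside the window.
 * Returns the buffer the interrupt-level submit ended up with. */
static struct buffer *model_race(struct model_queue *q, int *room_seen)
{
    struct buffer *task_buf = model_allocate(q);  /* buffer taken */
    assert(task_buf != NULL);

    /* Interrupts are enabled here; the counter has not changed yet. */
    struct buffer *isr_buf = model_isr_submit(q, room_seen);

    task_buf->payload = 42;                       /* copy the data */
    q->number_of_pending_messages++;              /* now pending */
    return isr_buf;
}
```

With one buffer in the queue, model_race() leaves *room_seen set to 1
while the interrupt-level allocation returns NULL: the counter test
passed, yet no buffer was available.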

_CORE_message_queue_Allocate_message_buffer can detect a lack of
free buffers and indicates it by returning a NULL pointer. Checking
whether NULL has been returned instead of a buffer is optional and
depends on RTEMS_DEBUG.

If no check is performed, _CORE_message_queue_Submit will then try
to use the buffer. In the absence of hardware detecting the
de-referencing of NULL pointers, the wounded system will limp on a
little further until, at least in the case of M1, it finally hangs
somewhere.

The patch below avoids the problem in the scenario described above
by not using number_of_pending_messages as an indicator of whether
free buffers are available, but by simply trying to get a buffer,
and handling the result of failure.

This is similar to how _CORE_message_queue_Seize works.
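
The idea can be sketched in C as follows. This is not the patch
itself, only a simplified model of "allocate first, handle failure":
struct model_queue, enum model_status and the model_* functions are
hypothetical, and delivery to a waiting receiver is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MESSAGES 1   /* hypothetical capacity */

enum model_status { MODEL_OK, MODEL_TOO_MANY };

struct buffer { int payload; };

struct model_queue {
    struct buffer  storage[MAX_MESSAGES];
    struct buffer *inactive[MAX_MESSAGES];  /* stack of free buffers */
    int            inactive_count;
    int            number_of_pending_messages;
};

static void model_init(struct model_queue *q)
{
    for (int i = 0; i < MAX_MESSAGES; i++)
        q->inactive[i] = &q->storage[i];
    q->inactive_count = MAX_MESSAGES;
    q->number_of_pending_messages = 0;
}

/* Returns a free buffer, or NULL if the inactive list is empty. */
static struct buffer *model_allocate(struct model_queue *q)
{
    if (q->inactive_count == 0)
        return NULL;
    return q->inactive[--q->inactive_count];
}

/* Submit without consulting the counter: the result of the allocation
 * is the only test for free space, so it cannot disagree with itself. */
static enum model_status model_submit(struct model_queue *q, int payload)
{
    struct buffer *buf = model_allocate(q);
    if (buf == NULL)
        return MODEL_TOO_MANY;            /* queue full, or buffer in use */

    buf->payload = payload;
    q->number_of_pending_messages++;
    assert(q->number_of_pending_messages <= MAX_MESSAGES);
    return MODEL_OK;
}
```

If another context already holds the last buffer, model_submit()
reports MODEL_TOO_MANY instead of continuing with a NULL pointer, even
though number_of_pending_messages still reads zero at that moment.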

Another possibility would be to make testing of the_message no
longer optional. But then, there would basically be two tests for
the same condition, which is ugly.

- Werner