Synchronization problem in message queue mechanism

Sebastian Huber sebastian.huber at embedded-brains.de
Thu Aug 22 12:56:52 UTC 2013


Hello,

there was a PR related to message queues:

https://www.rtems.org/bugzilla/show_bug.cgi?id=1961

It was fixed in 4.10.2, but not in 4.10.1.  This may explain why it takes 
longer for 4.10.2 to get into trouble.

I remember that there was a similar problem with a NULL pointer access in the 
RTEMS events.

If I compare the functions _Event_Timeout() and _Thread_queue_Process_timeout() 
I am a bit surprised that _Thread_queue_Process_timeout() doesn't use 
_ISR_Disable/Enable() to protect the access to the_thread_queue->sync_state. 
At first glance this looks like a major bug.
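
The kind of protection meant here can be sketched in plain C as follows. This 
is only an illustration, not the RTEMS source: all sketch_* names, the state 
enum and the nesting counter standing in for _ISR_Disable/Enable() are 
hypothetical, and the state handling is simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the RTEMS definitions. */
typedef enum {
  SKETCH_SYNCHRONIZED,      /* thread is fully enqueued */
  SKETCH_NOTHING_HAPPENED,  /* enqueue in progress, no event yet */
  SKETCH_TIMEOUT,           /* timeout fired during the enqueue */
  SKETCH_SATISFIED          /* request satisfied during the enqueue */
} sketch_sync_state;

typedef struct {
  volatile sketch_sync_state sync_state;
} sketch_thread_queue;

/* Models the interrupt-disable level; real code would mask interrupts. */
static int sketch_isr_nesting;

static unsigned sketch_isr_disable(void)
{
  return (unsigned) sketch_isr_nesting++;
}

static void sketch_isr_enable(unsigned level)
{
  sketch_isr_nesting = (int) level;
}

/*
 * Reads and updates sync_state inside one critical section, so that an
 * interrupt cannot change the state between the test and the assignment.
 * Returns true if the caller still has to extract the thread from the queue.
 */
static bool sketch_process_timeout(sketch_thread_queue *q)
{
  bool must_extract = false;
  unsigned level = sketch_isr_disable();

  assert(sketch_isr_nesting > 0); /* we are inside the critical section */

  if (q->sync_state == SKETCH_NOTHING_HAPPENED) {
    q->sync_state = SKETCH_TIMEOUT;
  } else if (q->sync_state == SKETCH_SYNCHRONIZED) {
    must_extract = true;
  }

  sketch_isr_enable(level);
  return must_extract;
}
```

The point of the sketch is only that the test of sync_state and the write to 
it happen without an interrupt in between.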

I added a test case for the RTEMS event problem:

http://git.rtems.org/rtems/commit/?id=57f125d02595661b72d66f27b6f71c9b9579f516

It should be possible to use this as a template to reproduce your message queue 
problem.

On 2013-08-22 14:14, Cezar Antohe wrote:
>
> Hello guys,
>
> We have been using RTEMS 4.10.1 version in a clinical care med unit, and we
> believe there may be a synchronization problem in the message queue mechanisms.
> We've observed that sometimes, the values from the currently running thread TCB
> table are not valid anymore.
> Let me give you 2 examples:
>
> 1. In function "rtems_message_queue_receive" there is a call to
> "_Message_queue_Translate_core_message_queue_return_code" with input
> parameter "_Thread_Executing->Wait.return_code".
> This parameter gets corrupted after the unit has been running for some hours.
> Looking at the code of
> "_Message_queue_Translate_core_message_queue_return_code", the input should be
> less than 6; however, the return_code is 13, which is out of bounds for the
> array and invalid.
>
> 2. Another bad situation happens in "_Thread_queue_Timeout" function, when
> calling "_Thread_queue_Process_timeout" - the input parameter
> "Thread_Control*the_thread" has its Wait.queue NULL. No check on that queue
> pointer is made in "_Thread_queue_Process_timeout" function, which tries to
> access a NULL pointer.
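
Both symptoms could at least be caught early with defensive checks. The 
following is a sketch only and a diagnostic aid, not a fix for the underlying 
race. The sketch_* types and functions, the table contents and the error value 
are hypothetical; the bound of 6 is taken from the report above and may differ 
from the real table size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table; size taken from the report, contents are made up. */
#define SKETCH_STATUS_COUNT 6
#define SKETCH_INTERNAL_ERROR (-1)

static const int sketch_status_map[SKETCH_STATUS_COUNT] = {
  0, 10, 11, 12, 13, 14
};

/* Rejects out-of-range codes instead of indexing past the table. */
static int sketch_translate_return_code(unsigned code)
{
  if (code >= SKETCH_STATUS_COUNT) {
    return SKETCH_INTERNAL_ERROR;
  }
  return sketch_status_map[code];
}

/* Hypothetical, heavily reduced models of the queue and the thread. */
typedef struct {
  int timeout_status;
} sketch_queue;

typedef struct {
  struct {
    sketch_queue *queue;
    unsigned return_code;
  } Wait;
} sketch_thread;

/*
 * Checks Wait.queue before dereferencing it.
 * Returns 0 on success and -1 if the thread has no queue.
 */
static int sketch_process_timeout_checked(sketch_thread *t)
{
  sketch_queue *q = t->Wait.queue;

  if (q == NULL) {
    return -1; /* a good place for a breakpoint or a trace record */
  }
  t->Wait.return_code = (unsigned) q->timeout_status;
  return 0;
}
```

A breakpoint on the two error paths would show which thread is in the invalid 
state when the problem occurs.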
>
> We are not RTEMS experts and we haven't modified anything in the current
> RTEMS code. However, we've noticed that the problem seems to appear in the
> following sequence: a thread consumes the messages from the queue and sets
> the queue to NULL; another thread inserts into the queue and wakes the first
> thread; the first thread's queue nevertheless remains NULL.
>
> We are testing with patches for RTEMS version 4.10.2. The problem still
> exists but is diminished: it appears only after a longer running time of the
> infusing unit.
>
> Any help / idea / fast debug RTEMS method would be very much appreciated.
>
> Thank you very much,
>
> Cezar Antohe
>


-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.