Synchronization problem in message queue mechanism
Sebastian Huber
sebastian.huber at embedded-brains.de
Thu Aug 22 12:56:52 UTC 2013
Hello,
there was a PR related to message queues:
https://www.rtems.org/bugzilla/show_bug.cgi?id=1961
It was fixed in 4.10.2, but not in 4.10.1. This may explain why it takes
longer in 4.10.2 to get into trouble.
I remember that there was a similar problem with a NULL pointer access in the
RTEMS events.
If I compare the functions _Event_Timeout() and _Thread_queue_Process_timeout()
I am a bit surprised that _Thread_queue_Process_timeout() doesn't use
_ISR_Disable/Enable() to protect the access to the_thread_queue->sync_state.
At first glance this looks like a major bug.
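
To illustrate the kind of protection meant here, the following is a minimal,
self-contained sketch and not the RTEMS source. All names prefixed with
model_ are hypothetical stand-ins: model_isr_disable()/model_isr_enable() only
imitate an interrupt lock with a counter, and the state machine is a
simplified model of a thread queue sync_state. The point is that the queue
pointer and the sync state are read and updated inside one critical section,
and that a NULL queue is tolerated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an interrupt lock; these are NOT the RTEMS API. */
typedef unsigned int model_isr_level;
static unsigned int model_isr_nesting = 0;

static model_isr_level model_isr_disable(void)
{
  return model_isr_nesting++;   /* hand back the previous nesting level */
}

static void model_isr_enable(model_isr_level previous)
{
  model_isr_nesting = previous; /* restore the level seen at disable time */
}

/* Simplified model of the synchronization states of a thread queue. */
typedef enum {
  MODEL_SYNCHRONIZED,     /* thread is fully blocked on the queue */
  MODEL_NOTHING_HAPPENED, /* thread is still in the middle of blocking */
  MODEL_TIMEOUT,          /* a timeout fired while the thread was blocking */
  MODEL_SATISFIED         /* the request was satisfied while blocking */
} model_sync_state;

typedef struct {
  model_sync_state sync_state;
  int              timeout_status;
} model_queue;

typedef struct {
  model_queue *queue;       /* NULL once the thread has left the queue */
  int          return_code;
} model_thread;

/* Returns 1 if the timeout was recorded, 0 if it was ignored. */
static int model_process_timeout(model_thread *the_thread)
{
  int recorded = 0;
  model_isr_level level = model_isr_disable();
  model_queue *q = the_thread->queue; /* read once, inside the section */

  assert(model_isr_nesting > 0);      /* state is only touched while locked */

  if (q != NULL && q->sync_state != MODEL_SATISFIED) {
    the_thread->return_code = q->timeout_status;
    if (q->sync_state == MODEL_NOTHING_HAPPENED) {
      q->sync_state = MODEL_TIMEOUT;  /* the blocking path will see this */
    } else if (q->sync_state == MODEL_SYNCHRONIZED) {
      the_thread->queue = NULL;       /* models extraction from the queue */
    }
    recorded = 1;
  }

  model_isr_enable(level);
  return recorded;
}
```

In this model a timeout that arrives after the thread has already left the
queue (queue == NULL) or after the request was satisfied is simply ignored,
instead of dereferencing a stale pointer.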
I added a test case for the RTEMS event problem:
http://git.rtems.org/rtems/commit/?id=57f125d02595661b72d66f27b6f71c9b9579f516
It should be possible to use this as a template to reproduce your message queue
problem.
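
As a possible debugging aid for the first symptom in the report quoted below
(a return code of 13 used as an index into a translation table), a checked
lookup could flag the corrupted value at the point of use. This is only a
sketch under assumptions: the table contents, the count of six valid codes
(taken from the report), and all model_ names are hypothetical and do not
reproduce the RTEMS tables.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: six valid codes (0..5), as stated in the report. */
#define MODEL_STATUS_COUNT 6u

/* Hypothetical translation table; the values are placeholders. */
static const int model_status_table[MODEL_STATUS_COUNT] = {
  0, 10, 20, 30, 40, 50
};

/*
 * Translates a core return code into a status.
 * Returns 1 and stores the status in *status if the code is in range,
 * returns 0 and leaves *status untouched if the code is out of range.
 */
static int model_translate_return_code(unsigned int code, int *status)
{
  assert(status != NULL);

  if (code >= MODEL_STATUS_COUNT) {
    return 0; /* corrupted code: do not index the table */
  }

  *status = model_status_table[code];
  return 1;
}
```

A caller can log or set a breakpoint on the failing branch, which narrows down
when the return code was corrupted without reading beyond the table.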
On 2013-08-22 14:14, Cezar Antohe wrote:
>
> Hello guys,
>
> We have been using RTEMS 4.10.1 version in a clinical care med unit, and we
> believe there may be a synchronization problem in the message queue mechanisms.
> We've observed that sometimes, the values from the currently running thread TCB
> table are not valid anymore.
> Let me give you 2 examples:
>
> 1. In function "rtems_message_queue_receive" there is a call to
> "_Message_queue_Translate_core_message_queue_return_code" with input
> parameter "_Thread_Executing->Wait.return_code".
> This parameter gets corrupted after some hours of unit operation. Looking
> into the code for "_Message_queue_Translate_core_message_queue_return_code",
> the input should be less than 6; however, return_code holds 13, which is
> outside the bounds of the array and therefore invalid.
>
> 2. Another bad situation happens in "_Thread_queue_Timeout" function, when
> calling "_Thread_queue_Process_timeout" - the input parameter
> "Thread_Control*the_thread" has its Wait.queue NULL. No check on that queue
> pointer is made in "_Thread_queue_Process_timeout" function, which tries to
> access a NULL pointer.
>
> We are no experts in RTEMS functionality and we haven't modified anything in
> the current RTEMS code. However, we've noticed that the problem seems to
> appear in this sequence: a thread consumes the messages from the queue and
> sets the queue to NULL, then another thread calls queue insertion and wakes
> the first thread, but the first thread's queue remains NULL.
>
> We are running tests with patches for RTEMS version 4.10.2. The problem still
> exists, but it is diminished: it appears after a longer operating time of the
> infusion unit.
>
> Any help / idea / fast debug RTEMS method would be very much appreciated.
>
> Thank you very much,
>
> Cezar Antohe
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.