Synchronization problem in message queue mechanism
Joel Sherrill
joel.sherrill at OARcorp.com
Thu Aug 22 13:23:10 UTC 2013
On 8/22/2013 7:56 AM, Sebastian Huber wrote:
> Hello,
>
> there was a PR related to message queues:
>
> https://www.rtems.org/bugzilla/show_bug.cgi?id=1961
>
> It was fixed in 4.10.2, but not in 4.10.1. So this may explain why it takes
> longer in 4.10.2 to get into trouble.
>
> I remember that there was a similar problem with a NULL pointer access in the
> RTEMS events.
>
> If I compare the functions _Event_Timeout() and _Thread_queue_Process_timeout()
> I am a bit surprised that _Thread_queue_Process_timeout() doesn't use
> _ISR_Disable/Enable() to protect the access to the_thread_queue->sync_state.
> At first glance this looks like a major bug.
The assumption is that _Thread_queue_Process_timeout() is called from a
clock tick ISR, but thinking about it, that doesn't prevent a nested
interrupt from occurring. Does this system nest interrupts?
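To make the concern concrete, here is a minimal host-side sketch of a timeout
handler that reads and updates a sync state inside an interrupt-disabled
section. It is a simplified model and not the RTEMS source: isr_disable() and
isr_enable() are hypothetical stand-ins for _ISR_Disable()/_ISR_Enable() that
only track nesting, and the types, state names and branch structure are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for _ISR_Disable()/_ISR_Enable(). A real port
 * masks CPU interrupts; this model only tracks nesting depth so that
 * the sketch can run on a host. */
typedef unsigned int isr_level;
static unsigned int isr_disable_depth = 0;

static isr_level isr_disable(void)
{
  return isr_disable_depth++;          /* hand back the previous level */
}

static void isr_enable(isr_level previous)
{
  assert(isr_disable_depth == previous + 1);  /* calls must be balanced */
  isr_disable_depth = previous;
}

/* Simplified model of the blocking-operation states (names assumed). */
typedef enum {
  SYNC_SYNCHRONIZED,
  SYNC_NOTHING_HAPPENED,
  SYNC_TIMEOUT,
  SYNC_SATISFIED
} sync_state_t;

typedef struct {
  volatile sync_state_t sync_state;
  int                   timeout_status;
} queue_model;

typedef struct {
  queue_model *queue;
  int          return_code;
  bool         is_executing;
} thread_model;

/* Returns true when the caller still has to extract the thread from the
 * queue, false when the timeout was only recorded in sync_state (or was
 * ignored because the request was already satisfied). */
static bool process_timeout(thread_model *the_thread)
{
  bool       must_extract = false;
  isr_level  level = isr_disable();    /* a nested ISR cannot interleave */
  queue_model *q = the_thread->queue;

  if (q->sync_state != SYNC_SYNCHRONIZED && the_thread->is_executing) {
    if (q->sync_state != SYNC_SATISFIED) {
      the_thread->return_code = q->timeout_status;
      q->sync_state = SYNC_TIMEOUT;
    }
  } else {
    the_thread->return_code = q->timeout_status;
    must_extract = true;
  }

  isr_enable(level);
  return must_extract;
}
```

The point of the sketch is only that the test of sync_state and the write to
it happen in one critical section, so a nested interrupt cannot change the
state between the two.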
> I added a test case for the RTEMS event problem:
>
> http://git.rtems.org/rtems/commit/?id=57f125d02595661b72d66f27b6f71c9b9579f516
>
> It should be possible to use this as a template to reproduce your message queue
> problem.
>
> On 2013-08-22 14:14, Cezar Antohe wrote:
>> Hello guys,
>>
>> We have been using RTEMS 4.10.1 version in a clinical care med unit, and we
>> believe there may be a synchronization problem in the message queue mechanisms.
>> We've observed that sometimes, the values from the currently running thread TCB
>> table are not valid anymore.
>> Let me give you 2 examples:
>>
>> 1. In function "rtems_message_queue_receive" there is a call to
>> "_Message_queue_Translate_core_message_queue_return_code" with input
>> parameter "_Thread_Executing->Wait.return_code".
>> This parameter gets corrupted after some hours of unit operation. Looking
>> into the code for "_Message_queue_Translate_core_message_queue_return_code",
>> the input value should be less than 6; however, return_code is 13, which
>> indexes past the end of the array and is invalid.
>>
>> 2. Another bad situation happens in "_Thread_queue_Timeout" function, when
>> calling "_Thread_queue_Process_timeout" - the input parameter
>> "Thread_Control *the_thread" has its Wait.queue NULL. No check on that queue
>> pointer is made in "_Thread_queue_Process_timeout" function, which tries to
>> access a NULL pointer.
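As a debugging aid for the two symptoms above, a defensive check can catch the
bad values before they are used. The following is a minimal sketch under
stated assumptions, not RTEMS code: the table size of 6 is taken from the
report, while the status values, type names and helper functions are
hypothetical. Checks like these only detect the corruption; they do not fix
the underlying race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CORE_CODE_COUNT       6u     /* valid codes are 0..5 (from the report) */
#define STATUS_INTERNAL_ERROR (-1)   /* hypothetical fallback status */

/* Hypothetical translation table; the values are placeholders. */
static const int status_for_core_code[CORE_CODE_COUNT] = {
  100, 101, 102, 103, 104, 105
};

/* Translate a core return code, rejecting anything outside the table
 * instead of indexing past its end. */
static int translate_core_code(unsigned int core_code)
{
  if (core_code >= CORE_CODE_COUNT) {
    return STATUS_INTERNAL_ERROR;
  }
  return status_for_core_code[core_code];
}

/* Simplified models of the wait information (names assumed). */
typedef struct {
  int timeout_status;
} wait_queue_model;

typedef struct {
  wait_queue_model *queue;
  unsigned int      return_code;
} waiter_model;

/* Returns true only when the timeout path may dereference the queue. */
static bool timeout_can_proceed(const waiter_model *waiter)
{
  assert(waiter != NULL);
  return waiter->queue != NULL;
}
```

With these helpers, a return code of 13 maps to the fallback status rather
than reading out of bounds, and a waiter whose queue is NULL is reported
instead of dereferenced, which gives a place to set a breakpoint.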
>>
>> We are no experts in RTEMS internals and we haven't modified anything in
>> the current RTEMS code. However, we've noticed that the problem seems to appear
>> when a thread consumes the messages from the queue and sets the queue to NULL;
>> another thread then calls queue insertion and wakes the first thread, but its
>> queue remains NULL.
>>
>> We are running tests with patches for RTEMS version 4.10.2. The problem still
>> exists, but it is diminished: it appears only after a longer operating time
>> of the infusion unit.
>>
>> Any help / idea / fast debug RTEMS method would be very much appreciated.
>>
>> Thank you very much,
>>
>> Cezar Antohe
>>
>
--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherrill at OARcorp.com     On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985