RTEMS scheduler bug ?
Sebastian Huber
sebastian.huber at embedded-brains.de
Fri Mar 29 10:04:46 UTC 2019
Hello Catalin,
On 29/03/2019 10:56, Catalin Demergian wrote:
> Hi,
>
> Some time ago (Sept/Oct 2018) we had a long discussion in which I
> suspected a scheduler issue (subject
> "rtems_message_queue_receive/rtems_event_receive issues").
>
> We got to the point where I realized that _Chain_Append_unprotected
> might fail to add an element to the queue. The effect is a task in an
> odd state: its state is READY, but it is not in the ready chain, so it
> will never get CPU time again, because a task has to be blocked in order
> to be unblocked when new data arrives.
>
> We were using USB back then, but the issue became relevant again because
> we just hit the same problem over serial :)
> I believe there is a possible chain of events that can make
> _Chain_Append_unprotected fail; explanations follow.
>
> /**
>  * @note It does NOT disable interrupts to ensure the atomicity of the
>  * append operation.
>  */
> RTEMS_INLINE_ROUTINE void _Chain_Append_unprotected(
>   Chain_Control *the_chain,
>   Chain_Node    *the_node
> )
> {
>   Chain_Node *tail = _Chain_Tail( the_chain );
>   Chain_Node *old_last = tail->previous;
>
>   the_node->next = tail;
>   tail->previous = the_node;
>   old_last->next = the_node;
>   the_node->previous = old_last;
> }
>
> The
>
>   tail->previous = the_node;
>   old_last->next = the_node;
>
> lines are the ones that actually add the element to the ready chain.
>
> If a thread executes those lines but, just before executing
>
>   the_node->previous = old_last;
>
> another thread comes along to add another node to this chain, the second
> thread will store its own node in tail->previous and old_last->next. As a
> result, when the interrupted thread goes on to execute the last line, it
> will be for nothing, because the initial node will not be added to the
> ready chain.
>
>
> If this chain of events occurs (*and after a while it will*), we get
> starvation for that task.
>
> I can reproduce this issue in a long-duration test. The time until it
> happens varies from run to run, but it always happens.
>
>
> *What I'm proposing is the following*: call _Chain_Append instead of
> _Chain_Append_unprotected in schedulerpriorityimpl.h, in the
> _Scheduler_priority_Ready_queue_enqueue function.
>
>
> void _Chain_Append(
>   Chain_Control *the_chain,
>   Chain_Node    *node
> )
> {
>   ISR_Level level;
>
>   _ISR_Disable( level );
>   _Chain_Append_unprotected( the_chain, node );
>   _ISR_Enable( level );
> }
>
>
> This way the add-element-to-chain operation becomes atomic.
>
> With this fix, a long-duration test (8 hrs) ran successfully in my
> setup.
>
>
> What do you think?
>
_Scheduler_priority_Ready_queue_enqueue() should only be called with
interrupts disabled, so disabling interrupts again should have no
effect. Could you please try out the attached patch and build the BSP
with --enable-rtems-debug, so that the added assertions are active?
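Why an extra disable/enable pair changes nothing for a caller that already runs with interrupts disabled can be illustrated on a host with a simulated interrupt flag. This is a minimal sketch, assuming the save/restore contract suggested by the `level` variable in the quoted `_Chain_Append()`: the enable call restores the saved state rather than unconditionally enabling interrupts. `isr_disable()`, `isr_enable()`, `ready_queue_enqueue()`, and `append_protected()` are hypothetical stand-ins, and the `assert()` only mirrors the idea of checking the precondition; it does not reproduce the attached patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical host-side stand-in for one CPU's interrupt-enable state.
 * On a target this would be the processor's interrupt mask. */
static bool interrupts_enabled = true;

typedef bool isr_level;

/* Save the current state, then disable interrupts. */
static isr_level isr_disable(void)
{
  isr_level previous = interrupts_enabled;
  interrupts_enabled = false;
  return previous;
}

/* Restore the saved state instead of unconditionally enabling. */
static void isr_enable(isr_level level)
{
  interrupts_enabled = level;
}

static int enqueued;

/* Hypothetical enqueue that checks its precondition, in the spirit of an
 * assertion that is only active in a debug build. */
static void ready_queue_enqueue(void)
{
  assert(!interrupts_enabled);
  ++enqueued;
}

/* Analogue of the proposed _Chain_Append()-style wrapper. */
static void append_protected(void)
{
  isr_level level = isr_disable();
  ready_queue_enqueue();
  isr_enable(level);
}

/* A caller that already runs with interrupts disabled, as the scheduler
 * is expected to. Returns whether interrupts were still disabled right
 * after the wrapper returned. */
static bool caller_with_interrupts_disabled(void)
{
  isr_level level = isr_disable();
  append_protected();  /* inner pair saves "disabled" and restores it */
  bool still_disabled = !interrupts_enabled;
  isr_enable(level);
  return still_disabled;
}
```

Because the inner pair restores the state it saved, and that saved state is already "disabled", interrupts stay disabled across the call. If the append were still being corrupted, the cause would more likely be a caller that reaches the enqueue with interrupts enabled (or a preemption path outside this pair), which is the kind of condition a debug-build assertion can catch.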
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-score-Add-asserts.patch
Type: text/x-patch
Size: 1515 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20190329/ec9bc7fc/attachment-0002.bin>