RTEMS scheduler bug ?

Catalin Demergian demergian at gmail.com
Fri Mar 29 09:56:45 UTC 2019


Hi,
Some time ago (Sept/Oct 2018) we had a long discussion in which I suspected a
scheduler issue (subject "rtems_message_queue_receive/rtems_event_receive
issues").

We got to the point where I realized that _Chain_Append_unprotected might
fail to add an element to the chain. The effect is a task left in a funny
state: its state is READY, but it is not in the ready chain. Such a task
never gets CPU time again, because a task needs to be blocked in order to be
unblocked (and queued again) when new data arrives.

We were using USB then, but this issue became hot again because we just hit
the same problem over serial :)
I believe there is a possible chain of events that can make
_Chain_Append_unprotected fail; the explanation follows.

/*
 * @note It does NOT disable interrupts to ensure the atomicity of the
 *       append operation.
 */
RTEMS_INLINE_ROUTINE void _Chain_Append_unprotected(
  Chain_Control *the_chain,
  Chain_Node    *the_node
)
{
  Chain_Node *tail = _Chain_Tail( the_chain );
  Chain_Node *old_last = tail->previous;

  the_node->next = tail;
  tail->previous = the_node;
  old_last->next = the_node;
  the_node->previous = old_last;
}

The

  tail->previous = the_node;
  old_last->next = the_node;

lines are the ones that actually add the element to the ready chain.

If a thread executes those lines, but just before it executes

  the_node->previous = old_last;

another thread comes along to add another node to this chain, that thread
will set a different node in tail->previous and old_last->next. As a result,
when the interrupted thread resumes and executes the last line, it will be
for nothing, because the initial node will not be in the ready chain.


If this chain of events occurs (*and after a while it will*), we get
starvation for that task.
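
To make the failure mode concrete, here is a minimal single-threaded sketch
that replays one interleaving by hand. It is not RTEMS code: the Node and
Chain types, the sentinel head/tail layout and the function names are
simplified assumptions for illustration. In this sketch the preemption is
placed right after old_last is read, since that is the point where the
resumed append goes on to use a stale pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Chain_Node / Chain_Control. */
typedef struct Node { struct Node *next, *previous; } Node;
typedef struct { Node head, tail; } Chain;   /* two sentinel nodes */

static void chain_init(Chain *c) {
  c->head.previous = NULL;
  c->head.next = &c->tail;
  c->tail.previous = &c->head;
  c->tail.next = NULL;
}

/* Same four stores as the append shown above, with no protection. */
static void append_unprotected(Chain *c, Node *n) {
  Node *tail = &c->tail;
  Node *old_last = tail->previous;
  assert(old_last != NULL);
  n->next = tail;
  tail->previous = n;
  old_last->next = n;
  n->previous = old_last;
}

/* Walk forward from the head and report whether n is linked in. */
static int reachable(const Chain *c, const Node *n) {
  for (const Node *p = c->head.next; p != &c->tail; p = p->next)
    if (p == n) return 1;
  return 0;
}

/* Returns 1 when node b has been lost from the chain. */
static int simulate_lost_append(void) {
  Chain c;
  Node a, b;
  chain_init(&c);

  /* Context A starts appending 'a' and is preempted after these reads. */
  Node *tail = &c.tail;
  Node *old_last = tail->previous;      /* the head sentinel */

  /* Context B runs in between and appends 'b' completely. */
  append_unprotected(&c, &b);

  /* Context A resumes; old_last is stale (the last node is now 'b'). */
  a.next = tail;
  tail->previous = &a;
  old_last->next = &a;                  /* overwrites the link to 'b' */
  a.previous = old_last;

  return reachable(&c, &a) && !reachable(&c, &b);
}
```

Calling simulate_lost_append() from a small test main() returns 1 here: 'a' is
in the chain, while 'b' still looks linked from its own point of view
(b.next and b.previous are set) but can no longer be reached from the head.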

I'm reproducing this issue in a long-duration test; the time before it
happens varies from run to run, but it always happens.


*What I'm proposing is the following*: call _Chain_Append instead of
_Chain_Append_unprotected in the _Scheduler_priority_Ready_queue_enqueue
function in schedulerpriorityimpl.h.


void _Chain_Append(
  Chain_Control *the_chain,
  Chain_Node    *node
)
{
  ISR_Level level;

  _ISR_Disable( level );
    _Chain_Append_unprotected( the_chain, node );
  _ISR_Enable( level );
}


This way the add-element-to-chain operation becomes atomic.
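
As a rough illustration of why the wrapper helps, the sketch below models
interrupt masking with a plain flag. Everything in it is an assumption made
for the example (the irq_disabled and irq_pending flags, preemption_point(),
the simplified Node and Chain types); it does not use the real _ISR_Disable /
_ISR_Enable and only mimics their effect of deferring the interrupt until the
append has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Chain_Node / Chain_Control. */
typedef struct Node { struct Node *next, *previous; } Node;
typedef struct { Node head, tail; } Chain;

static int    irq_disabled;   /* models the masked ISR level */
static int    irq_pending;    /* a simulated interrupt is waiting */
static Chain *isr_chain;      /* chain the simulated ISR appends to */
static Node  *isr_node;       /* node the simulated ISR appends */

static void append_unprotected(Chain *c, Node *n);

/* Deliver the simulated interrupt only when interrupts are enabled. */
static void preemption_point(void) {
  if (!irq_pending || irq_disabled) return;   /* stays latched if masked */
  irq_pending = 0;
  append_unprotected(isr_chain, isr_node);    /* the ISR's own append */
}

static void append_unprotected(Chain *c, Node *n) {
  Node *tail = &c->tail;
  Node *old_last = tail->previous;
  assert(old_last != NULL);
  preemption_point();          /* worst case: preempted with old_last read */
  n->next = tail;
  tail->previous = n;
  old_last->next = n;
  n->previous = old_last;
}

/* Same shape as the wrapper above: mask, append, unmask. */
static void append_protected(Chain *c, Node *n) {
  irq_disabled = 1;
  append_unprotected(c, n);
  irq_disabled = 0;
  preemption_point();          /* latched interrupt is delivered now */
}

/* Append 'a' from the task while an interrupt wants to append 'b';
   returns how many nodes are reachable from the head afterwards. */
static int run_scenario(int use_protected) {
  Chain c;
  Node a, b;
  int count = 0;

  c.head.previous = NULL;  c.head.next = &c.tail;
  c.tail.previous = &c.head;  c.tail.next = NULL;
  isr_chain = &c;  isr_node = &b;
  irq_pending = 1;  irq_disabled = 0;

  if (use_protected) append_protected(&c, &a);
  else               append_unprotected(&c, &a);

  for (Node *p = c.head.next; p != &c.tail; p = p->next) count++;
  return count;
}
```

In this model run_scenario(0) leaves one reachable node (the interrupt's node
is lost), while run_scenario(1) leaves both nodes in the chain, because the
simulated interrupt only runs after the protected append has completed.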

With this fix I was able to run a long-duration test (8 hrs) successfully in
my setup.


What do you think?


regards,
Catalin