RTEMS scheduler bug ?

Sebastian Huber sebastian.huber at embedded-brains.de
Fri Mar 29 10:04:46 UTC 2019


Hello Catalin,

On 29/03/2019 10:56, Catalin Demergian wrote:
> Hi,
> Some time ago (Sept/Oct 2018) we had a long discussion in which I
> suspected a scheduler issue (subject
> "rtems_message_queue_receive/rtems_event_receive issues").
>
> We got to the point where I realized that _Chain_Append_unprotected
> might fail to add an element to the queue. The effect is a task in an
> odd state: state=READY, but the task is not in the ready chain. Such a
> task never gets CPU time again, since a task needs to be blocked in
> order to be unblocked when new data arrives.
>
> We were using USB then, but this issue has become hot again because we
> just hit the same problem over serial :)
> I believe there is a possible chain of events that can make
> _Chain_Append_unprotected fail; explanations follow.
>
> /**
>  * @note It does NOT disable interrupts to ensure the atomicity of the
>  *       append operation.
>  */
> RTEMS_INLINE_ROUTINE void _Chain_Append_unprotected(
>   Chain_Control *the_chain,
>   Chain_Node    *the_node
> )
> {
>   Chain_Node *tail = _Chain_Tail( the_chain );
>   Chain_Node *old_last = tail->previous;
>
>   the_node->next = tail;
>   tail->previous = the_node;
>   old_last->next = the_node;
>   the_node->previous = old_last;
> }
>
> The lines
>
>   tail->previous = the_node;
>   old_last->next = the_node;
>
> are the ones that actually add the element to the ready chain.
>
> If a thread executes those lines but is interrupted just before
> executing
>
>   the_node->previous = old_last;
>
> and another thread then adds another node to the same chain, that
> thread stores its own node in tail->previous and old_last->next. As a
> result, when the interrupted thread resumes and executes the last line,
> it is too late: the initial node will not be in the ready chain.
>
>
> If this chain of events occurs (*and eventually it will*), the task
> starves.
>
> I can reproduce this issue in a long-duration test. The time until it
> happens varies from run to run, but it always happens.
>
>
> *What I'm proposing is the following*: call _Chain_Append instead of
> _Chain_Append_unprotected in the _Scheduler_priority_Ready_queue_enqueue
> function in schedulerpriorityimpl.h.
>
>
> void _Chain_Append(
>   Chain_Control *the_chain,
>   Chain_Node    *node
> )
> {
>   ISR_Level level;
>
>   _ISR_Disable( level );
>     _Chain_Append_unprotected( the_chain, node );
>   _ISR_Enable( level );
> }
>
>
> This way the append operation becomes atomic with respect to interrupts.
>
> With this fix, I successfully ran a long-duration test (8 hours) in my
> setup.
>
>
> What do you think?
>
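
The lost-node race can be reproduced deterministically in a single-threaded model. This is a minimal sketch under stated assumptions: the Node/Chain types, the isr hook parameter, and every function name are hypothetical, and head and tail are modeled as two separate sentinel nodes rather than the overlapping layout of the real Chain_Control. The simulated interrupt fires after old_last has been read but before the chain is modified; that is the window in which a node disappears from the forward links. If the interruption instead falls just before the final the_node->previous = old_last; store, the second append reads the new node as its old_last and the chain stays consistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified doubly linked chain with separate head/tail sentinels
 * (assumption: RTEMS overlaps them in a union; irrelevant to the race). */
typedef struct Node {
  struct Node *next;
  struct Node *previous;
} Node;

typedef struct {
  Node head;
  Node tail;
} Chain;

static void chain_init(Chain *c) {
  c->head.next = &c->tail;
  c->head.previous = NULL;
  c->tail.next = NULL;
  c->tail.previous = &c->head;
}

/* Same store order as _Chain_Append_unprotected(). The hypothetical
 * isr hook models an interrupt arriving right after old_last is read. */
static void append_unprotected(Chain *c, Node *n, void (*isr)(void)) {
  Node *tail = &c->tail;
  Node *old_last = tail->previous;
  assert(old_last != NULL);
  if (isr != NULL) {
    isr(); /* simulated interrupt inside the unprotected window */
  }
  n->next = tail;
  tail->previous = n;
  old_last->next = n; /* overwrites the link the ISR just made */
  n->previous = old_last;
}

/* Count nodes reachable from the head by following next pointers. */
static size_t forward_count(const Chain *c) {
  size_t count = 0;
  for (const Node *p = c->head.next; p != &c->tail; p = p->next) {
    ++count;
  }
  return count;
}

static Chain g_chain;
static Node g_interrupting_node;

/* Simulated ISR: appends a second node to the same chain. */
static void isr_appends_node(void) {
  append_unprotected(&g_chain, &g_interrupting_node, NULL);
}

/* Appends two nodes, either back to back or with the second append
 * interrupting the first; returns how many remain reachable. */
static size_t simulate_appends(bool interrupt_in_window) {
  static Node first;
  chain_init(&g_chain);
  append_unprotected(&g_chain, &first,
                     interrupt_in_window ? isr_appends_node : NULL);
  if (!interrupt_in_window) {
    append_unprotected(&g_chain, &g_interrupting_node, NULL);
  }
  return forward_count(&g_chain);
}
```

With back-to-back appends both nodes are reachable (simulate_appends(false) returns 2); when the second append lands in the window, only one node is reachable (simulate_appends(true) returns 1), although both were "appended".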

_Scheduler_priority_Ready_queue_enqueue() should only be called with
interrupts already disabled, so disabling them again in _Chain_Append()
should have no effect. Could you please try out the attached patch and
build the BSP with --enable-rtems-debug?
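
Why a nested disable changes nothing can be shown with a small model of save/restore interrupt levels. This is a sketch assuming single-processor semantics in which _ISR_Disable() saves the previous level and _ISR_Enable() restores it rather than unconditionally enabling; interrupts_enabled, isr_disable(), isr_enable(), protected_append(), ready_queue_enqueue() and enqueue_from_scheduler() are hypothetical names, not the RTEMS API. The assert in ready_queue_enqueue() only illustrates the kind of precondition check a debug build could add; the contents of the attached patch are not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable state. */
static bool interrupts_enabled = true;

typedef bool isr_level; /* saved state: were interrupts enabled? */

/* Mimics save-and-disable semantics (assumption, not the RTEMS API). */
static isr_level isr_disable(void) {
  isr_level previous = interrupts_enabled;
  interrupts_enabled = false;
  return previous;
}

/* Restores the saved level instead of forcing interrupts on. */
static void isr_enable(isr_level previous) {
  interrupts_enabled = previous;
}

/* Hypothetical stand-in for _Chain_Append(): disable, append, restore. */
static void protected_append(void) {
  isr_level level = isr_disable();
  /* ... the unprotected append would run here ... */
  isr_enable(level);
}

/* Hypothetical stand-in for the ready-queue enqueue, with a debug-style
 * precondition: the caller must already have disabled interrupts. */
static void ready_queue_enqueue(void) {
  assert(!interrupts_enabled);
  protected_append();
}

/* Models the scheduler caller. Returns whether interrupts were still
 * disabled right after the enqueue, i.e. whether the inner
 * disable/enable pair left the outer critical section intact. */
static bool enqueue_from_scheduler(void) {
  isr_level level = isr_disable();
  ready_queue_enqueue();
  bool still_disabled = !interrupts_enabled;
  isr_enable(level);
  return still_disabled;
}
```

In this model enqueue_from_scheduler() returns true and interrupts are enabled again afterwards: the inner pair restores the "disabled" level it found, so wrapping the append adds no protection when the caller already holds interrupts off.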

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-score-Add-asserts.patch
Type: text/x-patch
Size: 1515 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20190329/ec9bc7fc/attachment-0002.bin>

