RTEMS scheduler bug ?

Wed Apr 3 13:41:37 UTC 2019

Yes, I realized yesterday evening that gIntrErrs could also be incremented in
the second if, so I rewrote it with separate counters, like this:

int gIntrptErrs;
int gInsertErrs;

RTEMS_INLINE_ROUTINE void _Scheduler_priority_Ready_queue_enqueue(
  Chain_Node                     *node,
  Scheduler_priority_Ready_queue *ready_queue,
  Priority_bit_map_Control       *bit_map
)
{
  Chain_Control *ready_chain = ready_queue->ready_chain;
  //_Assert(_ISR_Get_level() != 0);
  if(_ISR_Get_level() == 0)
    gIntrptErrs++;

  cnt_before = _Chain_Node_count_unprotected(ready_chain);
  _Chain_Append/*_unprotected*/( ready_chain, node );
  cnt_after = _Chain_Node_count_unprotected(ready_chain);

  if(cnt_after != cnt_before + 1)
    gInsertErrs++;

  _Priority_bit_map_Add( bit_map, &ready_queue->Priority_map );
}

It doesn't seem that we enter that code with interrupts enabled. The output was:
# cpuuse
-------------------------------------------------------------------------------
                              CPU USAGE BY THREAD
------------+----------------------------------------+---------------+---------
 ID         | NAME                                   | SECONDS       | PERCENT
------------+----------------------------------------+---------------+---------
*cdemergian build 11.15 gIntrptErrs=0 gInsertErrs=2*
 0x09010001 | IDLE                                   |    244.595117 | 99.238
 0x0a010001 | UI1                                    |      1.000929 |  0.406
 0x0a010002 | ntwk                                   |      0.099342 |  0.040
 0x0a010003 | SCtx                                   |      0.068705 |  0.027
 0x0a010004 | SCrx                                   |      0.089272 |  0.036
 0x0a010005 | eRPC                                   |      0.000050 |  0.000
 0x0a010006 | SHLL                                   |      0.550608 |  0.223
 0x0b010001 |                                        |      0.000096 |  0.000
 0x0b010002 |                                        |      0.068307 |  0.027
------------+----------------------------------------+---------------+---------
 TIME SINCE LAST CPU USAGE RESET IN SECONDS: 246.528065
-------------------------------------------------------------------------------
[/] #
This does not happen every time: in most runs both globals were zero, which is
weird.

I also tried the patch. The issue was reproduced as well.
[/] # cpuuse
-------------------------------------------------------------------------------
                              CPU USAGE BY THREAD
------------+----------------------------------------+---------------+---------
 ID         | NAME                                   | SECONDS       | PERCENT
------------+----------------------------------------+---------------+---------
*cdemergian build 16.25 gIntrptErrs=233694 gInsertErrs=1*
 0x09010001 | IDLE                                   |     94.488726 | 98.619
 0x0a010001 | UI1                                    |      1.000931 |  1.044
 0x0a010002 | ntwk                                   |      0.030101 |  0.031
 0x0a010003 | SCtx                                   |      0.021441 |  0.022
 0x0a010004 | SCrx                                   |      0.027176 |  0.028
 0x0a010005 | eRPC                                   |      0.000049 |  0.000
 0x0a010006 | SHLL                                   |      0.215693 |  0.225
 0x0b010001 |                                        |      0.000096 |  0.000
 0x0b010002 |                                        |      0.027211 |  0.028
------------+----------------------------------------+---------------+---------
 TIME SINCE LAST CPU USAGE RESET IN SECONDS: 95.867059
-------------------------------------------------------------------------------

With the patch applied we are getting large values for gIntrptErrs. Is that
normal? I don't understand all aspects of the patch just yet.
Catalin

On Tue, Apr 2, 2019 at 10:26 PM Sebastian Huber <
sebastian.huber at embedded-brains.de> wrote:

> On 02/04/2019 16:28, Catalin Demergian wrote:
> > I did more tests. It seems that the same type of error does not happen
> > every time. I got the _Configuration_Scheduler_priority_dflt a few times,
> > but the 'interrupts enabled when they are supposed to be disabled' case
> > happened as well
> >
> > RTEMS_INLINE_ROUTINE void _Scheduler_priority_Ready_queue_enqueue(
> >   Chain_Node                     *node,
> >   Scheduler_priority_Ready_queue *ready_queue,
> >   Priority_bit_map_Control       *bit_map
> > )
> > {
> >   Chain_Control *ready_chain = ready_queue->ready_chain;
> >   //_Assert(_ISR_Get_level() != 0);
> >   if(_ISR_Get_level() == 0)
> >     gIntrErrs++;
> >
> >   cnt_before = _Chain_Node_count_unprotected(ready_chain);
> >   _Chain_Append_unprotected( ready_chain, node );
> >   cnt_after = _Chain_Node_count_unprotected(ready_chain);
> >
> >   if(cnt_after != cnt_before + 1)
> >     gIntrErrs++;
> >
> >   _Priority_bit_map_Add( bit_map, &ready_queue->Priority_map );
> > }
> >
> > .. and I modified the cpuuse command to display gIntrErrs
>
> What do you get if you use separate counters for these errors?
>
> I still believe that you use some interrupts with a too high priority.
> What happens if you apply the attached patch?
>
> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax     : +49 89 189 47 41-09
> E-Mail  : sebastian.huber at embedded-brains.de
> PGP     : Public key available on request.
>
> This message is not a business communication within the meaning of the EHUG.
>
>