RTEMS scheduler bug ?

Thu Apr 4 12:09:45 UTC 2019

Hi Andrei,
thank you for the detailed answer!

I checked my STM32 Cube settings: I have 3 enabled interrupts, and they all
have the preemption priority and sub-priority set to zero!
It seems I ran into the same issue you had in 2015 :)
I will take your advice - change the priorities, regenerate the code and
see what happens.
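
To make the arithmetic concrete, here is a minimal sketch in C of the check as
Andrei describes it in the quoted mail. The 4-bit shift and the 0x80 threshold
are taken from his description; the macro and function names are hypothetical
and are not part of RTEMS or the Cube HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption (from the quoted description): the STM32F4 implements only the
 * 4 most significant bits of the 8-bit priority field. */
#define PRIO_BITS_IMPLEMENTED 4u

/* Assumption (from the quoted description): hardware priority values below
 * 0x80 are treated as "NMI"-like and must not call RTEMS services. */
#define NMI_THRESHOLD 0x80u

/* Hypothetical helper: convert a Cube priority (0-15) to the 8-bit value. */
static uint8_t cube_to_hw_priority(uint8_t cube_prio)
{
  return (uint8_t)((cube_prio & 0x0Fu) << (8u - PRIO_BITS_IMPLEMENTED));
}

/* Hypothetical helper: true if an ISR at this Cube priority may use RTEMS
 * services such as rtems_event_send, under the assumptions above. */
static bool may_call_rtems(uint8_t cube_prio)
{
  return cube_to_hw_priority(cube_prio) >= NMI_THRESHOLD;
}
```

Under these assumptions, my current setting of 0 maps to 0x00 and
may_call_rtems(0) is false, while a Cube priority of 8 or higher (0x80 and up)
is accepted.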

regards,
Catalin

On Wed, Apr 3, 2019 at 6:03 PM <groups at chichak.ca> wrote:

> This sounds like a problem I had in 2015 on an STM32 that Sebastian helped
> me get around. At the end of the ordeal I wrote:
>
> "A bit of review to begin with; I am working with an STM32F4 ARM Cortex
> M4F processor’s ADC section. A feature of this ADC is the ability to have
> conversions triggered by a timer (great for evenly sampled signals), the
> results transferred using double buffered DMA, giving you an interrupt when
> the DMA buffer is half full, then again when the buffer is full.
>
> To let my task know when there was data ready to process, the DMA
> half/full complete interrupt routines would call rtems_event_send. The task
> would pend on the events, with a timeout in case something screwed up.
>
> In my case, the timer would trigger 14 channels of ADC conversions to
> happen 400 times per second. This would yield 200 half full and 200 full
> interrupts per second, each calling rtems_event_send.
>
> This would run for a few thousand seconds and then the program would
> crash. After some painful debugging, I managed to repeatedly catch the
> system attempting to expire some “watchdogs”, which I believe is the
> cancelling of outstanding timeouts on satisfied rtems_event_receive
> calls.
>
>
>
> After trying a whole bunch of dead ends, Sebastian Huber asked me about
> the priorities of the interrupts being generated.
>
> The ARM architecture uses a vectored interrupt structure quite similar to
> the MC68xxx processors, where a device generates an interrupt and the
> address of the service routine is automatically picked up from a known
> place in a table without having to poll a bunch of registers to figure out
> what happened and branch off to the handler. The ARM processors have
> assignable priorities on most of the interrupts, so if two interrupts
> assert at the same time, or if a higher priority interrupt happens while an
> interrupt is in progress, you can predict what happens.
>
> What I didn’t know is that RTEMS implements something called Non-Maskable
> Interrupts (NMI). The software NMIs don’t seem to be like hardware NMIs (a
> hardware interrupt that can not be turned off), they just have the same
> name (much like the event watchdogs that aren’t like the hardware
> watchdogs).
>
> What I learned was that RTEMS NMIs are interrupt routines that are not
> allowed to use any RTEMS facilities. So, I presume, these routines would be
> used for dealing with devices that don’t need to interact with task code.
> The upside is that the interrupts can be entered bypassing RTEMS’ overhead.
>
> A drawback is that if you call for RTEMS facilities from within one of
> these routines, apparently, your code becomes rather crashy.
>
>
>
>
> To differentiate between NMI routines and a regular ISR that can call
> RTEMS facilities, the developers use the interrupt priorities and a mask.
> The NMI determination is not specific to the ARM family; each architecture
> has a mask that selects which priority values make an interrupt routine an
> NMI rather than an ISR.
>
>
> ARM uses an 8 bit priority and a priority in the range of 0x00-0x7F
> indicates an NMI. On an ARM, the lower the number, the more urgent the
> interrupt, so NMIs have higher urgency than ISRs that can use RTEMS
> facilities.
>
>
> On the STM32F4, only 4 bits of 8 of priority are implemented, the 4 MSBs
> with the lower 4 being set to 0 (other Cortex M4 implementations have other
> combinations). In ST’s CubeMX tool, you can set the interrupt priority of
> the various interrupt sources in the range of 0-15 and Cube generates code
> to take care of the bit shifting for you. In my case I had set my
> priorities to 1,2,3 and 6. Shifted, these became 0x10, 0x20, 0x30, and
> 0x60. Since these numbers are all below 0x80, the RTEMS code was
> interpreting these interrupts as NMIs, bypassing a bunch of the necessary
> code to support RTEMS calls.
>
> By changing my interrupt priorities to 9, 10, 11, and 14 (shifting gives
> 0x90, 0xA0, 0xB0, and 0xE0), the interrupt routines lost their NMI nature
> and the system immediately became dead stable with a 1 kHz tick interrupt
> rate, 2 ADC DMA interrupts at 200 Hz each, and a CAN interrupt at about
> 36 Hz.
> ”
>
>
>
> When I ported RTEMS 5 to the STM32F7, I ran into the same issue and used
> the same method to get around it.
>
> I hope this helps.
>
> Andrei
>
>
>
>
>
> On 2019-April-03, at 07:46, Sebastian Huber <
> sebastian.huber at embedded-brains.de> wrote:
>
> On 03/04/2019 15:41, Catalin Demergian wrote:
>
> Yes, I realized yesterday evening that gIntrErrs could be incremented in
> the second if,
> so I rewrote it like this:
>
> int gIntrptErrs;
> int gInsertErrs;
>
> RTEMS_INLINE_ROUTINE void _Scheduler_priority_Ready_queue_enqueue(
>   Chain_Node                     *node,
>   Scheduler_priority_Ready_queue *ready_queue,
>   Priority_bit_map_Control       *bit_map
> )
> {
>   Chain_Control *ready_chain = ready_queue->ready_chain;
>   size_t         cnt_before;
>   size_t         cnt_after;
>
>   /* Debug check: count entries made with interrupts enabled. */
>   //_Assert(_ISR_Get_level() != 0);
>   if (_ISR_Get_level() == 0)
>     gIntrptErrs++;
>
>   /* Debug check: the append must grow the chain by exactly one node. */
>   cnt_before = _Chain_Node_count_unprotected(ready_chain);
>   _Chain_Append/*_unprotected*/( ready_chain, node );
>   cnt_after = _Chain_Node_count_unprotected(ready_chain);
>
>   if (cnt_after != cnt_before + 1)
>     gInsertErrs++;
>
>   _Priority_bit_map_Add( bit_map, &ready_queue->Priority_map );
> }
>
> It didn't seem that we enter that code with interrupts enabled. The output
> was:
> # cpuuse
> -------------------------------------------------------------------------------
>                               CPU USAGE BY THREAD
> ------------+----------------------------------------+---------------+---------
>  ID         | NAME                                   | SECONDS       | PERCENT
> ------------+----------------------------------------+---------------+---------
> *cdemergian build 11.15 gIntrptErrs=0 gInsertErrs=2*
>  0x09010001 | IDLE                                   | 244.595117 |  99.238
>  0x0a010001 | UI1                                    |   1.000929 |   0.406
>  0x0a010002 | ntwk                                   |   0.099342 |   0.040
>  0x0a010003 | SCtx                                   |   0.068705 |   0.027
>  0x0a010004 | SCrx                                   |   0.089272 |   0.036
>  0x0a010005 | eRPC                                   |   0.000050 |   0.000
>  0x0a010006 | SHLL                                   |   0.550608 |   0.223
>  0x0b010001 |                                        |   0.000096 |   0.000
>  0x0b010002 |                                        |   0.068307 |   0.027
> ------------+----------------------------------------+---------------+---------
>  TIME SINCE LAST CPU USAGE RESET IN SECONDS:           246.528065
> -------------------------------------------------------------------------------
> [/] #
> This did not happen every time; in most of the runs both globals were
> zero, which is weird.
>
> I also tried the patch. The issue was reproduced as well.
> [/] # cpuuse
> -------------------------------------------------------------------------------
>                               CPU USAGE BY THREAD
> ------------+----------------------------------------+---------------+---------
>  ID         | NAME                                   | SECONDS       | PERCENT
> ------------+----------------------------------------+---------------+---------
> *cdemergian build 16.25 gIntrptErrs=233694 gInsertErrs=1*
>  0x09010001 | IDLE                                   |    94.488726 |  98.619
>  0x0a010001 | UI1                                    |     1.000931 |   1.044
>  0x0a010002 | ntwk                                   |     0.030101 |   0.031
>  0x0a010003 | SCtx                                   |     0.021441 |   0.022
>  0x0a010004 | SCrx                                   |     0.027176 |   0.028
>  0x0a010005 | eRPC                                   |     0.000049 |   0.000
>  0x0a010006 | SHLL                                   |     0.215693 |   0.225
>  0x0b010001 |                                        |     0.000096 |   0.000
>  0x0b010002 |                                        |     0.027211 |   0.028
> ------------+----------------------------------------+---------------+---------
>  TIME SINCE LAST CPU USAGE RESET IN SECONDS:              95.867059
> -------------------------------------------------------------------------------
>
> We are getting big numbers for gIntrptErrs. Is that normal? I don't
> understand all the aspects of the patch just yet.
>
>
> Can you set a break point to the gIntrptErrs++ and print the stack traces?
>
> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax     : +49 89 189 47 41-09
> E-Mail  : sebastian.huber at embedded-brains.de
> PGP     : Public key available on request.
>
> Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
>
> _______________________________________________
> users mailing list
> users at rtems.org
> http://lists.rtems.org/mailman/listinfo/users