RTEMS scheduler bug ?
Catalin Demergian
demergian at gmail.com
Thu Apr 4 12:09:45 UTC 2019
Hi Andrei,
thank you for the detailed answer!
I checked my STM32 Cube settings: I have 3 enabled interrupts, and they all
have the preemption priority and sub priority set to zero!
It seems I ran into the same issue you had in 2015 :)
I will take your advice: change the priorities, regenerate the code and
see what happens.
regards,
Catalin
On Wed, Apr 3, 2019 at 6:03 PM <groups at chichak.ca> wrote:
> This sounds like a problem I had in 2015 on an STM32 that Sebastian helped
> me get around. At the end of the ordeal I wrote:
>
> "A bit of review to begin with; I am working with an STM32F4 ARM Cortex
> M4F processor’s ADC section. A feature of this ADC is the ability to have
> conversions triggered by a timer (great for evenly sampled signals), the
> results transferred using double buffered DMA, giving you an interrupt when
> the DMA buffer is half full, then again when the buffer is full.
>
> To let my task know when there was data ready to process, the DMA
> half/full complete interrupt routines would call rtems_event_send. The task
> would pend on the events, with a timeout in case something screwed up.
>
> In my case, the timer would trigger 14 channels of ADC conversions to
> happen 400 times per second. This would yield 200 half full and 200 full
> interrupts per second, each calling rtems_event_send.
>
> This would go on for a few thousand seconds and then the program would
> crash. After some painful debugging, I managed to repeatedly catch the
> system attempting to expire some “watchdogs”, which I believe is the
> squelching of outstanding timeouts on satisfied rtems_event_receive
> calls.
>
>
>
> After trying a whole bunch of dead ends, Sebastian Huber asked me about
> the priorities of the interrupts being generated.
>
> The ARM architecture uses a vectored interrupt structure quite similar to
> the MC68xxx processors, where a device generates an interrupt and the
> address of the service routine is automatically picked up from a known
> place in a table without having to poll a bunch of registers to figure out
> what happened and branch off to the handler. The ARM processors have
> assignable priorities on most of the interrupts, so if two interrupts
> assert at the same time, or if a higher priority interrupt happens while an
> interrupt is in progress, you can predict what happens.
>
> What I didn’t know is that RTEMS implements something called Non-Maskable
> Interrupts (NMI). The software NMIs don’t seem to be like hardware NMIs (a
> hardware interrupt that cannot be turned off); they just have the same
> name (much like the event watchdogs that aren’t like the hardware
> watchdogs).
>
> What I learned was that RTEMS NMIs are interrupt routines that are not
> allowed to use any RTEMS facilities. So, I presume, these routines would be
> used for dealing with devices that don’t need to interact with task code.
> The upside is that the interrupts can be entered bypassing RTEMS’ overhead.
>
> A drawback is that if you call for RTEMS facilities from within one of
> these routines, apparently, your code becomes rather crashy.
>
>
>
>
> To differentiate between NMI routines and a regular ISR that can call
> RTEMS facilities, the developers use the interrupt priorities and a mask.
> The NMI determination is not specific to the ARM family; each architecture
> has a mask that determines which bits are used to decide whether an
> interrupt routine is an NMI or an ISR.
>
>
> ARM uses an 8 bit priority and a priority in the range of 0x00-0x7F
> indicates an NMI. On an ARM, the lower the number, the more urgent the
> interrupt, so NMIs have higher urgency than ISRs that can use RTEMS
> facilities.
>
>
> On the STM32F4, only 4 bits of 8 of priority are implemented, the 4 MSBs
> with the lower 4 being set to 0 (other Cortex M4 implementations have other
> combinations). In ST’s CubeMX tool, you can set the interrupt priority of
> the various interrupt sources in the range of 0-15 and Cube generates code
> to take care of the bit shifting for you. In my case I had set my
> priorities to 1,2,3 and 6. Shifted, these became 0x10, 0x20, 0x30, and
> 0x60. Since these numbers are all below 0x80, the RTEMS code was
> interpreting these interrupts as NMIs, bypassing a bunch of the necessary
> code to support RTEMS calls.
>
> By changing my interrupt priorities to 9, 10, 11, and 14 (shifting gives
> 0x90, 0xA0, 0xB0, and 0xE0), the interrupt routines lost their NMI nature
> and the system immediately became dead stable with a 1kHz tick interrupt
> rate, 2 ADC DMA interrupts at 200Hz each, and a CAN interrupt at about 36Hz.
> ”
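
To make the arithmetic in the write-up above concrete, here is a minimal C sketch of the shift and the 0x80 comparison it describes. The helper names and constants are hypothetical (they are not RTEMS or CubeMX APIs); the 4 implemented priority bits and the 0x80 limit are taken from the description above, so treat this as an illustration of the reasoning rather than the actual RTEMS check.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption from the text: the STM32F4 implements only the 4 MSBs
 * of the 8-bit priority field. */
#define PRIO_BITS_IMPLEMENTED 4u

/* Assumption from the text: 8-bit priorities below 0x80 are treated
 * as NMI-like, i.e. the handler must not call RTEMS services. */
#define NMI_LIKE_LIMIT 0x80u

/* Hypothetical helper: convert a Cube priority (0-15) into the 8-bit
 * value that ends up in the priority register. */
static uint8_t cube_to_hw_priority(unsigned cube_prio)
{
    assert(cube_prio <= 15u);
    return (uint8_t)(cube_prio << (8u - PRIO_BITS_IMPLEMENTED));
}

/* Hypothetical helper: nonzero if a handler at this priority would be
 * handled as NMI-like under the 0x80 rule described above. */
static int is_nmi_like(uint8_t hw_prio)
{
    return hw_prio < NMI_LIKE_LIMIT;
}
```

With these helpers, cube_to_hw_priority(6) gives 0x60, which is_nmi_like() flags, while cube_to_hw_priority(9) gives 0x90, which it does not. A Cube priority of 0, as in the settings mentioned at the top of this message, maps to 0x00 and is flagged as well.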
>
>
>
> When I ported RTEMS5 to the STM32F7, I ran into the same issue and used
> the same method to get around it.
>
> I hope this helps.
>
> Andrei
>
>
>
>
>
> On 2019-April-03, at 07:46, Sebastian Huber <
> sebastian.huber at embedded-brains.de> wrote:
>
> On 03/04/2019 15:41, Catalin Demergian wrote:
>
> Yes, I realized yesterday evening that gIntrErrs could be incremented in
> the second if, so I rewrote it like this:
>
> /* Debug counters */
> int gIntrptErrs;  /* enqueue entered with interrupts enabled */
> int gInsertErrs;  /* chain node count did not grow by exactly one */
>
> RTEMS_INLINE_ROUTINE void _Scheduler_priority_Ready_queue_enqueue(
>   Chain_Node                     *node,
>   Scheduler_priority_Ready_queue *ready_queue,
>   Priority_bit_map_Control       *bit_map
> )
> {
>   Chain_Control *ready_chain = ready_queue->ready_chain;
>   size_t         cnt_before;
>   size_t         cnt_after;
>
>   //_Assert(_ISR_Get_level() != 0);
>   if (_ISR_Get_level() == 0)
>     gIntrptErrs++;
>
>   /* Count the nodes before and after the append to detect a lost insert. */
>   cnt_before = _Chain_Node_count_unprotected(ready_chain);
>   _Chain_Append/*_unprotected*/( ready_chain, node );
>   cnt_after = _Chain_Node_count_unprotected(ready_chain);
>
>   if (cnt_after != cnt_before + 1)
>     gInsertErrs++;
>
>   _Priority_bit_map_Add( bit_map, &ready_queue->Priority_map );
> }
>
> It doesn't seem that we enter that code with interrupts enabled. The
> output was:
> # cpuuse
> -------------------------------------------------------------------------------
>                               CPU USAGE BY THREAD
> ------------+----------------------------------------+---------------+---------
>  ID         | NAME                                   | SECONDS       | PERCENT
> ------------+----------------------------------------+---------------+---------
> *cdemergian build 11.15 gIntrptErrs=0 gInsertErrs=2*
>  0x09010001 | IDLE                                   |    244.595117 |  99.238
>  0x0a010001 | UI1                                    |      1.000929 |   0.406
>  0x0a010002 | ntwk                                   |      0.099342 |   0.040
>  0x0a010003 | SCtx                                   |      0.068705 |   0.027
>  0x0a010004 | SCrx                                   |      0.089272 |   0.036
>  0x0a010005 | eRPC                                   |      0.000050 |   0.000
>  0x0a010006 | SHLL                                   |      0.550608 |   0.223
>  0x0b010001 |                                        |      0.000096 |   0.000
>  0x0b010002 |                                        |      0.068307 |   0.027
> ------------+----------------------------------------+---------------+---------
> TIME SINCE LAST CPU USAGE RESET IN SECONDS:                        246.528065
> -------------------------------------------------------------------------------
> [/] #
> Not all the time, though: in most of the runs both globals were zero,
> which is weird.
>
> I also tried the patch. The issue was reproduced as well.
> [/] # cpuuse
> -------------------------------------------------------------------------------
>                               CPU USAGE BY THREAD
> ------------+----------------------------------------+---------------+---------
>  ID         | NAME                                   | SECONDS       | PERCENT
> ------------+----------------------------------------+---------------+---------
> *cdemergian build 16.25 gIntrptErrs=233694 gInsertErrs=1*
>  0x09010001 | IDLE                                   |     94.488726 |  98.619
>  0x0a010001 | UI1                                    |      1.000931 |   1.044
>  0x0a010002 | ntwk                                   |      0.030101 |   0.031
>  0x0a010003 | SCtx                                   |      0.021441 |   0.022
>  0x0a010004 | SCrx                                   |      0.027176 |   0.028
>  0x0a010005 | eRPC                                   |      0.000049 |   0.000
>  0x0a010006 | SHLL                                   |      0.215693 |   0.225
>  0x0b010001 |                                        |      0.000096 |   0.000
>  0x0b010002 |                                        |      0.027211 |   0.028
> ------------+----------------------------------------+---------------+---------
> TIME SINCE LAST CPU USAGE RESET IN SECONDS:                         95.867059
> -------------------------------------------------------------------------------
>
> We are getting big numbers for gIntrptErrs (is that normal? I don't
> understand all the aspects of the patch just yet).
>
>
> Can you set a breakpoint on the gIntrptErrs++ and print the stack traces?
>
> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone : +49 89 189 47 41-16
> Fax : +49 89 189 47 41-09
> E-Mail : sebastian.huber at embedded-brains.de
> PGP : Public key available on request.
>
> This message is not a business communication within the meaning of the EHUG.
>
> _______________________________________________
> users mailing list
> users at rtems.org
> http://lists.rtems.org/mailman/listinfo/users
>
>