rtems_semaphore_obtain

Joel Sherrill joel.sherrill at oarcorp.com
Wed Apr 4 22:15:51 UTC 2007


Jiri Gaisler wrote:
> The investigation we made showed that a too fast interrupt rate
> can cause a stack overflow and crash the system. Ideally, the
> code should be changed such that the current interrupt stack
> frame is fully unwound before switching threads and re-enabling
> interrupts. This might or might not be difficult to implement,
> and might have some undesired side-effects.
>
> Joel, what is your opinion ..?
>
>   
It might be desirable, but this is NOT a characteristic that is specific to
the SPARC.  Every port does this the same way.  You eventually have
to have enough spare CPU time to unwind the original interrupt frame.
Plus you can't really unwind the stack then.  All task switches in RTEMS
are task to task.  When an interrupt occurs, some registers are not
saved as part of the regular task-to-task switch.  These are the core of
the Interrupt Stack Frame you need to pop off.  It isn't as clear on the
SPARC, but look at the same code on the m68k:

        .global SYM (_ISR_Dispatch)
SYM (_ISR_Dispatch):
        movml   d0-d1/a0-a1,a7@-
        jsr     SYM (_Thread_Dispatch)    <---- task switch occurs here
        movml   a7@+,d0-d1/a0-a1
        rte

On the m68k, d0-d1/a0-a1 are scratch registers which are assumed to
be used by _Thread_Dispatch and must be saved/restored around the call.
This means that 4 registers and an interrupt stack frame remain on the
stack until the interrupted task gets switched back in.
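To put a rough number on the per-interrupt cost: the sizes below are assumptions for illustration, not measurements (four 32-bit scratch registers, plus an 8-byte exception frame on the 68010 and later; the original 68000 frame is 6 bytes). Each interrupt that lands before the previous _ISR_Dispatch finishes leaves one more such level on the stack:

```c
#include <assert.h>

/* Back-of-envelope sketch (assumed sizes, not measured): each nesting
 * level costs the four 32-bit scratch registers pushed by _ISR_Dispatch
 * plus the CPU's exception stack frame. */
enum {
    SCRATCH_BYTES   = 4 * 4,  /* d0-d1/a0-a1 */
    EXC_FRAME_BYTES = 8,      /* SR + PC + format word (68010+) */
    LEVEL_BYTES     = SCRATCH_BYTES + EXC_FRAME_BYTES
};

/* How many un-unwound interrupt levels fit in a stack of this size? */
static int levels_until_overflow(int stack_bytes)
{
    return stack_bytes / LEVEL_BYTES;
}
```

With a hypothetical 4 KiB task stack, that allows only around 170 pending levels before the stack is gone.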

It would require each port to do some pretty fancy magic to avoid this,
and I don't even know if it is possible.  The SPARC, with its register
windows, MIGHT be contorted into doing it, but other architectures
probably can't at all.

The test is structured so that RTEMS thinks it needs to get to the IDLE
thread, but I am not sure it ever gets there (and back).  Does the IDLE
task body ever run?

The bottom line is that the CPU must have some processing power left
for tasks after the interrupts occur, and in this case, there simply
isn't any.
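The arithmetic behind this can be sketched in plain C. This is a hypothetical model, not RTEMS code: assume each interrupt that arrives before the previous dispatch frame is unwound leaves another frame on the stack, that unwinding one frame costs a fixed number of cycles, and that interrupts arrive at a fixed period. Once the period drops below the unwind cost, stack usage grows without bound:

```c
#include <assert.h>

/* Hypothetical model: FRAME_BYTES per pending _ISR_Dispatch frame,
 * UNWIND_CYCLES to fully pop one frame, one interrupt every 'period'
 * cycles.  All three numbers are illustrative assumptions. */
#define FRAME_BYTES   24
#define UNWIND_CYCLES 1114

/* Peak stack bytes consumed after 'n' interrupts at the given period. */
static long peak_stack(long period, int n)
{
    long depth = 0, peak = 0, credit = 0;
    for (int i = 0; i < n; i++) {
        credit += period;              /* cycles available before next IRQ */
        while (credit >= UNWIND_CYCLES && depth > 0) {
            credit -= UNWIND_CYCLES;   /* one pending frame fully unwound */
            depth  -= FRAME_BYTES;
        }
        depth += FRAME_BYTES;          /* the new interrupt pushes a frame */
        if (depth > peak)
            peak = depth;
    }
    return peak;
}
```

With a period comfortably above the unwind cost, the peak stays at one frame; below it, the depth climbs steadily until the stack blows.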

I did a quick experiment: I set the delay to 2000 and let it count down
on each iteration of the loop.  At 1114, the test died.  At 1234, we
were still hitting the IDLE thread body on every interrupt.  As the
number of cycles between ISRs decreased, we kept interrupting one or two
instructions back from the end of _ISR_Dispatch.  Maybe that area could
be better protected.

Jiri... I have attached my version of the test.  Maybe there is
something that could be done to make things a little better, but it
still takes ~1114 cycles to process each "block/switch to
IDLE/ISR/unblock/switch from IDLE to task" iteration.  We have to let
that complete. :)
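The sustainability condition can be written down directly. The 1114-cycle figure is the measurement from the experiment above; the predicate itself is just the observation that the full round trip must finish before the next interrupt arrives:

```c
#include <assert.h>
#include <stdbool.h>

/* Measured round-trip cost from the experiment above:
 * block / switch to IDLE / ISR / unblock / switch back to the task. */
#define ROUND_TRIP_CYCLES 1114

/* An interrupt rate is sustainable only if each full round trip
 * completes before the next interrupt arrives. */
static bool rate_is_sustainable(long period_cycles)
{
    return period_cycles > ROUND_TRIP_CYCLES;
}
```

This matches the observations: at a delay of 1234 the IDLE body was still reached on every interrupt, while at 1114 the test died.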

--joel

> Jiri.
>
> Joel Sherrill wrote:
>   
>> John Pickwick wrote:
>>     
>>> Hi,
>>>
>>> On that thread, after Jiri's last message, it is unclear to me whether
>>> there is still a problem with the LEON2 BSP in the official 4.6.5
>>> release (from the RTEMS site).
>>>
>>> Can someone please clarify this point?  Thanks.
>>>
>>>   
>>>       
>> Jiri ships a version of RTEMS with some patches that may or may not be
>> in the official tree at any given point in time.  He will have to
>> comment on what is in his current patch set.
>>
>> But so far, we haven't found anything wrong with RTEMS based upon the
>> test code.  So far it looks like the code is generating interrupts
>> faster than they can be processed, eventually blowing the stack.  There
>> must be enough time between interrupts so that all cleanup from the
>> interrupted task eventually gets done.
>>
>> --joel
>>     
>>> John
>>>
>>> ----- Original Message ----- 
>>> From: "Joel Sherrill" <joel.sherrill at oarcorp.com>
>>> To: "Thomas Doerfler (nt)" <Thomas.Doerfler at imd-systems.de>
>>> Cc: <rtems-users at rtems.org>
>>> Sent: Thursday, March 29, 2007 10:22 PM
>>> Subject: Re: rtems_semaphore_obtain
>>>
>>>
>>>   
>>>       
>>>> Thomas Doerfler (nt) wrote:
>>>>     
>>>>         
>> Hi,
>>
>> A while ago we had a thread on the rtems mailing list which might
>> address your problem. We found out that gcc takes the liberty of moving
>> some memory accesses that should occur between the irq disable/enable
>> calls to a location before or after the irq-disabled section. Try
>> searching for a keyword like "memory barrier" in the mailing list
>> archive :-)
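The reordering Thomas describes can be illustrated with a compiler-level memory barrier. The sketch below is generic and is not the actual RTEMS _ISR_Disable/_ISR_Enable implementation (which would also manipulate the processor's interrupt mask); the point is that an empty asm statement with a "memory" clobber forbids gcc from hoisting or sinking memory accesses across it:

```c
#include <assert.h>

static volatile int guarded_data;
static volatile int irq_level;

/* Compiler barrier: gcc may not move memory accesses across this. */
#define BARRIER() __asm__ volatile ("" ::: "memory")

/* Generic sketch of a critical section.  The irq_level stores stand in
 * for disabling and re-enabling interrupts; without the barriers, the
 * compiler would be free to move the guarded store outside the
 * "protected" region. */
static void critical_update(int value)
{
    irq_level = 1;          /* stand-in for _ISR_Disable */
    BARRIER();
    guarded_data = value;   /* must stay inside the protected region */
    BARRIER();
    irq_level = 0;          /* stand-in for _ISR_Enable */
}
```

Declaring the variables volatile alone is not enough, because volatile only orders accesses relative to other volatile accesses; the memory clobber pins ordinary loads and stores as well.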
>>
>>
>>       
>>     
>>>>> This was my first guess also, but it fails with the tip of both the
>>>>> 4.6 and 4.7 branches, as well as when using events for
>>>>> synchronization, so it is most likely not the barrier problem again.
>>>>>
>>>>> --joel
>>>>>     
>>>>>           
>> wkr,
>> Thomas.
>>
>>
>> Johan Zandin schrieb:
>>
>>       
>>     
>>>>>>> Sergei Organov  writes:
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>               
>>>>>>>> "Johan Zandin" <johan.zandin at space.se> writes:
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>                 
>>>>>>>>>       _Context_Switch( &executing->Registers, &heir->Registers );
>>>>>>>>>
>>>>>>>>>       executing = _Thread_Executing;
>>>>>>>>>
>>>>>>>>>       _ISR_Disable( level );         -----+
>>>>>>>>>                                           |  Region where
>>>>>>>>>    }                                      |  an occurring
>>>>>>>>>                                           |  interrupt
>>>>>>>>>     _Thread_Dispatch_disable_level = 0;   |  causes problems
>>>>>>>>>                                           |
>>>>>>>>>     _ISR_Enable( level );            -----+
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>                   
>>>>>>>> But how can an interrupt occur when interrupts are disabled in this
>>>>>>>> region?! If _ISR_Disable()/_ISR_Enable() don't work on your target,
>>>>>>>> you are in serious trouble anyway.
>>>>>>>>
>>>>>>>>           
>>>>>>>>                 
>>>>>>> The HW interrupt occurs but is left pending until ISRs are enabled,
>>>>>>> so the ISR does not execute until somewhere within the _ISR_Enable call
>>>>>>> (in the first cycle when ISRs are enabled in the CPU again).
>>>>>>>
>>>>>>> /Johan
>>>>>>>
>>>>>>> -----------------------------------------------------------
>>>>>>> Johan Zandin                      Software Engineer
>>>>>>> Saab Space AB                     Phone: +46-31-735 41 47
>>>>>>> SE-405 15 Gothenburg, Sweden      Fax:   +46-31-735 40 00
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> rtems-users mailing list
>>>>>>> rtems-users at rtems.com
>>>>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>>>>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: rtems_semaphore_error.c
Type: text/x-csrc
Size: 1504 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20070404/ac3b152e/attachment.bin>

