PowerPC exceptions and critical interrupts

Mon Jul 7 21:05:13 UTC 2008

Sorry for my late answer. I just didn't have time :-(

Sebastian Huber wrote:
> Hi,
> sorry for the long silence but the critical exceptions troubled me a 
> lot in the last days.
>
> Till Straumann wrote:
>> Sebastian Huber wrote:
>>  
>>> [...]
>>> Yes, but this is yet another magic place that the BSP developer has 
>>> to be aware of.
>>>     
>> No. The cpu support takes care of this. If the CPU is one of the 
>> 'BookE'-style CPUs then
>> the mask is changed to include CE and DE.
>>   
>
> My experience covers only two BSPs and for them were CPU ==  BSP ;-)
?? I don't understand this comment. What I tried to say was that 
(provided that your BSP
uses a CPU flavor that is already supported) the correct IRQ mask bits 
should be selected
by the CPU support. If you want to support a new flavor then you have to 
add to the different
bits and pieces to CPU support.
>
>>> Here are some constrains that I assume:
>>> C1: Three asynchronous exception types ME, CE and EE with priorities 
>>> ME > CE > EE. (Debug exceptions DE?)
>>>     
>> Yes.
>>  
>>> C2: The exception handler code must not be interrupted by a lower 
>>> priority exception (mostly due to the thread dispatch disable level).
>>>     
>> I don't think the thread-dispatch disable level is an issue here.
>> IIRC what can get messed up are the SRRs, i.e., it could be that
>>
>> 1)  EE saves IR and MSR into SRRs
>> 2)  CE happens, saves IR and MSR into critical SRRs
>> 3)  handler is dispatched; if the handler re-enables EE then
>>      the values saved in SRRs at 1) could be overwritten by a
>>      second EE.
>>  
>>> C3: To support RTEMS functions and the dispatcher the critical 
>>> sections in the system core must be protected by 
>>> rtems_interrupt_{enable, disable}. Thus if we want to use RTEMS 
>>> within machine check exceptions we have to add MSR_ME to the disable 
>>> mask. This may result in check stops. This is an inacceptable tradeoff.
>>>     
>> I agree. MSR_ME should always be enabled. This severely limits what a 
>> ME handler
>> is allowed to do but IMO in many cases we don't have to recover from 
>> a ME which
>> can be considered fatal but we do want print diagnostic output. A 
>> check-stop is most
>> undesirable.
>>  
>>> C4: The asynchronous high level handler share a common stack.
>>>     
>> Just so we can leverage the existing support code (and the whole 
>> configuration-table
>> business etc).
>>  
>>> The user of critical interrupts has to be very careful not to enable 
>>> external exceptions since a
>>>
>>> uint32_t level = _ISR_Get_level();
>>> _ISR_Set_level( 0);
>>> _ISR_Set_level( level);
>>>
>>> will not work.
>>>     
>> If we support critical interrupts then the inline routine in irq_supp.h
>> could be modified accordingly. All BSPs using the inline routine would
>> not have to care.
>>   
> I don't know how to handle this proper. For my tests I changed it to:
>
> static inline uint32_t   _CPU_ISR_Get_level( void )
> {
>  register unsigned int msr;
>  _CPU_MSR_GET(msr);
>  if (msr & MSR_EE) {
>    return 0;
>  } else {
>    return 1;
>  }
> }
Not sure that's the right thing. We now have effectively three levels
 0 all exceptions enabled
 1 only critical exceptions enabled
 2 all exceptions disabled
(assuming - as we had agreed earlier -  that machine-check is never 
disabled,
not even in level 2)

Get_level and Set_level and probably MODES_INTERRUPT_MASK etc.
would have to be adapted accordingly.

The ISR wrapper code would only re-enable exceptions at the level
of the currently handled exception.
>
> static inline void _CPU_ISR_Set_level( uint32_t   level )
> {
>  register unsigned int msr;
>  _CPU_MSR_GET(msr);
>  if (!(level & CPU_MODES_INTERRUPT_MASK)) {
>    msr |= MSR_EE | MSR_E300_CE | MSR_ME;
This is BAD because it re-introduces PPC flavor-specific bits into
cpukit. The exact bit combination should be stored in a variable which
is set up by CPU support code (libcpu). The most critical value
(i.e., the mask that is used to totally disable interrupts) could be cached
in a SPRG.
>  }  else {
>    msr &= ~MSR_EE;
>  }
>  _CPU_MSR_SET(msr);
> }
>
> I added MSR_E300_CE and MSR_ME only because _CPU_ISR_Set_level() will 
> be used to set the initial task MSR. Is there another way to set such 
> bits?
>
>>> The ppc_exc_msr_irq_mask is currently used to protect the stack 
>>> switch and ISR nest level.
>>>     
>> But the ultimate purpose was using that same mask for 
>> rtems_interrupt_enable/disable.
>> For efficiency reasons I'd suggest to cache the mask in a SPRG.
>>  
>>> Machine checks are not included in the disable mask (MSR_ME) that is 
>>> an inconsistency the current implementation for C4.     
>> Yes.
>>  
>>> After a review of the code I think that if C2 is true the disabling 
>>> of all asynchronous exceptions is superfluous. A stack switch would 
>>> be problematic only if we increment the ISR nest level and perform 
>>> the stack switch afterwards, because a higher priority exception may 
>>> assume that a stack switch is not necessary but actually has not 
>>> occured.
>>>     
>> Not sure I understand how you would make the operation of testing 
>> whether a switch is
>> necessary and doing the switch an atomic operation (you also have to 
>> take in consideration
>> the case where CE tries to switch just while EE switches back).
>>
>> I believe I thought about it carefully and came to the conclusion 
>> that disabling all
>> interrupts was necessary (but I could, of course be proven wrong). OTOH,
>> I don't see any harm because the critical section is only 7 
>> instructions long.
>>   
> Yes you are right. If we don't disable the exceptions then a high 
> priority exception may overwrite its own stack if the stack is 
> switched but the lower priority exception was interrupted before it 
> could write the new nest level.
>
> I store now the interrupt stack end in SPR2 and check if the exception 
> stack is in the interrupt stack area. If not then the stack will be 
> switched:
>
>    /* Switch stack if necessary */
>    mfspr    SCRATCH_REGISTER_0, SPRG1
>    cmpw    SCRATCH_REGISTER_0, r1
>    blt    wrap_stack_switch_\_FLVR
>    mfspr    SCRATCH_REGISTER_1, SPRG2
>    cmpw    SCRATCH_REGISTER_1, r1
>    blt    wrap_stack_switch_done_\_FLVR
>
> wrap_stack_switch_\_FLVR:
>
>    mr    r1, SCRATCH_REGISTER_0
>
> wrap_stack_switch_done_\_FLVR:
>
> The switch back is only (FRAME_REGISTER is the r1 of the exception 
> prologue which points to the exception frame):
>
>    mr    r1, FRAME_REGISTER

Here I'm lost - why do you want to change the current algorithm? It 
already takes care of switching
the stack if necessary in a safe way. At least that is what it should do 
-- if you think there is an error
then please try to explain.
>
>
> But if old or new the critical exceptions don't work for me. One bug 
> was in the epilogue. You have to disable all exceptions which may 
> cause a context switch between the restore of the SRRs and the RFI.
Not sure there is a bug. During the epilogue the MSR setting should be 
the same as when the
exception was taken (otherwise there is a bug).
Therefore, during the epilogue of a non-critical exception, EE should 
already be
disabled, during the epilogue of a critical interrupt CE and EE should 
be disabled.

Because there are two sets of SRRs it is OK if a critical exception 
happens during the execution
of the epilogue of a non-critical one.

If you think there is a bug please describe a detailed scenario of a 
race condition.

-- T.
>
> I wrote a test program with several external timers which generate 
> normal interrupts and one critical. Sometimes the system crashes in a 
> way that disconnects the Lauterbach debugger.  We tried hard but have 
> not figured out the error yet. Currently I have no time left so I have 
> to postpone this issue.
>
> For the other ISR functions I use this:
>
> static inline uint32_t score_powerpc_interrupt_disable()
> {
>  uint32_t level;
>  uint32_t mask;
>
>  asm volatile (
>    "mfmsr %0;"
>    "mfspr %1, 272;"
>    "andc %1, %0, %1;"
>    "mtmsr %1"
>    : "=r" (level), "=r" (mask)
>  );
>
>  return level;
> }
>
> static inline void score_powerpc_interrupt_enable( uint32_t level)
> {
>  asm volatile (
>    "mtmsr %0"
>    :
>    : "r" (level)
>  );
> }
>
> static inline uint32_t score_powerpc_interrupt_flash( uint32_t level)
> {
>  uint32_t current_level;
>
>  asm volatile (
>    "mfmsr %0;"
>    "mtmsr %1;"
>    "mtmsr %0"
>    : "=&r" (current_level)
>    : "r" (level)
>  );
>
>  return level;
> }
>
>>> The RTEMS support for critical interrupts has its obstacles but is 
>>> feasible.
>>>     
>> I tried to do prepare for this and I believe the hardest part is 
>> already done.
>> What remains is :
>>   - change rtems_interrupt_enable()/disable so that all interrupts 
>> are masked
>>     (I propose to copy ppc_exc_msr_irq_mask -> SPRG0 and then use 
>> that to
>>     mask and unmask interrupts).
>>   - maybe introduce 2 levels of interrupts ('normal' and 'critical') 
>> and change
>>     the inline (irq_supp.h) so that it enables the current level - 1 
>> instead of enabling
>>     level 0.
>>  
>>> For machine checks I am against to add it due to the check stop 
>>> problematic.
>>>     
>> Agreed.
>>
>> I would like to make these changes soon, i.e., prior to 4.9 being cut
>>
>> -- Till
>>  
>>>> What needs to be done is:
>>>>
>>>>  - eliminate current checks for SPRG0 contents
>>>>  - load SPRG0 with ppc_exc_msr_irq_mask (probably from
>>>>    initialize_exceptions)
>>>>  - convert _CPU_ISR_Set_level, _CPU_ISR_Disable, _CPU_ISR_Enable
>>>>    etc. to using SPRG0 instead of a static value.
>>>>  - convert the bspsupport to using SPRG0 instead of the 
>>>> ppc_exc_msr_irq_mask
>>>>    variable (optional but would be a little bit more efficient and 
>>>> consistent).
>>>>
>>>> -- Till
>>>>
>>>> Sebastian Huber wrote:
>>>>  
>>>>      
>>>>> Hi,
>>>>> I adapt the gen83xx BSP for the MPC8313E currently. I switched to 
>>>>> the new support code (libcpu/powerpc/new-exceptions/bspsupport) 
>>>>> and adjusted it for e300 cores. With this framework it is possible 
>>>>> to use RTEMS routines and the dispatcher from external and 
>>>>> critical exceptions (some sophisticated lock variables are 
>>>>> involved for this).  In order too test it I configured the 
>>>>> interrupt controller (IPIC) in a way so that for some interrupt 
>>>>> sources a critical interrupt exception will be generated. This 
>>>>> works fine except that the current 
>>>>> rtems_interrupt_{enable,disable} functions are not aware of 
>>>>> critical interrupts. This leads to race conditions in the system 
>>>>> core. I guess that this is also true for Book E critical 
>>>>> interrupts. A solution is to add the appropriate bits to the 
>>>>> disable mask (MSR_CE or MSR_E300_CE), but this leads to a 
>>>>> preprecessor or runtime cpu check orgie. Do we really need RTEMS 
>>>>> support in critical or machine check exceptions?
>>>>>
>>>>> Ciao,
>>>>>     Sebastian
>>>>>
>>>>>               
>>>> _______________________________________________
>>>> rtems-users mailing list
>>>> rtems-users at rtems.com
>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>         
>>>     
>>
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>   
>
>