PowerPC exceptions and critical interrupts

Thu Jun 19 16:05:44 UTC 2008

Hi,
sorry for the long silence but the critical exceptions troubled me a lot 
in the last days.

Till Straumann wrote:
> Sebastian Huber wrote:
>   
>> [...]
>> Yes, but this is yet another magic place that the BSP developer has to 
>> be aware of.
>>     
> No. The cpu support takes care of this. If the CPU is one of the 
> 'BookE'-style CPUs then
> the mask is changed to include CE and DE.
>   

My experience covers only two BSPs and for them were CPU ==  BSP ;-)

>> Here are some constrains that I assume:
>> C1: Three asynchronous exception types ME, CE and EE with priorities 
>> ME > CE > EE. (Debug exceptions DE?)
>>     
> Yes.
>   
>> C2: The exception handler code must not be interrupted by a lower 
>> priority exception (mostly due to the thread dispatch disable level).
>>     
> I don't think the thread-dispatch disable level is an issue here.
> IIRC what can get messed up are the SRRs, i.e., it could be that
>
> 1)  EE saves IR and MSR into SRRs
> 2)  CE happens, saves IR and MSR into critical SRRs
> 3)  handler is dispatched; if the handler re-enables EE then
>      the values saved in SRRs at 1) could be overwritten by a
>      second EE.
>   
>> C3: To support RTEMS functions and the dispatcher the critical 
>> sections in the system core must be protected by 
>> rtems_interrupt_{enable, disable}. Thus if we want to use RTEMS within 
>> machine check exceptions we have to add MSR_ME to the disable mask. 
>> This may result in check stops. This is an inacceptable tradeoff.
>>     
> I agree. MSR_ME should always be enabled. This severely limits what a ME 
> handler
> is allowed to do but IMO in many cases we don't have to recover from a 
> ME which
> can be considered fatal but we do want print diagnostic output. A 
> check-stop is most
> undesirable.
>   
>> C4: The asynchronous high level handler share a common stack.
>>     
> Just so we can leverage the existing support code (and the whole 
> configuration-table
> business etc).
>   
>> The user of critical interrupts has to be very careful not to enable 
>> external exceptions since a
>>
>> uint32_t level = _ISR_Get_level();
>> _ISR_Set_level( 0);
>> _ISR_Set_level( level);
>>
>> will not work.
>>     
> If we support critical interrupts then the inline routine in irq_supp.h
> could be modified accordingly. All BSPs using the inline routine would
> not have to care.
>   
I don't know how to handle this proper. For my tests I changed it to:

static inline uint32_t   _CPU_ISR_Get_level( void )
{
  register unsigned int msr;
  _CPU_MSR_GET(msr);
  if (msr & MSR_EE) {
    return 0;
  } else {
    return 1;
  }
}

static inline void _CPU_ISR_Set_level( uint32_t   level )
{
  register unsigned int msr;
  _CPU_MSR_GET(msr);
  if (!(level & CPU_MODES_INTERRUPT_MASK)) {
    msr |= MSR_EE | MSR_E300_CE | MSR_ME;
  }  else {
    msr &= ~MSR_EE;
  }
  _CPU_MSR_SET(msr);
}

I added MSR_E300_CE and MSR_ME only because _CPU_ISR_Set_level() will be 
used to set the initial task MSR. Is there another way to set such bits?

>> The ppc_exc_msr_irq_mask is currently used to protect the stack switch 
>> and ISR nest level.
>>     
> But the ultimate purpose was using that same mask for 
> rtems_interrupt_enable/disable.
> For efficiency reasons I'd suggest to cache the mask in a SPRG.
>   
>> Machine checks are not included in the disable mask (MSR_ME) that is 
>> an inconsistency the current implementation for C4. 
>>     
> Yes.
>   
>> After a review of the code I think that if C2 is true the disabling of 
>> all asynchronous exceptions is superfluous. A stack switch would be 
>> problematic only if we increment the ISR nest level and perform the 
>> stack switch afterwards, because a higher priority exception may 
>> assume that a stack switch is not necessary but actually has not occured.
>>     
> Not sure I understand how you would make the operation of testing 
> whether a switch is
> necessary and doing the switch an atomic operation (you also have to 
> take in consideration
> the case where CE tries to switch just while EE switches back).
>
> I believe I thought about it carefully and came to the conclusion that 
> disabling all
> interrupts was necessary (but I could, of course be proven wrong). OTOH,
> I don't see any harm because the critical section is only 7 instructions 
> long.
>   
Yes you are right. If we don't disable the exceptions then a high 
priority exception may overwrite its own stack if the stack is switched 
but the lower priority exception was interrupted before it could write 
the new nest level.

I store now the interrupt stack end in SPR2 and check if the exception 
stack is in the interrupt stack area. If not then the stack will be 
switched:

    /* Switch stack if necessary */
    mfspr    SCRATCH_REGISTER_0, SPRG1
    cmpw    SCRATCH_REGISTER_0, r1
    blt    wrap_stack_switch_\_FLVR
    mfspr    SCRATCH_REGISTER_1, SPRG2
    cmpw    SCRATCH_REGISTER_1, r1
    blt    wrap_stack_switch_done_\_FLVR

wrap_stack_switch_\_FLVR:

    mr    r1, SCRATCH_REGISTER_0

wrap_stack_switch_done_\_FLVR:

The switch back is only (FRAME_REGISTER is the r1 of the exception 
prologue which points to the exception frame):

    mr    r1, FRAME_REGISTER

But if old or new the critical exceptions don't work for me. One bug was 
in the epilogue. You have to disable all exceptions which may cause a 
context switch between the restore of the SRRs and the RFI.

I wrote a test program with several external timers which generate 
normal interrupts and one critical. Sometimes the system crashes in a 
way that disconnects the Lauterbach debugger.  We tried hard but have 
not figured out the error yet. Currently I have no time left so I have 
to postpone this issue.

For the other ISR functions I use this:

static inline uint32_t score_powerpc_interrupt_disable()
{
  uint32_t level;
  uint32_t mask;

  asm volatile (
    "mfmsr %0;"
    "mfspr %1, 272;"
    "andc %1, %0, %1;"
    "mtmsr %1"
    : "=r" (level), "=r" (mask)
  );

  return level;
}

static inline void score_powerpc_interrupt_enable( uint32_t level)
{
  asm volatile (
    "mtmsr %0"
    :
    : "r" (level)
  );
}

static inline uint32_t score_powerpc_interrupt_flash( uint32_t level)
{
  uint32_t current_level;

  asm volatile (
    "mfmsr %0;"
    "mtmsr %1;"
    "mtmsr %0"
    : "=&r" (current_level)
    : "r" (level)
  );

  return level;
}

>> The RTEMS support for critical interrupts has its obstacles but is 
>> feasible.
>>     
> I tried to do prepare for this and I believe the hardest part is already 
> done.
> What remains is :
>   - change rtems_interrupt_enable()/disable so that all interrupts are 
> masked
>     (I propose to copy ppc_exc_msr_irq_mask -> SPRG0 and then use that to
>     mask and unmask interrupts).
>   - maybe introduce 2 levels of interrupts ('normal' and 'critical') and 
> change
>     the inline (irq_supp.h) so that it enables the current level - 1 
> instead of enabling
>     level 0.
>   
>> For machine checks I am against to add it due to the check stop 
>> problematic.
>>     
> Agreed.
>
> I would like to make these changes soon, i.e., prior to 4.9 being cut
>
> -- Till
>   
>>> What needs to be done is:
>>>
>>>  - eliminate current checks for SPRG0 contents
>>>  - load SPRG0 with ppc_exc_msr_irq_mask (probably from
>>>    initialize_exceptions)
>>>  - convert _CPU_ISR_Set_level, _CPU_ISR_Disable, _CPU_ISR_Enable
>>>    etc. to using SPRG0 instead of a static value.
>>>  - convert the bspsupport to using SPRG0 instead of the 
>>> ppc_exc_msr_irq_mask
>>>    variable (optional but would be a little bit more efficient and 
>>> consistent).
>>>
>>> -- Till
>>>
>>> Sebastian Huber wrote:
>>>  
>>>       
>>>> Hi,
>>>> I adapt the gen83xx BSP for the MPC8313E currently. I switched to 
>>>> the new support code (libcpu/powerpc/new-exceptions/bspsupport) and 
>>>> adjusted it for e300 cores. With this framework it is possible to 
>>>> use RTEMS routines and the dispatcher from external and critical 
>>>> exceptions (some sophisticated lock variables are involved for 
>>>> this).  In order too test it I configured the interrupt controller 
>>>> (IPIC) in a way so that for some interrupt sources a critical 
>>>> interrupt exception will be generated. This works fine except that 
>>>> the current rtems_interrupt_{enable,disable} functions are not aware 
>>>> of critical interrupts. This leads to race conditions in the system 
>>>> core. I guess that this is also true for Book E critical interrupts. 
>>>> A solution is to add the appropriate bits to the disable mask 
>>>> (MSR_CE or MSR_E300_CE), but this leads to a preprecessor or runtime 
>>>> cpu check orgie. Do we really need RTEMS support in critical or 
>>>> machine check exceptions?
>>>>
>>>> Ciao,
>>>>     Sebastian
>>>>
>>>>       
>>>>         
>>> _______________________________________________
>>> rtems-users mailing list
>>> rtems-users at rtems.com
>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>   
>>>       
>>     
>
> _______________________________________________
> rtems-users mailing list
> rtems-users at rtems.com
> http://rtems.rtems.org/mailman/listinfo/rtems-users
>   

-- 
Sebastian Huber, Embedded Brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax     : +49 89 18 90 80 79-9
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.