PowerPC exceptions and critical interrupts
Till Straumann
strauman at slac.stanford.edu
Mon Jul 7 21:05:13 UTC 2008
Sorry for my late answer. I just didn't have time :-(
Sebastian Huber wrote:
> Hi,
> sorry for the long silence but the critical exceptions troubled me a
> lot in the last days.
>
> Till Straumann wrote:
>> Sebastian Huber wrote:
>>
>>> [...]
>>> Yes, but this is yet another magic place that the BSP developer has
>>> to be aware of.
>>>
>> No. The cpu support takes care of this. If the CPU is one of the
>> 'BookE'-style CPUs then
>> the mask is changed to include CE and DE.
>>
>
> My experience covers only two BSPs and for them were CPU == BSP ;-)
?? I don't understand this comment. What I tried to say was that
(provided that your BSP
uses a CPU flavor that is already supported) the correct IRQ mask bits
should be selected
by the CPU support. If you want to support a new flavor then you have to
add to the different
bits and pieces to CPU support.
>
>>> Here are some constrains that I assume:
>>> C1: Three asynchronous exception types ME, CE and EE with priorities
>>> ME > CE > EE. (Debug exceptions DE?)
>>>
>> Yes.
>>
>>> C2: The exception handler code must not be interrupted by a lower
>>> priority exception (mostly due to the thread dispatch disable level).
>>>
>> I don't think the thread-dispatch disable level is an issue here.
>> IIRC what can get messed up are the SRRs, i.e., it could be that
>>
>> 1) EE saves IR and MSR into SRRs
>> 2) CE happens, saves IR and MSR into critical SRRs
>> 3) handler is dispatched; if the handler re-enables EE then
>> the values saved in SRRs at 1) could be overwritten by a
>> second EE.
>>
>>> C3: To support RTEMS functions and the dispatcher the critical
>>> sections in the system core must be protected by
>>> rtems_interrupt_{enable, disable}. Thus if we want to use RTEMS
>>> within machine check exceptions we have to add MSR_ME to the disable
>>> mask. This may result in check stops. This is an inacceptable tradeoff.
>>>
>> I agree. MSR_ME should always be enabled. This severely limits what a
>> ME handler
>> is allowed to do but IMO in many cases we don't have to recover from
>> a ME which
>> can be considered fatal but we do want print diagnostic output. A
>> check-stop is most
>> undesirable.
>>
>>> C4: The asynchronous high level handler share a common stack.
>>>
>> Just so we can leverage the existing support code (and the whole
>> configuration-table
>> business etc).
>>
>>> The user of critical interrupts has to be very careful not to enable
>>> external exceptions since a
>>>
>>> uint32_t level = _ISR_Get_level();
>>> _ISR_Set_level( 0);
>>> _ISR_Set_level( level);
>>>
>>> will not work.
>>>
>> If we support critical interrupts then the inline routine in irq_supp.h
>> could be modified accordingly. All BSPs using the inline routine would
>> not have to care.
>>
> I don't know how to handle this proper. For my tests I changed it to:
>
> static inline uint32_t _CPU_ISR_Get_level( void )
> {
> register unsigned int msr;
> _CPU_MSR_GET(msr);
> if (msr & MSR_EE) {
> return 0;
> } else {
> return 1;
> }
> }
Not sure that's the right thing. We now have effectively three levels
0 all exceptions enabled
1 only critical exceptions enabled
2 all exceptions disabled
(assuming - as we had agreed earlier - that machine-check is never
disabled,
not even in level 2)
Get_level and Set_level and probably MODES_INTERRUPT_MASK etc.
would have to be adapted accordingly.
The ISR wrapper code would only re-enable exceptions at the level
of the currently handled exception.
>
> static inline void _CPU_ISR_Set_level( uint32_t level )
> {
> register unsigned int msr;
> _CPU_MSR_GET(msr);
> if (!(level & CPU_MODES_INTERRUPT_MASK)) {
> msr |= MSR_EE | MSR_E300_CE | MSR_ME;
This is BAD because it re-introduces PPC flavor-specific bits into
cpukit. The exact bit combination should be stored in a variable which
is set up by CPU support code (libcpu). The most critical value
(i.e., the mask that is used to totally disable interrupts) could be cached
in a SPRG.
> } else {
> msr &= ~MSR_EE;
> }
> _CPU_MSR_SET(msr);
> }
>
> I added MSR_E300_CE and MSR_ME only because _CPU_ISR_Set_level() will
> be used to set the initial task MSR. Is there another way to set such
> bits?
>
>>> The ppc_exc_msr_irq_mask is currently used to protect the stack
>>> switch and ISR nest level.
>>>
>> But the ultimate purpose was using that same mask for
>> rtems_interrupt_enable/disable.
>> For efficiency reasons I'd suggest to cache the mask in a SPRG.
>>
>>> Machine checks are not included in the disable mask (MSR_ME) that is
>>> an inconsistency the current implementation for C4.
>> Yes.
>>
>>> After a review of the code I think that if C2 is true the disabling
>>> of all asynchronous exceptions is superfluous. A stack switch would
>>> be problematic only if we increment the ISR nest level and perform
>>> the stack switch afterwards, because a higher priority exception may
>>> assume that a stack switch is not necessary but actually has not
>>> occured.
>>>
>> Not sure I understand how you would make the operation of testing
>> whether a switch is
>> necessary and doing the switch an atomic operation (you also have to
>> take in consideration
>> the case where CE tries to switch just while EE switches back).
>>
>> I believe I thought about it carefully and came to the conclusion
>> that disabling all
>> interrupts was necessary (but I could, of course be proven wrong). OTOH,
>> I don't see any harm because the critical section is only 7
>> instructions long.
>>
> Yes you are right. If we don't disable the exceptions then a high
> priority exception may overwrite its own stack if the stack is
> switched but the lower priority exception was interrupted before it
> could write the new nest level.
>
> I store now the interrupt stack end in SPR2 and check if the exception
> stack is in the interrupt stack area. If not then the stack will be
> switched:
>
> /* Switch stack if necessary */
> mfspr SCRATCH_REGISTER_0, SPRG1
> cmpw SCRATCH_REGISTER_0, r1
> blt wrap_stack_switch_\_FLVR
> mfspr SCRATCH_REGISTER_1, SPRG2
> cmpw SCRATCH_REGISTER_1, r1
> blt wrap_stack_switch_done_\_FLVR
>
> wrap_stack_switch_\_FLVR:
>
> mr r1, SCRATCH_REGISTER_0
>
> wrap_stack_switch_done_\_FLVR:
>
> The switch back is only (FRAME_REGISTER is the r1 of the exception
> prologue which points to the exception frame):
>
> mr r1, FRAME_REGISTER
Here I'm lost - why do you want to change the current algorithm? It
already takes care of switching
the stack if necessary in a safe way. At least that is what it should do
-- if you think there is an error
then please try to explain.
>
>
> But if old or new the critical exceptions don't work for me. One bug
> was in the epilogue. You have to disable all exceptions which may
> cause a context switch between the restore of the SRRs and the RFI.
Not sure there is a bug. During the epilogue the MSR setting should be
the same as when the
exception was taken (otherwise there is a bug).
Therefore, during the epilogue of a non-critical exception, EE should
already be
disabled, during the epilogue of a critical interrupt CE and EE should
be disabled.
Because there are two sets of SRRs it is OK if a critical exception
happens during the execution
of the epilogue of a non-critical one.
If you think there is a bug please describe a detailed scenario of a
race condition.
-- T.
>
> I wrote a test program with several external timers which generate
> normal interrupts and one critical. Sometimes the system crashes in a
> way that disconnects the Lauterbach debugger. We tried hard but have
> not figured out the error yet. Currently I have no time left so I have
> to postpone this issue.
>
> For the other ISR functions I use this:
>
> static inline uint32_t score_powerpc_interrupt_disable()
> {
> uint32_t level;
> uint32_t mask;
>
> asm volatile (
> "mfmsr %0;"
> "mfspr %1, 272;"
> "andc %1, %0, %1;"
> "mtmsr %1"
> : "=r" (level), "=r" (mask)
> );
>
> return level;
> }
>
> static inline void score_powerpc_interrupt_enable( uint32_t level)
> {
> asm volatile (
> "mtmsr %0"
> :
> : "r" (level)
> );
> }
>
> static inline uint32_t score_powerpc_interrupt_flash( uint32_t level)
> {
> uint32_t current_level;
>
> asm volatile (
> "mfmsr %0;"
> "mtmsr %1;"
> "mtmsr %0"
> : "=&r" (current_level)
> : "r" (level)
> );
>
> return level;
> }
>
>>> The RTEMS support for critical interrupts has its obstacles but is
>>> feasible.
>>>
>> I tried to do prepare for this and I believe the hardest part is
>> already done.
>> What remains is :
>> - change rtems_interrupt_enable()/disable so that all interrupts
>> are masked
>> (I propose to copy ppc_exc_msr_irq_mask -> SPRG0 and then use
>> that to
>> mask and unmask interrupts).
>> - maybe introduce 2 levels of interrupts ('normal' and 'critical')
>> and change
>> the inline (irq_supp.h) so that it enables the current level - 1
>> instead of enabling
>> level 0.
>>
>>> For machine checks I am against to add it due to the check stop
>>> problematic.
>>>
>> Agreed.
>>
>> I would like to make these changes soon, i.e., prior to 4.9 being cut
>>
>> -- Till
>>
>>>> What needs to be done is:
>>>>
>>>> - eliminate current checks for SPRG0 contents
>>>> - load SPRG0 with ppc_exc_msr_irq_mask (probably from
>>>> initialize_exceptions)
>>>> - convert _CPU_ISR_Set_level, _CPU_ISR_Disable, _CPU_ISR_Enable
>>>> etc. to using SPRG0 instead of a static value.
>>>> - convert the bspsupport to using SPRG0 instead of the
>>>> ppc_exc_msr_irq_mask
>>>> variable (optional but would be a little bit more efficient and
>>>> consistent).
>>>>
>>>> -- Till
>>>>
>>>> Sebastian Huber wrote:
>>>>
>>>>
>>>>> Hi,
>>>>> I adapt the gen83xx BSP for the MPC8313E currently. I switched to
>>>>> the new support code (libcpu/powerpc/new-exceptions/bspsupport)
>>>>> and adjusted it for e300 cores. With this framework it is possible
>>>>> to use RTEMS routines and the dispatcher from external and
>>>>> critical exceptions (some sophisticated lock variables are
>>>>> involved for this). In order too test it I configured the
>>>>> interrupt controller (IPIC) in a way so that for some interrupt
>>>>> sources a critical interrupt exception will be generated. This
>>>>> works fine except that the current
>>>>> rtems_interrupt_{enable,disable} functions are not aware of
>>>>> critical interrupts. This leads to race conditions in the system
>>>>> core. I guess that this is also true for Book E critical
>>>>> interrupts. A solution is to add the appropriate bits to the
>>>>> disable mask (MSR_CE or MSR_E300_CE), but this leads to a
>>>>> preprecessor or runtime cpu check orgie. Do we really need RTEMS
>>>>> support in critical or machine check exceptions?
>>>>>
>>>>> Ciao,
>>>>> Sebastian
>>>>>
>>>>>
>>>> _______________________________________________
>>>> rtems-users mailing list
>>>> rtems-users at rtems.com
>>>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>>>
>>>
>>
>> _______________________________________________
>> rtems-users mailing list
>> rtems-users at rtems.com
>> http://rtems.rtems.org/mailman/listinfo/rtems-users
>>
>
>
More information about the users
mailing list