SEVERE Bug in mc68360 _ISR_Handler???
Joel Sherrill
joel.sherrill at OARcorp.com
Tue Jul 17 12:16:53 UTC 2001
Quick answer to long detailed analysis. The short answer
is that on m68k architectures where there are separate
stacks, we check the ISF (F/VO in m68k parlance) as a hardware
means to determine whether we are nested or not.
#if ( M68K_HAS_SEPARATE_STACKS == 1 )
movew #0xf000,d0 | isolate format nibble
andw a7@(SAVED+FVO_OFFSET),d0 | get F/VO
cmpiw #0x1000,d0 | is it a throwaway isf?
bne exit | NOT outer level, so branch
#endif
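
For readers who do not speak m68k assembly, the same test can be
sketched in C. This is only an illustration, not RTEMS code: the macro
and function names are made up for this example, and it assumes the
F/VO word carries the 4-bit frame format in bits 15..12, which is what
the 0xf000 mask above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values come from the assembly above. */
#define FVO_FORMAT_MASK   0xF000u  /* isolate format nibble */
#define FVO_THROWAWAY_ISF 0x1000u  /* format 1: throwaway frame */

/* True when the saved F/VO word marks a throwaway interrupt stack
 * frame, which the assembly treats as "outer level, not nested". */
static bool is_throwaway_isf(uint16_t fvo)
{
    return (fvo & FVO_FORMAT_MASK) == FVO_THROWAWAY_ISF;
}

/* Small self-check using two made-up F/VO words. */
static void fvo_examples(void)
{
    assert(is_throwaway_isf(0x1064));   /* format 1: outer level */
    assert(!is_throwaway_isf(0x0064));  /* other format: nested   */
}
```

Only the top nibble matters here; the low 12 bits are ignored by the
comparison, exactly as in the andw/cmpiw pair.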
Is there any indication in HARDWARE that this is a nested interrupt?
If not, are you assured that the first instruction of the outer
ISR is or is not executed? The m68k _ISR_Handler code increments
_Thread_Dispatch_disable_level as the first instruction on the CPU32.
SYM (_ISR_Handler):
addql #1,SYM (_Thread_Dispatch_disable_level) | disable multitasking
If the architecture guarantees the 1st instruction of an ISR is
executed, then this would be sufficient to prevent this scenario.
I am not trying to argue you out of what is happening, only that
we need to be 100% sure that the 360 does not guarantee the execution
of the 1st instruction of an ISR in the case of nested interrupts of
a particular priority sequence. This is basically arguing over
precisely where the transition between (2) and (3) below occurs.
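
To make the two possibilities concrete, here is a toy C model of the
bookkeeping in steps A, F, K and L from Thomas's description quoted
below. It is a sketch under assumptions, not the RTEMS implementation:
the struct, the function names and the single in-service flag are all
invented for the example, and stack switching and interrupt masking are
left out. The only thing it varies is whether the Level 6 PIT interrupt
is taken before or after the SCC1 pass has executed its first
instruction (step A).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; field names only mirror the mail. */
typedef struct {
    int  dispatch_disable_level;   /* _Thread_Dispatch_disable_level  */
    bool context_switch_needed;    /* a handler woke up a task        */
    bool scc1_in_service;          /* CPM bit, set at vector fetch    */
    bool switched_with_scc1_stuck; /* dispatch ran with the bit set   */
} isr_model;

/* Step L: dispatch only if no other handler instance is counted. */
static void maybe_dispatch(isr_model *m)
{
    if (m->dispatch_disable_level == 0 && m->context_switch_needed) {
        m->context_switch_needed = false;
        if (m->scc1_in_service)
            m->switched_with_scc1_stuck = true;
    }
}

/* Level 6 PIT pass through the handler (steps A, F, K, L only). */
static void pit_interrupt(isr_model *m)
{
    m->dispatch_disable_level++;      /* A */
    m->context_switch_needed = true;  /* F: PIT handler wakes a task */
    m->dispatch_disable_level--;      /* K */
    maybe_dispatch(m);                /* L */
}

/* Level 4 SCC1 pass, with the PIT arriving before or after step A. */
static isr_model simulate(bool pit_before_step_a)
{
    isr_model m = { 0, false, false, false };

    m.scc1_in_service = true;         /* vector fetch sets the bit */
    if (pit_before_step_a)
        pit_interrupt(&m);            /* nested before step A      */
    m.dispatch_disable_level++;       /* A */
    if (!pit_before_step_a)
        pit_interrupt(&m);            /* nested after step A       */
    /* In the bad case the lines below only run once the original
     * context regains the CPU, which may be much later. */
    m.scc1_in_service = false;        /* F: SCC1 handler clears it */
    m.dispatch_disable_level--;       /* K */
    maybe_dispatch(&m);               /* L */

    assert(m.dispatch_disable_level == 0);
    return m;
}
```

In this model simulate(true) reports a dispatch while the SCC1 bit is
still set, and simulate(false) defers the dispatch until the SCC1 pass
has finished. Whether the real CPU32 can take the nested interrupt
before step A is exactly the open question.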
--joel
Thomas Doerfler wrote:
>
> Hello,
>
> I am writing to this list to get some help concerning the behaviour
> of the _ISR_Handler used for the MC68360 in rtems-4.5.0. I think
> there is a very small chance that lower-level interrupts get lost (or
> delayed forever) when a higher-level interrupt arrives at a critical
> point.
>
> This mail is going to be a bit long, but the issue is rather
> complicated as well.
>
> SYSTEM BACKGROUND
> ===================
> I have designed a system based on the MC68360 (and the gen68360 BSP)
> that makes heavy use of Ethernet and TCP/IP. Ethernet runs on the
> built-in SCC1, and all CPM interrupt sources are handled on IRQ
> Level 4. I use the PIT as system clock timer, working on IRQ Level 6
> (so it is higher than the CPM IRQ level).
>
> All in all the system works fine, but on very rare occasions the
> system's communication interfaces got stuck. Last week I succeeded in
> finding out why. I built a test environment and sent UDP packets to
> the system using almost all of the Ethernet bandwidth, adding a flood
> ping to the network load. In that environment it took between 1 and 4
> hours until the system got stuck, and then I found that the
> "In-Service-Bit" of SCC1 in the CPM Interrupt Controller was set
> although the core did not execute the corresponding interrupt
> function.
>
> This bit gets set whenever the CPM Interrupt Controller sends the
> SCC1 vector number to the CPU and must be cleared in software. As
> long as this bit is set, no other CPM interrupts will be issued.
> NOTE: Even the SCC1 interrupt request will no longer be asserted
> until this bit gets cleared.
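
As a rough illustration of the behaviour described in the paragraph
above, here is a toy model in C. It is a sketch only: the names and bit
positions are invented for the example and are not the real CPM
register layout, and the blocking rule is simply the one stated in the
mail (any set in-service bit holds off further CPM interrupts).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical source bits, chosen only for this model. */
#define SRC_SCC1  (1u << 0)
#define SRC_OTHER (1u << 1)

typedef struct {
    uint32_t in_service;  /* one bit per CPM source being serviced */
} cpm_model;

/* The controller supplies the vector and marks the source in service. */
static void cpm_vector_fetch(cpm_model *c, uint32_t src)
{
    assert(src != 0);
    c->in_service |= src;
}

/* The handler must clear the bit in software when it is done. */
static void cpm_handler_done(cpm_model *c, uint32_t src)
{
    c->in_service &= ~src;
}

/* Per the description above: nothing is issued while a bit is set. */
static bool cpm_can_interrupt(const cpm_model *c)
{
    return c->in_service == 0;
}
```

If cpm_handler_done() is never reached for SCC1, cpm_can_interrupt()
stays false forever, which matches the stuck state observed.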
>
> The code of the SCC1 interrupt handler
>
> "m360Enet_interrupt_handler (rtems_vector_number v)"
>
> is correct: whenever this handler gets called, the in-service bit is
> definitely cleared. So my assumption is that:
>
> 1) a SCC1 interrupt gets asserted,
>
> 2) then the CPU performs the corresponding vector fetch
>
> 3) but in rare conditions the corresponding handler will not get
> called
>
> By the way: I lowered the PIT IRQ request level to 3, then the system
> worked fine....
>
> STRUCTURE OF _ISR_HANDLER
> =========================
> For the MC68360 target, the following Preprocessor options are
> defined:
>
> M68K_COLDFIRE_ARCH=0
> CPU_HAS_SOFTWARE_INTERRUPT_STACK=1
> M68K_HAS_PREINDEXING=1
> M68K_HAS_SEPARATE_STACKS=0
> M68K_HAS_VBR=1
>
> The function "_ISR_Handler" in exec/score/cpu/m68k/cpu_asm.S performs
> the following basic steps:
>
> A) Increment _Thread_Dispatch_disable_level
>
> B) disable all interrupts
>
> C) If _ISR_Nest_level==0: switch from task stack to interrupt stack
>
> D) Increment _ISR_Nest_level
>
> E) reenable higher interrupts
>
> F) call user interrupt handler
>
> G) disable all interrupts
>
> H) Decrement _ISR_Nest_level
>
> I) If _ISR_Nest_level==0: switch back from int stack to task stack
>
> J) reenable higher interrupts
>
> K) Decrement _Thread_Dispatch_disable_level
>
> L) If _Thread_Dispatch_disable_level==0 and Context switch needed:
> switch to new context (using _Thread_Dispatch)
>
> M) return to interrupted code
>
> ASSUMED BUG SEQUENCE
> ====================
>
> I assume that the following sequence of events may lose the Level 4
> SCC1 interrupt:
>
> 1) An SCC1 IRQ4 occurs, the CPU performs a vector fetch, the CPM
> Interrupt Controller supplies the corresponding vector and sets the
> SCC1 In-Service bit.
>
> 2) The CPU enters _ISR_Handler for Level 4/SCC1 Interrupt.
>
> 3) Before any real code gets executed, the PIT times out, issuing a
> Level 6 Interrupt, so the CPU stores its basic context on the current
> (task) stack and reenters _ISR_Handler for Level 6/PIT. Please note
> that _ISR_Nest_level and _Thread_Dispatch_disable_level have not yet
> been incremented for the SCC1 Interrupt.
>
> 4) The PIT Interrupt Handler executes and requests a context switch
> (wakes up some task or so).
>
> 5) The general _ISR_Handler for Level 6/PIT then finds that it was
> the only instance of _ISR_Handler running (because
> _Thread_Dispatch_disable_level was 0) and therefore performs a
> context switch according to step L). This causes the newly woken
> task to be executed, not the SCC1 interrupt handler.
>
> So what do we have now:
>
> - the SCC1 driver's interrupt handler has not yet been executed
>
> - the physical SCC1 interrupt request signal is not applied to the
> CPU, because it is locked out due to the still-set "SCC1 In-Service"
> bit
>
> - Any further CPM interrupts are blocked
>
> - the CPU executes the woken task, not knowing that it should resume
> executing the SCC1 interrupt function
>
> The SCC1 interrupt function might resume when RTEMS switches back to
> the suspended task, but this does not seem to happen.
>
> NOTE:
> =====
> At the head of the _ISR_Handler code, a comment states:
>
> /*
> * With this approach, lower priority interrupts may
> * execute twice if a higher priority interrupt is
> * acknowledged before _Thread_Dispatch_disable is
> * incremented and the higher priority interrupt
> * performs a context switch after executing. The lower
> * priority interrupt will execute (1) at the end of the
> * higher priority interrupt in the new context if
> * permitted by the new interrupt level mask, and (2) when
> * the original context regains the cpu.
> */
>
> The statement itself was very surprising to me. And from my point of
> view, case (1) is not true for hardware that negates the interrupt
> request as soon as the CPU has performed the vector fetch (which is
> absolutely legal according to the M68K architecture).
>
> It may take a LONG time until case (2) occurs. In my situation I
> assume that this doesn't occur at all :-((
>
> SOME QUESTIONS
> ==============
> 1) I don't understand why the suspended context does not get
> executed again.
>
> 2) I don't have a better solution for _ISR_Handler. Any ideas?
>
> 3) I can't believe that I would be the first one to find this problem.
>
> 4) I don't know whether I am on the right track at all...
>
> So here we are. I hope I could make my ideas clear in this mail. Any
> hints welcome....
>
> Bye
> Thomas.
>
> --------------------------------------------
> IMD Ingenieurbuero fuer Microcomputertechnik
> Thomas Doerfler Herbststrasse 8
> D-82178 Puchheim Germany
> email: Thomas.Doerfler at imd-systems.de
> PGP public key available at: http://www.imd-systems.de/pgp_key.htm
--
Joel Sherrill, Ph.D. Director of Research & Development
joel at OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985