SEVERE Bug in mc68360 _ISR_Handler???

Tue Jul 17 11:47:47 UTC 2001

Hello,

I am writing to this list for help with the behaviour of the 
_ISR_Handler used for the MC68360 in rtems-4.5.0. I think there is a 
very small chance that lower-level interrupts get lost (or delayed 
forever) when a higher-level interrupt arrives at a critical point. 

This mail is going to be a bit long, but the issue is rather 
complicated as well.

SYSTEM BACKGROUND
===================
I have designed a system based on the MC68360 (and the gen68360 BSP) 
which makes heavy use of Ethernet and TCP/IP. Ethernet runs on the 
built-in SCC1; all CPM interrupt sources are handled on IRQ level 4. 
I use the PIT as the system clock timer, on IRQ level 6 (so it is 
higher than the CPM IRQ level).

All in all the system works fine, but on very rare occasions the 
system's communication interfaces got stuck. Last week I succeeded in 
finding out why. I built a test environment and sent UDP packets to 
the system at almost the full Ethernet bandwidth, adding a flood ping 
to the network load. In that environment it took between 1 and 4 
hours until the system got stuck, and then I found that the 
"In-Service-Bit" of SCC1 in the CPM Interrupt Controller was set 
although the core was not executing the corresponding interrupt 
function. 

This bit gets set whenever the CPM Interrupt Controller sends the 
SCC1 vector number to the CPU and must be cleared in software. As 
long as this bit is set, no other CPM interrupts will be issued. 
NOTE: Even the SCC1 interrupt request will no longer be asserted 
until this bit gets cleared.
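
The in-service behaviour described above can be sketched as a small C 
model. This is only an illustration, not the real register layout: 
the type cpm_model, its helper functions and the SRC_SCC1 bit 
position are assumptions made for this example, and the model blocks 
all CPM requests while any in-service bit is set, as described in 
this mail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of the CPM interrupt controller. */
typedef struct {
    uint32_t pending;     /* sources currently requesting service */
    uint32_t in_service;  /* vector fetched, bit not yet cleared by software */
} cpm_model;

#define SRC_SCC1 (1u << 30)   /* bit position is illustrative only */

/* Does the controller assert its request line to the CPU? */
static bool cpm_requests_cpu(const cpm_model *c)
{
    return c->pending != 0 && c->in_service == 0;
}

/* CPU vector fetch: the source moves from pending to in-service. */
static void cpm_vector_fetch(cpm_model *c, uint32_t src)
{
    c->pending &= ~src;
    c->in_service |= src;
}

/* What the driver's interrupt handler has to do in software. */
static void cpm_clear_in_service(cpm_model *c, uint32_t src)
{
    c->in_service &= ~src;
}
```

In this model, a new SCC1 event that arrives after the vector fetch 
does not reach the CPU until cpm_clear_in_service() has been called.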

The code of the SCC1 interrupt handler 

"m360Enet_interrupt_handler (rtems_vector_number v)"

is correct: whenever this handler gets called, the in-service bit is 
definitely cleared. So my assumption is that:

1) an SCC1 interrupt gets asserted,

2) then the CPU performs the corresponding vector fetch,

3) but in rare conditions the corresponding handler does not get 
called.

By the way: when I lowered the PIT interrupt request level to 3, the 
system worked fine.

STRUCTURE OF _ISR_HANDLER
=========================
For the MC68360 target, the following Preprocessor options are 
defined:

M68K_COLDFIRE_ARCH=0
CPU_HAS_SOFTWARE_INTERRUPT_STACK=1
M68K_HAS_PREINDEXING=1
M68K_HAS_SEPARATE_STACKS=0
M68K_HAS_VBR=1

The function "_ISR_Handler" in exec/score/cpu/m68k/cpu_asm.S performs 
the following basic steps:

A) Increment _Thread_Dispatch_disable_level

B) disable all interrupts

C) If _ISR_Nest_level==0: switch from task stack to interrupt stack

D) Increment _ISR_Nest_level

E) reenable higher interrupts 

F) call user interrupt handler

G) disable all interrupts

H) Decrement _ISR_Nest_level

I) If _ISR_Nest_level==0: switch back from int stack to task stack

J) reenable higher interrupts 

K) Decrement _Thread_Dispatch_disable_level

L) If _Thread_Dispatch_disable_level==0 and Context switch needed: 
switch to new context (using _Thread_Dispatch)

M) return to interrupted code
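
Steps A) to M) can be sketched in C as follows. This is a simplified 
model and not the actual assembly in cpu_asm.S: interrupt masking and 
stack switching appear only as comments, and the variable and 
function names (dispatch_disable_level, isr_handler_model, and so on) 
are stand-ins chosen for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RTEMS globals named in the steps. */
static int  dispatch_disable_level;  /* _Thread_Dispatch_disable_level */
static int  isr_nest_level;          /* _ISR_Nest_level */
static bool context_switch_needed;   /* set when a handler wakes a task */
static int  dispatch_count;          /* number of times step L switched */

static void isr_handler_model(void (*user_handler)(void))
{
    dispatch_disable_level++;        /* A */
    /* B: disable all interrupts */
    /* C: if isr_nest_level == 0, switch to the interrupt stack */
    isr_nest_level++;                /* D */
    /* E: reenable higher interrupts */
    user_handler();                  /* F */
    /* G: disable all interrupts */
    isr_nest_level--;                /* H */
    /* I: if isr_nest_level == 0, switch back to the task stack */
    /* J: reenable higher interrupts */
    dispatch_disable_level--;        /* K */
    if (dispatch_disable_level == 0 && context_switch_needed) {
        context_switch_needed = false;
        dispatch_count++;            /* L: stands in for _Thread_Dispatch */
    }
    /* M: return to interrupted code */
}

/* A handler that wakes a task, and one that gets interrupted by it. */
static void waking_handler(void) { context_switch_needed = true; }
static void nested_handler(void) { isr_handler_model(waking_handler); }
```

Calling isr_handler_model(nested_handler) shows the intended design: 
the inner instance sees a non-zero disable level and leaves the 
context switch to the outermost instance.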

ASSUMED BUG SEQUENCE
====================

I assume that the following sequence of events may lose the level 4 
SCC1 interrupt:

1) An SCC1 IRQ4 occurs, the CPU performs a vector fetch, the CPM 
Interrupt Controller supplies the corresponding vector and sets the 
SCC1 In-Service bit.

2) The CPU enters _ISR_Handler for Level 4/SCC1 Interrupt. 

3) Before any real code gets executed, the PIT times out, issuing a 
Level 6 Interrupt, so the CPU stores its basic context on the current 
(task) stack and reenters _ISR_Handler for Level 6/PIT. Please note 
that _ISR_Nest_level and _Thread_Dispatch_disable_level have not yet 
been incremented for the SCC1 Interrupt.

4) The PIT Interrupt Handler executes and requests a context switch 
(for example, because it wakes up some task). 

5) The general _ISR_Handler for Level 6/PIT then concludes that it 
is the only instance of _ISR_Handler running (because 
_Thread_Dispatch_disable_level was 0 on entry), and therefore it 
performs a context switch according to step L). This causes the 
"woken" task to be executed, not the SCC1 interrupt handler.

So what do we have now: 

- the SCC1 driver's interrupt handler has not yet been executed

- the physical SCC1 interrupt request signal is not applied to the 
CPU, because it is locked out due to the still-set "SCC1 In-Service" 
bit

- Any further CPM interrupts are blocked

- the CPU executes the woken task, not knowing that it should resume 
executing the SCC1 interrupt function

The SCC1 interrupt function might resume when RTEMS switches back to 
the suspended task, but this does not seem to happen.
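
The suspected sequence can be replayed in a small C model. This is a 
sketch under the assumptions made in this mail, not a trace of the 
real code: race_state, pit_isr, scc1_isr and the flag 
pit_before_step_a are names invented for the example, and each 
handler is reduced to steps A, F, K and L.

```c
#include <stdbool.h>

typedef struct {
    int  dispatch_disable_level;
    bool switch_pending;     /* a handler has woken a task */
    bool scc1_in_service;    /* set by the vector fetch */
    bool scc1_handler_ran;
    bool switched_away;      /* step L dispatched another task */
} race_state;

/* Step L: dispatch only from the outermost handler instance. */
static void maybe_dispatch(race_state *s)
{
    if (s->dispatch_disable_level == 0 && s->switch_pending) {
        s->switch_pending = false;
        s->switched_away = true;
    }
}

/* Level 6/PIT instance of the handler. */
static void pit_isr(race_state *s)
{
    s->dispatch_disable_level++;     /* A */
    s->switch_pending = true;        /* F: clock tick wakes a task */
    s->dispatch_disable_level--;     /* K */
    maybe_dispatch(s);               /* L */
}

/* Level 4/SCC1 instance; the flag selects where the PIT interrupt lands. */
static race_state scc1_isr(bool pit_before_step_a)
{
    race_state s = { .scc1_in_service = true };  /* vector fetch done */

    if (pit_before_step_a) {
        pit_isr(&s);                 /* disable level is still 0 here */
        if (s.switched_away)
            return s;                /* SCC1 frame stays on the old task's stack */
    }
    s.dispatch_disable_level++;      /* A */
    if (!pit_before_step_a)
        pit_isr(&s);                 /* nested after A: dispatch is deferred */
    s.scc1_handler_ran = true;       /* F */
    s.scc1_in_service = false;       /* driver clears the in-service bit */
    s.dispatch_disable_level--;      /* K */
    maybe_dispatch(&s);              /* L */
    return s;
}
```

With pit_before_step_a set, the model ends with the in-service bit 
still set and the SCC1 handler not executed; with the flag cleared, 
the handler runs and clears the bit before the context switch.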

NOTE: 
=====
At the head of the _ISR_Handler code, a comment states:

/*
 *  With this approach, lower priority interrupts may
 *  execute twice if a higher priority interrupt is
 *  acknowledged before _Thread_Dispatch_disable is
 *  incremented and the higher priority interrupt
 *  performs a context switch after executing. The lower
 *  priority interrupt will execute (1) at the end of the
 *  higher priority interrupt in the new context if
 *  permitted by the new interrupt level mask, and (2) when
 *  the original context regains the cpu.
 */

The statement itself was very surprising to me. From my point of 
view, case (1) does not hold for hardware that negates the interrupt 
request as soon as the CPU has performed the vector fetch (which is 
absolutely legal according to the M68K architecture).

It may take a LONG time until case (2) occurs. In my situation I 
assume that it doesn't occur at all :-((

SOME QUESTIONS
==============
1) I don't understand why the suspended context does not get 
executed again. 

2) I don't have a better solution for _ISR_Handler. Any ideas?

3) I can't believe that I am the first one to find this problem.

4) I don't know whether I am on the right track at all...

So here we are. I hope I could make my ideas clear in this mail. Any 
hints welcome....

Bye
	Thomas.

--------------------------------------------
IMD Ingenieurbuero fuer Microcomputertechnik
Thomas Doerfler           Herbststrasse 8
D-82178 Puchheim          Germany
email:    Thomas.Doerfler at imd-systems.de
PGP public key available at: http://www.imd-systems.de/pgp_key.htm