MVME2304 Exception 3

Mon Aug 18 18:20:35 UTC 2003

Paul D Jines writes:
 > 
 > > >
 > > Exception 3 is an MMU error.  At least the motorola_shared PowerPC
 > > BSPs use the MMU to create a flat memory model where accesses
 > > outside the range will throw this exception: null pointers, corrupt
 > > pointers, that kind of thing.
 > 
 > > This suggests some kind of BSP error.  Could the OP please post the
 > > register dump?  It's often possible to find what was running when
 > > the exception was thrown, and it is also helpful to know if it
 > > always happens in the same place.

Cool.  I left your trace quoted and have inserted comments at the
appropriate spots.

 > Here is the output from the GeSys run:
 > 
 > PPC1-Bug>nbo
 > Network Booting from: DEC21140, Controller 0, Device 0
 > Device Name: /pci at 80000000/pci1011,9 at e,0:0,0
 > Loading: epics/vme01/cexp/rtems-4.6.0pre4-SSRL_20030731-mvme2307.exe
 > 
 > Client IP Address      = 192.168.100.40
 > Server IP Address      = 192.168.100.20
 > Gateway IP Address     = 192.168.100.1
 > Subnet IP Address Mask = 255.255.255.0
 > Boot File Name         = epics/vme01/cexp/rtems-4.6.0
 > pre4-SSRL_20030731-mvme2307
 > ..exe
 > Argument File Name     =
 > 
 > Network Boot File load in progress... To abort hit <BREAK>
 > 
 > Bytes Received =&906060, Bytes Loaded =&906060
 > Bytes/Second   =&302020, Elapsed Time =3 Second(s)
 > 
 > Residual-Data Located at: $07F88000
 > 
 > Model: 000000000000000000000000000(e2)
 > Serial: MOT05CA697
 > Processor/Bus frequencies (Hz): 333362624/66671288
 > Time Base Divisor: 4000
 > Memory Size: 8000000
 > Original MSR: 3040
 > Original HID0: 82
 > Original R31: 0
 > 
 > PCI: Probing PCI hardware
 > 
 > RTEMS 4.x/PPC load:
 > Uncompressing the kernel...
 > Kernel at 0x00000000, size=0x1d6690
 > Initrd at 0x00000000, size=0x0
 > Residual data at 0x001d7000
 > Command line at 0x001de000
 > done
 > Now booting...
 > -----------------------------------------
 > Welcome to rtems-4.6.0pre4(PowerPC/PowerPC 604/mvme2307) on MVME 2300
 > -----------------------------------------
 > OpenPIC found at C1000000.
 > pci : Interrupt routing not available for this bsp
 > OpenPIC Version ? (2 CPUs and 17 IRQ sources) at 0xC1000000
 > OpenPIC Vendor 0 (Motorola), Device 0 (Raven), Stepping 2
 > OpenPIC timer frequency is 8333647 Hz
 > Universe II PCI-VME bridge detected at 0xC1040000, IRQ 11
 > Universe Master Ports:
 > Port  VME-Addr   Size       PCI-Adrs   Mode:
 > 0:    0x20000000 0x0F000000 0x10000000 A32, Dat, Sup
 > 1:    0x00000000 0x00FF0000 0x1F000000 A24, Dat, Sup
 > 2:    0x00000000 0x00010000 0x1FFF0000 A16, Dat, Sup
 > Universe Slave Ports:
 > Port  VME-Addr   Size       PCI-Adrs   Mode:
 > 0:    0xC0000000 0x07F80000 0x80000000 A32, Pgm, Dat, Sup, Usr
 > Overriding main IRQ line PCI info with 5

OK, the fault looks like it's happening in the BSP.  The address of
interest here is the "Next PC".  It indicates the actual instruction
executing when the exception was raised.  If you run
"powerpc-rtems-objdump -dt <filename>", you can scroll down to that
exact address and identify the function where it occurred.  The stack
trace below the register dump will let you trace the function calls
that got you there, which is handy too.

WRT <filename>, you can't use the boot image because that has the
runtime executable compressed in it, and this error is in the runtime
executable.  When you next compile and link, observe the final steps
in the process where a file "rtems" is generated and then compressed.
The previous step is where the runtime image is linked, and it's the
output file from that step that you'll need to get.

I tend to do my own linking, so it's easy to save off a copy of the
executable, but I don't know what your makefile looks like.  Any
further diagnosis of what's going on is going to require this file,
so if I may suggest something, it might be helpful to work out how to
get it first (since by default it gets compressed, then deleted).
Perhaps it would be straightforward to add something like the above
objdump invocation, with output redirected to a text file, right
after the linker runs.
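
Once the disassembly is saved to a text file, the "scroll down to the
address" step can be scripted.  What follows is only a rough sketch,
assuming the usual objdump -d layout where each function starts with a
label line like "00130e40 <name>:".  The function names and the sample
listing are made up for illustration; only the two addresses (Next PC
and LR) come from the first dump.  The nearest preceding label is
normally the enclosing function, but it could also be a local label,
so treat the result as a hint.

```python
import bisect
import re

# Matches objdump label lines such as "00130e40 <some_function>:".
LABEL_RE = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")


def load_labels(disassembly_text):
    """Return a sorted list of (address, name) pairs found in objdump -d output."""
    labels = []
    for line in disassembly_text.splitlines():
        match = LABEL_RE.match(line)
        if match:
            labels.append((int(match.group(1), 16), match.group(2)))
    labels.sort()
    return labels


def find_function(labels, address):
    """Return (name, offset) for the last label at or below address, else None."""
    addresses = [addr for addr, _ in labels]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    start, name = labels[index]
    return name, address - start


# Hypothetical listing: the label names are invented, not taken from the
# real image.  The instruction lines are plain nops used as filler.
SAMPLE = """\
000db900 <hypothetical_caller>:
   db900:\t60 00 00 00 \tnop
00130e40 <hypothetical_copy_loop>:
  130e40:\t60 00 00 00 \tnop
"""

if __name__ == "__main__":
    labels = load_labels(SAMPLE)
    # 0x130E84 is the "Next PC" and 0xDB968 the LR from the first dump.
    for addr in (0x130E84, 0xDB968):
        hit = find_function(labels, addr)
        if hit is None:
            print("0x%08X: no label at or below this address" % addr)
        else:
            print("0x%08X: %s+0x%X" % (addr, hit[0], hit[1]))
```

The same lookup can be run over every address in the stack trace line
to get the call chain by name.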

 > exception handler called for exception 3
 >          Next PC or Address of fault = 130E84
 >          Saved MSR = 3032
 >          R0 = 0
 >          R1 = 1D75E0
 >          R2 = 0
 >          R3 = 1DB690
 >          R4 = 0
 >          R5 = 7316000
 >          R6 = A8180
 >          R7 = 1D0000
 >          R8 = 10000
 >          R9 = 78C297
 >          R10 = 7F7F208
 >          R11 = 788000
 >          R12 = 1FE
 >          R13 = 1C1198
 >          R14 = 0
 >          R15 = 0
 >          R16 = 0
 >          R17 = 0
 >          R18 = 0
 >          R19 = 0
 >          R20 = 0
 >          R21 = 0
 >          R22 = 3032
 >          R23 = 1CE2C0
 >          R24 = 1D4F18
 >          R25 = 0
 >          R26 = 1C0000
 >          R27 = 0
 >          R28 = 1D0000
 >          R29 = 1D5138
 >          R30 = 78C2970
 >          R31 = 1DB690
 >          CR = 44800042
 >          CTR = 731600
 >          XER = 20000000
 >          LR = DB968
 >          DAR = 788000
 > Stack Trace:
 >   IP: 0x00130E84, LR: 0x000DB968
 > --^ 0x0011323C--^ 0x000A8180--^ 0x000A8688--^ 0x000C1658--^ 0x000A8124
 > --^ 0x0000321C
 > unrecoverable exception!!! Push reset button

 > The above is from a previous run.  We just powered up
 > the crate to get a fresh dump.  The first try we had
 > incorrect DHCP settings.  The second run produced the
 > exception below (no guarantees it isn't because of
 > our st.sys file.... ).  The third produced a crash
 > similar to the first, but a few registers were different.
 > These were:
 > 
 >           R5 = 4FED000
 >           R11 = 2AB1000
 >           CTR = 4FED00
 >           DAR = 2AB1000
 > 
 > Everything else was the same.

This suggests to me that an interrupt-related timing fault is
happening.  It'll be hard to trace further without knowing where the
code is executing when the fault occurs.
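
For comparing dumps between runs, something along these lines might
save some eyeballing.  It is a minimal sketch that assumes the dump
keeps the "NAME = HEX" layout shown above; lines such as "Saved MSR"
or "Next PC ..." do not match the pattern and are simply skipped.  The
sample values are copied from the first and third runs, cut down to a
handful of registers.

```python
import re

# Matches lines like " >          R5 = 7316000", with or without the
# leading mail quote markers.  Names are upper-case letters plus digits.
REG_RE = re.compile(r"^\s*(?:>\s*)*([A-Z]+[0-9]*)\s*=\s*([0-9A-Fa-f]+)\s*$")


def parse_registers(dump_text):
    """Return {register_name: value} for every 'NAME = HEX' line in the dump."""
    regs = {}
    for line in dump_text.splitlines():
        match = REG_RE.match(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_registers(first, second):
    """Return {name: (first_value, second_value)} for registers that differ."""
    return {
        name: (first[name], second[name])
        for name in first
        if name in second and first[name] != second[name]
    }


# Subset of the first run's dump.
RUN1 = """
         R5 = 7316000
         R11 = 788000
         R13 = 1C1198
         CTR = 731600
         LR = DB968
         DAR = 788000
"""

# Same subset for the third run, where only four registers changed.
RUN3 = """
         R5 = 4FED000
         R11 = 2AB1000
         R13 = 1C1198
         CTR = 4FED00
         LR = DB968
         DAR = 2AB1000
"""

if __name__ == "__main__":
    changed = diff_registers(parse_registers(RUN1), parse_registers(RUN3))
    for name, (old, new) in changed.items():
        print("%-4s %8X -> %8X" % (name, old, new))
```

One thing that stands out when the values sit side by side: in both
runs DAR is equal to R11.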

 > The second run produced this:
 > 
 > (first part the same)
 > 
 > Overriding main IRQ line PCI info with 5
 > dec21140 : found device 'dc1', bus 0x00, dev 0x0E, func 0x00
 > dec2114x : driver attached
 > dec2114x : driver tasks created
 > bodoetcp2c1_1i4nxi t::  0u0s:i0n1g: AnFe:t0wAo:r5k5 :i2n3t e r fnaacmee
 > ''ddcc1
 > ,''
 >   io 11000, mem C1041000, int 10

The above is weird but normal.  It comes from the dec21140 driver
printing messages while your application task is also printing them,
leading to the output being interleaved until either one stops.  If
it's at all possible, put a sleep(1) or something in right after the
network is initialized, before the bootp starts up.  That way the dec
driver can finish telling us things before the bootp proceeds.

 > Bootpc testing starting
 > bootpc hw address is 0:1:af:a:55:23
 > My ip address is 192.168.100.40
 > Domain Name Server is 130.39.3.5
 > Domain Name Server is 130.39.244.30
 > Domain Name Server is 130.39.254.5
 > Hostname is vme01
 > Ignoring BOOTP/DHCP option code 40
 > Time Server is 132.163.4.102
 > Time Server is 132.163.4.103
 > Time Server is 132.163.4.101
 > Domain name is camd.edu
 > Boot file is epics/vme01/cexp/rtems-4.6.0pre4-SSRL_20030731-mvme2307.exe
 > Subnet mask is 255.255.255.0
 > Server ip address is 192.168.100.20
 > Gateway ip address is 192.168.100.1
 > Log server ip address is 192.168.100.20
 > $Id: init.c,v 1.11 2003/04/24 02:02:25 till Exp $
 > Welcome to RTEMS GeSys
 > This system $Name: SSRL_RTEMS_20030731 $ was built on 20030731PDT18:01:13
 > Trying to synchronize NTP...OK
 > Installing TIOCGWINSZ line discipline: ok.
 > Change Dir to '/TFTP/BOOTP_HOST/epics/vme01/cexp/'
 > Trying symfile '/TFTP/BOOTP_HOST/epics/vme01/cexp/rtems-4.6.0
 > pre4-SSRL_20030731-
 > mvme2307.sym', system script 'st.sys'
 > Type 'cexp.help()' for help (no quotes)
 > 'st.sys':
 > 
 > (text deleted)
 > 
 >   printf("Hello World")
 > Hello World0x0000000b (11)
 >   cexp.help()
 > 
 >         int cexp (char* cmdline)
 > Cexp builtin routines are:
 > 
 > (more text deleted)
 > Type a C expression, e.g.
 > 
 >      printf("hello %s\n","cruelworld" + 5)
 > 
 > 0x00169894 (1480852)

It's definitely gone insane at this point.  It looks sort of like more
than one task starts printing stuff, then one bombs with the
exception....

Gregm

 > n
 > §S<81>
 > T
 > ype 'cexp.help()' for help (no quotes)
 > Ce
 >                                                                 exception
 > handl
 > er called for exception 3
 >          Next PC or Address of fault = 108A60
 >          Saved MSR = B032
 >          R0 = 1746
 >          R1 = 7F41938
 >          R2 = 0
 >          R3 = 6039C
 >          R4 = 7F7F208
 >          R5 = 0
 >          R6 = 2C030000
 >          R7 = 1D0000
 >          R8 = 108A40
 >          R9 = 1C0000
 >          R10 = 7F46E40
 >          R11 = 1B97F0
 >          R12 = 0
 >          R13 = 1C1198
 >          R14 = 0
 >          R15 = 0
 >          R16 = 0
 >          R17 = 0
 >          R18 = 0
 >          R19 = 0
 >          R20 = 0
 >          R21 = 0
 >          R22 = 0
 >          R23 = 1D4F18
 >          R24 = 1C0000
 >          R25 = 1D0000
 >          R26 = 1C0000
 >          R27 = 1C0000
 >          R28 = 7F7C7A8
 >          R29 = 1D4F18
 >          R30 = 7F7F208
 >          R31 = 1B1CA8
 >          CR = 44242028
 >          CTR = 108A40
 >          XER = 0
 >          LR = 113468
 >          DAR = 6039C
 > Stack Trace:
 >   IP: 0x00108A60, LR: 0x00113468
 > --^ 0x00113548--^ 0x000BF2EC--^ 0x00103CC8--^ 0x00110368--^ 0x00110484
 > --^ 0x000A8CE4--^ 0x000D2B44--^ 0x000C0028--^ 0x0011D72C--^ 0x0011D784
 > --^ 0x001363E4--^ 0x00124BA8--^ 0x00126550--^ 0x00126680--^ 0x0006039C
 > --^ 0x00060690--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > --^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > --^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > --^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > --^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > --^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > --^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0--^ 0x000606E0
 > Too many stack frames (stack possibly corrupted), giving up...
 > unrecoverable exception!!! Push reset button