MVME 2307 BSP Exceptions

Till Straumann strauman at slac.stanford.edu
Mon Jul 3 19:48:44 UTC 2017


Your approach (associate addresses in stack trace with source code) is 
what I usually do, too.
Also disassemble the fault location and get hints from the register values.
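
As a rough illustration of getting hints from the register values, the
sketch below names the exception number and, for a program exception,
decodes the cause bits from the "Saved MSR" line. It assumes the
exception numbers printed by the BSP are the classic PowerPC vector
offsets divided by 0x100, and that "Saved MSR" is the SRR1 value; the
function names are made up for this example and are not part of RTEMS.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Cause of a program exception, as flagged in SRR1 on classic PowerPC. */
enum prog_cause {
    CAUSE_UNKNOWN, CAUSE_FP_ENABLED, CAUSE_ILLEGAL, CAUSE_PRIVILEGED, CAUSE_TRAP
};

/* Assumption: exception number == vector offset / 0x100. */
const char *exc_name(unsigned vec)
{
    switch (vec) {
    case 2:  return "machine check";
    case 3:  return "DSI (data access)";
    case 4:  return "ISI (instruction access)";
    case 6:  return "alignment";
    case 7:  return "program";
    case 8:  return "floating-point unavailable";
    default: return "other";
    }
}

/* Assumption: saved_msr is SRR1 as captured at exception entry. */
enum prog_cause decode_prog_cause(uint32_t saved_msr)
{
    if (saved_msr & 0x00080000u) return CAUSE_ILLEGAL;
    if (saved_msr & 0x00040000u) return CAUSE_PRIVILEGED;
    if (saved_msr & 0x00020000u) return CAUSE_TRAP;
    if (saved_msr & 0x00100000u) return CAUSE_FP_ENABLED;
    return CAUSE_UNKNOWN;
}

/* Print a one-line summary of an exception report. */
void print_summary(FILE *out, unsigned vec, uint32_t saved_msr, uint32_t dar)
{
    /* Indexed by enum prog_cause. */
    static const char *const cause_text[] = {
        "cause not flagged", "FP enabled exception", "illegal instruction",
        "privileged instruction", "trap"
    };
    assert(out != NULL);
    fprintf(out, "exception %u: %s", vec, exc_name(vec));
    if (vec == 7)   /* program exception: cause bits are meaningful */
        fprintf(out, " (%s)", cause_text[decode_prog_cause(saved_msr)]);
    if (vec == 3)   /* DSI: DAR holds the faulting data address */
        fprintf(out, " (DAR = 0x%08lX)", (unsigned long)dar);
    fputc('\n', out);
}
```

Under those assumptions, calling print_summary(stdout, 7, 0x0008B032, 0)
with the values from event #1 reports an illegal instruction, and
print_summary(stdout, 3, 0x00009032, 0xFFF0A442) with the values from
event #2 reports a data access fault at the DAR address.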

I assume between #2 and #3 there was also a power-cycle.

It is interesting that the fault affects different threads and areas of
code. This smells strongly of memory/stack corruption, i.e., your
culprit could be code that is totally unrelated to where the fault
occurs.

In the first case (program exception, e.g., due to an illegal
instruction) you'd inspect the fault address and verify that it holds an
illegal instruction. You'd then check whether the address itself is ok,
i.e., somewhere in the text segment where it would make sense. If it is,
the text may have been overwritten. Otherwise, the problem occurred
earlier, e.g., by jumping to a corrupted pointer value.

If you are able to locate the corrupted memory, then its contents can
sometimes give you a hint as to what wrote to it. Otherwise an 'electric
fence' tool can be quite useful. This is a library which wraps malloc &
friends so that every allocation either starts or ends on a page
boundary. It uses the MMU to protect the adjacent pages and thus causes
an exception on any attempt to write outside of the allocated region
(with standard MMUs it is only possible to protect either the beginning
or the end of the region).
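
To make the idea concrete, here is a minimal sketch of the
end-of-region variant for a POSIX host, using mmap() and mprotect(). It
is not the attached efence.c, and the RTEMS target may not provide
these calls; guarded_alloc() and guarded_free() are names invented for
this example.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS with strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of whole pages needed to hold `size` bytes. */
static size_t pages_for(size_t size, size_t page)
{
    return (size + page - 1) / page;
}

/*
 * Allocate `size` bytes so that the END of the block touches an
 * inaccessible guard page; writing one byte past the end faults.
 * The returned pointer is not necessarily aligned for every type.
 */
void *guarded_alloc(size_t size)
{
    assert(size > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = pages_for(size, page);
    size_t total = (data_pages + 1) * page;   /* data pages + guard page */

    uint8_t *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    uint8_t *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return guard - size;   /* block ends exactly at the guard page */
}

/* Release a block from guarded_alloc(); `size` must match the request. */
void guarded_free(void *ptr, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = pages_for(size, page);
    uint8_t *guard = (uint8_t *)ptr + size;
    munmap(guard - data_pages * page, (data_pages + 1) * page);
}
```

With this layout an overrun past the end of the block faults
immediately, while an underrun before its start goes undetected;
catching underruns instead means placing the guard page in front and
returning the first byte after it.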

HTH
- Till

On 06/26/2017 08:40 PM, Matt Rippa wrote:
> Hi Till, all:
>
> Last week we recommissioned one of our telescope systems using RTEMS 
> and EPICS on the MVME2307 BSP. After nearly two days of uptime (~42 
> hours), we get various unrecoverable exceptions (shown below). This 
> occurs suddenly when the system is "idle" and not being used for many 
> hours.
>
> Any suggestions for tracking down this stack trace, finding the offending 
> application code, or getting this to fail faster?
> We've produced an object dump available at the following link: 
> https://github.com/rcardenes/crcs-info
>
> Has anyone else used the MVME2307 BSP?
>
> Thanks,
> -Matt
>
> System details:
> RTEMS: 4.10.2
> EPICS: 3.14.12.4.
> Hardware: MVME-2700 (using mvme2307 alias)
> VME Boards:
>
>  1. Bancomm 635 Time Board
>  2. PMAC 1
>  3. Xycom-240
>
>
>
> *Event #1:*
>
>     Jun 23 15:19:43  E) PORT: crcs, MSG: Exceptionhandler called for
>     exception 7 (0x7)
>     Jun 23 15:19:43  E) PORT: crcs, MSG: #011 Next PC or Address of
>     fault = 011B010C
>     Jun 23 15:19:43  E) PORT: crcs, MSG: #011 Saved MSR = 0008B032
>     Jun 23 15:19:43  E) PORT: crcs, MSG: #011 Context: Task ID 0x0A01004B
>     Jun 23 15:19:43  E) PORT: crcs, MSG: #011 R0  = 01010101 R1  =
>     00AF5A40 R2  = 00000000 R3  = 007F8014
>     Jun 23 15:19:43  E) PORT: crcs, MSG: #011 R4  = 00000014 R5  =
>     007AD148 R6  = 00AF62BC R7  = 00000000
>     Jun 23 15:19:43  E) PORT: crcs, MSG: #011 R8  = 000031A8 R9  =
>     011B0101 R10 = 00AF62B8 R11 = 007AD148
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 R12 = 22000022 R13 =
>     001E4B90 R14 = AEFE7BBF R15 = 77FF7FBB
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 R16 = AA7FBFFC R17 =
>     FF7FF5FF R18 = 001E0000 R19 = 00000000
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 R20 = 000634A0 R21 =
>     001929E8 R22 = 001A2284 R23 = 00191098
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 R24 = 00000000 R25 =
>     00000001 R26 = 00000001 R27 = 001A85C4
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 R28 = 00AF62BC R29 =
>     00000028 R30 = 00419260 R31 = 007AD148
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 CR  = 22042028
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 CTR = 011B00FF
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 XER = 20000000
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 LR  = 0006371C
>     Jun 23 15:19:44  E) PORT: crcs, MSG: #011 DAR = 00000000
>     Jun 23 15:19:44  E) PORT: crcs, MSG: Stack Trace:
>     Jun 23 15:19:44  E) PORT: crcs, MSG:   IP: 0x011B010C, LR: 0x0006371C
>     Jun 23 15:19:44  E) PORT: crcs, MSG: --^ 0x0006371C--^
>     0x00082FDC--^ 0x000E7094--^ 0x00136048--^ 0x00135F6C
>     Jun 23 15:19:44  E) PORT: crcs, MSG: Suspending faulting task
>     (0x0A01004B) 
>
> ...
>
> Using the memory map for the Exception 7 we reach the following trace:
>
>         0x00135f6c _Thread_Handler
>
>         /gem_sw/targetOS/RTEMS/source/rtems/rtems-4.10.2/cpukit/score/src/threadhandler.c:80
>
>         0x00136048
>
>         /gem_sw/targetOS/RTEMS/source/rtems/rtems-4.10.2/cpukit/score/src/threadhandler.c:145
>
>         0x000e7094
>
>         /gem_sw/epics/R3.14.12.4/base/src/libCom/osi/os/RTEMS/osdThread.c:169
>
>         0x00082fdc
>
>         /gem_sw/epics/R3.14.12.4/base/src/db/dbEvent.c:883
>
>         0x0006371c
>
>         /gem_sw/targetOS/RTEMS/source/tools/gcc-4.4.7/newlib/libc/stdio/vfscanf.c:270
>
>
> Power Cycle
>
> *Event #2:*
>
>     Jun 23 18:32:20  E) PORT: crcs, MSG: Exceptionhandler called for
>     exception 3 (0x3)
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 Next PC or Address of
>     fault = 001541E4
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 Saved MSR = 00009032
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 Context: Task ID 0x09010001
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 R0  = FFF5A4DA R1  =
>     00366720 R2  = 00000000 R3  = FFF0A442
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 R4  = 00352498 R5  =
>     00000008 R6  = 00000000 R7  = FFF0A442
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 R8  = 00352498 R9  =
>     00000002 R10 = 00000000 R11 = 00000001
>     Jun 23 18:32:20  E) PORT: crcs, MSG: #011 R12 = 003BC0E0 R13 =
>     001E4B90 R14 = 00000000 R15 = 00000000
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 R16 = 00000000 R17 =
>     00000000 R18 = 00000000 R19 = 00000000
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 R20 = 00000000 R21 =
>     00000000 R22 = 00000000 R23 = 00000000
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 R24 = 00000000 R25 =
>     00000000 R26 = 00000000 R27 = 00000000
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 R28 = FFF0A340 R29 =
>     00000000 R30 = FFF0A341 R31 = 0034D5B0
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 CR  = 40000004
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 CTR = 0012E3AC
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 XER = 00000000
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 LR  = 000F2CDC
>     Jun 23 18:32:21  E) PORT: crcs, MSG: #011 DAR = FFF0A442
>     Jun 23 18:32:21  E) PORT: crcs, MSG: Stack Trace:
>     Jun 23 18:32:21  E) PORT: crcs, MSG:   IP: 0x001541E4, LR: 0x000F2CDC
>     Jun 23 18:32:21  E) PORT: crcs, MSG: --^ 0x00135F6C
>     Jun 23 18:32:21  E) PORT: crcs, MSG: Suspending faulting task
>     (0x09010001)
>
>
> *Event #3:*
>
>     Jun 24 07:37:50  E) PORT: crcs, MSG: *Exception handler called for
>     exception 8(0x8)
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 Next PC or Address of
>     fault = 0015BB08
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 Saved MSR = 00009032
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 Context: Task ID 0x09010001
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 R0  = 00157754 R1  =
>     003664B8 R2  = 00000000 R3  = 00366758
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 R4  = 0036661C R5  =
>     001B4C68 R6  = 00366610 R7  = 00000000
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 R8  = 005F5370 R9  =
>     00642450 R10 = 0034C940 R11 = 00366688
>     Jun 24 07:37:50  E) PORT: crcs, MSG: #011 R12 = 40000048 R13 =
>     001E4B90 R14 = 001B4C68 R15 = 00000000
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 R16 = 00000000 R17 =
>     00000000 R18 = 00000000 R19 = 00366610
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 R20 = 00000000 R21 =
>     00000000 R22 = 00366758 R23 = 001B4CC0
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 R24 = 0036661C R25 =
>     00377E40 R26 = 00000000 R27 = 0000002F
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 R28 = 00377E40 R29 =
>     001E49B0 R30 = 00377E40 R31 = 00000000
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 CR  = 40000048
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 CTR = 00000000
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 XER = 00000000
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 LR  = 00157754
>     Jun 24 07:37:51  E) PORT: crcs, MSG: #011 DAR = 00000000
>     Jun 24 07:37:51  E) PORT: crcs, MSG: Stack Trace:
>     Jun 24 07:37:51  E) PORT: crcs, MSG:   IP: 0x0015BB08, LR: 0x00157754
>     Jun 24 07:37:51  E) PORT: crcs, MSG: --^ 0x00157754--^
>     0x000F2820--^ 0x000F2DA0--^ 0x001E49BD--^ 0x00135F6C
>     Jun 24 07:37:51  E) PORT: crcs, MSG: Suspending faulting task
>     (0x09010001) 
>
>
>
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: efence.c
Type: text/x-csrc
Size: 11304 bytes
Desc: not available
URL: <http://lists.rtems.org/pipermail/users/attachments/20170703/837f7d97/attachment-0001.bin>

