Determining the cause of a segfault in RTEMS

Wed Mar 13 01:50:15 UTC 2013

> > Normally in general-purpose (not embedded) programming, the most
> > straightforward way to determine the cause of a segfault is to look at
> > its backtrace. However, this approach isn't really helpful in my
> > case. I'm writing an RTEMS application that has around 4 tasks, and
> > stepping through the program doesn't exactly show context switches.
> > When I get a segfault, the backtrace only shows the following:
> >
> > #0  0xcd95a758 in ?? ()
> > #1  0x40000190 in trap_table () at
> > ../../../../../../../../rtems-4.10.2/c/src/lib/libbsp/sparc/leon3/../../sparc/shared/start.S:88
> >
> > Which is extremely unhelpful. Stepping through the program also
> > doesn't really help, because it seems to crash while waiting for
> > events, which makes no sense to me.
> >
> 
> The stack appears corrupt because the exception stack frame has a different
> format from the standard stack frame that gdb expects and attempts to decode.
> All the data is present; it is just not available via gdb's stack frame
> printing.

That is very helpful, thanks. I'm doing some string copying into arrays
allocated on the stack, which is what I first suspected was causing it, but
I dismissed that because I knew for sure that I'm not copying anything larger
than what the array can hold. I guess I should take a closer look at the
copying code now, as I hadn't considered the fact that embedded targets tend
to have small stacks.
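
One way to rule the copying code in or out is to route every copy into a
stack array through a single bounded helper that reports truncation. The
following is only a sketch: bounded_copy is a hypothetical name, not an RTEMS
or libc function, and it assumes the C library on the target provides a
C99-conforming snprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of RTEMS or libc).
 * Copies src into dst, where dst_size is the full capacity of dst in
 * bytes, including room for the terminating NUL.
 * Returns 0 if src fit completely, -1 if it had to be truncated.
 * dst is NUL-terminated in both cases.
 */
static int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* snprintf writes at most dst_size bytes (NUL included) and returns
     * the length the full string would have needed. */
    int needed = snprintf(dst, dst_size, "%s", src);

    if (needed < 0 || (size_t)needed >= dst_size)
        return -1; /* encoding error or truncation */
    return 0;
}
```

Calling it as bounded_copy(buf, sizeof buf, input) and logging every -1
return would show whether any copy ever exceeds its destination. Note that
sizeof buf only gives the capacity when buf is a real array in scope, not a
pointer passed into a function.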

As Angelo Fraietta mentioned, it could be caused by my stack size being too
small. However, I saw that my minimum stack size is configured to be 1024*2,
which should be enough for what I'm doing. I'll play around with it a
bit more and see how that goes.
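
For experimenting with the stack size, a configuration fragment along these
lines could be merged into the application's existing confdefs.h setup. It
is a sketch, not a complete configuration: the 8 KB figure is an arbitrary
value chosen for illustration, and the rest of the application's CONFIGURE_*
settings (tasks, drivers, init task table) are assumed to be defined as
before. It only builds inside an RTEMS application.

```c
/* Fragment only: merge into the file that already defines CONFIGURE_INIT. */

/* Illustrative value, not a recommendation: raise the minimum task
 * stack size used for tasks created with the minimum size. */
#define CONFIGURE_MINIMUM_TASK_STACK_SIZE (8 * 1024)

/* Ask RTEMS to fill task stacks with a pattern and check them for
 * overflow, so a blown stack is reported instead of silently
 * corrupting neighbouring memory. */
#define CONFIGURE_STACK_CHECKER_ENABLED

#define CONFIGURE_INIT
#include <rtems/confdefs.h>
```

With the checker enabled, calling rtems_stack_checker_report_usage() (declared
in <rtems/stackchk.h>) from a task prints how much of each task's stack has
been used, which should show whether 1024*2 is really enough.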

> > Is there any other proper way to figure out what's causing the
> > segfault in RTEMS? I'm thinking maybe using the capture engine might
> > be a good idea because it should tell what task was running last, but
> > I haven't used it yet; I only know what it does, so I'm not sure if
> > that'll help.
> 
> This is architecture and sometimes BSP specific, so exact details are not
> easy to give. The best solution is to find the address the exception is
> branching to and then set a breakpoint there. The idea is to get as close
> as possible to the point where the exception happens. More often than not
> this lets you see a decent stack frame in gdb. Have a look at start.S and
> see if it is easy to see a possible entry point.

The line in start.S that the backtrace refers to only defines an entry in a
table of traps, from what I can tell. In this case it's a DMA access error,
which indicates that something is writing somewhere that it's not supposed
to. But that's the only thing I can figure out from it.

Generally speaking, the start.S file isn't very helpful. It only contains
code related to starting up the SPARC CPU, from what I can tell.

I thought that having a backtrace like this from segfaults on RTEMS was
normal, which is why I sent the message in the first place :)

> Which version of RTEMS are you using ?
4.10.2

> Which BSP are you using ?
LEON3