Determining the cause of a segfault in RTEMS

Gedare Bloom gedare at rtems.org
Wed Mar 13 12:39:44 UTC 2013


On Wed, Mar 13, 2013 at 1:59 AM, Mohammed Khoory <mkhoory at eiast.ae> wrote:
> I've increased the stack size to 32K by defining
> CONFIGURE_MINIMUM_TASK_STACK_SIZE and CONFIGURE_MINIMUM_STACK_SIZE and
> CONFIGURE_INIT_TASK_STACK_SIZE .. I also made sure that I was starting new
> tasks using RTEMS_CONFIGURED_MINIMUM_STACK_SIZE .. I'm still getting the
> issue however as if nothing has changed. I think this means that there's
> something wrong in my code, like something somewhere is writing out of
> bounds, and not from the stack being too small or anything like that... So
> I'll keep looking
>
> I forgot to mention that the arrays in question that involve a lot of
> copying are around 100-120 chars in size for each task (which is probably
> nothing compared to the default 2k-4k allocated for the stacks).
>
An array of 100-120 chars is only about 120 bytes, but a few such arrays
per function, multiplied across several nested calls, can still put real
pressure on a 2k-4k stack. One easy check is to move all your arrays to
global variables. Then they will be pre-allocated for you in the .data
(or .bss) section of your program binary instead of consuming stack. Of
course this won't work if multiple tasks call the same code or the
functions need to be reentrant.

> Thanks for the replies, it really helped me look in the right direction.
>
> Small question: is it normal for RTEMS_CONFIGURED_MINIMUM_STACK_SIZE to be
> defined as 0? I've noticed this while stepping through the program, and I
> was expecting it to be 32768. I assume maybe the RTEMS code considers 0 as
> "check configuration" or something... I just want to make sure.
>
See cpukit/rtems/include/rtems.h, where it is defined. I don't actually
see the macro used anywhere else in the tree, though, so I don't know
whether it has any effect.
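For reference, the stack-size knobs mentioned above live in the application's confdefs.h configuration. A sketch, with macro names as in RTEMS 4.10 and purely illustrative values (32k here is for the overflow experiment, not a recommendation), that also enables the stack checker for per-task usage reports:

```c
/* Illustrative confdefs.h fragment for the stack-size experiment. */
#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER

#define CONFIGURE_MAXIMUM_TASKS           8
#define CONFIGURE_MINIMUM_TASK_STACK_SIZE (32 * 1024)
#define CONFIGURE_INIT_TASK_STACK_SIZE    (32 * 1024)

/* Track per-task stack high-water marks so overflows can be reported. */
#define CONFIGURE_STACK_CHECKER_ENABLED

#define CONFIGURE_RTEMS_INIT_TASKS_TABLE
#define CONFIGURE_INIT
#include <rtems/confdefs.h>
```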

>>-----Original Message-----
>>From: Joel Sherrill [mailto:Joel.Sherrill at OARcorp.com]
>>Sent: Wednesday, March 13, 2013 11:03 AM
>>To: Mohammed Khoory
>>Cc: Chris Johns; rtems-users at rtems.org
>>Subject: RE: Determining the cause of a segfault in RTEMS
>>
>>For architectural reasons, 2k is very likely much too small on any SPARC
>>BSP. Try increasing the minimum to something like 32k or larger to prove
>>it is a stack problem.
>>
>>If it runs, we can talk about stack checker and usage reports.
>>
>>--joel
>>
>>Mohammed Khoory <mkhoory at eiast.ae> wrote:
>>
>>
>>> > Normally in general-purpose (not embedded) programming, the most
>>> > straightforward way to determine the cause of a segfault is to look
>>> > at its backtrace. However, this approach isn't really helpful in my
>>> > case.. I'm writing an RTEMS application that has around 4 tasks, and
>>> > stepping through the program doesn't exactly show context switches.
>>> > When I get a segfault, the backtrace only shows the following
>>> >
>>> > #0  0xcd95a758 in ?? ()
>>> > #1  0x40000190 in trap_table () at
>>> > ../../../../../../../../rtems-4.10.2/c/src/lib/libbsp/sparc/leon3/../../sparc/shared/start.S:88
>>> >
>>> > Which is extremely unhelpful. Stepping through the program also
>>> > doesn't really help, because it seems to crash while waiting for
>>> > events, which makes no sense to me.
>>> >
>>>
>>> The stack appears corrupt because the exception stack frame is a
>>> different format to the standard stack frame gdb expects and attempts
>>> to decode. All the data is present, it is just not available via gdb's
>>> stack frame
>>printing.
>>
>>That is very helpful, thanks. I'm doing some string copying on arrays
>>allocated on the stack, which is what I suspected is causing it, but then
>>I dismissed it because I knew for sure that I'm not copying anything
>>larger than what the array can hold. But I guess I should take a better
>>look at the copying code now, as I hadn't considered the fact that
>>embedded targets tend to have small stacks.
>>
>>As Angelo Fraietta mentioned, it could be caused by my stack size being
>>too small. However, I saw that my minimum stack size is configured to be
>>1024*2, which should be enough for what I'm doing, but I'll play around
>>with it a bit more and see how that goes.
>>
>>> > Is there any other proper way to figure out what's causing the
>>> > segfault in RTEMS? I'm thinking maybe using the capture engine might
>>> > be a good idea because it should tell what task was running last,
>>> > but I haven't used it yet, I only know what it does.. so I'm not
>>> > sure if that'll help.
>>>
>>> This is architecture and sometimes BSP specific, so exact details are
>>> not easy to give. The best solution is to find the address the
>>> exception is branching to and then set a break point there. The idea is
>>> to get as close as possible to the point the exception happens. More
>>> often than not this lets you see a decent stack frame in gdb. Have a
>>> look at start.S and see if it is easy to spot a possible entry point.
>>
>>The line in start.S that the backtrace refers to only defines an entry in
>>a table of traps, from what I can tell. In this case it's a data access
>>error, which indicates that something is writing somewhere it's not
>>supposed to. But that's the only thing I can figure out from it.
>>
>>Generally speaking, the start.S file isn't very helpful. It only contains
>>code related to starting up the SPARC CPU, from what I can tell...
>>
>>I thought that having a backtrace like this from segfaults on RTEMS was
>>normal, which is why I sent the message in the first place :)
>>
>>> Which version of RTEMS are you using ?
>>4.10.2
>>
>>> Which BSP are you using ?
>>LEON3
>>
>>_______________________________________________
>>rtems-users mailing list
>>rtems-users at rtems.org
>>http://www.rtems.org/mailman/listinfo/rtems-users
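Chris's advice above, breaking at the exception entry point and working back from there, might look something like the following session sketch. The target command, port number, and address are illustrative; the address is the trap_table entry reported in the earlier backtrace.

```
$ sparc-rtems-gdb app.exe
(gdb) target extended-remote :2222   # debug-agent port is illustrative
(gdb) break *0x40000190              # trap_table address from the backtrace
(gdb) continue
(gdb) info registers l1 l2           # on SPARC, %l1/%l2 hold the trapped PC/nPC
(gdb) backtrace
```

Once stopped at the trap entry, the trapped PC usually identifies the faulting instruction, which is enough to set a second breakpoint just before it and get a clean stack frame.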


