Determining the cause of a segfault in RTEMS

Wed Mar 13 15:01:23 UTC 2013

On 3/13/2013 9:48 AM, Gedare Bloom wrote:
> On Wed, Mar 13, 2013 at 8:39 AM, Gedare Bloom <gedare at rtems.org> wrote:
>> On Wed, Mar 13, 2013 at 1:59 AM, Mohammed Khoory <mkhoory at eiast.ae> wrote:
>>> I've increased the stack size to 32K by defining
>>> CONFIGURE_MINIMUM_TASK_STACK_SIZE and CONFIGURE_MINIMUM_STACK_SIZE and
>>> CONFIGURE_INIT_TASK_STACK_SIZE .. I also made sure that I was starting new
>>> tasks using RTEMS_CONFIGURED_MINIMUM_STACK_SIZE .. I'm still getting the
>>> issue however as if nothing has changed. I think this means that there's
>>> something wrong in my code, like something somewhere is writing out of
>>> bounds, and not from the stack being too small or anything like that... So
>>> I'll keep looking
>>>
>>> I forgot to mention that the arrays in question that involve a lot of
>>> copying are around 100-120 chars in size for each task (which is probably
>>> nothing compared to the default 2k-4k allocated for the stacks).
>>>
>> An array of 100 char is 800 bytes. 5 such arrays will overflow 4k
>> limit. If you call multiple such functions the stack pressure can grow
>> quickly. One easy check is to move all your arrays to global
>> variables. Then they will be pre-allocated for you in the .data
>> section of your program binary. Of course this won't work if you are
>> multitasking or have reentrant functions.
>>
> Oops! I should not do math before coffee. 100 chars would be 800 bits,
> so 100 bytes. I guess you would need a lot of those to overflow your
> stack. Unless of course you are writing past the end of your arrays,
> in which case all bets are off.
You don't need to be too far off on your math on the SPARC.

/** This defines the size of the minimum stack frame. */
#define CPU_MINIMUM_STACK_FRAME_SIZE          0x60

The ERC32 and LEONs have 8 register windows. As a MINIMUM you
are likely going to need 8*0x60 = 8*96 = 768 bytes of stack just to
do a flush at context switch time unless you can guarantee your
method call depth is small. Above that minimum, you have to account
for automatic variables (e.g. local variables on the stack) and your
function call depth.

If your call depth goes beyond 8, you also have to account for
some extra space for the register window overflow trap handler
to work.

I have never seen this link before but it looks good and
explains it in detail: http://www.sics.se/~psm/sparcstack.html

This much is SPARC specific. The guidance on not declaring large
arrays and buffers on the stack, and on call depth impacting stack
usage, is general advice -- independent of target CPU architecture.
>>> Thanks for the replies, it really helped me look in the right direction.
>>>
>>> Small question: is it normal for RTEMS_CONFIGURED_MINIMUM_STACK_SIZE to be
>>> defined as 0? I've noticed this while stepping through the program, and I
>>> was expecting it to be 32768. I assume maybe the RTEMS code considers 0 as
>>> "check configuration" or something... I just want to make sure.
>>>
>> See cpukit/rtems/include/rtems.h where it is defined. I don't actually
>> see the macro used anywhere in the tree though, so I don't know if it
>> has any effect.
>>
>>>> -----Original Message-----
>>>> From: Joel Sherrill [mailto:Joel.Sherrill at OARcorp.com]
>>>> Sent: Wednesday, March 13, 2013 11:03 AM
>>>> To: Mohammed Khoory
>>>> Cc: Chris Johns; rtems-users at rtems.org
>>>> Subject: RE: Determining the cause of a segfault in RTEMS
>>>>
>>>> For architectural reasons, 2k is very likely much too small on any SPARC
>>>> BSP. Try increasing the minimum to something like 32k or larger to prove
>>>> it is a stack problem.
>>>>
>>>> If it runs, we can talk about stack checker and usage reports.
>>>>
>>>> --joel
>>>>
>>>> Mohammed Khoory <mkhoory at eiast.ae> wrote:
>>>>
>>>>
>>>>>> Normally in general-purpose (not embedded) programming, the most
>>>>>> straightforward way to determine the cause of a segfault is to look
>>>>>> at its backtrace. However, this approach isn't really helpful in my
>>>>>> case.. I'm writing an RTEMS application that has around 4 tasks, and
>>>>>> stepping through the program doesn't exactly show context switches.
>>>>>> When I get a segfault, the backtrace only shows the following
>>>>>>
>>>>>> #0  0xcd95a758 in ?? ()
>>>>>> #1  0x40000190 in trap_table () at
>>>>>> ../../../../../../../../rtems-4.10.2/c/src/lib/libbsp/sparc/leon3/../../sparc/shared/start.S:88
>>>>>>
>>>>>> Which is extremely unhelpful. Stepping through the program also
>>>>>> doesn't really help, because it seems to crash while waiting for
>>>>>> events, which makes no sense to me.
>>>>>>
>>>>> The stack appears corrupt because the exception stack frame is a
>>>>> different format to the standard stack frame gdb expects and attempts
>>>>> to decode. All the data is present, it is just not available via gdb's
>>>>> stack frame printing.
>>>>
>>>> That is very helpful, thanks. I'm doing some string copying on arrays
>>>> allocated on the stack, which is what I suspected is causing it, but then
>>>> I dismissed it because I knew for sure that I'm not copying anything
>>>> larger than what the array can hold. But I guess I should take a better
>>>> look at the copying code now as I hadn't considered the fact that embedded
>>>> targets tend to have small stacks.
>>>>
>>>> As Angelo Fraietta mentioned it could be caused by my stack size being too
>>>> small.. however I saw that my minimum stack size is configured to be
>>>> 1024*2, which should be enough for what I'm doing.. but I'll play around
>>>> with it a bit more and see how that goes.
>>>>
>>>>>> Is there any other proper way to figure out what's causing the
>>>>>> segfault in RTEMS? I'm thinking maybe using the capture engine might
>>>>>> be a good idea because it should tell what task was running last,
>>>>>> but I haven't used it yet, I only know what it does.. so I'm not
>>>>>> sure if that'll help.
>>>>> This is architecture and sometimes BSP specific so exact details are
>>>>> not easy to give. The best solution is to find the address the exception
>>>>> is branching to and then set a break point there. The idea is to get as
>>>>> close as possible to the point where the exception happens. More often
>>>>> than not this lets you see a decent stack frame in gdb. Have a look at
>>>>> start.S and see if it is easy to see a possible entry point.
>>>> The line in start.S that the backtrace refers to only defines an entry for
>>>> a table of traps from what I can tell.. in this case it's a DMA access
>>>> error, which indicates that something is writing somewhere that it's not
>>>> supposed to. But that's the only thing I can figure out from it.
>>>>
>>>> Generally speaking the start.S file isn't very helpful.. It only contains
>>>> code related to starting up the SPARC cpu from what I can tell...
>>>>
>>>> I thought that having a backtrace like this from segfaults on RTEMS was
>>>> normal, which is why I sent the message in the first place :)
>>>>
>>>>> Which version of RTEMS are you using ?
>>>> 4.10.2
>>>>
>>>>> Which BSP are you using ?
>>>> LEON3
>>>>
>>>> _______________________________________________
>>>> rtems-users mailing list
>>>> rtems-users at rtems.org
>>>> http://www.rtems.org/mailman/listinfo/rtems-users

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherrill at OARcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985