Determining the cause of a segfault in RTEMS

Mohammed Khoory mkhoory at eiast.ae
Thu Mar 14 01:43:33 UTC 2013


Well, it seems I might have determined the cause of all my miseries, though
I'm not sure how to explain it, and it could just be an illusion hiding
something nastier.

See, I've been implementing a simple message queue using RTEMS Partitions,
just to understand and demonstrate how partitions work. I've got a task
going into a loop that checks the queue and then writes the data into a file.
The data is stored temporarily in the buffers provided by the partition, so
it's something like an array-based queue.
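In case it helps to see the shape of it, here's a minimal, self-contained
sketch of the kind of fixed-buffer queue I mean. It mimics the
get-buffer/return-buffer semantics of RTEMS Partitions in plain C (no RTEMS
headers), so the names here are illustrative, not the real
rtems_partition_* API:

```c
#include <stddef.h>

#define BUF_SIZE  160   /* bytes per buffer (the partition's buffer size) */
#define BUF_COUNT 10    /* number of buffers in the partition area        */

/* Statically allocated "partition" area and a free stack over it. */
static unsigned char area[BUF_COUNT][BUF_SIZE];
static unsigned char *free_list[BUF_COUNT];
static int free_top = -1;

static void pool_init(void)
{
    for (int i = 0; i < BUF_COUNT; i++)
        free_list[++free_top] = area[i];
}

/* Analogous to rtems_partition_get_buffer(): hand out one fixed-size buffer,
 * or NULL when the pool is exhausted. */
static unsigned char *pool_get(void)
{
    return (free_top >= 0) ? free_list[free_top--] : NULL;
}

/* Analogous to rtems_partition_return_buffer(): give a buffer back. */
static void pool_return(unsigned char *buf)
{
    free_list[++free_top] = buf;
}
```

The producer copies a message into a buffer from pool_get() and enqueues the
pointer; my file-writing task dequeues it, writes the data out, and calls
pool_return().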

What I've noticed is that when I make the buffer 40 bytes bigger than it
used to be (that's the minimum increment due to alignment, from what I can
tell), the problem goes away completely. Since the buffer isn't located on
the stack, this tells me it isn't a stack issue at all. It might mean that
somewhere in my code I'm copying more into the buffer than it can hold, but
I've investigated my code thoroughly and I guarantee nothing does that. What
I can't explain is how that would corrupt the stack, unless the partition is
located in the stack area? But I doubt that's the case (I've set the
starting address to 0x41000000 with a size of 10*160 for ten 160-byte
buffers; it used to be 120 bytes per buffer when I got the segfault).

Investigating further, I've noticed that when I set the buffer size to 80
bytes, I get SIGILL instead... interesting. The same thing happens when I
set the buffer size to 200 (but not at 240).

I guess Partitions on LEON3 just don't like any buffer size that isn't a
>=2x multiple of 80 bytes (160, 240, ...)?
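For what it's worth, here's the sanity check I ran on my own layout
assumptions (8-byte doubleword alignment on SPARC, buffers carved
contiguously out of the area starting at 0x41000000). None of this is from
the RTEMS sources; it's just my arithmetic:

```c
#include <stdint.h>

/* My assumptions about the partition layout on LEON3/SPARC:
 * the area starts at a fixed address and is carved into equally
 * sized, contiguous buffers; SPARC wants 8-byte alignment. */
#define AREA_START 0x41000000u
#define ALIGNMENT  8u

/* Does every buffer in the area start on an aligned address? */
static int buffers_aligned(uint32_t buf_size, uint32_t count)
{
    if (buf_size == 0 || buf_size % ALIGNMENT != 0)
        return 0;
    for (uint32_t i = 0; i < count; i++) {
        uint32_t addr = AREA_START + i * buf_size;
        if (addr % ALIGNMENT != 0)
            return 0;
    }
    return 1;
}
```

By this check, 120, 160, 200, and 240 are all 8-byte aligned, so plain
alignment alone can't explain why 160 and 240 work but 120 and 200 don't.
Something else must be going on.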

>On 3/13/2013 9:48 AM, Gedare Bloom wrote:
>> On Wed, Mar 13, 2013 at 8:39 AM, Gedare Bloom <gedare at rtems.org>
>wrote:
>>> An array of 100 char is 800 bytes. 5 such arrays will overflow 4k
>>> limit. If you call multiple such functions the stack pressure can
>>> grow quickly. One easy check is to move all your arrays to global
>>> variables. Then they will be pre-allocated for you in the .data
>>> section of your program binary. Of course this won't work if you are
>>> multitasking or have reentrant functions.
>>>
>> Oops! I should not do math before coffee. 100 chars would be 800 bits,
>> so 100 bytes. I guess you would need a lot of those to overflow your
>> stack. Unless of course you are writing past the end of your arrays,
>> in which case all bets are off.
>You don't need to be too far off on your math on the SPARC.
>
>/** This defines the size of the minimum stack frame. */
>#define CPU_MINIMUM_STACK_FRAME_SIZE          0x60
>
>The ERC32 and LEON's have 8 register windows. As a MINIMUM you are likely
>going to need 8*96=768 bytes of stack just to do a flush at context switch
>time, unless you can guarantee your method call depth is small. Above that
>minimum, you have to account for automatic variables (e.g. local variables
>on the stack) and your function call depth.
>
>If your call depth goes beyond 8, you also have to account for some extra
>space for the register window overflow trap handler to work.
>
>I have never seen this link before but it looks good and explains it in
>detail: http://www.sics.se/~psm/sparcstack.html
>
>This much is SPARC specific. The guidance on not declaring large arrays and
>buffers on the stack, and call depth impacting stack usage, is general
>advice -- independent of target CPU architecture.
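Putting the numbers quoted above into code (the 0x60 frame size is from the
RTEMS SPARC port as quoted; the 8-window count is what was given for
ERC32/LEON):

```c
/* From the RTEMS SPARC port, as quoted above. */
#define CPU_MINIMUM_STACK_FRAME_SIZE 0x60  /* 96 bytes */
#define REGISTER_WINDOWS 8                 /* ERC32/LEON */

/* Minimum stack consumed just by flushing all register windows at a
 * context switch: one minimum stack frame per window. */
static unsigned window_flush_bytes(void)
{
    return REGISTER_WINDOWS * CPU_MINIMUM_STACK_FRAME_SIZE;
}
```

That's 768 bytes gone before any locals or call depth at all, which is why a
4k stack evaporates faster on SPARC than you might expect.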

I had the impression that the LEON CPU had a large number of registers, but
for some reason I didn't think it would affect stack usage that much.
Thanks for the insight; it's good to keep this in mind.

I am convinced, however, that this isn't a stack overflow or stack
corruption or anything like that, because:
1. I increased the stack size by quite a bit (to 32k, up from 4k).
2. I temporarily made the char array global to see if it still happened
anyway.
3. The char array is only a hundred bytes; that's not much anyway.

And the problem still occurred.





