Unable to reconfigure the stack size

Mon Nov 21 17:19:00 UTC 2011

On 11/21/2011 11:31 AM, Fabricio de Novaes Kucinskis wrote:
> Hi Joel, and thank you for this sunday answer!
>
> So, RTEMS should have changed the stack size when I defined
> CONFIGURE_MINIMUM_STACK_SIZE and used RTEMS_CONFIGURED_MINIMUM_STACK_SIZE when
> creating the task, but it didn't.
The constant is CONFIGURE_MINIMUM_TASK_STACK_SIZE.

Using the wrong name is certainly going to have no effect.

Sorry I didn't see that on a Sunday.
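
For reference, a minimal sketch of the two names side by side. It assumes the usual confdefs.h-style application configuration; everything else that configuration needs (CONFIGURE_INIT, drivers, task counts, the confdefs.h include) is left out, and the 8 KB value is simply the one from the original post.

```c
/* Sketch only: the rest of the application configuration is omitted. */

/* The name the configuration actually reads (note the _TASK_ part). */
#define CONFIGURE_MINIMUM_TASK_STACK_SIZE (8 * 1024)

/*
 * The name used in the original post. Defining it compiles fine, but
 * nothing looks at it, so the minimum stack size stays at the default:
 *
 *   #define CONFIGURE_MINIMUM_STACK_SIZE (8 * 1024)
 */
```

With the first define in place, a task created with RTEMS_CONFIGURED_MINIMUM_STACK_SIZE as its stack size should show up in the stack report with the larger stack.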
> (I think my other comments - stack checker error, STACK_SIZE and other RTEMS
> configs - can be ignored for now, for the good of the discussion.)
Yep.
> I'd like to know whether anyone on the list has been able to change the
> default stack size for the ERC32/SIS BSPs. If anyone has tried it, please
> report whether it worked or not.
>
Try the right constant name.

One of the examples-v2/ticker/low_ticker variations shows how to use this,
and I know it works on sis because that is where my workspace size numbers
in presentations came from.
>>> You have pulled a lot of thread and looked for a horse (e.g. blown
>>> stack) which it smells to me like a zebra (e.g. stray write onto stack
>>> memory).
>
> Among all the details I forgot to add that - remember, I was dizzy ;) -, apart
> from the size of the stack for the ISIS task (4036 bytes, almost at the 4
> kbytes limit) a few instructions before the error, the local variable which is
> overwritten in the memcpy operation is inside the .bss area:
>
> - Local variable address: 0x02058920
> - End of .bss section:    0x0205c150
>
> Also, the instruction that causes the overwrite seems to be pretty safe:
>
> memcpy(&destinationElement,&sourceElement, sizeof(Element)); [not exactly
> this, but this is what it does]
The instruction is safe .. but maybe not the destination address or 
contents of the source.

> Finally, when I reduce the size of the arrays that are placed inside the .bss
> area (moving its end far from the RTEMS stack start), I have no error.

Something else got corrupted. :)

Since you know the memcpy is the culprit, is the destination correct 
given the source?

Can you do a backtrace?
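
One way to check the destination is to wrap the copy in a test that refuses to write outside the object you believe you are writing to. The sketch below is only an illustration: range_inside() and checked_copy() are hypothetical helper names, not RTEMS or libc functions, and the caller has to supply the region it thinks it owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 1 if [addr, addr + len) lies inside [low, high). */
static int range_inside(uintptr_t addr, size_t len,
                        uintptr_t low, uintptr_t high)
{
    return addr >= low && addr <= high && len <= (size_t)(high - addr);
}

/* Hypothetical wrapper: copy only when the whole destination range is
 * inside the region the caller expects; otherwise skip the copy and
 * return -1 so the bad call site can be caught. */
static int checked_copy(void *dst, const void *src, size_t len,
                        const void *region, size_t region_len)
{
    uintptr_t low = (uintptr_t)region;

    assert(dst != NULL && src != NULL);
    if (!range_inside((uintptr_t)dst, len, low, low + region_len))
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

A breakpoint on the -1 path then gives the call site for the backtrace. The same range test applied to the numbers quoted above suggests the overwritten variable (0x02058920) sits below the end of .bss (0x0205c150) and not inside the ISIS stack range from the stack report, assuming both come from the same build.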

> So, do you think this can be a bug in the ERC32 BSP? If so, where to look at?
> This whole stack configuration seems a little complicated to me.
It isn't a BSP error.  You are probably lucky it is reproducible. :)

--joel
> Thanks again,
>
> Fabrício.
>
>
> On Sun, 20 Nov 2011 10:18:06 -0600, Joel Sherrill wrote
>> On 11/20/2011 09:19 AM, Fabricio de Novaes Kucinskis wrote:
>>> Hello everybody,
>>>
>>> I have an application that has blown the stack, tried a lot of
>>> things to fix it, and up to now nothing worked. In fact, nothing
>>> that I've tried so far had any effect on the stack size.
>>>
>>> It's clear to me that I'm missing or misunderstanding something. In
>>> order to discover what, here is a detailed description (sorry for the
>>> length) of my problem, and what I've tried - my hope is that, by
>>> describing it in detail, I expose what I'm doing wrong and allow you
>>> to point it out.
>>>
>>> I'm using RTEMS 4.10.0 for the SIS BSP.
>>>
>>> I have a task that demands more than the RTEMS default stack size
>>> for the ERC32 (I'm using SIS to try it, but I think this should not
>>> be an issue). At some point a local variable is overwritten by a
>>> memcpy applied to a large array in the .bss area, and the application
>>> falls into an infinite loop.
>> With sis you can use the watch command to find out where the write
>> comes from.  It may not be a stack overflow but a stray write that
>> just happens to hit the stack.
>>> Here is the stack report taken immediately before the blow:
>>>
>>> Stack usage by thread
>>>       ID      NAME    LOW          HIGH     CURRENT     AVAILABLE     USED
>>> 0x09010001  IDLE 000205E2D0 - 000205F2DF 000205F0A0      4096        752
>>> 0x0A010002  ISIS 0002060BB0 - 0002061BBF 00020617C8      4096       4036
>>> 0xFFFFFFFF  INTR 000205C5D0 - 000205D5CF 0000000000      4080        576
>>> Memory exception at 2cbe13c (illegal address) Unexpected trap ( 9)
>>> at address 0x02014228 data access exception at 0x02CBE13C
>>>
>>> Note: the "Memory exception" error only happens when I enable the
>>> stack checker, but I assume this is expected.
>>>
>> Maybe.. maybe not.. :)
>>> "Ok", I thought, "the default stack is not enough, so let's change
>>> it" - and that's what I've been trying to do for the last couple of
>>> days, with no success.
>>>
>>> The first thing I tried was to change the stack size for the
>>> application, setting CONFIGURE_MINIMUM_STACK_SIZE to 8 kbytes, and
>>> changing the creation of the task, using
>>> RTEMS_CONFIGURED_MINIMUM_STACK_SIZE. But as a new stack report has
>>> shown, it seems to have no effect on the stack size. The same goes
>>> for the CONFIGURE_EXTRA_TASK_STACKS #define.
>>>
>> CONFIGURE_MINIMUM_STACK_SIZE and the change you made to
>> rtems_task_create for the "ISIS" task should have changed its size to
>> 8K.
>>
>> CONFIGURE_EXTRA_TASK_STACKS just reserves memory in the work space to
>> account for tasks which are created with greater than minimum.
>>> Starting to be worried, I've tried to change the start address of
>>> the RTEMS work area by using CONFIGURE_EXECUTIVE_RAM_WORK_AREA, just
>>> to see what happens. Again, nothing different.
>>>
>>> To illustrate, here is the configuration with everything I tried.
>>> The corresponding stack report is exactly the same as above.
>>>
>>> #define CONFIGURE_MINIMUM_STACK_SIZE 		(1024 * 8)
>>> #define CONFIGURE_EXTRA_TASK_STACKS 		(1024 * 8)
>>> #define CONFIGURE_EXECUTIVE_RAM_WORK_AREA	0x02100000
>>> #define CONFIGURE_STACK_CHECKER_ENABLED
>>>
>> Hmmm... I think CONFIGURE_EXECUTIVE_RAM_WORK_AREA may not be honoured
>> by most BSPs and is definitely NOT supported with the new shared
>> workspace framework.
>>
>> Anyway, the sparc BSPs are definitely overwriting that field in
>> 4.10 without honouring if it was NULL or not.
>>> Now a little bit desperate, I went into the ERC32 BSP code. There is
>>> a STACK_SIZE defined in the start.S file, but not used there. The
>>> same is redefined in bspgetworkarea.c. But the way it is used
>>> suggests to me that the RTEMS work area shall not touch this area:
>>>
>> Yes .. unfortunately that is defined in two (or three) places. :(
>>
>> But it has nothing to do with task stack size.  It is the size of the
>> stack that the BSP initialization runs on until the switch to the
>> first task.
>>> void bsp_get_work_area(...) {
>>>     /* must be identical to STACK_SIZE in start.S */
>>>     #define STACK_SIZE (16 * 1024)
>>>     *work_area_start      = &end;
>>>     *work_area_size       = (void *)rdb_start - (void *)&end - STACK_SIZE;
>>>
>>> With "end" being a symbol set in linkcmds.base, pointing to the end
>>> of the .bss area, and "rdb_start" pointing to the end of RAM, I
>>> assumed that RTEMS sets the ERC stack pointer to work_area_start +
>>> work_area_size, but this seems not to be the case.
>>>
>>    From c/src/lib/libbsp/sparc/shared/start.S
>>
>>           set     (SYM(rdb_start)), %g6   ! End of RAM
>>           st      %sp, [%g6]
>>           sub     %sp, 4, %sp             ! stack starts at end of RAM - 4
>>           andn    %sp, 0x0f, %sp          ! align stack on 16-byte boundary
>>           mov     %sp, %fp                ! Set frame pointer
>>           nop
>>
>> The starting stack pointer is set to the end of RAM and grows down.
>> The area from end of ram to "end of ram - STACK_SIZE" is the starting
>> stack.
>>
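
The arithmetic behind that layout can be sketched in plain C. This is an illustration, not BSP code: initial_sp() and work_area_size() are hypothetical names that mirror the start.S instructions and the bsp_get_work_area() expression quoted above, and the end-of-RAM value in the usage note is an assumption derived from the reported numbers, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE (16u * 1024u)   /* value quoted from bspgetworkarea.c */

/* Mirrors "sub %sp, 4, %sp" followed by "andn %sp, 0x0f, %sp",
 * assuming %sp holds the end-of-RAM address on entry. */
static uint32_t initial_sp(uint32_t end_of_ram)
{
    return (end_of_ram - 4u) & ~UINT32_C(0x0f);
}

/* Mirrors "rdb_start - &end - STACK_SIZE": the workspace is whatever
 * lies between the end of .bss and the start of the boot stack. */
static uint32_t work_area_size(uint32_t end_of_ram, uint32_t end_of_bss)
{
    assert(end_of_ram >= end_of_bss);
    assert(end_of_ram - end_of_bss >= STACK_SIZE);
    return end_of_ram - end_of_bss - STACK_SIZE;
}
```

Taking the end of .bss as 0x0205c150 and assuming RAM ends at 0x02400000, work_area_size() gives 0x39feb0, which matches the work_area_size reported in the original post, and initial_sp() gives 0x023ffff0. Neither value depends on the task stack size configuration.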
>>> Digging deeper into the BSP code, I saw that _CPU_Context_switch
>>> saves the registers, including the stack pointer (not touched by
>>> RTEMS yet). At the first call to _CPU_Context_restore_heir, the stack
>>> pointer is "restored" to an address that I couldn't relate to
>>> anything:
>>>
>>> - work_area_start = 0x205c150 (end of the .bss section)
>>> - work_area_size = 0x39feb0 (last RAM address - first free RAM
>>>   address - STACK_SIZE)
>>> - restored stack pointer at _CPU_Context_restore_heir = 0x20606f0
>>>   (???)
>>>
>> Task stacks come from the RTEMS Workspace.  That stack pointer
>> (0x20606f0) is within the right address range since it is between
>> work_area_start and its end.  It is also properly aligned.  I think
>> that is correct.
>>
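
That sanity check is easy to write down. The helper below is a hypothetical sketch (sp_plausible() is not an RTEMS function) that tests whether a stack pointer lies inside the work area and is suitably aligned; the 8-byte alignment used in the usage note is an assumption about the SPARC ABI, not a value taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if sp lies in [start, start + size) and is a multiple of
 * align, which must be a power of two. */
static int sp_plausible(uint32_t sp, uint32_t start,
                        uint32_t size, uint32_t align)
{
    assert(align != 0u && (align & (align - 1u)) == 0u);
    return sp >= start
        && (sp - start) < size
        && (sp & (align - 1u)) == 0u;
}
```

With work_area_start 0x0205c150 and work_area_size 0x39feb0, the restored pointer 0x020606f0 passes the check, while the faulting address from the trap message, 0x02CBE13C, does not: it is outside the work area altogether.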
>> It looks to me that you have a stray write over something on the task
>> stack.  It could be as simple as someone writing too many bytes into a
>> buffer on the stack that isn't that large.
>>
>> Step out of that and watch the %sp of the task.  At some point, it
>> must be going bad.  It is doing that because it is being restored from
>> RAM and that memory must have been written to unintentionally.
>>
>> This will sound hard but you need to figure out what address the bad
>> %sp is coming from and set a watchpoint on accesses to it.  At some
>> point, the bad value will go in.  Then you have your culprit.
>>
>> Then the question is to find the fix.
>>> That's when I, already dizzy, stopped trying and decided to ask the
>>> list. It must be something (maybe elementary) that I'm doing wrong,
>>> but I don't know what it could be, nor where to look anymore.
>>>
>> You have pulled a lot of thread and looked for a horse (e.g. blown
>> stack) which it smells to me like a zebra (e.g. stray write onto stack
>> memory).
>>
>> --joel
>>> Thanks for your time and best regards,
>>>
>>> Fabrício Kucinskis.