Unable to reconfigure the stack size

Sun Nov 20 16:18:06 UTC 2011

On 11/20/2011 09:19 AM, Fabricio de Novaes Kucinskis wrote:
> Hello everybody,
>
> I have an application that has blown the stack, tried a lot of things to fix
> it, and up to now nothing worked. In fact, nothing that I've tried so far had
> any effect on the stack size.
>
> It's clear to me that I'm missing or misunderstanding something. To find out
> what, here is a detailed description (sorry for the length) of my problem and
> what I've tried. My hope is that, by describing it in detail, I expose what
> I'm doing wrong and allow you to point it out.
>
> I'm using RTEMS 4.10.0 for the SIS BSP.
>
> I have a task that demands more than the RTEMS default stack size for the
> ERC32 (I'm using SIS to try it, but I think this should not be an issue). At
> some point a local variable is overwritten by a memcpy applied to a large
> array in the .bss area, and the application falls in an infinite loop.
>
With sis you can use the watch command to find out where the
write comes from.  It may not be a stack overflow but a stray write
that just happens to hit the stack.
> Here is the stack report taken immediately before the blow-up:
>
> Stack usage by thread
>      ID      NAME    LOW          HIGH     CURRENT     AVAILABLE     USED
> 0x09010001  IDLE 000205E2D0 - 000205F2DF 000205F0A0      4096        752
> 0x0A010002  ISIS 0002060BB0 - 0002061BBF 00020617C8      4096       4036
> 0xFFFFFFFF  INTR 000205C5D0 - 000205D5CF 0000000000      4080        576
> Memory exception at 2cbe13c (illegal address)
> Unexpected trap ( 9) at address 0x02014228
> data access exception at 0x02CBE13C
>
> Note: the "Memory exception" error only happens when I enable the stack
> checker, but I assume this is expected.
>
Maybe.. maybe not.. :)
> "Ok", I thought, "the default stack is not enough, so let's change it" - and
> that's what I've been trying to do for the last couple of days, with no
> success.
>
> The first thing I tried was to change the stack size for the application,
> setting CONFIGURE_MINIMUM_STACK_SIZE to 8 kbytes, and changing the creation of
> the task, using RTEMS_CONFIGURED_MINIMUM_STACK_SIZE. But as a
> new stack report has shown, it seems to have no effect on the stack size. The
> same goes for the CONFIGURE_EXTRA_TASK_STACKS #define.
>
CONFIGURE_MINIMUM_STACK_SIZE and the change you made to
rtems_task_create for the "ISIS" task should have changed its size
to 8K.

CONFIGURE_EXTRA_TASK_STACKS just reserves memory in the
work space to account for tasks which are created with stacks
larger than the minimum.
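As a rough illustration of that accounting (a sketch only:
extra_task_stacks is a made-up helper, not an RTEMS API, and it
assumes the reservation is simply the sum of what each task requests
beyond the minimum):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of RTEMS: bytes to reserve through
 * CONFIGURE_EXTRA_TASK_STACKS for tasks whose requested stack size
 * exceeds the configured minimum.
 */
size_t extra_task_stacks(const size_t *requested, size_t count,
                         size_t minimum)
{
    size_t extra = 0;

    assert(requested != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        /* Only the part above the minimum needs extra workspace. */
        if (requested[i] > minimum) {
            extra += requested[i] - minimum;
        }
    }
    return extra;
}
```

With the minimum raised to 8K and the task created with
RTEMS_CONFIGURED_MINIMUM_STACK_SIZE, the task asks for nothing beyond
the minimum, so by this accounting it needs no extra reservation.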
> Starting to get worried, I tried to change the start address of the RTEMS
> work area by using CONFIGURE_EXECUTIVE_RAM_WORK_AREA, just to see what
> would happen. Again, nothing different.
>
> To illustrate, here is the configuration with everything I tried. The
> corresponding stack report is exactly the same as above.
>
> #define CONFIGURE_MINIMUM_STACK_SIZE 		(1024 * 8)
> #define CONFIGURE_EXTRA_TASK_STACKS 		(1024 * 8)
> #define CONFIGURE_EXECUTIVE_RAM_WORK_AREA	0x02100000
> #define CONFIGURE_STACK_CHECKER_ENABLED
>
Hmmm... I think CONFIGURE_EXECUTIVE_RAM_WORK_AREA may
not be honoured by most BSPs and is definitely NOT supported with the
new shared workspace framework.

Anyway, the sparc BSPs are definitely overwriting that field in
4.10 without checking whether it was NULL or not.
> Now a little bit desperate, I went into the ERC32 BSP code. There is a
> STACK_SIZE defined in the start.S file, but not used there. It is
> redefined in bspgetworkarea.c, and the way it is used there suggests to me
> that the RTEMS work area must not extend into this area:
>
Yes .. unfortunately that is defined in two (or three) places. :(

But it has nothing to do with task stack size.  It is the size of the stack
that the BSP initialization runs on until the switch to the first task.
> void bsp_get_work_area(...) {
>    /* must be identical to STACK_SIZE in start.S */
>    #define STACK_SIZE (16 * 1024)
>    *work_area_start      =&end;
>    *work_area_size       = (void *)rdb_start - (void *)&end - STACK_SIZE;
>
> Since "end" is a symbol set in linkcmds.base that points to the end of the
> .bss area, and "rdb_start" points to the end of RAM, I assumed that RTEMS
> sets the ERC32 stack pointer to work_area_start + work_area_size, but this
> does not seem to be the case.
>
From c/src/lib/libbsp/sparc/shared/start.S:

         set     (SYM(rdb_start)), %g6   ! End of RAM
         st      %sp, [%g6]
         sub     %sp, 4, %sp             ! stack starts at end of RAM - 4
         andn    %sp, 0x0f, %sp          ! align stack on 16-byte boundary
         mov     %sp, %fp                ! Set frame pointer
         nop

The starting stack pointer is set to the end of RAM and grows down.
The area from end of ram to "end of ram - STACK_SIZE" is the starting
stack.
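
To make the layout concrete, here is a small host-side sketch of that
arithmetic. The function names are hypothetical, and the end-of-RAM
address 0x02400000 mentioned in the comments is an assumption,
inferred by adding the work_area_start, work_area_size and STACK_SIZE
values quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SIZE (16u * 1024u)  /* same value as in bspgetworkarea.c */

/*
 * Mirrors "sub %sp, 4, %sp" followed by "andn %sp, 0x0f, %sp":
 * step 4 bytes below the end of RAM, then round down to 16 bytes.
 * For an assumed end of RAM of 0x02400000 this gives 0x023FFFF0.
 */
uint32_t initial_stack_pointer(uint32_t end_of_ram)
{
    return (end_of_ram - 4u) & ~UINT32_C(0x0f);
}

/*
 * Lowest address of the startup stack. The work area handed to
 * RTEMS ends here, so it never overlaps the startup stack.
 * For an assumed end of RAM of 0x02400000 this gives 0x023FC000.
 */
uint32_t startup_stack_low(uint32_t end_of_ram)
{
    assert(end_of_ram >= STACK_SIZE);
    return end_of_ram - STACK_SIZE;
}
```

With the numbers from the report, work_area_start + work_area_size is
0x0205c150 + 0x0039feb0 = 0x023FC000, which is exactly where the
startup stack begins under that assumption.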

> Digging deeper into the BSP code, I saw that _CPU_Context_switch saves the
> registers, including the stack pointer (not touched by RTEMS yet). At the
> first call to _CPU_Context_restore_heir, the stack pointer is "restored" to an
> address that I couldn't relate to anything:
>
> - work_area_start = 0x205c150 (end of the .bss section)
> - work_area_size = 0x39feb0 (last RAM address - first free RAM address -
> STACK_SIZE)
> - restored stack pointer at _CPU_Context_restore_heir = 0x20606f0 (???)
>
Task stacks are allocated from the RTEMS Workspace.  That stack pointer
(0x20606f0) is within the right address range since it is between
work_area_start and its end.  It is also properly aligned.  I think
that is correct.
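
The check I did in my head can be written down as a short sketch.
The helper names are made up for illustration, and the 16-byte
alignment is taken from the "andn %sp, 0x0f, %sp" in start.S rather
than from any RTEMS constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if sp lies inside [start, start + size). */
bool sp_in_work_area(uint32_t sp, uint32_t start, uint32_t size)
{
    return sp >= start && (sp - start) < size;
}

/* True if sp is a multiple of alignment (which must be a power of 2). */
bool sp_is_aligned(uint32_t sp, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return (sp & (alignment - 1u)) == 0;
}
```

Called with sp = 0x020606f0, start = 0x0205c150 and size = 0x0039feb0,
both checks pass. The faulting address 0x02CBE13C from the trap
message fails the range check, so that access went outside the work
area entirely.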

It looks to me that you have a stray write over something on
the task stack.  It could be as simple as someone writing too
many bytes into a buffer on the stack that isn't that large.

Step out of that and watch the %sp of the task.  At some point,
it must be going bad.  It is doing that because it is being
restored from RAM and that memory must have been written
to unintentionally.

This will sound hard but you need to figure out what address
the bad %sp is coming from and set a watchpoint on accesses
to it.  At some point, the bad value will go in.  Then you have
your culprit.

Then the question is how to fix it.
> That's when I, already dizzy, stopped trying and decided to ask the list. It
> must be something (maybe elementary) that I'm doing wrong, but I don't know
> what it could be, or where to look anymore.
>
You have pulled on a lot of threads looking for a horse (i.e. a blown stack),
but this smells to me like a zebra (i.e. a stray write onto stack memory).

--joel
> Thanks for your time and best regards,
>
> Fabrício Kucinskis.