Unable to reconfigure the stack size

Fabricio de Novaes Kucinskis fabricio at satelite.dea.inpe.br
Mon Nov 21 17:31:51 UTC 2011


Hi Joel, and thank you for this Sunday answer!

So, RTEMS should have changed the stack size when I defined 
CONFIGURE_MINIMUM_STACK_SIZE and used RTEMS_CONFIGURED_MINIMUM_STACK_SIZE when 
creating the task, but it didn't. 

(I think my other comments - stack checker error, STACK_SIZE and other RTEMS 
configs - can be ignored for now, for the good of the discussion.)

I'd like to know whether anyone on the list has managed to change the default 
stack size for the ERC32/SIS BSPs. If you have tried it, please report whether 
it worked or not.

Concerning this:

>> You have pulled a lot of thread and looked for a horse (e.g. blown 
>> stack) which it smells to me like a zebra (e.g. stray write onto stack 
>> memory).

Among all the details, I forgot to add one (remember, I was dizzy ;) ). Apart 
from the stack usage of the ISIS task (4036 bytes, almost at the 4 kbyte 
limit) a few instructions before the error, the local variable that is 
overwritten by the memcpy operation lies inside the .bss area:

- Local variable address: 0x02058920
- End of .bss section:    0x0205c150
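
To make the comparison explicit, here is a minimal C sketch that checks the 
address against the figures in this thread. The stack ranges are copied from 
the stack report quoted further down; the type and function names 
(stack_range, stack_containing, below_bss_end) are made up for illustration 
and are not RTEMS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range, in the form the stack report prints it. */
typedef struct {
    const char *name;
    uint32_t    low;
    uint32_t    high;
} stack_range;

/* Ranges copied from the stack report quoted later in this message. */
static const stack_range report[] = {
    { "IDLE", 0x0205E2D0u, 0x0205F2DFu },
    { "ISIS", 0x02060BB0u, 0x02061BBFu },
    { "INTR", 0x0205C5D0u, 0x0205D5CFu },
};

/* End of the .bss section as reported above. */
#define BSS_END 0x0205C150u

/* Name of the reported stack that contains addr, or NULL if none does. */
static const char *stack_containing(uint32_t addr)
{
    for (size_t i = 0; i < sizeof report / sizeof report[0]; i++) {
        assert(report[i].low <= report[i].high);
        if (addr >= report[i].low && addr <= report[i].high)
            return report[i].name;
    }
    return NULL;
}

/* True when addr lies below the end of .bss, i.e. in the linked image. */
static bool below_bss_end(uint32_t addr)
{
    return addr < BSS_END;
}
```

With these numbers, 0x02058920 is below the end of .bss and inside none of 
the three reported stacks.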

Also, the instruction that causes the overwrite seems to be pretty safe:

memcpy(&destinationElement, &sourceElement, sizeof(Element)); [not exactly 
this, but this is what it does]

Finally, when I reduce the size of the arrays that are placed inside the .bss 
area (moving its end farther from the start of the RTEMS stacks), the error 
does not occur.

So, do you think this could be a bug in the ERC32 BSP? If so, where should I 
look? This whole stack configuration seems a little complicated to me.

Thanks again,

Fabrício.


On Sun, 20 Nov 2011 10:18:06 -0600, Joel Sherrill wrote
> On 11/20/2011 09:19 AM, Fabricio de Novaes Kucinskis wrote:
> > Hello everybody,
> >
> > I have an application that has blown the stack, tried a lot of things to
> > fix it, and up to now nothing worked. In fact, nothing that I've tried so
> > far had any effect on the stack size.
> >
> > It's clear to me that I'm missing or misunderstanding something. In order
> > to discover what, here follows a detailed description (sorry for the
> > length) of my problem, and what I've tried - my hope is that, by
> > describing in detail, I expose what I'm doing wrong and allow you to
> > point it out.
> >
> > I'm using RTEMS 4.10.0 for the SIS BSP.
> >
> > I have a task that demands more than the RTEMS default stack size for the
> > ERC32 (I'm using SIS to try it, but I think this should not be an issue).
> > At some point a local variable is overwritten by a memcpy applied to a
> > large array in the .bss area, and the application falls in an infinite
> > loop.
> >
> With sis you can use the watch command to find out where the write 
> comes from.  It may not be a stack overflow but a stray write that 
> just happens to hit the stack.
> > Follows the stack report taken immediately before the blow:
> >
> > Stack usage by thread
> >      ID      NAME    LOW          HIGH     CURRENT     AVAILABLE     USED
> > 0x09010001  IDLE 000205E2D0 - 000205F2DF 000205F0A0      4096        752
> > 0x0A010002  ISIS 0002060BB0 - 0002061BBF 00020617C8      4096       4036
> > 0xFFFFFFFF  INTR 000205C5D0 - 000205D5CF 0000000000      4080        576
> > Memory exception at 2cbe13c (illegal address)
> > Unexpected trap ( 9) at address 0x02014228
> > data access exception at 0x02CBE13C
> >
> > Note: the "Memory exception" error only happens when I enable the 
> > stack checker, but I assume this is expected.
> >
> Maybe.. maybe not.. :)
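
The ISIS line of the stack report above can be checked with a little 
arithmetic. The C sketch below is only an illustration: the struct and 
function names are invented here, and it assumes the stack grows downward 
from HIGH, as it does on SPARC.

```c
#include <assert.h>
#include <stdint.h>

/* One line of the stack report, using the column names it prints. */
typedef struct {
    uint32_t low;        /* LOW: lowest address of the stack area    */
    uint32_t high;       /* HIGH: highest address of the stack area  */
    uint32_t current;    /* CURRENT: stack pointer at report time    */
    uint32_t available;  /* AVAILABLE: usable bytes                  */
    uint32_t used;       /* USED: high-water mark in bytes           */
} stack_line;

/* Values copied from the ISIS line of the report. */
static const stack_line isis = {
    0x02060BB0u, 0x02061BBFu, 0x020617C8u, 4096u, 4036u
};

/* Bytes covered by the inclusive LOW..HIGH range. */
static uint32_t span_bytes(const stack_line *s)
{
    assert(s->high >= s->low);
    return s->high - s->low + 1u;
}

/* Bytes never touched so far: AVAILABLE minus the high-water mark. */
static uint32_t headroom_bytes(const stack_line *s)
{
    assert(s->available >= s->used);
    return s->available - s->used;
}

/* Bytes in use at report time, assuming a downward-growing stack. */
static uint32_t current_depth(const stack_line *s)
{
    assert(s->current <= s->high);
    return s->high + 1u - s->current;
}
```

For the ISIS task this gives 60 bytes of headroom below the high-water mark, 
while the depth at the moment of the report is 1016 bytes.
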
> > "Ok", I thought, "the default stack is not enough, so let's change it" -
> > and that's what I've been trying to do for the last couple of days, with
> > no success.
> >
> > The first thing I tried was to change the stack size for the
> > application, setting CONFIGURE_MINIMUM_STACK_SIZE to 8 kbytes, and
> > changing the creation of the task, using
> > RTEMS_CONFIGURED_MINIMUM_STACK_SIZE. But as a new stack report has
> > shown, it seems to have no effect on the stack size. The same goes for
> > the CONFIGURE_EXTRA_TASK_STACKS #define.
> >
> CONFIGURE_MINIMUM_STACK_SIZE and the change you made to 
> rtems_task_create for the "ISIS" task should have changed its size to 
> 8K.
> 
> CONFIGURE_EXTRA_TASK_STACKS just reserves memory in the work space to 
> account for tasks which are created with greater than minimum.
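
One way to read the explanation above: the value given to 
CONFIGURE_EXTRA_TASK_STACKS is the sum, over all tasks, of whatever each task 
requests beyond the configured minimum. The helper below is a hypothetical 
sketch of that arithmetic in plain C; extra_task_stacks is not an RTEMS 
function, and the sizes passed to it are whatever the application requests.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sum of the stack bytes requested above the configured minimum.
 * Tasks that ask for the minimum or less contribute nothing.
 */
static size_t extra_task_stacks(const size_t *requested, size_t count,
                                size_t minimum)
{
    size_t extra = 0;

    assert(requested != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (requested[i] > minimum)
            extra += requested[i] - minimum;
    }
    return extra;
}
```

Under this reading, a task created with the configured minimum needs no 
extra reservation at all, which fits the remark that the setting only 
reserves memory and does not resize any stack.
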
> > Starting to be worried, I've tried to change the start address of the
> > RTEMS work area by using CONFIGURE_EXECUTIVE_RAM_WORK_AREA, just to see
> > what happens. Again, nothing different.
> >
> > To illustrate, follows the configuration with everything I tried. 
> > The corresponding stack report is exactly the same as above.
> >
> > #define CONFIGURE_MINIMUM_STACK_SIZE 		(1024 * 8)
> > #define CONFIGURE_EXTRA_TASK_STACKS 		(1024 * 8)
> > #define CONFIGURE_EXECUTIVE_RAM_WORK_AREA	0x02100000
> > #define CONFIGURE_STACK_CHECKER_ENABLED
> >
> Hmmm... I think CONFIGURE_EXECUTIVE_RAM_WORK_AREA may not be honoured 
> by most BSPs and is definitely NOT supported with the new shared 
> workspace framework.
> 
> Anyway, the sparc BSPs are definitely overwriting that field in
> 4.10 without honouring if it was NULL or not.
> > Now a little bit desperate, I went into the ERC32 BSP code. There is
> > a STACK_SIZE defined in the start.S file, but not used there. The
> > same is redefined in bspgetworkarea.c. But the way it is used
> > suggests to me that the RTEMS work area shall not touch into this area:
> >
> Yes .. unfortunately that is defined in two (or three) places. :(
> 
> But it has nothing to do with task stack size.  It is the size of the 
> stack that the BSP initialization runs on until the switch to the 
> first task.
> > void bsp_get_work_area(...) {
> >    /* must be identical to STACK_SIZE in start.S */
> >    #define STACK_SIZE (16 * 1024)
> >    *work_area_start      =&end;
> >    *work_area_size       = (void *)rdb_start - (void *)&end - STACK_SIZE;
> >
> > Being "end" a symbol set at linkcmds.base, pointing to the end of
> > the .bss area, and "rdb_start" pointing to the end of RAM, I assumed
> > that RTEMS sets the ERC stack pointer to work_area_start +
> > work_area_size, but this seems not to be the case.
> >
>  From c/src/lib/libbsp/sparc/shared/start.S
> 
>          set     (SYM(rdb_start)), %g6   ! End of RAM
>          st      %sp, [%g6]
>          sub     %sp, 4, %sp             ! stack starts at end of RAM - 4
>          andn    %sp, 0x0f, %sp          ! align stack on 16-byte boundary
>          mov     %sp, %fp                ! Set frame pointer
>          nop
> 
> The starting stack pointer is set to the end of RAM and grows down.
> The area from end of ram to "end of ram - STACK_SIZE" is the starting 
> stack.
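
The same computation can be written in C to see which addresses come out. 
This is only a sketch of the two instructions "sub %sp, 4, %sp" and 
"andn %sp, 0x0f, %sp" from the start.S excerpt above. The function names are 
invented, and the end-of-RAM value is an assumption passed in by the caller. 
Adding work_area_start, work_area_size and STACK_SIZE from the figures in 
this thread gives 0x02400000.

```c
#include <assert.h>
#include <stdint.h>

/* 16 KiB, the STACK_SIZE value shown in bspgetworkarea.c in this thread. */
#define STACK_SIZE (16u * 1024u)

/* Mirrors "sub %sp, 4, %sp" followed by "andn %sp, 0x0f, %sp". */
static uint32_t initial_sp(uint32_t end_of_ram)
{
    assert(end_of_ram >= STACK_SIZE);
    return (end_of_ram - 4u) & ~(uint32_t)0x0f;
}

/* Lowest address of the start-up stack: end of RAM minus STACK_SIZE. */
static uint32_t startup_stack_floor(uint32_t end_of_ram)
{
    assert(end_of_ram >= STACK_SIZE);
    return end_of_ram - STACK_SIZE;
}
```

With an end of RAM of 0x02400000, the start-up stack would begin at 
0x023FFFF0 and its region would end at 0x023FC000. That region is separate 
from the task stacks, which come from the RTEMS Workspace.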
> 
> > Digging deeper into the BSP code, I saw that _CPU_Context_switch
> > saves the registers, including the stack pointer (not touched by
> > RTEMS yet). At the first call to _CPU_Context_restore_heir, the stack
> > pointer is "restored" to an address that I couldn't relate to anything:
> >
> > - work_area_start = 0x205c150 (end of the .bss section)
> > - work_area_size = 0x39feb0 (last RAM address - first free RAM address -
> >   STACK_SIZE)
> > - restored stack pointer at _CPU_Context_restore_heir = 0x20606f0 (???)
> >
> Tasks stacks are from the RTEMS Workspace.  That stack pointer
> (0x20606f0) is within the right address range since it is between 
> work_area_start and its end.  It is also properly aligned.  I think 
> that is correct.
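
The range and alignment check described above is easy to reproduce. The C 
sketch below uses work_area_start and work_area_size exactly as quoted; the 
function names are illustrative and do not come from RTEMS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this thread. */
#define WORK_AREA_START 0x0205c150u
#define WORK_AREA_SIZE  0x0039feb0u

/* First address past the work area. */
static uint32_t work_area_end(void)
{
    return WORK_AREA_START + WORK_AREA_SIZE;
}

/* True when addr lies in [WORK_AREA_START, work_area_end()). */
static bool in_work_area(uint32_t addr)
{
    return addr >= WORK_AREA_START && addr < work_area_end();
}

/* True when addr is a multiple of alignment (a power of two). */
static bool is_aligned(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (addr & (alignment - 1u)) == 0u;
}
```

With these figures the work area ends at 0x023FC000, so the restored stack 
pointer 0x20606f0 is inside it and is 16-byte aligned. The faulting address 
0x02CBE13C from the memory exception is not inside it.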
> 
> It looks to me that you have a stray write over something on the task 
> stack.  It could be as simple as someone writing too many bytes into a 
> buffer on the stack that isn't that large.
> 
> Step out of that and watch the %sp of the task.  At some point, it 
> must be going bad.  It is doing that because it is being restored from 
> RAM and that memory must have been written to unintentionally.
> 
> This will sound hard but you need to figure out what address the bad 
> %sp is coming from and set a watchpoint on accesses to it.  At some 
> point, the bad value will go in.  Then you have your culprit.
> 
> Then the question is to find the fix.
> > That's when I, already dizzy, stopped trying and decided to ask the list.
> > It shall be something (maybe elementary) that I'm doing wrong, but I
> > don't know what it could be, nor where to look at anymore.
> >
> You have pulled a lot of thread and looked for a horse (e.g. blown
> stack) which it smells to me like a zebra (e.g. stray write onto stack 
> memory).
> 
> --joel
> > Thanks for your time and best regards,
> >
> > Fabrício Kucinskis.



