Unable to reconfigure the stack size
Fabricio de Novaes Kucinskis
fabricio at satelite.dea.inpe.br
Tue Nov 22 03:38:23 UTC 2011
Hello Gedare,
Thanks for the info on the boot-up stack. My application was really
overflowing the stack, since a local variable was allocated in the .bss area.
But I fixed the configuration following what Joel pointed out, and now the app
has enough stack to run.
Thank you and best regards,
Fabrício.
On Mon, 21 Nov 2011 12:34:23 -0500, Gedare Bloom wrote
> Just to add on: note that the "StackStart" (BSP stack) is a small
> stack area that is only used during boot-up. The applications get
> their own stacks that are typically allocated through the Workspace
> during rtems' initialization. It is possible that the application
> task stack is overflowing, and if increasing
> CONFIGURE_MINIMUM_TASK_STACK_SIZE to a large value (e.g. 32 *
> minimum) changes the application behavior then that is a good indicator.
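
A minimal sketch of the configuration change described above, assuming an application that is configured through confdefs.h and uses the Classic API. The task count, the priority, and the 8 KB figure are illustrative assumptions, and create_isis_task is a hypothetical helper name; none of them are confirmed by this thread. The fragment only builds with an RTEMS toolchain.

```c
/* Sketch only: needs an RTEMS toolchain, not a host compiler. */
#include <rtems.h>

/* Init() is assumed to be defined elsewhere in the application. */
rtems_task Init(rtems_task_argument ignored);

#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_MAXIMUM_TASKS            2   /* illustrative */
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE

/* Note the _TASK_ in the option name. */
#define CONFIGURE_MINIMUM_TASK_STACK_SIZE  (8 * 1024)

#define CONFIGURE_INIT
#include <rtems/confdefs.h>

/* Hypothetical helper: create the task with the configured minimum stack. */
rtems_status_code create_isis_task(rtems_id *id)
{
  return rtems_task_create(
    rtems_build_name('I', 'S', 'I', 'S'),
    10,                                   /* illustrative priority */
    RTEMS_CONFIGURED_MINIMUM_STACK_SIZE,  /* ask for the configured minimum */
    RTEMS_DEFAULT_MODES,
    RTEMS_DEFAULT_ATTRIBUTES,
    id
  );
}
```

If a new stack report shows a larger AVAILABLE value for the task after this change, the option took effect.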
>
> On Mon, Nov 21, 2011 at 12:19 PM, Joel Sherrill
> <joel.sherrill at oarcorp.com> wrote:
> > On 11/21/2011 11:31 AM, Fabricio de Novaes Kucinskis wrote:
> >>
> >> Hi Joel, and thank you for this Sunday answer!
> >>
> >> So, RTEMS should have changed the stack size when I defined
> >> CONFIGURE_MINIMUM_STACK_SIZE and used RTEMS_CONFIGURED_MINIMUM_STACK_SIZE
> >> when creating the task, but it didn't.
> >
> > The constant is CONFIGURE_MINIMUM_TASK_STACK_SIZE.
> >
> > Using the wrong name is certainly going to have no effect.
> >
> > Sorry I didn't see that on a Sunday.
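
Why a misspelled option has no effect at all can be sketched on a host compiler. The snippet below is not RTEMS code: DEFAULT_MIN_STACK and the #ifdef block are hypothetical stand-ins for a configuration header that only checks the option name it knows about.

```c
/* Hypothetical default, standing in for a CPU/BSP minimum stack size. */
#define DEFAULT_MIN_STACK (4 * 1024)

/* The application defines the misspelled option (missing _TASK_). */
#define CONFIGURE_MINIMUM_STACK_SIZE (8 * 1024)

/* Stand-in for the configuration header: it tests only the name it
 * knows, so the macro above is never looked at and no error is raised. */
#ifdef CONFIGURE_MINIMUM_TASK_STACK_SIZE
  #define EFFECTIVE_MIN_STACK CONFIGURE_MINIMUM_TASK_STACK_SIZE
#else
  #define EFFECTIVE_MIN_STACK DEFAULT_MIN_STACK
#endif

/* Returns the minimum stack size the "system" would actually use. */
unsigned effective_min_stack(void)
{
  return EFFECTIVE_MIN_STACK;
}
```

Here effective_min_stack() still returns the 4 KB default. Renaming the define to CONFIGURE_MINIMUM_TASK_STACK_SIZE makes it return 8 KB. The preprocessor cannot warn about an unknown option name, which is why the stack report did not change.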
> >>
> >> (I think my other comments - stack checker error, STACK_SIZE and other
> >> RTEMS configs - can be ignored for now, for the good of the discussion.)
> >
> > Yep.
> >>
> >> I'd like to know whether anyone on the list has been able to change the
> >> default stack size for the ERC32/SIS BSPs. If someone has tried it, please
> >> report whether it worked or not.
> >>
> > Try the right constant name.
> >
> > One of the examples-v2/ticker/low_ticker variations shows how to use this
> > and I know it works on sis because that is where my workspace size numbers
> > in presentations came from.
> >>>
> >>> You have pulled a lot of thread and looked for a horse (e.g. blown
> >>> stack) which it smells to me like a zebra (e.g. stray write onto stack
> >>> memory).
> >>
> >> Among all the details, I forgot to add (remember, I was dizzy ;)) that,
> >> apart from the stack usage of the ISIS task (4036 bytes, almost at the
> >> 4 kbyte limit) a few instructions before the error, the local variable
> >> that is overwritten in the memcpy operation is inside the .bss area:
> >>
> >> - Local variable address: 0x02058920
> >> - End of .bss section: 0x0205c150
> >>
> >> Also, the instruction that causes the overwrite seems to be pretty safe:
> >>
> >> memcpy(&destinationElement, &sourceElement, sizeof(Element)); [not exactly
> >> this, but this is what it does]
> >
> > The instruction is safe .. but maybe not the destination address or
> > contents of the source.
> >
> >> Finally, when I reduce the size of the arrays that are placed inside the
> >> .bss area (moving its end far from the RTEMS stack start), I have no error.
> >
> > Something else got corrupted. :)
> >
> > Since you know the memcpy is the culprit, is the destination correct given
> > the source?
> >
> > Can you do a backtrace?
> >
> >> So, do you think this can be a bug in the ERC32 BSP? If so, where should
> >> I look? This whole stack configuration seems a little complicated to me.
> >
> > It isn't a BSP error. You are probably lucky it is reproducible. :)
> >
> > --joel
> >>
> >> Thanks again,
> >>
> >> Fabrício.
> >>
> >>
> >> On Sun, 20 Nov 2011 10:18:06 -0600, Joel Sherrill wrote
> >>>
> >>> On 11/20/2011 09:19 AM, Fabricio de Novaes Kucinskis wrote:
> >>>>
> >>>> Hello everybody,
> >>>>
> >>>> I have an application that has blown the stack, tried a lot of
> >>>> things to fix it, and up to now nothing worked. In fact, nothing that
> >>>> I've tried so far had any effect on the stack size.
> >>>>
> >>>> It's clear to me that I'm missing or misunderstanding something. To
> >>>> find out what, here is a detailed description (sorry for the length)
> >>>> of my problem and what I've tried. My hope is that, by describing it
> >>>> in detail, I expose what I'm doing wrong and allow you to point it out.
> >>>>
> >>>> I'm using RTEMS 4.10.0 for the SIS BSP.
> >>>>
> >>>> I have a task that demands more than the RTEMS default stack size
> >>>> for the ERC32 (I'm using SIS to try it, but I think this should not be
> >>>> an issue). At some point a local variable is overwritten by a memcpy
> >>>> applied to a large array in the .bss area, and the application falls
> >>>> into an infinite loop.
> >>>
> >>> With sis you can use the watch command to find out where the write
> >>> comes from. It may not be a stack overflow but a stray write that
> >>> just happens to hit the stack.
> >>>>
> >>>> Here is the stack report taken immediately before the blow:
> >>>>
> >>>> Stack usage by thread
> >>>> ID         NAME LOW          HIGH       CURRENT    AVAILABLE USED
> >>>> 0x09010001 IDLE 000205E2D0 - 000205F2DF 000205F0A0 4096      752
> >>>> 0x0A010002 ISIS 0002060BB0 - 0002061BBF 00020617C8 4096      4036
> >>>> 0xFFFFFFFF INTR 000205C5D0 - 000205D5CF 0000000000 4080      576
> >>>> Memory exception at 2cbe13c (illegal address)
> >>>> Unexpected trap ( 9) at address 0x02014228
> >>>> data access exception at 0x02CBE13C
> >>>>
> >>>> Note: the "Memory exception" error only happens when I enable the
> >>>> stack checker, but I assume this is expected.
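
How a USED column like the one in the report can be produced is sketched below with the general fill-pattern (high-water mark) technique. This is an illustration run on a simulated stack area, not the RTEMS stack checker's actual code. The 0xA5 fill byte, the function names, and the downward growth direction are assumptions of the sketch.

```c
#include <stddef.h>
#include <string.h>

#define FILL_PATTERN 0xA5u  /* assumed fill byte for this sketch */

/* Paint a (simulated) stack area before the task starts using it. */
void stack_paint(unsigned char *low, size_t size)
{
  memset(low, FILL_PATTERN, size);
}

/* The stack is assumed to grow downward, so untouched bytes sit at the
 * low end. Count them, then report the rest as the high-water mark. */
size_t stack_used(const unsigned char *low, size_t size)
{
  size_t untouched = 0;
  while (untouched < size && low[untouched] == FILL_PATTERN) {
    untouched++;
  }
  return size - untouched;
}
```

With a 4096-byte area in which the top 4036 bytes have been written, stack_used() reports 4036, leaving only 60 bytes of pattern. A scan like this only shows how deep the stack has been touched. It cannot tell a genuine overflow from a stray write that happens to land in the stack area.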
> >>>>
> >>> Maybe.. maybe not.. :)
> >>>>
> >>>> "Ok", I thought, "the default stack is not enough, so let's change
> >>>> it" - and that's what I've been trying to do for the last couple of
> >>>> days, with no success.
> >>>>
> >>>> The first thing I tried was to change the stack size for the
> >>>> application, setting CONFIGURE_MINIMUM_STACK_SIZE to 8 kbytes, and
> >>>> changing the creation of the task, using
> >>>> RTEMS_CONFIGURED_MINIMUM_STACK_SIZE. But as a new stack report has
> >>>> shown, it seems to have no effect on the stack size. The same goes
> >>>> for the CONFIGURE_EXTRA_TASK_STACKS #define.
> >>>>
> >>> CONFIGURE_MINIMUM_STACK_SIZE and the change you made to
> >>> rtems_task_create for the "ISIS" task should have changed its size to
> >>> 8K.
> >>>
> >>> CONFIGURE_EXTRA_TASK_STACKS just reserves memory in the work space to
> >>> account for tasks which are created with greater than minimum.
> >>>>
> >>>> Starting to be worried, I've tried to change the start address of
> >>>> the RTEMS work area by using CONFIGURE_EXECUTIVE_RAM_WORK_AREA, just
> >>>> to see what happens. Again, nothing different.
> >>>>
> >>>> To illustrate, here is the configuration with everything I tried.
> >>>> The corresponding stack report is exactly the same as above.
> >>>>
> >>>> #define CONFIGURE_MINIMUM_STACK_SIZE (1024 * 8)
> >>>> #define CONFIGURE_EXTRA_TASK_STACKS (1024 * 8)
> >>>> #define CONFIGURE_EXECUTIVE_RAM_WORK_AREA 0x02100000
> >>>> #define CONFIGURE_STACK_CHECKER_ENABLED
> >>>>
> >>> Hmmm... I think CONFIGURE_EXECUTIVE_RAM_WORK_AREA may not be honoured
> >>> by most BSPs and is definitely NOT supported with the new shared
> >>> workspace framework.
> >>>
> >>> Anyway, the sparc BSPs are definitely overwriting that field in
> >>> 4.10 without honouring if it was NULL or not.
> >>>>
> >>>> Now a little bit desperate, I went into the ERC32 BSP code. There is
> >>>> a STACK_SIZE defined in the start.S file, but not used there. The
> >>>> same is redefined in bspgetworkarea.c. But the way it is used
> >>>> suggests to me that the RTEMS work area shall not touch this area:
> >>>>
> >>> Yes .. unfortunately that is defined in two (or three) places. :(
> >>>
> >>> But it has nothing to do with task stack size. It is the size of the
> >>> stack that the BSP initialization runs on until the switch to the
> >>> first task.
> >>>>
> >>>> void bsp_get_work_area(...) {
> >>>>   /* must be identical to STACK_SIZE in start.S */
> >>>>   #define STACK_SIZE (16 * 1024)
> >>>>   *work_area_start = &end;
> >>>>   *work_area_size = (void *)rdb_start - (void *)&end - STACK_SIZE;
> >>>>
> >>>> Since "end" is a symbol set in linkcmds.base, pointing to the end of
> >>>> the .bss area, and "rdb_start" points to the end of RAM, I assumed
> >>>> that RTEMS sets the ERC32 stack pointer to work_area_start +
> >>>> work_area_size, but this seems not to be the case.
> >>>>
> >>> From c/src/lib/libbsp/sparc/shared/start.S
> >>>
> >>>         set     (SYM(rdb_start)), %g6   ! End of RAM
> >>>         st      %sp, [%g6]
> >>>         sub     %sp, 4, %sp             ! stack starts at end of RAM - 4
> >>>         andn    %sp, 0x0f, %sp          ! align stack on 16-byte boundary
> >>>         mov     %sp, %fp                ! Set frame pointer
> >>>         nop
> >>>
> >>> The starting stack pointer is set to the end of RAM and grows down.
> >>> The area from end of ram to "end of ram - STACK_SIZE" is the starting
> >>> stack.
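
The arithmetic in the start.S excerpt can be mirrored in C as a rough sketch. The function names are hypothetical, and the end-of-RAM value used in the note after the code (0x02400000) is an assumption for illustration, not a figure stated in the excerpt.

```c
#include <stdint.h>

#define STACK_SIZE (16u * 1024u)  /* boot stack size quoted from bspgetworkarea.c */

/* Mirror of the two arithmetic steps in the start.S excerpt:
 * subtract 4, then clear the low four bits (andn with 0x0f). */
uint32_t initial_sp(uint32_t ram_end)
{
  return (ram_end - 4u) & ~UINT32_C(0x0f);
}

/* Lowest address of the boot stack: end of RAM minus STACK_SIZE. */
uint32_t boot_stack_low(uint32_t ram_end)
{
  return ram_end - STACK_SIZE;
}
```

For an assumed end of RAM at 0x02400000, initial_sp() gives 0x023FFFF0 and boot_stack_low() gives 0x023FC000. Only the BSP initialization code runs in that 16 KB window. Task stacks are allocated elsewhere.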
> >>>
> >>>> Digging deeper into the BSP code, I saw that _CPU_Context_switch
> >>>> saves the registers, including the stack pointer (not touched by
> >>>> RTEMS yet). At the first call to _CPU_Context_restore_heir, the stack
> >>>> pointer is "restored" to an address that I couldn't relate to
> >>>> anything:
> >>>>
> >>>> - work_area_start = 0x205c150 (end of the .bss section)
> >>>> - work_area_size = 0x39feb0 (last RAM address - first free RAM
> >>>>   address - STACK_SIZE)
> >>>> - restored stack pointer at _CPU_Context_restore_heir = 0x20606f0 (???)
> >>>>
> >>> Task stacks are from the RTEMS Workspace. That stack pointer
> >>> (0x20606f0) is within the right address range since it is between
> >>> work_area_start and its end. It is also properly aligned. I think
> >>> that is correct.
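
The range and alignment check described above can be reproduced with the numbers quoted in this thread. This is a small host-side sketch with hypothetical helper names. The 16-byte alignment figure comes from the start.S comment and is assumed here to apply to task stacks as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies inside [start, start + size). */
bool in_work_area(uint32_t addr, uint32_t start, uint32_t size)
{
  return addr >= start && (addr - start) < size;
}

/* True if addr is a multiple of align (align must be a power of two). */
bool is_aligned(uint32_t addr, uint32_t align)
{
  return (addr & (align - 1u)) == 0u;
}
```

With work_area_start = 0x0205c150 and work_area_size = 0x0039feb0, the restored stack pointer 0x020606f0 passes both checks. The overwritten local variable at 0x02058920 falls below the work area, inside .bss.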
> >>>
> >>> It looks to me that you have a stray write over something on the task
> >>> stack. It could be as simple as someone writing too many bytes into a
> >>> buffer on the stack that isn't that large.
> >>>
> >>> Step out of that and watch the %sp of the task. At some point, it
> >>> must be going bad. It is doing that because it is being restored from
> >>> RAM and that memory must have been written to unintentionally.
> >>>
> >>> This will sound hard but you need to figure out what address the bad
> >>> %sp is coming from and set a watchpoint on accesses to it. At some
> >>> point, the bad value will go in. Then you have your culprit.
> >>>
> >>> Then the question is to find the fix.
> >>>>
> >>>> That's when I, already dizzy, stopped trying and decided to ask the
> >>>> list. It must be something (maybe elementary) that I'm doing wrong,
> >>>> but I don't know what it could be, nor where to look anymore.
> >>>>
> >>> You have pulled a lot of thread and looked for a horse (e.g. blown
> >>> stack) which it smells to me like a zebra (e.g. stray write onto stack
> >>> memory).
> >>>
> >>> --joel
> >>>>
> >>>> Thanks for your time and best regards,
> >>>>
> >>>> Fabrício Kucinskis.
> >