still problem with ARM and Unlimited Task Test
Joachim Rahn
Joachim.Rahn at helmholtz-berlin.de
Fri Jun 17 06:39:29 UTC 2011
On 16.06.2011 18:58, Joel Sherrill wrote:
> On 06/16/2011 11:29 AM, Gedare Bloom wrote:
>> On Wed, Jun 15, 2011 at 12:07 PM, Joel Sherrill
>> <joel.sherrill at oarcorp.com> wrote:
>>> Chris.. cc'ing you on this due to the unlimited nature.
>>>
>>> Sebastian.. cc'ing you to confirm the heap side-effect.
>>>
>>> On 06/15/2011 09:31 AM, Joachim Rahn wrote:
>>>> On 15.06.2011 15:44, Joel Sherrill wrote:
>>>>> On 06/15/2011 08:08 AM, Joachim Rahn wrote:
>>>>>> Hi Joel,
>>>>>>
>>>>>> hope I don't bother you too much...
>>>>>>
>>>>>> Now I have found some time besides my main work to come back to my
>>>>>> problem with the failing Unlimited Task Test on our ARM board.
>>>>>>
>>>>>> Any advice regarding the following would be welcome!!!
>>>>>> ------------------------------------------------------
>>>>>>
>>>>>> In fact our problem has nothing directly to do with the update between
>>>>>> RTEMS 4.9.3 and 4.9.5!
>>>>>>
>>>>>> BTW: We now use the shared code in the BSP tree to initialize the
>>>>>> workspace (BSP_BOOTCARD_HANDLES_RAM_ALLOCATION = TRUE)
>>>>>> and we compile everything with all RTEMS and HEAP debugging enabled.
>>>>>>
>>>>>> We now have a setup with RTEMS 4.9.3 on an AT91SAM9263-EK evaluation
>>>>>> board from Atmel which reproduces
>>>>>> our data abort fault.
>>>>>>
>>>>>> The problem seems to be that under some circumstances the routine
>>>>>>
>>>>>> _RTEMS_tasks_Switch_extension(Thread_Control *executing,
>>>>>> Thread_Control *heir)
>>>>>>
>>>>>> is called with a reference to an executing task (*executing) after
>>>>>> the routine
>>>>>>
>>>>>> _RTEMS_tasks_Delete_extension(Thread_Control *executing,
>>>>>> Thread_Control *deleted)
>>>>>>
>>>>>> has already deleted that task, and the subsequent call sequence
>>>>>>
>>>>>> _RTEMS_tasks_Free
>>>>>> _Objects_Free
>>>>>> _Objects_Shrink_information
>>>>>>
>>>>>> ends up in a call to
>>>>>>
>>>>>> _Heap_Free(Heap_Control *the_heap, void *starting_address)
>>>>>>
>>>>>> which frees the memory used by the Thread_Control struct of that
>>>>>> task and overwrites the pointer "executing->task_variables" (which
>>>>>> should now be NULL) with heap bookkeeping information.
>>>>>> Because "executing->task_variables" is now corrupted, the call to
>>>>>> _RTEMS_tasks_Switch_extension leads to a data_abort.
>>>>>>
>>>>>> GDB output of the relevant call sequences, walked with the gdb up
>>>>>> command, looks like:
>>>>>>
>>>>>> GDB stack walk: _RTEMS_tasks_Delete_extension (executing=0x23fca500,
>>>>>> deleted=0x23fca500)
>>>>>> _User_extensions_Thread_delete (the_thread=0x23fca500)
>>>>>> _Thread_Close (information=0x2002fd98,
>>>>>> the_thread=0x23fca500)
>>>>>> rtems_task_delete (id=0)
>>>>>> test_task (my_number=8)
>>>>>> _Thread_Handler ()
>>>>>> _Objects_API_maximum_class (api=536959452)
>>>>>>
>>>>>> GDB stack walk: _Heap_Free (the_heap=0x2002fe4c,
>>>>>> starting_address=0x23fca0a0)
>>>>>> _Workspace_Free (block=0x23fca0a0)
>>>>>> _Objects_Shrink_information (information=0x2002fd98)
>>>>>> _Objects_Free (information=0x2002fd98,
>>>>>> the_object=0x23fca500)
>>>>>> _RTEMS_tasks_Free (the_task=0x23fca500)
>>>>>> rtems_task_delete (id=0)
>>>>>> test_task (my_number=8)
>>>>>> _Thread_Handler ()
>>>>>> _Objects_API_maximum_class (api=536959452)
>>>>>>
>>>>>> NOW "executing->task_variables" IS CORRUPTED !!!!!
>>>>>>
>>>>>> GDB stack walk: _RTEMS_tasks_Switch_extension (executing=0x23fca500,
>>>>>> heir=0x23fac488)
>>>>>> _User_extensions_Thread_switch (executing=0x23fca500,
>>>>>> heir=0x23fac488)
>>>>>> _Thread_Dispatch ()
>>>>>> _Thread_Enable_dispatch ()
>>>>>> rtems_task_delete (id=0)
>>>>>> test_task (my_number=8)
>>>>>> _Thread_Handler ()
>>>>>> _Objects_API_maximum_class (api=536959452)
>>>>>>
>>>>>>
>>>>>> By chance, the address of "next_block->prev_size" used in _Heap_Free
>>>>>> is the same as that of "executing->task_variables" in the affected
>>>>>> Thread_Control struct, and therefore _RTEMS_tasks_Switch_extension
>>>>>> tries to access a bad memory location, which of course leads to a
>>>>>> data_abort.
>>>>>> Maybe under other circumstances one would never stumble upon this?
>>>>>>
>>>>> The memory has indeed been freed and is not supposed to be used.
>>>>> In fact, executing->task_variables should be NULL. I see it set
>>>>> to NULL in _RTEMS_tasks_Delete_extension. Can you verify that?
>>>>>
>>>> YES: I've verified it, executing->task_variables is set to NULL by
>>>> _RTEMS_tasks_Delete_extension!
>>>>
>>>> BUT: after _Heap_Free has been called, executing->task_variables is
>>>> altered, because _Heap_Free expects next_block->prev_size at the former
>>>> location of executing->task_variables and sets it to 0xCE0 (3296).
>>>>
>>>> The following call to _RTEMS_tasks_Switch_extension checks whether
>>>> executing->task_variables is NULL, but it is now 0xCE0, i.e. NOT NULL.
>>> I hope this is reproducible enough to verify this.
>>>
>>> Can you set a watchpoint on the memory location that is set to
>>> NULL and then overwritten? I don't see any reasonable way for the
>>> heap to write to this address. task_variables is the last field
>>> in the Thread_Control block. TCBs are pre-allocated and never
>>> freed back to the heap except for "shrinking" in the unlimited
>>> object case.
>>>
>>> What I suspect is happening is that when the task in question
>>> is deleted, _Objects_Shrink_information is called and frees
>>> a "chunk" of unused TCBs. If the array of TCBs EXACTLY meets the
>>> alignment, then when it is freed, there will be no pad at the end
>>> of it since it is a multiple of 4 (not sure what else). The last four
>>> bytes must be getting overwritten when the memory is freed.
>>>
>>> Chris .. Sebastian.. does that seem possible?
>>>
>>> One way to check this is to add a few unused fields to the end
>>> of the Thread_Control, set them to 0 when initialized and
>>> check that they are overwritten.
>>>
>>> Permanent fixes include:
>>>
>>> + checking for dormant state in switch extension
>> If I understand correctly, the root of the problem is that the switch
>> extension is called on a task switch when the executing task has been
>> deleted and is being removed from consideration, i.e. its state is
>> dormant. The freed memory should not be accessed by any switch
>> extension. I think checking for the dormant state is a better fix than
>> the latter. I would put it in the score function
>> _User_extensions_Thread_switch; this introduces a single if statement
>> but allows the extensions to remain unchanged.
>>
> I thought about this but the problem is that most extensions
> are written as:
>
> switch( executing, heir )
> {
> save something on executing
> restore something for heir
> }
>
> If you skip the switch extensions entirely when executing is
> dormant, you miss the ability to do something for the heir.
>
> The check logic would have to be in every switch extension
> implementation. :(
>>> + moving something large-ish like Start down to the
>>> end of the TCB structure. Then when freed like this, it
>>> won't matter since it has already been used and won't
>>> be referenced.
>>>
>> This might fix the problem this time, but it is a hack and introduces
>> undefined behavior (accessing/using memory that was freed).
>>
> Yep. I was only considering this for release branches as
> a workaround. Alternatively adding a pad field to the TCB
> on the release branches is a solution but that adds to the
> memory requirement for all applications.
>
> The cleanest solution is to defer actually freeing the memory.
> This has always been a potential issue with delete(SELF). You
> have to continue to use the stack and we accounted for that
> on single core. But SMP makes this even more dangerous. I want to
> do this on the head.
>
> We need a simple non-invasive solution for the release branches.
>
> This case is insidious because it is essentially a random
> combination of proper alignment, size of the allocation
> and the fact it is a task deleting itself and shrinking
> the set of unlimited objects. This case can't occur without
> unlimited objects enabled.
>
> Joachim.. does the patch I posted even solve the issue?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The first try this morning looks good: the unlimited test runs without complaints!
I will test some other configurations and report if something fails.
Thanks a lot for the patch!
>>> If we decide the second alternative is better, then there
>>> needs to be some documentation.
>>>
>>> If the first is implemented, then unfortunately, I think
>>> it needs to be in every switch extension.
>>>
>>>> <...snip... cpukit/rtems/src/tasks.c>
>>>>
>>>> void _RTEMS_tasks_Switch_extension(
>>>> Thread_Control *executing,
>>>> Thread_Control *heir
>>>> )
>>>> {
>>>> rtems_task_variable_t *tvp;
>>>>
>>>> /*
>>>> * Per Task Variables
>>>> */
>>>>
>>>> tvp = executing->task_variables;
>>>> while (tvp) {
>>>> tvp->tval = *tvp->ptr;
>>>>
>>>> <...snip...>
>>>>
>>>> therefore the NULL check fails and the last line of code in the
>>>> snippet results in a data abort...
>>>>
>>>>
>>>>> I think the extensions should ensure they are not operating on
>>>>> a deleted task. The extensions pointers and task variable
>>>>> pointer should be NULL at this point. Worst case, they can
>>>>> check the state of executing and if it has STATES_DORMANT set,
>>>>> then don't do anything for executing.
>>>>>
>>>>> I checked the 4.9 source for this part of the Classic API extensions.
>>>>> They are setting things to NULL and the switch extension is checking
>>>>> it.
>>>>>
>>>>> FWIW there is an outstanding PR, spotted during the SMP work, where
>>>>> the thread stack is freed and potentially reallocated for some other
>>>>> purpose before the delete(SELF) task has finished switching out.
>>>>> I don't think that's happening here but it is worth mentioning.
>>>>>
>>>>>> BTW: When I change (as a test) the definition of CPU_HEAP_ALIGNMENT in
>>>>>> ..../rtems/cpukit/score/cpu/arm/rtems/score/cpu.h
>>>>>> from CPU_ALIGNMENT (which is 4) to something larger than
>>>>>> CPU_ALIGNMENT,
>>>>>> the unlimited test works fine.
>>>>>>
>>>>>> Any idea or advice ...?
>>>>>>
>>>>>> Regards,
>>>>>> Joachim
>>>>>>
>>>>>> On 01.03.2011 17:09, Joel Sherrill wrote:
>>>>>>> On 03/01/2011 07:48 AM, Joachim Rahn wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> after updating from rtems-4.9.3 to rtems-4.9.5 the "Unlimited Task
>>>>>>>> Test" on my
>>>>>>>> ARM cpu at91sam9263 fails with a message like...
>>>>>>>>
>>>>>>>> [...skip...]
>>>>>>>> task 19 ending.
>>>>>>>> task 20 ending.
>>>>>>>> task 21 ending.
>>>>>>>> task 7 ending.
>>>>>>>> task 8 ending.
>>>>>>>>
>>>>>>>> INSN_LDR
>>>>>>>> data_abort at address 0x20018CD8, instruction: 0xE5932000, spsr =
>>>>>>>> 0x20000013
>>>>>>>> active thread thread 0x0A010001
>>>>>>>> Previous sp=0x200629A8 lr=0x200135E0 and actual cpsr=60000097
>>>>>>>> 0x20038E30 0x20056EA8 0x0000117C 0x200629E0 0x200629C4 0x200135E0
>>>>>>>> 0x20018CB8 0x20038E30 0x20056EA8 0x20026EC0 0x20026EC0 0x20062A18
>>>>>>>> 0x200629E4 0x20010100 0x200135AC 0x00000000 0x00000000 0x00000000
>>>>>>>> 0x00000000 0x20056EA8 0x20038E30 0x60000013 0x600000D3 0x00000000
>>>>>>>> 0x00000000 0x20062A28 0x20062A1C 0x2000AE48 0x2000FFF8 0x20062A4C
>>>>>>>> 0x20062A2C 0x2000ADA0 0x2000AE24 0x521C9845 0x20056EA8 0x00000000
>>>>>>>> 0x00000000 0x2002ACD8 0x20062A64 0x20062A50 0x20000348 0x2000AD28
>>>>>>>> 0x00000008 0x00000001 0x20062A84 0x20062A68 0x2001C2D4 0x20000310
>>>>>>>>
>>>>>>>> [...skip...]
>>>>>>>>
>>>>>>>> which usually means the CPU tried to access unavailable memory.
>>>>>>>>
>>>>>>>> After removing the bugfix bug1718 the "Unlimited Task Test" works
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> (https://www.rtems.org/bugzilla/show_bug.cgi?id=1718)
>>>>>>>>
>>>>>>>> *** rtems-4.9.3: ./cpukit/sapi/include/confdefs.h *** unlimited task
>>>>>>>> test works
>>>>>>>> [...skip...]
>>>>>>>>
>>>>>>>> #define CONFIGURE_MEMORY_PER_TASK_FOR_POSIX_API \
>>>>>>>> _Configure_From_workspace( \
>>>>>>>> sizeof (POSIX_API_Control) + \
>>>>>>>> (sizeof (void *) * (CONFIGURE_MAXIMUM_POSIX_KEYS)) \
>>>>>>>> )
>>>>>>>>
>>>>>>>> [...skip...]
>>>>>>>>
>>>>>>>> *** rtems-4.9.5: ./cpukit/sapi/include/confdefs.h *** unlimited task
>>>>>>>> test doesn't work
>>>>>>>> [...skip...]
>>>>>>>> #define CONFIGURE_MEMORY_PER_TASK_FOR_POSIX_API \
>>>>>>>> _Configure_From_workspace( \
>>>>>>>> CONFIGURE_MINIMUM_TASK_STACK_SIZE + \
>>>>>>>> sizeof (POSIX_API_Control) + \
>>>>>>>> (sizeof (void *) * (CONFIGURE_MAXIMUM_POSIX_KEYS)) \
>>>>>>>> )
>>>>>>>> [...skip...]
>>>>>>>>
>>>>>>>> Any hints? Or does anyone observe the same?
>>>>>>>>
>>>>>>> That patch wouldn't directly cause that failure.
>>>>>>> The only thing I can see is that it does change the
>>>>>>> amount of workspace reserved up front (by a lot).
>>>>>>>
>>>>>>> Is this a BSP which is in the RTEMS tree? I am
>>>>>>> suspicious that there isn't enough memory for
>>>>>>> the workspace/heap and the BSP initialization
>>>>>>> isn't recognizing this. Eventually the task stacks,
>>>>>>> heap, etc all collide, there is corruption and you crash.
>>>>>>>
>>>>>>> So we would need to know the following:
>>>>>>>
>>>>>>> + address of end of BSS
>>>>>>> + start of memory for heap and length
>>>>>>> + start of memory for RTEMS workspace and length.
>>>>>>> + amount of RAM
>>>>>>>
>>>>>>> Assuming that the workspace/heap are from end of
>>>>>>> BSS to the end of RAM.
>>>>>>>> Cheers
>>>>>>>>
>>>>>>>> --
>>>>>>>> Joachim
>>>>>>>>
>>>>>>>> ________________________________
>>>>>>>>
>>>>>>>> Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
>>>>>>>>
>>>>>>>> Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher
>>>>>>>> Forschungszentren e.V
>>>>>>>>
>>>>>>>> Aufsichtsrat: Vorsitzender Prof. Dr. Dr. h.c. mult. Joachim Treusch,
>>>>>>>> stv. Vorsitzende Dr. Beatrix Vierkorn-Rudolph
>>>>>>>> Geschäftsführer: Prof. Dr. Anke Rita Kaysser-Pyzalla, Prof. Dr. Dr.
>>>>>>>> h.c. Wolfgang Eberhardt, Dr. Ulrich Breuer
>>>>>>>>
>>>>>>>> Sitz Berlin, AG Charlottenburg, 89 HRB 5583
>>>>>>>>
>>>>>>>> Postadresse:
>>>>>>>> Hahn-Meitner-Platz 1
>>>>>>>> D-14109 Berlin
>>>>>>>>
>>>>>>>> http://www.helmholtz-berlin.de
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> rtems-users mailing list
>>>>>>>> rtems-users at rtems.org
>>>>>>>> http://www.rtems.org/mailman/listinfo/rtems-users
>>>
>>> --
>>> Joel Sherrill, Ph.D. Director of Research& Development
>>> joel.sherrill at OARcorp.com On-Line Applications Research
>>> Ask me about RTEMS: a free RTOS Huntsville AL 35805
>>> Support Available (256) 722-9985
>>>
>>>
>>>
>
>
--
Joachim Rahn
____________________________________________________________
Joachim.Rahn at Helmholtz-Berlin.de
Albert-Einstein-Strasse 15, 12489 Berlin, Germany
Phone: +49 30 8062 - 14864
Fax: +49 30 8062 - 14632