still problem with ARM and Unlimited Task Test

Gedare Bloom gedare at gwmail.gwu.edu
Thu Jun 16 16:29:49 UTC 2011


On Wed, Jun 15, 2011 at 12:07 PM, Joel Sherrill
<joel.sherrill at oarcorp.com> wrote:
> Chris.. cc'ing you on this due to the unlimited nature.
>
> Sebastian.. cc'ing you to confirm the heap side-effect for me.
>
> On 06/15/2011 09:31 AM, Joachim Rahn wrote:
>>
>> On 15.06.2011 15:44, Joel Sherrill wrote:
>>>
>>> On 06/15/2011 08:08 AM, Joachim Rahn wrote:
>>>>
>>>> Hi Joel,
>>>>
>>>> hope I don't bother you too much...
>>>>
>>>> Now I found some time beside my main work to come back to my problem
>>>> with the
>>>> failing Unlimited Task Test on our ARM board.
>>>>
>>>> Any advice regarding the following would be welcome!!!
>>>> ------------------------------------------------------
>>>>
>>>> In fact our problem has nothing directly to do with the update between
>>>> RTEMS 4.9.3 and 4.9.5 !
>>>>
>>>> BTW: We now use the shared code in the BSP tree to initialize the
>>>> workspace (BSP_BOOTCARD_HANDLES_RAM_ALLOCATION = TRUE)
>>>>       and we compile everything with all RTEMS and HEAP debugging on.
>>>>
>>>> We now have a setup with RTEMS 4.9.3 on an AT91SAM9263-EK evaluation
>>>> board from Atmel which reproduces
>>>> our data abort fault.
>>>>
>>>> The problem seems to be that under some circumstances the routine
>>>>
>>>>          _RTEMS_tasks_Switch_extension(Thread_Control *executing,
>>>> Thread_Control *heir)
>>>>
>>>> will be called with a reference to an executing task (*executing) after
>>>> the routine
>>>>
>>>>          _RTEMS_tasks_Delete_extension(Thread_Control *executing,
>>>> Thread_Control *deleted)
>>>>
>>>> has already deleted that task and a following call sequence to
>>>>
>>>>          _RTEMS_tasks_Free
>>>>          _Objects_Free
>>>>          _Objects_Shrink_information
>>>>
>>>> ends up in a call to
>>>>
>>>>          _Heap_Free(Heap_Control *the_heap, void *starting_address)
>>>>
>>>> which frees the memory used by the Thread_Control struct of
>>>> that task and overwrites the
>>>> pointer "executing->task_variables" (which now should be NULL) with some
>>>> heap information.
>>>> Because "executing->task_variables" is now corrupted, the call to
>>>> _RTEMS_tasks_Switch_extension
>>>> leads to a data_abort.
>>>>
>>>> GDB output of the concerning call sequences using the gdb up command
>>>> looks like:
>>>>
>>>> GDB stack walk: _RTEMS_tasks_Delete_extension (executing=0x23fca500,
>>>> deleted=0x23fca500)
>>>>                  _User_extensions_Thread_delete (the_thread=0x23fca500)
>>>>                  _Thread_Close (information=0x2002fd98,
>>>> the_thread=0x23fca500)
>>>>                   rtems_task_delete (id=0)
>>>>                   test_task (my_number=8)
>>>>                  _Thread_Handler ()
>>>>                  _Objects_API_maximum_class (api=536959452)
>>>>
>>>> GDB stack walk: _Heap_Free (the_heap=0x2002fe4c,
>>>> starting_address=0x23fca0a0)
>>>>                  _Workspace_Free (block=0x23fca0a0)
>>>>                  _Objects_Shrink_information (information=0x2002fd98)
>>>>                  _Objects_Free (information=0x2002fd98,
>>>> the_object=0x23fca500)
>>>>                  _RTEMS_tasks_Free (the_task=0x23fca500)
>>>>                   rtems_task_delete (id=0)
>>>>                   test_task (my_number=8)
>>>>                  _Thread_Handler ()
>>>>                  _Objects_API_maximum_class (api=536959452)
>>>>
>>>> NOW "executing->task_variables" IS CORRUPTED !!!!!
>>>>
>>>> GDB stack walk: _RTEMS_tasks_Switch_extension (executing=0x23fca500,
>>>> heir=0x23fac488)
>>>>                  _User_extensions_Thread_switch (executing=0x23fca500,
>>>> heir=0x23fac488)
>>>>                  _Thread_Dispatch ()
>>>>                  _Thread_Enable_dispatch ()
>>>>                   rtems_task_delete (id=0)
>>>>                   test_task (my_number=8)
>>>>                  _Thread_Handler ()
>>>>                  _Objects_API_maximum_class (api=536959452)
>>>>
>>>>
>>>> By chance the pointer to "next_block->prev_size" in the call to
>>>> _Heap_Free has the same location as
>>>> "executing->task_variables" in the Thread_Control struct concerned, and
>>>> therefore _RTEMS_tasks_Switch_extension
>>>> tries to access a bad memory location, which of course leads to a
>>>> data_abort.
>>>> Maybe under other circumstances one would never stumble upon this?
>>>>
>>> The memory has indeed been freed and is not supposed to be used.
>>> In fact, executing->task_variables should be NULL.  I see it set
>>> to NULL in _RTEMS_tasks_Delete_extension.  Can you verify that?
>>>
>> YES: I've verified it, executing->task_variables is set to NULL by
>> _RTEMS_tasks_Delete_extension!
>>
>> BUT: after _Heap_Free has been called executing->task_variables is altered
>>      because at the former location of executing->task_variables now the
>> _Heap_Free routine
>>      expects next_block->prev_size and alters it to 3096 or 0xCE0.
>>
>>      The following call to _RTEMS_tasks_Switch_extension checks if
>> executing->task_variables is NULL
>>      but it's now 0xCE0, i.e. NOT NULL.
>
> I hope this is reproducible enough to verify this.
>
> Can you set a watchpoint after the memory location that is set to
> NULL and then overwritten?  I don't see any reasonable way for the
> heap to write to this address.   task_variables is the last field
> in the Thread_Control block.  TCBs are pre-allocated and never
> freed back to the heap except for "shrinking" in the unlimited
> object case.
>
> What I suspect is happening, is that when the task in question
> is deleted, _Objects_Shrink_information is being called and freeing
> a "chunk" of unused TCBs.  If the array of TCBS EXACTLY meets the
> alignment, then when it is freed, there will be no pad at the end
> of it since it is a multiple of 4 (not sure what else).  The last four
> bytes must be getting overwritten when the memory is freed.
>
> Chris .. Sebastian.. does that seem possible?
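To make the suspected overlap concrete, here is a toy model of a boundary-tag heap in which the last word of a used block doubles as the next block's prev_size. This is only a sketch under that assumption, not the RTEMS heap code; every toy_* name, the two-word TCB and the sizes are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_WORD      4u  /* bytes per heap word in this toy model */
#define TOY_HDR_WORDS 2u  /* block header: [0] = prev_size, [1] = size */
#define TOY_TCB_WORDS 2u  /* toy TCB: [0] = state, [1] = task_variables */
#define TOY_ARENA     64u /* arena size in words */

/* Word index of the block that follows the block starting at index b. */
static uint32_t toy_next_block(const uint32_t *arena, uint32_t b)
{
  return b + arena[b + 1u] / TOY_WORD;
}

/* Free: record this block's size in the next block's prev_size word. */
static void toy_heap_free(uint32_t *arena, uint32_t b)
{
  arena[toy_next_block(arena, b)] = arena[b + 1u];
}

/*
 * Place tcb_count toy TCBs plus pad_words of padding in one used block,
 * set the last task_variables word to 0 (as the delete extension would),
 * free the block, and return what that word holds afterwards.
 */
uint32_t toy_last_task_variables_after_free(uint32_t tcb_count,
                                            uint32_t pad_words)
{
  uint32_t arena[TOY_ARENA] = { 0 };
  uint32_t user_words = tcb_count * TOY_TCB_WORDS + pad_words;
  /* The user area of a used block includes the next block's prev_size
     word, so the block itself is one word shorter than header + user. */
  uint32_t size_bytes = (TOY_HDR_WORDS + user_words - 1u) * TOY_WORD;
  uint32_t user = TOY_HDR_WORDS; /* user area starts right after the header */
  uint32_t last_tv = user + (tcb_count - 1u) * TOY_TCB_WORDS + 1u;

  assert(tcb_count >= 1u && TOY_HDR_WORDS + user_words < TOY_ARENA);

  arena[1] = size_bytes; /* block at index 0 is in use */
  arena[last_tv] = 0u;   /* task_variables = NULL */
  toy_heap_free(arena, 0u);
  return arena[last_tv];
}
```

With three TCBs and no padding the last task_variables word sits exactly where the free writes the block size, so it comes back non-zero. With one word of padding it stays 0, which would be consistent with the test passing when CPU_HEAP_ALIGNMENT is made larger.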
>
> One way to check this is to add a few unused fields to the end
> of the Thread_Control, set them to 0 when initialized and
> check that they are overwritten.
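A rough sketch of that guard-field check, for illustration only. The struct below is a stand-in and not the real Thread_Control; the guard array, its length and the toy_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_GUARD_WORDS 4 /* arbitrary number of unused trailing words */

/* Stand-in for a TCB with unused guard words appended at the very end. */
typedef struct {
  void    *task_variables;
  uint32_t guard[TOY_GUARD_WORDS];
} toy_tcb_with_guard;

/* Zero the guard words when the TCB is initialized. */
static void toy_guard_init(toy_tcb_with_guard *tcb)
{
  assert(tcb != NULL);
  memset(tcb->guard, 0, sizeof tcb->guard);
}

/* Return the index of the first overwritten guard word, or -1 if all
   guard words are still zero. */
static int toy_guard_first_dirty(const toy_tcb_with_guard *tcb)
{
  int i;

  assert(tcb != NULL);
  for (i = 0; i < TOY_GUARD_WORDS; ++i) {
    if (tcb->guard[i] != 0u) {
      return i;
    }
  }
  return -1;
}
```

If the heap really writes into the tail of the freed chunk, the last guard word of the last TCB in the chunk would be the one expected to turn non-zero, while task_variables stays NULL.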
>
> Permanent fixes include:
>
> + checking for dormant start in switch extension
If I understand correctly, the root of the problem is that the switch
extension is called on a task switch when the executing task has
already been deleted and is being removed from consideration, i.e. its
state is dormant. The freed memory should not be accessed by any switch
extension. I think checking for the dormant state is a better fix than
the second option below. I would put the check in the score function
_User_extensions_Thread_switch: this introduces a single if statement,
but leaves the extensions unchanged.
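
Roughly what I have in mind, as a stand-alone sketch rather than a patch. The toy_* types, the state flag value and the hook table are placeholders I made up; the real function walks the extension chain and would test the thread's actual state field.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STATES_READY   0x0u
#define TOY_STATES_DORMANT 0x1u /* placeholder value, not the RTEMS constant */

/* Minimal stand-in for Thread_Control. */
typedef struct {
  unsigned current_state;
  void    *task_variables;
} toy_thread;

typedef void (*toy_switch_hook)(toy_thread *executing, toy_thread *heir);

static int toy_hook_calls; /* counts hook invocations for the demo */

static void toy_counting_hook(toy_thread *executing, toy_thread *heir)
{
  (void) executing;
  (void) heir;
  ++toy_hook_calls;
}

/* Central dispatch: skip every switch hook when the outgoing thread is
   dormant, so no hook touches a TCB that may already be freed. */
static void toy_user_extensions_thread_switch(toy_thread *executing,
                                              toy_thread *heir,
                                              const toy_switch_hook *hooks,
                                              size_t hook_count)
{
  size_t i;

  assert(executing != NULL && heir != NULL);
  if ((executing->current_state & TOY_STATES_DORMANT) != 0u) {
    return; /* the single added if statement */
  }
  for (i = 0; i < hook_count; ++i) {
    hooks[i](executing, heir);
  }
}
```

One caveat with this shape: returning early also skips whatever a hook would do on behalf of heir, so the real change might have to be narrower than this sketch suggests.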

> + moving something large-ish like Start down to the
> end of the TCB structure.  Then when freed like this, it
> won't matter since it has already been used and won't
> be referenced.
>
This might fix the problem this time, but it is a hack and introduces
undefined behavior (accessing/using memory that was freed).

> If we decide the second alternative is better, then there
> needs to be some documentation.
>
> If the first is implemented, then unfortunately, I think
> it needs to be in every switch extension.
>
>> <...snip... cpukit/rtems/src/tasks.c>
>>
>> void _RTEMS_tasks_Switch_extension(
>>   Thread_Control *executing,
>>   Thread_Control *heir
>> )
>> {
>>   rtems_task_variable_t *tvp;
>>
>>   /*
>>    *  Per Task Variables
>>    */
>>
>>   tvp = executing->task_variables;
>>   while (tvp) {
>>     tvp->tval = *tvp->ptr;
>>
>> <...snip...>
>>
>>      therefore the NULL check fails and the last line of code in the
>> snippet results in a data abort...
>>
>>
>>> I think the extensions should ensure they are not operating on
>>> a deleted task.  The extensions pointers and task variable
>>> pointer should be NULL at this point.  Worst case, they can
>>> check the state of executing and if it has STATES_DORMANT set,
>>> then don't do anything for executing.
>>>
>>> I checked the 4.9 source for this part of the Classic API extensions.
>>> They are setting things to NULL and the switch extension is checking
>>> it.
>>>
>>> FWIW there is a PR outstanding spotted on SMP work where the
>>> thread stack is freed and potentially reallocated for some other
>>> purpose before the delete(SELF) task is finished switching out.
>>> I don't think that's happening here but it is worth mentioning.
>>>
>>>> BTW: When I change (as a test) the definition of CPU_HEAP_ALIGNMENT in
>>>>       ..../rtems/cpukit/score/cpu/arm/rtems/score/cpu.h
>>>>       from CPU_ALIGNMENT (which is 4) to something larger than
>>>> CPU_ALIGNMENT,
>>>>       the unlimited test works fine.
>>>>
>>>> Any idea or advice ...?
>>>>
>>>> Regards,
>>>> Joachim
>>>>
>>>> On 01.03.2011 17:09, Joel Sherrill wrote:
>>>>>
>>>>> On 03/01/2011 07:48 AM, Joachim Rahn wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> after updating from rtems-4.9.3 to rtems-4.9.5 the "Unlimited Task
>>>>>> Test" on my
>>>>>> ARM cpu at91sam9263 fails with a message like...
>>>>>>
>>>>>> [...skip...]
>>>>>> task 19 ending.
>>>>>> task 20 ending.
>>>>>> task 21 ending.
>>>>>> task 7 ending.
>>>>>> task 8 ending.
>>>>>>
>>>>>> INSN_LDR
>>>>>> data_abort at address 0x20018CD8, instruction: 0xE5932000,   spsr =
>>>>>> 0x20000013
>>>>>> active thread thread 0x0A010001
>>>>>> Previous sp=0x200629A8 lr=0x200135E0 and actual cpsr=60000097
>>>>>>    0x20038E30 0x20056EA8 0x0000117C 0x200629E0 0x200629C4 0x200135E0
>>>>>>    0x20018CB8 0x20038E30 0x20056EA8 0x20026EC0 0x20026EC0 0x20062A18
>>>>>>    0x200629E4 0x20010100 0x200135AC 0x00000000 0x00000000 0x00000000
>>>>>>    0x00000000 0x20056EA8 0x20038E30 0x60000013 0x600000D3 0x00000000
>>>>>>    0x00000000 0x20062A28 0x20062A1C 0x2000AE48 0x2000FFF8 0x20062A4C
>>>>>>    0x20062A2C 0x2000ADA0 0x2000AE24 0x521C9845 0x20056EA8 0x00000000
>>>>>>    0x00000000 0x2002ACD8 0x20062A64 0x20062A50 0x20000348 0x2000AD28
>>>>>>    0x00000008 0x00000001 0x20062A84 0x20062A68 0x2001C2D4 0x20000310
>>>>>>
>>>>>> [...skip...]
>>>>>>
>>>>>> which commonly means the cpu tries to access non available memory.
>>>>>>
>>>>>> After removing the bugfix bug1718 the "Unlimited Task Test" works
>>>>>> fine.
>>>>>>
>>>>>> (https://www.rtems.org/bugzilla/show_bug.cgi?id=1718)
>>>>>>
>>>>>> *** rtems-4.9.3: ./cpukit/sapi/include/confdefs.h *** unlimited task
>>>>>> test works
>>>>>> [...skip...]
>>>>>>
>>>>>>     #define CONFIGURE_MEMORY_PER_TASK_FOR_POSIX_API \
>>>>>>       _Configure_From_workspace( \
>>>>>>         sizeof (POSIX_API_Control) + \
>>>>>>        (sizeof (void *) * (CONFIGURE_MAXIMUM_POSIX_KEYS)) \
>>>>>>       )
>>>>>>
>>>>>> [...skip...]
>>>>>>
>>>>>> *** rtems-4.9.5: ./cpukit/sapi/include/confdefs.h *** unlimited task
>>>>>> test doesn't work
>>>>>> [...skip...]
>>>>>>     #define CONFIGURE_MEMORY_PER_TASK_FOR_POSIX_API \
>>>>>>       _Configure_From_workspace( \
>>>>>>         CONFIGURE_MINIMUM_TASK_STACK_SIZE + \
>>>>>>         sizeof (POSIX_API_Control) + \
>>>>>>        (sizeof (void *) * (CONFIGURE_MAXIMUM_POSIX_KEYS)) \
>>>>>>       )
>>>>>> [...skip...]
>>>>>>
>>>>>> Any hints, or does anyone else observe the same?
>>>>>>
>>>>> That patch wouldn't directly cause that failure.
>>>>> The only thing I can see is that it does change the
>>>>> amount of workspace reserved up front (by a lot).
>>>>>
>>>>> Is this a BSP which is in the RTEMS tree?  I am
>>>>> suspicious that there isn't enough memory for
>>>>> the workspace/heap and the BSP initialization
>>>>> isn't recognizing this.  Eventually the task stacks,
>>>>> heap, etc all collide, there is corruption and you crash.
>>>>>
>>>>> So we would need to know the following:
>>>>>
>>>>> + address of end of BSS
>>>>> + start of memory for heap and length
>>>>> + start of memory for RTEMS workspace and length.
>>>>> + amount of RAM
>>>>>
>>>>> Assuming that the workspace/heap are from end of
>>>>> BSS to the end of RAM.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> --
>>>>>> Joachim
>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
>>>>>>
>>>>>> Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher
>>>>>> Forschungszentren e.V
>>>>>>
>>>>>> Aufsichtsrat: Vorsitzender Prof. Dr. Dr. h.c. mult. Joachim Treusch,
>>>>>> stv. Vorsitzende Dr. Beatrix Vierkorn- Rudolph
>>>>>> Geschäftsführer: Prof. Dr. Anke Rita Kaysser-Pyzalla, Prof. Dr. Dr.
>>>>>> h.c. Wolfgang Eberhardt, Dr. Ulrich Breuer
>>>>>>
>>>>>> Sitz Berlin, AG Charlottenburg, 89 HRB 5583
>>>>>>
>>>>>> Postadresse:
>>>>>> Hahn-Meitner-Platz 1
>>>>>> D-14109 Berlin
>>>>>>
>>>>>> http://www.helmholtz-berlin.de
>>>>>>
>>>>>> _______________________________________________
>>>>>> rtems-users mailing list
>>>>>> rtems-users at rtems.org
>>>>>> http://www.rtems.org/mailman/listinfo/rtems-users
>>>>
>>>
>>
>
>
> --
> Joel Sherrill, Ph.D.             Director of Research & Development
> joel.sherrill at OARcorp.com        On-Line Applications Research
> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>   Support Available             (256) 722-9985
>
>
>