About HEAP error
Richi Dubey
richidubey at gmail.com
Sat Mar 6 05:22:58 UTC 2021
>
> breaking at _Terminate and doing a back trace will give
> you the exact line the error is raised from.
I don't know why I did not focus on this earlier! I put a breakpoint at
the line <https://git.rtems.org/rtems/tree/cpukit/score/src/heapfree.c#n73>
that detects the fault, and now we know which address is corrupted:
-----------------------------------------------------------------
Continuing.
Thread 1 hit Breakpoint 9, _Heap_Protection_check_free_block (heap=0x202ba8
<_Workspace_Area>, block=0x206fec) at
/home/richi/quick-start/LatestStrong/src/rtems/c/src/../../cpukit/score/src/heapfree.c:73
73 _Heap_Protection_block_error( heap, block,
HEAP_ERROR_FREE_PATTERN );
(gdb) p current
$7 = (uintptr_t *) 0x207bac
-----------------------------------------------------------------
But when I set a watchpoint on that address to find which function
actually changes its value, I get this:
-----------------------------------------------------------------
(gdb) watch *(uintptr_t *) 0x207bac
Hardware watchpoint 7: *(uintptr_t *) 0x207bac
(gdb) reset
Loading section .start, size 0xa5c lma 0x100000
...
Transfer rate: 3102 KB/sec, 1855 bytes/write.
(gdb) c
Continuing.
Thread 3 hit Hardware watchpoint 7: *(uintptr_t *) 0x207bac
Old value = 1134949
New value = 1050747
0x00102c06 in _SMP_Get_current_processor () at
/home/richi/quick-start/LatestStrong/src/rtems/cpukit/include/rtems/score/smp.h:65
65 {
(gdb)
Continuing.
Thread 3 hit Hardware watchpoint 7: *(uintptr_t *) 0x207bac
Old value = 1050747
New value = 2128816
0x00102bf6 in _ARM_Wait_for_event () at
/home/richi/quick-start/LatestStrong/src/rtems/cpukit/score/cpu/arm/include/rtems/score/cpu.h:505
505 {
(gdb)
Continuing.
[Switching to Thread 1.1]
Thread 1 hit Breakpoint 6, Init (argument=2107944) at
/home/richi/quick-start/LatestStrong/src/rtems/c/src/../../testsuites/sptests/sp02/init.c:26
26 TEST_BEGIN();
(gdb)
Continuing.
Thread 1 hit Hardware watchpoint 7: *(uintptr_t *) 0x207bac
Old value = 2128816
New value = 3876142303
_Heap_Protection_delay_block_free (heap=0x202ba8 <_Workspace_Area>,
block=0x206fec) at
/home/richi/quick-start/LatestStrong/src/rtems/c/src/../../cpukit/score/src/heapfree.c:55
55 for ( current = pattern_begin; current != pattern_end; ++current ) {
(gdb)
Continuing.
[Switching to Thread 1.3]
Thread 3 hit Hardware watchpoint 7: *(uintptr_t *) 0x207bac
Old value = 3876142303
New value = 2128816
0x00102bf6 in _ARM_Wait_for_event () at
/home/richi/quick-start/LatestStrong/src/rtems/cpukit/score/cpu/arm/include/rtems/score/cpu.h:505
505 {
(gdb) bt
#0 0x00102bf6 in _ARM_Wait_for_event () at
/home/richi/quick-start/LatestStrong/src/rtems/cpukit/score/cpu/arm/include/rtems/score/cpu.h:505
#1 0x001008a8 in bsp_start_hook_0 () at
/home/richi/quick-start/LatestStrong/src/rtems/c/src/lib/libbsp/arm/realview-pbx-a9/../../../../../../bsps/arm/realview-pbx-a9/start/bspstarthooks.c:76
#2 0x00100244 in bsp_start_vector_table_end () at
/home/richi/quick-start/LatestStrong/src/rtems/c/src/lib/libbsp/arm/realview-pbx-a9/../../../../../../bsps/arm/shared/start/start.S:464
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
-----------------------------------------------------------------
The value of HEAP_FREE_PATTERN is 3876142303, so the last hit shows
_ARM_Wait_for_event overwriting the free pattern after
_Heap_Protection_delay_block_free wrote it. But the backtrace does not
help much. Which function calls this function? How do I get its full
call trace?
On Sat, Mar 6, 2021 at 10:28 AM Richi Dubey <richidubey at gmail.com> wrote:
>> This is just the detection point. The allocate is doing a validity check
>> and something is wrong from an overwrite.
>> FWIW this is pretty early in the test I think.
>
> Good point, the corruption has already happened earlier, and yes, it's quite
> early:
> ...
> #13 0x0010860e in rtems_task_create (name=1413558560, initial_priority=1,
> stack_size=4096, initial_modes=0, attribute_set=0, id=0x202444 <Task_id+4>)
> at
> /home/richi/quick-start/LatestStrong/src/rtems/c/src/../../cpukit/rtems/src/taskcreate.c:84
> #14 0x001015fa in Init (argument=2107944) at
> /home/richi/quick-start/LatestStrong/src/rtems/c/src/../../testsuites/sptests/sp02/init.c:101
>
>
>
>> Look manually at say 32 bytes at that address at various points during the
>> program's execution. I think this is one where a binary search for the
>> corrupting action happens.
>>
> Yes, or maybe I will try to manually set a watchpoint on each of the 32 bytes
> starting at 0x202ba8 and see if it works.
>
>> Is this using your scheduler as the default? If so, I'd be suspicious of
>> anything allocated for it and if you were riding outside an area allocated
>> for you.
>
> Yes! Maybe the scheduler accesses a variable outside its bounds (maybe an
> array element past the end of its array), but if that were true, there should
> be many more failures with HEAP_ERROR. I will still take a look.
>
> Thanks again for your help.
>
> On Fri, Mar 5, 2021 at 7:01 PM Joel Sherrill <joel at rtems.org> wrote:
>
>>
>>
>> On Thu, Mar 4, 2021, 11:31 PM Richi Dubey <richidubey at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks to both of you for helping me out with this!
>>>
>>> When I backtrace on _Terminate: I get this:
>>>
>>> Init -> rtems_task_create -> ... -> _Heap_Allocate -> ...
>>> ->_Heap_Protection_check_free_block -> _Heap_Protection_block_error
>>> ->_Heap_Protection_block_error_default -> _Terminate
>>> (the_source=RTEMS_FATAL_SOURCE_HEAP, the_error=2125180). So I will try to
>>> debug this trace.
>>>
>>
>> This is just the detection point. The allocate is doing a validity check
>> and something is wrong from an overwrite.
>>
>> FWIW this is pretty early in the test I think.
>>
>>
>>
>>
>>> Also, setting a watchpoint doesn't help:
>>>
>>
>> Look manually at say 32 bytes at that address at various points during
>> the program's execution. I think this is one where a binary search for the
>> corrupting action happens.
>>
>> Is this using your scheduler as the default? If so, I'd be suspicious of
>> anything allocated for it and if you were riding outside an area allocated
>> for you.
>>
>>>
>>> Thread 1 hit Breakpoint 6, Init (argument=2107944) at
>>> /home/richi/quick-start/LatestStrong/src/rtems/c/src/../../testsuites/sptests/sp02/init.c:26
>>> 26 TEST_BEGIN();
>>> (gdb) watch *(unsigned int)* 0x202ba8
>>> Hardware watchpoint 13: *(unsigned int)* 0x202ba8
>>> (gdb) c
>>> Continuing.
>>>
>>> Thread 1 hit Breakpoint 5, _Terminate
>>> (the_source=RTEMS_FATAL_SOURCE_HEAP, the_error=2125180) at
>>> /home/richi/quick-start/LatestStrong/src/rtems/c/src/../../cpukit/score/src/interr.c:38
>>> 38 _User_extensions_Fatal( the_source, the_error );
>>>
>>>
>>>
>>> On Wed, Mar 3, 2021 at 10:20 PM Gedare Bloom <gedare at rtems.org> wrote:
>>>
>>>> On Wed, Mar 3, 2021 at 9:49 AM Gedare Bloom <gedare at rtems.org> wrote:
>>>> >
>>>> > On Wed, Mar 3, 2021 at 9:28 AM Joel Sherrill <joel.sherrill at gmail.com>
>>>> wrote:
>>>> > >
>>>> > >
>>>> > >
>>>> > > On Wed, Mar 3, 2021 at 9:50 AM Gedare Bloom <gedare at rtems.org>
>>>> wrote:
>>>> > >>
>>>> > >> On Wed, Mar 3, 2021 at 12:08 AM Richi Dubey <richidubey at gmail.com>
>>>> wrote:
>>>> > >> >
>>>> > >> > What's the element written after the free? I set a watch at the
>>>> exact block location, but it doesn't work:
>>>> > >> >
>>>> > >> > Hardware watchpoint 7: *0x202ba8
>>>> > >> > (gdb) watch *0x206fec
>>>> > >> > Hardware watchpoint 8: *0x206fec
>>>> > >> That's just the first byte in the block. If you can figure out
>>>> which
>>>> > >> bytes/words in the block get accessed that would help you.
>>>> > >
>>>> > >
>>>> > > What about watch *(unsigned int)* 0x202ba8?
>>>> > >
>>>> > > Won't that look at more bytes?
>>>> >
>>>> > And this is just the first byte of the workspace area.
>>>> >
>>>> 4 bytes :)
>>>> > >
>>>> > > In case you do need to look at more bytes in the fence...
>>>> > > breaking at _Terminate and doing a back trace will give
>>>> > > you the exact line the error is raised from. You can then set a
>>>> > > breakpoint at that on the next line and look at local variables.
>>>> > > The corruption may be in the fence somewhere beyond the
>>>> > > first 32-bits.
>>>> > >
>>>> In the case of heap corruption, the corruption is detected after it
>>>> already happened. Narrowing down when/where the corruption happens is
>>>> necessary. The next thing to do would be to examine the pattern that
>>>> triggers the violation, and see where it got modified. This might
>>>> provide a byte address to set a watch on.
>>>>
>>>> > > Sometimes it is easy to binary search for an issue like this
>>>> > > on a simulator. But with a watchpoint, you should be able to
>>>> > > determine the precise word which is corrupted in the fence
>>>> > > and break on that write.
>>>> > >
>>>> > > --joel
>>>> > >>
>>>> > >>
>>>> > >> > (gdb) c
>>>> > >> > Continuing.
>>>> > >> >
>>>> > >> > Thread 1 hit Breakpoint 6, Init (argument=2107944) at
>>>> /home/richi/quick-start/LatestStrong/src/rtems/c/src/../../testsuites/sptests/sp02/init.c:26
>>>> > >> > 26 TEST_BEGIN();
>>>> > >> > (gdb)
>>>> > >> > Continuing.
>>>> > >> >
>>>> > >> > Thread 1 hit Breakpoint 5, _Terminate
>>>> (the_source=RTEMS_FATAL_SOURCE_HEAP, the_error=2125180) at
>>>> /home/richi/quick-start/LatestStrong/src/rtems/c/src/../../cpukit/score/src/interr.c:38
>>>> > >> > 38 _User_extensions_Fatal( the_source, the_error );
>>>> > >> > (gdb)
>>>> > >> > Continuing.
>>>> > >> >
>>>> > >> > Thread 1 hit Breakpoint 4, bsp_reset () at
>>>> /home/richi/quick-start/LatestStrong/src/rtems/c/src/lib/libbsp/arm/realview-pbx-a9/../../../../../../bsps/arm/realview-pbx-a9/start/bspreset.c:19
>>>> > >> > 19 volatile uint32_t *sys_lock = (volatile uint32_t *)
>>>> 0x10000020;
>>>> > >> > (gdb)
>>>> > >> >
>>>> > >> > On Wed, Mar 3, 2021 at 12:31 PM Sebastian Huber <
>>>> sebastian.huber at embedded-brains.de> wrote:
>>>> > >> >>
>>>> > >> >> On 02/03/2021 05:44, Richi Dubey wrote:
>>>> > >> >>
>>>> > >> >> >
>>>> > >> >> > (gdb) p *(Heap_Error_context*)(0x00206d7c)
>>>> > >> >> > $5 = {
>>>> > >> >> > heap = 0x202ba8 <_Workspace_Area>,
>>>> > >> >> > block = 0x206fec,
>>>> > >> >> > reason = HEAP_ERROR_FREE_PATTERN
>>>> > >> >> > }
>>>> > >> >>
>>>> > >> >> If it is always the same address, then you can set a watch
>>>> point to an
>>>> > >> >> element which is written after the free to catch the function
>>>> which
>>>> > >> >> writes into this area.
>>>> > >> >>
>>>> > >> >> --
>>>> > >> >> embedded brains GmbH
>>>> > >> >> Herr Sebastian HUBER
>>>> > >> >> Dornierstr. 4
>>>> > >> >> 82178 Puchheim
>>>> > >> >> Germany
>>>> > >> >> email: sebastian.huber at embedded-brains.de
>>>> > >> >> phone: +49-89-18 94 741 - 16
>>>> > >> >> fax: +49-89-18 94 741 - 08
>>>> > >> >>
>>>> > >> >> Registergericht: Amtsgericht München
>>>> > >> >> Registernummer: HRB 157899
>>>> > >> >> Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas
>>>> Dörfler
>>>> > >> >> Unsere Datenschutzerklärung finden Sie hier:
>>>> > >> >> https://embedded-brains.de/datenschutzerklaerung/
>>>> > >> >>
>>>> > >> > _______________________________________________
>>>> > >> > devel mailing list
>>>> > >> > devel at rtems.org
>>>> > >> > http://lists.rtems.org/mailman/listinfo/devel
>>>>
>>>