or1k test was .. Re: [PATCH] or1k: New cache manager.

Gedare Bloom gedare at rtems.org
Tue Sep 16 21:12:27 UTC 2014


On Tue, Sep 16, 2014 at 5:08 PM, Joel Sherrill
<joel.sherrill at oarcorp.com> wrote:
> Gedare.. cc'ed you for help in spotting an empty rbtree
> in gdb. See below.
> On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
>> Breakpoint 2, 0x00000600 in _unalign ()
>> (gdb) bt
>> #0  0x00000600 in _unalign ()
>> #1  0x0002ec4c in _RBTree_Next (
>>     node=0x40890, dir=RBT_RIGHT)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
>> #2  0x0002e2f4 in _RBTree_Successor (
>>     node=0x40890)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
>> #3  0x0002e8c0 in _RBTree_Extract (
>>     the_rbtree=0x4198c,
>>     the_node=0x40890)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
>> #4  0x00021524 in _RBTree_Get (
>>     the_rbtree=0x4198c, dir=RBT_LEFT)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
>> #5  0x000215c8 in _Thread_queue_Dequeue
>>     (the_thread_queue=0x4198c)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
>> #6  0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
>>     id=436273153,
>>     api_semaphore_mp_support=0x0)
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
>> #7  0x00014868 in rtems_semaphore_release (id=436273153)
>>     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
>> #8  0x00026cfc in rtems_libio_unlock ()
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
>> #9  0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
>>     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
>> #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
>> #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
>> #12 0x00029734 in rtems_libio_free (
>>     iop=0x49c50)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
>> #13 0x0002912c in close (fd=0)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
>> #14 0x000064b0 in rtems_libio_exit ()
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
>> #15 0x0003b058 in _exit (status=0)
>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
>> #16 0x00034798 in exit (code=0)
>>     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>> #17 0x00002e3c in Test_task (unused=1)
>>     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
>> #18 0x000340f0 in _Thread_Handler ()
>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>> #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>> Backtrace stopped: frame did not save the PC
>> (gdb)
>>
>>
>> It breaks at _RBTree_Next specifically at the following line:
>>  while ( ( current = current->child[ opp_dir ] ) != NULL )
>>
>> (gdb) p current->child[ opp_dir ]
>> Cannot access memory at address 0xa010006
>> (gdb) p current
>> $1 = (RBTree_Node *) 0xa010002
> These look like object ids.
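
To check the object id guess, the value can be split into its fields. This is a minimal sketch that assumes the usual 32-bit RTEMS id layout (index in bits 0-15, node in bits 16-23, API in bits 24-26, class in bits 27-31); the struct and function names are made up for illustration and are not RTEMS API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the four fields of a 32-bit object id. */
typedef struct {
  unsigned object_class; /* bits 27-31 */
  unsigned api;          /* bits 24-26 */
  unsigned node;         /* bits 16-23 */
  unsigned index;        /* bits 0-15  */
} decoded_id;

/* Split an id into its fields, assuming the 32-bit layout above. */
decoded_id decode_object_id(uint32_t id)
{
  decoded_id d;
  d.object_class = (id >> 27) & 0x1fu;
  d.api          = (id >> 24) & 0x07u;
  d.node         = (id >> 16) & 0xffu;
  d.index        = id & 0xffffu;
  return d;
}
```

Under that assumption, 0xa010002 decodes to class 1, API 2, node 1, index 2, which has the shape of a Classic API task id. The semaphore id 436273153 (0x1a010001) from frame #7 decodes to class 3, API 2, node 1, index 1.
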
>> This address is invalid, the current memory length should be only 32
>> MB (0x2000000)
>>
>> http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
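
The two properties that make this pointer suspicious can be written down as simple checks. This is a sketch only: the 32 MB length comes from the message above, while the RAM base of 0 and the 4-byte alignment requirement for pointer loads are assumptions, and the macro and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RAM_BASE   0x00000000u /* assumption: RAM starts at address 0 */
#define RAM_LENGTH 0x02000000u /* 32 MB, as stated above */

/* True if addr lies inside [RAM_BASE, RAM_BASE + RAM_LENGTH). */
bool addr_in_ram(uint32_t addr)
{
  return (addr - RAM_BASE) < RAM_LENGTH;
}

/* True if addr is suitable for a 32-bit load (assumed 4-byte alignment). */
bool addr_word_aligned(uint32_t addr)
{
  return (addr & 3u) == 0;
}
```

With these checks, 0xa010002 is both outside RAM and not word aligned, whereas the node address 0x40890 from the backtrace passes both. A misaligned load would fit with the fault being reported at the alignment vector.
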
>>
>> So I guess current->child is overwritten somehow?
> Yep.  Two approaches.
>
> + Set a watchpoint in gdb if it is supported. But even if supported,
> it will likely slow the run tremendously.
> + Break selectively and more or less binary search for where it is
> overwritten.  I would break at the first call to _ISR_Dispatch
> (or whatever you called it) and see if it gets clobbered.
>
> That could be clobbered VERY early in the program. It could be
> a blown stack. But it could just be a stray write. Check the value
> of that semaphore's rbtree when you get to Init and just
> break periodically and see where it is corrupt.
>
> I cc'ed Gedare because I don't know how to spot that the rbtree
> is empty in gdb.
>
You should be able to watch one of the pointers from the
rbtree_control. I think there is a check for rbtree_is_empty that
would also tell you what to do. I don't have the code in front of me
to check.
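
As a rough illustration of the idea, not the actual RTEMS definitions: the types and field names below are simplified stand-ins, and the real control structure should be checked in rbtree.h. The assumption is that an empty tree is one whose control has a NULL root pointer, so that pointer is the natural one to print or put a watchpoint on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the tree node and control. */
typedef struct tree_node {
  struct tree_node *parent;
  struct tree_node *child[2];
} tree_node;

typedef struct {
  tree_node *root;     /* assumed NULL when the tree is empty */
  tree_node *first[2]; /* assumed minimum and maximum nodes */
} tree_control;

/* Emptiness test under the assumption above: no root means no nodes. */
bool tree_is_empty(const tree_control *t)
{
  return t->root == NULL;
}
```

In gdb the same check amounts to printing the root pointer of the semaphore's tree control and seeing whether it is 0.
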
-Gedare

> You need to see where that memory is overwritten.
>
> Again running all tests with the simulator clock tick could
> eliminate the ISR code as the culprit. :)
>> On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
>> <joel.sherrill at oarcorp.com> wrote:
>>> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
>>>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
>>>> <joel.sherrill at oarcorp.com> wrote:
>>>>>
>>>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
>>>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
>>>>>> <joel.sherrill at oarcorp.com> wrote:
>>>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
>>>>>>>> <joel.sherrill at oarcorp.com> wrote:
>>>>>>>>> I don't understand this but I got it applied.
>>>>>>>>>
>>>>>>>>> I manually edited the saved email to delete the preinstall.am
>>>>>>>>> changes.  I committed the rest. Then I ran bootstrap -p myself
>>>>>>>>> and folded that into the rest of your patch.
>>>>>>>>>
>>>>>>>>> It should all be committed now.
>>>>>>>>>
>>>>>>>> Thanks for doing this; I do not know what's wrong either. BTW, commits
>>>>>>>> have not been mirrored on GitHub for 4 days.
>>>>>>>>
>>>>>>>>> How about some new test results. :)
>>>>>>>>>
>>>>>>>> I did run one last night, no big progress since the previous results :( Is
>>>>>>>> there any tool, script, utility program or whatever that I can use to
>>>>>>>> detect wrong memory accesses (e.g., stack overwrite, heap corruption,
>>>>>>>> access to another task's context)? I tried adding -fstack-protector-all
>>>>>>>> to gcc, but QEMU did not report anything or dump core; ticker just hangs.
>>>>>>> I haven't checked into how gcc does its stack overwrite protection.
>>>>>>>
>>>>>>> The tests by themselves don't have these problems. The first
>>>>>>> possible source is incorrect layout of sections to memory by
>>>>>>> the linker script. There is some debug code in boot
>>>>>>>
>>>>>>> There used to be debug printk's in bspgetworkarea.c so you
>>>>>>> could check if areas overlapped. That usually causes bad issues
>>>>>>> though. Let's go through some basics:
>>>>>>>
>>>>>>> + Does hello world run and exit cleanly?
>>>>>>>
>>>>>> The output of Hello World is:
>>>>>>
>>>>>> *** BEGIN OF TEST HELLO WORLD ***
>>>>>> Hello World
>>>>>> *** END OF TEST HELLO WORLD ***
>>>>>> Fatal Error 5.0 Halted
>>>>>>
>>>>>>   From GDB:
>>>>>>
>>>>>> Breakpoint 1, _Terminate (
>>>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
>>>>>> the_error=0)
>>>>>>       at
>>>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>>>>>> 39  _ISR_Disable_without_giant( level );
>>>>>> (gdb) bt
>>>>>> #0  _Terminate (
>>>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
>>>>>> the_error=0)
>>>>>>       at
>>>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>>>>>> #1  0x0003b5f8 in rtems_shutdown_executive (result=0)
>>>>>>       at
>>>>>> ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
>>>>>> #2  0x0003b350 in _exit (status=0)
>>>>>>       at
>>>>>>
>>>>>> ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
>>>>>> #3  0x0002cc30 in exit (code=0)
>>>>>>       at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>>>>>> #4  0x00002184 in Init (ignored=253816)
>>>>>>       at
>>>>>>
>>>>>> ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
>>>>>> #5  0x0002c5b8 in _Thread_Handler ()
>>>>>>       at
>>>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>>>>>> #6  0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
>>>>>>       at
>>>>>> ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>>>>> This is normal and OK. Look at the arguments to _Terminate.
>>>>>>> + How far does ticker get?
>>>>>>>
>>>>>> Ticker goes to the end:
>>>>>>
>>>>>> *** BEGIN OF TEST CLOCK TICK ***
>>>>>> TA1  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>>>>> TA2  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>>>>> TA3  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>>>>> TA1  - rtems_clock_get_tod - 09:00:05   12/31/1988
>>>>>> TA2  - rtems_clock_get_tod - 09:00:10   12/31/1988
>>>>>> TA1  - rtems_clock_get_tod - 09:00:10   12/31/1988
>>>>>> TA3  - rtems_clock_get_tod - 09:00:15   12/31/1988
>>>>>> TA1  - rtems_clock_get_tod - 09:00:15   12/31/1988
>>>>>> TA2  - rtems_clock_get_tod - 09:00:20   12/31/1988
>>>>>> TA1  - rtems_clock_get_tod - 09:00:20   12/31/1988
>>>>>> TA1  - rtems_clock_get_tod - 09:00:25   12/31/1988
>>>>>> TA3  - rtems_clock_get_tod - 09:00:30   12/31/1988
>>>>>> TA2  - rtems_clock_get_tod - 09:00:30   12/31/1988
>>>>>> TA1  - rtems_clock_get_tod - 09:00:30   12/31/1988
>>>>>> *** END OF TEST CLOCK TICK ***
>>>>>> Fatal Error 9.276564 Halted
>>>>>>
>>>>>>   From GDB:
>>>>>>
>>>>>> (gdb) break _Terminate
>>>>>> Breakpoint 1 at 0x19a68: file
>>>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line
>>>>>> 39.
>>>>>> (gdb) break _OR1K_Exception_default
>>>>>> Breakpoint 2 at 0x2686c: file
>>>>>>
>>>>>>
>>>>>> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c,
>>>>>> line 22.
>>>>>> (gdb) c
>>>>>> The program is not being run.
>>>>>> (gdb) target remote :50001
>>>>>> Remote debugging using :50001
>>>>>> 0x00000100 in _reset ()
>>>>>> (gdb) c
>>>>>> Continuing.
>>>>>>
>>>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854) at
>>>>>>
>>>>>>
>>>>>> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>>>>>> 22  rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame
>>>>>> );
>>>>>> (gdb) bt
>>>>>> #0  _OR1K_Exception_default (vector=6, frame=0x43854) at
>>>>>>
>>>>>>
>>>>>> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>>>>>> #1  0x00026980 in jump_to_c_handler ()
>>>>>> Backtrace stopped: frame did not save the PC
>>>>>>
>>>>>> Vector 6 is the _unalign exception.
>>>>> Set a break point at exit() (I think) and rtems_shutdown_executive(). You
>>>>> could start in the task which calls whatever kicks off the shutdown
>>>>> sequence.
>>>>> It looks like something in the shutdown procedure trips over something.
>>>>> This might be easy to debug.
>>>>>
>>>> I just added a call to _CPU_Exception_frame_print(frame);
>>>>   from _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame
>>>> *frame)
>>>> And ticker exits normally without even entering
>>>> _OR1K_Exception_default as it did before. This is very weird. Does this
>>>> mean that some areas of the code overlap because of the linker
>>>> script?
>>> I doubt it. I suspect something uninitialized or not aligned properly.
>>>
>>> Set a breakpoint at
>>> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
>>> next over the print and then step through rtems_test_exit() and see
>>> where it faults.
>>>>> If the fault address is in the exception data, you can map that back to
>>>>> the nm file and see what file that was in, then that might help.
>>>>>>> + Have you tried the trick I suggested earlier to disable the
>>>>>>> real clock tick driver, use the simulator idle tick code, and
>>>>>>> disable all the tests that are known to fail that way? This
>>>>>>> will eliminate the ISR code as an issue because you won't
>>>>>>> have any (if console output is polled).  See h8sim for
>>>>>>> an example. Should be a Makefile.am change, adding
>>>>>>> an include to the testsuite configuration file, building
>>>>>>> and running.
>>>>>>>
>>> --
>>> Joel Sherrill, Ph.D.             Director of Research & Development
>>> joel.sherrill at OARcorp.com        On-Line Applications Research
>>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>>> Support Available                (256) 722-9985
>>>
>