or1k test was .. Re: [PATCH] or1k: New cache manager.

Joel Sherrill joel.sherrill at oarcorp.com
Tue Sep 16 21:08:01 UTC 2014


Gedare, I cc'ed you for help in spotting an empty rbtree
in gdb. See below.
On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
> Breakpoint 2, 0x00000600 in _unalign ()
> (gdb) bt
> #0  0x00000600 in _unalign ()
> #1  0x0002ec4c in _RBTree_Next (
>     node=0x40890, dir=RBT_RIGHT)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
> #2  0x0002e2f4 in _RBTree_Successor (
>     node=0x40890)
>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
> #3  0x0002e8c0 in _RBTree_Extract (
>     the_rbtree=0x4198c,
>     the_node=0x40890)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
> #4  0x00021524 in _RBTree_Get (
>     the_rbtree=0x4198c, dir=RBT_LEFT)
>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
> #5  0x000215c8 in _Thread_queue_Dequeue
>     (the_thread_queue=0x4198c)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
> #6  0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
>     id=436273153,
>     api_semaphore_mp_support=0x0)
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
> #7  0x00014868 in rtems_semaphore_release (id=436273153)
>     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
> #8  0x00026cfc in rtems_libio_unlock ()
>     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
> #9  0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
>     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
> #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
>     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
> #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
> #12 0x00029734 in rtems_libio_free (
>     iop=0x49c50)
>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
> #13 0x0002912c in close (fd=0)
>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
> #14 0x000064b0 in rtems_libio_exit ()
>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
> #15 0x0003b058 in _exit (status=0)
>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
> #16 0x00034798 in exit (code=0)
>     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> #17 0x00002e3c in Test_task (unused=1)
>     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
> #18 0x000340f0 in _Thread_Handler ()
>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> Backtrace stopped: frame did not save the PC
> (gdb)
>
>
> It breaks at _RBTree_Next specifically at the following line:
>  while ( ( current = current->child[ opp_dir ] ) != NULL )
>
> (gdb) p current->child[ opp_dir ]
> Cannot access memory at address 0xa010006
> (gdb) p current
> $1 = (RBTree_Node *) 0xa010002
These look like object ids.
> This address is invalid, the current memory length should be only 32
> MB (0x2000000)
>
> http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
>
> So I guess current->child is overwritten somehow?
Yep.  Two approaches.

+ Set a watchpoint in gdb if it is supported. But even if supported,
it will likely slow the run tremendously.
+ Break selectively and more or less binary search for where it is
overwritten.  I would break at the first call to _ISR_Dispatch
(or whatever you called it) and see if it gets clobbered.

That could be clobbered VERY early in the program. It could be
a blown stack. But it could just be a stray write. Check the value
of that semaphore's rbtree when you get to Init and just
break periodically and see where it is corrupt.

I cc'ed Gedare because I don't know how to spot that the rbtree
is empty in gdb.

You need to see where that memory is overwritten.

Again, running all tests with the simulator clock tick could
eliminate the ISR code as the culprit. :)
> On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
> <joel.sherrill at oarcorp.com> wrote:
>> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
>>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
>>> <joel.sherrill at oarcorp.com> wrote:
>>>>
>>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
>>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
>>>>> <joel.sherrill at oarcorp.com> wrote:
>>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
>>>>>>> <joel.sherrill at oarcorp.com> wrote:
>>>>>>>> I don't understand this but I got it applied.
>>>>>>>>
>>>>>>>> I manually edited the saved email to delete the preinstall.am
>>>>>>>> changes.  I committed the rest. Then I ran bootstrap -p myself
>>>>>>>> and folded that into the rest of your patch.
>>>>>>>>
>>>>>>>> It should all be committed now.
>>>>>>>>
>>>>>>> Thanks for doing this; I don't know what's wrong either. BTW, commits
>>>>>>> have not been mirrored on GitHub for the last 4 days.
>>>>>>>
>>>>>>>> How about some new test results. :)
>>>>>>>>
>>>>>>> I did run one last night; no big progress since the previous results :( Is
>>>>>>> there any tool, script, or utility program that I can use to
>>>>>>> detect wrong memory accesses (e.g., stack overwrite, heap corruption,
>>>>>>> access to another task's context)? I tried adding -fstack-protector-all
>>>>>>> to gcc, but QEMU did not report anything or dump core; ticker just hangs.
>>>>>> I haven't checked into how gcc does its stack overwrite protection.
>>>>>>
>>>>>> The tests by themselves don't have these problems. The first
>>>>>> possible source is incorrect layout of sections to memory by
>>>>>> the linker script. There is some debug code in boot
>>>>>>
>>>>>> There used to be debug printk's in bspgetworkarea.c so you
>>>>>> could check if areas overlapped. That usually causes bad issues
>>>>>> though. Let's go through some basics:
>>>>>>
>>>>>> + Does hello world run and exit cleanly?
>>>>>>
>>>>> The output of Hello World is:
>>>>>
>>>>> *** BEGIN OF TEST HELLO WORLD ***
>>>>> Hello World
>>>>> *** END OF TEST HELLO WORLD ***
>>>>> Fatal Error 5.0 Halted
>>>>>
>>>>>   From GDB:
>>>>>
>>>>> Breakpoint 1, _Terminate (
>>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false, the_error=0)
>>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>>>>> 39  _ISR_Disable_without_giant( level );
>>>>> (gdb) bt
>>>>> #0  _Terminate (
>>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false, the_error=0)
>>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>>>>> #1  0x0003b5f8 in rtems_shutdown_executive (result=0)
>>>>>       at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
>>>>> #2  0x0003b350 in _exit (status=0)
>>>>>       at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
>>>>> #3  0x0002cc30 in exit (code=0)
>>>>>       at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>>>>> #4  0x00002184 in Init (ignored=253816)
>>>>>       at ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
>>>>> #5  0x0002c5b8 in _Thread_Handler ()
>>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>>>>> #6  0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
>>>>>       at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>>>> This is normal and OK. Look at the arguments to _Terminate: the source
>>>> is RTEMS_FATAL_SOURCE_EXIT and the_error is 0, i.e. a normal exit.
>>>>>> + How far does ticker get?
>>>>>>
>>>>> Ticker goes to the end:
>>>>>
>>>>> *** BEGIN OF TEST CLOCK TICK ***
>>>>> TA1  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>>>> TA2  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>>>> TA3  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>>>> TA1  - rtems_clock_get_tod - 09:00:05   12/31/1988
>>>>> TA2  - rtems_clock_get_tod - 09:00:10   12/31/1988
>>>>> TA1  - rtems_clock_get_tod - 09:00:10   12/31/1988
>>>>> TA3  - rtems_clock_get_tod - 09:00:15   12/31/1988
>>>>> TA1  - rtems_clock_get_tod - 09:00:15   12/31/1988
>>>>> TA2  - rtems_clock_get_tod - 09:00:20   12/31/1988
>>>>> TA1  - rtems_clock_get_tod - 09:00:20   12/31/1988
>>>>> TA1  - rtems_clock_get_tod - 09:00:25   12/31/1988
>>>>> TA3  - rtems_clock_get_tod - 09:00:30   12/31/1988
>>>>> TA2  - rtems_clock_get_tod - 09:00:30   12/31/1988
>>>>> TA1  - rtems_clock_get_tod - 09:00:30   12/31/1988
>>>>> *** END OF TEST CLOCK TICK ***
>>>>> Fatal Error 9.276564 Halted
>>>>>
>>>>>   From GDB:
>>>>>
>>>>> (gdb) break _Terminate
>>>>> Breakpoint 1 at 0x19a68: file ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line 39.
>>>>> (gdb) break _OR1K_Exception_default
>>>>> Breakpoint 2 at 0x2686c: file ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c, line 22.
>>>>> (gdb) c
>>>>> The program is not being run.
>>>>> (gdb) target remote :50001
>>>>> Remote debugging using :50001
>>>>> 0x00000100 in _reset ()
>>>>> (gdb) c
>>>>> Continuing.
>>>>>
>>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854) at
>>>>>       ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>>>>> 22  rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame );
>>>>> (gdb) bt
>>>>> #0  _OR1K_Exception_default (vector=6, frame=0x43854) at
>>>>>       ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>>>>> #1  0x00026980 in jump_to_c_handler ()
>>>>> Backtrace stopped: frame did not save the PC
>>>>>
>>>>> Vector 6 is the _unalign (alignment) exception.
>>>> Set a breakpoint at exit() (I think) and at rtems_shutdown_executive().
>>>> You could start in the task that calls whatever kicks off the shutdown
>>>> sequence.
>>>> It looks like something in the shutdown procedure trips over something.
>>>> This might be easy to debug.
>>>>
>>> I just added a call to _CPU_Exception_frame_print(frame);
>>> in _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame *frame),
>>> and now ticker exits normally without even entering
>>> _OR1K_Exception_default as it did before. This is very weird. Does this
>>> mean that some areas of the code overlap because of the linker
>>> script?
>> I doubt it. I suspect something uninitialized or not aligned properly.
>>
>> Set a breakpoint at
>> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
>> next over the print and then step through rtems_test_exit() and see
>> where it faults.
>>>> If the fault address is in the exception data, you can map it back to
>>>> a symbol using the nm output and see which file it was in. That might help.
>>>>>>> + Have you tried the trick I suggested earlier: disable the
>>>>>>> real clock tick driver, use the simulator idle tick code, and
>>>>>>> disable all the tests that are known to fail that way? This
>>>>>>> will eliminate the ISR code as an issue because you won't
>>>>>>> have any ISRs (if console output is polled).  See h8sim for
>>>>>>> an example. It should be a Makefile.am change, adding
>>>>>>> an include to the testsuite configuration file, building
>>>>>>> and running.
>>>>>>
>> --
>> Joel Sherrill, Ph.D.             Director of Research & Development
>> joel.sherrill at OARcorp.com        On-Line Applications Research
>> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>> Support Available                (256) 722-9985
>>






More information about the devel mailing list