or1k test was .. Re: [PATCH] or1k: New cache manager.

Hesham Moustafa heshamelmatary at gmail.com
Wed Sep 17 19:48:24 UTC 2014


On Wed, Sep 17, 2014 at 9:44 PM, Joel Sherrill <joel.sherrill at oarcorp.com> wrote:

>
> On 9/17/2014 12:44 PM, Hesham Moustafa wrote:
> >
> > On Tue, Sep 16, 2014 at 11:08 PM, Joel Sherrill <joel.sherrill at oarcorp.com> wrote:
> >
> >     Gedare.. cc'ed you for help in spotting an empty rbtree
> >     in gdb. See below.
> >     On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
> >      > Breakpoint 2, 0x00000600 in _unalign ()
> >      > (gdb) bt
> >      > #0  0x00000600 in _unalign ()
> >      > #1  0x0002ec4c in _RBTree_Next (
> >      >     node=0x40890, dir=RBT_RIGHT)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
> >      > #2  0x0002e2f4 in _RBTree_Successor (
> >      >     node=0x40890)
> >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
> >      > #3  0x0002e8c0 in _RBTree_Extract (
> >      >     the_rbtree=0x4198c,
> >      >     the_node=0x40890)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
> >      > #4  0x00021524 in _RBTree_Get (
> >      >     the_rbtree=0x4198c, dir=RBT_LEFT)
> >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
> >      > #5  0x000215c8 in _Thread_queue_Dequeue
> >      >     (the_thread_queue=0x4198c)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
> >      > #6  0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
> >      >     id=436273153,
> >      >     api_semaphore_mp_support=0x0)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
> >      > #7  0x00014868 in rtems_semaphore_release (id=436273153)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
> >      > #8  0x00026cfc in rtems_libio_unlock ()
> >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
> >      > #9  0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
> >      > #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
> >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
> >      > #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
> >      > #12 0x00029734 in rtems_libio_free (
> >      >     iop=0x49c50)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
> >      > #13 0x0002912c in close (fd=0)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
> >      > #14 0x000064b0 in rtems_libio_exit ()
> >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
> >      > #15 0x0003b058 in _exit (status=0)
> >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
> >      > #16 0x00034798 in exit (code=0)
> >      >     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> >      > #17 0x00002e3c in Test_task (unused=1)
> >      >     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
> >      > #18 0x000340f0 in _Thread_Handler ()
> >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> >      > #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
> >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> >      > Backtrace stopped: frame did not save the PC
> >      > (gdb)
> >      >
> >      >
> >      > It breaks at _RBTree_Next specifically at the following line:
> >      >  while ( ( current = current->child[ opp_dir ] ) != NULL )
> >      >
> >      > (gdb) p current->child[ opp_dir ]
> >      > Cannot access memory at address 0xa010006
> >      > (gdb) p current
> >      > $1 = (RBTree_Node *) 0xa010002
> >     These look like object ids.
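If these values are object IDs, decoding them makes the guess checkable. The sketch below is a hypothetical helper (not an RTEMS API) that splits a 32-bit ID into its fields, assuming the 32-bit object ID layout of this era of RTEMS: index in bits 0-15, node in bits 16-23, API in bits 24-26 and class in bits 27-31.

```c
#include <stdint.h>

/* Fields of a 32-bit object ID under the assumed layout. */
typedef struct {
  unsigned api;        /* bits 24-26 */
  unsigned obj_class;  /* bits 27-31 */
  unsigned node;       /* bits 16-23 */
  unsigned index;      /* bits 0-15  */
} decoded_id;

/* Hypothetical helper: split an ID into API, class, node and index. */
static decoded_id decode_object_id(uint32_t id)
{
  decoded_id d;
  d.index     = (unsigned) (id & 0xFFFFu);
  d.node      = (unsigned) ((id >> 16) & 0xFFu);
  d.api       = (unsigned) ((id >> 24) & 0x7u);
  d.obj_class = (unsigned) ((id >> 27) & 0x1Fu);
  return d;
}
```

Under that assumption, decode_object_id(0x0a010002) gives API 2, class 1, node 1, index 2, and the semaphore id 436273153 (0x1a010001) from frame #6 gives API 2, class 3, node 1, index 1. If API 2 is the Classic API and class 1 its task class, the value in `current` would be a task ID rather than a pointer.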
> >     > This address is invalid; the memory size should only be 32 MB
> >     > (0x2000000):
> >     >
> >     > http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
> >     >
> >     > So I guess current->child is overwritten somehow?
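The two checks implied here can be made explicit. A minimal sketch, assuming RAM spans 0x0 to 0x2000000 (the 32 MB size is from the message above; the base address of 0 is an assumption). Under it, 0xa010002 fails the range check, and 0xa010006, the address of current->child[opp_dir] four bytes into the node, is also not 4-byte aligned. A misaligned 32-bit load would by itself be consistent with the trap landing in _unalign at 0x600.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed RAM window on or1ksim; adjust if linkcmds says otherwise. */
#define RAM_BASE 0x00000000u
#define RAM_SIZE 0x02000000u

/* True if addr lies in [RAM_BASE, RAM_BASE + RAM_SIZE). If addr is
 * below RAM_BASE the unsigned subtraction wraps to a huge value, so a
 * single comparison covers both bounds. */
static bool in_ram(uint32_t addr)
{
  return addr - RAM_BASE < RAM_SIZE;
}

/* True if a 32-bit load from addr would be naturally aligned. */
static bool word_aligned(uint32_t addr)
{
  return (addr & 0x3u) == 0u;
}

/* Hypothetical pre-dereference check for a tree node pointer: the node
 * and the field at byte offset field_off must both be aligned RAM
 * addresses. */
static bool plausible_node_ptr(uint32_t node, uint32_t field_off)
{
  return in_ram(node) && word_aligned(node)
      && in_ram(node + field_off) && word_aligned(node + field_off);
}
```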
> >     Yep.  Two approaches.
> >
> >     + Set a watchpoint in gdb if it is supported. But even if supported,
> >     it will likely slow the run tremendously.
> >
> > There is no hardware watchpoint support.
> >
> >     + Break selectively and more or less binary search for where it is
> >     overwritten.  I would break at the first call to _ISR_Dispatch
> >     (or whatever you called it) and see if it gets clobbered.
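Without hardware watchpoints, a crude software substitute is to snapshot the suspect memory once and compare it at every checkpoint you can hook (for example the first interrupt dispatch, or periodically from a task), then put a gdb breakpoint on the code that reports a change. The names below are hypothetical and not part of RTEMS; the 64-byte limit is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

#define SW_WATCH_MAX 64  /* arbitrary upper bound on watched bytes */

typedef struct {
  const volatile unsigned char *addr;  /* start of watched region */
  size_t len;                          /* number of bytes watched */
  unsigned char snapshot[SW_WATCH_MAX];
} sw_watch;

/* Record the region's current contents; false if len is too large. */
static bool sw_watch_arm(sw_watch *w, const volatile void *addr, size_t len)
{
  if (len > SW_WATCH_MAX)
    return false;
  w->addr = addr;
  w->len = len;
  for (size_t i = 0; i < len; ++i)
    w->snapshot[i] = w->addr[i];
  return true;
}

/* True if any watched byte differs from the snapshot. The volatile
 * reads force the current memory contents to be compared. */
static bool sw_watch_changed(const sw_watch *w)
{
  for (size_t i = 0; i < w->len; ++i)
    if (w->addr[i] != w->snapshot[i])
      return true;
  return false;
}
```

Bisecting then amounts to moving checkpoint calls earlier or later until the first one that reports a change brackets the stray write.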
> >
> >     That could be clobbered VERY early in the program. It could be
> >     a blown stack. But it could just be a stray write. Check the value
> >     of that semaphore's rbtree when you get to Init and just
> >     break periodically and see where it is corrupt.
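One way to test the blown-stack theory is stack painting: fill a stack area with a known pattern before it is used, then count how much of the pattern survives. This is a standalone sketch assuming a descending stack whose lowest address is `low`; the pattern value is arbitrary. RTEMS' optional stack checker (CONFIGURE_STACK_CHECKER_ENABLED) builds on a similar idea.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xA5A5A5A5u  /* arbitrary fill pattern */

/* Fill words [0, words) starting at the lowest stack address. */
static void stack_paint(uint32_t *low, size_t words)
{
  for (size_t i = 0; i < words; ++i)
    low[i] = STACK_PAINT;
}

/* Count how many words at the low end still hold the pattern. With a
 * descending stack, 0 means the deepest word was overwritten, so the
 * stack may have overflowed into whatever lies below it. */
static size_t stack_untouched_words(const uint32_t *low, size_t words)
{
  size_t n = 0;
  while (n < words && low[n] == STACK_PAINT)
    ++n;
  return n;
}
```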
> >
> > That's what I did. As you assumed, it's clobbered very early.
> >
> > Breakpoint 1, _Objects_Extend_information (
> >      information=0x3e26c <_RTEMS_tasks_Information>)
> >      at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
> > 67 do_extend     = true;
> > (gdb) bt
> > #0  _Objects_Extend_information (
> >      information=0x3e26c <_RTEMS_tasks_Information>)
> >      at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
> > #1  0x0001b554 in _Objects_Initialize_information (
> >      information=0x3e26c <_RTEMS_tasks_Information>,
> >      the_api=OBJECTS_CLASSIC_API, the_class=1, maximum=4,
> >      size=1424, is_string=false, maximum_name_length=4)
> >      at ../../../../../../rtems/c/src/../../cpukit/score/src/objectinitializeinformation.c:126
> > #2  0x0002c688 in _RTEMS_tasks_Manager_initialization ()
> >      at ../../../../../../rtems/c/src/../../cpukit/rtems/src/tasks.c:197
> > #3  0x00015bd4 in _RTEMS_API_Initialize ()
> >      at ../../../../../../rtems/c/src/../../cpukit/sapi/src/rtemsapi.c:59
> > #4  0x0001590c in rtems_initialize_data_structures ()
> >      at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exinit.c:140
> > #5  0x0000333c in boot_card (cmdline=0x0)
> >      at ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/bootcard.c:92
> > #6  0x00000000 in ?? ()
> > (gdb)
> >
> > Specifically, here
> >
> > http://git.rtems.org/rtems/tree/cpukit/score/src/objectextendinformation.c#n261
> I think this is the first time it is initialized. What's the next time
> it is modified?
>
Yes, it's the first time, and that first write already contains the invalid
address; it's not modified after that.
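A quick arithmetic check ties this line to the crash. Assuming the 32-bit object ID layout (index bits 0-15, node 16-23, API 24-26, class 27-31) and that OBJECTS_CLASSIC_API has the value 2, building an ID from frame #1's parameters (the_api=OBJECTS_CLASSIC_API, the_class=1) with node 1 and index 2 gives exactly 0x0a010002, the value gdb showed for `current`. That would fit the invalid "pointer" being a task ID stored while this object block is initialized, though it does not by itself explain why the rbtree code reads it. build_object_id is a hypothetical helper modeled on that layout, not RTEMS' own macro.

```c
#include <stdint.h>

/* Assumed numeric value of OBJECTS_CLASSIC_API. */
#define CLASSIC_API 2u

/* Hypothetical: compose a 32-bit object ID; each field is masked to
 * its assumed width before being shifted into place. */
static uint32_t build_object_id(uint32_t api, uint32_t obj_class,
                                uint32_t node, uint32_t index)
{
  return ((api & 0x7u) << 24)
       | ((obj_class & 0x1Fu) << 27)
       | ((node & 0xFFu) << 16)
       | (index & 0xFFFFu);
}
```

The same function with class 3 and index 1 reproduces the semaphore id 436273153 seen in rtems_semaphore_release.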

> But this looks like task manager class information, not a semaphore
> like the crash, so this is odd. :(
> >
> >     I cc'ed Gedare because I don't know how to spot that the rbtree
> >     is empty in gdb.
> >
> >     You need to see where that memory is overwritten.
> >
> >     Again running all tests with the simulator clock tick could
> >     eliminate the ISR code as the culprit. :)
> >      > On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
> >      > <joel.sherrill at oarcorp.com> wrote:
> >      >> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
> >      >>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
> >      >>> <joel.sherrill at oarcorp.com> wrote:
> >      >>>>
> >      >>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
> >      >>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
> >      >>>>> <joel.sherrill at oarcorp.com> wrote:
> >      >>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
> >      >>>>>>> Hi
> >      >>>>>>>
> >      >>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
> >      >>>>>>> <joel.sherrill at oarcorp.com> wrote:
> >      >>>>>>>> I don't understand this but I got it applied.
> >      >>>>>>>>
> >      >>>>>>>> I manually edited the saved email to delete the preinstall.am
> >      >>>>>>>> changes.  I committed the rest. Then I ran bootstrap -p myself
> >      >>>>>>>> and folded that into the rest of your patch.
> >      >>>>>>>>
> >      >>>>>>>> It should all be committed now.
> >      >>>>>>>>
> >      >>>>>>> Thanks for doing this; I don't know what's wrong either. BTW,
> >      >>>>>>> commits haven't been mirrored on GitHub for the last 4 days.
> >      >>>>>>>
> >      >>>>>>>> How about some new test results. :)
> >      >>>>>>>>
> >      >>>>>>> I did run one last night; no big progress since the previous
> >      >>>>>>> results :( Is there any tool, script, utility program or whatever
> >      >>>>>>> that I can use to detect wrong memory accesses (e.g., stack
> >      >>>>>>> overwrites, heap corruption, access to another task's context)?
> >      >>>>>>> I tried adding -fstack-protector-all to the gcc flags, but QEMU
> >      >>>>>>> did not report anything or core-dump; ticker just hangs.
> >      >>>>>> I haven't checked into how gcc does its stack overwrite protection.
> >      >>>>>>
> >      >>>>>> The tests by themselves don't have these problems. The first
> >      >>>>>> possible source is incorrect layout of sections to memory by
> >      >>>>>> the linker script. There is some debug code in boot
> >      >>>>>>
> >      >>>>>> There used to be debug printk's in bspgetworkarea.c so you
> >      >>>>>> could check if areas overlapped. That usually causes bad issues
> >      >>>>>> though. Let's go through some basics:
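The overlap check those printk's enabled is simple to express. A minimal sketch: each area is a half-open range [start, start + size), and two non-empty ranges overlap when each starts before the other ends. The mem_area type is hypothetical; real start and size values would come from linker symbols or from the work-area values the BSP computes. Wrap-around at the top of the address space is not handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one memory area: [start, start + size). */
typedef struct {
  const char *name;
  uintptr_t start;
  uintptr_t size;
} mem_area;

/* Non-empty half-open ranges overlap iff each starts before the other
 * ends; an empty range never overlaps anything. */
static bool areas_overlap(const mem_area *a, const mem_area *b)
{
  return a->size != 0 && b->size != 0
      && a->start < b->start + b->size
      && b->start < a->start + a->size;
}
```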
> >      >>>>>>
> >      >>>>>> + Does hello world run and exit cleanly?
> >      >>>>>>
> >      >>>>> The output of Hello World is:
> >      >>>>>
> >      >>>>> *** BEGIN OF TEST HELLO WORLD ***
> >      >>>>> Hello World
> >      >>>>> *** END OF TEST HELLO WORLD ***
> >      >>>>> Fatal Error 5.0 Halted
> >      >>>>>
> >      >>>>>   From GDB:
> >      >>>>>
> >      >>>>> Breakpoint 1, _Terminate (
> >      >>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
> >      >>>>> the_error=0)
> >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
> >      >>>>> 39  _ISR_Disable_without_giant( level );
> >      >>>>> (gdb) bt
> >      >>>>> #0  _Terminate (
> >      >>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
> >      >>>>> the_error=0)
> >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
> >      >>>>> #1  0x0003b5f8 in rtems_shutdown_executive (result=0)
> >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
> >      >>>>> #2  0x0003b350 in _exit (status=0)
> >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
> >      >>>>> #3  0x0002cc30 in exit (code=0)
> >      >>>>>       at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> >      >>>>> #4  0x00002184 in Init (ignored=253816)
> >      >>>>>       at ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
> >      >>>>> #5  0x0002c5b8 in _Thread_Handler ()
> >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> >      >>>>> #6  0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
> >      >>>>>       at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> >      >>>> This is normal and OK. Look at the arguments to _Terminate.
> >      >>>>>> + How far does ticker get?
> >      >>>>>>
> >      >>>>> Ticker goes to the end:
> >      >>>>>
> >      >>>>> *** BEGIN OF TEST CLOCK TICK ***
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:00   12/31/1988
> >      >>>>> TA2  - rtems_clock_get_tod - 09:00:00   12/31/1988
> >      >>>>> TA3  - rtems_clock_get_tod - 09:00:00   12/31/1988
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:05   12/31/1988
> >      >>>>> TA2  - rtems_clock_get_tod - 09:00:10   12/31/1988
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:10   12/31/1988
> >      >>>>> TA3  - rtems_clock_get_tod - 09:00:15   12/31/1988
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:15   12/31/1988
> >      >>>>> TA2  - rtems_clock_get_tod - 09:00:20   12/31/1988
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:20   12/31/1988
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:25   12/31/1988
> >      >>>>> TA3  - rtems_clock_get_tod - 09:00:30   12/31/1988
> >      >>>>> TA2  - rtems_clock_get_tod - 09:00:30   12/31/1988
> >      >>>>> TA1  - rtems_clock_get_tod - 09:00:30   12/31/1988
> >      >>>>> *** END OF TEST CLOCK TICK ***
> >      >>>>> Fatal Error 9.276564 Halted
> >      >>>>>
> >      >>>>>   From GDB:
> >      >>>>>
> >      >>>>> (gdb) break _Terminate
> >      >>>>> Breakpoint 1 at 0x19a68: file ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line 39.
> >      >>>>> (gdb) break _OR1K_Exception_default
> >      >>>>> Breakpoint 2 at 0x2686c: file ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c, line 22.
> >      >>>>> (gdb) c
> >      >>>>> The program is not being run.
> >      >>>>> (gdb) target remote :50001
> >      >>>>> Remote debugging using :50001
> >      >>>>> 0x00000100 in _reset ()
> >      >>>>> (gdb) c
> >      >>>>> Continuing.
> >      >>>>>
> >      >>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854) at
> >      >>>>>     ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
> >      >>>>> 22  rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame );
> >      >>>>> (gdb) bt
> >      >>>>> #0  _OR1K_Exception_default (vector=6, frame=0x43854) at
> >      >>>>>     ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
> >      >>>>> #1  0x00026980 in jump_to_c_handler ()
> >      >>>>> Backtrace stopped: frame did not save the PC
> >      >>>>>
> >      >>>>> Vector 6 is the _unalign exception.
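The halt messages and this vector number can be cross-checked. In this session the exit path printed "Fatal Error 5.0" with the_source=RTEMS_FATAL_SOURCE_EXIT and the_error=0, while the exception path calls rtems_fatal(RTEMS_FATAL_SOURCE_EXCEPTION, frame) and ticker printed "Fatal Error 9.276564"; 276564 is 0x43854, the frame pointer gdb shows. The sketch below parses such a message and names only those two source numbers, since they are the only ones this log confirms. It also assumes OR1K vectors are spaced 0x100 apart, which matches _reset at 0x100 and _unalign at 0x600 above. All function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Only the two fatal sources confirmed by this session are named. */
static const char *fatal_source_name(unsigned source)
{
  switch (source) {
  case 5:
    return "RTEMS_FATAL_SOURCE_EXIT";      /* "Fatal Error 5.0" on exit */
  case 9:
    return "RTEMS_FATAL_SOURCE_EXCEPTION"; /* code is the frame address */
  default:
    return "unknown (look it up in the fatal source enum)";
  }
}

/* Parses "Fatal Error S.C Halted"; returns 1 on success, 0 otherwise. */
static int parse_fatal(const char *line, unsigned *source, unsigned long *code)
{
  return sscanf(line, "Fatal Error %u.%lu", source, code) == 2;
}

/* Assumes vectors sit at vector * 0x100, so an address within the
 * first 0x100 bytes of a handler maps back to its vector number
 * (0x100 -> 1, reset; 0x600 -> 6, alignment). */
static unsigned or1k_vector_from_addr(uint32_t addr)
{
  return (unsigned) (addr >> 8);
}
```

Feeding it the ticker message yields source 9 and code 0x43854, i.e. an exception whose frame is the one _OR1K_Exception_default received.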
> >      >>>> Set a breakpoint at exit() (I think) and rtems_shutdown_executive().
> >      >>>> You could start in the task which calls whatever kicks off the
> >      >>>> shutdown sequence. It looks like something in the shutdown
> >      >>>> procedure trips over something. This might be easy to debug.
> >      >>>>
> >      >>> I added just a call to _CPU_Exception_frame_print(frame);
> >      >>> in _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame
> >      >>> *frame), and ticker now exits normally without even entering
> >      >>> _OR1K_Exception_default as it did before. This is very weird. Does
> >      >>> this mean that some areas of the code overlap because of the linker
> >      >>> script?
> >      >> I doubt it. I suspect something uninitialized or not aligned properly.
> >      >>
> >      >> Set a breakpoint at
> >      >>
> >      >> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
> >      >> next over the print and then step through rtems_test_exit() and see
> >      >> where it faults.
> >      >>>> If the fault address is in the exception data, you can map it back
> >      >>>> to the nm output and see what file it was in; that might help.
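Mapping a fault address back through `nm -n` output (symbols sorted by address) amounts to finding the last symbol that starts at or below the address. A minimal sketch using binary search; the sym type is hypothetical. Plain `nm -n` output carries no sizes, so an address past the end of a function still maps to it; `nm -S` adds sizes if that matters.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of `nm -n` output: symbol start address and name. */
typedef struct {
  uint32_t addr;
  const char *name;
} sym;

/* tab must be sorted by addr. Returns the symbol with the greatest
 * start address <= pc, or NULL if pc lies below the first symbol. */
static const sym *sym_lookup(const sym *tab, size_t n, uint32_t pc)
{
  size_t lo = 0, hi = n;  /* invariant: answer index is lo - 1 */
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (tab[mid].addr <= pc)
      lo = mid + 1;       /* tab[mid] starts at or below pc */
    else
      hi = mid;
  }
  return lo == 0 ? NULL : &tab[lo - 1];
}
```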
> >      >>>>>> + Have you tried the trick I suggested earlier to disable the
> >      >>>>>> real clock tick driver, use the simulator idle tick code, and
> >      >>>>>> disable all the tests that are known to fail that way. This
> >      >>>>>> will eliminate the ISR code as an issue because you won't
> >      >>>>>> have any (if console output is polled).  See h8sim for
> >      >>>>>> an example. Should be a Makefile.am change, adding
> >      >>>>>> an include to the testsuite configuration file, building
> >      >>>>>> and running.
> >      >>>>>>
> >      >> --
> >      >> Joel Sherrill, Ph.D.             Director of Research & Development
> >      >> joel.sherrill at OARcorp.com        On-Line Applications Research
> >      >> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
> >      >> Support Available                (256) 722-9985
> >      >>
> >
> >
> >
> >
>
>
>