or1k test was .. Re: [PATCH] or1k: New cache manager.

Joel Sherrill joel.sherrill at oarcorp.com
Wed Sep 17 20:07:44 UTC 2014


On 9/17/2014 2:48 PM, Hesham Moustafa wrote:
>
> On Wed, Sep 17, 2014 at 9:44 PM, Joel Sherrill <joel.sherrill at oarcorp.com> wrote:
>
>
>     On 9/17/2014 12:44 PM, Hesham Moustafa wrote:
>     >
>     > On Tue, Sep 16, 2014 at 11:08 PM, Joel Sherrill <joel.sherrill at oarcorp.com> wrote:
>      >
>      >     Gedare.. cc'ed you for help in spotting an empty rbtree
>      >     in gdb. See below.
>      >     On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
>      >      > Breakpoint 2, 0x00000600 in _unalign ()
>      >      > (gdb) bt
>      >      > #0  0x00000600 in _unalign ()
>      >      > #1  0x0002ec4c in _RBTree_Next (
>      >      >     node=0x40890, dir=RBT_RIGHT)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
>      >      > #2  0x0002e2f4 in _RBTree_Successor (
>      >      >     node=0x40890)
>      >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
>      >      > #3  0x0002e8c0 in _RBTree_Extract (
>      >      >     the_rbtree=0x4198c,
>      >      >     the_node=0x40890)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
>      >      > #4  0x00021524 in _RBTree_Get (
>      >      >     the_rbtree=0x4198c, dir=RBT_LEFT)
>      >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
>      >      > #5  0x000215c8 in _Thread_queue_Dequeue
>      >      >     (the_thread_queue=0x4198c)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
>      >      > #6  0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
>      >      >     id=436273153,
>      >      >     api_semaphore_mp_support=0x0)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
>      >      > #7  0x00014868 in rtems_semaphore_release (id=436273153)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
>      >      > #8  0x00026cfc in rtems_libio_unlock ()
>      >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
>      >      > #9  0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
>      >      > #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
>      >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
>      >      > #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
>      >      > #12 0x00029734 in rtems_libio_free (
>      >      >     iop=0x49c50)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
>      >      > #13 0x0002912c in close (fd=0)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
>      >      > #14 0x000064b0 in rtems_libio_exit ()
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
>      >      > #15 0x0003b058 in _exit (status=0)
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
>      >      > #16 0x00034798 in exit (code=0)
>      >      >     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>      >      > #17 0x00002e3c in Test_task (unused=1)
>      >      >     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
>      >      > #18 0x000340f0 in _Thread_Handler ()
>      >      >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>      >      > #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
>      >      >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>      >      > Backtrace stopped: frame did not save the PC
>      >      > (gdb)
>      >      >
>      >      >
>      >      > It breaks at _RBTree_Next specifically at the following line:
>      >      >  while ( ( current = current->child[ opp_dir ] ) != NULL )
>      >      >
>      >      > (gdb) p current->child[ opp_dir ]
>      >      > Cannot access memory at address 0xa010006
>      >      > (gdb) p current
>      >      > $1 = (RBTree_Node *) 0xa010002
>      >     These look like object ids.
>      >     > This address is invalid; the current memory length should be only 32
>      >     > MB (0x2000000)
>      >     >
>      >     > http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
>      >     >
>      >     > So I guess current->child is overwritten somehow?
>      >     Yep.  Two approaches.
>      >
>      >     + Set a watchpoint in gdb if it is supported. But even if supported,
>      >     it will likely slow the run tremendously.
>      >
>      > There is no HW watchpoint supported.
>      >
>      >     + Break selectively and more or less binary search for where it is
>      >     overwritten.  I would break at the first call to _ISR_Dispatch
>      >     (or whatever you called it) and see if it gets clobbered.
>      >
>      >     That could be clobbered VERY early in the program. It could be
>      >     a blown stack. But it could just be a stray write. Check the value
>      >     of that semaphore's rbtree when you get to Init and just
>      >     break periodically and see where it is corrupt.
>      >
>      > That's what I did. As you assumed, it's clobbered very early.
>      >
>      > Breakpoint 1, _Objects_Extend_information (
>      >      information=0x3e26c <_RTEMS_tasks_Information>)
>      >      at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
>      > 67 do_extend     = true;
>      > (gdb) bt
>      > #0  _Objects_Extend_information (
>      >      information=0x3e26c <_RTEMS_tasks_Information>)
>      >      at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
>      > #1  0x0001b554 in _Objects_Initialize_information (
>      >      information=0x3e26c <_RTEMS_tasks_Information>,
>      >      the_api=OBJECTS_CLASSIC_API, the_class=1, maximum=4,
>      >      size=1424, is_string=false, maximum_name_length=4)
>      >      at ../../../../../../rtems/c/src/../../cpukit/score/src/objectinitializeinformation.c:126
>      > #2  0x0002c688 in _RTEMS_tasks_Manager_initialization ()
>      >      at ../../../../../../rtems/c/src/../../cpukit/rtems/src/tasks.c:197
>      > #3  0x00015bd4 in _RTEMS_API_Initialize ()
>      >      at ../../../../../../rtems/c/src/../../cpukit/sapi/src/rtemsapi.c:59
>      > #4  0x0001590c in rtems_initialize_data_structures ()
>      >      at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exinit.c:140
>      > #5  0x0000333c in boot_card (cmdline=0x0)
>      >      at ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/bootcard.c:92
>      > #6  0x00000000 in ?? ()
>      > (gdb)
>      >
>      > Specifically, here
>      >
>     http://git.rtems.org/rtems/tree/cpukit/score/src/objectextendinformation.c#n261
>     I think this is the first time it is initialized. What's the next time
>     it is modified?
>
> Yes, it's the first time. And it already contains the invalid address at
> that point; it's not modified after that.
Flip your thinking of the bug. This memory is the control area for all
Classic API Tasks. It is initialized at startup and most of it won't be
touched.
The fact that a semaphore call references it is broken. :(
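
As a sanity check on the values quoted earlier in the thread, here is a
minimal sketch that splits a 32-bit RTEMS object id into its fields. The
field layout (class in bits 31..27, API in bits 26..24, node in bits 23..16,
index in bits 15..0) is my reading of rtems/score/object.h and should be
verified against the tree in use. decoded_id and decode_object_id are
hypothetical names for illustration, not RTEMS APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the four fields of a 32-bit object id. */
typedef struct {
  unsigned the_class; /* bits 31..27 (assumed layout) */
  unsigned api;       /* bits 26..24 (assumed layout) */
  unsigned node;      /* bits 23..16 (assumed layout) */
  unsigned index;     /* bits 15..0  (assumed layout) */
} decoded_id;

/* Split an id into its fields using the assumed bit layout above. */
static decoded_id decode_object_id(uint32_t id)
{
  decoded_id d;

  d.the_class = (unsigned) ((id >> 27) & 0x1fu);
  d.api       = (unsigned) ((id >> 24) & 0x7u);
  d.node      = (unsigned) ((id >> 16) & 0xffu);
  d.index     = (unsigned) (id & 0xffffu);
  return d;
}

/* Print one id in hex together with its decoded fields. */
static void print_object_id(uint32_t id)
{
  decoded_id d = decode_object_id(id);

  printf("0x%08lx: api=%u class=%u node=%u index=%u\n",
         (unsigned long) id, d.api, d.the_class, d.node, d.index);
}
```

With that layout, id=436273153 (0x1a010001) from the backtrace decodes to
api=2 class=3 node=1 index=1, and the bogus "current" value 0xa010002
decodes to api=2 class=1 node=1 index=2. If the Classic API numbering is as
I remember it (class 1 = tasks, class 3 = semaphores), the first is a
plausible semaphore id and the second looks like a task id.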

Step into the call to rtems_semaphore_release() on the failure path. You
will probably have to break on the close() call and step. When it calls
_Semaphore_Get, look at all the entries in
_Semaphore_Information.local_table. I suspect one or more of them doesn't
actually point to a semaphore.

Break at Init and dump the contents of N (void *) slots, where N is based on
the maximum number of Classic API Semaphores. Dump them again at the end and
compare.
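
If dumping by hand in gdb gets tedious, the same dump-and-compare idea can be
sketched in C. This is only an illustration under assumptions: MAX_SLOTS,
table_snapshot and table_compare are hypothetical names, the table is treated
as a plain array of void * slots, and printf stands in for whatever output
routine is usable on the target (printk in RTEMS).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_SLOTS 64 /* hypothetical upper bound on slots to track */

/* Saved copy of the pointer table taken at the first checkpoint. */
static void *saved_slots[MAX_SLOTS];

/* Save the first n slots of table; returns the number actually saved. */
static size_t table_snapshot(void *const *table, size_t n)
{
  if (n > MAX_SLOTS) {
    n = MAX_SLOTS;
  }
  memcpy(saved_slots, table, n * sizeof(void *));
  return n;
}

/* Compare table against the snapshot, report and count changed slots. */
static size_t table_compare(void *const *table, size_t n)
{
  size_t changed = 0;
  size_t i;

  if (n > MAX_SLOTS) {
    n = MAX_SLOTS;
  }
  for (i = 0; i < n; ++i) {
    if (table[i] != saved_slots[i]) {
      printf("slot %lu: %p -> %p\n",
             (unsigned long) i, saved_slots[i], table[i]);
      ++changed;
    }
  }
  return changed;
}
```

The intent would be to call table_snapshot() on the table at the start of
Init and table_compare() just before the exit path. Slots that change because
a semaphore was legitimately created or deleted in between are expected; a
slot that changes to something that is not a semaphore control block is the
one to chase.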
>     But this looks like task manager class information and not a semaphore
>     as in the crash, so this is odd. :(
>     >
>     >     I cc'ed Gedare because I don't know how to spot that the rbtree
>     >     is empty in gdb.
>     >
>     >     You need to see where that memory is overwritten.
>     >
>     >     Again running all tests with the simulator clock tick could
>     >     eliminate the ISR code as the culprit. :)
>     >      > On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
>     >      > <joel.sherrill at oarcorp.com> wrote:
>     >      >> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
>     >      >>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
>     >      >>> <joel.sherrill at oarcorp.com> wrote:
>     >      >>>>
>     >      >>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
>     >      >>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
>     >      >>>>> <joel.sherrill at oarcorp.com> wrote:
>     >      >>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
>     >      >>>>>>> Hi
>     >      >>>>>>>
>     >      >>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
>     >      >>>>>>> <joel.sherrill at oarcorp.com> wrote:
>     >      >>>>>>>> I don't understand this but I got it applied.
>     >      >>>>>>>>
>     >      >>>>>>>> I manually edited the saved email to delete the preinstall.am
>      >      >>>>>>>> changes.  I committed the rest. Then I ran bootstrap -p myself
>      >      >>>>>>>> and folded that into the rest of your patch.
>      >      >>>>>>>>
>      >      >>>>>>>> It should all be committed now.
>      >      >>>>>>>>
>      >      >>>>>>> Thanks for doing this. I don't know what's wrong either. BTW,
>      >      >>>>>>> commits have not been mirrored on GitHub for the last 4 days.
>      >      >>>>>>>
>      >      >>>>>>>> How about some new test results. :)
>      >      >>>>>>>>
>      >      >>>>>>> I did run one last night, no big progress since previous results :(
>      >      >>>>>>> Is there any tool, script, utility program or whatever that I can
>      >      >>>>>>> use to detect wrong memory access (e.g., stack overwrite, heap
>      >      >>>>>>> corruption, access to another task context)? I tried to add
>      >      >>>>>>> -fstack-protector-all to gcc, but QEMU did not report anything or
>      >      >>>>>>> core-dump; ticker just hangs.
>      >      >>>>>> I haven't checked into how gcc does its stack overwrite protection.
>      >      >>>>>>
>      >      >>>>>> The tests by themselves don't have these problems. The first
>      >      >>>>>> possible source is incorrect layout of sections to memory by
>      >      >>>>>> the linker script. There is some debug code in boot
>      >      >>>>>>
>      >      >>>>>> There used to be debug printk's in bspgetworkarea.c so you
>      >      >>>>>> could check if areas overlapped. That usually causes bad issues
>      >      >>>>>> though. Let's go through some basics:
>      >      >>>>>>
>      >      >>>>>> + Does hello world run and exit cleanly?
>      >      >>>>>>
>      >      >>>>> The output of Hello World is:
>      >      >>>>>
>      >      >>>>> *** BEGIN OF TEST HELLO WORLD ***
>      >      >>>>> Hello World
>      >      >>>>> *** END OF TEST HELLO WORLD ***
>      >      >>>>> Fatal Error 5.0 Halted
>      >      >>>>>
>      >      >>>>>   From GDB:
>      >      >>>>>
>      >      >>>>> Breakpoint 1, _Terminate (
>      >      >>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false, the_error=0)
>      >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>      >      >>>>> 39  _ISR_Disable_without_giant( level );
>      >      >>>>> (gdb) bt
>      >      >>>>> #0  _Terminate (
>      >      >>>>>       the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false, the_error=0)
>      >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
>      >      >>>>> #1  0x0003b5f8 in rtems_shutdown_executive (result=0)
>      >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
>      >      >>>>> #2  0x0003b350 in _exit (status=0)
>      >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
>      >      >>>>> #3  0x0002cc30 in exit (code=0)
>      >      >>>>>       at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
>      >      >>>>> #4  0x00002184 in Init (ignored=253816)
>      >      >>>>>       at ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
>      >      >>>>> #5  0x0002c5b8 in _Thread_Handler ()
>      >      >>>>>       at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>      >      >>>>> #6  0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
>      >      >>>>>       at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
>      >      >>>> This is normal and OK. Look at the arguments to _Terminate.
>      >      >>>>>> + How far does ticker get?
>      >      >>>>>>
>      >      >>>>> Ticker goes to the end:
>      >      >>>>>
>      >      >>>>> *** BEGIN OF TEST CLOCK TICK ***
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:00   12/31/1988
>      >      >>>>> TA2  - rtems_clock_get_tod - 09:00:00   12/31/1988
>      >      >>>>> TA3  - rtems_clock_get_tod - 09:00:00   12/31/1988
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:05   12/31/1988
>      >      >>>>> TA2  - rtems_clock_get_tod - 09:00:10   12/31/1988
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:10   12/31/1988
>      >      >>>>> TA3  - rtems_clock_get_tod - 09:00:15   12/31/1988
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:15   12/31/1988
>      >      >>>>> TA2  - rtems_clock_get_tod - 09:00:20   12/31/1988
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:20   12/31/1988
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:25   12/31/1988
>      >      >>>>> TA3  - rtems_clock_get_tod - 09:00:30   12/31/1988
>      >      >>>>> TA2  - rtems_clock_get_tod - 09:00:30   12/31/1988
>      >      >>>>> TA1  - rtems_clock_get_tod - 09:00:30   12/31/1988
>      >      >>>>> *** END OF TEST CLOCK TICK ***
>      >      >>>>> Fatal Error 9.276564 Halted
>      >      >>>>>
>      >      >>>>>   From GDB:
>      >      >>>>>
>      >      >>>>> (gdb) break _Terminate
>      >      >>>>> Breakpoint 1 at 0x19a68: file ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line 39.
>      >      >>>>> (gdb) break _OR1K_Exception_default
>      >      >>>>> Breakpoint 2 at 0x2686c: file ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c, line 22.
>      >      >>>>> (gdb) c
>      >      >>>>> The program is not being run.
>      >      >>>>> (gdb) target remote :50001
>      >      >>>>> Remote debugging using :50001
>      >      >>>>> 0x00000100 in _reset ()
>      >      >>>>> (gdb) c
>      >      >>>>> Continuing.
>      >      >>>>>
>      >      >>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854) at
>      >      >>>>> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>      >      >>>>> 22  rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame );
>      >      >>>>> (gdb) bt
>      >      >>>>> #0  _OR1K_Exception_default (vector=6, frame=0x43854) at
>      >      >>>>> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
>      >      >>>>> #1  0x00026980 in jump_to_c_handler ()
>      >      >>>>> Backtrace stopped: frame did not save the PC
>      >      >>>>>
>      >      >>>>> vector 6 is _unalign exception.
>      >      >>>> Set a break point at exit() (I think) and rtems_shutdown_executive().
>      >      >>>> You could start in the task which calls whatever kicks off the
>      >      >>>> shutdown sequence.
>      >      >>>> It looks like something in the shutdown procedure trips over something.
>      >      >>>> This might be easy to debug.
>      >      >>>>
>      >      >>> I just added a call to _CPU_Exception_frame_print(frame);
>      >      >>> in _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame *frame),
>      >      >>> and ticker now exits normally without even entering
>      >      >>> _OR1K_Exception_default as it did before. This is very weird. Does
>      >      >>> this mean that some areas of the code are overlapped by the linker
>      >      >>> script?
>      >      >> I doubt it. I suspect something uninitialized or not aligned properly.
>      >      >>
>      >      >> Set a breakpoint at
>      >      >> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
>      >      >> next over the print and then step through rtems_test_exit() and see
>      >      >> where it faults.
>      >      >>>> If the fault address is in the exception data, you can map that back
>      >      >>>> to the nm file and see what file that was in, then that might help.
>      >      >>>>>> + Have you tried the trick I suggested earlier to disable the
>      >      >>>>>> real clock tick driver, use the simulator idle tick code, and
>      >      >>>>>> disable all the tests that are known to fail that way. This
>      >      >>>>>> will eliminate the ISR code as an issue because you won't
>      >      >>>>>> have any (if console output is polled).  See h8sim for
>      >      >>>>>> an example. Should be a Makefile.am change, adding
>      >      >>>>>> an include to the testsuite configuration file, building
>      >      >>>>>> and running.
>      >      >>>>>>
>      >      >> --
>      >      >> Joel Sherrill, Ph.D.             Director of Research & Development
>      >      >> joel.sherrill at OARcorp.com        On-Line Applications Research
>      >      >> Ask me about RTEMS: a free RTOS  Huntsville AL 35805
>      >      >> Support Available                (256) 722-9985
>      >      >>

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherrill at OARcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985



