<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Sep 17, 2014 at 9:44 PM, Joel Sherrill <span dir="ltr"><<a href="mailto:joel.sherrill@oarcorp.com" target="_blank">joel.sherrill@oarcorp.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
On 9/17/2014 12:44 PM, Hesham Moustafa wrote:<br>
><br>
> On Tue, Sep 16, 2014 at 11:08 PM, Joel Sherrill <<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a><br>
</span><div><div class="h5">> <mailto:<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a>>> wrote:<br>
><br>
> Gedare.. cc'ed you for help in spotting an empty rbtree<br>
> in gdb. See below.<br>
> On 9/16/2014 2:45 PM, Hesham Moustafa wrote:<br>
> > Breakpoint 2, 0x00000600 in _unalign ()<br>
> > (gdb) bt<br>
> > #0 0x00000600 in _unalign ()<br>
> > #1 0x0002ec4c in _RBTree_Next (<br>
> > node=0x40890, dir=RBT_RIGHT)<br>
> > at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35<br>
> > #2 0x0002e2f4 in _RBTree_Successor (<br>
> > node=0x40890)<br>
> > at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512<br>
> > #3 0x0002e8c0 in _RBTree_Extract (<br>
> > the_rbtree=0x4198c,<br>
> > the_node=0x40890)<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106<br>
> > #4 0x00021524 in _RBTree_Get (<br>
> > the_rbtree=0x4198c, dir=RBT_LEFT)<br>
> > at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540<br>
> > #5 0x000215c8 in _Thread_queue_Dequeue<br>
> > (the_thread_queue=0x4198c)<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51<br>
> > #6 0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,<br>
> > id=436273153,<br>
> > api_semaphore_mp_support=0x0)<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37<br>
> > #7 0x00014868 in rtems_semaphore_release (id=436273153)<br>
> > at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102<br>
> > #8 0x00026cfc in rtems_libio_unlock ()<br>
> > at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253<br>
> > #9 0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39<br>
> > #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)<br>
> > at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292<br>
> > #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29<br>
> > #12 0x00029734 in rtems_libio_free (<br>
> > iop=0x49c50)<br>
> > at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136<br>
> > #13 0x0002912c in close (fd=0)<br>
> > at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38<br>
> > #14 0x000064b0 in rtems_libio_exit ()<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31<br>
> > #15 0x0003b058 in _exit (status=0)<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46<br>
> > #16 0x00034798 in exit (code=0)<br>
> > at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70<br>
> > #17 0x00002e3c in Test_task (unused=1)<br>
> > at<br>
> ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41<br>
> > #18 0x000340f0 in _Thread_Handler ()<br>
> > at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192<br>
> > #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)<br>
> > at<br>
> ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243<br>
> > Backtrace stopped: frame did not save the PC<br>
> > (gdb)<br>
> ><br>
> ><br>
> > It breaks at _RBTree_Next specifically at the following line:<br>
> > while ( ( current = current->child[ opp_dir ] ) != NULL )<br>
> ><br>
> > (gdb) p current->child[ opp_dir ]<br>
> > Cannot access memory at address 0xa010006<br>
> > (gdb) p current<br>
> > $1 = (RBTree_Node *) 0xa010002<br>
> These look like object ids.<br>
> > This address is invalid, the current memory length should be only 32<br>
> > MB (0x2000000)<br>
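A minimal C sketch of the two checks discussed above: whether a value such as 0xa010002 decodes as an object id, and whether it could be a valid node pointer at all.<br>
It assumes the usual 32-bit RTEMS object id layout (index in bits 0-15, node in bits 16-23, API in bits 24-26, class in bits 27-31) and a RAM region starting at address 0; neither is stated in this thread. The names `decoded_id`, `decode_object_id` and `is_plausible_word_pointer` are hypothetical.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 32-bit object id, under the ASSUMED layout:
 * bits 0-15 index, 16-23 node, 24-26 API, 27-31 class. */
typedef struct {
  uint32_t api;
  uint32_t object_class;
  uint32_t node;
  uint32_t index;
} decoded_id;

/* Split an id into its fields (hypothetical helper, not an RTEMS API). */
static decoded_id decode_object_id(uint32_t id)
{
  decoded_id d;
  d.index        = id & 0xffffu;
  d.node         = (id >> 16) & 0xffu;
  d.api          = (id >> 24) & 0x7u;
  d.object_class = (id >> 27) & 0x1fu;
  return d;
}

/* True if addr lies inside [ram_base, ram_base + ram_size) and is a
 * multiple of alignment (which must be a power of two). */
static bool is_plausible_word_pointer(uint32_t addr,
                                      uint32_t ram_base,
                                      uint32_t ram_size,
                                      uint32_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
  if (addr < ram_base || addr - ram_base >= ram_size) {
    return false;
  }
  return (addr & (alignment - 1u)) == 0;
}
```

Under those assumptions, 0xa010002 decodes to API 2, class 1, node 1, index 2, and it fails the pointer check twice: it is beyond 0x2000000 and not 4-byte aligned, which would fit the unaligned-access exception seen above.<br>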
> ><br>
> ><a href="http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20" target="_blank">http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20</a><br>
> ><br>
> > So I guess current->child is overwritten somehow?<br>
> Yep. Two approaches.<br>
><br>
> + Set a watchpoint in gdb if it is supported. But even if supported,<br>
> it will likely slow the run tremendously.<br>
><br>
> There is no HW watchpoint supported.<br>
><br>
> + Break selectively and more or less binary search for where it is<br>
> overwritten. I would break at the first call to _ISR_Dispatch<br>
> (or whatever you called it) and see if it gets clobbered.<br>
><br>
> That could be clobbered VERY early in the program. It could be<br>
> a blown stack. But it could just be a stray write. Check the value<br>
> of that semaphore's rbtree when you get to Init and just<br>
> break periodically and see where it is corrupt.<br>
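Without hardware watchpoints, the "break periodically and compare" approach can also be done in code. The following is a minimal C sketch of such a software watch, not anything RTEMS provides: it snapshots one word and reports the first checkpoint at which the word no longer matches. The names `soft_watch`, `soft_watch_arm` and `soft_watch_check` are hypothetical.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One watched 32-bit word plus the first checkpoint where it differed. */
typedef struct {
  const volatile uint32_t *addr;
  uint32_t                 expected;
  const char              *first_bad_checkpoint; /* NULL while intact */
} soft_watch;

/* Record the current value of *addr as the known-good snapshot. */
static void soft_watch_arm(soft_watch *w, const volatile uint32_t *addr)
{
  assert(w != NULL && addr != NULL);
  w->addr = addr;
  w->expected = *addr;
  w->first_bad_checkpoint = NULL;
}

/* Call at chosen checkpoints. Returns true while the word still matches
 * the snapshot; remembers the label of the first mismatch. */
static bool soft_watch_check(soft_watch *w, const char *checkpoint)
{
  assert(w != NULL && w->addr != NULL);
  if (*w->addr == w->expected) {
    return true;
  }
  if (w->first_bad_checkpoint == NULL) {
    w->first_bad_checkpoint = checkpoint;
  }
  return false;
}
```

Placing calls at coarse points first (end of initialization, first dispatch, first tick) and then between the last good and first bad checkpoint gives the binary search described above.<br>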
><br>
> That's what I did. As you assumed, it's clobbered very early.<br>
><br>
> Breakpoint 1, _Objects_Extend_information (<br>
> information=0x3e26c <_RTEMS_tasks_Information>)<br>
> at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67<br>
> 67 do_extend = true;<br>
> (gdb) bt<br>
> #0 _Objects_Extend_information (<br>
> information=0x3e26c <_RTEMS_tasks_Information>)<br>
> at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67<br>
> #1 0x0001b554 in _Objects_Initialize_information (<br>
> information=0x3e26c <_RTEMS_tasks_Information>,<br>
> the_api=OBJECTS_CLASSIC_API, the_class=1, maximum=4,<br>
> size=1424, is_string=false, maximum_name_length=4)<br>
> at<br>
> ../../../../../../rtems/c/src/../../cpukit/score/src/objectinitializeinformation.c:126<br>
> #2 0x0002c688 in _RTEMS_tasks_Manager_initialization ()<br>
> at ../../../../../../rtems/c/src/../../cpukit/rtems/src/tasks.c:197<br>
> #3 0x00015bd4 in _RTEMS_API_Initialize ()<br>
> at ../../../../../../rtems/c/src/../../cpukit/sapi/src/rtemsapi.c:59<br>
> #4 0x0001590c in rtems_initialize_data_structures ()<br>
> at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exinit.c:140<br>
> #5 0x0000333c in boot_card (cmdline=0x0)<br>
> at<br>
> ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/bootcard.c:92<br>
> #6 0x00000000 in ?? ()<br>
> (gdb)<br>
><br>
> Specifically, here<br>
> <a href="http://git.rtems.org/rtems/tree/cpukit/score/src/objectextendinformation.c#n261" target="_blank">http://git.rtems.org/rtems/tree/cpukit/score/src/objectextendinformation.c#n261</a><br>
</div></div>I think this is the first time it is initialized. What's the next time<br>
it is modified?<br>
<br></blockquote><div>Yes, it's the first time. The value written at this first initialization is already the invalid address, and it is not modified after that. </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
But this looks like task manager class information, not a semaphore<br>
as in the crash, so this is odd. :(<br>
<span class="">><br>
> I cc'ed Gedare because I don't know how to spot that the rbtree<br>
> is empty in gdb.<br>
><br>
> You need to see where that memory is overwritten.<br>
><br>
> Again running all tests with the simulator clock tick could<br>
> eliminate the ISR code as the culprit. :)<br>
> > On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill<br>
</span><span class="">> > <<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a> <mailto:<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a>>> wrote:<br>
> >> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:<br>
> >>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill<br>
</span><span class="">> >>> <<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a> <mailto:<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a>>> wrote:<br>
> >>>><br>
> >>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:<br>
> >>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill<br>
</span><span class="">> >>>>> <<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a> <mailto:<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a>>> wrote:<br>
> >>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:<br>
> >>>>>>> Hi<br>
> >>>>>>><br>
> >>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill<br>
</span><span class="">> >>>>>>> <<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a> <mailto:<a href="mailto:joel.sherrill@oarcorp.com">joel.sherrill@oarcorp.com</a>>> wrote:<br>
> >>>>>>>> I don't understand this but I got it applied.<br>
> >>>>>>>><br>
> >>>>>>>> I manually edited the saved email to delete the preinstall.am<br>
</span>
<div class="HOEnZb"><div class="h5">> >>>>>>>> changes. I committed the rest. Then I ran bootstrap -p myself<br>
> >>>>>>>> and folded that into the rest of your patch.<br>
> >>>>>>>><br>
> >>>>>>>> It should all be committed now.<br>
> >>>>>>>><br>
> >>>>>>> Thanks for doing this; I don't know what's wrong either. BTW, commits<br>
> >>>>>>> have not been mirrored on GitHub for the last 4 days.<br>
> >>>>>>><br>
> >>>>>>>> How about some new test results. :)<br>
> >>>>>>>><br>
> >>>>>>> I did run one last night, no big progress since previous results :( Is<br>
> >>>>>>> there any tool, script, utility program or whatever that I can use to<br>
> >>>>>>> detect wrong memory access (i.e, stack overwrite, heap corruption,<br>
> >>>>>>> access to another task context)? I tried to add -fstack-protector-all<br>
> >>>>>>> to gcc, but QEMU did not get anything or core-dump, ticker just hangs.<br>
> >>>>>> I haven't checked into how gcc does its stack overwrite protection.<br>
> >>>>>><br>
> >>>>>> The tests by themselves don't have these problems. The first<br>
> >>>>>> possible source is incorrect layout of sections to memory by<br>
> >>>>>> the linker script. There is some debug code in boot<br>
> >>>>>><br>
> >>>>>> There used to be debug printk's in bspgetworkarea.c so you<br>
> >>>>>> could check if areas overlapped. That usually causes bad issues<br>
> >>>>>> though. Let's go through some basics:<br>
> >>>>>><br>
> >>>>>> + Does hello world run and exit cleanly?<br>
> >>>>>><br>
> >>>>> The output of Hello World is:<br>
> >>>>><br>
> >>>>> *** BEGIN OF TEST HELLO WORLD ***<br>
> >>>>> Hello World<br>
> >>>>> *** END OF TEST HELLO WORLD ***<br>
> >>>>> Fatal Error 5.0 Halted<br>
> >>>>><br>
> >>>>> From GDB:<br>
> >>>>><br>
> >>>>> Breakpoint 1, _Terminate (<br>
> >>>>> the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,<br>
> >>>>> the_error=0)<br>
> >>>>> at<br>
> >>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39<br>
> >>>>> 39 _ISR_Disable_without_giant( level );<br>
> >>>>> (gdb) bt<br>
> >>>>> #0 _Terminate (<br>
> >>>>> the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,<br>
> >>>>> the_error=0)<br>
> >>>>> at<br>
> >>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39<br>
> >>>>> #1 0x0003b5f8 in rtems_shutdown_executive (result=0)<br>
> >>>>> at<br>
> >>>>> ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21<br>
> >>>>> #2 0x0003b350 in _exit (status=0)<br>
> >>>>> at<br>
> >>>>><br>
> >>>>><br>
> ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47<br>
> >>>>> #3 0x0002cc30 in exit (code=0)<br>
> >>>>> at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70<br>
> >>>>> #4 0x00002184 in Init (ignored=253816)<br>
> >>>>> at<br>
> >>>>><br>
> >>>>> ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33<br>
> >>>>> #5 0x0002c5b8 in _Thread_Handler ()<br>
> >>>>> at<br>
> >>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192<br>
> >>>>> #6 0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)<br>
> >>>>> at<br>
> >>>>> ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243<br>
> >>>> This is normal and OK. Look at the arguments to _Terminate.<br>
> >>>>>> + How far does ticker get?<br>
> >>>>>><br>
> >>>>> Ticker goes to the end:<br>
> >>>>><br>
> >>>>> *** BEGIN OF TEST CLOCK TICK ***<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:00 12/31/1988<br>
> >>>>> TA2 - rtems_clock_get_tod - 09:00:00 12/31/1988<br>
> >>>>> TA3 - rtems_clock_get_tod - 09:00:00 12/31/1988<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:05 12/31/1988<br>
> >>>>> TA2 - rtems_clock_get_tod - 09:00:10 12/31/1988<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:10 12/31/1988<br>
> >>>>> TA3 - rtems_clock_get_tod - 09:00:15 12/31/1988<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:15 12/31/1988<br>
> >>>>> TA2 - rtems_clock_get_tod - 09:00:20 12/31/1988<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:20 12/31/1988<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:25 12/31/1988<br>
> >>>>> TA3 - rtems_clock_get_tod - 09:00:30 12/31/1988<br>
> >>>>> TA2 - rtems_clock_get_tod - 09:00:30 12/31/1988<br>
> >>>>> TA1 - rtems_clock_get_tod - 09:00:30 12/31/1988<br>
> >>>>> *** END OF TEST CLOCK TICK ***<br>
> >>>>> Fatal Error 9.276564 Halted<br>
> >>>>><br>
> >>>>> From GDB:<br>
> >>>>><br>
> >>>>> (gdb) break _Terminate<br>
> >>>>> Breakpoint 1 at 0x19a68: file<br>
> >>>>> ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line<br>
> >>>>> 39.<br>
> >>>>> (gdb) break _OR1K_Exception_default<br>
> >>>>> Breakpoint 2 at 0x2686c: file<br>
> >>>>><br>
> >>>>><br>
> >>>>><br>
> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c,<br>
> >>>>> line 22.<br>
> >>>>> (gdb) c<br>
> >>>>> The program is not being run.<br>
> >>>>> (gdb) target remote :50001<br>
> >>>>> Remote debugging using :50001<br>
> >>>>> 0x00000100 in _reset ()<br>
> >>>>> (gdb) c<br>
> >>>>> Continuing.<br>
> >>>>><br>
> >>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854) at<br>
> >>>>><br>
> >>>>><br>
> >>>>><br>
> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22<br>
> >>>>> 22 rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame<br>
> >>>>> );<br>
> >>>>> (gdb) bt<br>
> >>>>> #0 _OR1K_Exception_default (vector=6, frame=0x43854) at<br>
> >>>>><br>
> >>>>><br>
> >>>>><br>
> ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22<br>
> >>>>> #1 0x00026980 in jump_to_c_handler ()<br>
> >>>>> Backtrace stopped: frame did not save the PC<br>
> >>>>><br>
> >>>>> vector 6 is _unalign exception.<br>
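The two "Fatal Error" lines quoted in this thread look like `source.code` printed in decimal: 5.0 lines up with `_Terminate` being called with `the_source=RTEMS_FATAL_SOURCE_EXIT` and `the_error=0`, and in 9.276564 the code equals 0x43854, the `frame` pointer passed to `_OR1K_Exception_default`.<br>
A minimal C sketch for splitting such a line, assuming that print format (it is inferred from the output here, not checked against the BSP code); `fatal_report` and `parse_fatal_line` are hypothetical names.<br>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Fatal source and fatal code as printed on the console. */
typedef struct {
  unsigned source;
  uint32_t code;
} fatal_report;

/* Parse a line of the ASSUMED form "Fatal Error <source>.<code> ...",
 * both numbers in decimal. Returns false if the line does not match. */
static bool parse_fatal_line(const char *line, fatal_report *out)
{
  unsigned      source;
  unsigned long code;

  assert(line != NULL && out != NULL);
  if (sscanf(line, "Fatal Error %u.%lu", &source, &code) != 2) {
    return false;
  }
  out->source = source;
  out->code = (uint32_t) code;
  return true;
}
```

Printing the parsed code in hex makes it easy to compare against the addresses gdb shows, for example the exception frame pointer.<br>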
> >>>> Set a break point at exit() (I think) and rtems_shutdown_executive(). You<br>
> >>>> could start in the task which calls whatever kicks off the shutdown<br>
> >>>> sequence.<br>
> >>>> It looks like something in the shutdown procedure trips over something.<br>
> >>>> This might be easy to debug.<br>
> >>>><br>
> >>> I did add just a function call to _CPU_Exception_frame_print(frame);<br>
> >>> from _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame<br>
> >>> *frame)<br>
> >>> And ticker exits normally without even entering<br>
> >>> _OR1K_Exception_default as it did before. This is very weird. Does this<br>
> >>> mean that some areas of the code are overlapped from the linker<br>
> >>> script?<br>
> >> I doubt it. I suspect something uninitialized or not aligned properly.<br>
> >><br>
> >> Set a breakpoint at<br>
> >> <a href="http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40" target="_blank">http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40</a><br>
> >> next over the print and then step through rtems_test_exit() and see<br>
> >> where it faults.<br>
> >>>> If the fault address is in the exception data, you can map that back to<br>
> >>>> the<br>
> >>>> nm file and see what file that was in, then that might help.<br>
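Mapping a fault address back to a symbol amounts to finding the last symbol whose start address is not above the fault address. A minimal C sketch of that lookup, assuming the symbol list has already been loaded into an array sorted by ascending address (for example from `nm -n` output); the `symbol` type and `find_symbol` function are hypothetical.<br>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of an address-sorted symbol list. */
typedef struct {
  uint32_t    addr;
  const char *name;
} symbol;

/* Binary search: return the last entry with addr <= target, or NULL if
 * target lies below the first symbol. table must be sorted ascending. */
static const symbol *find_symbol(const symbol *table, size_t count,
                                 uint32_t target)
{
  size_t lo = 0;
  size_t hi = count;

  assert(table != NULL || count == 0);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (table[mid].addr <= target) {
      lo = mid + 1;   /* candidate is at mid or later */
    } else {
      hi = mid;       /* candidate is before mid */
    }
  }
  return lo == 0 ? NULL : &table[lo - 1];
}
```

The result only names the nearest preceding symbol; whether the address really belongs to it depends on the symbol's size, which this sketch does not track.<br>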
> >>>>>> + Have you tried the trick I suggested earlier to disable the<br>
> >>>>>> real clock tick driver, use the simulator idle tick code, and<br>
> >>>>>> disable all the tests that are known to fail that way. This<br>
> >>>>>> will eliminate the ISR code as an issue because you won't<br>
> >>>>>> have any (if console output is polled). See h8sim for<br>
> >>>>>> an example. Should be a Makefile.am change, adding<br>
> >>>>>> an include to the testsuite configuration file, building<br>
> >>>>>> and running.<br>
> >>>>>><br>
> >> --<br>
> >> Joel Sherrill, Ph.D. Director of Research & Development<br>
> >> joel.sherrill@OARcorp.com On-Line Applications Research<br>
> >> Ask me about RTEMS: a free RTOS Huntsville AL 35805<br>
> >> Support Available (256) 722-9985<br>
> >><br>
><br>
><br>
><br>
><br>
<br>
<br>
</div></div></blockquote></div><br></div></div>