or1k test was .. Re: [PATCH] or1k: New cache manager.
Hesham Moustafa
heshamelmatary at gmail.com
Thu Sep 18 19:26:42 UTC 2014
Hi,
The semaphore functions were what produced the _unalign exception before
(the printf issue).
I added #define STACK_CHECKER_ON to ticker and here is what I got:
*** BEGIN OF TEST CLOCK TICK ***
BLOWN STACK!!!
task control block: 0x00041488
task ID: 0x0A010001
task name: 0x00000000
task name string:
task stack area (4096 Bytes): 0x00043DA0 .. 0x00044DA0
Fatal Error 8.0 Halted
However, defining RTEMS_HEAVY_STACK_DEBUG and RTEMS_HEAVY_MALLOC_DEBUG does
not affect ticker; it runs to the end.
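
For reference, the idea behind this kind of stack check can be sketched in a
few lines of C. This only illustrates the pattern-fill technique under my own
assumptions and is not the RTEMS stack checker itself: the names stack_fill,
stack_unused_words and stack_is_blown, the fill value and the guard size are
all made up for the example, and a downward-growing stack is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values chosen for this sketch, not taken from RTEMS. */
#define STACK_PATTERN 0xA5A5A5A5u
#define GUARD_WORDS   4u

/* Fill the whole stack area with a known pattern before the task runs. */
static void stack_fill(uint32_t *area, size_t words)
{
  assert(area != NULL);
  for (size_t i = 0; i < words; ++i) {
    area[i] = STACK_PATTERN;
  }
}

/* Count the words at the low end that still hold the pattern. On a
 * downward-growing stack this is the part that was never used. */
static size_t stack_unused_words(const uint32_t *area, size_t words)
{
  size_t i = 0;
  assert(area != NULL);
  while (i < words && area[i] == STACK_PATTERN) {
    ++i;
  }
  return i;
}

/* Report a blown stack when the guard words at the low end were touched. */
static bool stack_is_blown(const uint32_t *area, size_t words)
{
  size_t guard = words < GUARD_WORDS ? words : GUARD_WORDS;
  return stack_unused_words(area, words) < guard;
}
```

With a 4096-byte area like the one in the report (1024 words), a task that
only ever uses the top of its stack leaves the low words intact, while a
single stray write to the lowest word is enough to make stack_is_blown()
return true. Such a check fires for an overrun and for an unrelated write
into the guard area alike, so it does not by itself say which one happened.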
On Wed, Sep 17, 2014 at 10:07 PM, Joel Sherrill <joel.sherrill at oarcorp.com>
wrote:
>
>
> On 9/17/2014 2:48 PM, Hesham Moustafa wrote:
> >
> > On Wed, Sep 17, 2014 at 9:44 PM, Joel Sherrill
> > <joel.sherrill at oarcorp.com> wrote:
> >
> >
> > On 9/17/2014 12:44 PM, Hesham Moustafa wrote:
> > >
> > > On Tue, Sep 16, 2014 at 11:08 PM, Joel Sherrill
> > > <joel.sherrill at oarcorp.com> wrote:
> > >
> > > Gedare.. cc'ed you for help in spotting an empty rbtree
> > > in gdb. See below.
> > > On 9/16/2014 2:45 PM, Hesham Moustafa wrote:
> > > > Breakpoint 2, 0x00000600 in _unalign ()
> > > > (gdb) bt
> > > > #0 0x00000600 in _unalign ()
> > > > #1 0x0002ec4c in _RBTree_Next (node=0x40890, dir=RBT_RIGHT)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreenext.c:35
> > > > #2 0x0002e2f4 in _RBTree_Successor (node=0x40890)
> > > >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:512
> > > > #3 0x0002e8c0 in _RBTree_Extract (the_rbtree=0x4198c, the_node=0x40890)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/rbtreeextract.c:106
> > > > #4 0x00021524 in _RBTree_Get (the_rbtree=0x4198c, dir=RBT_LEFT)
> > > >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/rbtree.h:540
> > > > #5 0x000215c8 in _Thread_queue_Dequeue (the_thread_queue=0x4198c)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadqdequeue.c:51
> > > > #6 0x00017c14 in _CORE_semaphore_Surrender (the_semaphore=0x4198c,
> > > >     id=436273153, api_semaphore_mp_support=0x0)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/coresemsurrender.c:37
> > > > #7 0x00014868 in rtems_semaphore_release (id=436273153)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semrelease.c:102
> > > > #8 0x00026cfc in rtems_libio_unlock ()
> > > >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:253
> > > > #9 0x00026d5c in rtems_filesystem_default_unlock (mt_entry=0x49ce0)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/defaults/default_lock_and_unlock.c:39
> > > > #10 0x0002920c in rtems_filesystem_instance_unlock (loc=0x49c5c)
> > > >     at ../../cpukit/../../../or1ksim/lib/include/rtems/libio_.h:292
> > > > #11 0x00029268 in rtems_filesystem_location_free (loc=0x49c5c)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/freenode.c:29
> > > > #12 0x00029734 in rtems_libio_free (iop=0x49c50)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio.c:136
> > > > #13 0x0002912c in close (fd=0)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/close.c:38
> > > > #14 0x000064b0 in rtems_libio_exit ()
> > > >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/libio_exit.c:31
> > > > #15 0x0003b058 in _exit (status=0)
> > > >     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:46
> > > > #16 0x00034798 in exit (code=0)
> > > >     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> > > > #17 0x00002e3c in Test_task (unused=1)
> > > >     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:41
> > > > #18 0x000340f0 in _Thread_Handler ()
> > > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> > > > #19 0x00034078 in _User_extensions_Thread_exitted (executing=0x40890)
> > > >     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> > > > Backtrace stopped: frame did not save the PC
> > > > (gdb)
> > > >
> > > >
> > > > It breaks in _RBTree_Next, specifically at the following line:
> > > > while ( ( current = current->child[ opp_dir ] ) != NULL )
> > > >
> > > > (gdb) p current->child[ opp_dir ]
> > > > Cannot access memory at address 0xa010006
> > > > (gdb) p current
> > > > $1 = (RBTree_Node *) 0xa010002
> > > These look like object ids.
> > > > This address is invalid; the current memory length should be only
> > > > 32 MB (0x2000000):
> > > > http://git.rtems.org/rtems/tree/c/src/lib/libbsp/or1k/or1ksim/startup/linkcmds#n20
> > > >
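
To make the two observations above concrete, here is a small C sketch that
splits a 32-bit value into object id fields and checks whether an address
could be a valid pointer on this target. The field layout (class in bits
27..31, API in bits 24..26, node in bits 16..23, index in bits 0..15) is my
understanding of rtems/score/object.h and should be double-checked against
the tree. The helper names are hypothetical, and RAM is assumed to start at
address 0 with the 32 MB length quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of a 32-bit object id, using the layout assumed in the text. */
typedef struct {
  unsigned api;
  unsigned object_class;
  unsigned node;
  unsigned index;
} object_id_fields;

/* Split an id into its fields. */
static object_id_fields decode_object_id(uint32_t id)
{
  object_id_fields f;
  f.object_class = (id >> 27) & 0x1Fu;
  f.api          = (id >> 24) & 0x7u;
  f.node         = (id >> 16) & 0xFFu;
  f.index        = id & 0xFFFFu;
  return f;
}

/* Inverse of decode_object_id(), handy for round-trip checks. */
static uint32_t build_object_id(object_id_fields f)
{
  assert(f.object_class <= 0x1Fu && f.api <= 0x7u);
  assert(f.node <= 0xFFu && f.index <= 0xFFFFu);
  return ((uint32_t) f.object_class << 27) | ((uint32_t) f.api << 24) |
         ((uint32_t) f.node << 16) | (uint32_t) f.index;
}

/* Assumption: RAM starts at 0 and is 32 MB long, as quoted in the thread. */
#define RAM_LENGTH 0x2000000u

/* A tree node pointer should be non-NULL, inside RAM and word aligned. */
static bool is_plausible_pointer(uint32_t addr)
{
  return addr != 0 && addr < RAM_LENGTH && (addr & 0x3u) == 0;
}
```

With that layout, 0x0A010002 decodes as API 2, class 1, node 1, index 2, and
the semaphore id 436273153 (0x1A010001) from frame #7 as API 2, class 3,
node 1, index 1. The address 0x0A010006 fails the pointer check twice: it
lies beyond 0x2000000 and it is not 4-byte aligned, which fits the _unalign
exception.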
> > > > So I guess current->child is overwritten somehow?
> > > Yep. Two approaches.
> > >
> > > + Set a watchpoint in gdb if it is supported. But even if supported,
> > > it will likely slow the run tremendously.
> > >
> > > There is no HW watchpoint supported.
> > >
> > > + Break selectively and more or less binary search for where it is
> > > overwritten. I would break at the first call to _ISR_Dispatch
> > > (or whatever you called it) and see if it gets clobbered.
> > >
> > > That could be clobbered VERY early in the program. It could be
> > > a blown stack. But it could just be a stray write. Check the value
> > > of that semaphore's rbtree when you get to Init and just
> > > break periodically and see where it is corrupt.
> > >
> > > That's what I did. As you assumed, it's clobbered very early.
> > >
> > > Breakpoint 1, _Objects_Extend_information (
> > >     information=0x3e26c <_RTEMS_tasks_Information>)
> > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
> > > 67 do_extend = true;
> > > (gdb) bt
> > > #0 _Objects_Extend_information (
> > >     information=0x3e26c <_RTEMS_tasks_Information>)
> > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectextendinformation.c:67
> > > #1 0x0001b554 in _Objects_Initialize_information (
> > >     information=0x3e26c <_RTEMS_tasks_Information>,
> > >     the_api=OBJECTS_CLASSIC_API, the_class=1, maximum=4,
> > >     size=1424, is_string=false, maximum_name_length=4)
> > >     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectinitializeinformation.c:126
> > > #2 0x0002c688 in _RTEMS_tasks_Manager_initialization ()
> > >     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/tasks.c:197
> > > #3 0x00015bd4 in _RTEMS_API_Initialize ()
> > >     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/rtemsapi.c:59
> > > #4 0x0001590c in rtems_initialize_data_structures ()
> > >     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exinit.c:140
> > > #5 0x0000333c in boot_card (cmdline=0x0)
> > >     at ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/bootcard.c:92
> > > #6 0x00000000 in ?? ()
> > > (gdb)
> > >
> > > Specifically, here:
> > > http://git.rtems.org/rtems/tree/cpukit/score/src/objectextendinformation.c#n261
> > I think this is the first time it is initialized. What's the next time
> > it is modified?
> >
> > Yes, it's the first time. And this first time already contains the
> > invalid address; it's not modified after that.
> Flip your thinking of the bug. This memory is the control area for all
> Classic API Tasks. It is initialized at startup and most of it won't be
> touched.
> The fact that a semaphore call references it is broken. :(
>
> Step into the call to rtems_semaphore_release() on the failure path. You
> will probably have to break on the close() call and step. When it calls
> _Semaphore_Get, look at all the entries in
> _Semaphore_Information.local_table. I suspect one or more of them doesn't
> actually point to a semaphore.
>
> Break at Init and dump the contents for N (void *) slots based on the maximum
> number of Classic API Semaphores. Compare at the end.
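
The "dump at Init, compare at the end" step can also be done from code
instead of by eye. Below is a minimal C sketch of that idea: it copies N
pointer-sized slots and later reports the first slot that changed. The
function names and parameters are hypothetical. In this debugging session
the table would be _Semaphore_Information.local_table and the count would
come from the configured maximum number of semaphores, but nothing in the
sketch depends on RTEMS.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` pointer-sized slots from `table` into `saved`. */
static void table_snapshot(void **saved, void *const *table, size_t count)
{
  assert(saved != NULL && table != NULL);
  memcpy(saved, table, count * sizeof(void *));
}

/* Return the index of the first slot that differs from the snapshot,
 * or `count` if the table is unchanged. */
static size_t table_first_diff(void *const *saved, void *const *table,
                               size_t count)
{
  assert(saved != NULL && table != NULL);
  for (size_t i = 0; i < count; ++i) {
    if (saved[i] != table[i]) {
      return i;
    }
  }
  return count;
}
```

Taking the snapshot at Init and calling table_first_diff() at a few later
points narrows down when a slot stops pointing at a semaphore, which is the
same binary search described earlier, just without needing a watchpoint.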
> > But this looks like task manager class information and not a semaphore
> > like the crash, so this is odd. :(
> > >
> > > I cc'ed Gedare because I don't know how to spot that the rbtree
> > > is empty in gdb.
> > >
> > > You need to see where that memory is overwritten.
> > >
> > > Again running all tests with the simulator clock tick could
> > > eliminate the ISR code as the culprit. :)
> > > > On Tue, Sep 16, 2014 at 9:21 PM, Joel Sherrill
> > > > <joel.sherrill at oarcorp.com> wrote:
> > > >> On 9/16/2014 2:17 PM, Hesham Moustafa wrote:
> > > >>> On Tue, Sep 16, 2014 at 8:42 PM, Joel Sherrill
> > > >>> <joel.sherrill at oarcorp.com> wrote:
> > > >>>>
> > > >>>> On 9/16/2014 1:34 PM, Hesham Moustafa wrote:
> > > >>>>> On Tue, Sep 16, 2014 at 8:15 PM, Joel Sherrill
> > > >>>>> <joel.sherrill at oarcorp.com> wrote:
> > > >>>>>> On 9/16/2014 12:54 PM, Hesham Moustafa wrote:
> > > >>>>>>> Hi
> > > >>>>>>>
> > > >>>>>>> On Tue, Sep 16, 2014 at 7:47 PM, Joel Sherrill
> > > >>>>>>> <joel.sherrill at oarcorp.com> wrote:
> > > >>>>>>>> I don't understand this but I got it applied.
> > > >>>>>>>>
> > > >>>>>>>> I manually edited the saved email to delete the preinstall.am
> > > >>>>>>>> changes. I committed the rest. Then I ran bootstrap -p myself
> > > >>>>>>>> and folded that into the rest of your patch.
> > > >>>>>>>>
> > > >>>>>>>> It should all be committed now.
> > > >>>>>>>>
> > > >>>>>>> Thanks for doing this; I don't know what's wrong either. BTW,
> > > >>>>>>> commits have not been mirrored on github for the last 4 days.
> > > >>>>>>>
> > > >>>>>>>> How about some new test results. :)
> > > >>>>>>>>
> > > >>>>>>> I did run one last night, no big progress since previous results :( Is
> > > >>>>>>> there any tool, script, utility program or whatever that I can use to
> > > >>>>>>> detect wrong memory access (e.g., stack overwrite, heap corruption,
> > > >>>>>>> access to another task context)? I tried to add -fstack-protector-all
> > > >>>>>>> to gcc, but QEMU did not report anything or core-dump; ticker just hangs.
> > > >>>>>> I haven't checked into how gcc does its stack overwrite protection.
> > > >>>>>>
> > > >>>>>> The tests by themselves don't have these problems. The first
> > > >>>>>> possible source is incorrect layout of sections to memory by
> > > >>>>>> the linker script. There is some debug code in boot
> > > >>>>>>
> > > >>>>>> There used to be debug printk's in bspgetworkarea.c so you
> > > >>>>>> could check if areas overlapped. That usually causes bad issues
> > > >>>>>> though. Let's go through some basics:
> > > >>>>>>
> > > >>>>>> + Does hello world run and exit cleanly?
> > > >>>>>>
> > > >>>>> The output of Hello World is:
> > > >>>>>
> > > >>>>> *** BEGIN OF TEST HELLO WORLD ***
> > > >>>>> Hello World
> > > >>>>> *** END OF TEST HELLO WORLD ***
> > > >>>>> Fatal Error 5.0 Halted
> > > >>>>>
> > > >>>>> From GDB:
> > > >>>>>
> > > >>>>> Breakpoint 1, _Terminate (
> > > >>>>>     the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
> > > >>>>>     the_error=0)
> > > >>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
> > > >>>>> 39 _ISR_Disable_without_giant( level );
> > > >>>>> (gdb) bt
> > > >>>>> #0 _Terminate (
> > > >>>>>     the_source=RTEMS_FATAL_SOURCE_EXIT, is_internal=false,
> > > >>>>>     the_error=0)
> > > >>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c:39
> > > >>>>> #1 0x0003b5f8 in rtems_shutdown_executive (result=0)
> > > >>>>>     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/exshutdown.c:21
> > > >>>>> #2 0x0003b350 in _exit (status=0)
> > > >>>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/newlibc_exit.c:47
> > > >>>>> #3 0x0002cc30 in exit (code=0)
> > > >>>>>     at ../../../../../gcc-4.8.3/newlib/libc/stdlib/exit.c:70
> > > >>>>> #4 0x00002184 in Init (ignored=253816)
> > > >>>>>     at ../../../../../../../rtems/c/src/../../testsuites/samples/hello/init.c:33
> > > >>>>> #5 0x0002c5b8 in _Thread_Handler ()
> > > >>>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
> > > >>>>> #6 0x0002c540 in _User_extensions_Thread_exitted (executing=0x40080)
> > > >>>>>     at ../../cpukit/../../../or1ksim/lib/include/rtems/score/userextimpl.h:243
> > > >>>> This is normal and OK. Look at the arguments to _Terminate.
> > > >>>>>> + How far does ticker get?
> > > >>>>>>
> > > >>>>> Ticker goes to the end:
> > > >>>>>
> > > >>>>> *** BEGIN OF TEST CLOCK TICK ***
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:00 12/31/1988
> > > >>>>> TA2 - rtems_clock_get_tod - 09:00:00 12/31/1988
> > > >>>>> TA3 - rtems_clock_get_tod - 09:00:00 12/31/1988
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:05 12/31/1988
> > > >>>>> TA2 - rtems_clock_get_tod - 09:00:10 12/31/1988
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:10 12/31/1988
> > > >>>>> TA3 - rtems_clock_get_tod - 09:00:15 12/31/1988
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:15 12/31/1988
> > > >>>>> TA2 - rtems_clock_get_tod - 09:00:20 12/31/1988
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:20 12/31/1988
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:25 12/31/1988
> > > >>>>> TA3 - rtems_clock_get_tod - 09:00:30 12/31/1988
> > > >>>>> TA2 - rtems_clock_get_tod - 09:00:30 12/31/1988
> > > >>>>> TA1 - rtems_clock_get_tod - 09:00:30 12/31/1988
> > > >>>>> *** END OF TEST CLOCK TICK ***
> > > >>>>> Fatal Error 9.276564 Halted
> > > >>>>>
> > > >>>>> From GDB:
> > > >>>>>
> > > >>>>> (gdb) break _Terminate
> > > >>>>> Breakpoint 1 at 0x19a68: file
> > > >>>>>     ../../../../../../rtems/c/src/../../cpukit/score/src/interr.c, line 39.
> > > >>>>> (gdb) break _OR1K_Exception_default
> > > >>>>> Breakpoint 2 at 0x2686c: file
> > > >>>>>     ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c, line 22.
> > > >>>>> (gdb) c
> > > >>>>> The program is not being run.
> > > >>>>> (gdb) target remote :50001
> > > >>>>> Remote debugging using :50001
> > > >>>>> 0x00000100 in _reset ()
> > > >>>>> (gdb) c
> > > >>>>> Continuing.
> > > >>>>>
> > > >>>>> Breakpoint 2, _OR1K_Exception_default (vector=6, frame=0x43854)
> > > >>>>>     at ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
> > > >>>>> 22 rtems_fatal( RTEMS_FATAL_SOURCE_EXCEPTION, (rtems_fatal_code) frame );
> > > >>>>> (gdb) bt
> > > >>>>> #0 _OR1K_Exception_default (vector=6, frame=0x43854)
> > > >>>>>     at ../../../../../../../../rtems/c/src/../../cpukit/score/cpu/or1k/or1k-exception-default.c:22
> > > >>>>> #1 0x00026980 in jump_to_c_handler ()
> > > >>>>> Backtrace stopped: frame did not save the PC
> > > >>>>>
> > > >>>>> vector 6 is _unalign exception.
> > > >>>> Set a break point at exit() (I think) and rtems_shutdown_executive(). You
> > > >>>> could start in the task which calls whatever kicks off the shutdown
> > > >>>> sequence.
> > > >>>> It looks like something in the shutdown procedure trips over something.
> > > >>>> This might be easy to debug.
> > > >>>>
> > > >>> I just added a call to _CPU_Exception_frame_print(frame);
> > > >>> in _OR1K_Exception_default(uint32_t vector, CPU_Exception_frame *frame),
> > > >>> and ticker exits normally without even entering
> > > >>> _OR1K_Exception_default as it did before. This is very weird. Does this
> > > >>> mean that some areas of the code are overlapped by the linker
> > > >>> script?
> > > >> I doubt it. I suspect something uninitialized or not aligned properly.
> > > >>
> > > >> Set a breakpoint at
> > > >> http://git.rtems.org/rtems/tree/testsuites/samples/ticker/tasks.c#n40
> > > >> next over the print and then step through rtems_test_exit() and see
> > > >> where it faults.
> > > >>>> If the fault address is in the exception data, you can map that back to
> > > >>>> the nm file and see what file that was in, then that might help.
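
Mapping a fault address back to a symbol amounts to a "last entry not
greater than the address" search over the symbol list sorted by address
(which is the order `nm -n` prints). A rough C sketch of that lookup is
below. The symbol_entry type and symbol_lookup name are hypothetical, and
the table has to be filled from the real nm output; nothing here parses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint32_t    address;  /* start address of the symbol */
  const char *name;     /* symbol name as printed by nm */
} symbol_entry;

/* Return the last symbol whose start address is <= addr, or NULL when
 * addr lies below the first symbol. `table` must be sorted by ascending
 * address. */
static const symbol_entry *symbol_lookup(const symbol_entry *table,
                                         size_t count, uint32_t addr)
{
  size_t lo = 0;
  size_t hi = count;

  assert(table != NULL || count == 0);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (table[mid].address <= addr) {
      lo = mid + 1;   /* candidate is at mid or later */
    } else {
      hi = mid;       /* candidate is before mid */
    }
  }
  return lo == 0 ? NULL : &table[lo - 1];
}
```

For example, with entries for _reset at 0x100 and _unalign at 0x600 (the two
vector addresses that show up in the gdb sessions in this thread), an
address of 0x604 resolves to _unalign. The lookup only says which symbol
precedes the address. Whether the address is really inside that symbol
still needs the symbol sizes or the linker map.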
> > > >>>>>> + Have you tried the trick I suggested earlier to disable the
> > > >>>>>> real clock tick driver, use the simulator idle tick code, and
> > > >>>>>> disable all the tests that are known to fail that way? This
> > > >>>>>> will eliminate the ISR code as an issue because you won't
> > > >>>>>> have any (if console output is polled). See h8sim for
> > > >>>>>> an example. Should be a Makefile.am change, adding
> > > >>>>>> an include to the testsuite configuration file, building
> > > >>>>>> and running.
> > > >>>>>>
> > > >> --
> > > >> Joel Sherrill, Ph.D. Director of Research & Development
> > > >> joel.sherrill at OARcorp.com On-Line Applications Research
> > > >> Ask me about RTEMS: a free RTOS Huntsville AL 35805
> > > >> Support Available (256) 722-9985
> > > >>