<offlist> or1k printf causes crash
Hesham Moustafa
heshamelmatary at gmail.com
Thu Aug 21 22:00:05 UTC 2014
On Thu, Aug 21, 2014 at 11:54 PM, Joel Sherrill
<joel.sherrill at oarcorp.com> wrote:
>
> On 8/21/2014 4:15 PM, Hesham Moustafa wrote:
>> On Thu, Aug 21, 2014 at 10:56 PM, Joel Sherrill
>> <joel.sherrill at oarcorp.com> wrote:
>>> On 8/21/2014 2:44 PM, Hesham Moustafa wrote:
>>>> Hi,
>>>>
>>>> I have been debugging the or1k code for a while, hoping to find
>>>> what's wrong. Here's what I got.
>>> First I am moving this to devel@ so others can chime in.
>>>> First, I asked about this problem at #openrisc IRC channel, they told
>>>> me the problem might be that I have to take account of the red-zone, I
>>>> asked what's the red-zone and Stefan said the following:
>>>> "the first 128 bytes of the stack has to be stepped over, leaf
>>>> functions might use that without modifying the stack pointer, and gcc
>>>> takes advantage of the fact that there is a red zone in non-leaf
>>>> functions prologues too. i.e. it stores things on the stack and *then*
>>>> update the stack pointer"
>>> This is a bug in gcc. We have seen it on the ARM and there was a recent
>>> dust up from the Linux kernel community because it happened on x86-64.
>>> My understanding is that there was rework/improvement which triggered
>>> bugs in backends. But this needs to be fixed.
>>>
>>> The sp must be updated before the memory can be used. This is just
>>> a bug otherwise.
>>>> He suggested that I add 128 bytes to the stack pointer before I jump to
>>>> _ISR_Handler (from start.S). I tried this solution without success.
>>>> You may have some ideas about where/when this red zone causes problems.
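For reference, a minimal C sketch of what "stepping over" the red zone means for the interrupt entry path. The 128-byte figure comes from the IRC quote above; the helper name and the assumption that the or1k stack grows toward lower addresses (so stepping over means subtracting) are mine and should be checked against the ABI and start.S.

```c
#include <stdint.h>

/* Assumption: 128-byte red zone, as quoted from the #openrisc channel. */
#define OR1K_RED_ZONE_BYTES 128u

/*
 * Hypothetical helper: given the stack pointer of the interrupted code,
 * return the highest address the interrupt path may start using.
 * Assumes the stack grows toward lower addresses, so the red zone
 * lies just below the interrupted stack pointer.
 */
static inline uintptr_t isr_safe_sp(uintptr_t interrupted_sp)
{
  return interrupted_sp - OR1K_RED_ZONE_BYTES;
}
```

In the real port this adjustment would be a single instruction in the assembly entry code, executed before anything is stored on the interrupted stack.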
>>> You probably need to
>>>> Second, I discovered that an unusual (unaligned access) exception happens
>>>> when using printf (which does not happen with printk). When I traced it, I
>>>> found that the problem happens in rtems_semaphore_obtain(), when it tries
>>>> to access the_semaphore through the (invalid) pointer returned
>>>> from a call to _Objects_Get_isr_disable(). This exception
>>>> only happens after DISPATCH_NEEDED is true and _ISR_Handler jumps to
>>>> _Thread_Dispatch, makes a successful context switch, and runs the
>>>> first task. The following is a snapshot of the output when
>>>> encountering this problem.
>>> What's the alignment of the task stack in the port? The stack may not be
>>> properly aligned for the widest access of the or1k.
>> If you mean the following:
>> #define CPU_STACK_ALIGNMENT 0
>> then even with this macro set to 4 or 8, I get the same problem.
>> And from linkcmds.base:
>> bsp_stack_align = DEFINED (bsp_stack_align) ? bsp_stack_align : 8;
> Hmm.. ok .. then we need to know the instruction. 8 is normally a wide
> enough alignment since that usually covers a double or 64-bit access.
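A small C sketch of the alignment arithmetic being discussed, in case it helps when checking task stack tops in the debugger. The helper names are made up for this example and are not part of the port; both assume the alignment is a power of two such as 4 or 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align; align must be a power of two. */
static inline bool is_aligned(uintptr_t addr, uintptr_t align)
{
  return (addr & (align - 1u)) == 0u;
}

/*
 * Round an address down to a multiple of align (power of two).
 * Rounding down is the safe direction for the initial stack pointer
 * of a stack that grows toward lower addresses.
 */
static inline uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
  return addr & ~(align - 1u);
}
```

For example, align_down(0x4092f, 8) gives 0x40928, while an address such as 0x4092c is 4-byte aligned but not 8-byte aligned.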
>>>> "*** BEGIN OF TEST CLOCK TICK ***
>>>> TA1 - rtems_clock_get_tod - 09:00:00 12/31/1988
>>>> TA2 - rtems_clock_get_tod - 09:00:00 12/31/1988
>>>> TA3 - rtems_clock_get_tod - 09:00:00 12/31/1988
>>>> Fatal Error 263572 Halted"
>>> Can you tell what the instruction is? And the address it is trying to
>>> access.
>> The _Objects_Get_isr_disable() function returns a weird address for
>> Object (which in turn should be the_semaphore). This address is
>> 0x8007, which looks like the value of the SR register. All previous
>> Object/the_semaphore addresses returned from
>> _Objects_Get_isr_disable() are higher addresses, which is why I consider
>> the last Object address (0x8007) invalid.
> _Objects_Get_isr_disable() will return an address from the RTEMS Workspace
> which would tend to be a higher RAM address.
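A quick sanity check in C of the idea that 0x8007 is an SR value rather than a pointer. The bit positions are the ones I know from the OpenRISC 1000 architecture manual (SM = bit 0, TEE = bit 1, IEE = bit 2, FO = bit 15); the macro and function names are invented for this sketch, and the "looks like SR" test is only a heuristic that assumes the code runs in supervisor mode.

```c
#include <stdint.h>

/* SR bit masks (names are local to this sketch). */
#define OR1K_SR_SM  (1u << 0)   /* supervisor mode */
#define OR1K_SR_TEE (1u << 1)   /* tick timer exception enabled */
#define OR1K_SR_IEE (1u << 2)   /* interrupt exception enabled */
#define OR1K_SR_FO  (1u << 15)  /* fixed one, always reads as 1 */

/*
 * Heuristic only: FO always reads as one, and we assume supervisor
 * mode, so a plausible SR value has both bits set.
 */
static int looks_like_sr(uint32_t value)
{
  const uint32_t must_be_set = OR1K_SR_FO | OR1K_SR_SM;
  return (value & must_be_set) == must_be_set;
}

/* A 32-bit load or store needs the two low address bits to be zero. */
static int is_word_aligned(uint32_t addr)
{
  return (addr & 3u) == 0u;
}
```

0x8007 is exactly FO | IEE | TEE | SM, and its two low bits are set, so any word access through it would raise the alignment exception. That is consistent with a register holding the SR value being used where the object pointer was expected.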
>
> Random thought. Temporarily disable the "real hardware" clock tick driver
> in your BSP and add the simulated clock tick driver. See the h8sim BSP's
> Makefile for an example. We need to rule out that your ISR code is doing
> the wrong thing. You could be getting an interrupt at the wrong time and
> just clobbering a register. Doing this will let the test run without
> interrupts.
>
> What is the value of _Watchdog_Ticks_since_boot at this fault?
>
5. Please note that I replaced the ticker wake_after call (to avoid
waiting a long time) with the following:
  status = rtems_task_wake_after(
    task_index * 5
  );
It was making the context switch to the first task; the unaligned access
exception happens when task 1 (after the context switch) tries to use
printf and semaphore obtain.
>>>> I set a break point at a call to _Objects_Get_isr_disable() and
>>>> continued until the call that returns the invalid Object pointer, and
>>>> typed bt to get the following stack:
>>> Another possibility is that the register/memory constraints on the
>>> enable/disable interrupts code aren't right and are confusing gcc.
>>> You could be randomly clobbering registers anytime ISRs are
>>> disabled/enabled.
>>>
>>> Christian.. can you review that code?
>>>> "
>>>> #0 _Objects_Get_isr_disable (
>>>> information=0x3ba54 <_Semaphore_Information>,
>>>> id=436273156, location=0x406b4, level_p=0x406b0)
>>>> at ../../../../../../rtems/c/src/../../cpukit/score/src/objectgetisr.c:34
>>>> #1 0x00014294 in _Semaphore_Get_interrupt_disable (
>>>> id=436273156, location=0x406b4, level=0x406b0)
>>>> at ../../cpukit/../../../or1k_or1ksim/lib/include/rtems/rtems/semimpl.h:196
>>>> #2 0x000142e0 in rtems_semaphore_obtain (id=436273156,
>>>> option_set=0, timeout=0)
>>>> at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semobtain.c:47
>>>> #3 0x0000d648 in rtems_termios_write (arg=0x40730)
>>>> at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/termios.c:1099
>>>> #4 0x00004380 in console_write (major=0, minor=0,
>>>> arg=0x40730)
>>>> at ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/console_write.c:42
>>>> #5 0x00031cc4 in rtems_io_write (major=0, minor=0,
>>>> argument=0x40730)
>>>> at ../../../../../../rtems/c/src/../../cpukit/sapi/src/iowrite.c:37
>>>> #6 0x000305f0 in rtems_deviceio_write (iop=0x46a30,
>>>> buf=0x4088c, nbyte=1, major=0, minor=0)
>>>> at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/sup_fs_deviceio.c:109
>>>> #7 0x0002fc70 in device_write (iop=0x46a30,
>>>> buffer=0x4088c, count=1)
>>>> at ../../../../../../rtems/c/src/../../cpukit/libfs/src/imfs/deviceio.c:90
>>>> #8 0x00038f14 in write (fd=2, buffer=0x4088c, count=1)
>>>> at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/write.c:48
>>>> #9 0x00038d54 in _write_r (ptr=0x3db40, fd=2,
>>>> buf=0x4088c, nbytes=1)
>>>> at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/write_r.c:41
>>>> #10 0x00033198 in __swrite (ptr=0x3db40, cookie=0x3dd68,
>>>> buf=0x4088c "T\004\b\220", n=1)
>>>> at ../../../../../gcc-4.8.2/newlib/libc/stdio/stdio.c:97
>>>> #11 0x000357c0 in __sfvwrite_r (ptr=0x3db40, fp=0x3dd68,
>>>> uio=0x40840)
>>>> at ../../../../../gcc-4.8.2/newlib/libc/stdio/fvwrite.c:99
>>>> #12 0x000338a0 in __sprint_r (ptr=ptr at entry=0x3db40,
>>>> fp=fp at entry=0x3dd68, uio=uio at entry=0x40840)
>>>> at ../../../../../gcc-4.8.2/newlib/libc/stdio/vfprintf.c:437
>>>> #13 0x000345e0 in __sprint_r (uio=0x40840, fp=0x3dd68,
>>>> ptr=0x3db40)
>>>> at ../../../../../gcc-4.8.2/newlib/libc/stdio/vfprintf.c:1776
>>>> #14 _vfiprintf_r (data=0x3db40, fp=fp at entry=0x3dd68,
>>>> fmt0=fmt0 at entry=0x392d1 "%c", ap=0x40930,
>>>> ap at entry=0x4092c)
>>>> at ../../../../../gcc-4.8.2/newlib/libc/stdio/vfprintf.c:1776
>>>> #15 0x00032aec in fiprintf (fp=0x3dd68, fmt=0x392d1 "%c")
>>>> at ../../../../../gcc-4.8.2/newlib/libc/stdio/fiprintf.c:50
>>>> #16 0x00002f28 in Test_task (unused=1)
>>>> at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:43
>>>> #17 0x00031ddc in _Thread_Handler ()
>>>> at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>>>> #18 0x00031d64 in _User_extensions_Thread_exitted (
>>>> executing=0x3d92c)
>>>> at ../../cpukit/../../../or1k_or1ksim/lib/include/rtems/score/userextimpl.h:243
>>>> Backtrace stopped: frame did not save the PC
>>>> "
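One thing that can be checked from the backtrace alone is whether id=436273156 (0x1A010004) is a sane semaphore id. The sketch below decodes it using the 32-bit object id layout as I understand it (index in bits 0-15, node in bits 16-23, API in bits 24-26, class in bits 27-31); the struct and function are invented for this example, and RTEMS itself offers rtems_object_id_get_api() and friends for the same purpose.

```c
#include <stdint.h>

/* Fields of a 32-bit RTEMS object id (struct name is local to this sketch). */
typedef struct {
  unsigned object_class;
  unsigned api;
  unsigned node;
  unsigned index;
} id_fields;

/* Split an object id into its fields, assuming the layout described above. */
static id_fields decode_object_id(uint32_t id)
{
  id_fields f;

  f.index        = id & 0xffffu;
  f.node         = (id >> 16) & 0xffu;
  f.api          = (id >> 24) & 0x7u;
  f.object_class = (id >> 27) & 0x1fu;
  return f;
}
```

For 436273156 this gives class 3, API 2, node 1, index 4, which as far as I know is a Classic API semaphore on node 1. So the id passed in looks valid, and the bad pointer more likely comes from something clobbered during the lookup than from a bad id.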
>>>>
>>>> This problem does not happen with printk, because none of this newlib
>>>> code is called and consequently rtems_semaphore_obtain() is not
>>>> called after context switches and/or _ISR_Handler.
>>> printk is simple and may not be accessing memory in the same way. It
>>> also may be simple enough that an issue with incorrect register
>>> constraints on inline assembly isn't blowing it up.
>>>>
>>>> On Tue, Aug 19, 2014 at 7:52 PM, Gedare Bloom <gedare at rtems.org> wrote:
>>>>> Submit the revised patch.
>>>>>
>>>>> -Gedare
>>>>>
>>>>> On Tue, Aug 19, 2014 at 1:49 PM, Hesham Moustafa
>>>>> <heshamelmatary at gmail.com> wrote:
>>>>>> Hi Gedare,
>>>>>> Thanks for providing this solution. I will try to imitate these two files
>>>>>> and run the test. The fixed patch for or1ksim is ready; should I submit it
>>>>>> or wait until I check this solution and hopefully figure out what is
>>>>>> wrong?
>>>>>>
>>>>>> On Aug 19, 2014 7:08 PM, "Gedare Bloom" <gedare at rtems.org> wrote:
>>>>>>> Hi Hesham,
>>>>>>>
>>>>>>> I found this advice from Sebastian in our bugzilla related to another
>>>>>>> arch (bfin) that has some context-switch problems:
>>>>>>> "In order to test the exception code I would add the functions
>>>>>>>
>>>>>>> _CPU_Context_validate()
>>>>>>> _CPU_Context_volatile_clobber()
>>>>>>>
>>>>>>> used in this test
>>>>>>>
>>>>>>> http://git.rtems.org/rtems/tree/testsuites/sptests/spcontext01/init.c
>>>>>>>
>>>>>>> For examples please have a look at the ARM, Nios 2 or PowerPC."
>>>>>>>
>>>>>>> You may like to try this out to debug your problem.
>>>>>>> Gedare
>>> --
>>> Joel Sherrill, Ph.D. Director of Research & Development
>>> joel.sherrill at OARcorp.com On-Line Applications Research
>>> Ask me about RTEMS: a free RTOS Huntsville AL 35805
>>> Support Available (256) 722-9985
>>>