<offlist> or1k printf causes crash

Joel Sherrill joel.sherrill at oarcorp.com
Thu Aug 21 21:54:45 UTC 2014


On 8/21/2014 4:15 PM, Hesham Moustafa wrote:
> On Thu, Aug 21, 2014 at 10:56 PM, Joel Sherrill
> <joel.sherrill at oarcorp.com> wrote:
>> On 8/21/2014 2:44 PM, Hesham Moustafa wrote:
>>> Hi,
>>>
>>> I have been debugging the or1k code for a while, hoping to find
>>> what's wrong. Here's what I got.
>> First I am moving this to devel@ so others can chime in.
>>> First, I asked about this problem on the #openrisc IRC channel. They told
>>> me the problem might be that I have to take the red zone into account. I
>>> asked what the red zone is, and Stefan said the following:
>>> "the first 128 bytes of the stack has to be stepped over, leaf
>>> functions might use that without modifying the stack pointer, and gcc
>>> takes advantage of the fact that there is a red zone in non-leaf
>>> functions prologues too. i.e. it stores things on the stack and *then*
>>> update the stack pointer"
>> This is a bug in gcc. We have seen it on the ARM and there was a recent
>> dust up from the Linux kernel community because it happened on x86-64.
>> My understanding is that there was rework/improvement which triggered
>> bugs in backends. But this needs to be fixed.
>>
>> The sp must be updated before the memory can be used. This is just
>> a bug otherwise.
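
For reference, the adjustment Stefan describes could be sketched in C
roughly as below. This is only an illustration, assuming the or1k stack
grows downward and that exception entry code has to leave the 128 bytes
below the interrupted sp untouched. RED_ZONE_SIZE and
exception_frame_base() are hypothetical names, not RTEMS or gcc
identifiers.

```c
#include <stdint.h>

/* Assumption: 128-byte red zone below sp, as described on #openrisc. */
#define RED_ZONE_SIZE 128u

/*
 * Hypothetical helper: where the exception entry code could start its
 * register save area so that it does not overwrite data the interrupted
 * function stored below its stack pointer.
 */
static uintptr_t exception_frame_base(uintptr_t interrupted_sp,
                                      uintptr_t frame_size)
{
  /* Step over the red zone first, then reserve the save area. */
  return interrupted_sp - RED_ZONE_SIZE - frame_size;
}
```

With an interrupted sp of 0x40800 and a 128-byte save area, the frame
would start at 0x40700 instead of 0x40780.
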
>>> He suggested that I add 128 bytes to the stack pointer before I jump to
>>> _ISR_Handler (from start.S). I tried this solution, but it did not
>>> help. You may have some ideas about where/when this red zone causes problems.
>> You probably need to
>>> Second, I discovered that an unusual (unaligned access) exception happens
>>> when using printf (which does not happen with printk). When I inspected
>>> the stack, I found that the problem happens in rtems_semaphore_obtain(),
>>> when trying to access the_semaphore data, whose pointer (an invalid
>>> pointer) is returned from a call to _Objects_Get_isr_disable(). This
>>> exception only happens after DISPATCH_NEEDED is true and _ISR_Handler
>>> jumps to _Thread_Dispatch, makes a successful context switch, and runs
>>> the first task. The following is a snapshot of the output when
>>> encountering this problem.
>> What's the alignment of the task stack in the port? The stack may not be
>> properly aligned for the widest access of the or1k.
> If you mean the following:
> #define CPU_STACK_ALIGNMENT        0
> then even with this macro set to 4 or 8, I get the same problem.
> And from linkcmds.base:
> bsp_stack_align = DEFINED (bsp_stack_align) ? bsp_stack_align : 8;
Hmm.. ok .. then we need to know the instruction. 8 is normally a wide
enough alignment since that is usually the width of a double or 64-bit access.
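
A quick way to sanity check the alignment side is a pair of helpers like
the following. This is a generic sketch, not code from the port;
is_aligned() and align_down() are hypothetical names, and the alignment is
assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of alignment (alignment must be a power of two). */
static bool is_aligned(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr & (alignment - 1)) == 0;
}

/* Round addr down to the previous multiple of alignment (power of two). */
static uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return addr & ~(alignment - 1);
}
```

Calling is_aligned() on the initial task stack pointer with 8 tells you
whether the stack setup is the culprit, and align_down() shows the value
it should have been for a downward-growing stack.
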
>>> "*** BEGIN OF TEST CLOCK TICK ***
>>> TA1  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>> TA2  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>> TA3  - rtems_clock_get_tod - 09:00:00   12/31/1988
>>> Fatal Error 263572 Halted"
>> Can you tell what the instruction is? And the address it is trying to
>> access.
> The _Objects_Get_isr_disable() function returns a weird address for
> Object (which in turn should be the_semaphore). This address is
> 0x8007, which looks like the value of the SR register. All previous
> Object/the_semaphore addresses returned from
> _Objects_Get_isr_disable() are higher addresses, which is why I say
> that the last (0x8007) Object address is invalid.
_Objects_Get_isr_disable() will return an address from the RTEMS Workspace
which would tend to be a higher RAM address.
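
The SR theory can be cross-checked by decoding 0x8007 against the SR bit
layout. The sketch below assumes the bit positions given in the OpenRISC
1000 architecture manual (SM is bit 0, TEE is bit 1, IEE is bit 2, FO is
bit 15). The macro and function names are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed SR bit positions (OpenRISC 1000 architecture manual). */
#define SR_SM  (1u << 0)   /* supervisor mode */
#define SR_TEE (1u << 1)   /* tick timer exception enable */
#define SR_IEE (1u << 2)   /* interrupt exception enable */
#define SR_FO  (1u << 15)  /* fixed one */

/* Return the bits of value that are not one of the four flags above. */
static uint32_t sr_other_bits(uint32_t value)
{
  return value & ~(SR_SM | SR_TEE | SR_IEE | SR_FO);
}

/* Print which of the four flags are set in value. */
static void sr_describe(uint32_t value)
{
  printf("SM=%d TEE=%d IEE=%d FO=%d other=0x%x\n",
         (value & SR_SM) != 0, (value & SR_TEE) != 0,
         (value & SR_IEE) != 0, (value & SR_FO) != 0,
         (unsigned) sr_other_bits(value));
}
```

Under those assumptions 0x8007 is exactly FO | IEE | TEE | SM with nothing
left over, which is what SR would hold in supervisor mode with interrupts
and the tick timer enabled. It is also an odd value, so any word access
through it would be unaligned.
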

Random thought. Temporarily disable the "real hardware" clock tick driver
in your BSP and add the simulated clock tick driver. See the h8sim BSP's
Makefile for an example. We need to determine whether your ISR code is
doing the right thing. You could be getting an interrupt at the wrong
time and just clobbering a register. Doing this will let the test run
without interrupts.

What is the value of _Watchdog_Ticks_since_boot at this fault?

>>> I set a break point at  a call to _Objects_Get_isr_disable() and
>>> continued until the call that returns the invalid Object pointer, and
>>> typed bt to get the following stack:
>> Another possibility is that the register/memory constraints on the
>> enable/disable interrupts code aren't right and it is confusing gcc.
>> You could be randomly clobbering registers anytime ISRs are
>> disabled/enabled.
>>
>> Christian.. can you review that code?
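
To make the constraint concern concrete, here is a rough sketch of what
interrupt disable/enable with explicit output, input and "memory"
clobber constraints might look like. It is not the code from the port.
It assumes SR is SPR 17, that TEE and IEE are bits 1 and 2, and that the
compiler defines __or1k__. All function names are hypothetical. The #else
branch is only a host stand-in so the logic can be tried off-target.

```c
#include <stdint.h>

#define SPR_SR 17u          /* assumption: SR is SPR 17 in group 0 */
#define SR_TEE (1u << 1)    /* assumption: tick timer exception enable */
#define SR_IEE (1u << 2)    /* assumption: interrupt exception enable */

#if defined(__or1k__)
static inline uint32_t spr_read(uint32_t spr)
{
  uint32_t value;
  /* "=r" marks value as written; "memory" stops reordering across it. */
  __asm__ volatile ("l.mfspr %0, %1, 0" : "=r" (value) : "r" (spr) : "memory");
  return value;
}

static inline void spr_write(uint32_t spr, uint32_t value)
{
  /* Both operands are inputs only; nothing else is modified. */
  __asm__ volatile ("l.mtspr %0, %1, 0" : : "r" (spr), "r" (value) : "memory");
}
#else
/* Host stand-in: a plain variable plays the role of SR. */
static uint32_t fake_sr = 0x8007u;
static inline uint32_t spr_read(uint32_t spr) { (void) spr; return fake_sr; }
static inline void spr_write(uint32_t spr, uint32_t value) { (void) spr; fake_sr = value; }
#endif

/* Clear IEE and TEE, returning the previous SR as the "level". */
static inline uint32_t interrupt_disable(void)
{
  uint32_t level = spr_read(SPR_SR);
  spr_write(SPR_SR, level & ~(SR_IEE | SR_TEE));
  return level;
}

/* Restore the SR value saved by interrupt_disable(). */
static inline void interrupt_enable(uint32_t level)
{
  spr_write(SPR_SR, level);
}
```

If an output operand were declared as an input (or left out), gcc would be
free to keep using the register as if it were unchanged, which would match
a pointer suddenly reading back as an SR-like value.
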
>>> "
>>> #0  _Objects_Get_isr_disable (
>>>     information=0x3ba54 <_Semaphore_Information>,
>>>     id=436273156, location=0x406b4, level_p=0x406b0)
>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/objectgetisr.c:34
>>> #1  0x00014294 in _Semaphore_Get_interrupt_disable (
>>>     id=436273156, location=0x406b4, level=0x406b0)
>>>     at ../../cpukit/../../../or1k_or1ksim/lib/include/rtems/rtems/semimpl.h:196
>>> #2  0x000142e0 in rtems_semaphore_obtain (id=436273156,
>>>     option_set=0, timeout=0)
>>>     at ../../../../../../rtems/c/src/../../cpukit/rtems/src/semobtain.c:47
>>> #3  0x0000d648 in rtems_termios_write (arg=0x40730)
>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/termios.c:1099
>>> #4  0x00004380 in console_write (major=0, minor=0,
>>>     arg=0x40730)
>>>     at ../../../../../../../../rtems/c/src/lib/libbsp/or1k/or1ksim/../../shared/console_write.c:42
>>> #5  0x00031cc4 in rtems_io_write (major=0, minor=0,
>>>     argument=0x40730)
>>>     at ../../../../../../rtems/c/src/../../cpukit/sapi/src/iowrite.c:37
>>> #6  0x000305f0 in rtems_deviceio_write (iop=0x46a30,
>>>     buf=0x4088c, nbyte=1, major=0, minor=0)
>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/sup_fs_deviceio.c:109
>>> #7  0x0002fc70 in device_write (iop=0x46a30,
>>>     buffer=0x4088c, count=1)
>>>     at ../../../../../../rtems/c/src/../../cpukit/libfs/src/imfs/deviceio.c:90
>>> #8  0x00038f14 in write (fd=2, buffer=0x4088c, count=1)
>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/write.c:48
>>> #9  0x00038d54 in _write_r (ptr=0x3db40, fd=2,
>>>     buf=0x4088c, nbytes=1)
>>>     at ../../../../../../rtems/c/src/../../cpukit/libcsupport/src/write_r.c:41
>>> #10 0x00033198 in __swrite (ptr=0x3db40, cookie=0x3dd68,
>>>     buf=0x4088c "T\004\b\220", n=1)
>>>     at ../../../../../gcc-4.8.2/newlib/libc/stdio/stdio.c:97
>>> #11 0x000357c0 in __sfvwrite_r (ptr=0x3db40, fp=0x3dd68,
>>>     uio=0x40840)
>>>     at ../../../../../gcc-4.8.2/newlib/libc/stdio/fvwrite.c:99
>>> #12 0x000338a0 in __sprint_r (ptr=ptr@entry=0x3db40,
>>>     fp=fp@entry=0x3dd68, uio=uio@entry=0x40840)
>>>     at ../../../../../gcc-4.8.2/newlib/libc/stdio/vfprintf.c:437
>>> #13 0x000345e0 in __sprint_r (uio=0x40840, fp=0x3dd68,
>>>     ptr=0x3db40)
>>>     at ../../../../../gcc-4.8.2/newlib/libc/stdio/vfprintf.c:1776
>>> #14 _vfiprintf_r (data=0x3db40, fp=fp@entry=0x3dd68,
>>>     fmt0=fmt0@entry=0x392d1 "%c", ap=0x40930,
>>>     ap@entry=0x4092c)
>>>     at ../../../../../gcc-4.8.2/newlib/libc/stdio/vfprintf.c:1776
>>> #15 0x00032aec in fiprintf (fp=0x3dd68, fmt=0x392d1 "%c")
>>>     at ../../../../../gcc-4.8.2/newlib/libc/stdio/fiprintf.c:50
>>> #16 0x00002f28 in Test_task (unused=1)
>>>     at ../../../../../../../rtems/c/src/../../testsuites/samples/ticker/tasks.c:43
>>> #17 0x00031ddc in _Thread_Handler ()
>>>     at ../../../../../../rtems/c/src/../../cpukit/score/src/threadhandler.c:192
>>> #18 0x00031d64 in _User_extensions_Thread_exitted (
>>>     executing=0x3d92c)
>>>     at ../../cpukit/../../../or1k_or1ksim/lib/include/rtems/score/userextimpl.h:243
>>> Backtrace stopped: frame did not save the PC
>>> "
>>>
>>> This problem does not happen with printk, because none of this newlib
>>> code is called and consequently rtems_semaphore_obtain() is not
>>> called after context switches and/or _ISR_Handler.
>> printk is simple and may not be accessing memory in the same way. It
>> also may be simple enough that an issue with incorrect register
>> constraints on inline assembly isn't blowing it up.
>>>
>>> On Tue, Aug 19, 2014 at 7:52 PM, Gedare Bloom <gedare at rtems.org> wrote:
>>>> Submit the revised patch.
>>>>
>>>> -Gedare
>>>>
>>>> On Tue, Aug 19, 2014 at 1:49 PM, Hesham Moustafa
>>>> <heshamelmatary at gmail.com> wrote:
>>>>> Hi Gedare,
>>>>> Thanks for providing this solution. I will try to imitate these two files
>>>>> and run the test. The fixed patch for or1ksim is ready; should I submit it
>>>>> or wait until I check this solution and hopefully figure out what is
>>>>> wrong?
>>>>>
>>>>> On Aug 19, 2014 7:08 PM, "Gedare Bloom" <gedare at rtems.org> wrote:
>>>>>> Hi Hesham,
>>>>>>
>>>>>> I found this advice from Sebastian in our bugzilla related to another
>>>>>> arch (bfin) that has some context-switch problems:
>>>>>> "In order to test the exception code I would add the functions
>>>>>>
>>>>>> _CPU_Context_validate()
>>>>>> _CPU_Context_volatile_clobber()
>>>>>>
>>>>>> used in this test
>>>>>>
>>>>>> http://git.rtems.org/rtems/tree/testsuites/sptests/spcontext01/init.c
>>>>>>
>>>>>> For examples please have a look at the ARM, Nios 2 or PowerPC."
>>>>>>
>>>>>> You may like to try this out to debug your problem.
>>>>>> Gedare

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherrill at OARcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985



