General PTY and CONSOLE question--SOLVED

Mon Sep 22 00:23:19 UTC 2008

Gene Smith wrote, On 09/21/2008 03:15 PM:
> Joel Sherrill wrote, On 09/20/2008 05:27 PM:
>> Gene Smith wrote:
>>> Joel Sherrill wrote:
>>>   
>>>> Thomas Doerfler wrote:
>>>>     
>>>>> Gene,
>>>>>
>>>>> During startup RTEMS tries to open "/dev/console". If you don't have a device
>>>>> of this name, you will run into trouble. The PTY will be initialized later
>>>>> (AFAIK), so it can't be a replacement.
>>>>>
>>>>>
>>>>>       
>>>> Depends on the RTEMS version.  This is the
>>>> progression right:
>>>>
>>>> + in 4.6 /dev/console was required and if open() on it
>>>>     failed, it was a fatal error.
>>>> + in 4.7.1, the open could fail without causing a fatal error.
>>>>    This made the console optional but open() always got
>>>>    pulled in.
>>>> + In 4.8, this routine was refactored so there could be
>>>>    a "no-console.c"  optional stub to avoid open() being
>>>>    referenced.
>>>>     
>>> I'm still on 4.8 right now but am having a problem when I don't include
>>> CONSOLE_DRIVER_TABLE_ENTRY in my Device_drivers[] array. I don't see
>>> open() being called, since dbgu_init() is not called (dbgu_init() is
>>> called when CONSOLE_DRIVER_TABLE_ENTRY is present). However, I can see
>>> RTEMS detect a fatal error of some sort and shut down, and I can't
>>> telnet to RTEMS. Only when I include the CONSOLE... driver entry does
>>> RTEMS keep running and let me telnet in.
>>>
>>> My BSP (modified csb337) still includes the uart code (in dbgu.c,
>>> customized for my uart) but it is basically a NOP right now. The
>>> dbgu_init() function does nothing to the h/w. The dbgu_write() just
>>> returns 1 (indicating the char was sent) while dbgu_read() just returns
>>> -1 (indicating that nothing received). Those are the only functions that
>>> touch the uart so print[fk] must call them.
>>>
>>>   
>> Can you dump the print[fk] output into a ring buffer so we
>> can look at it?
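A ring buffer like the one suggested here could look roughly like this. It is a sketch under assumptions: `RING_SIZE`, `ring_putc()` and `ring_snapshot()` are hypothetical names, and on the target `ring_putc()` would have to be installed as printk's character-output hook (in RTEMS that is the `BSP_output_char` function pointer from `<rtems/bspIo.h>`, though the exact declaration may differ between versions). Because it never touches the UART, it keeps working with the console driver removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capture buffer for printk output.
   RING_SIZE must be a power of two so the index can be masked. */
#define RING_SIZE 256u

static volatile char ring[RING_SIZE];
static volatile unsigned ring_head = 0; /* total characters ever written */

/* Store one character, overwriting the oldest data once the buffer is full.
   Its shape, void f(char), is meant to match a printk output hook. */
void ring_putc(char c)
{
    ring[ring_head & (RING_SIZE - 1u)] = c;
    ring_head++;
}

/* Copy the newest characters, oldest first, into out and NUL-terminate.
   Returns how many characters were copied (at most out_size - 1).
   The ring itself is left untouched. */
size_t ring_snapshot(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    unsigned head = ring_head;
    size_t avail = head < RING_SIZE ? head : RING_SIZE;
    if (avail > out_size - 1)
        avail = out_size - 1;

    for (size_t i = 0; i < avail; i++)
        out[i] = ring[(head - avail + i) & (RING_SIZE - 1u)];
    out[avail] = '\0';
    return avail;
}
```

After a failure, `ring` and `ring_head` can be inspected from gdb. Note that `ring_head++` is not atomic, so concurrent writers (for example a task and an ISR) could occasionally interleave characters; for a debugging aid that is usually tolerable.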
> 
> What I did was move the dbgu_init() uart setup actions to dbgu_write()
> and do them only once, on the first printk char (using a flag "uart_up"
> to enforce a one-shot uart init). This allows the uart to still be
> initialized even when "CONSOLE_DRIVER_TABLE_ENTRY" is not in the
> Device_drivers[] array. With this I can still see the printk messages
> in gtkterm. I also changed all printf's in my app to printk's.
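The one-shot initialization described here can be sketched as below. Only `dbgu_write()`, `dbgu_read()` and the `uart_up` flag come from the message; the simplified signatures and the `uart_hw_setup()`/`uart_hw_putc()` helpers are hypothetical stand-ins (stubbed with counters so the sketch builds on a host), not the csb337 BSP's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ---- Hypothetical hardware layer, stubbed with counters for a host ---- */
static int hw_setup_calls = 0;    /* times the UART was configured */
static size_t hw_chars_sent = 0;  /* characters handed to the transmitter */

/* Would program clocks, pins and baud rate on real hardware. */
static void uart_hw_setup(void) { hw_setup_calls++; }

/* Would wait for TX-ready and write one character on real hardware. */
static void uart_hw_putc(char c) { (void)c; hw_chars_sent++; }

/* ---- Driver entry points (simplified, hypothetical signatures) ---- */
static bool uart_up = false;      /* one-shot flag: UART initialized yet? */

/* Set up the UART lazily on the first character, then send the buffer.
   Returns the number of characters sent (1 for a single-char call). */
int dbgu_write(int minor, const char *buf, int len)
{
    (void)minor;
    assert(buf != NULL || len == 0);

    if (!uart_up) {               /* first output ever: init exactly once */
        uart_hw_setup();
        uart_up = true;
    }
    for (int i = 0; i < len; i++)
        uart_hw_putc(buf[i]);
    return len;
}

/* Polled read stub: -1 means "nothing received". */
int dbgu_read(int minor)
{
    (void)minor;
    return -1;
}
```

The plain `bool` check is not protected against two tasks, or a task and an ISR, printing the very first character at the same moment; a real driver would presumably disable interrupts around the check-and-set.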
> 
> What I am seeing is that it gets past all the initializations, including
> networking, telnetd and my primary app init, but fails when freeing
> memory like this:
> 
> Program heap: free of bad pointer 2031D150 -- range 202CA800 - 21000000
> EXECUTIVE SHUTDOWN! Any key to reboot...
> 
> This printk occurs at malloc.c:495 when freeing memory. If I set a
> breakpoint at line 495 and run again, I see this in gdb:
> 
> (arm-gdb) b malloc.c:495
> Breakpoint 1 at 0x201b5ef8: file 
> ../../../../../../rtems-4.8.0/c/src/../../cpukit/libcsupport/src/malloc.c, 
> line 495.
> (arm-gdb) c
> Continuing.
> 
> Breakpoint 1, free (ptr=
> During symbol reading, incomplete CFI data; unspecified registers (e.g., 
> r0) at 0x201b5e7c.
> 0x0) at 
> ../../../../../../rtems-4.8.0/c/src/../../cpukit/libcsupport/src/malloc.c:495
> 495         printk( "Program heap: free of bad pointer %p -- range %p - 
> %p \n",
> Current language:  auto; currently c
> (arm-gdb) bt
> #0  free (ptr=0x2031d150) at 
> ../../../../../../rtems-4.8.0/c/src/../../cpukit/libcsupport/src/malloc.c:495
> #1  0x201cd798 in rtems_bsdnet_free (addr=0x2031d150, type=3) at 
> ../../../../../../rtems-4.8.0/c/src/../../cpukit/libnetworking/rtems/rtems_glue.c:137
> #2  0x201e5cac in sofree (so=0x201ce8b8) at 
> ../../../../../../rtems-4.8.0/c/src/../../cpukit/libnetworking/kern/uipc_socket.c:156
> #3  0x02000000 in ?? ()
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
> 
> (arm-gdb)
> 
> So it appears that not having a CONSOLE driver causes a problem in the
> networking code. The function in uipc_socket.c is sofree(), which
> calls the FREE macro. Again, if I keep the CONSOLE driver in the driver
> array, I have no problems. (Note: my JTAG debugger is a bit flaky, so
> the indication above of a corrupt stack may or may not be valid.)
> 
> Also, I just realized that the printk at the breakpoint is not itself a
> fatal error, so I stepped on from there with the debugger. The
> stack from frame #3 up is actually:
> 
> #3 soclose
> #4 rtems_bsdnet_close
> #5 close
> #6 rtems_task_telnetd
> 
> For some reason, rtems_task_telnetd seems to be coming out of its
> endless loop and calling close(). When it terminates, the fatal error is
> flagged (after going into rtems_task_delete). I have no idea why telnetd
> is terminating or being terminated. I have not yet tried to make a
> telnet connection.
> 
>>> I can't really tell exactly where the RTEMS shutdown (rtems_panic
>>> called) is occurring because I have no serial port printing and my debug
>>> environment is still kind of flaky. However, I think it might be
>>> occurring during a printf from what I can tell (but not necessarily the
>>> 1st printf). Also, while stepping with gdb in a printf call, somehow I
>>> *seemed* to end up in some networking/socket code (this is with console
>>> driver disabled). Like I said, my debugger is flaky so not sure if this
>>> was real or not.
>>>
>>>   
>> Do you get a backtrace?
>>
>> If you call printf from the wrong context (ISR or dispatching
>> disabled), you can get a fatal error.

Well, I spent *way* too much time trying to figure this one out, but I
think I have it. I believe it is my own fault for calling rtems_panic() in
my user code when my kbhit() function (discussed recently in another
thread) failed internally with no CONSOLE driver enabled. You end up with
a call stack like this:

kbhit()  <--- my function
rtems_panic
rtems_verror
_exit
newlibc_exit
close(stdin/stdout/stderr)
soisdisconnected   <--sets SS_CANTRCVMORE bit that accept() fails on.

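The error I was actually seeing was the telnetd server failing in
accept() because the so_state flag SS_CANTRCVMORE was set on the socket
using descriptor 0. With no console driver, stdin/stdout/stderr were
presumably never opened, so that socket was handed descriptor 0. Closing
stdin (also descriptor 0) during exit is what set this bit and caused
accept() to fail (kind of a race condition).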

My only remaining question: is it correct for newlibc_exit() to call
close() on stdin, stdout and stderr when there is no console driver
and those descriptors may now be sockets? Or, since the system is
going down anyway, perhaps it does not matter?
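One possible guard for exit-time cleanup, sketched under assumptions: `fd_open_and_not_socket()` and `close_std_fds_except_sockets()` are hypothetical helpers, not anything in newlib or RTEMS, and whether `fstat()` on an RTEMS 4.8 networking socket reports `S_IFSOCK` would need checking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if fd is open and is not a socket, 0 otherwise. */
int fd_open_and_not_socket(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return 0;                      /* not open (or not inspectable) */
    return S_ISSOCK(st.st_mode) ? 0 : 1;
}

/* Close descriptors 0-2 unless they are sockets, so a server that happened
   to receive one of those numbers is not torn down as if it were stdio.
   Returns how many descriptors were actually closed. */
int close_std_fds_except_sockets(void)
{
    int closed = 0;
    for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        if (!fd_open_and_not_socket(fd))
            continue;                  /* not open, or a socket: leave it */
        if (close(fd) == 0)
            closed++;
    }
    assert(closed >= 0 && closed <= 3);
    return closed;
}
```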

Another issue for me is that perror() does nothing without a console
(the accept() failure in telnetd printed no error message). Perhaps
perror() should fall back to printk when no console is present.
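A perror() that falls back to printk could be sketched like this. `perror_or_fallback()`, `printk_like` and `host_printk()` are hypothetical names; the fallback type is meant to match a printk-style `(const char *fmt, ...)` function (check the exact return type against your RTEMS headers), and `host_printk()` exists only so the sketch runs on a host. Given the descriptor mix-up described above, it also refuses to treat descriptor 2 as stderr when it is a socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* printk-style output function: printf-like format, no FILE needed. */
typedef void (*printk_like)(const char *fmt, ...);

/* Host stand-in for printk, used only to exercise the fallback path. */
void host_printk(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stdout, fmt, ap);
    va_end(ap);
}

/* Hypothetical perror() replacement. Writes to stderr only when descriptor 2
   is open and is not a socket; otherwise uses the printk-style fallback.
   Returns 1 if stderr was used, 0 if the fallback was used. */
int perror_or_fallback(const char *prefix, printk_like fallback)
{
    int saved = errno;                 /* fstat() below may overwrite errno */
    const char *msg = strerror(saved);
    const char *sep = (prefix != NULL && *prefix != '\0') ? ": " : "";
    struct stat st;
    int used_stderr = 0;

    if (prefix == NULL)
        prefix = "";

    if (fstat(STDERR_FILENO, &st) == 0 && !S_ISSOCK(st.st_mode)) {
        fprintf(stderr, "%s%s%s\n", prefix, sep, msg);
        used_stderr = 1;
    } else {
        assert(fallback != NULL);
        fallback("%s%s%s\n", prefix, sep, msg);
    }
    errno = saved;                     /* leave errno as the caller had it */
    return used_stderr;
}
```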

Anyhow, I will not call rtems_panic() from a user program!

>>
>> --joel
>>>> + Basically the same in 4.9.
>>>>     
>>>>> Maybe the NULL device can help you as a dummy replacement for the console?
>>>>>
>>>>>
>>>>>       
>>>> When you had to have /dev/console, you could link /dev/console
>>>> to /dev/null and get by. Now you only need to do that if you print
>>>> to stdout and assume that it is automatically set up for you.
>>>>
>>>> --joel
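The /dev/null workaround can also be applied at the descriptor level. A minimal sketch, assuming a /dev/null device is registered (the NULL driver mentioned in this thread) and that this runs before telnetd or any other networking task opens sockets; `reserve_std_fds()` is a hypothetical name. Once 0, 1 and 2 are occupied, no socket can be handed one of those numbers, so exit-time cleanup of stdio cannot close a server socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Make sure descriptors 0, 1 and 2 are open, pointing any missing ones at
   /dev/null. Returns how many had to be filled, or -1 on error. */
int reserve_std_fds(void)
{
    int filled = 0;
    for (int fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        if (fcntl(fd, F_GETFD) != -1)
            continue;                          /* already open, keep it */

        int nfd = open("/dev/null", fd == STDIN_FILENO ? O_RDONLY : O_WRONLY);
        if (nfd < 0)
            return -1;
        if (nfd != fd) {                       /* defensive: move it into place */
            int ok = (dup2(nfd, fd) == fd);
            close(nfd);
            if (!ok)
                return -1;
        }
        filled++;
    }
    assert(filled >= 0 && filled <= 3);
    return filled;
}
```

Since lower descriptors are already open when each slot is checked, `open()` should normally return exactly the missing number; the `dup2()` branch only covers the unexpected case.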
>>>>     
>>>>> wkr,
>>>>> Thomas.
>>>>>
>>>>>
>>>>> Gene Smith wrote:
>>>>>
>>>>>       
>>>>>> Does a PTY driver require the console driver? When I remove
>>>>>> CONSOLE_DRIVER_TABLE_ENTRY from the Device_drivers[] array I get no pty
>>>>>> login. Are they somehow tied together? Examples I have seen on the list
>>>>>> have both.
>>>>>>
>>>>>> I need to run with my UART h/w disabled (I can't have it on; I need its
>>>>>> I/O ports for other functions). However, I would still like the PTY
>>>>>> login/shell to be working (for now).
>>>>>>
>>>>>> Another problem might be that print[kf] calls are still present
>>>>>> in the code. printk still tries to send chars to the uart even when
>>>>>> CONSOLE is gone. I am not sure about printf.
>>>>>>
>>>>>> Thanks,
>>>>>> -gene
>>>>>>         
> 
> _______________________________________________
> rtems-users mailing list
> rtems-users at rtems.com
> http://rtems.rtems.org/mailman/listinfo/rtems-users
>