Bug in termios

Eric Norum norume at aps.anl.gov
Thu Oct 19 14:44:55 UTC 2006

On Oct 19, 2006, at 9:28 AM, Joel Sherrill wrote:

> jennifer at oarcorp.com wrote:
>> I have a system with 12 serial ports.  The driver is configured to be
>> interrupt driven with one character arrival per interrupt.
>> Furthermore, I am using one task which is set up to poll the serial
>> ports in raw mode at a 10 ms rate and distribute any data that has
>> arrived.  To do this I have set VMIN and VTIME to 0.  The problem
>> occurs on high-bandwidth devices (after they have run a while), when
>> the device is turned off.  Turning off the device results in the
>> system locking up for several seconds and then recovering.  Analysis
>> has shown that the read command is locking the system during this
>> time.  It appears to me that the problem is inside the
>> fillBufferQueue routine in termios.c.  While the device is running
>> there are always characters available, so rawInBuf.Head and
>> rawInBuf.Tail are never equal.  Then ccount is always >= c_cc[VMIN],
>> which results in wait being set to 0 and the semaphore never being
>> decremented, even though it is being incremented approximately 20
>> times during the 10 ms application task poll period.  When the
>> serial device is turned off and the characters are emptied out of
>> the termios buffer, this results in a busy loop that obtains
>> rawInBuf.Semaphore several hundred thousand times.
> I looked at the code and believe this semaphore was intended to be
> used as a counting semaphore. I think each increment is for an
> interrupt occurrence -- not for a single character. It is simply not
> decremented via obtain unless a task is willing to block. In
> Jennifer's case, this is very rare, so the count is VERY high when
> she finally doesn't have any data and has to wait.
> I suggested that this probably needs to be a simple binary semaphore
> but she had already tried that and it broke something else. She is
> now trying to do a flush on this semaphore just before checking the
> buffer counts. Technically, this is a condition mutex and we should
> be able to get away with that. You don't care about it until there
> is no data and then you want to block on it until just the next
> interrupt.
Right -- which is why I would have thought that a binary semaphore  
would work.    Can you describe what breaks when a binary semaphore  
is used?
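
To make the accumulation concrete, here is a minimal standalone sketch
in plain C.  It is not the termios code: model_sem, model_release,
model_obtain_nowait, poll_empty_buffer and simulate are hypothetical
names made up for illustration, and the "semaphore" is just a counter.
It assumes, per the description above, that the interrupt releases the
semaphore about 20 times per 10 ms poll, that the reader never obtains
it while data is present, and that with VMIN and VTIME both 0 the
obtain does not block.

```c
#include <limits.h>

/* Hypothetical stand-in for the raw input semaphore: a bare counter.
 * max_count == 1 models a binary semaphore, ULONG_MAX a counting one. */
typedef struct {
    unsigned long count;
    unsigned long max_count;
} model_sem;

/* What the receive interrupt is assumed to do: one release per interrupt. */
static void model_release(model_sem *s)
{
    if (s->count < s->max_count)
        s->count++;
}

/* Non-blocking obtain: returns 1 on success, 0 if the count is zero. */
static int model_obtain_nowait(model_sem *s)
{
    if (s->count == 0)
        return 0;
    s->count--;
    return 1;
}

/* One read of an empty raw buffer: every successful obtain sends the
 * loop around again, finds no data, and obtains again.  Returns the
 * number of wasted iterations. */
static unsigned long poll_empty_buffer(model_sem *s)
{
    unsigned long spins = 0;

    while (model_obtain_nowait(s))
        spins++;
    return spins;
}

/* 'polls' reads that always find data (so the semaphore is never
 * obtained), with 'irqs_per_poll' releases each, followed by one read
 * after the device has gone quiet and the buffer has drained. */
static unsigned long simulate(unsigned long max_count,
                              unsigned long polls,
                              unsigned long irqs_per_poll)
{
    model_sem s = { 0, max_count };
    unsigned long p, i;

    for (p = 0; p < polls; p++)
        for (i = 0; i < irqs_per_poll; i++)
            model_release(&s);

    return poll_empty_buffer(&s);
}
```

Under these assumptions simulate(ULONG_MAX, 10000, 20), i.e. 100
seconds of traffic followed by one read of an empty buffer, returns
200000 wasted obtains, while simulate(1, 10000, 20) returns 1.  That
bounded behaviour is what I would expect from a binary semaphore.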

Flushing the semaphore count doesn't sound like a good fix to me.  I  
worry a lot about race conditions with that sort of approach.

> Eric.. do you see what is going on? I know it has been years but you
> originally wrote this code.
Well, I wrote some of it.  All the flow-control stuff is new.

Eric Norum <norume at aps.anl.gov>
Advanced Photon Source
Argonne National Laboratory
(630) 252-4793
