Bug in termios

jennifer at oarcorp.com
Thu Oct 19 14:55:06 UTC 2006


I'm rebuilding the system with this changed to a simple binary semaphore in
order to see exactly what breaks.

Jennifer

> On Oct 19, 2006, at 9:28 AM, Joel Sherrill wrote:
>
>> jennifer at oarcorp.com wrote:
>>> I have a system with 12 serial ports.  The driver is configured to be
>>> interrupt driven with one character arrival per interrupt.  Furthermore,
>>> I am using 1 task which is set up to poll the serial ports in raw mode
>>> at a 10 ms rate and distribute any data that has arrived.  To do this I
>>> have set VMIN and VTIME to 0.  The problem occurs on high bandwidth
>>> devices (after they have run a while), when the device is turned off.
>>> Turning off the device results in the system locking up for several
>>> seconds and then recovering.  Analysis has shown that the read command
>>> is locking the system during this time.  It appears to me that the
>>> problem is inside termios.c, in the fillBufferQueue routine.  While the
>>> device is running there are always characters available, so
>>> rawInBuf.Head and rawInBuf.Tail are never equal.  Then ccount is always
>>> at least c_cc[VMIN], which results in wait being set to 0 and the
>>> semaphore never being decremented even though it is being incremented
>>> approximately 20 times during each 10 ms application task poll period.
>>> When the serial device is turned off and the characters are emptied out
>>> of the termios buffer, this results in a spin loop that obtains the
>>> rawInBufSemaphore several hundred thousand times.
>>>
>> I looked at the code and believe this semaphore was intended to be used
>> as a counting semaphore. I think each increment is for an interrupt
>> occurrence -- not for a single character. It is simply not decremented
>> via obtain unless a task is willing to block. In Jennifer's case, this
>> is very rare so the count is VERY high when she finally doesn't have
>> any data and has to wait.
>>
>> I suggested that this probably needs to be a simple binary semaphore but
>> she had already tried that and it broke something else. She is now
>> trying to do a flush on this semaphore just before checking the buffer
>> counts. Technically, this is a condition mutex and we should be able to
>> get away with that. You don't care about it until there is no data and
>> then you want to block on it until just the next interrupt.
> Right -- which is why I would have thought that a binary semaphore
> would work.  Can you describe what breaks when a binary semaphore
> is used?
>
> Flushing the semaphore count doesn't sound like a good fix to me.  I
> worry a lot about race conditions with that sort of approach.
>
>>
>> Eric.. do you see what is going on? I know it has been years but you
>> originally wrote this code.
>>
> Well, I wrote some of it.  All the flow-control stuff is new.
>
> --
> Eric Norum <norume at aps.anl.gov>
> Advanced Photon Source
> Argonne National Laboratory
> (630) 252-4793
>
>
>



