Bug in termios

jennifer at oarcorp.com
Thu Oct 19 15:24:57 UTC 2006


Changing to a simple binary semaphore works.  What I had checked before was
changing it to a binary semaphore, not a simple binary semaphore.  We are going
to run it over the weekend, but this appears to have solved the problem.


> I'm rebuilding the system with this changed to a simple binary semaphore in
> order to see exactly what breaks.
> Jennifer
>> On Oct 19, 2006, at 9:28 AM, Joel Sherrill wrote:
>>> jennifer at oarcorp.com wrote:
>>>> I have a system with 12 serial ports.  The driver is configured to be
>>>> interrupt driven, with one character arriving per interrupt.
>>>> Furthermore, I am using one task that is set up to poll the serial ports
>>>> in raw mode at a 10 ms rate and distribute any data that has arrived.  To
>>>> do this I have set VMIN and VTIME to 0.  The problem occurs on
>>>> high-bandwidth devices (after they have run for a while) when the device
>>>> is turned off.  Turning off the device results in the system locking up
>>>> for several seconds and then recovering.  Analysis has shown that the
>>>> read call is locking up the system during this time.  It appears to me
>>>> that the problem is inside termios.c, in the fillBufferQueue routine.
>>>> While the device is running there are always characters available, so
>>>> rawInBuf.Head and rawInBuf.Tail are never equal.  ccount is then always
>>>> greater than or equal to c_cc[VMIN], which results in wait being set to 0
>>>> and the semaphore never being decremented, even though it is incremented
>>>> approximately 20 times during each 10 ms application task poll period.
>>>> When the serial device is turned off and the characters are emptied out
>>>> of the termios buffer, this results in a spin loop that obtains
>>>> rawInBuf.Semaphore several hundred thousand times.
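For readers unfamiliar with the polling setup described above, here is a minimal sketch of configuring a port for non-blocking polling reads (VMIN = 0, VTIME = 0) through a POSIX termios interface. The helper names `make_raw_poll`, `set_raw_poll` and `poll_port` are hypothetical and do not come from the original driver or application. The flags cleared here approximate what `cfmakeraw()` does on many systems, but the code uses only POSIX constants.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put a termios structure into raw mode with
 * VMIN = 0 and VTIME = 0, so read() returns immediately with whatever
 * bytes are available (possibly none). */
void make_raw_poll(struct termios *t)
{
    assert(t != NULL);
    t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
                              INLCR | IGNCR | ICRNL | IXON);
    t->c_oflag &= ~(tcflag_t)OPOST;
    t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB);
    t->c_cflag |= CS8;
    t->c_cc[VMIN] = 0;   /* no minimum byte count ...              */
    t->c_cc[VTIME] = 0;  /* ... and no inter-byte timer: pure poll */
}

/* Hypothetical helper: apply the settings above to an open port.
 * Returns 0 on success, -1 with errno set on failure (e.g. ENOTTY). */
int set_raw_poll(int fd)
{
    struct termios t;
    if (tcgetattr(fd, &t) != 0)
        return -1;
    make_raw_poll(&t);
    return tcsetattr(fd, TCSANOW, &t);
}

/* Hypothetical helper: one poll of one port, e.g. from a task that runs
 * every 10 ms.  Returns the number of bytes read (0 means nothing has
 * arrived), or -1 on error. */
ssize_t poll_port(int fd, unsigned char *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */
    return n;
}
```

A polling task would call `set_raw_poll()` once per port after opening it, then call `poll_port()` on each descriptor every cycle. Each of those reads goes through the termios read path that contains fillBufferQueue.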
>>> I looked at the code and believe this semaphore was intended to be used
>>> as a counting semaphore.  I think each increment is for an interrupt
>>> occurrence -- not for a single character.  It is simply not decremented
>>> via obtain unless a task is willing to block.  In Jennifer's case, this is
>>> very rare, so the count is VERY high when she finally doesn't have any
>>> data and has to wait.
>>> I suggested that this probably needs to be a simple binary semaphore, but
>>> she had already tried that and it broke something else.  She is now
>>> trying to do a flush on this semaphore just before checking the buffer
>>> counts.  Technically, this is a condition mutex and we should be able to
>>> get away with that.  You don't care about it until there is no data, and
>>> then you want to block on it until just the next interrupt.
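The mechanism Jennifer and Joel describe can be reproduced with a small single-threaded model. This is a simplified sketch, not RTEMS code. `model_sem`, `model_fill_buffer_queue`, `model_run` and the `MODEL_*` constants are all hypothetical. A plain counter stands in for the raw input buffer, and the semaphore is modeled only as a count with a ceiling (1 for a simple binary semaphore). The sketch also assumes that in the VMIN = 0, VTIME = 0 case the obtain does not block and simply fails once the count reaches zero, which is consistent with the repeated obtains Jennifer observed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical ceilings: a counting semaphore vs. a simple binary one. */
#define MODEL_COUNTING      ULONG_MAX
#define MODEL_SIMPLE_BINARY 1UL

/* Hypothetical semaphore model: just a count and a ceiling. */
typedef struct {
    unsigned long count;
    unsigned long max;
} model_sem;

/* Interrupt side: release once per interrupt, saturating at the ceiling. */
void model_release(model_sem *s)
{
    if (s->count < s->max)
        s->count++;
}

/* Non-blocking obtain (assumed VMIN = 0, VTIME = 0 behaviour). */
bool model_try_obtain(model_sem *s)
{
    if (s->count == 0)
        return false;
    s->count--;
    return true;
}

/* Simplified shape of one read: drain the raw buffer; only if nothing
 * was drained, keep obtaining the semaphore until it is exhausted.
 * Returns how many obtains succeeded without finding any data. */
unsigned long model_fill_buffer_queue(model_sem *s, unsigned long *raw_chars)
{
    unsigned long spins = 0;
    bool wait = true;

    while (wait) {
        while (*raw_chars > 0) {
            (*raw_chars)--;
            wait = false;       /* ccount >= VMIN always holds when VMIN == 0 */
        }
        if (wait) {
            if (!model_try_obtain(s))
                break;          /* count exhausted: the read returns no data */
            spins++;            /* "woke up" but the buffer is still empty */
        }
    }
    return spins;
}

/* Run `polls` busy cycles with `irqs_per_poll` interrupts each, then one
 * read after the device goes quiet.  Returns the spins in that last read. */
unsigned long model_run(unsigned long max, unsigned long polls,
                        unsigned long irqs_per_poll)
{
    assert(max >= 1);
    model_sem s = { 0, max };
    unsigned long raw_chars = 0;

    for (unsigned long p = 0; p < polls; p++) {
        for (unsigned long i = 0; i < irqs_per_poll; i++) {
            raw_chars++;                 /* one character per interrupt */
            model_release(&s);
        }
        (void)model_fill_buffer_queue(&s, &raw_chars); /* data present: no obtain */
    }
    /* Device turned off: no new characters, so the stale count is drained. */
    return model_fill_buffer_queue(&s, &raw_chars);
}
```

For example, `model_run(MODEL_COUNTING, 1000, 20)` models 10 s of traffic at 20 interrupts per 10 ms poll. It leaves 20000 wasted obtains for the first quiet read. `model_run(MODEL_SIMPLE_BINARY, 1000, 20)` leaves exactly one. In the model, capping the count at 1 is what turns the semaphore into the "block until just the next interrupt" signal Joel describes.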
>> Right -- which is why I would have thought that a binary semaphore would
>> work.  Can you describe what breaks when a binary semaphore is used?
>> Flushing the semaphore count doesn't sound like a good fix to me.  I worry
>> a lot about race conditions with that sort of approach.
>>> Eric, do you see what is going on?  I know it has been years, but you
>>> originally wrote this code.
>> Well, I wrote some of it.  All the flow-control stuff is new.
>> --
>> Eric Norum <norume at aps.anl.gov>
>> Advanced Photon Source
>> Argonne National Laboratory
>> (630) 252-4793
