The "Out of mbuf clusters" problem, resolved

Phil Torre ptorre at zetron.com
Wed Sep 15 18:04:45 UTC 2004


In reference to my previous message, here's what I ended up doing to
"fix" it.

The deadlocked state I was observing arose when the RTEMS system
was doing sustained file transmission via FTP while receiving a
mix of TCP ACKs and broadcast traffic (from chatty MS Windows boxes
on our LAN).  With the default mbuf/cluster pool sizes, we quickly
run out of clusters.  (Our Ethernet driver allocates clusters only
for receive data, which makes matters even worse.)

As soon as all clusters are exhausted, the receive task goes into
its "waiting for clusters" loop.  As incoming ACKs are processed,
outbound packets are freed from the sockbuf by TCP, which frees up
some clusters.  But, there is a race condition between the receive
thread and the application writing to the socket; they both want
clusters, and the application is winning too much of the time.  So,
the incoming ACKs get lost, the outbound packets stay in the sockbuf
pending retransmission, and there we sit.
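(For context: the "waiting for clusters" state is just the stack's
normal blocking allocation path.  A receive-buffer refill in an
RTEMS/BSD Ethernet driver typically looks something like the sketch
below; MGETHDR and MCLGET are the standard BSD mbuf macros, and with
M_WAIT the sleep happens down inside m_clalloc().)

    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <net/if.h>

    /* Allocate one receive buffer.  The M_WAIT sleeps inside the
     * mbuf allocator are the "waiting for clusters" state above. */
    static struct mbuf *
    rx_refill(struct ifnet *ifp)
    {
        struct mbuf *m;

        MGETHDR(m, M_WAIT, MT_DATA);   /* mbuf header (can sleep)      */
        MCLGET(m, M_WAIT);             /* 2 KB cluster (can sleep)     */
        m->m_pkthdr.rcvif = ifp;       /* tag with receiving interface */
        return m;                      /* caller posts it to the ring  */
    }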

I expected that TCP would eventually time out and drop the connection,
which should bring us back to life.  It does time out, but manages not
to free the outbound packets from the sockbuf.  (This makes no sense to
me, as it seems to guarantee that we will leak memory whenever a remote
client hangs.  In any case, it sat there wedged for 16 hours without
recovering.  That's close enough to forever for me.)

So, I applied two fixes:

1) Deadlock recovery.  I shortened tcp_keepidle to 30 seconds and
   tcp_keepintvl to 10 seconds, and set always_keepalive.  This
   makes the connection time out in a few minutes rather than many
   hours.  Then I modified tcp_drop() so that if the connection is
   being dropped due to timeout, both the receive and send sockbufs
   and any mbufs/clusters they hold are explicitly freed.  (See the
   first sketch after this list.)

2) Deadlock avoidance.  To resolve the "receive thread is losing the
   fight for clusters" problem, I modified m_clalloc() to respect a
   global flag set by the receive thread while it is waiting for a
   cluster.  No one but the receive thread can get a cluster as long
   as that flag is set.  (See the second sketch after this list.)
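
For the curious, fix #1 boils down to something like the following.
This is a sketch rather than my exact patch: the keepalive globals
live in the TCP timer code and are in slow-timeout ticks (PR_SLOWHZ,
2 per second), and always_keepalive is static in tcp_timer.c, so
exporting it is part of the change.

    #include <sys/param.h>
    #include <sys/protosw.h>        /* PR_SLOWHZ (2 ticks/second)  */

    extern int tcp_keepidle;        /* idle time before first probe */
    extern int tcp_keepintvl;       /* interval between probes      */
    extern int always_keepalive;    /* probe even without SO_KEEPALIVE */

    /* Hypothetical init helper; call once at startup. */
    void shorten_tcp_timeouts(void)
    {
        tcp_keepidle     = 30 * PR_SLOWHZ;  /* first probe after 30 s idle */
        tcp_keepintvl    = 10 * PR_SLOWHZ;  /* then one probe every 10 s   */
        always_keepalive = 1;
    }

And the tcp_drop() change, in spirit (tp and errno are tcp_drop()'s
own arguments, and sbflush() is the stack's existing sockbuf flush
routine):

    if (errno == ETIMEDOUT) {
        struct socket *so = tp->t_inpcb->inp_socket;
        sbflush(&so->so_rcv);   /* drop queued inbound data             */
        sbflush(&so->so_snd);   /* free output held for retransmission  */
    }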
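
Fix #2 is similarly small.  Again a sketch with made-up names (the
flag and the task id variable are hypothetical; the real patch just
needs some way for m_clalloc() to recognize the receive task):

    #include <rtems.h>

    /* Set rxDaemonTid once at driver init. */
    volatile int mclRxStarved;   /* receive task is blocked on a cluster */
    rtems_id     rxDaemonTid;

    /* Gate at the top of m_clalloc(): while the receive task is
     * starved, fail everyone else's cluster requests so inbound ACKs
     * can be processed and drain the send sockbufs. */
    rtems_id self;
    rtems_task_ident(RTEMS_SELF, RTEMS_SEARCH_ALL_NODES, &self);
    if (mclRxStarved && self != rxDaemonTid)
        return 0;                /* caller sleeps and retries */

The receive task just brackets its blocking wait with the flag:

    mclRxStarved = 1;
    MCLGET(m, M_WAIT);           /* now guaranteed to win the next cluster */
    mclRxStarved = 0;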

With those two changes, my application is now rock-solid even under
sustained heavy load with the default pool sizes.  I can offer patches
if anyone is interested; I don't know whether these changes would be
desirable to merge into RTEMS or not.

-Phil


-- 

=====================================================================
Phil Torre                               phone: 425-820-6363 x234
Design Engineer                          email: ptorre at zetron.com
Switching Systems Group                    fax: 425-820-7031
Zetron, Inc.                               web: http://www.zetron.com


