Determining which socket is tying up network buffers

Steve Strobel steve at link-comm.com
Wed Jul 27 19:54:14 UTC 2005


We have an application running RTEMS 4.6.0pre5 that supports up to four 
connections via telnet.  Telnet clients can connect and disconnect at will 
without causing us any trouble.  If, however, the network connection to one 
of them is lost (such that we don't receive the command to close the telnet 
session and its socket), we eventually run out of network buffers, which 
causes further calls to send() on any socket to block, disrupting the other 
connections.  I expect that the socket for the broken connection would 
eventually time out and free its buffers, but we run out of network 
buffers before that happens.

The problem is aggravated by the fact that we send (push) a lot of data to 
the telnet clients even when not receiving anything from them.  It is 
probably not too different in principle from logging on to a PC with 
telnet, then typing "make" and getting compiler messages back for the next 
half hour.  Because we keep sending data even after the connection is 
broken, we use up a lot of network buffers that can't get freed because 
they haven't yet been acknowledged.

We would like to find a way to close the broken socket (and its telnet 
session) when we discover that it is hogging too many network buffers.  One 
complication is that the first call to "send" that blocks for lack of an 
available network buffer may not be sending to the socket that is causing 
the problem;  it is just the first one to try sending after all of the 
buffers have been used.

A simplified version of our current telnet send thread looks like this:

         do
         {
                 // get pointer to and size of data to send - this will
                 // block until there is something to send
                 rtems_message_queue_receive();

                 call "send" on the network socket();
         } while ( socket is open );
         cleanup();

The conceptual design for one workaround looks like this:

         do
         {
                 // get pointer to and size of data to send - this will
                 // block until there is something to send
                 rtems_message_queue_receive();

                 if ( we are out of buffers )
                 {
                         if ( I am the socket that is hogging the buffers )
                         {
                                 close my socket();
                         }
                         else
                         {
                                 // send something harmless to all telnet threads
                                 rtems_message_queue_send();

                                 // this will block for a while
                                 call "send" on the network socket();
                         }
                 }
                 else
                 {
                         // this shouldn't normally block
                         call "send" on the network socket();
                 }
         } while ( socket is open );
         cleanup();

The basic idea of this workaround is that if one of the telnet send threads 
can tell that it is causing the problem (hogging the buffers), it can close 
itself.  If it can tell that there is a problem (out of buffers) but it is 
not the owner of the socket that is causing the problem, it will stick 
something harmless in the rtems queues of each of the other telnet threads 
(to make sure that they will unblock soon), then call send() (which will 
block waiting for network buffers).  As each of the other telnet threads 
unblocks, it will notice the shortage of buffers, check whether it is the 
source of the problem, and if so close its socket, which should release 
the network buffers and close the offending telnet session, preventing 
the problem from recurring.

In essence this design uses a shortage of network buffers as a sort of 
timeout.  While that might not be appropriate for a desktop PC, it makes 
sense in our application, because we don't want to stop transferring data 
to the working telnet sessions while we wait for the broken socket to time out.

I have a couple of questions.  Is there a better (less complicated) way to 
address this problem?  Is it possible to implement the above workaround 
using standard networking calls?  I think "select" might provide enough 
information to evaluate whether "we are out of buffers", but I don't have 
any idea how to test a socket to see if it is the one that is hogging the 
network buffers.  It seems like the TCP/IP stack must have that information 
somewhere, but I don't know how to query it.  If there 
isn't an existing function that would do the job, could someone give me a 
hint where to start looking if I want to add one?  Thanks for any suggestions.

Steve

P.S.  I realize that this design might not be thread safe, in that we might 
run out of buffer space between the first "if" and the call to send.  I 
don't think that will be a problem in our case, since all of the telnet 
threads run at the same priority and we have timeslicing turned off.



---
Steve Strobel
Link Communications, Inc.
1035 Cerise Rd
Billings, MT 59101-7378
(406) 245-5002 ext 102
(406) 245-4889 (fax)
WWW: http://www.link-comm.com
MailTo:steve at link-comm.com
