Determining which socket is tying up network buffers
chrisj at rtems.org
Wed Jul 27 22:09:00 UTC 2005
Steve Strobel wrote:
> We have an application running RTEMS 4.6.0pre5 that supports up to four
> connections via telnet. Telnet clients can connect and disconnect at
> will without causing us any trouble. If, however, the network
> connection to one of them is lost (such that we don't receive the
> command to close the telnet session and its socket), we eventually run
> out of network buffers, which causes further calls to send() on any
> socket to block, disrupting the other connections. I think that the
> socket for the broken connection would eventually time out and free up
> the buffers, but we are running out of network buffers before that.
The wiki has some details on setting up the stack's various parameters.
There may be more parameters that can be tuned to help.
> The problem is aggravated by the fact that we send (push) a lot of data
> to the telnet clients even when not receiving anything from them. It is
> probably not too different in principle from logging on to a PC with
> telnet, then typing "make" and getting compiler messages back for the
> next half hour. Because we keep sending data even after the connection
> is broken, we use up a lot of network buffers that can't get freed
> because they haven't yet been acknowledged.
What about lowering the buffer sizes using the SO_SNDBUF and SO_RCVBUF
socket options?
I do not think I have played with this setting in RTEMS. I would make
the size small, say 2K, and see what happens.
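A minimal sketch of that suggestion, using the standard BSD sockets API (the 2 KB figure is the one suggested above, not a tuned value):

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Request small per-socket buffers so a single stalled telnet
   connection cannot tie up most of the mbuf pool. Returns the
   send-buffer size the stack actually granted, or -1 on error
   (some stacks round the request up, so read the value back). */
static int set_small_buffers(int sd)
{
    int bufsize = 2 * 1024;
    socklen_t len = sizeof(bufsize);

    if (setsockopt(sd, SOL_SOCKET, SO_SNDBUF,
                   &bufsize, sizeof(bufsize)) < 0)
        return -1;
    if (setsockopt(sd, SOL_SOCKET, SO_RCVBUF,
                   &bufsize, sizeof(bufsize)) < 0)
        return -1;

    /* Read back what the stack granted for the send buffer. */
    if (getsockopt(sd, SOL_SOCKET, SO_SNDBUF, &bufsize, &len) < 0)
        return -1;
    return bufsize;
}
```

These options must be set before the connection starts moving data to have full effect on the window sizes.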
You may wish to turn linger off so you do not wait for the linger
timeout when the socket is closing.
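A sketch of the linger setting via SO_LINGER. Note there are two readings of "do not wait": disabling linger makes close() return immediately while the stack finishes in the background, whereas enabling linger with a zero timeout performs an abortive close that discards unsent data at once, which also releases the mbufs queued on the socket; which one suits a dead connection is an application choice, not something the original post specifies:

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* hard_close = 0: linger disabled; close() returns immediately and
                   the stack keeps trying to send in the background.
   hard_close = 1: linger enabled with zero timeout; close() discards
                   any unsent data at once (abortive close). */
static int set_close_policy(int sd, int hard_close)
{
    struct linger lg;
    lg.l_onoff  = hard_close;
    lg.l_linger = 0;
    return setsockopt(sd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```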
You need to make sure your current stack memory sizes are suitable for
your application. I have found differing network loads have different
needs. Some things to consider are the socket receive buffer sizes, as
these set the TCP acknowledgement window size, the send queue size times
the number of sending sockets, and finally the queue length for the
network interface.
The stack does like memory, so if you can afford it I would give it lots.
What is your configuration?
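For illustration, the mbuf pools are sized in the application's network configuration table. This fragment assumes the rtems_bsdnet_config layout of the 4.6-era stack and a driver configuration (netdriver_config) defined elsewhere; the sizes shown are illustrative, not tuned values:

```c
#include <rtems/rtems_bsdnet.h>

struct rtems_bsdnet_config rtems_bsdnet_config = {
    &netdriver_config,   /* interface configuration, defined elsewhere */
    NULL,                /* no BOOTP */
    0,                   /* default network task priority */
    256 * 1024,          /* mbuf_bytecount: mbuf pool size */
    512 * 1024,          /* mbuf_cluster_bytecount: cluster pool size */
    /* ... remaining fields as required by your application ... */
};
```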
> We would like to find a way to close the broken socket (and its telnet
> session) when we discover that it is hogging too many network buffers.
> One complication is that the first call to "send" that blocks for lack
> of an available network buffer may not be sending to the socket that is
> causing the problem; it is just the first one to try sending after all
> of the buffers have been used.
If you are going to close, from another thread, a socket which has a
blocked writer, you will need my patch or your application will crash.
The patch is still being worked on, but I am happy to provide it if
people would like to try it.
With this patch you can monitor with a single thread and clean up. I
have an application that does this.
> The basic idea of this workaround is that if one of the telnet send
> threads can tell that it is causing the problem (hogging the buffers),
> it can close itself.
How do you tell whether it is hogging the buffers and is the cause of
the problem, or is just connected to a slow host?
> If it can tell that there is a problem (out of
> buffers) but it is not the owner of the socket that is causing the
> problem, it will stick something harmless in the rtems queues of each of
> the other telnet threads (to make sure that they will unblock soon),
> then call send() (which will block waiting for network buffers). When
> each of the other telnet threads become unblocked, they will figure out
> that there is a shortage of buffers, check to see if they are the source
> of the problem, and if so, close their socket which should release the
> network buffers and close the offending telnet session to prevent the
> problem from recurring.
What if the other threads are blocked in send()?
> I have a couple of questions. Is there a better (less complicated) way
> to address this problem?
Giving the stack more memory, plus trying to lower the socket buffer
sizes, would be my first step in finding a solution.
> Is it possible to implement the above
> workaround using standard networking calls? I think "select" might
> provide enough information to evaluate whether "we are out of buffers",
> but I don't have any idea how to test a socket to see if it is the one
> that is hogging the network buffers. It seems like the TCP/IP stack
> must have that information somewhere, but I don't know how to query it
> for that information. If there isn't an existing function that would do
> the job, could someone give me a hint where to start looking if I want
> to add one? Thanks for any suggestions.
You can use the sysctl API to get stats on the state of the stack. This
is what Net-SNMP uses. You can also check the mbuf and related counters
directly to see when you are in trouble.
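A sketch of checking the mbuf counters directly from a monitor task on the target. This is RTEMS-target code and the mbstat field names are assumed from the BSD-derived stack; they may differ between versions, so treat this as a starting point rather than a drop-in:

```c
#include <stdio.h>
#include <sys/mbuf.h>            /* struct mbstat on the BSD-derived stack */
#include <rtems/rtems_bsdnet.h>

extern struct mbstat mbstat;     /* global counters kept by the stack */

/* Periodically report mbuf pressure; rising drop/wait/drain counts
   indicate the pool is being exhausted. */
static void report_mbuf_pressure(void)
{
    printf("mbufs=%lu clusters=%lu drops=%lu waits=%lu drains=%lu\n",
           (unsigned long)mbstat.m_mbufs,
           (unsigned long)mbstat.m_clusters,
           (unsigned long)mbstat.m_drops,
           (unsigned long)mbstat.m_wait,
           (unsigned long)mbstat.m_drain);

    /* RTEMS also provides a canned report on the console. */
    rtems_bsdnet_show_mbuf_stats();
}
```

A monitor task polling these counters is also where the socket clean-up described above would naturally live.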