Network Stress Testing Questions (ss-20021118)

Bob Wisdom bobwis at ascweb.co.uk
Mon Dec 2 17:41:17 UTC 2002


Hello all again,
I have recently been trying to track down an issue on our gen68360 BSP where
network operation becomes sporadic for maybe 30 or more seconds at a time,
and occasionally never recovers, needing the dreaded reset.
Unfortunately, as many of you know, I am not an RTEMS or Network (or 'C')
specialist, but I would be very interested to know if my findings make sense
to those who are, and if anyone has comments or suggestions to improve
matters - apart from getting another guy to do my job!

From my investigation it looks like the problem develops when the input gets
saturated with socket opens. That is, there is one socket in the listen
state on the RTEMS side, and the remote side runs several tasks which
continually open and close new connections to that listening socket (each
sends about a hundred bytes, which the RTEMS application echoes back, before
closing).
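
For anyone wanting to reproduce this kind of load from the remote side, here
is a minimal sketch of one such task, assuming a POSIX sockets host. The
function names (echo_roundtrip, hammer_listener) and the 100-byte payload of
'x' characters are made up for illustration; this is not the actual test
program used here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes on a connected socket and wait for
 * the same bytes to come back. Returns 0 on a matching echo, -1 otherwise. */
int echo_roundtrip(int fd, const char *msg, size_t len)
{
    char reply[256];
    size_t got = 0;

    if (len > sizeof(reply))
        return -1;
    if (send(fd, msg, len, 0) != (ssize_t)len)
        return -1;
    while (got < len) {
        ssize_t n = recv(fd, reply + got, len - got, 0);
        if (n <= 0)
            return -1;              /* error or peer closed early */
        got += (size_t)n;
    }
    return memcmp(msg, reply, len) == 0 ? 0 : -1;
}

/* Hypothetical load task: open a fresh connection, do one echo exchange,
 * close, and repeat count times. Returns the number of good exchanges. */
int hammer_listener(const char *ip, unsigned short port, int count)
{
    char payload[100];              /* "about a hundred bytes" */
    struct sockaddr_in addr;
    int ok = 0;

    memset(payload, 'x', sizeof(payload));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return 0;                   /* not a valid dotted-quad address */

    for (int i = 0; i < count; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            continue;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
            echo_roundtrip(fd, payload, sizeof(payload)) == 0)
            ok++;
        close(fd);                  /* every pass uses a new connection */
    }
    return ok;
}
```

Running several of these loops in parallel against the listening port should
approximate the pattern described above; comparing the return value with
count gives a rough measure of how many connections were refused or dropped.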

It seems that the network listen queue quite correctly reaches its limit of
pending connections and then attempts to automatically deallocate the new
half-open temporary sockets. However, because the input is so busy, some of
that cleanup may be left incomplete, and the heap gradually fills up (I have
about 300K of heap). Eventually the heap is full and the network attempts to
reject new incoming socket opens for that reason, but again it can't complete
the rejection because the input is so busy. Please note we don't run out of
MBUFs or Clusters here.

I don't know if the heap being "full" prevents other network operations on
its own. I have tried mostly filling the heap artificially, but the network
seems to still work alright, albeit at a reduced throughput capacity.
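
A minimal sketch of one way to fill the heap artificially for this kind of
experiment, using only standard malloc/free. The names (heap_ballast,
ballast_fill, ballast_release) and the sizes are assumptions for
illustration, not the code actually used for the test described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping for the blocks held as "ballast". */
struct heap_ballast {
    void **blocks;   /* table of held allocations */
    size_t count;    /* number of blocks held */
    size_t bytes;    /* payload bytes held (excludes allocator overhead) */
};

/* Allocate chunk-sized blocks until about target bytes are held or malloc
 * fails. Returns the number of payload bytes actually held. */
size_t ballast_fill(struct heap_ballast *b, size_t target, size_t chunk)
{
    size_t max_blocks = (chunk != 0) ? target / chunk : 0;

    b->count = 0;
    b->bytes = 0;
    /* Note: the pointer table itself also comes out of the heap. */
    b->blocks = malloc(max_blocks * sizeof(*b->blocks));
    if (b->blocks == NULL)
        return 0;

    while (b->count < max_blocks) {
        void *p = malloc(chunk);
        if (p == NULL)
            break;                  /* heap exhausted before target */
        memset(p, 0xA5, chunk);     /* touch the block so it is really used */
        b->blocks[b->count++] = p;
        b->bytes += chunk;
    }
    return b->bytes;
}

/* Give everything back so that normal operation can be compared. */
void ballast_release(struct heap_ballast *b)
{
    for (size_t i = 0; i < b->count; i++)
        free(b->blocks[i]);
    free(b->blocks);
    b->blocks = NULL;
    b->count = 0;
    b->bytes = 0;
}
```

Holding, say, most of a 300K heap this way and then running the network load
separates "heap nearly full" from "input saturated" as possible causes.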

In /libnetworking/netinet/ip_input.c I tried making ipintr() return to the
networkDaemon() found in /libnetworking/rtems/rtems_glue.c after a maximum
number of consecutive input packets, to give time to the other duties of the
daemon (I guess these are to allow it to send packets, clean up its timed
actions etc). This seems to mostly fix the problem - I haven't yet been able
to get it to behave unpredictably anyway. We still manage to fill the heap up
eventually though, which is quite worrying - I think it might take a little
longer to fill than before.
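
The idea of capping consecutive input packets can be sketched as below. This
is a standalone simulation, not the real ipintr() or networkDaemon() code:
pkt_queue, handle_one, drain_bounded, daemon_pass and the budget of 8 are all
hypothetical stand-ins. A real change would also have to make sure the daemon
gets woken again for the packets left in the queue, which is not shown.

```c
#include <stddef.h>

#define MAX_CONSECUTIVE_INPUT 8     /* hypothetical per-pass packet budget */

/* Hypothetical stand-in for the IP input queue. */
struct pkt_queue {
    int pending;    /* packets waiting to be processed */
    int handled;    /* packets processed so far */
};

/* Stand-in for "dequeue one packet and process it".
 * Returns 1 if a packet was handled, 0 if the queue was empty. */
static int handle_one(struct pkt_queue *q)
{
    if (q->pending == 0)
        return 0;
    q->pending--;
    q->handled++;
    return 1;
}

/* Handle at most budget packets, then return even if more are waiting.
 * Returns the number of packets handled in this call. */
int drain_bounded(struct pkt_queue *q, int budget)
{
    int n = 0;

    while (n < budget && handle_one(q))
        n++;
    return n;
}

/* One pass of a daemon-style loop: bounded input first, then the other
 * duties (represented here by a counter). Returns nonzero if input is
 * still pending and another pass is needed. */
int daemon_pass(struct pkt_queue *q, int *other_work_runs)
{
    (void)drain_bounded(q, MAX_CONSECUTIVE_INPUT);
    (*other_work_runs)++;           /* e.g. output, timers, cleanup */
    return q->pending > 0;
}
```

With 20 packets queued and a budget of 8, the other duties get a turn after
packets 8, 16 and 20 instead of only after the queue is fully drained. The
budget trades input throughput against latency for everything else the
daemon does.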

Any comments would be greatly appreciated, I have been struggling with this
for several weeks now.

Thanks to all, and thanks for your patience.
Bob Wisdom (UK)

p.s. As an aside, does anyone else get malloc / free to fail with a nasty
corruption if the #define MALLOC_STATS  is switched on in the malloc.c file?
I seem to get the failure even using the standard malloctest suite that
comes with RTEMS built as a standalone application. Maybe the #define is not
to be used in this way?



