MBUF Cluster Network freeze problem
Gene.Smith at sea.siemens.com
Tue May 1 20:23:31 UTC 2001
I downloaded the latest snapshot, and in the libnetworking ChangeLog I could
find no mention of fixes regarding mbufs/clusters. The libnetworking
ChangeLog for 4.5.0 (which I am using) seems to contain nothing newer than
When I flood ping the unit with nothing else going on, it seems to recover with
no mbuf/cluster problems reported. The problem seems to occur with a lot of
clients connected and sending/receiving data. Could it be that the unit's
consumption of data is being preempted by the ping -f, so that it is actually
experiencing the problem you describe below as "kills the entire network"?
Do you see the "still waiting for mbuf clusters" message when this problem
I guess I need more information about what has changed and when it was
From: bob [mailto:bobwis at asczone.com]
Sent: Tuesday, May 01, 2001 12:21 PM
To: Smith, Gene
Cc: rtems-users at oarcorp.com
Subject: RE: MBUF Cluster Network freeze problem
It's been a while, but I am pretty sure that with snapshot 3 at
system does recover if you fill all the MBUFs to the point of the error
message, close the connections (sender end), wait a while and then start
again. There *might* be a problem with connections left open
something where the net data is unconsumed by the RTEMS application; I think
that the MBUFs stay allocated forever and it can soon become a
happens when the application is waiting for something else and
network data. This is not a TCP stack problem as such, just an annoying
side-effect that kills the entire network.
It would be nice if there were a daemon to free MBUFs that had gone
stale, or something to prevent one "stream" from hogging the whole MBUF
pool if its associated application had stopped consuming buffers for a while.
I am very pleased you are looking into it with further testing, as it would
be nice to know what the rules of the game really are!
Hope this helps.
bobwis at asczone.com
From: Smith, Gene [mailto:Gene.Smith at sea.siemens.com]
Sent: 01 May 2001 16:20
To: joel.sherrill at oarcorp.com
Cc: rtems-users at oarcorp.com
Subject: FW: MBUF Cluster Network freeze problem
I have been doing the final stress test to finish up my project. I am seeing
the problem described by "bob" back in Dec (see below). What I have is
about 64 TCP connections to the RTEMS unit with data flowing back and forth on all
seems to work fine. The mbufs and clusters just seem moderately used.
However, when I flood ping the unit, I quickly start seeing the message from
waiting for mbuf clusters" which repeats about every 30 seconds. Even after
I stop the flood ping and disconnect all clients, I still see the messages
and I never recover. I have to reboot the unit.
I also see this when I stop my 80186 processor. The 186 receives the data
from the RTEMS processor (a 386) via dual-port RAM. The sockets are all
but no read()s are occurring when the 186 stops, so data backs up and
the mbuf clusters, which causes the "Still waiting..." messages to occur.
In this situation, too, I have to reboot. I can see this with just one connection and
is not needed to trigger it.
"bob" seemed to indicate that this had possibly been corrected in
but it is somewhat unclear from the postings. Do you or Eric know the
status of this problem? It seems like the system should recover from mbuf
cluster depletion. I am
From: bob [mailto:bobwis at ascweb.co.uk]
Sent: Friday, December 15, 2000 8:33 AM
To: Eric_Norum at young.usask.ca
Cc: 'rtems mail-list'
Subject: MBUF Cluster Network freeze problem
Hello Eric / RTEMS users
I have been testing again this morning (snapshot 20001201) and it is all
looking very positive. I can now confirm that we don't need the
a close, to empty the data to save MBUFs. I am fairly certain this was not
always the case, but I could be wrong. The MBUF pool also seems to cope with
the cable being pulled; that is, it recovers the used MBUFs all by itself
after the timeout has occurred.
The only problem we are seeing now is not a BSD stack problem as such; it's
when the task servicing the open socket stops calling read() (because it has
frozen). The open socket still allows incoming data into free
the clusters and locks up the lot after a while. The only recovery seems to
be a system reset. While the MBUF clusters are filling, the master
application task still allows accept() to spawn new tasks and
so the "big lockup" comes quite a while after this. This had us going for a
To conclude, the TCP stack looks very solid again, now that we
the problems to our application.
Thanks again for all your help.