MBUF Cluster Network freeze problem

Wed May 2 13:19:42 UTC 2001

>-----Original Message-----
>From: Rosimildo da Silva [mailto:rdasilva at connecttel.com]
>Sent: Wednesday, May 02, 2001 8:12 AM
>Cc: rtems-users at oarcorp.com
>Subject: Re: MBUF Cluster Network freeze problem
>
>
>From: "bob" <bobwis at asczone.com>
>To: "'Smith, Gene'" <Gene.Smith at sea.siemens.com>
>Cc: <rtems-users at oarcorp.com>
>Sent: Wednesday, May 02, 2001 2:43 AM
>Subject: RE: MBUF Cluster Network freeze problem
>
>
>> Your hypothesis/description certainly sounds credible to me. After much
>> pain, we eventually came to the conclusion that the network stack was
>> stable and the problems were all at the application level. However, pings
>> are dealt with in the network code itself, and it is possible that this
>> consumes 100% CPU, not giving the app time to empty the other stuff from
>> the MBUF pool. I am not sure, but it could be that the ping handling runs
>> as one of the network tasks, and these run at high priority compared with
>> the typical app priority. With RTEMS's hard scheduling algorithm, the
>> lower-priority apps will never get a slice of CPU time.
>
>
>I have been developing a SOAP server for embedded systems,
>and it seems to trigger this problem easily under RTEMS.
>
>When I put the system under heavy load, a SOAP client makes exactly 504
>requests and then "freezes" for about a minute (the client's timeout). This
>particular request times out, and the system goes on for another 504
>requests. I see the message on the RTEMS console about running low on
>"MBUFS" before the freeze.
>
>This problem goes away if I do "socket connection pooling" (reuse the
>socket) on the client side (using Keep-Alive of HTTP 1.1).
>
>I would say that for some reason RTEMS does not "free" the MBUFS on closed
>sockets right away if the system is extremely busy.
>
>Rosimildo.
>

Rosimildo,
The message I see is actually "Still waiting for mbuf clusters" and there is
no disconnection of clients.  Also, when I see the message, my system never
recovers and has to be reset. Did you have to reboot when your problem
occurred?

Also, when I do 'ping -f' to a similar non-rtems unit, it drops about 60% of
the pings and slows down greatly on its primary task, but it does not
lock up like my rtems-based unit does.  The rtems-based unit seems to devote
all its energy to responding to pings and stops doing its primary task
entirely. The lock-up occurs when the clients (64 of them) time out after
20 sec and re-send their messages while the ping -f is still going on.
As long as I stop the ping before the clients re-send, I do not see a
lock-up. (The clients never intentionally disconnect.)
-gene
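
The starvation hypothesis discussed in this thread can be illustrated with a
toy model. The sketch below is not RTEMS code and does not use any RTEMS API;
the pool size, CPU budget, function name and struct are all hypothetical,
chosen only to show the mechanism. It assumes strict priority scheduling: a
high-priority network task answers pings first, and a low-priority application
gets only the CPU that is left over. Each arriving client message holds one
cluster until the application runs and releases it. Ping replies themselves
are not modeled as holding buffers, which is a simplification.

```c
/* Toy model only: every name and number here is a hypothetical assumption. */
#define POOL_CLUSTERS 64  /* assumed size of the cluster pool */
#define CPU_PER_TICK  10  /* assumed work units available per tick */

struct sim_result {
    int app_runs;        /* ticks in which the application got any CPU */
    int free_clusters;   /* clusters still free after the last tick */
    int exhausted_tick;  /* first tick with a failed allocation, or -1 */
};

static struct sim_result simulate(int ticks, int pings_per_tick,
                                  int msgs_per_tick)
{
    struct sim_result r = { 0, POOL_CLUSTERS, -1 };
    int queued = 0; /* client messages holding a cluster, awaiting the app */

    for (int t = 0; t < ticks; t++) {
        /* Each arriving client message takes one cluster from the pool. */
        for (int m = 0; m < msgs_per_tick; m++) {
            if (r.free_clusters > 0) {
                r.free_clusters--;
                queued++;
            } else if (r.exhausted_tick < 0) {
                r.exhausted_tick = t; /* pool is empty: allocation fails */
            }
        }

        /* The high-priority network task runs first; one unit per ping. */
        int cpu = CPU_PER_TICK;
        int ping_work = pings_per_tick < cpu ? pings_per_tick : cpu;
        cpu -= ping_work;

        /* The low-priority application runs only on leftover CPU. */
        if (cpu > 0) {
            r.app_runs++;
        }
        /* Each leftover unit consumes one message and frees its cluster. */
        while (cpu > 0 && queued > 0) {
            queued--;
            r.free_clusters++;
            cpu--;
        }
    }
    return r;
}
```

With these made-up numbers, simulate(100, 2, 4) ends with the pool full,
because the application drains every message in the tick it arrives. In
contrast, simulate(100, 10, 4) never runs the application at all, and the
first failed allocation happens at tick 16. Whether the real stack behaves
this way depends on the actual task priorities and pool sizes in use.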