MBUF Cluster Network freeze problem

Tue Nov 28 17:25:42 UTC 2000

Hello Eric, many thanks for your reply.
You may remember we all went through the 68360 buffer descriptor problems
together at the time and I don't think it's that problem again :). I quickly
checked the latest snapshot for the code around the MBUF allocate /
deallocate routines (in rtems_glue.c) and it doesn't look like it has changed.
This problem seems to manifest as MBUF clusters never being deallocated.
I have seen the BSP slowly using up clusters after a bout of heavy network
activity has stopped (such as after dealing with 10000 flood pings). It's as
if a certain condition occurs to break it, and after that it can't recover.
If you think that it's worth porting a later version I will certainly have a
go. Which is the recommended release / snapshot to work with these days?
Regards to all.
Bob Wisdom
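
For anyone trying to reproduce this, one way to pin down "it can't recover" is to compare the free-cluster count against the pool size once traffic has stopped. What follows is a minimal host-side sketch in C, not RTEMS code: `cluster_pool`, `pool_alloc`, `pool_free`, `pool_free_count` and `pool_recovered` are hypothetical names invented for illustration, and the pool size of 8 is arbitrary. It only models the bookkeeping; it does not explain where the real stack loses its clusters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cluster pool; NOT the RTEMS/BSD data structure. */
#define POOL_CLUSTERS 8

typedef struct {
    int refcnt[POOL_CLUSTERS]; /* 0 means the cluster is free */
} cluster_pool;

void pool_init(cluster_pool *p)
{
    for (size_t i = 0; i < POOL_CLUSTERS; i++) {
        p->refcnt[i] = 0;
    }
}

/* Returns the index of the allocated cluster, or -1 if the pool is
 * exhausted (the situation in which a real stack would have to wait). */
int pool_alloc(cluster_pool *p)
{
    for (int i = 0; i < POOL_CLUSTERS; i++) {
        if (p->refcnt[i] == 0) {
            p->refcnt[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Drops one reference; the cluster is free again when the count hits 0. */
void pool_free(cluster_pool *p, int idx)
{
    assert(idx >= 0 && idx < POOL_CLUSTERS);
    assert(p->refcnt[idx] > 0);
    p->refcnt[idx]--;
}

size_t pool_free_count(const cluster_pool *p)
{
    size_t n = 0;
    for (size_t i = 0; i < POOL_CLUSTERS; i++) {
        if (p->refcnt[i] == 0) {
            n++;
        }
    }
    return n;
}

/* Once traffic has stopped and all queues have drained, every cluster
 * should be free again. Anything less suggests a reference was lost. */
bool pool_recovered(const cluster_pool *p)
{
    return pool_free_count(p) == POOL_CLUSTERS;
}
```

On a target the same idea would be a periodic check: if the network has been idle for some time and the free count is still below the total, something is still holding cluster references.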
-----Original Message-----
From: Eric_Norum at young.usask.ca [mailto:Eric_Norum at young.usask.ca] On Behalf Of Eric Norum
Sent: 28 November 2000 15:19
To: bobwis at ascweb.co.uk
Cc: 'rtems mail-list'
Subject: Re: MBUF Cluster Network freeze problem

bob wrote:
>
> I have come up against an interesting problem when the local BSP is
> moderately task loaded and the local network is also very active talking to
> the BSP. We are using the Mot 68360 and RTEMS 4.5 Beta.
> The symptoms are that the MBUF Clusters fill up slowly over time - as if the
> CPU can't keep up with the processing of the data. Eventually, the message
> "Still waiting for mbuf cluster." pops out from m_clalloc in rtems_glue.c.
> If the other network traffic to the BSP is stopped, to reduce the external
> network load to virtually nothing, the BSP still never recovers and seems to
> have no free clusters forever (it still has plenty of free MBUFs).
>
> To generate this condition we run our normal applications - a PC which polls
> a task in the BSP and a flood ping aimed at the BSP. It takes up to a couple
> of minutes to fall over. However, we think a similar situation may have
> occurred in real life without needing the flood pings.
>
> I believe the network should recover and the MBUF Clusters should free up
> after the load is reduced.
>
> Anyone got any theories on what's happening, or ideas on where to start
> looking / work-arounds? I believe increasing MBUFs/Clusters will do no good
> as I think it's the CPU overload that causes them to fill up.
>
> What we need is a graceful recovery.
> Any help/ideas much appreciated, thanks.
>

There have been *lots* of network changes since 4.5 beta.  One of the
biggest was cleaning up a problem where things would lock up if a packet
were fragmented into more parts than the number of 68360 transmit buffer
descriptors.  I recall others having problems with flood pings and that
we made some changes to prevent these problems as well, but I don't
remember the exact details.
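
To illustrate the kind of lock-up I mean, here is a rough sketch in C. It is not the code that went into the driver: `struct frag`, `frag_count` and `tx_wait_can_complete` are hypothetical names, the fragment structure is a stripped-down stand-in for an mbuf chain, and skipping zero-length fragments is an assumption. The point is only that a transmit routine which waits for one free buffer descriptor per fragment can never finish waiting if the packet has more fragments than the ring has descriptors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an mbuf chain: link and length only. */
struct frag {
    struct frag *next;
    size_t len;
};

/* Counts the fragments that would each need a transmit buffer descriptor.
 * Assumption: empty fragments are skipped and need no descriptor. */
size_t frag_count(const struct frag *m)
{
    size_t n = 0;
    for (; m != NULL; m = m->next) {
        if (m->len > 0) {
            n++;
        }
    }
    return n;
}

/* A driver that blocks until enough descriptors are free for the whole
 * packet can only ever be satisfied if the packet fits in the ring.
 * ring_size is the total number of transmit buffer descriptors. */
bool tx_wait_can_complete(const struct frag *m, size_t ring_size)
{
    assert(ring_size > 0);
    return frag_count(m) <= ring_size;
}
```

When the check fails, the driver has to do something other than wait, for example hand the fragments to the hardware in batches or copy them into fewer buffers first.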

--
Eric Norum                                 eric.norum at usask.ca
Department of Electrical Engineering       Phone: (306) 966-5394
University of Saskatchewan                 FAX:   (306) 966-5407
Saskatoon, Canada.