Really need some help with RTEMS networking semaphores

gregory.menke at gsfc.nasa.gov
Wed Oct 18 16:00:42 UTC 2006


I've been rewriting the ColdFire FEC network driver for a week now,
trying to make it stable under a significant network load, and I'm
running into considerable trouble with deadlocks and network semaphore
issues.  The next two days are critical: if I can't get the driver
stable, I will have to abandon the network stack and try to kludge
something up with message queues.

The rx and tx tasks both sleep in rtems_bsdnet_event_receive().  When
each wakes, now holding the network semaphore, it does its work and
then goes back to sleep on the event: the rx task allocates mbufs and
passes the packets to ether_input(), the tx task copies the mbuf data
off into the transmit buffers, and so on.  This is all fine when the
I/O rate is low, but when it picks up the stack runs into trouble: it
either silently deadlocks, prints that it's waiting for mbufs (and
waits forever, repeating the message every few seconds), or reports
that it can't release the network semaphore and then deadlocks as well.
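
For concreteness, both daemons are shaped roughly like this (a
simplified sketch of my rx task; FEC_RX_EVENT, struct fec_softc, and
fec_process_rx_ring are my driver's names, not stack API):

#include <rtems.h>
#include <rtems/rtems_bsdnet.h>

#define FEC_RX_EVENT RTEMS_EVENT_1          /* my driver's event choice */

struct fec_softc;                           /* my per-device state */
static void fec_process_rx_ring(struct fec_softc *sc);

static void fec_rx_daemon(void *arg)
{
    struct fec_softc *sc = arg;
    rtems_event_set events;

    for (;;) {
        /* Sleeps with the network semaphore released; the stack
           reacquires it on our behalf before this call returns. */
        rtems_bsdnet_event_receive(FEC_RX_EVENT,
                                   RTEMS_WAIT | RTEMS_EVENT_ANY,
                                   RTEMS_NO_TIMEOUT, &events);

        /* Holding the semaphore here: allocate mbufs for the filled
           descriptors and hand the packets to ether_input(). */
        fec_process_rx_ring(sc);
    }
}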

This is RTEMS 4.6.6.


The network task priority is 1 (the highest), and all other tasks run
at lower priority.  Both mbuf_bytecount and mbuf_cluster_bytecount are
set to 256 KB.
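
The relevant slice of my rtems_bsdnet_config is roughly this (an
excerpt, not the whole structure; fec_ifconfig is my interface entry
and the remaining fields are elided):

#include <rtems/rtems_bsdnet.h>

extern struct rtems_bsdnet_ifconfig fec_ifconfig;  /* my interface entry */

struct rtems_bsdnet_config rtems_bsdnet_config = {
    .ifconfig               = &fec_ifconfig,
    .network_task_priority  = 1,            /* highest RTEMS priority */
    .mbuf_bytecount         = 256 * 1024,
    .mbuf_cluster_bytecount = 256 * 1024,
};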

The problems mostly manifest during TCP receives by the RTEMS ftpd,
but rapid UDP sends also seem to lock up the stack.

The tx task always drains the tx queue, loading packets onto the card
until it is full and dropping the rest.  The rx task receives packets;
once an mbuf allocation (done with M_DONTWAIT) fails, all remaining rx
packets on the card are dropped.  Thus the driver (in theory) never
queues tx buffers and never stalls the card waiting for rx mbufs.
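
The rx allocation itself follows the usual driver pattern (simplified;
sc, ifp, rxbuf, len, and fec_drop_remaining_rx stand in for my
driver's state and drop logic, and the usual driver includes such as
<sys/mbuf.h> and <netinet/if_ether.h> are assumed):

    struct mbuf *m;
    struct ether_header *eh;

    MGETHDR(m, M_DONTWAIT, MT_DATA);
    if (m == NULL) {
        fec_drop_remaining_rx(sc);       /* dump the rest of the rx ring */
        return;
    }
    MCLGET(m, M_DONTWAIT);
    if ((m->m_flags & M_EXT) == 0) {     /* no cluster available */
        m_freem(m);
        fec_drop_remaining_rx(sc);
        return;
    }
    m->m_pkthdr.rcvif = ifp;
    memcpy(mtod(m, void *), rxbuf, len); /* copy the frame off the card */
    m->m_len = m->m_pkthdr.len = len - sizeof(struct ether_header);
    eh = mtod(m, struct ether_header *);
    m->m_data += sizeof(struct ether_header);
    ether_input(ifp, eh, m);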



The questions that I'd <really> like to find answers to are:

Are there SPECIFIC guidelines anywhere about how driver rx & tx tasks
may and may not work with mbufs, and how the network semaphore is to
be used?  The Network Supplement is very general and quite old at this
point.

Is it true that the rx and tx tasks can allocate and free mbufs as
needed while they hold the network semaphore, OR must additional
semaphore release/obtain invocations be used for each and every mbuf
manipulation?
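
That is, which of these two disciplines is correct?  (A sketch of the
alternatives only; rtems_bsdnet_semaphore_obtain() and
rtems_bsdnet_semaphore_release() are the stack calls I mean.)

    /* (a) rely on the semaphore the task already holds after
       rtems_bsdnet_event_receive() returns */
    MGETHDR(m, M_DONTWAIT, MT_DATA);

    /* (b) drop it for driver-private work, then re-take it around
       every individual mbuf manipulation */
    rtems_bsdnet_semaphore_release();
    /* ... touch only driver-private state here ... */
    rtems_bsdnet_semaphore_obtain();
    MGETHDR(m, M_DONTWAIT, MT_DATA);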

Under what conditions does the stack deadlock, and what can drivers do
to help prevent it?

What is the functional relationship between the mbuf_bytecount and
mbuf_cluster_bytecount?

What should their relative sizes be?


Thanks,

Greg





