Really need some help with RTEMS networking semaphores
Eric Norum
norume at aps.anl.gov
Wed Oct 18 18:23:02 UTC 2006
On Oct 18, 2006, at 12:45 PM, gregory.menke at gsfc.nasa.gov wrote:
>
> Eric Norum writes:
>> On Oct 18, 2006, at 11:00 AM, gregory.menke at gsfc.nasa.gov wrote:
>>
>>>
>>> I've been rewriting the Coldfire FEC network driver for a week now
>>> to try to make it stable under a significant network load, and I'm
>>> running into considerable trouble with deadlocks and network
>>> semaphore issues.  The next 2 days are important; if I can't get
>>> the driver stable I will have to abandon the network stack and try
>>> to kludge something up with message queues.
>>
>> This is the driver from which BSP?
>> The uC5282 driver has been pretty solid here.
>
> We took a copy of the uC5282 network.c from the 4.7 CVS for our BSP.
>
>>>
>>> I have the network task priority == 1, all other tasks lower.  256k
>>> in both the mbuf_bytecount and mbuf_cluster_bytecount.
>>>
>>> The problems mostly manifest in TCP receives by the RTEMS ftpd, but
>>> rapid UDP sends also seem to lock up the stack.
>>>
>>> The tx task always clears the tx queue, loading packets onto the
>>> card until it's full and dumping the rest.  The rx task receives
>>> packets; once an mbuf allocation (done with M_DONTWAIT) fails, all
>>> remaining rx packets on the card are dumped.  Thus the driver
>>> (theoretically) never queues tx buffers and will not stall the card
>>> waiting for rx mbufs.
>>
>> Having the driver throw away transmit buffers doesn't sound like a
>> good idea to me.
>
> I'm trying all options to try and keep the stack on its feet.
>
>
>>>
>>> Is it true that the rx and tx tasks can allocate and free mbufs as
>>> needed when they have the network semaphore, OR must additional
>>> semaphore release/obtain invocations be used for each and every
>>> mbuf manipulation?
>>
>> The rule is that if a task makes calls to any of the BSD network
>> code it must ensure that it holds the semaphore.  The network
>> receive and transmit tasks are started with the semaphore held and
>> call rtems_bsdnet_event_receive to wait for an event.  This call
>> releases the semaphore, waits for an event and then reobtains the
>> semaphore before returning.  In this way the driver never has to
>> explicitly deal with the network semaphore.  By way of example, have
>> a look at c/src/lib/libbsp/m68k/uC5282/network/network.c -- there is
>> no code that manipulates the network semaphore.
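To make that pattern concrete, a receive task built this way looks
roughly like the sketch below.  INTERRUPT_EVENT and fec_retire_rx are
just placeholder names, not symbols from the real driver:

#include <rtems.h>
#include <rtems/rtems_bsdnet.h>

#define INTERRUPT_EVENT RTEMS_EVENT_1   /* driver-private event bit */

static void fec_retire_rx (void *arg);  /* placeholder: process filled
                                           descriptors, hand frames to
                                           ether_input(), refill ring */

/* Sketch only: the task is created by the stack with the network
 * semaphore already held, so the only place the semaphore is touched
 * is inside rtems_bsdnet_event_receive(), which releases it while
 * blocked and reobtains it before returning.
 */
static void
fec_rxTask (void *arg)
{
    rtems_event_set events;

    for (;;) {
        /* Sleep until the ISR sends INTERRUPT_EVENT; the network
         * semaphore is released for the duration of the wait. */
        rtems_bsdnet_event_receive (INTERRUPT_EVENT,
                                    RTEMS_WAIT | RTEMS_EVENT_ANY,
                                    RTEMS_NO_TIMEOUT,
                                    &events);

        /* Semaphore is held again here, so it is safe to call any
         * BSD code (ether_input, m_freem, MGETHDR, ...). */
        fec_retire_rx (arg);
    }
}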
>
> The driver tasks only use rtems_bsdnet_event_receive.  But for some
> reason I'm still getting the "failed to release" message.  Is there a
> way that can be triggered by m_freem()'ing an mbuf that the driver is
> finished with?
>
> Also, how should the rx task request buffers; is it OK to use
> M_DONTWAIT so the rx task can dump the rx queue on an allocation
> failure?
Yes.
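For what it's worth, the allocation part of that approach looks
something like this sketch (fec_newRxMbuf is a made-up helper name;
real drivers tend to inline the equivalent code):

#include <sys/param.h>
#include <sys/mbuf.h>
#include <sys/socket.h>
#include <net/if.h>

/* Allocate a cluster-backed receive mbuf without blocking.  A NULL
 * return tells the caller to stop refilling and dump the rest of the
 * received frames rather than stall the controller.
 */
static struct mbuf *
fec_newRxMbuf (struct ifnet *ifp)
{
    struct mbuf *m;

    MGETHDR (m, M_DONTWAIT, MT_DATA);
    if (m == NULL)
        return NULL;
    MCLGET (m, M_DONTWAIT);
    if ((m->m_flags & M_EXT) == 0) {
        m_freem (m);            /* got a header but no cluster */
        return NULL;
    }
    m->m_pkthdr.rcvif = ifp;
    return m;
}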
>
>
>>>
>>> Under what conditions does the stack deadlock, and what can drivers
>>> do to help prevent it from doing so?
>>
>> Running out of mbufs is never a good thing. In the UDP send case you
>> might reduce the maximum length of the socket queue.
>
> Does that mean a too-long UDP send queue can starve for mbufs and
> deadlock the stack?
I suspect that this could happen, yes.
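If you want to try it, shrinking the socket's send buffer is one way
to bound that queue -- something along these lines (8 kbytes is just
an illustrative figure, not a recommendation):

#include <sys/types.h>
#include <sys/socket.h>
#include <stdio.h>

/* Cap the amount of data that can sit on a UDP socket's send queue
 * so a fast sender hits the limit before it ties up the mbuf pools.
 */
static void
limitSendQueue (int sock)
{
    int sndbuf = 8 * 1024;

    if (setsockopt (sock, SOL_SOCKET, SO_SNDBUF,
                    &sndbuf, sizeof sndbuf) < 0)
        perror ("setsockopt(SO_SNDBUF)");
}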
>
>
>
>>> What is the functional relationship between the mbuf_bytecount and
>>> mbuf_cluster_bytecount?
>>
>> 'regular' (small) mbufs are allocated from the pool sized by
>> mbuf_bytecount. mbuf clusters (2k each) are allocated from the pool
>> sized by mbuf_cluster_bytecount.
>>
>>>
>>> What should their relative sizings be?
>>
>> Depends on your application. Which type are you running out of?
>> For my EPICS applications here I've got:
>> 180*1024, /* MBUF space */
>> 350*1024, /* MBUF cluster space */
>
>
> How do I tell which I'm running out of?
rtems_bsdnet_show_mbuf_stats ();
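Call it from a debug task or a shell command once things have wedged,
e.g. something like this (netDiag is just a name I made up):

#include <rtems/rtems_bsdnet.h>

void
netDiag (void)
{
    rtems_bsdnet_show_mbuf_stats ();    /* which pool is exhausted?  */
    rtems_bsdnet_show_if_stats ();      /* interface error counters  */
}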
>
> I've tried everything from 64k & 128k up to 256k & 256k, with some
> sort of problem in all cases.  Could you give examples of how mbuf
> buffer sizing relates to the type of application?
The only example I can give is what seems to be working here.
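In terms of the application's rtems_bsdnet_config structure, those two
numbers are the mbuf_bytecount and mbuf_cluster_bytecount members,
i.e. roughly this (netdriver_config stands in for whatever interface
entry your BSP provides; the other members are omitted):

#include <rtems/rtems_bsdnet.h>

extern struct rtems_bsdnet_ifconfig netdriver_config;  /* BSP/app supplied */

struct rtems_bsdnet_config rtems_bsdnet_config = {
    .ifconfig               = &netdriver_config,
    .mbuf_bytecount         = 180*1024,     /* small-mbuf pool      */
    .mbuf_cluster_bytecount = 350*1024,     /* 2k mbuf cluster pool */
};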
>
You're sure that you don't have half/full duplex problems?
--
Eric Norum <norume at aps.anl.gov>
Advanced Photon Source
Argonne National Laboratory
(630) 252-4793