Network problem - header checksum error -

Mon Apr 17 17:04:28 UTC 2006

It suggests to me a driver bug, where perhaps DMA is screwing up buffers
someplace, or maybe there are problems manipulating mbufs that lead to
extra or too few packet bytes.  The stack is also extremely fragile with
respect to how mbufs are handled: not much sanity checking goes on, and
it's easy to screw up both the processing and the related arithmetic.
Whoever thought up mbufs had a bit too much fun for too long while at
university, I think.
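To make the "related arithmetic" concrete: the IPv4 header checksum is the Internet checksum from RFC 1071, a one's-complement sum of the header's 16-bit words, which BSD-derived stacks compute with routines such as in_cksum_hdr(). The C sketch below is illustrative only and is not the RTEMS code. ip_header_checksum(), ip_header_fill_checksum(), checksum_demo() and sample_header are hypothetical names, and the header is a 20-byte sample without options. It shows why a header byte that changes after the checksum has been stored (as Steve suspects the driver may be doing) makes the receiver report a bad header checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Internet checksum (RFC 1071): one's-complement sum of 16-bit
 * big-endian words, folded to 16 bits and inverted. */
uint16_t ip_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    if (len & 1)                       /* odd length: pad last byte with zero */
        sum += (uint32_t)hdr[len - 1] << 8;

    while (sum >> 16)                  /* end-around carry */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}

/* Hypothetical helper: zero the IPv4 checksum field (bytes 10-11),
 * compute the checksum, and store it. Assumes len >= 20. */
void ip_header_fill_checksum(uint8_t *hdr, size_t len)
{
    uint16_t c;

    hdr[10] = 0;
    hdr[11] = 0;
    c = ip_header_checksum(hdr, len);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
}

/* Sample 20-byte IPv4 header with the checksum field zeroed:
 * 192.168.0.1 -> 192.168.0.199, TTL 64, protocol UDP. */
const uint8_t sample_header[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
    0x40, 0x11, 0x00, 0x00, 0xC0, 0xA8, 0x00, 0x01,
    0xC0, 0xA8, 0x00, 0xC7
};

/* The header verifies only if nothing changes after the checksum
 * has been stored. */
void checksum_demo(void)
{
    uint8_t hdr[20];

    memcpy(hdr, sample_header, sizeof hdr);
    ip_header_fill_checksum(hdr, sizeof hdr);
    assert(hdr[10] == 0xB8 && hdr[11] == 0x61);
    assert(ip_header_checksum(hdr, sizeof hdr) == 0); /* receiver's check passes */

    hdr[8] = 0x3F;  /* e.g. TTL rewritten after the checksum was computed */
    assert(ip_header_checksum(hdr, sizeof hdr) != 0); /* receiver flags bad checksum */
}
```

Summing the full header including a valid checksum yields zero, which is the usual way a receiver validates it. Summing over too few or too many bytes, as a miscounted mbuf length would, will generally break that check in the same way.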

I never noticed checksum errors coming from the stack or sensitivity to
delay in the dec21140 or the elnk drivers.

I also didn't see any cache issues on the PPC, though the elnk is not
affected because it uses I/O space.  The PPC cache is not supposed to
cover PCI memory space, and I've not seen PCI cache issues with the
dec21140 or the proprietary boards we have here either.

Regards,

Greg

Joel Sherrill writes:
 > Steve Hunt wrote:
 > 
 > >in_cksum() does work (without the delay).
 > >
 > >On Mon, 2006-04-17 at 15:29 +0200, Steve Hunt wrote:
 > >>I have found that by changing the file
 > >>cpukit/libnetworking/netinet/ip_output.c, adding a delay just before
 > >>in_cksum_hdr(ip), the checksum is calculated correctly and everything
 > >>works! So perhaps the header is still being changed by the driver at
 > >>that time? Or is everything in the same thread? Perhaps 'ip' points
 > >>directly at the hardware and the registers are not stable?
 > >>
 > Is there any possible way you are seeing some type of caching effect?
 > PowerPC systems sometimes get caching issues with NIC drivers in PCI
 > memory space.
 > 
 > >>Very strange ... but I have not had time to investigate further
 > >>yet. I will also see if using in_cksum() in place of in_cksum_hdr()
 > >>fixes (hides) the problem.
 > >>
 > >>As a side issue, I noticed that when my delay was a printf() call,
 > >>ping reported a quite stable time (~2 ms), but when I used usleep()
 > >>the ping time changed in a cyclical and predictable way between
 > >>10 ms and 1 ms! Would this be expected? I will do some more
 > >>tests.
 > >>
 > If the requested usleep() is shorter than the configured clock tick,
 > the delay can't be less than one tick.  So with a 10 msec clock tick,
 > every usleep() < 10 msec will be rounded up to 1 tick.
 > 
 > >>By the way, my target system is a pc104+ ... not very fast by modern
 > >>standards.
 > >>
 > Fast enough to get into trouble. :)
 > 
 > --joel
 > 
 > >>Steve Hunt
 > >>
 > >>On Fri, 2006-04-14 at 18:15 +0200, Steve Hunt wrote:
 > >>    
 > >>
 > >>>No - it looks like only the header checksum is wrong.
 > >>>
 > >>>On Fri, 2006-04-14 at 16:30 +0200, Sylvain Prestavoine wrote:
 > >>>      
 > >>>
 > >>>>----- Original Message ----- 
 > >>>>From: "Steve Hunt" <hunt at alceli.ch>
 > >>>>To: <rtems-users at rtems.com>
 > >>>>Sent: Friday, April 14, 2006 2:44 PM
 > >>>>Subject: Network problem - header checksum error -
 > >>>>
 > >>>>>I am having a problem getting RTEMS to run.  I can boot my application
 > >>>>>on my pc104 with an rtk8139 network chip using PXE boot and GRUB.
 > >>>>>
 > >>>>>However, once my application is running (and is therefore using the
 > >>>>>RTEMS network stack, not the PXE or GRUB networking), any packets sent
 > >>>>>from my system seem to have an incorrect header checksum.
 > >>>>>
 > >>>>>For instance, a ping to the device reports no replies, but the Ethereal
 > >>>>>sniffer shows the reply packet arriving, with a corrupt header
 > >>>>>checksum.
 > >>>>>
 > >>>>What about the other data: is it corrupted or not?
 > >>>>Perhaps it is an endianness problem?
 > >>>>
 > >>>>--
 > >>>>-Stan
 >
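Joel's point about usleep() granularity can be sketched as simple tick arithmetic. This is a minimal model, assuming a tick-based delay that rounds the request up to whole ticks and wakes the task on a clock tick boundary. usec_to_ticks(), actual_delay_us() and tick_demo() are hypothetical helpers, not RTEMS APIs, and the 10 ms tick is just the example value from the thread. Under that assumption a one-tick sleep lasts anywhere from just over 0 to a full 10 ms, depending on how far into the current tick the call is made, which is one plausible explanation for the cyclical 1 ms to 10 ms ping times Steve saw.

```c
#include <assert.h>
#include <stdint.h>

/* Round a requested delay up to whole clock ticks (ceiling division,
 * written to avoid overflow). Any nonzero request shorter than one
 * tick becomes exactly one tick. */
uint32_t usec_to_ticks(uint32_t usec, uint32_t usec_per_tick)
{
    return usec / usec_per_tick + (usec % usec_per_tick != 0);
}

/* Time actually slept if the task wakes on the n-th tick boundary
 * after now_us, with ticks firing at every multiple of usec_per_tick.
 * Assumes ticks >= 1. */
uint32_t actual_delay_us(uint32_t now_us, uint32_t ticks,
                         uint32_t usec_per_tick)
{
    uint32_t into_tick = now_us % usec_per_tick;  /* time since last tick */
    return ticks * usec_per_tick - into_tick;
}

/* Example with a 10 ms (10000 us) tick. */
void tick_demo(void)
{
    const uint32_t tick = 10000;
    uint32_t n = usec_to_ticks(1000, tick);         /* usleep(1000) */

    assert(n == 1);                                 /* rounded up to 1 tick */
    assert(actual_delay_us(0, n, tick) == 10000);   /* called right on a tick */
    assert(actual_delay_us(9000, n, tick) == 1000); /* called late in a tick */
}
```

If the ping interval is not a whole multiple of the tick, the phase of each call within the tick would drift steadily, so the extra delay sweeps between the two extremes and repeats, which would look cyclical and predictable.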