shadow stream TFTP session

Fri Mar 26 08:37:28 UTC 2021

Hello,

in one system I have experienced a very strange TFTP driver problem.

My i.MX7-based software tries to read a file from an external TFTP 
server, which works fine in most system configurations, except for one 
with a slightly slower network connection (100 Mbit/s instead of 1 Gbit/s).

On that system, we have the following TFTP packet sequence:

#01 RTEMS  -> SERVER TFTP Read request (file open)
#02 SERVER -> RTEMS  TFTP DATA Block #1
#03 RTEMS  -> SERVER TFTP Read request (file open)
#04 SERVER -> RTEMS  TFTP DATA Block #1
#05 RTEMS  -> SERVER TFTP ACK for Block #1
#06 SERVER -> RTEMS  TFTP DATA Block #2
#07 RTEMS  -> SERVER TFTP ACK for Block #1
#08 SERVER -> RTEMS  TFTP DATA Block #2
#09 RTEMS  -> SERVER TFTP ACK for Block #2
#10 SERVER -> RTEMS  TFTP DATA Block #3
#11 RTEMS  -> SERVER TFTP ACK for Block #2
#12 SERVER -> RTEMS  TFTP DATA Block #3

...

Obviously the RTEMS system requests each block twice (and the server 
delivers it twice). The UDP statistics show no packet loss; UDP reports 
that every packet was transferred twice.

I have looked into cpukit/libnetworking/lib/tftpDriver.c 
(rtems_tftp_read) and found two points which might be the reason (and 
might need correction):

The timeout for the first read request is 400 msec (while the timeout 
for consecutive requests is 6 sec):

#define PACKET_FIRST_TIMEOUT_MILLISECONDS  400L
#define PACKET_TIMEOUT_MILLISECONDS        6000L

So my guess is that the first data packet needs slightly more than 
400 msec to reach the TFTP driver. This delay is NOT visible in 
Wireshark, but the RTEMS system load may delay packet delivery inside 
the network stack for some additional time. (BTW: a loss of the first 
request MAY happen anyway, so this should not harm further operation.) 
Due to that delay, the RTEMS system will:

- abort the packet reception due to timeout
- send out the first request again (-> see #03)
- enter packet reception again
- almost immediately receive the response !!! to the first request !!! 
(-> see #02)
- ACK it (-> see #05)
- receive the second try of the first data block (-> see #04)
- drop the packet internally (because it does not match the next 
expected block number)
===> and, in reaction to receiving an unexpected data block, send out 
the ACK for the last accepted data block again (-> see #07)
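
The steps above can be reproduced with a small toy model. This is only a sketch, not the driver code: count_data_packets, rrq_retried and MAX_BLOCKS are made-up names, and the model assumes (as the trace suggests) that the server answers every read request with DATA #1 and every ACK n, including a repeated one, with DATA n+1, and that the client re-sends its last ACK whenever an unexpected block arrives.

```c
#include <stdbool.h>

#define MAX_BLOCKS 32  /* toy limit so the in-flight queue stays bounded */

/*
 * Toy model (assumption, not taken from tftpDriver.c): returns the number
 * of DATA packets the server sends while the client fetches `blocks`
 * blocks, or -1 for an unsupported block count.
 */
static int count_data_packets(int blocks, bool rrq_retried)
{
    int in_flight[2 * MAX_BLOCKS]; /* DATA block numbers on their way to the client */
    int head = 0;
    int tail = 0;
    int expected = 1;              /* next block number the client will accept */

    if (blocks < 1 || blocks > MAX_BLOCKS) {
        return -1;
    }

    /* Each read request is answered with DATA #1. */
    in_flight[tail++] = 1;
    if (rrq_retried) {
        in_flight[tail++] = 1;     /* answer to the retried request */
    }

    while (head < tail && expected <= blocks) {
        int block = in_flight[head++];
        int ack;

        if (block == expected) {
            ack = expected;        /* accept the block and ACK it */
            expected++;
        } else {
            ack = expected - 1;    /* unexpected block: repeat the last ACK */
        }

        /* The modelled server answers every ACK n with DATA n+1. */
        if (ack < blocks) {
            in_flight[tail++] = ack + 1;
        }
    }

    return tail;                   /* every queued entry is one DATA packet sent */
}
```

In this model, count_data_packets(n, false) yields n packets and count_data_packets(n, true) yields 2 * n: one early retry of the read request is enough to double every block for the rest of the transfer.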

IMHO the last step is wrong. When the tftp driver receives a block with 
an unexpected data block sequence number, it should silently drop it. If 
the expected block is not received, this will be handled via the 
PACKET_TIMEOUT_MILLISECONDS anyway.
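
A minimal sketch of the handling I have in mind. The enum and the function name are hypothetical and do not exist in tftpDriver.c; the point is only that an unexpected block number triggers no packet at all.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of the RTEMS driver. */
typedef enum {
    TFTP_ACCEPT_AND_ACK,  /* expected block: store the data, send the ACK */
    TFTP_DROP_SILENTLY    /* anything else: ignore it, send nothing */
} tftp_data_action;

/* Decide what to do with a received DATA block. */
static tftp_data_action classify_data_block(uint16_t expected,
                                            uint16_t received)
{
    return (received == expected) ? TFTP_ACCEPT_AND_ACK
                                  : TFTP_DROP_SILENTLY;
}
```

With this policy the late duplicate of block #1 (packet #04 in the trace) would produce no second ACK, so the server would have no reason to send block #2 twice.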

IMHO the PACKET_FIRST_TIMEOUT_MILLISECONDS is a bit ... tight.
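
To illustrate how little margin there is, here is a small sketch. The helper first_request_retried is hypothetical, and the 450 msec delivery delay and the 2000 msec alternative in the note below are assumed example values, not measurements.

```c
#include <stdbool.h>

/* Value quoted from tftpDriver.c. */
#define PACKET_FIRST_TIMEOUT_MILLISECONDS  400L

/*
 * Hypothetical helper: true if the first read request would be sent again
 * because the first DATA block reaches the driver only after the timeout.
 */
static bool first_request_retried(long delivery_delay_ms,
                                  long first_timeout_ms)
{
    return delivery_delay_ms > first_timeout_ms;
}
```

A delay of 450 msec triggers the retry with the current 400 msec value, while the same delay would be harmless with a first timeout of, say, 2000 msec.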

----------------------------

So my questions to the list:
- can somebody confirm my conclusions?
- is there any reason why the currently implemented behavior is correct, or
- is it ok to remove the answer to an unexpected data block?
- is it ok to increase the timeout for the first data block to something 
like 2-3 seconds?

Sorry for the lengthy mail, but... networking is complicated ;-)

wkr,

Thomas.
-- 
embedded brains GmbH
Herr Thomas DOERFLER
Dornierstr. 4
82178 Puchheim
Germany
email: Thomas.DOERFLER at embedded-brains.de
phone: +49-89-18 94 741 - 12
fax:   +49-89-18 94 741 - 09

Registration court: Amtsgericht München
Registration number: HRB 157899
Managing directors authorized to represent the company: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/