[ProgSoc] Bizarre TCP/SMTP timeout

Roland Turner raz at progsoc.org
Tue Mar 25 19:44:52 EST 2008


I'm seeing peculiar TCP behaviour in a particular SMTP context:

- I'm viewing from the PoV of an SMTP "client"; a mail relay delivering
an "outbound" message.

- Several hundred ms (and KB) into a so-far loss-free transfer, a few
client-to-server datagrams suddenly go astray (the ACKs coming back
suggest this interpretation).

- Thereafter, every second client-to-server datagram goes astray (again,
interpreting the ACKs)

- My guess is some sort of brain-dead traffic shaper that can't be
bothered sending ICMP source quench, or getting its hands dirty with TCP
state, so it simply chucks every second datagram on a TCP session that's
gone on "too long".

- The impact on Linux 2.6.17 TCP appears to be that it keeps doubling
the interval between (re)transmitted segments until that interval
exceeds about 90s, at which point the receiving SMTP server declares a
timeout (and sends back a 451 to say so).

- The net result is that messages get stuck in the queue.

- This was not happening previously with their Windows+Exchange setup.

- I infer a broken firewall somewhere and a difference between Windows
and Linux TCP behaviour.

- Switching congestion control from bic to reno doesn't appear to help.

- Switching on FRTO doesn't appear to help.
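
To put rough numbers on the doubling described above, here is a minimal
Python sketch. It is an illustration only, not a model of the Linux
stack: the function name, the 0.2s starting interval and the 120s cap
are my assumptions, and it ignores the every-second-datagram loss
pattern entirely. It just doubles an interval until it first exceeds the
roughly 90s at which the server gives up.

```python
def backoff_schedule(initial_rto, limit, cap=120.0):
    """Return successive intervals, doubling each time, up to and
    including the first interval that exceeds `limit` (all in seconds).

    initial_rto and cap are illustrative assumptions, not measured values.
    """
    if initial_rto <= 0:
        raise ValueError("initial_rto must be positive")
    if cap <= limit:
        # Otherwise the interval could never exceed the limit.
        raise ValueError("cap must be greater than limit")

    intervals = []
    rto = initial_rto
    while True:
        intervals.append(rto)
        if rto > limit:
            break
        rto = min(rto * 2, cap)  # plain exponential backoff, capped
    return intervals


if __name__ == "__main__":
    schedule = backoff_schedule(initial_rto=0.2, limit=90.0)
    for n, gap in enumerate(schedule, start=1):
        print(f"interval {n}: {gap:.1f}s")
```

Under those assumptions the tenth interval is the first to exceed 90s
(102.4s), so only a handful of consecutive backoffs are needed before
the server's timeout fires.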

Has anyone seen this before, or have any idea how to tackle it?

- Raz


