Mailing List Archive

Some Emails to gmail now hang
This started about a week ago, exim 4.94.2 on debian.

I tried disabling chunking ( hosts_try_chunking = ) and it didn't help

Some Emails to gmail get here, some don't. It seems content dependent. It could be as if gmail
is actually teergrubing me :)

Connecting to gmail-smtp-in.l.google.com [2607:f8b0:4023:c0d::1a]:25 ... failed: Cannot assign requested address
LOG: MAIN
H=gmail-smtp-in.l.google.com [2607:f8b0:4023:c0d::1a] Cannot assign requested address
Connecting to gmail-smtp-in.l.google.com [142.251.2.27]:25 ... TFO mode sendto, no data: EINPROGRESS
connected
TCP_FASTOPEN tcpi_unacked 2
SMTP<< 220 mx.google.com ESMTP l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
SMTP>> EHLO mail1.merlins.org
SMTP<< 250-mx.google.com at your service, [209.81.13.136]
250-SIZE 157286400
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
SMTP>> STARTTLS
SMTP<< 220 2.0.0 Ready to start TLS
SMTP>> EHLO mail1.merlins.org
SMTP<< 250-mx.google.com at your service, [209.81.13.136]
250-SIZE 157286400
250-8BITMIME
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
SMTP>> MAIL FROM:<foo> SIZE=111056
SMTP>> RCPT TO:<merlin@gmail.com>
will write message using CHUNKING
SMTP>> BDAT 4562
SMTP<< 250 2.1.0 OK l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
SMTP<< 250 2.1.5 OK l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
SMTP<< 250 2.0.0 OK l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
SMTP>> BDAT 105345 LAST
<hangs here>


Connecting to gmail-smtp-in.l.google.com [142.251.2.27]:25 ... TFO mode sendto, no data: EINPROGRESS
connected
TCP_FASTOPEN tcpi_unacked 2
SMTP<< 220 mx.google.com ESMTP n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
SMTP>> EHLO mail1.merlins.org
SMTP<< 250-mx.google.com at your service, [209.81.13.136]
250-SIZE 157286400
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
SMTP>> STARTTLS
SMTP<< 220 2.0.0 Ready to start TLS
SMTP>> EHLO mail1.merlins.org
SMTP<< 250-mx.google.com at your service, [209.81.13.136]
250-SIZE 157286400
250-8BITMIME
250-ENHANCEDSTATUSCODES
250-PIPELINING
250-CHUNKING
250 SMTPUTF8
SMTP>> MAIL FROM:<foo> SIZE=111056
SMTP>> RCPT TO:<merlin@gmail.com>
SMTP>> DATA
SMTP<< 250 2.1.0 OK n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
SMTP<< 250 2.1.5 OK n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
SMTP<< 354 Go ahead n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
SMTP>> writing message and terminating "."
<hangs here>

-d+all shows
16:52:41 5682 Calling gnutls_record_recv(session=0x5594689d08e0, buffer=0x5594689e5ae0, len=4096)
16:52:41 5682 read response data: size=72
16:52:41 5682 SMTP<< 250 2.1.0 OK 30-20020a17090a035e00b001f57a54c7aasi385374pjf.69 - gsmtp
16:52:41 5682 sync_responses expect rcpt
16:52:41 5682 Calling gnutls_record_recv(session=0x5594689d08e0, buffer=0x5594689e5ae0, len=4096)
16:52:41 5682 read response data: size=72
16:52:41 5682 SMTP<< 250 2.1.5 OK 30-20020a17090a035e00b001f57a54c7aasi385374pjf.69 - gsmtp
16:52:41 5682 look for one response for BDAT
16:52:41 5682 Calling gnutls_record_recv(session=0x5594689d08e0, buffer=0x5594689e5ae0, len=4096)
16:52:41 5682 read response data: size=72
16:52:41 5682 SMTP<< 250 2.0.0 OK 30-20020a17090a035e00b001f57a54c7aasi385374pjf.69 - gsmtp
16:52:41 5682 SMTP>> BDAT 105345 LAST
16:52:41 5682 cmd buf flush 18 bytes (more expected)
16:52:41 5682 gnutls_record_cork(session=0x5594689d08e0)
16:52:41 5682 tls_write(0x5594689e6ae0, 18, more)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594689e6ae0, left=18)
16:52:41 5682 outbytes=18
16:52:41 5682 cannot use sendfile for body: spoolfile not wireformat
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 gnutls_record_uncork(session=0x5594689d08e0)
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 flushing headers buffer
16:52:41 5682 writing data block fd=7 size=8191 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 8191)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
16:52:41 5682 outbytes=8191
16:52:41 5682 writing data block fd=7 size=7053 timeout=300
16:52:41 5682 tls_write(0x5594688fabf0, 7053)
16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=7053)
16:52:41 5682 outbytes=7053
<hangs here>
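
A quick arithmetic check on the debug output above, sketched in Python. The block sizes are copied from the log; the variable names are illustrative only. Twelve 8191-byte writes plus the final 7053-byte write add up to exactly the size announced by "BDAT 105345 LAST", so it looks as though Exim handed the whole chunk to GnuTLS before the hang.

```python
# Sizes of the "writing data block" entries in the -d+all output above:
# twelve full blocks followed by one short final block.
block_sizes = [8191] * 12 + [7053]

# Size announced by the final chunk command, "BDAT 105345 LAST".
bdat_last_size = 105345

total_written = sum(block_sizes)
print(total_written, total_written == bdat_last_size)
```

If the two numbers differed, the hang could be blamed on a short write on the sending side; since they agree, the sender appears to be waiting for the server's reply.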

--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: Some Emails to gmail now hang
Tricky to guess at. Turn off more features, I guess.
You already tried chunking. Next would be fastopen, then pipelining.

However, given it was right after all the data (even in non-chunking)
one has to wonder if it's a content-check of theirs going wrong.

Does a given failing message get through on a later retry,
or fail for ever?


I guess another thing to try would be to build your own Exim
from source, in case there's some fix Deb haven't picked up.
Unlikely, though.
--
Cheers,
Jeremy

Re: Some Emails to gmail now hang
On 2022-08-09 at 20:20:36 UTC-0400 (Wed, 10 Aug 2022 01:20:36 +0100)
Jeremy Harris via Exim-users <jgh@wizmail.org>
is rumored to have said:

> Tricky to guess at. Turn off more features, I guess.
> You already tried chunking. Next would be fastopen, then pipelining.
>
> However, given it was right after all the data (even in non-chunking)
> one has to wonder if it's a content-check of theirs going wrong.

This sounds a lot like the problem of some 'transparent' middleboxes
that try to filter SMTP but get confused when <crlf>.<crlf> gets split
between packets.
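
A minimal Python sketch of the failure mode described above, assuming a hypothetical filter that inspects each packet in isolation. The function names and the sample payload are made up for illustration and do not come from any real middlebox.

```python
TERMINATOR = b"\r\n.\r\n"

def naive_sees_end(packets):
    """Look for the terminator inside each packet on its own."""
    return any(TERMINATOR in p for p in packets)

def stateful_sees_end(packets):
    """Carry the tail of the stream across packet boundaries."""
    tail = b""
    for p in packets:
        window = tail + p
        if TERMINATOR in window:
            return True
        # Keep just enough bytes to complete a terminator split across packets.
        tail = window[-(len(TERMINATOR) - 1):]
    return False

# The end-of-data marker is split right after the dot.
packets = [b"last line of body\r\n.", b"\r\n"]
print(naive_sees_end(packets), stateful_sees_end(packets))
```

The per-packet scanner never sees the end of the message and keeps waiting, which from the sender's side would look like a hang after the data has been sent.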


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: Some Emails to gmail now hang
This does sound something like this issue I blogged about:
https://www.chromosphere.co.uk/2022/06/01/googles-tcp-fast-open-breaks-exim-delivery/

The workaround I have (so far!) successfully implemented with the same
version of Exim on Debian 11 is:

hosts_try_fastopen = !*.l.google.com

into /etc/exim4/conf.d/transports/30_exim4-config_remote_smtp (or whichever
config the remote transport is in depending on how you have installed Exim
on Debian).
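
To make the effect of that host list concrete, here is a rough Python emulation. It is only a sketch: fnmatch stands in for Exim's own host-list matcher, the function name is invented, and mx.example.org is a hypothetical non-Google host. The assumption being illustrated is that a list whose last item is negated matches every host not excluded by it.

```python
from fnmatch import fnmatch

def try_fastopen(host, negated_patterns=("*.l.google.com",)):
    """Simplified stand-in for 'hosts_try_fastopen = !*.l.google.com'.

    A host matching a negated pattern is excluded; because the list ends
    with a negated item, a host that matches nothing counts as included.
    """
    return not any(fnmatch(host, pat) for pat in negated_patterns)

for host in ("gmail-smtp-in.l.google.com", "mx.example.org"):
    print(host, try_fastopen(host))
```

So TCP Fast Open stays enabled for other destinations and is skipped only for Google's MX hosts.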

HTH

Graeme

-----Original Message-----
From: Exim-users <exim-users-bounces+graeme=chromosphere.co.uk@exim.org> On
Behalf Of Marc MERLIN via Exim-users
Sent: 10 August 2022 01:03
To: exim-users@exim.org
Subject: [exim] Some Emails to gmail now hang

Re: Some Emails to gmail now hang
On Wed, Aug 10, 2022 at 11:21:31AM +0100, Graeme Coates via Exim-users wrote:
> This does sound something like this issue I blogged about:
> https://www.chromosphere.co.uk/2022/06/01/googles-tcp-fast-open-breaks-exim-delivery/
>
> The workaround I have (so far!) successfully implemented with the same
> version of Exim on Debian 11 is:
>
> hosts_try_fastopen = !*.l.google.com
>
> into /etc/exim4/conf.d/transports/30_exim4-config_remote_smtp (or whichever
> config the remote transport is in depending on how you have installed Exim
> on Debian).

Thank you, that totally fixed my problem.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/

Re: Some Emails to gmail now hang
On 10 August 2022 17:12:55 BST, Marc MERLIN via Exim-users <exim-users@exim.org> wrote:
>> hosts_try_fastopen = !*.l.google.com
>>
>> into /etc/exim4/conf.d/transports/30_exim4-config_remote_smtp (or whichever
>> config the remote transport is in depending on how you have installed Exim
>> on Debian).
>
>Thank you, that totally fixed my problem.

That's extremely weird. I can't see a logical connection between the TCP startup detail and a problem that late in the SMTP conversation.

I'd love to hear from someone at Google on this point.


--
Cheers,
Jeremy

Re: Some Emails to gmail now hang
On Wed, Aug 10, 2022 at 06:29:47PM +0100, Jeremy Harris via Exim-users wrote:
> That's extremely weird. I can't see a logical connection between the
> TCP startup detail and a problem that late in the SMTP conversation.

That was my thought too, I don't get it.

> I'd love to hear from someone at Google on this point.

I work at google (not in gmail), and have asked internally if they can
look into it.
They are very busy so I'm not sure when I'll hear back.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/

Re: Some Emails to gmail now hang
On Wed, 10 Aug 2022, Marc MERLIN via Exim-users wrote:

> On Wed, Aug 10, 2022 at 06:29:47PM +0100, Jeremy Harris via Exim-users wrote:
>> That's extremely weird. I can't see a logical connection between the
>> TCP startup detail and a problem that late in the SMTP conversation.
>
> That was my thought too, I don't get it.
>
>> I'd love to hear from someone at Google on this point.
>
> I work at google (not in gmail), and have asked internally if they can
> look into it.
> They are very busy so I'm not sure when I'll hear back.

Might be worth posting on the mailop@mailop.org list

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk

Re: Some Emails to gmail now hang
On Wed, Aug 10, 2022 at 11:17:43PM +0100, Andrew C Aitchison via Exim-users wrote:

> On Wed, 10 Aug 2022, Marc MERLIN via Exim-users wrote:
>
> > On Wed, Aug 10, 2022 at 06:29:47PM +0100, Jeremy Harris via Exim-users wrote:
> >> That's extremely weird. I can't see a logical connection between the
> >> TCP startup detail and a problem that late in the SMTP conversation.
> >
> > That was my thought too, I don't get it.
> >
> >> I'd love to hear from someone at Google on this point.
> >
> > I work at google (not in gmail), and have asked internally if they can
> > look into it.
> > They are very busy so I'm not sure when I'll hear back.
>
> Might be worth posting on the mailop@mailop.org list

I've also reached out to the Gmail team. They're aware. Which is not
to say that there's a quick fix in the works, the front-end connection
termination devices are both non-trivial and critical, so changes will
happen cautiously and likely slowly, and may be delayed by other
priorities...

--
Viktor.

Re: Some Emails to gmail now hang
On Wed, Aug 10, 2022 at 06:46:16PM -0400, Viktor Dukhovni via Exim-users wrote:
> On Wed, Aug 10, 2022 at 11:17:43PM +0100, Andrew C Aitchison via Exim-users wrote:
>
> > On Wed, 10 Aug 2022, Marc MERLIN via Exim-users wrote:
> >
> > > On Wed, Aug 10, 2022 at 06:29:47PM +0100, Jeremy Harris via Exim-users wrote:
> > >> That's extremely weird. I can't see a logical connection between the
> > >> TCP startup detail and a problem that late in the SMTP conversation.
> > >
> > > That was my thought too, I don't get it.
> > >
> > >> I'd love to hear from someone at Google on this point.
> > >
> > > I work at google (not in gmail), and have asked internally if they can
> > > look into it.
> > > They are very busy so I'm not sure when I'll hear back.
> >
> > Might be worth posting on the mailop@mailop.org list
>
> I've also reached out to the Gmail team. They're aware. Which is not
> to say that there's a quick fix in the works, the front-end connection
> termination devices are both non-trivial and critical, so changes will
> happen cautiously and likely slowly, and may be delayed by other
> priorities...

Thanks.

Whose fault is it? debian/exim, or gmail?

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08

Re: Some Emails to gmail now hang
On Wed, Aug 10, 2022 at 04:00:51PM -0700, Marc MERLIN wrote:

> > I've also reached out to the Gmail team. They're aware. Which is not
> > to say that there's a quick fix in the works, the front-end connection
> > termination devices are both non-trivial and critical, so changes will
> > happen cautiously and likely slowly, and may be delayed by other
> > priorities...
>
> Thanks.
>
> Whose fault is it? debian/exim, or gmail?

It looks *strongly* like an interoperability problem between the Linux
kernel TCP implementation and the Google TCP/TLS termination front-ends,
unless all the Exim users who lately somewhat regularly show up to
report this issue are behind some as yet unidentified set of
middle-boxes that break TCP state.

It would perhaps be useful to also see any reports of success sending
sufficiently large messages to Gmail from the reported Exim builds and
Linux versions. If some users are not seeing any issues, then it would
be good to know how their situation differs.

If you have Exim on a Linux laptop and are able to connect it to the office
WiFi network, can you still see the problem with default Exim settings?

Can you post a "tshark" decode of a full capture of a failed delivery?

# tcpdump -s0 -w /some/file.pcap ...
# tshark -nr /some/file.pcap

[ Keep the PCAP file, more questions may arise once the basic decode is
posted. ]

--
Viktor.

Re: Some Emails to gmail now hang
On 10 August 2022 23:39:08 UTC, Viktor Dukhovni via Exim-users <exim-users@exim.org> wrote:

>It would perhaps be useful to also see any reports of success sending
>sufficiently large messages to Gmail from the reported Exim builds and
>Linux versions. If some users are not seeing any issues, then it would
>be good to know how their situation differs.

My MTA doesn't send large messages often, especially not to Gmail,
but two days ago it sent one. Delivery over IPv6 failed (timeout after
DATA); the subsequent delivery over IPv4 succeeded.

regards

Slavko

Re: Some Emails to gmail now hang
On Thu, 11 Aug 2022 at 00:51, Viktor Dukhovni via Exim-users <
exim-users@exim.org> wrote:

>
> Can you post a "tshark" decode of a full capture of a failed delivery?
>
> # tcpdump -s0 -w /some/file.pcap ...
> # tshark -nr /some/file
>
> [. Keep the PCAP file, more questions may arise once the basic decode is
> posted. ]
>
>
I have a full capture run through tshark as per above at the following URL:

https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tshark_gmail.txt

(NB: I added a capture filter to tcpdump of the form: "port 25 || port 587
|| port 465 || port 2525" to filter down to SMTP traffic - if I need to
repeat using an alternate packet filter, please let me know - it's fairly
easy for me to do). Note that this was for an ~5.6MB attachment - the
initial delivery was via ipv6 which failed, and it then flicked back to v4
and worked. However, I have seen failures on both ipv6 and ipv4 prior to
implementing the workaround).
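
For anyone repeating the capture, the filter described above can be assembled like this. This is a small Python sketch: the helper names are invented here, the port list is the one quoted in this message, and /some/file.pcap is the placeholder path used earlier in the thread.

```python
def smtp_capture_filter(ports=(25, 587, 465, 2525)):
    """Join per-port terms into one pcap filter expression."""
    return " || ".join(f"port {p}" for p in ports)

def tcpdump_argv(pcap_path, ports=(25, 587, 465, 2525)):
    """Argument vector for a full-packet capture (-s0) written to pcap_path."""
    return ["tcpdump", "-s0", "-w", pcap_path, smtp_capture_filter(ports)]

print(tcpdump_argv("/some/file.pcap"))
```

Passing the filter as a single argument avoids shell quoting problems with the "||" operators.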

Graeme

Re: Some Emails to gmail now hang
On Thu, Aug 11, 2022 at 02:55:51PM +0100, Graeme Coates via Exim-users wrote:

> > Can you post a "tshark" decode of a full capture of a failed delivery?
> >
> > # tcpdump -s0 -w /some/file.pcap ...
> > # tshark -nr /some/file
> >
> > [. Keep the PCAP file, more questions may arise once the basic decode is
> > posted. ]
> >
> >
> I have a full capture run through tshark as per above at the following URL:
>
> https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tshark_gmail.txt
>
> (NB: I added a capture filter to tcpdump of the form: "port 25 || port 587
> || port 465 || port 2525" to filter down to SMTP traffic - if I need to
> repeat using an alternate packet filter, please let me know - it's fairly
> easy for me to do). Note that this was for an ~5.6MB attachment - the
> initial delivery was via ipv6 which failed, and it then flicked back to v4
> and worked. However, I have seen failures on both ipv6 and ipv4 prior to
> implementing the workaround).

Among the decoded sessions, only one used TFO to eliminate a round-trip
delay:

65.309846 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 106 44884 → 25 [SYN] Seq=0 Win=64800 Len=0 MSS=1440 SACK_PERM=1 TSval=2327966919 TSecr=0 WS=128 TFO=C
65.318623 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 94 25 → 44884 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1440 SACK_PERM=1 TSval=3698060567 TSecr=2327966919 WS=256
65.319958 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 SMTP 173 S: 220 mx.google.com ESMTP n12-20020a5d660c000000b0021eed663c94si12775572wru.912 - gsmtp
65.319991 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 86 44884 → 25 [ACK] Seq=1 Ack=88 Win=64768 Len=0 TSval=2327966929 TSecr=3698060569

Here the server's greeting appears to be sent before the client's ACK,
suggesting that the client's TFO cookie was accepted.

This somewhat hits an edge case in the TFO specification, because with
the server talking first, the client's initial data length is zero, and
so the server's signal that TFO is not accepted (by ACKing only the SYN
and not the initial data) is indistinguishable from the signal that it
was (by ACKing also the initial data).
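
The ambiguity can be shown with a little sequence-number arithmetic. This is a Python sketch under the usual TCP rule that the SYN consumes one sequence number; the function name and the 100-byte payload are hypothetical, chosen only for contrast.

```python
def synack_ack_number(client_isn, syn_data_len, data_accepted):
    """Acknowledgment number the server puts in its SYN-ACK.

    The SYN itself consumes one sequence number; data carried in the SYN
    is acknowledged only when the server accepts it.
    """
    acked_data = syn_data_len if data_accepted else 0
    return client_isn + 1 + acked_data

# With data in the SYN the two outcomes differ...
print(synack_ack_number(0, 100, True), synack_ack_number(0, 100, False))
# ...but with an empty SYN, as in the capture (Seq=0, Len=0), both give Ack=1.
print(synack_ack_number(0, 0, True), synack_ack_number(0, 0, False))
```

With a server-speaks-first protocol such as SMTP the client therefore has to infer acceptance from other evidence, such as the greeting arriving before its own ACK.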

In any case many packets later, and with data successfully delivered in
both directions, things start to go wrong:

65.584402 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 86 25 → 44884 [ACK] Seq=6247 Ack=147544 Win=321792 Len=0 TSval=3698060833 TSecr=2327967185
65.584402 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 86 25 → 44884 [ACK] Seq=6247 Ack=148972 Win=324608 Len=0 TSval=3698060833 TSecr=2327967185
65.605640 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 44884 → 25 [ACK] Seq=148972 Ack=6247 Win=64128 Len=1428 TSval=2327967214 TSecr=3698060823 [TCP segment of a reassembled PDU]
65.614445 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 86 25 → 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0 TSval=3698060863 TSecr=2327967214
65.821663 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 [TCP Spurious Retransmission] 44884 → 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428 TSval=2327967430 TSecr=3698060823[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
65.830763 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 98 [TCP Dup ACK 2064#1] 25 → 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0 TSval=3698061079 TSecr=2327967214 SLE=69004 SRE=70432
66.261594 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 [TCP Spurious Retransmission] 44884 → 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428 TSval=2327967870 TSecr=3698060823[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
66.271116 2a00:1450:400c:c07::1b → 2a00:1098:0:86:1000:45:0:1 TCP 98 [TCP Dup ACK 2064#2] 25 → 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0 TSval=3698061519 TSecr=2327967214 SLE=69004 SRE=70432
67.125602 2a00:1098:0:86:1000:45:0:1 → 2a00:1450:400c:c07::1b TCP 1514 [TCP Spurious Retransmission] 44884 → 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428 TSval=2327968734 TSecr=3698060823[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
...

The client seems rather intent to retransmit packets long ago ACKed by
the server! How/why TFO comes into play here, so late in the TCP stream
is unclear. Naively, looks more like a Linux bug, unless the server did
something wrong along the way I missed on first inspection.
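
The "long ago ACKed" observation can be checked mechanically. Below is a small Python sketch that pulls Seq/Ack/Len out of two lines abbreviated from the capture above. The helper name and the shortened line format are illustrative only; this is not how tshark itself classifies retransmissions.

```python
import re

FIELD = re.compile(r"(Seq|Ack|Len)=(\d+)")

def fields(line):
    """Extract Seq/Ack/Len integers from a tshark summary line."""
    return {k: int(v) for k, v in FIELD.findall(line)}

# Abbreviated from the capture: the server's last cumulative ACK, then one
# of the segments the client keeps retransmitting.
server_ack = fields("25 -> 44884 [ACK] Seq=6247 Ack=150400 Win=327424 Len=0")
client_seg = fields("44884 -> 25 [ACK] Seq=69004 Ack=6247 Win=64128 Len=1428")

# A segment is already covered if it ends at or before the cumulative ACK.
segment_end = client_seg["Seq"] + client_seg["Len"]
already_acked = segment_end <= server_ack["Ack"]
print(segment_end, already_acked)
```

The computed end of the segment matches the SRE=70432 value in the server's duplicate ACKs, which is consistent with the server reporting the retransmitted range as a duplicate.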

At this point it would be useful to see the full PCAP file for just the
traffic involving client port "44884".

$ tcpdump -s0 -r /some/file.pcap -w /tmp/tfo.pcap tcp port 44884

The tshark summary decode elides some details...

--
Viktor.

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: Some Emails to gmail now hang [ In reply to ]
Mark,

I have experienced the same... seems to happen once every 2-3 weeks and I
think it depends on which actual server in Google's cluster you get
connected to.

Google's implementation of SMTP seems to be very poor at reporting
actual problems, rather it either accepts delivery (and presumably
discards it) or hangs like you have experienced.

Mike


On 10/08/2022 01:02, Marc MERLIN via Exim-users wrote:
> This started about a week ago, exim 4.94.2 on debian.
>
> I tried disabling chunking ( hosts_try_chunking = ) and it didn't help
>
> Some Emails to gmail get here, some don't. It seems content dependent. It could be as if gmail
> is actually teergrubbing me :)
>
> Connecting to gmail-smtp-in.l.google.com [2607:f8b0:4023:c0d::1a]:25 ... failed: Cannot assign requested address
> LOG: MAIN
> H=gmail-smtp-in.l.google.com [2607:f8b0:4023:c0d::1a] Cannot assign requested address
> Connecting to gmail-smtp-in.l.google.com [142.251.2.27]:25 ... TFO mode sendto, no data: EINPROGRESS
> connected
> TCP_FASTOPEN tcpi_unacked 2
> SMTP<< 220 mx.google.com ESMTP l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
> SMTP>> EHLO mail1.merlins.org
> SMTP<< 250-mx.google.com at your service, [209.81.13.136]
> 250-SIZE 157286400
> 250-8BITMIME
> 250-STARTTLS
> 250-ENHANCEDSTATUSCODES
> 250-PIPELINING
> 250-CHUNKING
> 250 SMTPUTF8
> SMTP>> STARTTLS
> SMTP<< 220 2.0.0 Ready to start TLS
> SMTP>> EHLO mail1.merlins.org
> SMTP<< 250-mx.google.com at your service, [209.81.13.136]
> 250-SIZE 157286400
> 250-8BITMIME
> 250-ENHANCEDSTATUSCODES
> 250-PIPELINING
> 250-CHUNKING
> 250 SMTPUTF8
> SMTP>> MAIL FROM:<foo> SIZE=111056
> SMTP>> RCPT TO:<merlin@gmail.com>
> will write message using CHUNKING
> SMTP>> BDAT 4562
> SMTP<< 250 2.1.0 OK l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
> SMTP<< 250 2.1.5 OK l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
> SMTP<< 250 2.0.0 OK l193-20020a6391ca000000b0041b8f2bd530si11447438pge.217 - gsmtp
> SMTP>> BDAT 105345 LAST
> <hangs here>
>
>
> Connecting to gmail-smtp-in.l.google.com [142.251.2.27]:25 ... TFO mode sendto, no data: EINPROGRESS
> connected
> TCP_FASTOPEN tcpi_unacked 2
> SMTP<< 220 mx.google.com ESMTP n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
> SMTP>> EHLO mail1.merlins.org
> SMTP<< 250-mx.google.com at your service, [209.81.13.136]
> 250-SIZE 157286400
> 250-8BITMIME
> 250-STARTTLS
> 250-ENHANCEDSTATUSCODES
> 250-PIPELINING
> 250-CHUNKING
> 250 SMTPUTF8
> SMTP>> STARTTLS
> SMTP<< 220 2.0.0 Ready to start TLS
> SMTP>> EHLO mail1.merlins.org
> SMTP<< 250-mx.google.com at your service, [209.81.13.136]
> 250-SIZE 157286400
> 250-8BITMIME
> 250-ENHANCEDSTATUSCODES
> 250-PIPELINING
> 250-CHUNKING
> 250 SMTPUTF8
> SMTP>> MAIL FROM:<foo> SIZE=111056
> SMTP>> RCPT TO:<merlin@gmail.com>
> SMTP>> DATA
> SMTP<< 250 2.1.0 OK n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
> SMTP<< 250 2.1.5 OK n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
> SMTP<< 354 Go ahead n23-20020a170902969700b0016ef3d9ed6bsi14752543plp.530 - gsmtp
> SMTP>> writing message and terminating "."
> <hangs here>
>
> -d+all shows
> 16:52:41 5682 Calling gnutls_record_recv(session=0x5594689d08e0, buffer=0x5594689e5ae0, len=4096)
> 16:52:41 5682 read response data: size=72
> 16:52:41 5682 SMTP<< 250 2.1.0 OK 30-20020a17090a035e00b001f57a54c7aasi385374pjf.69 - gsmtp
> 16:52:41 5682 sync_responses expect rcpt
> 16:52:41 5682 Calling gnutls_record_recv(session=0x5594689d08e0, buffer=0x5594689e5ae0, len=4096)
> 16:52:41 5682 read response data: size=72
> 16:52:41 5682 SMTP<< 250 2.1.5 OK 30-20020a17090a035e00b001f57a54c7aasi385374pjf.69 - gsmtp
> 16:52:41 5682 look for one response for BDAT
> 16:52:41 5682 Calling gnutls_record_recv(session=0x5594689d08e0, buffer=0x5594689e5ae0, len=4096)
> 16:52:41 5682 read response data: size=72
> 16:52:41 5682 SMTP<< 250 2.0.0 OK 30-20020a17090a035e00b001f57a54c7aasi385374pjf.69 - gsmtp
> 16:52:41 5682 SMTP>> BDAT 105345 LAST
> 16:52:41 5682 cmd buf flush 18 bytes (more expected)
> 16:52:41 5682 gnutls_record_cork(session=0x5594689d08e0)
> 16:52:41 5682 tls_write(0x5594689e6ae0, 18, more)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594689e6ae0, left=18)
> 16:52:41 5682 outbytes=18
> 16:52:41 5682 cannot use sendfile for body: spoolfile not wireformat
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 gnutls_record_uncork(session=0x5594689d08e0)
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 flushing headers buffer
> 16:52:41 5682 writing data block fd=7 size=8191 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 8191)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=8191)
> 16:52:41 5682 outbytes=8191
> 16:52:41 5682 writing data block fd=7 size=7053 timeout=300
> 16:52:41 5682 tls_write(0x5594688fabf0, 7053)
> 16:52:41 5682 gnutls_record_send(session=0x5594689d08e0, buffer=0x5594688fabf0, left=7053)
> 16:52:41 5682 outbytes=7053
> <hangs here>
>


Re: Some Emails to gmail now hang [ In reply to ]
On Thu, Aug 11, 2022 at 09:06:28PM +0100, Mike Tubby via Exim-users wrote:

> I have experienced the same... seems to happen once every 2-3 weeks and I
> think it depends on which actual server in Google's cluster you get
> connected to.
>
> Google's implementation of SMTP seems to be very poor at reporting
> actual problems, rather it either accepts delivery (and presumably
> discards it) or hangs like you have experienced.

The evidence from the posted decoded packet capture so far suggests
otherwise. The problem is at the TCP layer, completely unrelated to
Google's "SMTP implementation", and to first approximation, pending
further more detailed analysis of packet captures, looks like a Linux
TCP bug, more than a Google bug...

--
Viktor.

Re: Some Emails to gmail now hang [ In reply to ]
On Thu, 11 Aug 2022 at 17:21, Viktor Dukhovni via Exim-users <
exim-users@exim.org> wrote:

>
> At this point it would be useful to see the full PCAP file for just the
> traffic involving client port "44884".
>
> $ tcpdump -s0 -r /some/file.pcap -w /tmp/tfo.pcap tcp port 44884
>
> The tshark summary decode elides some details...
>
>
No problem - here's a link to the pcap file filtered down by port 44884.

https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tfo.zip

(It should unzip to the .pcap file)

Graeme
Re: Some Emails to gmail now hang [ In reply to ]
On Thu, Aug 11, 2022 at 10:23:47PM +0100, Graeme Coates via Exim-users wrote:

> > At this point it would be useful to see the full PCAP file for just the
> > traffic involving client port "44884".
> >
> > $ tcpdump -s0 -r /some/file.pcap -w /tmp/tfo.pcap tcp port 44884
> >
> > The tshark summary decode elides some details...
>
> No problem - here's a link to the pcap file filtered down by port 44884.
>
> https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tfo.zip

It still very much looks like client-side misbehaviour. One
complication is that you also have client-side TCP offload, but since
after STARTTLS the server only sends small ACK packets, its checksums
verify correctly, and essentially only the client packets carry
"incorrect" (to be computed in the NIC) checksums.

It may be worth trying to disable TCP offload in the NIC, and see
whether then TFO remains problematic. In other words, does it
somehow confuse the kernel or the NIC?

Still very odd that the problem would show up so late (~69k) into the
stream.

--
Viktor.

Re: Some Emails to gmail now hang [ In reply to ]
On 11/08/2022 22:23, Graeme Coates via Exim-users wrote:
> No problem - here's a link to the pcap file filtered down by port 44884.
>
> https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tfo.zip

Attached is the time-sequence plot for that. I agree with Viktor:
this is a problem in the (you said Debian, so probably all Linux)
kernel TCP implementation. Assuming the capture was taken on the
initiating host, not a hypervisor or out on a router.

The red "R" packets are these retries being sent by the client
side of the connection. But they are for a region of sequence space
already ACK'd - see the green line - so there is no good reason for
the retry. The server end, Google, certainly thinks it saw that data
before; it responds with a DSACK (purple "DS") each time.

The client keeps on retrying until it times out and resets the connection.

The lineup with the initial value of the window advertised by the server
(yellow, at the "WS 8" end) is intriguing and might hint at the
location of the bug. It doesn't line up exactly, but it *is* at the
start of the first (transmit-offloaded) 44 KB segment just after
the initial window edge sequence value.
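
For readers unfamiliar with the "WS 8" annotation: it is the window-scale
shift count, so the 16-bit window field in each segment is shifted left by
8 bits to give the real window. A minimal sketch of that arithmetic follows.
The function name scaled_window is made up for illustration, and the raw
value 1279 is an assumption, derived by dividing the Win=327424 shown in the
tshark summary by 256 on the understanding that tshark prints the already
scaled window.

```shell
# Scale a raw 16-bit TCP window field by a window-scale shift count.
scaled_window() {
  # $1 = raw window field from the TCP header, $2 = shift count (the "WS" value)
  echo $(( $1 << $2 ))
}

scaled_window 1279 8   # prints 327424, matching the Win= value in the server's late ACKs
scaled_window 1279 0   # prints 1279, what the same field would mean with no scaling applied
```

The gap between those two numbers shows how much smaller the usable window
looks to anything on the path that does not apply the scale.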
--
Cheers,
Jeremy

(yes, I do this sort of thing for $work...)
Re: Some Emails to gmail now hang [ In reply to ]
On Wed, 10 Aug 2022, Viktor Dukhovni via Exim-users wrote:

> On Wed, Aug 10, 2022 at 04:00:51PM -0700, Marc MERLIN wrote:
>
>>> I've also reached out to the Gmail team. They're aware. Which is not
>>> to say that there's a quick fix in the works, the front-end connection
>>> termination devices are both non-trivial and critical, so changes will
>>> happen cautiously and likely slowly, and may be delayed by other
>>> priorities...
>>
>> Thanks.
>>
>> Whose fault is it? debian/exim, or gmail?
>
> It looks *strongly* like an interoperability problem between the Linux
> kernel TCP implementation and the Google TCP/TLS termination front-ends,
> unless all the Exim users who lately somewhat regularly show up to
> report this issue are behind some as yet unidentified set of
> middle-boxes that break TCP state.
>
> It would perhaps be useful to also see any reports of success sending
> sufficiently large messages to Gmail from the reported Exim builds and
> Linux versions. If some users are not seeing any issues, then it would
> be good to know how their situation differs.

Might be good to know who is using openssl and who is using gnu-tls,
so that we can rule in or out the tls implementation.

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk

Re: Some Emails to gmail now hang [ In reply to ]
On Fri, Aug 12, 2022 at 06:30:21AM +0100, Andrew C Aitchison via Exim-users wrote:

> > It looks *strongly* like an interoperability problem between the Linux
> > kernel TCP implementation and the Google TCP/TLS termination front-ends,
> > unless all the Exim users who lately somewhat regularly show up to
> > report this issue are behind some as yet unidentified set of
> > middle-boxes that break TCP state.
> >
> > It would perhaps be useful to also see any reports of success sending
> > sufficiently large messages to Gmail from the reported Exim builds and
> > Linux versions. If some users are not seeing any issues, then it would
> > be good to know how their situation differs.
>
> Might be good to know who is using openssl and who is using gnu-tls,
> so that we can rule in or out the tls implementation.

Surely irrelevant, this is a *TCP-layer* problem. If not in the Linux
kernel, perhaps in the TCP offload in the network card.

--
Viktor.

Re: Some Emails to gmail now hang [ In reply to ]
I repeated the test with tso off in the NIC. Process as follows:

1. Stop Exim, remove fastopen exclusion in transport conf.
2. ethtool -K eth0 tso off; ethtool -K eth0 tx off
3. Restart exim, retest.

Still experiencing timeouts much as before - tshark summary:
https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tfo_nic.txt

Of note, here's the output from ethtool --show-offload when I ran the test:


# ethtool --show-offload eth0
Features for eth0:
rx-checksumming: on [fixed]
tx-checksumming: off
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: off
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: on [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]

Re: Some Emails to gmail now hang [ In reply to ]
On 12/08/2022 08:31, Graeme Coates via Exim-users wrote:
> generic-segmentation-offload: on
^^^^
This might still be enabling transmit using >MTU from the kernel to the NIC.
Get a pcap to check; any >1500 byte packets being sent?
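
One way to answer that question from a tshark one-line summary is sketched
below. This is an illustration only: the name oversize_frames is made up,
the default of 1514 assumes plain Ethernet with a 1500-byte MTU and no VLAN
tag, and the field positions assume the summary layout used earlier in this
thread (protocol in the fifth field, frame length in the sixth).

```shell
# Hypothetical helper: list frames longer than a full-size Ethernet frame.
oversize_frames() {
  # $1 = largest expected frame length in bytes (default 1514)
  awk -v max="${1:-1514}" '
    # Field 5 is the protocol column, field 6 the frame length.
    $5 == "TCP" && $6 + 0 > max { print $1, $6 }'
}

# Example use (the file name is an assumption):
# oversize_frames < tshark-summary.txt
#
# Against the pcap itself, a length filter such as
#   tcpdump -nr /tmp/tfo.pcap 'greater 1515'
# should select the same frames.
```

Any output at all would suggest the kernel is still handing the NIC
segments larger than the MTU.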

I agree with Viktor though - it's a Linux kernel bug. I worked
up a nice analysis and posted it last night, but it's not appeared
on the list yet; either stuck in moderation or dropped due to the
graphic attachment, I guess.

I lean more towards a TCP endpoint software bug than an offload
bug, from that analysis.
--
Cheers,
Jeremy

Re: Some Emails to gmail now hang [ In reply to ]
On Fri, Aug 12, 2022 at 08:31:37AM +0100, Graeme Coates via Exim-users wrote:

> I repeated the test with tso off in the NIC. Process as follows:
>
> 1. Stop Exim, remove fastopen exclusion in transport conf.
> 2. ethtool -K eth0 tso off; ethtool -K eth0 tx off
> 3. Restart exim, retest.
>
> Still experiencing timeouts much as before - tshark summary:
> https://www.chromosphere.co.uk/wp-content/blogs.dir/1/files/2022/08/tfo_nic.txt

The numbers look very similar to the previous case:

ACK SEQ WARP
150400 69004 81396
156097 71845 84252

Both cases see the server ACK ~150k of data with the client then
retransmitting back from ~70k, going back ~80k for no obvious reason.
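
The WARP column is simply the highest ACK minus the sequence number the
client falls back to. A small sketch of that subtraction, with a made-up
function name, for anyone wanting to repeat it on their own captures:

```shell
# Distance between the server's highest ACK and the retransmitted sequence number.
warp() {
  # $1 = highest ACK seen from the server, $2 = Seq the client retransmits from
  echo $(( $1 - $2 ))
}

warp 150400 69004   # first capture: prints 81396
warp 156097 71845   # capture with TSO off: prints 84252
```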

> Of note, here's the output from ethtool --show-offload when I ran the test:
>
> # ethtool --show-offload eth0
> Features for eth0:
> rx-checksumming: on [fixed]
> tx-checksumming: off
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: off
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [fixed]

I would like to suggest also disabling "sg".

> generic-segmentation-offload: on
> generic-receive-offload: on

Perhaps these too. The idea is to try and see whether it is Linux or
the NIC. Why on earth TFO would have such a delayed effect far down the
TCP stream is rather a mystery. Once the 3WHS is complete, with or
without 0-RTT data, the rest of the TCP session should proceed
identically.

If the problem persists with as much as possible of the hardware assist
disabled, then it sure looks like Linux TCP is the culprit.
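
Put together, the offloads suggested so far could be switched off in one go.
The wrapper below is a sketch under assumptions: the function name and the
RUN variable are made up, eth0 is taken from the earlier ethtool output, and
by default it only prints the command so nothing changes until RUN=1 is set
(which needs root). Features that ethtool reports as [fixed] cannot be
changed this way.

```shell
# Hypothetical wrapper: disable the offloads discussed in this thread.
offload_off() {
  # $1 = interface name; prints the command unless RUN=1 is set
  cmd="ethtool -K $1 tso off tx off sg off gso off gro off"
  if [ "${RUN:-0}" = "1" ]; then
    $cmd            # actually apply the settings (requires root)
  else
    echo "$cmd"     # dry run: show what would be executed
  fi
}

offload_off eth0
```

Re-running ethtool --show-offload afterwards would confirm which settings
actually took effect.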

--
Viktor.

Re: Some Emails to gmail now hang [ In reply to ]
On Fri, Aug 12, 2022 at 12:31:16PM -0400, Viktor Dukhovni via Exim-users wrote:

> If the problem persists with as much as possible of the hardware assist
> disabled, then it sure looks like Linux TCP is the culprit.

Unsurprisingly, this is indeed a Linux bug. Neal Cardwell from Google
shared the below:

    I strongly suspect this is a known issue with interactions between
    Exim and TFO causing machines to ignore packets, which was reported
    in this thread:

    https://lore.kernel.org/lkml/E1nZMdl-0006nG-0J@plastiekpoot/

    I tracked it down to a conntrack bug and suggested a fix, and the
    conntrack maintainers checked in an expanded fix here: c7aab4f17021b
    netfilter: nf_conntrack_tcp: re-init for syn packets only

    https://lore.kernel.org/netdev/17c87824-7d04-c34e-bf6a-d8b874242636@tmb.nu/t/#mab1f2792ba24e98e3f41468c9781747a77c87ac9

    Can you please advise folks who run into this to upgrade to Linux v5.18
    or later (since it has the fix) or to cherry-pick in that fix?

    I see this patch was only backported to 5.17, and not to older stable
    releases. I will try to get it backported to other stable releases so
    more users pick up the fix automatically from their distributions...

It seems that with TFO the Linux TCP client is prone to losing track of
the window scale, and eventually the SMTP client runs out of TCP window,
matching Jeremy's observation that the client did not get far past the
initial window.

So either get a later (or patched) kernel, or disable TFO.
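
A quick way to see which side of that line a host falls on is to compare
its kernel release against 5.18. The check below is a heuristic sketch with
a made-up function name. It looks only at the major.minor part of the
release string, so it will say "no" for a 5.17 stable kernel or a
distribution kernel that carries the backported fix.

```shell
# Hypothetical check: is the kernel release at least 5.18?
kernel_has_fix() {
  # $1 = kernel release string, e.g. the output of `uname -r`
  major=${1%%.*}               # text before the first dot
  rest=${1#*.}                 # text after the first dot
  minor=${rest%%[!0-9]*}       # leading digits of what follows
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "${minor:-0}" -ge 18 ]; }
}

if kernel_has_fix "$(uname -r)"; then
  echo "kernel is 5.18 or later"
else
  echo "kernel is older than 5.18: check for a backport, or disable TFO"
fi
```

For the other option, client-side TFO on Linux is governed by the low bit of
the net.ipv4.tcp_fastopen sysctl, and earlier in this thread it was disabled
per transport in the Exim configuration.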

--
Viktor.

