Mailing List Archive

SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue?
Howdy,

Long time, no talk, sorry that I've been a stranger :)

I have Exim version 4.92 #3 built 07-May-2019 17:44:23
on both the backup MX and the main MX (both running debian)

I have an issue where some Emails go to the backup MX, then the backup
MX sends to the main MX, and the mail gets delivered to my mailbox, but
the transaction isn't finished in a way that the backup knows the mail
went through, so it sends it over and over again.

Well, except that with the same queue flush command I have a log of it
hanging and failing, and then a later log of it going through.
I am using sa-exim in local_scan which does run after DATA has been
sent. I'm not sure if this could be related.

It looks like this:
moremagic:~# exim -v -qf 1kG0Wt-0001K6-TH
LOG: MAIN
cwd=/root 4 args: exim -v -qf 1kG0Wt-0001K6-TH
LOG: queue_run MAIN
Start queue run: pid=6293 -qf
delivering 1kG3DU-0003Au-Fd (queue run pid 6293)
T: remote_smtp for marcatwww@merlins.org
Connecting to mail1.merlins.org [209.81.13.136]:25 ... connected
SMTP<< 220 mail1.merlins.org ESMTP Exim 4.92 #3 Thu, 10 Sep 2020 12:05:28 -0700 - mm9
SMTP>> EHLO mail2.merlins.org
SMTP<< 250-mail1.merlins.org Hello mail2.merlins.org [213.228.62.38]
250-SIZE 26214400
250-8BITMIME
250-VRFY
250-EXPN
250-PIPELINING
250-AUTH PLAIN LOGIN
250-CHUNKING
250-STARTTLS
250 HELP
SMTP>> STARTTLS
SMTP<< 220 TLS go ahead
SMTP>> EHLO mail2.merlins.org
SMTP<< 250-mail1.merlins.org Hello mail2.merlins.org [213.228.62.38]
250-SIZE 26214400
250-8BITMIME
250-VRFY
250-EXPN
250-PIPELINING
250-AUTH PLAIN LOGIN
250-CHUNKING
250 HELP
SMTP>> MAIL FROM:<SRS0=1t+SHt=CS=solar.cc=info@eigbox.net> SIZE=8413
SMTP>> RCPT TO:<marcatwww@merlins.org>
will write message using CHUNKING
SMTP>> BDAT 7247 LAST

and then it hangs here.

In the logs, I see
2020-09-10 11:41:41 1kG0Wt-0001K6-TH == marcatwww@merlins.org R=dnslookup T=remote_smtp defer (110): Connection timed out H=mail1.merlins.org [209.81.13.136]: SMTP timeout after pipelined end of data (5334 bytes written)

But on the other side, I see:
2020-09-10 12:05:33 1kGRsw-0006mt-VG SA: Debug: SAEximRunCond expand returned: '1'
2020-09-10 12:05:33 1kGRsw-0006mt-VG SA: Debug: check succeeded, running spamc
2020-09-10 12:05:34 1kGRsw-0006mt-VG SA: Action: scanned but message isn't spam: score=6.4 required=7.0 (scanned in 1/1 secs | Message-Id: 15835da1-3b2d-8b
a6-e219-a838a141a70b@solar.cc). From <SRS0=1t+SHt=CS=solar.cc=info@eigbox.net> (host=NULL [213.228.62.38]) for marcatwww@merlins.org
2020-09-10 12:05:34 1kGRsw-0006mt-VG <= SRS0=1t+SHt=CS=solar.cc=info@eigbox.net H=(mail2.merlins.org) [213.228.62.38]:54380 I=[209.81.13.136]:25 P=esmtps X
=TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256 CV=no K S=7471 DKIM=solar.cc id=15835da1-3b2d-8ba6-e219-a838a141a70b@solar.cc T="Re: hi from germany (tesla model
3 12V to household ac)" from <SRS0=1t+SHt=CS=solar.cc=info@eigbox.net> for marcatwww@merlins.org
2020-09-10 12:05:34 cwd=/var/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1kGRsw-0006mt-VG
2020-09-10 12:05:34 1kGRsw-0006mt-VG => |/usr/bin/procmail -m /home/merlin/.procmail/procmailrc_marcme (marcatwww@merlins.org) <marcatwww@merlins.org> F=<SRS0=1t+S
Ht=CS=solar.cc=info@eigbox.net> R=system_aliases_merlin T=address_pipe_root S=7550

The Email was scanned in 1 second and accepted.



I then ran it again, and it went through (well, I mean I got yet another
copy, but the backup MX is now happy that the mail went through).

moremagic:~# exim -v -qf 1kG0Wt-0001K6-TH
LOG: MAIN
cwd=/root 4 args: exim -v -qf 1kG0Wt-0001K6-TH
LOG: queue_run MAIN
Start queue run: pid=6427 -qf
delivering 1kG0Wt-0001K6-TH (queue run pid 6427)
T: remote_smtp for marcatwww@merlins.org
Connecting to mail1.merlins.org [209.81.13.136]:25 ... connected
SMTP<< 220 mail1.merlins.org ESMTP Exim 4.92 #3 Thu, 10 Sep 2020 12:06:41 -0700 - mm9
SMTP>> EHLO mail2.merlins.org
SMTP<< 250-mail1.merlins.org Hello mail2.merlins.org [213.228.62.38]
250-SIZE 26214400
250-8BITMIME
250-VRFY
250-EXPN
250-PIPELINING
250-AUTH PLAIN LOGIN
250-CHUNKING
250-STARTTLS
250 HELP
SMTP>> STARTTLS
SMTP<< 220 TLS go ahead
SMTP>> EHLO mail2.merlins.org
SMTP<< 250-mail1.merlins.org Hello mail2.merlins.org [213.228.62.38]
250-SIZE 26214400
250-8BITMIME
250-VRFY
250-EXPN
250-PIPELINING
250-AUTH PLAIN LOGIN
250-CHUNKING
250 HELP
SMTP>> MAIL FROM:<SRS0=1t+SHt=CS=solar.cc=info@eigbox.net> SIZE=6504
SMTP>> RCPT TO:<marcatwww@merlins.org>
will write message using CHUNKING
SMTP>> BDAT 5334 LAST
SMTP<< 250 OK
SMTP<< 250 Accepted
SMTP<< 250- 5334 byte chunk, total 5334
250 OK id=1kGRu6-0006vO-DL
SMTP>> QUIT
SMTP(close)>>
LOG: MAIN
=> marcatwww@merlins.org F=<SRS0=1t+SHt=CS=solar.cc=info@eigbox.net> R=dnslookup T=remote_smtp S=5334 H=mail1.merlins.org [209.81.13.136] I=[213.228.62.40] X=TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256 CV=no DN="C=US,ST=CA,L=Silicon Valley,O=Linux Geeks Incorporated,OU=merlins.org,CN=*.merlins.org" K C="250- 5334 byte chunk, total 5334\\n250 OK id=1kGRu6-0006vO-DL"
LOG: MAIN
Completed


Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
On Thu, Sep 10, 2020 at 12:24:51PM -0700, Marc MERLIN via Exim-users wrote:
> But on the other side, I see:
> 2020-09-10 12:05:33 1kGRsw-0006mt-VG SA: Debug: SAEximRunCond expand returned: '1'
...
> 2020-09-10 12:05:34 cwd=/var/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1kGRsw-0006mt-VG
> 2020-09-10 12:05:34 1kGRsw-0006mt-VG => |/usr/bin/procmail -m /home/merlin/.procmail/procmailrc_marcme (marcatwww@merlins.org) <marcatwww@merlins.org> F=<SRS0=1t+S
> Ht=CS=solar.cc=info@eigbox.net> R=system_aliases_merlin T=address_pipe_root S=7550
>
> The Email was scanned in 1 second and accepted.

Are you sure? There is no line indicating the end of message processing.
I suppose it is somewhere below in the log, try to found it with
"grep 1kGRsw-0006mt-VG".
--
Eugene Berdnikov

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
On 10/09/2020 20:24, Marc MERLIN via Exim-users wrote:
> I have an issue where some Emails go to the backup MX, then the backup
> MX sends to the main MX, and the mail gets delivered to my mailbox, but
> the transaction isn't finished in a way that the backup knows the mail
> went through, so it sends it over and over again.

What sort of proportion of messages?

We'd really like to get debug from the primary MX for one such,
or a packet capture (though we'd want the TLS session keys for
the latter, which means debug on one end).

By the way - you're advertising plaintext AUTH methods on
plaintext connections. I *really* hope you don't accept
any plaintext-conn AUTHs from outside your organisation
(and, defence-in-depth, even within is poor practice).
--
Cheers,
Jeremy

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
On Thu, 10 Sep 2020, Evgeniy Berdnikov via Exim-users wrote:

> On Thu, Sep 10, 2020 at 12:24:51PM -0700, Marc MERLIN via Exim-users wrote:
>> But on the other side, I see:
>> 2020-09-10 12:05:33 1kGRsw-0006mt-VG SA: Debug: SAEximRunCond expand returned: '1'
> ...
>> 2020-09-10 12:05:34 cwd=/var/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1kGRsw-0006mt-VG
>> 2020-09-10 12:05:34 1kGRsw-0006mt-VG => |/usr/bin/procmail -m /home/merlin/.procmail/procmailrc_marcme (marcatwww@merlins.org) <marcatwww@merlins.org> F=<SRS0=1t+S
>> Ht=CS=solar.cc=info@eigbox.net> R=system_aliases_merlin T=address_pipe_root S=7550
>>
>> The Email was scanned in 1 second and accepted.
>
> Are you sure? There is no line indicating the end of message processing.
> I suppose it is somewhere below in the log, try to found it with
> "grep 1kGRsw-0006mt-VG".

"exigrep" should automate finding alllines for a message.

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
On Thu, Sep 10, 2020 at 10:47:45PM +0300, Evgeniy Berdnikov via Exim-users wrote:
> On Thu, Sep 10, 2020 at 12:24:51PM -0700, Marc MERLIN via Exim-users wrote:
> > But on the other side, I see:
> > 2020-09-10 12:05:33 1kGRsw-0006mt-VG SA: Debug: SAEximRunCond expand returned: '1'
> ...
> > 2020-09-10 12:05:34 cwd=/var/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1kGRsw-0006mt-VG
> > 2020-09-10 12:05:34 1kGRsw-0006mt-VG => |/usr/bin/procmail -m /home/merlin/.procmail/procmailrc_marcme (marcatwww@merlins.org) <marcatwww@merlins.org> F=<SRS0=1t+S
> > Ht=CS=solar.cc=info@eigbox.net> R=system_aliases_merlin T=address_pipe_root S=7550
> >
> > The Email was scanned in 1 second and accepted.
>
> Are you sure? There is no line indicating the end of message processing.
> I suppose it is somewhere below in the log, try to found it with
> "grep 1kGRsw-0006mt-VG".

Yes, I got 10+ copies of each Email from that domain.

On Thu, Sep 10, 2020 at 08:49:49PM +0100, Jeremy Harris via Exim-users wrote:
> On 10/09/2020 20:24, Marc MERLIN via Exim-users wrote:
> > I have an issue where some Emails go to the backup MX, then the backup
> > MX sends to the main MX, and the mail gets delivered to my mailbox, but
> > the transaction isn't finished in a way that the backup knows the mail
> > went through, so it sends it over and over again.
>
> What sort of proportion of messages?

Good question. It's quite rare, maybe less than 0.1% of messages?

> We'd really like to get debug from the primary MX for one such,
> or a packet capture (though we'd want the TLS session keys for
> the latter, which means debug on one end).

I can try, but indeed I can't reproduce this right now. After the last
queue flush, the message went through for real, and now everything is
good with the rest.

> By the way - you're advertising plaintext AUTH methods on
> plaintext connections. I *really* hope you don't accept
> any plaintext-conn AUTHs from outside your organisation
> (and, defence-in-depth, even within is poor practice).

Thanks for checking.
I am only using this for myself (single user) and only connect over
ssmtp.

Yes, I should probably reconfigure this later, but functionally it's the
same in my use case.

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/ | PGP 7F55D5F27AAF9D08

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
Marc MERLIN via Exim-users <exim-users@exim.org> (Do 10 Sep 2020 21:24:51 CEST):
> Howdy,
> I have Exim version 4.92 #3 built 07-May-2019 17:44:23
> on both the backup MX and the main MX (both running debian)

> MX sends to the main MX, and the mail gets delivered to my mailbox, but
> the transaction isn't finished in a way that the backup knows the mail
> went through, so it sends it over and over again.

Sounds like a known (and fixed) issue to me. I'll try to find any
reference.
--
Heiko
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
On Thu, Sep 10, 2020 at 02:49:05PM -0700, Marc MERLIN via Exim-users wrote:
> On Thu, Sep 10, 2020 at 10:47:45PM +0300, Evgeniy Berdnikov via Exim-users wrote:
> > On Thu, Sep 10, 2020 at 12:24:51PM -0700, Marc MERLIN via Exim-users wrote:
> > > But on the other side, I see:
> > > 2020-09-10 12:05:33 1kGRsw-0006mt-VG SA: Debug: SAEximRunCond expand returned: '1'
> > ...
> > > 2020-09-10 12:05:34 cwd=/var/spool/exim4 3 args: /usr/sbin/exim4 -Mc 1kGRsw-0006mt-VG
> > > 2020-09-10 12:05:34 1kGRsw-0006mt-VG => |/usr/bin/procmail -m /home/merlin/.procmail/procmailrc_marcme (marcatwww@merlins.org) <marcatwww@merlins.org> F=<SRS0=1t+S
> > > Ht=CS=solar.cc=info@eigbox.net> R=system_aliases_merlin T=address_pipe_root S=7550
> > >
> > > The Email was scanned in 1 second and accepted.
> >
> > Are you sure? There is no line indicating the end of message processing.
> > I suppose it is somewhere below in the log, try to found it with
> > "grep 1kGRsw-0006mt-VG".
>
> Yes, I got 10+ copies of each Email from that domain.

These copies may be delivered in different sessions. My remark was
about 1kGRsw-0006mt-VG. I see no log record about termination of processing
of this message, which may bee too long and lead to timeout on SMTP
(BDAT stage), and consequent attempts to transfer mail again.

BTW, procmail may fall into lock wait, you have better to trace it.
Detailed log may be done with LOGFILE and VERBOSE in rcfile.
--
Eugene Berdnikov

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
Heiko Schlittermann via Exim-users <exim-users@exim.org> (Fr 11 Sep 2020 08:56:37 CEST):
> Marc MERLIN via Exim-users <exim-users@exim.org> (Do 10 Sep 2020 21:24:51 CEST):
> > Howdy,
> > I have Exim version 4.92 #3 built 07-May-2019 17:44:23
> > on both the backup MX and the main MX (both running debian)
> …
> > MX sends to the main MX, and the mail gets delivered to my mailbox, but
> > the transaction isn't finished in a way that the backup knows the mail
> > went through, so it sends it over and over again.
>
> Sounds like a known (and fixed) issue to me. I'll try to find any
> reference.

This may be fixed in 8f9adfd36222d4e9e730734e00dffe874073e5b4
This commit is currently only part of master and exim-4.94+fixes

If I remember well and if you really hit this bug, I believe, disabling
pipelining would mitigate it.

In case the Debian Exim package maintainer is willing, we can help
providing a backport of the above mentioned commit.

Best regards from Dresden/Germany
Viele Grüße aus Dresden
Heiko Schlittermann
--
SCHLITTERMANN.de ---------------------------- internet & unix support -
Heiko Schlittermann, Dipl.-Ing. (TU) - {fon,fax}: +49.351.802998{1,3} -
gnupg encrypted messages are welcome --------------- key ID: F69376CE -
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
>> Marc MERLIN via Exim-users <exim-users@exim.org> (Do 10 Sep 2020 21:24:51 CEST):
>>> Howdy,
>>> I have Exim version 4.92 #3 built 07-May-2019 17:44:23

Note that Buster, at least, is showing 4.92-8+deb10u4

--
Cheers,
Jeremy

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
On Fri, Sep 11, 2020 at 10:48:27AM +0200, Heiko Schlittermann via Exim-users wrote:
> > Sounds like a known (and fixed) issue to me. I'll try to find any
> > reference.
>
> This may be fixed in 8f9adfd36222d4e9e730734e00dffe874073e5b4
> This commit is currently only part of master and exim-4.94+fixes
>
> If I remember well and if you really hit this bug, I believe, disabling
> pipelining would mitigate it.

That was going to be my next thing to try :)

> In case the Debian Exim package maintainer is willing, we can help
> providing a backport of the above mentioned commit.

I can start looking in the source to see if that's in the latest
package.

On Fri, Sep 11, 2020 at 05:15:28PM +0100, Jeremy Harris via Exim-users wrote:
> >> Marc MERLIN via Exim-users <exim-users@exim.org> (Do 10 Sep 2020 21:24:51 CEST):
> >>> Howdy,
> >>> I have Exim version 4.92 #3 built 07-May-2019 17:44:23
>
> Note that Buster, at least, is showing 4.92-8+deb10u4

Yes. I only update exim (or anything) when I have to, due to how many
times I've been burned by upgrades that broke me in various ways. In the
case of the now half maintained sa-exim which I still use (yes, I suck,
I'm the original author, but I moved on to other things), I also have to
be careful as newer exim can break sa-exim.

Thanks for the pointers, I'll look for the best solution depending on
whether that patch is in a debian package, or not.
if not, I'll read the docs on how to disable pipelining on my backup MX.

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.

Home page: http://marc.merlins.org/

--
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
Re: SMTP timeout after pipelined end of data (5334 bytes written), hang at DAT 7247 LAST. Local_scan vs chunking issue? [ In reply to ]
Jeremy Harris via Exim-users <exim-users@exim.org> (Fr 11 Sep 2020 18:15:28 CEST):
> >> Marc MERLIN via Exim-users <exim-users@exim.org> (Do 10 Sep 2020 21:24:51 CEST):
> >>> Howdy,
> >>> I have Exim version 4.92 #3 built 07-May-2019 17:44:23
>
> Note that Buster, at least, is showing 4.92-8+deb10u4

My git repo on exim.org: https://git.exim.org/users/heiko/exim.git
contains a branch hs12/backport-cork-patch-to-4.92, which is based on
exim-4.92+fixes.

It should contain the backports of PIPELINING/GNUTLS fixes.
If anybody can confirm that these patches work, I'd push it to the
official repository. (The testsuite succeeded with the cork test)

--
Heiko