Mailing List Archive

[Bug 2665] SMTP timeout after sending data block
https://bugs.exim.org/show_bug.cgi?id=2665

Jeremy Harris <jgh146exb@wizmail.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Summary|SMTP timeout after end of |SMTP timeout after sending
|data |data block

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #8 from Jeremy Harris <jgh146exb@wizmail.org> ---
The smtp transport option "data_timeout" is likely relevant here. However, at
the default value of five minutes - if you're running into that, the far end
really does have a problem.

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #9 from Heiko Schlittermann <hs@schlittermann.de> ---
(In reply to Joseph Diffin from comment #5)
> (In reply to Heiko Schlittermann from comment #2)
> > Does the issue persist, if you disable pipelining on your side?
>
> We do not have pipelining enabled

Well, to my knowledge, we have a config option for the smtp transport
`hosts_avoid_pipelining` which is set to an empty list per default.

Try setting this option to a literal *

But, that's just guesswork.
What Exim Version are you using? Distro Package? Which Distro?

--
HS

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

Phil Pennock <pdp@exim.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |pdp@exim.org

--- Comment #10 from Phil Pennock <pdp@exim.org> ---
Decoding RTF files and puzzling a lot, is this a correct summarisation?

1. You are running Exim 4.94
2. For the domain gloucestershire.pnn.police.uk the MX is Message Labs, for
filtering, and they then forward onto you
3. You are running the service mailgate.gsi.gov.uk
4. For some mails, predominantly those from Amazon SES originally, when Message
Labs sends onto your server those mails, you see long timeouts
5. These GSI servers are in Vodafone hosting, so not on your premises

In addition:

6. Is it fair to say that these messages from SES are a bit larger than many of
the other external-sender inbound messages received through this filter?
7. Please confirm that the firewalls in your hosting environment are configured
to allow ICMP 3/4 ("Destination Unreachable", "fragmentation needed and DF
set") to pass through.

An unfortunate pattern is for firewalls to block all ICMP, claiming that this
improves security, while ignoring that Path MTU Discovery requires that certain
ICMP pass through. For "large packets seen first from TCP connection
initiator" (as opposed to "small request, big response" patterns such as most
HTTP), this causes connections to hang, but only when the packet sizes go up
and the sender and recipient are trying to figure out the correct MTU size to
use for packets.

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #11 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Jeremy Harris from comment #7)
> Well, those are processed from the log and not the originals. There is
> info missing which would have been useful for diagnosis. It seems like you
> will have to chase this further on your own, if you cannot leak any more
> detail.
> I suggest adding +millisec to your log_selector then doing a manual delivery
> attempt on one such queued item, with "-d-all+transport+timestamp" level
> debug. Capture the stderr; there will be plenty.
>
> I expect that you will see a bunch of message content being written, and then
> a rather long pause waiting for the far end. After long enough, Exim will
> give
> up waiting. If this is what is happening then you'll need to move to looking
> at that SMTP peer system.

Thanks Jeremy - I loaded up the raw logfile entry for the email which we
attached to the ticket originally - hopefully that is more informative for you
-
In the meantime we will try you logging , although it may be Monday before I
get a look at this again

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #12 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
Created attachment 1345
--> https://bugs.exim.org/attachment.cgi?id=1345&action=edit
raw log file

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #13 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Heiko Schlittermann from comment #9)
> (In reply to Joseph Diffin from comment #5)
> > (In reply to Heiko Schlittermann from comment #2)
> > > Does the issue persist, if you disable pipelining on your side?
> >
> > We do not have pipelining enabled
>
> Well, to my knowledge, we have a config option for the smtp transport
> `hosts_avoid_pipelining` which is set to an empty list per default.
>
> Try setting this option to a literal *
>
> But, that's just guesswork.
> What Exim Version are you using? Distro Package? Which Distro?
>
> --
> HS

We will try this
We are running EXIM 4.94 on SUSE linux SLES12 SP5

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #14 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Phil Pennock from comment #10)
> Decoding RTF files and puzzling a lot, is this a correct summarisation?
>
> 1. You are running Exim 4.94
> 2. For the domain gloucestershire.pnn.police.uk the MX is Message Labs, for
> filtering, and they then forward onto you
> 3. You are running the service mailgate.gsi.gov.uk
> 4. For some mails, predominantly those from Amazon SES originally, when
> Message Labs sends onto your server those mails, you see long timeouts
> 5. These GSI servers are in Vodafone hosting, so not on your premises
>
> In addition:
>
> 6. Is it fair to say that these messages from SES are a bit larger than many
> of the other external-sender inbound messages received through this filter?
> 7. Please confirm that the firewalls in your hosting environment are
> configured to allow ICMP 3/4 ("Destination Unreachable", "fragmentation
> needed and DF set") to pass through.
>
> An unfortunate pattern is for firewalls to block all ICMP, claiming that
> this improves security, while ignoring that Path MTU Discovery requires that
> certain ICMP pass through. For "large packets seen first from TCP
> connection initiator" (as opposed to "small request, big response" patterns
> such as most HTTP), this causes connections to hang, but only when the
> packet sizes go up and the sender and recipient are trying to figure out the
> correct MTU size to use for packets.

The email flow is
eu-west-1.amazonses.com {a wrapper for eeperks for example to messagelabs - to
us {GCF Core Relays fujitsu/vodafone} to gloucester pnn

I've asked our Firewall team to look at these settings hopefully they'll get
back to me quickly

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #15 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
Created attachment 1346
--> https://bugs.exim.org/attachment.cgi?id=1346&action=edit
debug on a queued message +millisecond

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #16 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Jeremy Harris from comment #7)
> Well, those are processed from the log and not the originals. There is
> info missing which would have been useful for diagnosis. It seems like you
> will have to chase this further on your own, if you cannot leak any more
> detail.
> I suggest adding +millisec to your log_selector then doing a manual delivery
> attempt on one such queued item, with "-d-all+transport+timestamp" level
> debug. Capture the stderr; there will be plenty.
>
> I expect that you will see a bunch of message content being written, and then
> a rather long pause waiting for the far end. After long enough, Exim will
> give
> up waiting. If this is what is happening then you'll need to move to looking
> at that SMTP peer system.

I have attached the debug stderr output on an example mail

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #17 from Jeremy Harris <jgh146exb@wizmail.org> ---
The first of the two smtp connections shown is taking a timeout
"after end of data" - and it is after 10 minutes, which is the default value
for the relevant option.

We can't tell if the data (and terminating dot) made it to the far end
(and the remote MTA choked), or our kernel wasn't able to send it on the
wire (cf. Phil's point in comment 10). A packet capture would show for sure,
but care over data redaction would be needed before sharing one here
(the TCP headers for the one connection only, plus ICMPs. And maybe
obfuscating IP addresses and MAC addresses too).

The remote MTA log might (or not) be helpful. It isn't an Exim.

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #18 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Jeremy Harris from comment #17)
> The first of the two smtp connections shown is taking a timeout
> "after end of data" - and it is after 10 minutes, which is the default value
> for the relevant option.
>
> We can't tell if the data (and terminating dot) made it to the far end
> (and the remote MTA choked), or our kernel wasn't able to send it on the
> wire (cf. Phil's point in comment 10). A packet capture would show for sure,
> but care over data redaction would be needed before sharing one here
> (the TCP headers for the one connection only, plus ICMPs. And maybe
> obfuscating IP addresses and MAC addresses too).
>
> The remote MTA log might (or not) be helpful. It isn't an Exim.

Thanks for all the help so far - we're requesting packet captures to be set up
at the customer end and at our end. I'll update the ticket as soon as we have
put it place and analysed the outputs

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #19 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Joseph Diffin from comment #18)
> (In reply to Jeremy Harris from comment #17)
> > The first of the two smtp connections shown is taking a timeout
> > "after end of data" - and it is after 10 minutes, which is the default value
> > for the relevant option.
> >
> > We can't tell if the data (and terminating dot) made it to the far end
> > (and the remote MTA choked), or our kernel wasn't able to send it on the
> > wire (cf. Phil's point in comment 10). A packet capture would show for sure,
> > but care over data redaction would be needed before sharing one here
> > (the TCP headers for the one connection only, plus ICMPs. And maybe
> > obfuscating IP addresses and MAC addresses too).
> >
> > The remote MTA log might (or not) be helpful. It isn't an Exim.
>
> Thanks for all the help so far - we're requesting packet captures to be set
> up at the customer end and at our end. I'll update the ticket as soon as we
> have put it place and analysed the outputs

The customer had already obtained packet captures - this is what they've
provided :
From: EE-Perk <EEPerk@perk.ee.co.uk>
Reply-To: EEPerk@perk.ee.co.uk
To: xxxx.xxxx@gloucestershire.police.uk
Message-ID:
<010201754ff70f34-59ac34b4-e459-4214-961e-870f6fc5b310-000000@eu-west-1.amazonses.com>
Subject: Password Reset
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_38654425_2143421832.1603364387718"
extra_params:
{"trackOpens":true,"extraparams":{"authkey":"12a3cd31-afab-4349-9f87-590342f53444","x_req_origin_time":"2020-10-22
10:40:11.722","channel":19,"destination":"xxxx.xxxx@gloucestershire.police.uk","templateId":"92","main_tid":"80ac5aed-a2ee-427a-92f4-5c171d04c264","userid":101837,"tid":"80ac5aed-a2ee-427a-92f4-5c171d04c264","sid":"6101_3672","report_info":{"request_source":3,"source_tid":"1a7f2f01-0546-4e60-9a73-d9731e9f6863"},"routerName":"","outbound_notification_events":["SENT","DELIVERED","REJECTED"],"routerId":"","teamid":1949,"noOfMessages":1,"_nodetid":"80ac5aed-a2ee-427a-92f4-5c171d04c264","destinationType":"email","notificationurl":"https://perk.ee.co.uk/imiconnect/EmailReciept","messagePriority":1,"groupid":1278,"sfeSessionInfo":"transid::1a7f2f01-0546-4e60-9a73-d9731e9f6863_8687_13628","api_version":"V1","sourceTid":"1a7f2f01-0546-4e60-9a73-d9731e9f6863","sfe_dr":0,"mt_trans_date":"2020-10-22
10:40:11.723","evtid":"9","servicekey":"652d8732-9c7d-11e7-bf0b-0213261164bb","corelationid":"798002","apiDestination":"xxxx.xxxx@gloucestershire.police.uk","flowid":"3672","txn_log_level":0},"clientId":754,"appId":"a_636419409202285866","cha

If we compare this to the example we see
1376  extra_params:
{"trackOpens":true,"extraparams":{"authkey":"12a3cd31-afab-4349-9f87-590342f53444","x_req_origin_time":"2020-10-24
12:31:51.879","channel":19,"destination":"xxxx.xxxx@gloucestershire.pnn.police.uk","templateId":"84","main_tid":"b9ef785e-35ea-4e61-a007-fad4aea7e341","userid":101837,"tid":"b9ef785e-35ea-4e61-a007-fad4aea7e341","sid":"6101_3672","report_info":{"request_source":3,"source_tid":"b75b8e8f-870b-4f1b-b82b-0e8cdf900665"},"routerName":"","outbound_notification_events":["SENT","DELIVERED","REJECTED"],"routerId":"","teamid":1949,"noOfMessages":1,"_nodetid":"b9ef785e-35ea-4e61-a007-fad4aea7e341","destinationType":"email","notificationurl":"https://perk.ee.co.uk/imiconnect/EmailReciept","messagePriority":1,"groupid":1278,"sfeSessionInfo":"transid::b75b8e8f-870b-4f1b-b82b-0e8cdf900665_8687_13628","api_version":"V1","sourceTid":"b75b8e8f-870b-4f1b-b82b-0e8cdf900665","sfe_dr":0,"mt_trans_date":"2020-10-24
12:31:51.880","evtid":"9","servicekey":"652d8732-9c7d-11e7-bf0b-0213261164bb","corelationid":"802205","apiDestination":"xxxx.xxxx@gloucestershire.pnn.police.uk","flowid":"3672","txn_log_level":0},"clientId":754,"appId":"a_636419409202285866","cha
This is the part being "dropped"
nnel":"email","serviceKey":"652d8732-9c7d-11e7-bf0b-0213261164bb","trackClicks":false,"serviceId":6101,"tid":"b9ef785e-35ea-4e61-a007-fad4aea7e341","status":1,"ts":"2020-10-24T12:31:51.886Z"}

I'll provide another update when we carry out our packet capture

--
You are receiving this mail because:
You are on the CC list for the bug.
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #20 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
Created attachment 1347
--> https://bugs.exim.org/attachment.cgi?id=1347&action=edit
another example

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #21 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
Created attachment 1348
--> https://bugs.exim.org/attachment.cgi?id=1348&action=edit
wireshark output

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #22 from Joseph Diffin <Joseph.diffin@fujitsu.com> ---
(In reply to Jeremy Harris from comment #17)
> The first of the two smtp connections shown is taking a timeout
> "after end of data" - and it is after 10 minutes, which is the default value
> for the relevant option.
>
> We can't tell if the data (and terminating dot) made it to the far end
> (and the remote MTA choked), or our kernel wasn't able to send it on the
> wire (cf. Phil's point in comment 10). A packet capture would show for sure,
> but care over data redaction would be needed before sharing one here
> (the TCP headers for the one connection only, plus ICMPs. And maybe
> obfuscating IP addresses and MAC addresses too).
>
> The remote MTA log might (or not) be helpful. It isn't an Exim.

Have you had a chance to look at the latest evidence?

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##
[Bug 2665] SMTP timeout after sending data block [ In reply to ]
https://bugs.exim.org/show_bug.cgi?id=2665

--- Comment #23 from Jeremy Harris <jgh146exb@wizmail.org> ---
Looks like you have a networking problem - nothing after the 192'nd byte of
the TCP connection is being ack'd at the TCP level. We see the SMTP data
go-ahead,
and the next packet being sent by the kernel (the first one of the SMTP data)
is
2774 bytes. That could get fragmented down to the interface MTU by the NIC, if
you have offload enabled (likely) - we're not seeing the actual wire-size
packets
here. In any case, the far end isn't ack'ing.

Summary: probable MTU issue on the net, compounded by ICMP being disabled.

--
You are receiving this mail because:
You are on the CC list for the bug.
--
## List details at https://lists.exim.org/mailman/listinfo/exim-dev Exim details at http://www.exim.org/ ##