Mailing List Archive

Lingering close + unwritten data == failed connections
I've come across a situation whereby file transfers consistently fail from
an httpd server. On the one hand it's a bit of an edge case, but on the
other it definitely seems to be incorrect behaviour. I'm sure this must
have been discussed before, but I couldn't find anything much on the list
with an admittedly fairly brief search.

Essentially, server/mpm/event/event.c waits in lingering close state for
MAX_SECS_TO_LINGER ( which is defined as 30 ) before forcibly closing the
connection - if there's still unacknowledged write data in the kernel
socket at this point, the connection fails. Luckily, the conditions under
which this can happen are fairly limited - essentially it amounts to the
receiver not being able to accept data quickly enough. On Linux at least,
the default write buffer for a socket seems to be 212992 bytes ( well,
that's /proc/sys/net/core/wmem_(default|max), the actual value used will be
less - the manpage suggesting half, though my experiments don't bear that
out. ) For that to drain in 30 seconds, the transfer speed needs to be at
least 57kbit/s. Whilst that's pretty slow, remember that there could be
many simultaneous connections, so the size of the pipe that starts to cause
issues could be considerably larger. Of course, the file(s) being
transferred would also need to be big enough to fill that buffer - smaller
files result in even lower transfer rates needed before the issue happens.
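
The 57kbit/s figure above can be reproduced with a little arithmetic. This is only a sketch: the function name min_drain_rate_bps is made up for illustration, and it assumes the whole 212992 bytes is queued when the 30 second linger starts.

```c
#include <assert.h>

/* Minimum average throughput, in bits per second, needed to drain
 * `buffered_bytes` of queued data within `linger_secs` seconds.
 * Hypothetical helper, not part of httpd. */
double min_drain_rate_bps(double buffered_bytes, double linger_secs)
{
    assert(linger_secs > 0.0);
    return buffered_bytes * 8.0 / linger_secs;
}
```

Calling min_drain_rate_bps(212992.0, 30.0) gives roughly 56798 bit/s, i.e. about 57kbit/s. A smaller amount of queued data gives a proportionally lower threshold.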

As a simple test case for all this, I set up a web server, a client machine,
and two routers in between to act as a WAN emulator. On each of the ( Linux
) routers I ran:

tc qdisc add dev eth2 root netem limit 100000 rate 1000kbit

( eth2 is obviously the "WAN" interface. ) Issuing 20 simultaneous "wget"
commands from the client machine to fetch a 1M file with no retries
resulted in 14 of them failing. It actually seems to struggle at 8
simultaneous connections and above - this is with a fairly default
compilation of httpd from source.

On Linux at least, you can see how much unsent data remains by querying the
SIOCOUTQ ioctl, so the mitigation would be to check to see that ANY data
was draining at all, and if so ( and there's some left ) extend the
lingering close time and repeat. However, this wouldn't be a cross platform
solution, but it would at least be the "correct" thing to do in terms of
network function. Not sure if there's an equivalent on other systems.
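
A minimal, Linux-only sketch of the check described above might look like the following. The helper names unsent_bytes and should_extend_linger are hypothetical and the policy is just one possible reading of "ANY data was draining"; none of this is taken from httpd.

```c
#include <sys/ioctl.h>
#include <linux/sockios.h>   /* SIOCOUTQ (Linux-specific) */

/* Return the number of bytes still held in the socket's send queue,
 * or -1 if the ioctl fails (e.g. bad descriptor). */
int unsent_bytes(int fd)
{
    int pending = 0;
    if (ioctl(fd, SIOCOUTQ, &pending) < 0)
        return -1;
    return pending;
}

/* Hypothetical policy: keep lingering only while data is still queued
 * AND the queue has shrunk since the previous check. */
int should_extend_linger(int prev_pending, int now_pending)
{
    return now_pending > 0 && now_pending < prev_pending;
}
```

The idea would be to sample unsent_bytes() at each linger timeout and give the connection another interval only when should_extend_linger() is true, so a peer that has stalled completely is still dropped.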

Adam
Re: Lingering close + unwritten data == failed connections
On Wed, Oct 21, 2020 at 05:17:01PM +0100, Adam Hill wrote:
> On Linux at least, you can see how much unsent data remains by querying the
> SIOCOUTQ ioctl, so the mitigation would be to check to see that ANY data
> was draining at all, and if so ( and there's some left ) extend the
> lingering close time and repeat. However, this wouldn't be a cross platform
> solution, but it would at least be the "correct" thing to do in terms of
> network function. Not sure if there's an equivalent on other systems.

Nice writeup, thank you. Also discussed with similar conclusions here:
https://bz.apache.org/bugzilla/show_bug.cgi?id=63666 and
https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable
and a quite related writeup:
https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/ which
suggests using the Linux TCP_INFO socket option for getting information
about how fast the send buffer is draining.
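
For reference, reading TCP_INFO on Linux could be sketched roughly as below. The function name tcp_unacked_segments is an assumption for illustration, and tcpi_unacked (segments sent but not yet acknowledged) is only one of several fields that could be sampled to judge whether the peer is still making progress.

```c
#define _DEFAULT_SOURCE      /* expose struct tcp_info in glibc */
#include <netinet/in.h>
#include <netinet/tcp.h>     /* TCP_INFO, struct tcp_info */
#include <string.h>
#include <sys/socket.h>

/* Store in *unacked the number of segments sent but not yet
 * acknowledged by the peer. Returns 0 on success, -1 on failure,
 * in which case *unacked is left untouched. */
int tcp_unacked_segments(int fd, unsigned int *unacked)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    memset(&info, 0, sizeof(info));
    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) < 0)
        return -1;
    *unacked = info.tcpi_unacked;
    return 0;
}
```

Like SIOCOUTQ, this is Linux-specific, so it would not give a cross platform answer either.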

I don't have much of substance to contribute...

If it is not safe to implement a lingering close using a timeout on
socket readability to detect EOF, it seems like it is impossible to
implement a lingering close with a timeout using the BSD socket API,
which seems a rather shocking conclusion. So I kind of wish that
something was missed here, but multiple people have come to exactly that
conclusion independently.

Regards, Joe
Re: Lingering close + unwritten data == failed connections
On Wed, Oct 28, 2020 at 6:40 PM Joe Orton <jorton@redhat.com> wrote:
>
> On Wed, Oct 21, 2020 at 05:17:01PM +0100, Adam Hill wrote:
> > On Linux at least, you can see how much unsent data remains by querying the
> > SIOCOUTQ ioctl, so the mitigation would be to check to see that ANY data
> > was draining at all, and if so ( and there's some left ) extend the
> > lingering close time and repeat. However, this wouldn't be a cross platform
> > solution, but it would at least be the "correct" thing to do in terms of
> > network function. Not sure if there's an equivalent on other systems.
>
> Nice writeup, thank you.

+1

> So I kind of wish that
> something was missed here, but multiple people have come to exactly that
> conclusion independently.

It may be due to r1802875 where I added RST (SO_LINGER.l_linger = 0)
after lingering close timeout.
Thinking of it now, it's probably not the right thing to do. Simply
calling apr_socket_close() in abort_socket_nonblocking() would allow
the system's lingering close after httpd's.
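
To make the difference concrete, here is a small sketch of the two behaviours using plain BSD sockets rather than APR. The function names are hypothetical; the SO_LINGER settings are the standard ones (l_onoff = 1 with l_linger = 0 turns close() into a reset).

```c
#include <sys/socket.h>
#include <unistd.h>

/* Abortive close: with l_onoff = 1 and l_linger = 0, close() discards
 * any unsent data and resets the connection (RST). */
int close_with_reset(int fd)
{
    struct linger opt;
    opt.l_onoff = 1;
    opt.l_linger = 0;        /* zero timeout is RST */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &opt, sizeof(opt)) < 0)
        return -1;           /* fd is left open for the caller */
    return close(fd);
}

/* Plain close: with SO_LINGER left off, close() returns immediately
 * and the kernel keeps trying to deliver queued data in the background. */
int close_gracefully(int fd)
{
    return close(fd);
}
```

With the second form, data still sitting in the send buffer after httpd's own linger timeout gets a chance to reach a slow client instead of being thrown away.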

Adam, can you still observe the same behaviour with the attached
mpm_event patch applied?


Regards;
Yann.
Re: Lingering close + unwritten data == failed connections
Hi Yann,

Yep, I can confirm that the patch fixes the issue. Interestingly ( or maybe
not ) I had a quick glance at apr_socket_close and it seems to set a
SO_LINGER timeout of 30 seconds, so I sort of expected the problem still to
happen but at half the transfer rate.... but that doesn't seem to be the
case. As I say, it was a very cursory look, so maybe it does more than that
( or maybe the linger timeout is just time for the close() call to return
but RST isn't sent. )

Anyway, this does seem to be the fix, and you've got to hope that any type
of DoS attempting to take advantage of sockets in the various CLOSE_WAIT et
al states would be mitigated at kernel level.

Thanks for looking at this Yann.

Adam

On Sat, 31 Oct 2020 at 00:57, Yann Ylavic <ylavic.dev@gmail.com> wrote:

> On Wed, Oct 28, 2020 at 6:40 PM Joe Orton <jorton@redhat.com> wrote:
> >
> > On Wed, Oct 21, 2020 at 05:17:01PM +0100, Adam Hill wrote:
> > > > On Linux at least, you can see how much unsent data remains by querying the
> > > > SIOCOUTQ ioctl, so the mitigation would be to check to see that ANY data
> > > > was draining at all, and if so ( and there's some left ) extend the
> > > > lingering close time and repeat. However, this wouldn't be a cross platform
> > > > solution, but it would at least be the "correct" thing to do in terms of
> > > > network function. Not sure if there's an equivalent on other systems.
> >
> > Nice writeup, thank you.
>
> +1
>
> > So I kind of wish that
> > something was missed here, but multiple people have come to exactly that
> > conclusion independently.
>
> It may be due to r1802875 where I added RST (SO_LINGER.l_linger = 0)
> after lingering close timeout.
> Thinking of it now, it's probably not the right thing to do. Simply
> calling apr_socket_close() in abort_socket_nonblocking() would allow
> the system's lingering close after httpd's.
>
> Adam, can you still observe the same behaviour with the attached
> mpm_event patch applied?
>
>
> Regards;
> Yann.
>
Re: Lingering close + unwritten data == failed connections
On Sat, Oct 31, 2020 at 01:57:08AM +0100, Yann Ylavic wrote:
> On Wed, Oct 28, 2020 at 6:40 PM Joe Orton <jorton@redhat.com> wrote:
> >
> > On Wed, Oct 21, 2020 at 05:17:01PM +0100, Adam Hill wrote:
> > > On Linux at least, you can see how much unsent data remains by querying the
> > > SIOCOUTQ ioctl, so the mitigation would be to check to see that ANY data
> > > was draining at all, and if so ( and there's some left ) extend the
> > > lingering close time and repeat. However, this wouldn't be a cross platform
> > > solution, but it would at least be the "correct" thing to do in terms of
> > > network function. Not sure if there's an equivalent on other systems.
> >
> > Nice writeup, thank you.
>
> +1
>
> > So I kind of wish that
> > something was missed here, but multiple people have come to exactly that
> > conclusion independently.
>
> It may be due to r1802875 where I added RST (SO_LINGER.l_linger = 0)
> after lingering close timeout.

Oh, happy to see I missed that! Thanks Yann.

> Thinking of it now, it's probably not the right thing to do. Simply
> calling apr_socket_close() in abort_socket_nonblocking() would allow
> the system's lingering close after httpd's.

+1

> Adam, can you still observe the same behaviour with the attached
> mpm_event patch applied?
>
>
> Regards;
> Yann.

> Index: server/mpm/event/event.c
> ===================================================================
> --- server/mpm/event/event.c (revision 1881339)
> +++ server/mpm/event/event.c (working copy)
> @@ -526,21 +526,6 @@ static void abort_socket_nonblocking(apr_socket_t
>  {
>      apr_status_t rv;
>      apr_socket_timeout_set(csd, 0);
> -#if defined(SOL_SOCKET) && defined(SO_LINGER)
> -    /* This socket is over now, and we don't want to block nor linger
> -     * anymore, so reset it. A normal close could still linger in the
> -     * system, while RST is fast, nonblocking, and what the peer will
> -     * get if it sends us further data anyway.
> -     */
> -    {
> -        apr_os_sock_t osd = -1;
> -        struct linger opt;
> -        opt.l_onoff = 1;
> -        opt.l_linger = 0; /* zero timeout is RST */
> -        apr_os_sock_get(&osd, csd);
> -        setsockopt(osd, SOL_SOCKET, SO_LINGER, (void *)&opt, sizeof opt);
> -    }
> -#endif
>      rv = apr_socket_close(csd);
>      if (rv != APR_SUCCESS) {
>          ap_log_error(APLOG_MARK, APLOG_ERR, rv, ap_server_conf, APLOGNO(00468)
Re: Lingering close + unwritten data == failed connections
Hi Adam,

On Mon, Nov 2, 2020 at 1:04 PM Adam Hill <sidepipeuk@gmail.com> wrote:
>
> Yep, I can confirm that the patch fixes the issue.

Thanks for testing, committed to trunk in https://svn.apache.org/r1883097
I'll propose a backport to 2.4.x ASAP.

> Interestingly ( or maybe not ) I had a quick glance at apr_socket_close and it seems to set a SO_LINGER timeout of 30 seconds, so I sort of expected the problem still to happen but at half the transfer rate.... but that doesn't seem to be the case.

I don't see any (internal) use of (APR_)SO_LINGER in the APR library,
one can call apr_socket_opt_set() to set the option on the socket but
neither httpd nor APR seem to actually use it.

> As I say, it was a very cursory look, so maybe it does more than that ( or maybe the linger timeout is just time for the close() call to return but RST isn't sent. )

That would be bad actually, SO_LINGER with a positive timeout (as
opposed to zero timeout to reset the connection like mpm_event did)
would/could cause close() to block, while abort_socket_nonblocking()
in mpm_event must not block (at least from some callers).

Unix systems don't block on close() unless SO_LINGER is used; removing
the reset actually depends on this.

>
> Anyway, this does seem to be the fix, and you've got to hope that any type of DoS attempting to take advantage of sockets in the various CLOSE_WAIT et al states would be mitigated at kernel level.

It certainly will do better than httpd, which has no control over this anyway :)

>
> Thanks for looking at this Yann.

Thanks for testing!


Regards;
Yann.
Re: Lingering close + unwritten data == failed connections
> Thanks for testing!

And investigating the bug, nice report.