Mailing List Archive

tcp send buffering and keepalive races
People might recall an mpm_event bug where keepalive connections could be
closed up to 200ms early (r1874350).

I was recently looking at something with $bigco hat on where (IIUC) a
slow TTFB for a proxied request causes TCP congestion control to kick in and
makes even a relatively short response sit in the write buffer.
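
On Linux, one way to confirm that a response really is sitting in the write buffer is to ask the kernel how much data is still queued on the socket. This is a minimal diagnostic sketch and is Linux-only. `SIOCOUTQ` is the ioctl documented in tcp(7). `pending_send_bytes` is a hypothetical helper name, not httpd code. On TCP sockets, the count covers bytes the application has written that the peer has not yet acknowledged.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h> /* SIOCOUTQ: Linux-specific */

/* Hypothetical diagnostic helper (not part of httpd): returns how many bytes
 * written to a TCP socket have not yet been acknowledged by the peer, i.e.
 * are still held in the kernel send queue, or -1 if the query fails. */
static int pending_send_bytes(int fd)
{
    int pending = 0;

    assert(fd >= 0);
    if (ioctl(fd, SIOCOUTQ, &pending) != 0)
        return -1;
    return pending;
}
```

You could call it right after the last response bytes are handed to the kernel. A nonzero result means part of the response is still unacknowledged, so the client's keepalive countdown may not have started even though the server's has.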

From the behavior, it appears the browser:

1) is willing to use nearly every millisecond of the advertised KeepAlive
time for reusing a connection from its pool
2) starts counting that time from when it has received the complete response
3) can't be asked to use Expect: 100-continue on an XHR POST
4) leaves error handling up to the caller and doesn't give it a ton of feedback

This results in Apache starting its keepalive countdown "tens" of
milliseconds before the browser starts its own, while the last bytes of the
response are still sitting in the send queue. If we get unlucky, the POST of
a subsequent request and our FIN cross in the night.
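
A back-of-the-envelope model makes the timing concrete. It rests on three assumptions: an idealized shared clock, no retransmission, and a client that notices the server's FIN as soon as it arrives. The struct and function names are hypothetical. The gap between `server_start_ms` and `client_start_ms` is the one-way latency plus the time the tail of the response spent in the send queue. Every millisecond of that queueing eats into whatever safety margin the client keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Idealized timeline on one shared clock, all values in milliseconds.
 * Field names are illustrative only; nothing here is httpd code. */
typedef struct {
    int64_t server_start_ms;  /* server handed the last byte to the kernel; its countdown starts */
    int64_t client_start_ms;  /* client received the last byte; its countdown starts */
    int64_t one_way_ms;       /* one-way network latency */
    int64_t timeout_ms;       /* KeepAliveTimeout, announced and enforced alike */
    int64_t client_margin_ms; /* how early the client stops reusing the connection */
} ka_timeline;

/* A request sent at or after (server_close - one_way) arrives at a socket the
 * server has already closed, and the client does not see the FIN until
 * one_way after the close. The race is possible whenever the client is still
 * willing to reuse the connection at (server_close - one_way). */
static bool race_possible(const ka_timeline *t)
{
    assert(t != NULL);
    int64_t server_close = t->server_start_ms + t->timeout_ms;
    int64_t client_last_send = t->client_start_ms + t->timeout_ms - t->client_margin_ms;

    return client_last_send >= server_close - t->one_way_ms;
}
```

Take 10 ms of one-way latency and 40 ms of queueing (`client_start_ms` = 50). In this model the client would need a margin above 60 ms to stay clear. A browser that uses nearly every millisecond of the advertised time has almost no margin.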

These types of investigations can be really painful. Is there any
harm in allowing the server to act as if "KeepAliveTimeout 5" were, e.g.,
"KeepAliveTimeout 5200ms"?

If this fudge buffer existed as an additional directive (rather than a trick
documented in KeepAliveTimeout), would it be reasonable to give it a non-zero
default to discourage this race?
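
The proposal amounts to announcing one value to the client and enforcing a slightly larger one on the server. This is a sketch under stated assumptions. `ka_config`, `keepalive_grace_ms` and both functions are hypothetical; no such directive exists today. The header format is modeled on the common `Keep-Alive: timeout=5, max=100` form, with the announced value in whole seconds.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical settings: keepalive_grace_ms stands in for the proposed
 * additional directive; it is not an existing httpd option. */
typedef struct {
    int64_t keepalive_timeout_ms; /* value announced to the client */
    int64_t keepalive_grace_ms;   /* extra server-side wait, never announced */
} ka_config;

/* Format a Keep-Alive header value such as "timeout=5, max=100".
 * Only the announced timeout, truncated to whole seconds, goes on the wire. */
static int format_keepalive_value(char *buf, size_t len, const ka_config *c, int max_left)
{
    assert(buf != NULL && c != NULL);
    return snprintf(buf, len, "timeout=%lld, max=%d",
                    (long long)(c->keepalive_timeout_ms / 1000), max_left);
}

/* Deadline the server actually enforces, counted from when it starts its
 * own keep-alive countdown: announced timeout plus the hidden grace. */
static int64_t enforced_deadline_ms(const ka_config *c, int64_t countdown_start_ms)
{
    assert(c != NULL);
    return countdown_start_ms + c->keepalive_timeout_ms + c->keepalive_grace_ms;
}
```

A grace of 0 keeps today's single value. In this model, `{5200, 0}` produces the same announced/enforced split as `{5000, 200}`, because the announced value is truncated to whole seconds. That resembles the KeepAliveTimeout trick mentioned above, though whether httpd truncates in exactly this way is an assumption here.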
Re: tcp send buffering and keepalive races
On Fri, May 27, 2022 at 7:34 PM Eric Covener <covener@gmail.com> wrote:
>
> These types of investigations can be really painful. Is there any
> harm in allowing the server to act as if "KeepAliveTimeout 5" were, e.g.,
> "KeepAliveTimeout 5200ms"?
>
> If this fudge buffer existed as an additional directive (rather than a trick
> documented in KeepAliveTimeout), would it be reasonable to give it a non-zero
> default to discourage this race?

Sounds reasonable to me.
Possibly mpm_event, when busy, should also avoid killing connections that
have been in keep-alive state for less than the timeout bias (200ms in your
example)?

Regards,
Yann.
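
Yann's suggestion could be expressed as a filter that a busy server applies when picking keep-alive connections to close. This is only a sketch. `ka_conn` and both functions are hypothetical and far simpler than mpm_event's actual keep-alive queue handling. It assumes each connection records when it entered keep-alive state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record for a connection parked in keep-alive. */
typedef struct {
    int64_t keepalive_since_ms; /* when the connection entered keep-alive state */
} ka_conn;

/* When busy, only reclaim a keep-alive connection once it has been idle for
 * at least grace_ms. Younger ones may still have response bytes in flight,
 * and the client may be about to reuse them. */
static bool may_reclaim_when_busy(const ka_conn *c, int64_t now_ms, int64_t grace_ms)
{
    assert(c != NULL);
    return now_ms - c->keepalive_since_ms >= grace_ms;
}

/* Count how many of the n connections a busy server could close right now. */
static size_t count_reclaimable(const ka_conn *conns, size_t n, int64_t now_ms, int64_t grace_ms)
{
    size_t count = 0;

    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (may_reclaim_when_busy(&conns[i], now_ms, grace_ms))
            count++;
    return count;
}
```

With a 200 ms grace, a connection that went idle 50 ms ago is left alone. With a grace of 0, every keep-alive connection remains a candidate, as it is today.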
Re: tcp send buffering and keepalive races
On 5/27/22 7:33 PM, Eric Covener wrote:
> These types of investigations can be really painful. Is there any
> harm in allowing the server to act as if "KeepAliveTimeout 5" were, e.g.,
> "KeepAliveTimeout 5200ms"?
>
> If this fudge buffer existed as an additional directive (rather than a trick
> documented in KeepAliveTimeout), would it be reasonable to give it a non-zero
> default to discourage this race?
>

In the end, you want a KeepAlive timeout that we announce to the client and
a longer one that we actually enforce.
My understanding of keepalive is that the client cannot take for granted
that a connection is really kept alive for as long as the server announced
(it SHOULD be, but there seems to be no MUST), and in fact we close keepalive
connections if we get too busy and keeping them would prevent us from accepting
new connections.
Hence I think the issue will not be fixed in all situations.
I am willing to add this option, probably best as an additional grace period
on top of KeepAliveTimeout, configurable by a directive, but I think it should
default to zero to avoid confusion unless the behavior you report above turns
out to be widespread.

Regards

Rüdiger
Re: tcp send buffering and keepalive races
Sent from my iPhone

> On May 30, 2022, at 16:21, Ruediger Pluem <rpluem@apache.org> wrote:
Re: tcp send buffering and keepalive races
Can someone remove Nam Ho from the ML please? The spamming has been going
on for weeks now.

On Mon, 30 May 2022 at 05:31, Nam Ho <honamluxurychef@icloud.com> wrote:
Re: tcp send buffering and keepalive races
On 5/30/22 2:35 PM, Frank Gingras wrote:
> Can someone remove Nam Ho from the ML please? The spamming has been going on for weeks now.
>

Removed

Regards

Rüdiger
Re: tcp send buffering and keepalive races
> > From the behavior, it appears the browser:
> >
> > 1) is willing to use nearly every millisecond of the advertised KeepAlive
> > time for reusing a connection from its pool

Just so nobody comes away with this misinformation: I recently learned that
Chrome does not parse the Keep-Alive timeout= parameter and instead uses a
fixed 5-minute TTL in its connection pool.
So these extra-200ms games don't help for Chrome.
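
Chrome reportedly ignores this parameter. For contrast, a client that did honor it would have to parse the header value and then keep its own safety margin. This is a minimal sketch. The function names and `margin_ms` are hypothetical. It only handles the simple unquoted form, such as `timeout=5, max=100`, and not every variant the header syntax allows.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Case-insensitive check that s starts with prefix. */
static bool has_prefix_ci(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return false; /* also stops safely at the end of s */
    return true;
}

/* Hypothetical parser: returns the timeout= value in seconds from a
 * Keep-Alive header value like "timeout=5, max=100", or -1 if absent/bad. */
static long keepalive_timeout_param(const char *value)
{
    const char *p;

    assert(value != NULL);
    p = value;
    while (*p != '\0') {
        /* skip separators between parameters */
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        if (has_prefix_ci(p, "timeout=")) {
            char *end;
            long secs = strtol(p + 8, &end, 10);
            if (end == p + 8 || secs < 0)
                return -1;
            return secs;
        }
        /* not the parameter we want: skip to the next comma */
        while (*p != '\0' && *p != ',')
            p++;
    }
    return -1;
}

/* Client-side reuse budget in ms: the advertised timeout minus a safety
 * margin (margin_ms is a hypothetical tuning knob), never below zero. */
static long reuse_budget_ms(long timeout_secs, long margin_ms)
{
    long budget = timeout_secs * 1000 - margin_ms;
    return budget > 0 ? budget : 0;
}
```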