Mailing List Archive

backend connections life times
It looks like we stumbled upon an issue in https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the lifetimes of our backend connections.

When a frontend connection causes a backend request and then drops, our backend connection only notices the loss when it attempts to pass some data. In normal HTTP response processing this is not an issue, since response chunks usually come in quite frequently. The proxied connection will then fail to pass a chunk to the aborted frontend connection, and cleanup will occur.

However, with modern shenanigans such as Server-Sent Events (SSE), the request is supposed to be long running and will produce body chunks quite infrequently, like every 30 seconds or so. This leaves our proxy workers hanging in recv for quite a while and may lead to worker exhaustion.

We can say SSE is a bad idea anyway, but that will probably not stop people from doing such crazy things.

What other mitigations do we have?
- pthread_kill() will interrupt the recv and probably make it fail
- we can use shorter socket timeouts on the backend and check r->connection status in between
- ???

Whatever the means, I think it would be a Good Thing to abort backend connections earlier than we do now.

WDYT?

- Stefan
Re: backend connections life times
On Wed, Jun 30, 2021 at 11:46 AM Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:
>
> It looks like we stumbled upon an issue in https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the life times of our backend connections.
>
> When a frontend connection causes a backend request and drops, our backend connection only notifies the loss when it attempts to pass some data. In normal http response processing, this is not an issue since response chunks are usually coming in quite frequently. Then the proxied connection will fail to pass it to an aborted frontend connection and cleanup will occur.
>
> However, with such modern shenanigans such as Server Side Events (SSE), the request is supposed to be long running and will produce body chunks quite infrequently, like every 30 seconds or so. This leaves our proxy workers hanging in recv for quite a while and may lead to worker exhaustion.
>
> We can say SSE is a bad idea anyway, but that will probably not stop people from doing such crazy things.
>
> What other mitigations do we have?
> - pthread_kill() will interrupt the recv and probably make it fail
> - we can use shorter socket timeouts on backend and check r->connection status in between
> - ???


In trunk the tunnelling side of mod_proxy_http can go async and get
called back for activity on either side by asking Event to watch both
sockets.

I'm not sure how browsers treat the SSE connection; can it ever have a
subsequent request? If not, maybe we could detect the SSE Content-Type
and shoehorn it into the tunneling (figuring out what to do with
writes from the client, and backporting the event and async tunnel stuff?)
Re: backend connections life times
> Am 30.06.2021 um 18:01 schrieb Eric Covener <covener@gmail.com>:
>
> On Wed, Jun 30, 2021 at 11:46 AM Stefan Eissing
> <stefan.eissing@greenbytes.de> wrote:
>>
>> It looks like we stumbled upon an issue in https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the life times of our backend connections.
>>
>> When a frontend connection causes a backend request and drops, our backend connection only notifies the loss when it attempts to pass some data. In normal http response processing, this is not an issue since response chunks are usually coming in quite frequently. Then the proxied connection will fail to pass it to an aborted frontend connection and cleanup will occur.
>>
>> However, with such modern shenanigans such as Server Side Events (SSE), the request is supposed to be long running and will produce body chunks quite infrequently, like every 30 seconds or so. This leaves our proxy workers hanging in recv for quite a while and may lead to worker exhaustion.
>>
>> We can say SSE is a bad idea anyway, but that will probably not stop people from doing such crazy things.
>>
>> What other mitigations do we have?
>> - pthread_kill() will interrupt the recv and probably make it fail
>> - we can use shorter socket timeouts on backend and check r->connection status in between
>> - ???
>
>
> In trunk the tunnelling side of mod_proxy_http can go async and get
> called back for activity on either side by asking Event to watch both
> sockets.


How does that work, actually? Do we have an example somewhere?

> I'm not sure how browsers treat the SSE connection, can it ever have a
> subsequent request? If not, maybe we could see the SSE Content-Type
> and shoehorn it into the tunneling (figuring out what to do with
> writes from the client, backport the event and async tunnel stuff?)

I don't think they will do a subsequent request in the HTTP/1.1 sense,
meaning they'll close their H1 connection and open a new one. In H2 land,
the request is just a virtual "secondary" connection away.

But changing behaviour based on the content type seems inadequate. When
the server proxies applications (like uwsgi), the problem may also happen
to requests that are slow to produce responses.

To DoS such a setup, where a proxied response takes n seconds, you'd need
total_workers / n aborted requests per second (e.g. with 400 workers and
n = 30, roughly 14 aborted requests per second would tie up all workers).
In HTTP/1.1 those would all be separate connections and maybe noticeable
from a supervisor, but in H2 this could all happen on the same TCP
connection (although our h2 implementation has some protection against
abusive client behaviour).

A general solution to the problem would therefore be valuable, imo.

We should think about solving this in the context of mpm_event, which
I believe is the recommended production setup and merits our efforts.

If mpm_event could make the link between one connection and another,
like frontend to backend, it could wake up backends on a frontend
termination. Do you agree, Yann?

Could this be as easy as adding another "conn_rec *context" field
in conn_rec that tracks this?

- Stefan
Re: backend connections life times
On Thu, Jul 1, 2021 at 10:15 AM Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:
>
> > Am 30.06.2021 um 18:01 schrieb Eric Covener <covener@gmail.com>:
> >
> > On Wed, Jun 30, 2021 at 11:46 AM Stefan Eissing
> > <stefan.eissing@greenbytes.de> wrote:
> >>
> >> It looks like we stumbled upon an issue in https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the life times of our backend connections.
> >>
> >> When a frontend connection causes a backend request and drops, our backend connection only notifies the loss when it attempts to pass some data. In normal http response processing, this is not an issue since response chunks are usually coming in quite frequently. Then the proxied connection will fail to pass it to an aborted frontend connection and cleanup will occur.
> >>
> >> However, with such modern shenanigans such as Server Side Events (SSE), the request is supposed to be long running and will produce body chunks quite infrequently, like every 30 seconds or so. This leaves our proxy workers hanging in recv for quite a while and may lead to worker exhaustion.
> >>
> >> We can say SSE is a bad idea anyway, but that will probably not stop people from doing such crazy things.
> >>
> >> What other mitigations do we have?
> >> - pthread_kill() will interrupt the recv and probably make it fail
> >> - we can use shorter socket timeouts on backend and check r->connection status in between
> >> - ???

Can mod_proxy_http2 do better here? I suppose we lose all
relationships in h2->h1 and then h1->h2, asking just in case..

> >
> >
> > In trunk the tunnelling side of mod_proxy_http can go async and get
> > called back for activity on either side by asking Event to watch both
> > sockets.
>
>
> How does that work, actually? Do we have an example somewhere?

This is the ap_proxy_tunnel_create() and ap_proxy_tunnel_run()
called/used by mod_proxy_http for Upgrade(d) protocols.

I'm thinking of improving this interface to have a hook called in
ap_proxy_transfer_between_connections() with the data being forwarded
from one side to the other (in/out connection); the hook could
decide to let the data pass, or retain it, and/or switch to
speculative mode, and/or remove/add one side/sense from the pollset,
or abort, or.. The REALLY_LAST hook would be something like the
existing ap_proxy_buckets_lifetime_transform().

The hooks would each be responsible for their connection's state;
mod_proxy_http could then be implemented fully async in a callback,
but I suppose mod_h2 could hook itself in there too if it has something
to care about.
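
To sketch the idea (these names and types are purely illustrative,
nothing like this exists in httpd today): the hook would see each
brigade about to be forwarded between the two connections and return
a verdict.

#include "httpd.h"
#include "apr_buckets.h"

/* Hypothetical shape of the proposed transfer hook (illustrative only,
 * not an existing httpd API). */
typedef enum {
    PROXY_TRANSFER_PASS,        /* let the data through unchanged */
    PROXY_TRANSFER_RETAIN,      /* hold the data back for now */
    PROXY_TRANSFER_SPECULATIVE, /* switch this side to speculative mode */
    PROXY_TRANSFER_ABORT        /* abort the tunneling loop */
} proxy_transfer_verdict;

typedef proxy_transfer_verdict (*proxy_transfer_hook_fn)(
        request_rec *r,
        conn_rec *c_in,           /* side the data came from */
        conn_rec *c_out,          /* side it would be forwarded to */
        apr_bucket_brigade *bb);  /* the data in transit */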

>
> > I'm not sure how browsers treat the SSE connection, can it ever have a
> > subsequent request? If not, maybe we could see the SSE Content-Type
> > and shoehorn it into the tunneling (figuring out what to do with
> > writes from the client, backport the event and async tunnel stuff?)
>
> I don't think they will do a subsequent request in the HTTP/1.1 sense,
> meaning they'll close their H1 connection and open a new one. In H2 land,
> the request connection is a virtual "secondary" one away.

The issue I see here for inbound h2 is that the tunneling loop needs
something to poll() on both sides, and there is no socket on the h2
slave connections to do that.. How would we poll an h2 stream, with a
pipe or something?

>
> But changing behaviour based on the content type seems inadequate. When
> the server proxies applications (like uwsgi), the problem may also happen
> to requests that are slow producing responses.
>
> To DoS such a setup, where a proxied response takes n seconds, you'd need
> total_workers / n aborted requests per second. In HTTP/1.1 that would
> all be connections and maybe noticeable from a supervisor, but in H2 this
> could happen all on the same tcp connection (although our h2 implementation
> has some protection against abusive client behaviour).
>
> A general solution to the problem would therefore be valuable, imo.

The general/generic solution for anything proxy could be the tunneling
loop, a bit like a proxy_tcp (or proxy_transport) module to hook into.

>
> We should think about solving this in the context of mpm_event, which
> I believe is the production recommended setup that merits our efforts.

Yes, the tunneling loop stops and the poll()ing is deferred to MPM
event (the ap_hook_mpm_register_poll_callback*() API) when nothing
comes from either side for an AsyncDelay.

>
> If mpm_event could make the link between one connection to another,
> like frontend to backend, it could wake up backends on a frontend
> termination. Do you agree, Yann?

Absolutely, but there's more work to be done to get there :)

Also, is this kind of architecture what we really want?
Ideas, criticisms and discussions welcome!

>
> Could this be as easy as adding another "conn_rec *context" field
> in conn_rec that tracks this?

Tracking some connection close (at the transport level) on the client
side to "abort" the transaction is not enough; a connection can be
half-closed and still expect some response back, for instance.

We want something that says abort (like h2 RST_STREAM); there is no such
thing in transactional HTTP/1 (AFAICT), which is why mod_h2 would need to
hook into the tunnel if it wanted to abort the loop (IIUC).


Cheers;
Yann.
Re: backend connections life times
> Am 01.07.2021 um 14:16 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>
> On Thu, Jul 1, 2021 at 10:15 AM Stefan Eissing
> <stefan.eissing@greenbytes.de> wrote:
>>
>>> Am 30.06.2021 um 18:01 schrieb Eric Covener <covener@gmail.com>:
>>>
>>> On Wed, Jun 30, 2021 at 11:46 AM Stefan Eissing
>>> <stefan.eissing@greenbytes.de> wrote:
>>>>
>>>> It looks like we stumbled upon an issue in https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the life times of our backend connections.
>>>>
>>>> When a frontend connection causes a backend request and drops, our backend connection only notifies the loss when it attempts to pass some data. In normal http response processing, this is not an issue since response chunks are usually coming in quite frequently. Then the proxied connection will fail to pass it to an aborted frontend connection and cleanup will occur.
>>>>
>>>> However, with such modern shenanigans such as Server Side Events (SSE), the request is supposed to be long running and will produce body chunks quite infrequently, like every 30 seconds or so. This leaves our proxy workers hanging in recv for quite a while and may lead to worker exhaustion.
>>>>
>>>> We can say SSE is a bad idea anyway, but that will probably not stop people from doing such crazy things.
>>>>
>>>> What other mitigations do we have?
>>>> - pthread_kill() will interrupt the recv and probably make it fail
>>>> - we can use shorter socket timeouts on backend and check r->connection status in between
>>>> - ???
>
> Can mod_proxy_http2 do better here? I suppose we lose all
> relationships in h2->h1 and then h1->h2, asking just in case..

I have not tested, but my guess is that it goes into a blocking read on the backend as well, since there is nothing it wants to send when a response body is incoming.

>>>
>>>
>>> In trunk the tunnelling side of mod_proxy_http can go async and get
>>> called back for activity on either side by asking Event to watch both
>>> sockets.
>>
>>
>> How does that work, actually? Do we have an example somewhere?
>
> This is the ap_proxy_tunnel_create() and ap_proxy_tunnel_run()
> called/used by mod_proxy_http for Upgrade(d) protocols.
>
> I'm thinking to improve this interface to have a hook called in
> ap_proxy_transfer_between_connections() with the data being forwarded
> from one side to the other (in/out connection), and the hook could
> decide to let the data pass, or retain them, and/or switch to
> speculative mode, and/or remove/add one side/sense from the pollset,
> or abort, or.. The REALLY_LAST hook would be something like the
> existing ap_proxy_buckets_lifetime_transform().
>
> The hook(s) would be responsible (each) of their connections' states,
> mod_proxy_http could then be implemented fully async in a callback,
> but I suppose mod_h2 could hook itself there too if it has something
> to care about.

Just had a glimpse and it looks interesting, not only for "real" backends but maybe also for handling h2 workers from a main connection. See below.

>
>>
>>> I'm not sure how browsers treat the SSE connection, can it ever have a
>>> subsequent request? If not, maybe we could see the SSE Content-Type
>>> and shoehorn it into the tunneling (figuring out what to do with
>>> writes from the client, backport the event and async tunnel stuff?)
>>
>> I don't think they will do a subsequent request in the HTTP/1.1 sense,
>> meaning they'll close their H1 connection and open a new one. In H2 land,
>> the request connection is a virtual "secondary" one away.
>
> The issue I see here for inbound h2 is that the tunneling loop needs
> something to poll() on both sides, and there is no socket on the h2
> slave connections to do that.. How to poll a h2 stream, pipe or
> something?

More of a "something". The plan to switch the current polling to something
pipe-based has long been stalled, mainly due to lack of time, but also for
want of a bright idea of how the server in general should handle such
constructs.

>
>>
>> But changing behaviour based on the content type seems inadequate. When
>> the server proxies applications (like uwsgi), the problem may also happen
>> to requests that are slow producing responses.
>>
>> To DoS such a setup, where a proxied response takes n seconds, you'd need
>> total_workers / n aborted requests per second. In HTTP/1.1 that would
>> all be connections and maybe noticeable from a supervisor, but in H2 this
>> could happen all on the same tcp connection (although our h2 implementation
>> has some protection against abusive client behaviour).
>>
>> A general solution to the problem would therefore be valuable, imo.
>
> The general/generic solution for anything proxy could be the tunneling
> loop, a bit like a proxy_tcp (or proxy_transport) module to hook to.
>
>>
>> We should think about solving this in the context of mpm_event, which
>> I believe is the production recommended setup that merits our efforts.
>
> Yes, the tunneling loop stops and the poll()ing is deferred to MPM
> event (the ap_hook_mpm_register_poll_callback*() API) when nothing
> comes from either side for an AsyncDelay.
>
>>
>> If mpm_event could make the link between one connection to another,
>> like frontend to backend, it could wake up backends on a frontend
>> termination. Do you agree, Yann?
>
> Absolutely, but there's more work to be done to get there :)
>
> Also, is this kind of architecture what we really want?
> Ideas, criticisms and discussions welcome!
>
>>
>> Could this be as easy as adding another "conn_rec *context" field
>> in conn_rec that tracks this?
>
> Tracking some connection close (at transport level) on the client side
> to "abort" the transaction is not enough, a connection can be
> half-closed and still want some response back for instance.
>
> We want something that says abort (like h2 RST_STREAM), no such thing
> in transactional HTTP/1 (AFAICT), that's why mod_h2 would need to hook
> in the tunnel if it wanted to abort the loop (IIUC).

Makes sense. There is also the H2 stream state, which goes into
HALF_CLOSED_REMOTE when the request has been fully sent and a response
is expected until the server closes its side of the stream.

The TLS CLOSE_NOTIFY would be an equivalent signal available on an
HTTP/1.1 https: connection, I assume.

Cheers, Stefan
>
>
> Cheers;
> Yann.
Re: backend connections life times
Coming back to this discussion, starting at the head because it has become a bit nested, let me try to summarise:

Fact: when the client of a proxied http request aborts the connection (as in causing a c->aborted somehow), mod_proxy_http only reacts to this when writing (parts of) a response. How long that takes depends on the responsiveness of the backend. Long delays commonly come from an expensive request (long time to compute) or from a long-running request (as in Server-Sent Events, SSE).

Even disregarding SSE and opinions about its design, we could reduce resource waste in our server if we could eliminate those delays.

In mod_proxy_wstunnel we have to monitor the frontend and backend connections simultaneously due to the nature of the protocol, so there the delay does not happen. This uses a pollset in 2.4.x, and in trunk Yann wrapped that into the ap_proxy_tunnel_* functions.

It seems that ap_proxy_tunnel_* should be usable in mod_proxy_http as well, especially when waiting for the status line and when streaming the response body. At least when the frontend connection is HTTP/1.1...

Alas, for HTTP/2 this will not do the trick, since H2 secondary connections do not really have their own socket to poll. That is a long-open issue which needs addressing. It is not that easy, but it would certainly be beneficial to have a pipe/socketpair through which the H2 main connection and its workers can at least notify each other, if not transfer the actual data.

Another approach could be to "stutter" the blocking backend reads in mod_proxy_http with a 5-second socket timeout or so, just to check frontend->aborted and read again. That might be a minimum-effort approach for the short term.
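
In rough code, the idea would be something like this (an illustrative sketch only, not actual mod_proxy_http code; error handling and the real plumbing are omitted):

#include "httpd.h"            /* conn_rec */
#include "apr_network_io.h"
#include "apr_time.h"
#include "apr_errno.h"

static apr_status_t stuttered_recv(apr_socket_t *backend_sock,
                                   conn_rec *frontend,
                                   char *buf, apr_size_t *len)
{
    apr_size_t want = *len;
    apr_status_t rv;

    /* wake up every 5 seconds instead of blocking for the full timeout */
    rv = apr_socket_timeout_set(backend_sock, apr_time_from_sec(5));
    if (rv != APR_SUCCESS) {
        return rv;
    }

    for (;;) {
        *len = want;
        rv = apr_socket_recv(backend_sock, buf, len);
        if (!APR_STATUS_IS_TIMEUP(rv)) {
            return rv;                  /* data, EOF or a real error */
        }
        if (frontend->aborted) {
            return APR_ECONNABORTED;    /* client is gone, stop waiting */
        }
        /* backend still quiet, frontend still (apparently) alive: retry */
    }
}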

Yann, did I get this right? Eric also suggested doing this only for certain content types, but that would not solve the slow-responsiveness case.

- Stefan


> Am 01.07.2021 um 16:06 schrieb Stefan Eissing <stefan.eissing@greenbytes.de>:
>
>
>
>> Am 01.07.2021 um 14:16 schrieb Yann Ylavic <ylavic.dev@gmail.com>:
>>
>> On Thu, Jul 1, 2021 at 10:15 AM Stefan Eissing
>> <stefan.eissing@greenbytes.de> wrote:
>>>
>>>> Am 30.06.2021 um 18:01 schrieb Eric Covener <covener@gmail.com>:
>>>>
>>>> On Wed, Jun 30, 2021 at 11:46 AM Stefan Eissing
>>>> <stefan.eissing@greenbytes.de> wrote:
>>>>>
>>>>> It looks like we stumbled upon an issue in https://bz.apache.org/bugzilla/show_bug.cgi?id=65402 which concerns the life times of our backend connections.
>>>>>
>>>>> When a frontend connection causes a backend request and drops, our backend connection only notifies the loss when it attempts to pass some data. In normal http response processing, this is not an issue since response chunks are usually coming in quite frequently. Then the proxied connection will fail to pass it to an aborted frontend connection and cleanup will occur.
>>>>>
>>>>> However, with such modern shenanigans such as Server Side Events (SSE), the request is supposed to be long running and will produce body chunks quite infrequently, like every 30 seconds or so. This leaves our proxy workers hanging in recv for quite a while and may lead to worker exhaustion.
>>>>>
>>>>> We can say SSE is a bad idea anyway, but that will probably not stop people from doing such crazy things.
>>>>>
>>>>> What other mitigations do we have?
>>>>> - pthread_kill() will interrupt the recv and probably make it fail
>>>>> - we can use shorter socket timeouts on backend and check r->connection status in between
>>>>> - ???
>>
>> Can mod_proxy_http2 do better here? I suppose we lose all
>> relationships in h2->h1 and then h1->h2, asking just in case..
>
> I have not tested, but my guess is that it goes into a blocking read on the backend as well, since there is nothing it wants to send when a response body is incoming.
>
>>>>
>>>>
>>>> In trunk the tunnelling side of mod_proxy_http can go async and get
>>>> called back for activity on either side by asking Event to watch both
>>>> sockets.
>>>
>>>
>>> How does that work, actually? Do we have an example somewhere?
>>
>> This is the ap_proxy_tunnel_create() and ap_proxy_tunnel_run()
>> called/used by mod_proxy_http for Upgrade(d) protocols.
>>
>> I'm thinking to improve this interface to have a hook called in
>> ap_proxy_transfer_between_connections() with the data being forwarded
>> from one side to the other (in/out connection), and the hook could
>> decide to let the data pass, or retain them, and/or switch to
>> speculative mode, and/or remove/add one side/sense from the pollset,
>> or abort, or.. The REALLY_LAST hook would be something like the
>> existing ap_proxy_buckets_lifetime_transform().
>>
>> The hook(s) would be responsible (each) of their connections' states,
>> mod_proxy_http could then be implemented fully async in a callback,
>> but I suppose mod_h2 could hook itself there too if it has something
>> to care about.
>
> Just had a glimpse and it looks interesting, not only for "real" backends but maybe also for handling h2 workers from a main connection. See below.
>
>>
>>>
>>>> I'm not sure how browsers treat the SSE connection, can it ever have a
>>>> subsequent request? If not, maybe we could see the SSE Content-Type
>>>> and shoehorn it into the tunneling (figuring out what to do with
>>>> writes from the client, backport the event and async tunnel stuff?)
>>>
>>> I don't think they will do a subsequent request in the HTTP/1.1 sense,
>>> meaning they'll close their H1 connection and open a new one. In H2 land,
>>> the request connection is a virtual "secondary" one away.
>>
>> The issue I see here for inbound h2 is that the tunneling loop needs
>> something to poll() on both sides, and there is no socket on the h2
>> slave connections to do that.. How to poll a h2 stream, pipe or
>> something?
>
> More "something". The plan to switch the current polling to something
> pipe based has long stalled. Mainly due to lack of time, but also missing
> a bright idea how the server in general should handle such constructs.
>
>>
>>>
>>> But changing behaviour based on the content type seems inadequate. When
>>> the server proxies applications (like uwsgi), the problem may also happen
>>> to requests that are slow producing responses.
>>>
>>> To DoS such a setup, where a proxied response takes n seconds, you'd need
>>> total_workers / n aborted requests per second. In HTTP/1.1 that would
>>> all be connections and maybe noticeable from a supervisor, but in H2 this
>>> could happen all on the same tcp connection (although our h2 implementation
>>> has some protection against abusive client behaviour).
>>>
>>> A general solution to the problem would therefore be valuable, imo.
>>
>> The general/generic solution for anything proxy could be the tunneling
>> loop, a bit like a proxy_tcp (or proxy_transport) module to hook to.
>>
>>>
>>> We should think about solving this in the context of mpm_event, which
>>> I believe is the production recommended setup that merits our efforts.
>>
>> Yes, the tunneling loop stops and the poll()ing is deferred to MPM
>> event (the ap_hook_mpm_register_poll_callback*() API) when nothing
>> comes from either side for an AsyncDelay.
>>
>>>
>>> If mpm_event could make the link between one connection to another,
>>> like frontend to backend, it could wake up backends on a frontend
>>> termination. Do you agree, Yann?
>>
>> Absolutely, but there's more work to be done to get there :)
>>
>> Also, is this kind of architecture what we really want?
>> Ideas, criticisms and discussions welcome!
>>
>>>
>>> Could this be as easy as adding another "conn_rec *context" field
>>> in conn_rec that tracks this?
>>
>> Tracking some connection close (at transport level) on the client side
>> to "abort" the transaction is not enough, a connection can be
>> half-closed and still want some response back for instance.
>>
>> We want something that says abort (like h2 RST_STREAM), no such thing
>> in transactional HTTP/1 (AFAICT), that's why mod_h2 would need to hook
>> in the tunnel if it wanted to abort the loop (IIUC).
>
> Makes sense. There is also the H2 stream state which goes into
> HALF_CLOSED_REMOTE when the request is fully sent and a response
> is expected to come until the server closes its side of the stream.
>
> The TLS CLOSE_NOTIFY would be an equivalent signal available on a http/1.1
> https: connection, I assume.
>
> Cheers, Stefan
>>
>>
>> Cheers;
>> Yann.
Re: backend connections life times
> Another approach could be to "stutter" the blocking backend reads in mod_proxy_http with a 5 sec socket timeout or so, only to check frontend->aborted and read again. That might be a minimum effort approach for the short term.

Unfortunately I don't think you will find frontend->aborted set in
such a case (with HTTP/1.x at least). There is nobody watching while the
request is being processed. If you split up the long poll, you still
have to do _something_ with the frontend socket to discover that it's
unusable.
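
That "something" could be a 1-byte speculative, non-blocking read
through the frontend's input filters, roughly like this (an
illustrative helper, not existing code): EAGAIN means the client is
idle but still connected, EOF or an error means it is gone.

#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

static int frontend_is_gone(conn_rec *c, apr_pool_t *p)
{
    apr_bucket_brigade *bb = apr_brigade_create(p, c->bucket_alloc);
    apr_status_t rv;

    rv = ap_get_brigade(c->input_filters, bb, AP_MODE_SPECULATIVE,
                        APR_NONBLOCK_READ, 1);
    apr_brigade_destroy(bb);

    if (APR_STATUS_IS_EAGAIN(rv)) {
        return 0;                   /* no data pending, still connected */
    }
    return (rv != APR_SUCCESS);     /* EOF or error: treat as gone */
}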
Re: backend connections life times
On Tue, Jul 6, 2021 at 4:22 PM Stefan Eissing
<stefan.eissing@greenbytes.de> wrote:
>
> Coming back to this discussion, starting at the head because this has become a bit nested, I try to summarise:
>
> Fact: when the client of a proxied http request aborts the connection (as in causing a c->aborted somehow), mod_proxy_http only react to this when writing (parts of) a response. The time this takes depends on the responsiveness of the backend. Long delay can commonly come from invoking an expensive request (long time to compute) or from a long-running request (as in server-side-events, SSEs).

Right.

>
> Even disregarding SSEs and opinions about its design, we could reduce resource waste in our server when we can eliminate those delays.

We could let idle connections be handled by the MPM's listener thread
instead of holding a worker thread.

>
> In mod_proxy_wstunnel we have to monitor frontend and backend connection simultaneously due to the nature of the protocol and there the delay will not happen. This uses a pollset in 2.4.x and in trunk Yann wrapped that into ap_proxy_tunnel_* functions.

Both trunk and 2.4.x have used the same proxy tunneling mechanism/functions
since 2.4.48; mod_proxy_wstunnel is now an "empty shell" falling back
to mod_proxy_http (which is the one creating and starting the tunnel,
only for Upgrade(d) protocols so far).
In 2.4.x the tunnel still holds a worker thread for the lifetime of the
connections, while trunk goes one step further by allowing the
configuration of an AsyncDelay above which, when both the client and
backend connections are idle, their polling is deferred to MPM event
using the mpm_register_poll_callback_timeout hook mechanism (the
connections are handled by the registered callback from then on). The
handler then returns SUSPENDED and the worker thread is given back to
the MPM (to take on other work, like handling a new incoming connection
or running the callback that resumes the tunneling loop once the
connections are ready, and so on..).

>
> It seeems that ap_proxy_tunnel_* should be usable in mod_proxy_http as well.

It is already; see above.

> Especially when waiting for the status line and when streaming the response body. When the frontend connection is HTTP/1.1...

That part is indeed missing; for that we'd need to move the HTTP state
tracking and response parsing into a hook/callback run by the tunneling
loop for each chunk of request/response data or connection event
(possibly different hooks depending on the type of event).

>
> Alas, for HTTP/2, this will not do the trick since H2 secondary connections do not really have their own socket for polling. But that is a long-open issue which needs addressing. Not that easy, but certainly beneficial to have a PIPE socket pair where H2 main connection and workers can at least notify each other, if not transfer the actual data over.

The tunneling loop needs an fd on both sides for polling, but once any
fd triggers (is ready) the loop uses the usual input/output filter
chains to read/write the bucket brigades.

So possibly two pipes per stream could do it for h2<->h1:
- the main connection writes incoming data to istream[1] to make them
available for the secondary connection on istream[0]
- the secondary connection writes outgoing data to ostream[1] to make
them available for the main connection on ostream[0]

The advantage of pipes (over a socketpair or anything else) is that they
are available on all platforms and already usable with an apr_pollset.
That's two fds per pipe, but with the above they could be reused for
successive streams, provided all the data has been consumed.
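
To sketch that with plain APR (h2_stream_pipes and the helper names are
made up for illustration, not existing mod_h2 code): two non-blocking
pipes per stream, whose read ends can be watched in a pollset exactly
like sockets.

#include "apr_file_io.h"
#include "apr_poll.h"

typedef struct {
    apr_file_t *istream[2]; /* [0] read end (secondary), [1] write end (main) */
    apr_file_t *ostream[2]; /* [0] read end (main), [1] write end (secondary) */
} h2_stream_pipes;

static apr_status_t stream_pipes_create(h2_stream_pipes *sp, apr_pool_t *p)
{
    apr_status_t rv;

    rv = apr_file_pipe_create_ex(&sp->istream[0], &sp->istream[1],
                                 APR_FULL_NONBLOCK, p);
    if (rv != APR_SUCCESS) {
        return rv;
    }
    return apr_file_pipe_create_ex(&sp->ostream[0], &sp->ostream[1],
                                   APR_FULL_NONBLOCK, p);
}

/* The polling side then watches a read end just like it would a socket. */
static apr_status_t watch_read_end(apr_pollset_t *pollset, apr_file_t *rd,
                                   void *baton, apr_pool_t *p)
{
    apr_pollfd_t pfd = { 0 };

    pfd.p = p;
    pfd.desc_type = APR_POLL_FILE;
    pfd.desc.f = rd;
    pfd.reqevents = APR_POLLIN;
    pfd.client_data = baton;
    return apr_pollset_add(pollset, &pfd);
}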

With this, most of the missing work would be on the main connection
side, I suppose: writing incoming data as it arrives is probably easy,
but some kind of listener is needed for the responses on the multiple
ostream[0] ends, to multiplex them onto the main connection.
For istream at least, I don't think we can write raw incoming data as-is
though; more likely a tuple like (type, length, data) for each chunk, so
as to be able to pass meta events like RST_STREAM or the like.
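
For instance, building on the pipe sketch above (again purely
illustrative, nothing like this exists yet; a real implementation would
also have to deal with EAGAIN on the non-blocking pipe):

typedef enum { FRAME_DATA = 0, FRAME_RST = 1, FRAME_EOS = 2 } frame_type;

typedef struct {
    apr_uint32_t type;
    apr_uint32_t length;  /* payload bytes following the header */
} frame_head;

static apr_status_t frame_write(apr_file_t *wr, frame_type type,
                                const char *data, apr_uint32_t len)
{
    frame_head head;
    apr_size_t written;
    apr_status_t rv;

    /* same-process pipe, so writing the raw struct layout is fine here */
    head.type = (apr_uint32_t)type;
    head.length = len;
    rv = apr_file_write_full(wr, &head, sizeof(head), &written);
    if (rv == APR_SUCCESS && len > 0) {
        rv = apr_file_write_full(wr, data, len, &written);
    }
    return rv;
}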

On the h1 / secondary connection side all this becomes kind of
transparent; all we need is a pair of core secondary input/output
filters able to read from / write to the pipes (extracting and
interpreting the tuples as needed), plus small changes here and
there to accommodate ap_get_conn_socket(), which could actually be
a pipe now.

>
> Another approach could be to "stutter" the blocking backend reads in mod_proxy_http with a 5 sec socket timeout or so, only to check frontend->aborted and read again. That might be a minimum effort approach for the short term.

Like Eric, I don't think this can work; we are unlikely to have
->aborted set on read unless the connection is really reset by the peer.
So with a half-closed connection we still owe a response to the frontend..

>
> Yann, did I get this right?

Hopefully I made sense above with the picture of what could be done and
what remains to be done ;)

> Eric had also the suggestion to only do this on certain content types, but that would not solve slow responsiveness.

The current proxy tunneling mechanism in mod_proxy_http is triggered
by an "Upgrade: <proto>" header requested by the client and a "101
Upgrade" accept/reply from the backend. I think what Eric proposed is
that we also initiate the tunnel based on some configured
Content-Type(s), or maybe some r->notes set by mod_h2 (if it can
determine from the start that it's relevant).
The issue I see here is that we probably won't have anything from the
backend to rely on saying that tunneling is OK (like the "101
Upgrade"), and thus a client forging a Content-Type could open a tunnel
and do whatever it wants until the connection is closed (no HTTP
parsing/checks anymore). Plus, with h2 it wouldn't be enough; the
h2<->h1 pipes from above are needed for the tunneling loop to work.


Regards;
Yann.