Mailing List Archive

pre_translate_name hook vs location/directory walk (was: svn commit: r1879079)
> Allow for URI-path pre_translate_name before (and/or instead of) decoding.
>
> Only if no hook takes "ownership" of the URI (returning OK), apply
> percent decoding for the rest of request handling. Otherwise r->uri remains
> encoded meaning that further location/directory/file/if/.. sections (walks)
> should that into account.

This change allows a module to avoid URI decoding in
pre_translate_name, such that r->uri remains encoded for the whole
request handling.

For instance (same series), mod_proxy will use that hook when
"ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
matches. The decoding was already disabled (before this commit) for
the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
an early stage, but now it "works" for the reverse-proxy case too.

This means that when "ProxyMappingDecoded off" is configured (the
default is still "on", though), for instance a <Location URI-path, or
a fixup RewriteRule or a non-early RequestHeader rule will have to be
expressed with the encoded form since reserved characters will not be
decoded as before (note that unreserved characters will always be
decoded anyway).

I don't know how much it can break existing configurations, and wonder
if we should have a core directive that still controls whether r->uri
should be decoded, regardless of ProxyMappingDecoded or any further
module pre_translate_name hook with its own directives.

For mod_proxy this is not really an issue to have r->uri modified
after the (pre_)translate phase, because the handler will later work
on r->filename regardless (the "proxy:..." set by proxy_detect or
proxy_trans), so maybe we should simply always decode r->uri after the
pre_translate hooks have run. Though I find it easier in a proxy
context to match a URI-path (LocationMatch, RewriteRule..) in its
encoded form, and not worry about original vs decoded separators. That
may be just my preference, but a new core directive looks sane to me..

Thoughts?


Also, since we are at it, I'm on the fence about running the
pre_translate hooks before quick handlers (thus before
ap_process_request_internal() too), good or bad idea?


Regards;
Yann.
Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079) [ In reply to ]
On Mon, Jun 22, 2020 at 2:28 PM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> Also, since we are at it, I'm on the fence about running the
> pre_translate hooks before quick handlers (thus before
> ap_process_request_internal() too), good or bad idea?

Sorry, I meant running the *normalization* (including decoding
unreserved characters) before the quick handlers.
Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079) [ In reply to ]
On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic <ylavic.dev@gmail.com> wrote:
>
> > Allow for URI-path pre_translate_name before (and/or instead of) decoding.
> >
> > Only if no hook takes "ownership" of the URI (returning OK), apply
> > percent decoding for the rest of request handling. Otherwise r->uri remains
> > encoded meaning that further location/directory/file/if/.. sections (walks)
> > should that into account.
>
> This change allows a module to avoid URI decoding in
> pre_translate_name, such that r->uri remains encoded for the whole
> request handling.
>
> For instance (same series), mod_proxy will use that hook when
> "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
> matches. The decoding was already disabled (before this commit) for
> the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
> an early stage, but now it "works" for the reverse-proxy case too.
>
> This means that when "ProxyMappingDecoded off" is configured (the
> default is still "on", though), for instance a <Location URI-path, or
> a fixup RewriteRule or a non-early RequestHeader rule will have to be
> expressed with the encoded form since reserved characters will not be
> decoded as before (note that unreserved characters will always be
> decoded anyway).
>
> I don't know how much it can break existing configurations, and wonder
> if we should have a core directive that still controls whether r->uri
> should be decoded, regardless of ProxyMappingDecoded or any further
> module pre_translate_name hook with its own directives.
>
> For mod_proxy this is not really an issue to have r->uri modified
> after the (pre_)translate phase, because the handler will later work
> on r->filename regardless (the "proxy:..." set by proxy_detect or
> proxy_trans), so maybe we should simply always decode r->uri after the
> pre_translate hooks have run. Though I find it easier in a proxy
> context to match a URI-path (LocationMatch, RewriteRule..) in its
> encoded form, and not worry about original vs decoded separators. That
> may be just my preference, but a new core directive looks sane to me..
>
> Thoughts?

I have not followed too closely so take with a grain of salt / for the
sake of argument:

From an externals perspective (maybe being detached here is a benefit)
it might be better to have something like "ProxyUseOriginalURL" that
has the following caveats:
1. it's partially decoded
2. it is required for whatever the servlet mapping option is (iiuc?)
3. it will affect the string used in comparisons/substitutions for
<Location*>, Rewrite, <If>, (*CAUTION*)
- Some details is needed here about the partial decoding, for
example %2f or multiple leading slashes so "simple" context root
configs can be written without too much paranoia
4. If there are risks of re-injecting URL's with [PT] or per-dir
rewrites try to explain that it's a complex combination to worry
about.

I would not worry too much about "existing configs" as this is all
opt-in and it's probably a lost cause anyway.

I would instead give people new built-in vars in the important places
(rewrite, expr) to be able to match the partially decoded URI that the
proxy will be working on so a savvy user can safely interoperate
without monstrous regexes.
Re: pre_translate_name hook vs location/directory walk (was: svn commit: r1879079) [ In reply to ]
On Mon, Jun 22, 2020 at 2:57 PM Eric Covener <covener@gmail.com> wrote:
>
> On Mon, Jun 22, 2020 at 8:28 AM Yann Ylavic <ylavic.dev@gmail.com> wrote:
> >
> > > Allow for URI-path pre_translate_name before (and/or instead of) decoding.
> > >
> > > Only if no hook takes "ownership" of the URI (returning OK), apply
> > > percent decoding for the rest of request handling. Otherwise r->uri remains
> > > encoded meaning that further location/directory/file/if/.. sections (walks)
> > > should that into account.
> >
> > This change allows a module to avoid URI decoding in
> > pre_translate_name, such that r->uri remains encoded for the whole
> > request handling.
> >
> > For instance (same series), mod_proxy will use that hook when
> > "ProxyMappingDecoded off" is configured, and return OK if a ProxyPass
> > matches. The decoding was already disabled (before this commit) for
> > the forward-proxy case (ProxyRequests on) since r->proxyreq was set at
> > an early stage, but now it "works" for the reverse-proxy case too.
> >
> > This means that when "ProxyMappingDecoded off" is configured (the
> > default is still "on", though), for instance a <Location URI-path, or
> > a fixup RewriteRule or a non-early RequestHeader rule will have to be
> > expressed with the encoded form since reserved characters will not be
> > decoded as before (note that unreserved characters will always be
> > decoded anyway).
> >
> > I don't know how much it can break existing configurations, and wonder
> > if we should have a core directive that still controls whether r->uri
> > should be decoded, regardless of ProxyMappingDecoded or any further
> > module pre_translate_name hook with its own directives.
> >
> > For mod_proxy this is not really an issue to have r->uri modified
> > after the (pre_)translate phase, because the handler will later work
> > on r->filename regardless (the "proxy:..." set by proxy_detect or
> > proxy_trans), so maybe we should simply always decode r->uri after the
> > pre_translate hooks have run. Though I find it easier in a proxy
> > context to match a URI-path (LocationMatch, RewriteRule..) in its
> > encoded form, and not worry about original vs decoded separators. That
> > may be just my preference, but a new core directive looks sane to me..
> >
> > Thoughts?
>
> I have not followed too closely so take with a grain of salt / for the
> sake of argument:

Thanks, always welcome!

>
> From an externals perspective (maybe being detached here is a benefit)
> it might be better to have something like "ProxyUseOriginalURL" that

Yeah, much better name. Will use *URI which is somewhat closer to the
RFC's URI-path nomenclature.

> has the following caveats:
> 1. it's partially decoded
> 2. it is required for whatever the servlet mapping option is (iiuc?)

Correct, I'm a bit reluctant to issue servlet mapping on decoded URIs..

> 3. it will affect the string used in comparisons/substitutions for
> <Location*>, Rewrite, <If>, (*CAUTION*)
> - Some details is needed here about the partial decoding, for
> example %2f or multiple leading slashes so "simple" context root
> configs can be written without too much paranoia
> 4. If there are risks of re-injecting URL's with [PT] or per-dir
> rewrites try to explain that it's a complex combination to worry
> about.

Possibly we are better off decoding r->uri in any case then, and not
change its current semantics.
As I said it won't affect the mod_proxy pre_trans case since its
translation already occurred and was saved in r->filename.
If we later want to let the user work with either the decoded or
encoded URI then we will add r->encoded_uri or alike, with some
opacity and helpers to keep it in sync with r->uri. But that would
affect module writers too, that's why I was thinking more of a "core"
directive than ProxyUseOriginalURI, or no directive at all with
trunk/2.next/3 material only.

>
> I would not worry too much about "existing configs" as this is all
> opt-in and it's probably a lost cause anyway.
>
> I would instead give people new built-in vars in the important places
> (rewrite, expr) to be able to match the partially decoded URI that the
> proxy will be working on so a savvy user can safely interoperate
> without monstrous regexes.

The hard part here (IMHO) is to keep encoded_uri and decoded_uri in
sync, depending on whichever is updated/accessed by a module or e.g. a
RewriteRule. It may be too much of an API/ABI change for the simple
proxy encoded URI case.

For now we can probably live with r->uri being always
normalized/decoded as before (and <Location*, RewriteRule, ... working
as today), even though it does not allow for encoded control chars
matching or things like that, and requires to be careful with decoded
reserved chars (non default for slash fortunately) or other
separators, while still allowing to match them without a PhD in
regexes.
We can't simply have the best of both worlds I'm afraid..