RFC 3875, REQUEST_URI and PATH_INFO
Per bobtfish's suggestion, I'm writing about the RFC 3875 paragraph of
Catalyst/Engine/CGI.pm (rev 13152, around line 152).

This new code breaks the existing mapping REQUEST_URI +
PATH_INFO => $base_uri, which dispatch relies on.

In particular, I have a set of mod_rewrite rules whose requests are
being misdirected by this new code.

http://paste.scsys.co.uk/42123

If I understand the situation correctly, bobtfish has broken my
redirects in an attempt to address issues surrounding the encoding of
slash-containing query parameters. I have suggested to him that his
changes are hard to prove correct without a good spec for the above
mapping, and he's agreed either to remove the code or to make it off
by default.

For completeness, I've included the bits I commented out (and the debug
logging I added) while tracking down the problem.

Index: lib/Catalyst/Engine/CGI.pm
===================================================================
--- lib/Catalyst/Engine/CGI.pm (revision 13152)
+++ lib/Catalyst/Engine/CGI.pm (working copy)
@@ -156,6 +156,11 @@
 # Here we try to resurrect the original encoded URI from REQUEST_URI.
 my $path_info = $ENV{PATH_INFO};
 if (my $req_uri = $ENV{REQUEST_URI}) {
+
+=pod
+
+disabled by djs
+
 $req_uri =~ s/^\Q$base_path\E//;
 $req_uri =~ s/\?.*$//;
 if ($req_uri && $req_uri ne '/') {
@@ -169,6 +174,9 @@
 if $path_info_part;
 $path_info = $req_uri;
 }
+
+=cut
+
 }
 
 # set the request URI
@@ -199,7 +207,9 @@
 $base_path .= '/' unless $base_path =~ m{/$};
 
 my $base_uri = $scheme . '://' . $host . $base_path;
-
+ use Data::Dumper;
+ $c->log->debug(Data::Dumper->Dump([$ENV{REQUEST_URI},$ENV{SCRIPT_NAME},$ENV{PATH_INFO},$path, $base_path],
+ [qw(REQUEST_URI SCRIPT_NAME PATH_INFO path base_path )]));
 $c->request->base( bless \$base_uri, $uri_class );
 }
 

Cheers!

--
Danny Sadinoff
danny@sadinoff.com

_______________________________________________
Catalyst-dev mailing list
Catalyst-dev@lists.scsys.co.uk
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst-dev

Re: RFC 3875, REQUEST_URI and PATH_INFO
On 21 Apr 2010, at 23:30, <danny-catalyst-dev@sadinoff.com> wrote:

>
> per bobtfish, I'm writing about the RFC3875 paragraph of
> Catalyst/Engine/CGI.pm rev 13152 around line 152.
>
> This new bit of code is harming the current mapping REQUEST_URI +
> PATH_INFO => $base_uri, which is used for dispatch.
>
> In particular, I've got a set of mod_rewrite config clauses, which are
> getting misdirected by this new code.
>
> http://paste.scsys.co.uk/42123
>
> If I understand the situation correctly, bobtfish has broken my
> redirects in an attempt to address issues surrounding the encoding of
> slash-containing query parameters.

That'd be me then :)

Sorry about the delay in getting to this - I've been thinking about it
fairly deeply for a while, and I've come to the conclusion that there's
no easy way to solve it...

To make this clear to everyone (hopefully): previously, Catalyst
ignored the REQUEST_URI environment variable entirely, and instead
constructed the request base ($c->req->base), URI ($c->req->uri) and
path ($c->req->path) from the combination of SCRIPT_NAME and
PATH_INFO.

However, PATH_INFO is _always_ decoded, so %2F arrives as /, and a
path like /foo%2Fbar becomes indistinguishable from /foo/bar. You
can't possibly get this right using PATH_INFO alone.
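To make the %2F problem concrete, here is a small Perl sketch of the
decoding step. The decode_path helper is hypothetical: it only mimics
what the server does to percent-escapes before setting PATH_INFO, and
is not Catalyst or Apache code.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes the way the server does before
# it sets PATH_INFO (illustration only, not Catalyst or Apache code).
sub decode_path {
    my ($raw) = @_;
    (my $decoded = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $decoded;
}

my $one_segment  = decode_path('/foo%2Fbar');   # one segment, "foo/bar"
my $two_segments = decode_path('/foo/bar');     # two segments, "foo" and "bar"

print "$one_segment\n";     # /foo/bar
print "$two_segments\n";    # /foo/bar

my $verdict = $one_segment eq $two_segments ? 'indistinguishable' : 'distinct';
print "$verdict\n";         # indistinguishable
```

Once decoded, nothing downstream can tell which of the two requests was
actually made.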

> I have suggested to him that his
> changes are hard to prove correct without a good spec for the above
> mapping, and he's agreed to either remove the code or configure it off
> by default.

It's not about any spec - it's down to what web servers actually do
in practice.

For example, suppose your app is at /cgi-bin/myapp.cgi, the request
path is /foo%2Fbar, and mod_rewrite or mod_alias is used to map / onto
/cgi-bin/myapp.cgi. Then REQUEST_URI will reflect the original path
(/foo%2Fbar), SCRIPT_NAME will be /cgi-bin/myapp.cgi, and PATH_INFO
will be the decoded /foo/bar (so SCRIPT_NAME . PATH_INFO is
/cgi-bin/myapp.cgi/foo/bar).

This means that if you use REQUEST_URI and SCRIPT_NAME, the request
base cannot be determined correctly, because REQUEST_URI doesn't start
with SCRIPT_NAME. If you use PATH_INFO and SCRIPT_NAME (as we used to),
everything works as expected, but you _cannot_ handle %2F correctly...
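A rough Perl sketch of the two derivations, using the rewrite example
above. The environment values are assumptions (in particular, PATH_INFO
is assumed to be the decoded remainder /foo/bar), and from_path_info and
from_request_uri are hypothetical helpers, not Catalyst's actual code.

```perl
use strict;
use warnings;

# Assumed environment for the rewrite example: / is mapped onto the script.
my %env = (
    REQUEST_URI => '/foo%2Fbar',
    SCRIPT_NAME => '/cgi-bin/myapp.cgi',
    PATH_INFO   => '/foo/bar',            # already decoded by the server
);

# Old behaviour: base from SCRIPT_NAME, path from the decoded PATH_INFO.
sub from_path_info {
    my ($env) = @_;
    (my $path = $env->{PATH_INFO} // '') =~ s{^/}{};
    return ($env->{SCRIPT_NAME} . '/', $path);
}

# REQUEST_URI behaviour: keeps %2F, but only works when the query-less
# REQUEST_URI really starts with SCRIPT_NAME.
sub from_request_uri {
    my ($env) = @_;
    my ($uri) = split /\?/, $env->{REQUEST_URI} // '', 2;
    $uri //= '';
    my $script = $env->{SCRIPT_NAME};
    return unless $uri =~ m{^\Q$script\E(?:/(.*))?\z}s;
    return ($script . '/', $1 // '');
}

my ($base, $path) = from_path_info(\%env);
print "PATH_INFO:   base=$base path=$path\n";
# PATH_INFO:   base=/cgi-bin/myapp.cgi/ path=foo/bar   (%2F lost)

my @from_uri = from_request_uri(\%env);
my $msg = @from_uri
    ? "REQUEST_URI: base=$from_uri[0] path=$from_uri[1]\n"
    : "REQUEST_URI: cannot determine base\n";
print $msg;
# REQUEST_URI: cannot determine base
```

Without the rewrite (REQUEST_URI = /cgi-bin/myapp.cgi/foo%2Fbar), the
second helper would return the same base and the still-encoded path
foo%2Fbar.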

So basically, you're damned if you do and damned if you don't.

Given that many people rely on being able to map arbitrary paths into
the application using mod_(rewrite|alias|ssi), I think we have to
revert to the previous behavior by default, and make using REQUEST_URI
a configuration option.
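A minimal sketch of that policy, assuming a plain config hash. The key
'rfc3875_path' is the provisional name from the branch; the hash layout
and the app_path helper are illustrative assumptions, not the branch's
implementation. REQUEST_URI is only used when the option is on and the
URI really sits under SCRIPT_NAME; otherwise it falls back to PATH_INFO.

```perl
use strict;
use warnings;

# Hypothetical config hash; 'rfc3875_path' is the provisional option name
# from the branch, defaulting to off (the old PATH_INFO behaviour).
my %config = ( rfc3875_path => 0 );

# Return the application path (no leading '/') for a CGI environment.
sub app_path {
    my ($config, $env) = @_;
    if ($config->{rfc3875_path} && defined $env->{REQUEST_URI}) {
        # Opt-in: use the raw, still-encoded path from REQUEST_URI, but
        # only if it really lives under SCRIPT_NAME.
        my $script = $env->{SCRIPT_NAME} // '';
        my ($uri) = split /\?/, $env->{REQUEST_URI}, 2;
        $uri //= '';
        return $1 // '' if $uri =~ m{^\Q$script\E(?:/(.*))?\z}s;
    }
    # Default, and fallback for rewritten URLs: the decoded PATH_INFO.
    (my $path = $env->{PATH_INFO} // '') =~ s{^/}{};
    return $path;
}

my %env = (
    REQUEST_URI => '/cgi-bin/myapp.cgi/foo%2Fbar?x=1',
    SCRIPT_NAME => '/cgi-bin/myapp.cgi',
    PATH_INFO   => '/foo/bar',
);
my %on = ( rfc3875_path => 1 );

print 'off:       ', app_path(\%config, \%env), "\n";    # foo/bar
print 'on:        ', app_path(\%on, \%env), "\n";        # foo%2Fbar

# Behind mod_rewrite the prefix check fails, so PATH_INFO is used again.
my %rewritten = (%env, REQUEST_URI => '/foo%2Fbar');
print 'rewritten: ', app_path(\%on, \%rewritten), "\n";  # foo/bar
```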

I have the change in a branch now:
http://dev.catalyst.perl.org/repos/Catalyst/Catalyst-Runtime/5.80/branches/fix_request_uri

I've attached the current diff of the branch to this mail. Could
someone please review it?

I also plan to back out some of the heuristics I added to the
REQUEST_URI handling after first finding the issues, so that the
handling stays simple. We also need a way to have our cake and eat it
too: telling Catalyst (via config) the place(s) where the application
is based, so that it can both get $c->req->base correct AND use
REQUEST_URI to handle %2F.
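One way that "tell Catalyst where the app is based" idea could look,
sketched in Perl. The @configured_bases list and split_request_uri are
hypothetical (no such setting exists yet); the point is that the base
comes from config rather than from SCRIPT_NAME, so the encoded remainder
of REQUEST_URI can be kept intact even behind mod_rewrite.

```perl
use strict;
use warnings;

# Hypothetical setting: the mount point(s) the app is served under, each
# ending in '/', supplied by config instead of being guessed from SCRIPT_NAME.
my @configured_bases = ('/', '/cgi-bin/myapp.cgi/');

# Split the raw REQUEST_URI into (base, still-encoded path) using the
# longest configured base that prefixes it; returns () if none match.
sub split_request_uri {
    my ($request_uri, @bases) = @_;
    my ($uri) = split /\?/, $request_uri // '', 2;
    $uri //= '';
    for my $base (sort { length($b) <=> length($a) } @bases) {
        return ($base, substr($uri, length($base)))
            if index($uri, $base) == 0;
    }
    return;
}

for my $req ('/foo%2Fbar?x=1', '/cgi-bin/myapp.cgi/foo%2Fbar') {
    my ($base, $path) = split_request_uri($req, @configured_bases);
    print "$req => base=$base path=$path\n";
}
# /foo%2Fbar?x=1 => base=/ path=foo%2Fbar
# /cgi-bin/myapp.cgi/foo%2Fbar => base=/cgi-bin/myapp.cgi/ path=foo%2Fbar
```

Because each configured base ends in '/', a prefix match always falls on
a path-segment boundary; trying the longest base first lets a more
specific mount point win over '/'.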

However, all of this can happen later. The main issues blocking the
merge are (a) the lack of tests for both cases (config option on and
off), which I'll write very shortly (but probably not tonight), and
(b) the name I've given the config option ('rfc3875_path'), which is
entirely rubbish. Can someone please suggest something better?