Mailing List Archive

REPOST: URL decoding bugs in apache 0.8.3
This bug was present in 0.8.2, but doesn't seem to have been fixed for 0.8.3

There's a couple of places where unescape_url() is inappropriately called.

1. PATH_INFO gets URL decoded _again_ before PATH_TRANSLATED is found;
e.g.
http://server.com/cgi-bin/test-cgi/hello%2524

produces
PATH_INFO = /hello%24
PATH_TRANSLATED = /opt/httpd/htdocs/hello$

fix: new routine sub_req_lookup_path() for this case.

2. For internal redirects, the URL gets decoded twice.

e.g.
redir contains:
echo 'Location: /cgi-bin/test-cgi/byee%2524'
echo

then http://server.com/cgi-bin/redir
produces
PATH_INFO = /byee$
PATH_TRANSLATED = /opt/httpd/htdocs/byee$

fix: remove unescape_url() and friends from internal_redirect().

3. for virtual path redirects, the URL might not be decoded, or parts of
it might be decoded twice.

The call to unescape_url() in sub_req_lookup_uri() is in the wrong place;
the new path should be decoded straight away.

4. (May not be a bug) sub_req_lookup_file wasn't setting the args of the
new request.

David.

------------------------ begin file enc.patch -----------------------------
*** http_request.h.orig Wed Jul 19 01:50:25 1995
--- http_request.h Thu Jul 27 12:21:38 1995
***************
*** 78,83 ****
--- 78,84 ----
*/

request_rec *sub_req_lookup_uri (char *new_file, request_rec *r);
+ extern request_rec *sub_req_lookup_path(char *new_file, request_rec *r);
request_rec *sub_req_lookup_file (char *new_file, request_rec *r);
int run_sub_req (request_rec *r);
void destroy_sub_req (request_rec *r);
*** http_request.c.orig Tue Jul 25 02:12:45 1995
--- http_request.c Thu Jul 27 12:22:12 1995
***************
*** 327,334 ****
{
request_rec *rnew;
int res;
! char *udir;

/* Check for a special case... if there are no '/' characters in new_file
* at all, then we are looking at a relative lookup in the same directory.
* That means we don't have to redo any access checks.
--- 327,338 ----
{
request_rec *rnew;
int res;
! char *udir, *uri;

+ /* firstly, decode uri */
+ uri = pstrdup(r->pool, new_file);
+ unescape_url(uri);
+
/* Check for a special case... if there are no '/' characters in new_file
* at all, then we are looking at a relative lookup in the same directory.
* That means we don't have to redo any access checks.
***************
*** 351,357 ****
new_file :
make_full_path (rnew->pool, udir, new_file)));

- unescape_url (rnew->uri);
getparents (rnew->uri);

if ((res = translate_name (rnew))
--- 355,360 ----
***************
*** 369,374 ****
--- 372,411 ----
return rnew;
}

+ /*
+ * This one is for building a new request based on an absolute URL path,
+ * inteneded for the PATH_INFO -> PATH_INFO_TRANSLATED mapping.
+ */
+ request_rec *sub_req_lookup_path (char *new_file, request_rec *r)
+ {
+ request_rec *rnew;
+ int res;
+
+ rnew = make_sub_request (r);
+
+ rnew->connection = r->connection; /* For now... */
+ rnew->server = r->server;
+ rnew->request_config = create_request_config (rnew->pool);
+ set_sub_req_protocol (rnew, r);
+
+ rnew->uri = new_file;
+ rnew->args = NULL;
+
+ if ((res = translate_name (rnew))
+ || (res = directory_walk (rnew))
+ || (res = check_access (rnew))
+ || (!auth_type (rnew) ? 0 :
+ ((res = check_user_id (rnew)) && (res = check_auth (rnew))))
+ || (res = find_types (rnew))
+ || (res = run_fixups (rnew))
+ )
+ {
+ rnew->status = res;
+ }
+
+ return rnew;
+ }
+
request_rec *sub_req_lookup_file (char *new_file, request_rec *r)
{
request_rec *rnew;
***************
*** 392,397 ****
--- 429,435 ----
set_sub_req_protocol (rnew, r);

rnew->uri = "INTERNALLY GENERATED file-relative req";
+ rnew->args = NULL;
rnew->filename = ((new_file[0] == '/') ?
new_file :
make_full_path (rnew->pool, fdir, new_file));
***************
*** 604,612 ****

new->prev = r;
r->next = new;
-
- unescape_url(new->uri);
- getparents (new->uri);

/* We are redirecting. Treat the internally generated transaction
* as a GET, since there is not a chance of its getting POST-style
--- 642,647 ----
*** mod_cgi.c.orig Wed Jul 19 01:38:43 1995
--- mod_cgi.c Thu Jul 27 12:21:39 1995
***************
*** 190,196 ****
}

if (r->path_info && r->path_info[0]) {
! request_rec *pa_req = sub_req_lookup_uri (r->path_info, r);

table_set (e, "PATH_INFO", r->path_info);

--- 190,196 ----
}

if (r->path_info && r->path_info[0]) {
! request_rec *pa_req = sub_req_lookup_path(r->path_info, r);

table_set (e, "PATH_INFO", r->path_info);

------------------------ end file enc.patch -----------------------------
Re: REPOST: URL decoding bugs in apache 0.8.3 [ In reply to ]
Date: Thu, 27 Jul 95 12:48 BST
From: drtr@ast.cam.ac.uk (David Robinson)

This bug was present in 0.8.2, but doesn't seem to have been fixed for 0.8.3

Hmmm... there are four separate bug reports here --- one of which
looks OK, one of which should probably be fixed a different way, and
I'm not sure that the other two are bugs. BTW, if these did cause
problems with some real application, then I'm happy to deal with it
now, but if not, it may be better to put it off until after the first
public release.

(Any bug fix carries with it a nonzero possibility of introducing new
bugs, which may be worse than the originals. If the bug in question
is causing someone real problems, then of course we have to take that
risk; on the other hand, if the problems are such that they are only
likely to be seen by people who go specifically looking for them, it
may be worth leaving them off until we have time to test the fixes
themselves extensively, and make sure they don't have nastier bugs of
their own --- viz. item 3 below).

1. PATH_INFO gets URL decoded _again_ before PATH_TRANSLATED is found;
e.g.
http://server.com/cgi-bin/test-cgi/hello%2524

produces
PATH_INFO = /hello%24
PATH_TRANSLATED = /opt/httpd/htdocs/hello$

fix: new routine sub_req_lookup_path() for this case.

From a maintenance point of view, it would be much better to just call
escape_uri on the argument to sub_req_lookup_uri() --- this is *very*
slightly more expensive, but the extra cost is trivial in the context
of a CGI fork/exec.

(As a matter of style, there's way too much duplication of code in the
sub_req stuff, and it has already been a maintenance headache.
Creating another complete duplicate is, in my view, something to be
avoided if there is any plausible alternative --- and in this case,
there is).

2. For internal redirects, the URL gets decoded twice.

e.g.
redir contains:
#!/bin/sh
echo 'Location: /cgi-bin/test-cgi/byee%2524'
echo

then http://server.com/cgi-bin/redir
produces
PATH_INFO = /byee$
PATH_TRANSLATED = /opt/httpd/htdocs/byee$

fix: remove unescape_url() and friends from internal_redirect().

Agreed.

3. for virtual path redirects, the URL might not be decoded, or parts of
it might be decoded twice.

The call to unescape_url() in sub_req_lookup_uri() is in the wrong place;
the new path should be decoded straight away.

Disagree -- the current code there parallels the treatment of URIs as
they come in from the client; in particular, parse_uri must be called
before unescape_uri. Your "fix" does it the other way, which would
introduce at least two clear bugs --- a sub_req_uri lookup for
/foo/bar%3fmoo would be treated the same as /foo/bar?moo, (it would
*be* the same when parse_uri was called), which is clearly incorrect,
and any args (anything following a real '?') in the URI would be
decoded, which is very wrong (scripts need to be able to distinguish
'%2b' from '+').

(IMHO, these bugs --- in particular, decoding QUERY_ARGS --- would be
rather more severe than anything fixed by this patch).

4. (May not be a bug) sub_req_lookup_file wasn't setting the args of the
new request.

Not a bug, in my view --- URIs may have arguments, by the '...?foo'
convention, but filenames do not.

rst
Re: REPOST: URL decoding bugs in apache 0.8.3 [ In reply to ]
I think that apache showing problems for URL paths containing (decoded) '%'
characters should be considered signficicant.

> 1. PATH_INFO gets URL decoded _again_ before PATH_TRANSLATED is found;
> e.g.
> http://server.com/cgi-bin/test-cgi/hello%2524
>
> produces
> PATH_INFO = /hello%24
> PATH_TRANSLATED = /opt/httpd/htdocs/hello$
>
> fix: new routine sub_req_lookup_path() for this case.
>
From a maintenance point of view, it would be much better to just call
>escape_uri on the argument to sub_req_lookup_uri() --- this is *very*
>slightly more expensive, but the extra cost is trivial in the context
>of a CGI fork/exec.

Just as good. (Of course, the CGI spec should never have defined PATH_INFO
to URL decoded.)

>(As a matter of style, there's way too much duplication of code in the
>sub_req stuff, and it has already been a maintenance headache.
>Creating another complete duplicate is, in my view, something to be
>avoided if there is any plausible alternative --- and in this case,
>there is).

>Oh yes. I was simply being consistend with the existing code.
>
> 3. for virtual path redirects, the URL might not be decoded, or parts of
> it might be decoded twice.
>
> The call to unescape_url() in sub_req_lookup_uri() is in the wrong place;
> the new path should be decoded straight away.
>
>Disagree -- the current code there parallels the treatment of URIs as
>they come in from the client; in particular, parse_uri must be called
>before unescape_uri. Your "fix" does it the other way, which would
>introduce at least two clear bugs --- a sub_req_uri lookup for
>/foo/bar%3fmoo would be treated the same as /foo/bar?moo, (it would
>*be* the same when parse_uri was called), which is clearly incorrect,
>and any args (anything following a real '?') in the URI would be
>decoded, which is very wrong (scripts need to be able to distinguish
>'%2b' from '+').

Well spotted.
But the current code has similar bugs;
a lookup of bar%41?moo would be inherit the args of the original request,
instead of setting it to 'moo', and the path would not be URL decoded
to barA. And it would fail if the original path contained a (decoded)
'%' character.

Revised patch for this bug below.

> 4. (May not be a bug) sub_req_lookup_file wasn't setting the args of the
> new request.
>
>Not a bug, in my view --- URIs may have arguments, by the '...?foo'
>convention, but filenames do not.

Only a suggestion; but it seemed easier to set args rather than try and
guarantee that no existing or future code would ever try and access the args
for a request returned by sub_req_look_file.


David.

------------------ begin file decode.patch -------------------------
*** http_request.c.orig Tue Jul 25 02:12:45 1995
--- http_request.c Thu Jul 27 15:42:29 1995
***************
*** 326,341 ****
request_rec *sub_req_lookup_uri (char *new_file, request_rec *r)
{
request_rec *rnew;
! int res;
char *udir;

! /* Check for a special case... if there are no '/' characters in new_file
! * at all, then we are looking at a relative lookup in the same directory.
* That means we don't have to redo any access checks.
*/

! if (strchr (new_file, '/') == NULL)
return sub_req_lookup_simple (new_file, r);

/* The nasty general case. Need to start from nearly scratch. */

--- 326,346 ----
request_rec *sub_req_lookup_uri (char *new_file, request_rec *r)
{
request_rec *rnew;
! int res, relpath;
char *udir;

! /* Check for a special case... if there are no '/' or '?' characters in
! * new_file at all, then we are looking at a relative lookup in the same
! * directory without any query_args.
* That means we don't have to redo any access checks.
*/

! if (strchr (new_file, '/') == NULL && strchr(new_file, '?') == NULL)
! {
! new_file = pstrdup(r->pool, new_file);
! unescape_url(new_file);
return sub_req_lookup_simple (new_file, r);
+ }

/* The nasty general case. Need to start from nearly scratch. */

***************
*** 346,357 ****
rnew->server = r->server;
rnew->request_config = create_request_config (rnew->pool);
set_sub_req_protocol (rnew, r);
!
! parse_uri (rnew, ((new_file[0] == '/') ?
! new_file :
! make_full_path (rnew->pool, udir, new_file)));
!
unescape_url (rnew->uri);
getparents (rnew->uri);

if ((res = translate_name (rnew))
--- 351,363 ----
rnew->server = r->server;
rnew->request_config = create_request_config (rnew->pool);
set_sub_req_protocol (rnew, r);
!
! relpath = new_file[0] != '/';
! parse_uri (rnew, new_file);
unescape_url (rnew->uri);
+
+ if (relpath) rnew->uri = make_full_path(rnew->pool, udir, rnew->uri);
+
getparents (rnew->uri);

if ((res = translate_name (rnew))
------------------ end file decode.patch -------------------------------
Re: REPOST: URL decoding bugs in apache 0.8.3 [ In reply to ]
Date: Thu, 27 Jul 95 15:53 BST
From: drtr@ast.cam.ac.uk (David Robinson)
Precedence: bulk
Reply-To: new-httpd@hyperreal.com

I think that apache showing problems for URL paths containing (decoded) '%'
characters should be considered signficicant.

For all such paths --- of course. For paths which are produced in
circumstances which are highly unlikely to come up in practice, I'm
not so sure. My test for this sort of thing is, as I said before,
whether the marginal cases in question have actually come up in
practice, on someone's real site (or whether there's a *clearly*
plausible way in which they could).

For your bugs 1 & 2, I can come up with plausible situations where it
would screw up (and the fixes are simple and local, which makes
unintended consequences less likely). For #3, I'm not so sure ---
particularly when the fix involves changing a somewhat tricky routine.

But the current code has similar bugs;
a lookup of bar%41?moo would be inherit the args of the original request,
instead of setting it to 'moo', and the path would not be URL decoded
to barA. And it would fail if the original path contained a (decoded)
'%' character.

Hmmm... this patch makes me nervous (non-trivial changes to a somewhat
subtle routine), but it looks inoccuous enough, and I can at least see
how to trigger the bug(s) it fixes. I'd appreciate it if someone at a
heavy includes-using site would try it out (SSI could be totally
trashed and I'd never notice here); if it doesn't wreck anything, I'm
happy to have it.

(Incidentally, it doesn't seem to fix your bugs 1&2 (PATH_TRANSLATED
(bleah) and internal_redirect. A separate patch for these would be
very helpful, if you could make one up).

> 4. (May not be a bug) sub_req_lookup_file wasn't setting the args of the
> new request.
>
>Not a bug, in my view --- URIs may have arguments, by the '...?foo'
>convention, but filenames do not.

Only a suggestion; but it seemed easier to set args rather than try and
guarantee that no existing or future code would ever try and access the args
for a request returned by sub_req_look_file.

If it does, it will find... no args, the same as if args were
supported but there simply hadn't been any. I'm not sure I see the
problem.

rst
Re: REPOST: URL decoding bugs in apache 0.8.3 [ In reply to ]
Here they are.

Thanks. FITNR.

rst
Re: REPOST: URL decoding bugs in apache 0.8.3 [ In reply to ]
>(Incidentally, it doesn't seem to fix your bugs 1&2 (PATH_TRANSLATED
>(bleah) and internal_redirect. A separate patch for these would be
>very helpful, if you could make one up).

Here they are.

David.

------------------------ PATH_TRANSLATED patch
*** mod_cgi.c.orig Wed Jul 19 01:38:43 1995
--- mod_cgi.c Fri Jul 28 10:46:44 1995
***************
*** 190,196 ****
}

if (r->path_info && r->path_info[0]) {
! request_rec *pa_req = sub_req_lookup_uri (r->path_info, r);

table_set (e, "PATH_INFO", r->path_info);

--- 190,201 ----
}

if (r->path_info && r->path_info[0]) {
! /*
! * To get PATH_TRANSLATED, treat PATH_INFO as a URI path. Need to re-escape
! * it for this.
! */
! request_rec *pa_req = sub_req_lookup_uri(
! escape_uri(r->pool, r->path_info), r);

table_set (e, "PATH_INFO", r->path_info);

------------------------ end PATH_TRANSLATED patch

------------------------ internal redirect patch
*** http_request.c.orig2 Thu Jul 27 15:42:29 1995
--- http_request.c Fri Jul 28 10:41:05 1995
***************
*** 611,619 ****
new->prev = r;
r->next = new;

- unescape_url(new->uri);
- getparents (new->uri);
-
/* We are redirecting. Treat the internally generated transaction
* as a GET, since there is not a chance of its getting POST-style
* arguments.
--- 611,616 ----
------------------------ end redirect patch