Mailing List Archive

Patch go boom...
Mmm,

04a_ExtraPath.0.8.14.patch

For 04a_ExtraPath.0.8.14.patch Ben L writes:

Changelog: Prevent Apache from serving /x/y/z/a as /x/y when z/a
doesn't exist.

I sort of thought we'd already hammered this out a couple of rounds ago.
I recall Rob H and I fussing because we both knew of code that used to
expect a /x/y CGI-BIN script to be passed /z/a as PATH_INFO. Rob H, you
wanna confirm?

Ben, do you want to compare this with:

http://sumwarez.com/cgi-bin/test-cgi/foo/bar

where PATH_INFO == /foo/bar

Or am I missing summink.

---

Ay.
Re: Patch go boom...
>
>
> Mmm,
>
> 04a_ExtraPath.0.8.14.patch
>
> For 04a_ExtraPath.0.8.14.patch Ben L writes:
>
> Changelog: Prevent Apache from serving /x/y/z/a as /x/y when z/a
> doesn't exist.
>
> I sort of thought we'd already hammered this out a couple of rounds ago.
> I recall Rob H and I fussing because we both knew of code that used to
> expect a /x/y CGI-BIN script to be passed /z/a as PATH_INFO. Rob H, you
> wanna confirm?
>
> Ben, do you want to compare this with:
>
> http://sumwarez.com/cgi-bin/test-cgi/foo/bar
>
> where PATH_INFO == /foo/bar
>
> Or am I missing summink.

Yep. This patch does not affect cgi scripts PATH_INFO stuff. The patch
prevents ordinary pages from exhibiting this bizarre behaviour.

> Ay.

--
Ben Laurie Phone: +44 (181) 994 6435
Freelance Consultant Fax: +44 (181) 994 6472
and Technical Director Email: ben@algroup.co.uk
A.L. Digital Ltd,
London, England.
Re: Patch go boom...
> >
> > Or am I missing summink.
>
> Yep. This patch does not affect cgi scripts PATH_INFO stuff. The patch
> prevents ordinary pages from exhibiting this bizarre behaviour.

I think I remember someone saying that PATH_INFO can be used by SSI,
so is the patch still necessary?

rob
Re: Patch go boom...
>
>
> > >
> > > Or am I missing summink.
> >
> > Yep. This patch does not affect cgi scripts PATH_INFO stuff. The patch
> > prevents ordinary pages from exhibiting this bizarre behaviour.
>
> I think I remember someone saying that PATH_INFO can be used by SSI,
> so is the patch still necessary?

SSI?

The patch only affects URLs which resolve to plain files.

>
> rob

--
Ben Laurie Phone: +44 (181) 994 6435
Freelance Consultant Fax: +44 (181) 994 6472
and Technical Director Email: ben@algroup.co.uk
A.L. Digital Ltd,
London, England.
Re: Patch go boom...
>
> > > Mmm,
> > >
> > > 04a_ExtraPath.0.8.14.patch
> > >
> > > For 04a_ExtraPath.0.8.14.patch Ben L writes:
> > >
> > > Changelog: Prevent Apache from serving /x/y/z/a as /x/y when z/a
> > > doesn't exist.
> > >
> > > I sort of thought we'd already hammered this out a couple of rounds ago.
> > > I recall Rob H and I fussing because we both knew of code that used to
> > > expect a /x/y CGI-BIN script to be passed /z/a as PATH_INFO. Rob H, you
> > > wanna confirm?
> > >
> > > Ben, do you want to compare this with:
> > >
> > > http://sumwarez.com/cgi-bin/test-cgi/foo/bar
> > >
> > > where PATH_INFO == /foo/bar
> > >
> > > Or am I missing summink.
>
>
> Ben:
>
> > Yep. This patch does not affect cgi scripts PATH_INFO stuff. The patch
> > prevents ordinary pages from exhibiting this bizarre behaviour.
>
> But, but... it's not bizarre. For a URL like:
>
> http://where/foo.html/bar/baz
>
> there's a web resource called foo.html which can receive /bar/baz as PATH_INFO.
> If foo.html is a SSI-enabled page (chmod u+x, or renamed to *.shtml) then
> PATH_INFO is passed to the SSI environment and everyone's happy. In this
> sense foo.html is working as a script.
>
> But if foo.html is just a regular page (no SSI) then why should the server
> behave differently? Specifically, why should the browser be made to care
> whether or not the resource can make use of the additional path information?
>
> A counter argument would be:
>
> "Sure, then what's to stop people from sending URLs like:
> http://where/aaaa/any/old/stuff/and/nonsense"
>
> and my response would be:
>
> "Provided there's an 'aaaa' or 'aaaa/any' or 'aaaa/any/old' etc,
> etc, then it doesn't matter. Search the URL from left to right
> stopping at the last matching resource (.html, .shtml, .cgi) and
> everything remaining to the right is for the resource to deal with."
>
> RobH:
> > I think I remember someone saying that PATH_INFO can be used by SSI,
> > so is the patch still necessary?
>
> Well, I was confused. This patch has no effect if foo.html is SSI
> enabled. But that's not the point.
>
> I don't like this patch ;) But I wonder if we all agree about what URLs
> really mean. For my argument a URL != UNIX file, and I believe we'd be
> limiting the flexibility of the server by adding this new behaviour.
>
> Any offers? Perhaps Roy F's got a clue here?
>

I agree in essence; a URL is not a file. However, it seems to me that the
whole URL should be used to determine the content. Redundant extra bits
can be ignored, but it seems more sensible and useful to not ignore them.
Of course, in the case of SSI and CGI enforcing this is beyond the remit of
the server, but where plain ordinary files are concerned, the server can
see that there is extra meaningless stuff and should complain appropriately.
If there are some that think the old behaviour is useful, we can make it a
configurable flag.
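
A minimal sketch of the behaviour proposed above, in Python rather than
the server's own C. The function name, the allow_extra_path flag and the
404 status are all assumptions made for illustration; none of them is
taken from the patch itself.

```python
def serve_plain_file(path_info, allow_extra_path=False):
    """Return an HTTP status code for a request that resolved to a plain file.

    path_info is whatever was left of the URL after the file was found.
    allow_extra_path is a hypothetical stand-in for the configurable flag
    suggested in the message; it is not a real server directive.
    """
    if path_info and not allow_extra_path:
        # A plain file cannot use the trailing path, so complain.
        return 404
    # No trailing path, or the old permissive behaviour was asked for.
    return 200


print(serve_plain_file(""))                                   # 200
print(serve_plain_file("/bar/baz"))                           # 404
print(serve_plain_file("/bar/baz", allow_extra_path=True))    # 200
```

CGI and SSI requests would never reach a function like this, since they
are dealt with by their own handlers.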

> Ay.
>

--
Ben Laurie Phone: +44 (181) 994 6435
Freelance Consultant Fax: +44 (181) 994 6472
and Technical Director Email: ben@algroup.co.uk
A.L. Digital Ltd,
London, England.
Re: Patch go boom...
> I think I remember someone saying that PATH_INFO can be used by SSI,
> so is the patch still necessary?

The patch is to default_handler. SSI is handled by the handlers in the
SSI module. Ergo, SSI considerations are irrelevant to the patch.

Sigh...

rst
Re: Patch go boom...
> > Mmm,
> >
> > 04a_ExtraPath.0.8.14.patch
> >
> > For 04a_ExtraPath.0.8.14.patch Ben L writes:
> >
> > Changelog: Prevent Apache from serving /x/y/z/a as /x/y when z/a
> > doesn't exist.
> >
> > I sort of thought we'd already hammered this out a couple of rounds ago.
> > I recall Rob H and I fussing because we both knew of code that used to
> > expect a /x/y CGI-BIN script to be passed /z/a as PATH_INFO. Rob H, you
> > wanna confirm?
> >
> > Ben, do you want to compare this with:
> >
> > http://sumwarez.com/cgi-bin/test-cgi/foo/bar
> >
> > where PATH_INFO == /foo/bar
> >
> > Or am I missing summink.


Ben:

> Yep. This patch does not affect cgi scripts PATH_INFO stuff. The patch
> prevents ordinary pages from exhibiting this bizarre behaviour.

But, but... it's not bizarre. For a URL like:

http://where/foo.html/bar/baz

there's a web resource called foo.html which can receive /bar/baz as PATH_INFO.
If foo.html is a SSI-enabled page (chmod u+x, or renamed to *.shtml) then
PATH_INFO is passed to the SSI environment and everyone's happy. In this
sense foo.html is working as a script.

But if foo.html is just a regular page (no SSI) then why should the server
behave differently? Specifically, why should the browser be made to care
whether or not the resource can make use of the additional path information?

A counter argument would be:

"Sure, then what's to stop people from sending URLs like:
http://where/aaaa/any/old/stuff/and/nonsense"

and my response would be:

"Provided there's an 'aaaa' or 'aaaa/any' or 'aaaa/any/old' etc,
etc, then it doesn't matter. Search the URL from left to right
stopping at the last matching resource (.html, .shtml, .cgi) and
everything remaining to the right is for the resource to deal with."
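
The left-to-right search described in that response could be sketched
roughly as follows. This is Python for illustration only, not the
server's actual lookup code; split_path_info and the resources set are
hypothetical names, and the set stands in for whatever the server can
find on disk.

```python
def split_path_info(url_path, resources):
    """Split url_path into (resource, path_info).

    Walks the path one segment at a time from the left and remembers the
    longest prefix that names a known resource. Everything to the right
    of that prefix is returned as path_info for the resource to deal with.
    Returns (None, url_path) when no prefix matches at all.
    """
    segments = [s for s in url_path.split("/") if s]
    best = None
    for i in range(1, len(segments) + 1):
        candidate = "/" + "/".join(segments[:i])
        if candidate in resources:
            best = i  # keep going: a longer prefix may also match
    if best is None:
        return None, url_path
    resource = "/" + "/".join(segments[:best])
    rest = segments[best:]
    path_info = "/" + "/".join(rest) if rest else ""
    return resource, path_info


resources = {"/foo.html", "/aaaa", "/aaaa/any"}  # assumed to exist
print(split_path_info("/foo.html/bar/baz", resources))
print(split_path_info("/aaaa/any/old/stuff/and/nonsense", resources))
```

With the assumed resources, the first call yields /foo.html with
/bar/baz left over, and the second stops at /aaaa/any.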

RobH:
> I think I remember someone saying that PATH_INFO can be used by SSI,
> so is the patch still necessary?

Well, I was confused. This patch has no effect if foo.html is SSI
enabled. But that's not the point.

I don't like this patch ;) But I wonder if we all agree about what URLs
really mean. For my argument a URL != UNIX file, and I believe we'd be
limiting the flexibility of the server by adding this new behaviour.

Any offers? Perhaps Roy F's got a clue here?

Ay.
Re: Patch go boom...
>I agree in essence; a URL is not a file. However, it seems to me that the
>whole URL should be used to determine the content. Redundant extra bits
>can be ignored, but it seems more sensible and useful to not ignore them.
>Of course, in the case of SSI and CGI enforcing this is beyond the remit of
>the server, but where plain ordinary files are concerned, the server can
>see that there is extra meaningless stuff and should complain appropriately.
>If there are some that think the old behaviour is useful, we can make it a
>configurable flag.

Although it's not what you meant, the _old_ behaviour (apache 0.6.5 and
earlier) was to reject /path/file.html/extra/data. We should default
to compatibility unless someone has a convincing argument for changing the
behaviour (i.e. a convincing argument for not fixing the bug).

David.
Re: Patch go boom...
> >I agree in essence; a URL is not a file. However, it seems to me that the
> >whole URL should be used to determine the content. Redundant extra bits
> >can be ignored, but it seems more sensible and useful to not ignore them.
> >Of course, in the case of SSI and CGI enforcing this is beyond the remit of
> >the server, but where plain ordinary files are concerned, the server can
> >see that there is extra meaningless stuff and should complain appropriately.
> >If there are some that think the old behaviour is useful, we can make it a
> >configurable flag.
>
> Although it's not what you meant, the _old_ behaviour (apache 0.6.5 and
> earlier) was to reject /path/file.html/extra/data. We should default
> to compatibility unless someone has a convincing argument for changing the
> behaviour (i.e. a convincing argument for not fixing the bug).

Well ok. I've been for a jog and had a think about it. Ben L's patch
makes explicit an assumption that trailing information is meaningless and
indeed r0ng IFF the resource in question is a plain *.html, *.gif or whatever
file. It doesn't affect SSI behaviour (sorry, my mistake, I misinterpreted
Ben's intent) and CGI is also unaffected.

This week we've seen a suggestion from Cameron Elliott <cam@indy.mvbms.com>
which uses in-URL cookies, essentially non-path information being passed to
the server alongside path information. One way to implement this could be
to allow cookies embedded in trailing information to be accepted, but again
there'd be a smart module which could process this additional info out of the
URL before the default server behaviour kicked in and trashed the access.

Mmm...

I can think of one example (feasible, but arcane and an acre of pain to
implement) where Ben L's patch would hurt. Suppose you wanted to provide
different content from the same file, depending on the nature of the PATH_INFO.

[.in this example I'll use #exec SSI calls, but clearly this might
be better performed by a to-be-written conditional SSI module.

eg:     <h2>Section 2</h2>
        <!--#if ( PATH_INFO eq expanded ) -->
        <blockquote>
        The section wherein we describe, in full, that which
        is to be described between the first and the third
        section.
        </blockquote>
        <!--#endif -->
        Blah, blah, blah...
]

If a set of URLs were published:

http://foo/bar/baz1.html/brief
http://foo/bar/baz1.html/expanded
http://foo/bar/baz1.html/glossary

http://foo/bar/baz2.html/expanded
http://foo/bar/baz3.html/glossary

...

Then people would be able to put all the pertinent information in a single
file (which makes sense in terms of managing the data) and would be able to
offer different views of the data depending on the trailing information
they embed in the URL. Ok, now suppose that Joe DataManager wanted to
switch this functionality off by 'chmod u-x'ing the file. SSI would stop
working and only a reasonable subset of the full text would be exported to the
browser, regardless of the trailing information.

[.using SSI as it stands this might be done so:

eg:     <h2>Section 2</h2>
        <!--#exec cmd="/home/andrew/bin/SSI/includeif \
        expanded \
        '<blockquote>
        The section wherein we describe, in full, that which
        is to be described between the first and the third
        section.
        </blockquote>'"
        -->
        Blah, blah, blah...

Read the enclosed source for more info, try switching between u+x
and u-x, or your local equivalent, on the .html file.
]


...but, as has been pointed out before now, I can always find a reason to NOT
do something.

> David.

For the time being I'll close. It'd be nice if there were a note made in
the compatibility issues section regarding this reaffirmation of 0.6.5's
behaviour. Basically I'll probably be reversing the patch on my own gear.

Cheers,
Ay.


--- cut here ---

<html>
<head>
<title>includeif</title>
</head>
<body>
<h2 align=center>includeif</h2>
<blockquote>
This is funky. Try appending one of the following to the URL:
<dl>
<dt> <b>/expanded</b>
<dd> More information than you can usefully manage at a glance.
<dt> <b>/glossary</b>
<dd> Just what do all those terms mean anyway.
</dl>
</blockquote>
<h2>Section 2</h2>
<!--#exec cmd="/home/andrew/bin/SSI/includeif \
expanded \
'<blockquote>
The section wherein we describe, in full, that which
is to be described between the first and the third
section.
</blockquote>'"
-->
Blah, blah, blah...

<!--#exec cmd="/home/andrew/bin/SSI/includeif \
glossary \
'
<h2>Glossary</h2>
<dl compact>
<dt> <b>foo</b>
<dd> That which is to be called <i>foo</i>.
<dt> <b>bar</b>
<dd> That which is not <i>foo</i> but is nonetheless <i>bar</i>.
</dl>'"
-->

</body>
</html>


--- cut here ---

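#!/usr/local/bin/perl

# includeif
#
# Andrew Wilson 10.Sep.95
#
# Usage: includeif IF WHAT
# Prints WHAT only when the request's PATH_INFO is "/IF".

( $IF, $WHAT ) = @ARGV;

# PATH_INFO comes from the environment the server passes to #exec.
if ( $ENV{PATH_INFO} eq "/$IF" ) {
    print "$WHAT";
}

exit;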

--- cut here ---
Re: Patch go boom...
> I don't like this patch ;) But I wonder if we all agree about what URLs
> really mean. For my argument a URL != UNIX file, and I believe we'd be
> limiting the flexibility of the server by adding this new behaviour.

URL != File. However, if by chance a particular URL == File and File is
incapable of making use of extra path, then any extra path is bad.
In fact, I would extend that to SSI and only allow it for ASIS and CGI,
if I thought I could get away with it. Same goes for / vs /index.html
but I know I can't get away with that one.

Why is it bad? Because you can't index sites with infinite URLs.
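
One reading of that concern, sketched in Python under the assumption
that the server silently ignores trailing segments on plain files: every
URL produced below would serve the same document, and the stream never
ends. The aliases function and the "/x" filler segment are made up for
illustration.

```python
from itertools import count, islice


def aliases(document):
    """Yield an endless stream of URLs that would all serve `document`
    if trailing path segments on a plain file were ignored."""
    for n in count(1):
        yield document + "/x" * n


# Only the first three are taken; the generator itself has no last item.
print(list(islice(aliases("/path/file.html"), 3)))
```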

> Any offers? Perhaps Roy F's got a clue here?

Two. Col. Mustard is in the Library with a lead pipe.

.....Roy ;-)
Re: Patch go boom...
> > I don't like this patch ;) But I wonder if we all agree about what URLs
> > really mean. For my argument a URL != UNIX file, and I believe we'd be
> > limiting the flexibility of the server by adding this new behaviour.
>
> URL != File. However, if by chance a particular URL == File and File is
> incapable of making use of extra path, then any extra path is bad.
> In fact, I would extend that to SSI and only allow it for ASIS and CGI,

But CGI and SSI are the same damned thing surely. They're notionally programs
that run when you access them, take parameters sucked outta the URL and spit
output back to the browser?

> if I thought I could get away with it. Same goes for / vs /index.html
> but I know I can't get away with that one.
>
> Why is it bad? Because you can't index sites with infinite URLs.

Huh? Stemming? Fuzzy matching? Publicised search engines? Is it possible
to 'index' a site that makes extensive use of multiviews? I'm confused by
what you mean by 'indexing' in this instance.

> > Any offers? Perhaps Roy F's got a clue here?
>
> Two. Col. Mustard is in the Library with a lead pipe.

Ms. Scarlett, in the bedroom, with the lights down lowwwww.

> .....Roy ;-)

Ay.
Re: Patch go boom...
"Roy T. Fielding" writes:
> URL != File. However, if by chance a particular URL == File and File is
...
> Why is it bad? Because you can't index sites with infinite URLs.

I would very much like it if /index.html were illegal when / is
specified to be the canonical form for that document (the
current aliasing problem is really a major pain). I definitely
believe that a single document should have a single name.

In fact, I really wish there was an analog standard to `?' for
extra PATH_INFO so that clients could distinguish the document part
from their arguments (that would allow, among other things, browsers
to provide an interface for jumping back to "top" of an interface
without the user having to manually cut and paste the URL).
Re: Patch go boom...
In message <199510111814.NAA19745@austin.bsdi.com>, Tony Sanders writes:
>"Roy T. Fielding" writes:
>> URL != File. However, if by chance a particular URL == File and File is
>...
>> Why is it bad? Because you can't index sites with infinite URLs.
>
>I would very much like it if /index.html were illegal when / is
>specified to be the canonical form for that document (the
>current aliasing problem is really a major pain). I definitely
>believe that a single document should have a single name.

I actually tried doing redirects for all */index.html to */.
Netscape dealt with it fine. My output looked fine when I viewed
it with telnet. But Microsoft's browser barfed bad on it. Lynx
didn't like it either. I'm an HTTP newbie, but I'm certain the
redirect was right -- I was returning "http://ServerName:ServerPort/*".
(Obviously * is the possibly empty path to the index.html, and I
was returning the trailing /). Sometime I'd like to figure out
what went wrong.
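
For reference, the redirect target described above could be built along
these lines. This is a Python sketch, not the code that was actually
run; index_redirect is a hypothetical name, and www.example.com is a
placeholder host.

```python
def index_redirect(path, server_name, server_port=80):
    """Return the absolute redirect target for a */index.html request.

    Returns None when the path does not end in /index.html, meaning no
    redirect is wanted.
    """
    suffix = "index.html"
    if not path.endswith("/" + suffix):
        return None
    directory = path[: -len(suffix)]  # keeps the trailing slash
    return f"http://{server_name}:{server_port}{directory}"


print(index_redirect("/docs/index.html", "www.example.com"))
# http://www.example.com:80/docs/
```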

Dean
Re: Patch go boom...
On Wed, 11 Oct 1995, Tony Sanders wrote:

> In fact, I really wish there was an analog standard to `?' for
> extra PATH_INFO so that clients could distinguish the document part
> from their arguments (that would allow, among other things, browsers
> to provide an interface for jumping back to "top" of an interface
> without the user having to manually cut and paste the URL).

No other servers that I'm aware of do, but WebSTAR/MacHTTP does use
$ as the path info separator instead of /.

Just a thought.
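
A rough Python sketch of how a client could use such a separator to find
the "top" of a document. The exact URL syntax WebSTAR/MacHTTP accepts is
not spelled out here, so the example URL and the split_on_separator name
are assumptions.

```python
def split_on_separator(url_path, sep="$"):
    """Split url_path at the first separator into (document, extra).

    extra is the empty string when the separator does not appear.
    """
    document, _, extra = url_path.partition(sep)
    return document, extra


print(split_on_separator("/bar/baz1.html$expanded"))
# ('/bar/baz1.html', 'expanded')
```

With a plain / there is no way to tell from the URL alone where the
document name ends, which is the problem the separator would solve.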

--/ Alexei Kosut <akosut@nueva.pvt.k12.ca.us> /--------/ Lefler on IRC
----------------------------/ <http://www.nueva.pvt.k12.ca.us/~akosut/>
The viewpoints expressed above are entirely false, and in no way
represent Alexei Kosut nor any other person or entity. /--------------