Mailing List Archive

sloppy URLs
Okay so we can have,

a missing leading /
a missing trailing / (for directory indexing)
a superfluous trailing / e.g. /index.html/


I vote we "404 Not Found" them all. Let's stamp out
patronising stupid users who set up links to the wrong stuff.

True it'll break lots of URLs which have until now been
redirected by the server, but do we want to continue playing
mother forever ?

I'd be willing to set up a custom 404 script response to intercept
common bad URLs and instead give a suitable "Go fix it" message.

I know some of you won't like this idea, so it probably needs to
be customizable... maybe a "REDIRECT_SLOPPY_URLS on" in a
config file.
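
A minimal Python sketch of the three cases listed above and of how a switch like the one proposed might behave. The names `classify`, `corrected`, `respond` and the `redirect_sloppy_urls` flag are illustrative assumptions, not Apache code, and the document tree is a toy in-memory set. The sketch also assumes all three cases are corrected by a redirect when the switch is on.

```python
# Toy document tree, assumed for illustration only.
DIRECTORIES = {"/", "/docs"}
FILES = {"/index.html", "/docs/index.html"}

def classify(path):
    """Return which kind of sloppy URL `path` is, or "ok"."""
    if not path.startswith("/"):
        return "missing-leading-slash"
    if path != "/" and path in DIRECTORIES:
        return "missing-trailing-slash"
    if path.endswith("/") and path.rstrip("/") in FILES:
        return "superfluous-trailing-slash"
    return "ok"

def corrected(path, kind):
    """Return the URL the client presumably meant."""
    if kind == "missing-leading-slash":
        return "/" + path
    if kind == "missing-trailing-slash":
        return path + "/"
    return path.rstrip("/")

def respond(path, redirect_sloppy_urls):
    """Return (status, location) for a request path."""
    kind = classify(path)
    if kind == "ok":
        # A real server would still check that the resource exists.
        return 200, None
    if redirect_sloppy_urls:
        return 302, corrected(path, kind)
    return 404, None
```

With the switch on, `respond("/docs", True)` yields a 302 to `/docs/`; with it off, the same request yields a 404.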


robh
Re: sloppy URLs
> Specifically --- wrt trailing / --- it works specifically by the request
> of a whole lot of users, including (at one point) me. It's not as
> efficient as including the trailing /, but so long as people know that
> I don't see any problem with their continuing to use it. And making
> these return 404s instead of redirects would break MANY existing sites;
> Apache would no longer be a drop-in replacement for NCSA 1.3, let alone
> 1.4. I think that's a bad idea.

This is the one that bugs me. If I publish a link as /foo/bar/ I expect
people to use it. What happens though is that somewhere along the
line some bozo writes it as /foo/bar. People follow that link and
it works. All I get out of it is the overhead of dealing with more and
more redirects to /foo/bar/. If I could set up Apache to put a stop to
this complete waste of time, the original bozo would have written
/foo/bar, it would fail and explain the problem, and he'd correct it.

Can't we have this as an option ?
Re: sloppy URLs
I don't know if this has ever been explicitly agreed upon, but I thought
all of us agreed that apache should try and be a drop-in
replacement for NCSA as much as possible. No, this doesn't mean that
we shouldn't fix conceptual bugs (or, 'misfeatures') in NCSA's code, but
where the behavior differs enough that people will have to change things
when they install apache, we should make sure it's necessary.

On Fri, 7 Apr 1995, Rob Hartill wrote:
> Okay so we can have,
>
> a missing leading /

clearly a bug. 400 it.

> a missing trailing / (for directory indexing)

*not* a bug. Fix redirects so they don't have :80, but enough people
rely on this, and I'm not convinced it's clearly a "wrong" thing. For
example, it provides a pretty graceful mechanism for turning a MultiViews
page into a directory when the page gets too big - i.e.

http://host/path/file

can represent file.html or file.html3 when I first create it - then file
gets too big and I'd like to turn it into a subdirectory. I don't have to
change any URLs, as now httpd will redirect that to

http://host/path/file/

and the index.html beneath that. I like this, so it's not clear that
missing trailing slashes is a conceptually bad thing to me
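
A rough Python sketch of the file-to-directory transition described above, including a redirect URL that omits the default port. The function names, the in-memory `files` and `directories` sets, and the fixed extension order are assumptions for illustration; the extension loop is only a crude stand-in for MultiViews, not how httpd implements it.

```python
def redirect_location(host, port, path):
    """Build the absolute URL for a trailing-slash redirect."""
    # Leave out the port when it is the HTTP default, so the client
    # sees http://host/path/file/ rather than http://host:80/path/file/.
    authority = host if port == 80 else f"{host}:{port}"
    return f"http://{authority}{path}/"

def resolve(path, files, directories, host="host", port=80):
    """Return (status, target) for a slash-less request path."""
    if path in directories:
        # The page has grown into a directory: send the client there.
        return 302, redirect_location(host, port, path)
    for ext in (".html", ".html3"):
        if path + ext in files:
            return 200, path + ext
    return 404, None

# Before: /path/file is served from a single file.
before = resolve("/path/file", {"/path/file.html"}, set())
# After: the same URL now redirects into the new subdirectory.
after = resolve("/path/file", {"/path/file/index.html"}, {"/path/file"})
```

The published URL `/path/file` stays valid in both states; only the second costs an extra round trip.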

> a superfluous trailing / e.g. /index.html/

Hmm. What about when we implement server-side annotations where people
POST comments on arbitrary URLs, and we want to pass along PATH_INFO to
the CGI script that handles it? Maybe I'm grasping at straws here, so
never mind. Classify it as "if a regular file access has a trailing
slash where the file before that trailing slash is a file and not a
directory, return a 404". Or maybe a 400....

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: sloppy URLs
On Fri, 7 Apr 1995, Rob Hartill wrote:
> > Specifically --- wrt trailing / --- it works specifically by the request
> > of a whole lot of users, including (at one point) me. It's not as
> > efficient as including the trailing /, but so long as people know that
> > I don't see any problem with their continuing to use it. And making
> > these return 404s instead of redirects would break MANY existing sites;
> > Apache would no longer be a drop-in replacement for NCSA 1.3, let alone
> > 1.4. I think that's a bad idea.
>
> This is the one that bugs me. If I publish a link as /foo/bar/ I expect
> people to use it. What happens though is that somewhere along the
> line some bozo writes it as /foo/bar. People follow that link and
> it works. All I get out of it is the overhead of dealing with more and
> more redirects to /foo/bar/. If I could set up Apache to put a stop to
> this complete waste of time, the original bozo would have written
> /foo/bar, it would fail and explain the problem, and he'd correct it.
>
> Can't we have this as an option ?

-DANAL_RETENTIVE? :)

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: sloppy URLs
On Fri, 7 Apr 1995, Rob Hartill wrote:
> Is this why hotwired gets 400k hits a day ? are they all doubled up ? :-)
>
> By redirecting these broken URLs, you're not doing anyone any favours
> in the long term. Clients have to make 2 requests instead of one, the
> server has to service two requests instead of 1. It's bad netiquette.
> Someone should put a stop to it. We're in a position to do that.

Make it a config file (not compile-time, I was joking earlier :) option,
and I'll gladly support it.

And no, it's not the 302's that cause lots of extra hits, it's the 401's
and 304's......

Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@hotwired.com brian@hyperreal.com http://www.hotwired.com/Staff/brian/
Re: sloppy URLs
Missing leading / should clearly get a 400 --- no browser should ever
even be generating such a request.

As to the other two, I'm not sure why you regard the current behavior
as "incorrect" --- there's no standard that says anything at all in
particular about how URLs relate to the underlying file structure, nor
should there be --- it just places too much constraint on perfectly
legitimate experimentation by server authors. (It's perfectly reasonable
to have database-based servers which don't even have an underlying file
system).

Specifically --- wrt trailing / --- it works specifically by the request
of a whole lot of users, including (at one point) me. It's not as
efficient as including the trailing /, but so long as people know that
I don't see any problem with their continuing to use it. And making
these return 404s instead of redirects would break MANY existing sites;
Apache would no longer be a drop-in replacement for NCSA 1.3, let alone
1.4. I think that's a bad idea.

Trailing / is, IMHO, even a weaker case --- trailing / even has a perfectly
legitimate interpretation in some circumstances (PATH_INFO to server-includes
HTML). Sure, it's silly, but unlike the "missing" / case, there isn't even
any extra overhead. Why should we go out of our way to break it?

-1 on all but the first.

rst
Re: sloppy URLs
/*
* "Re: sloppy URLs" by Rob Hartill <hartill@ooo.lanl.gov>
* written Fri, 7 Apr 95 17:04:15 MDT
*
** I just checked --- the CERN server sends redirects, as does
** NetSite. It seems to be fairly standard and accepted behavior.

NCSA httpd is supposed to redirect missing trailing slashes too. The
place where it doesn't is when name translation is involved; I made
sure to fix that little design problem in Netsite.

CERN has a feature you can turn on where accessing a directory without
a trailing slash forces an automatic index, and adding the / gives you
the index. I always thought that was a Bad Idea.

* As a service provider, I'm happy to put up with screaming users for
* a week or two while they fix their URLs, knowing that in the long
* term everything speeds up.
*/

In general I'd agree; however, for some reason it just confuses the hell
out of some users.

--Rob
Re: sloppy URLs
> Can't we have this as an option ?

If it's a config-file option, and off by default (that drop-in replacement
thing again), I could stomach it.

rst
Re: sloppy URLs
> >
> > Can't we have this as an option ?
>
> -DANAL_RETENTIVE? :)
>

sure, if that switches sloppy URL redirecting on.

How do other servers react to missing trailing slashes ?
What if some sensible server rejects these, and I want to
mirror my stuff onto that server. Should mine accept sloppy
URLs while the mirror doesn't ?

the two sides to the argument are,

1) if you ask for the wrong URL, you should be told it isn't there.

2) if you ask for the wrong URL, and the server is able to guess
what you really meant, then the server should tell you where
to go get the proper URL.

Brian suggests using (2) as a feature - a way to make things flexible
in the future. Ok fairy nuff, but in the meantime you send out
tens of thousands of redirects to clients deliberately. So you
advertize the "broken" URL, on the off chance that one day it'll
be fixed. This is a seriously flawed idea.

Is this why hotwired gets 400k hits a day ? are they all doubled up ? :-)

By redirecting these broken URLs, you're not doing anyone any favours
in the long term. Clients have to make 2 requests instead of one, the
server has to service two requests instead of 1. It's bad netiquette.
Someone should put a stop to it. We're in a position to do that.


robh/
Re: sloppy URLs
> How do other servers react to missing trailing slashes ?
>
> I just checked --- the CERN server sends redirects, as does NetSite.
> It seems to be fairly standard and accepted behavior.

but do they have this wonderful on/off switch that we might get ?

:-)

As a service provider, I'm happy to put up with screaming users
for a week or two while they fix their URLs, knowing that in the
long term everything speeds up.

Maybe Randy could draw me a road sign, the circular type with
the words "No Sloppy URLs" crossed off. My 404 Not Found custom
page could then explain the problem and benefits of fixing it to
the users. I could even offer them the correct URL if I knew
what it was. Hey I'd be educating them !


robh: 402 Payment Required
Re: sloppy URLs
> Rob --- just a question --- have you actually looked at the logs from
> any of your sites and determined what percentage of responses get
> redirected? If it's five or less, I don't think it's at all worth
> worrying about...

> (FYI, here, it's about 0.3%).

Well, you can argue that the other way. If only 0.3% of links
are broken, it won't hurt too many people to enforce the
correct behaviour :-)

You can't argue it both ways, one minute we're upsetting lots of
people using these bad links, now they're an insignificant number.

Even if it's 0.00001%, I still think that's too high. Why make
more work for your server. Fix those damn URLs and the problem
is gone forever.

I suspect that httpd used to do an internal redirect under these
circumstances, but then was forced to change to 302 'cos of the
relative URL problems. The feature should have been dropped at that
point instead of having this crazy setup where directories are a
special case of URL that you are allowed to address incorrectly.

Being redirected to the same machine over a very slow connection
is no fun at all.

Enough. If I haven't convinced you yet, you must be a die hard type
with a broken '/' key on your keyboard. :-)

robh
Re: sloppy URLs
> Sure I can. Watch me: Even if this only affects 1% of the people
> visiting my site on a given day (and I think that number's a bit low),
> that's still more people than I care to annoy without good cause. The
> cost of keeping them happy is *minuscule* --- the server spends much
> more of its time, proportionately, interpreting printf control
> strings.

Okay, I've no problem in letting you humor people, but let's not
force it on others. I think rejection should be on by default, so
that all *new* servers set up to run Apache, behave in the most
sensible way.

I could go for a default of redirecting, if it were made really
obvious that one should switch to rejecting unless you have a real
good reason to keep the status quo.

> Hey, so long as it's a config-file option, and off by default, I won't
> blackball it. But with all the other stuff lying around that consumes
> so much more significant trifles of the server's resources, I really
> have trouble understanding why you're focused on *this*.

It's not the microseconds that it takes to do the processing; it's
the seconds and sometimes tens of seconds that it takes to get the
request and pass back the redirect. In the meantime, one of your
N non-forking httpds is unusable.

The only reason for focussing on this is that it's a misfeature,
that I suggested we fix, but which resulted in a lot of arguments
that it shouldn't.


robh
Re: sloppy URLs
Rob H. asks...

> How do other servers react to missing trailing slashes ?

I just checked --- the CERN server sends redirects, as does NetSite.
It seems to be fairly standard and accepted behavior.

rst
Re: sloppy URLs
Rob --- just a question --- have you actually looked at the logs from
any of your sites and determined what percentage of responses get
redirected? If it's five or less, I don't think it's at all worth
worrying about...

(FYI, here, it's about 0.3%).
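
One way to get such a figure is to count status codes in the access log. The following Python sketch assumes Common Log Format, where each line ends with the quoted request, the status and the byte count. The function name and the sample lines are made up for illustration, and counting every 302 overstates the trailing-slash share if the site issues other redirects too.

```python
def redirect_share(log_lines):
    """Return the percentage of logged responses with status 302."""
    total = redirects = 0
    for line in log_lines:
        # Split off the last two fields: status and byte count.
        fields = line.rsplit(None, 2)
        if len(fields) != 3 or not fields[1].isdigit():
            continue  # skip lines that do not parse
        total += 1
        if fields[1] == "302":
            redirects += 1
    return 100.0 * redirects / total if total else 0.0

# Hypothetical sample entries, not taken from any real log.
sample = [
    'a.example - - [07/Apr/1995:17:04:15 -0600] "GET /foo/bar HTTP/1.0" 302 -',
    'a.example - - [07/Apr/1995:17:04:16 -0600] "GET /foo/bar/ HTTP/1.0" 200 1024',
    'b.example - - [07/Apr/1995:17:05:01 -0600] "GET /index.html HTTP/1.0" 200 2048',
    'c.example - - [07/Apr/1995:17:06:30 -0600] "GET /pic.gif HTTP/1.0" 304 -',
]
share = redirect_share(sample)
```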

rst
Re: sloppy URLs
From: Rob Hartill <hartill@ooo.lanl.gov>
Date: Fri, 7 Apr 95 17:23:55 MDT

> > Rob --- just a question --- have you actually looked at the logs from
> > any of your sites and determined what percentage of responses get
> > redirected? If it's five or less, I don't think it's at all worth
> > worrying about...

> > (FYI, here, it's about 0.3%).

> Well, you can argue that the other way. If only 0.3% of links
> are broken, it won't hurt too many people to enforce the
> correct behaviour :-)

Note the implicit assumption that each person follows only one link.
If you make more realistic assumptions (say, 10 to 12 transactions per
customer), the number of people affected rises correspondingly.
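
To put rough numbers on that: if each request independently has a 0.3% chance of being redirected, the share of visitors who hit at least one redirect in n requests is 1 - (1 - 0.003)^n. The independence assumption and the function name are mine, for illustration only.

```python
def share_of_visitors_affected(redirect_rate, requests_per_visitor):
    """Probability that a visitor sees at least one redirect,
    assuming each request is redirected independently."""
    return 1.0 - (1.0 - redirect_rate) ** requests_per_visitor

for n in (1, 10, 12):
    pct = 100.0 * share_of_visitors_affected(0.003, n)
    print(f"{n:2d} requests per visitor: {pct:.1f}% of visitors affected")
```

Under these assumptions, 10 to 12 requests per visitor puts the affected share at roughly 3.0% to 3.5%, about ten times the per-request rate.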

> You can't argue it both ways, one minute we're upsetting lots of
> people using these bad links, now they're an insignificant number.

Sure I can. Watch me: Even if this only affects 1% of the people
visiting my site on a given day (and I think that number's a bit low),
that's still more people than I care to annoy without good cause. The
cost of keeping them happy is *minuscule* --- the server spends much
more of its time, proportionately, interpreting printf control
strings.

(Actually _doprnt consumes surprisingly many cycles; last time I ran a
profile, it was about 4% --- all user CPU, mostly in _doprnt itself,
so there's no contamination here from the underlying writes. This
exceeds the load on my site from all redirects by roughly an order of
magnitude. Many of those printf()-variant calls could be replaced, or
at least cheapened, at the expense of obfuscating the code. I've made
a principled decision that it just ain't worth the trouble. These
redirects are worth less).

> Enough. If I haven't convinced you yet, you must be a die hard type
> with a broken '/' key on your keyboard. :-)
>
> robh

Hey, so long as it's a config-file option, and off by default, I won't
blackball it. But with all the other stuff lying around that consumes
so much more significant trifles of the server's resources, I really
have trouble understanding why you're focused on *this*.

rst
Re: sloppy URLs
> Rob H. asks...
>
> How do other servers react to missing trailing slashes ?
>
> I just checked --- the CERN server sends redirects, as does NetSite.
> It seems to be fairly standard and accepted behavior.
>
> rst

Yep, and I also agree that it should be the default behavior, with
the rejection behavior available via a config directive.

For HTTP/1.1, this can be replaced with the actual index document and
slashful BASE header when the server is talking to an HTTP/1.1 client.
Any year soon. ;-)

......Roy
Re: sloppy URLs
..enough biking for one day.

> Yep, and I also agree that it should be the default behavior, with
> the rejection behavior available via a config directive.
>
> For HTTP/1.1, this can be replaced with the actual index document and
> slashful BASE header when the server is talking to an HTTP/1.1 client.
> Any year soon. ;-)

How does this deal with a BASE appearing in the header and document ?

who do you trust ?, the server or the document writer ?
Re: sloppy URLs
>> For HTTP/1.1, this can be replaced with the actual index document and
>> slashful BASE header when the server is talking to an HTTP/1.1 client.
>> Any year soon. ;-)
>
> How does this deal with a BASE appearing in the header and document ?
>
> who do you trust ?, the server or the document writer ?

The innermost BASE takes precedence (the document writer is assumed
to be more intelligent than the server ;-).

Details can be found in the Relative URL spec, available from

http://www.ics.uci.edu/pub/ietf/uri/

but the Base: header won't be available to HTTP/1.0 clients.
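
The precedence rule can be sketched in a few lines of Python. The function names and parameters here are hypothetical; only `urljoin` is a real library call, and the sketch assumes the client has already extracted any base from the header and from the document.

```python
from urllib.parse import urljoin

def effective_base(request_url, header_base=None, document_base=None):
    """Pick the base URL, innermost declaration first:
    document BASE, then the Base: header, then the request URL."""
    return document_base or header_base or request_url

def resolve_link(link, request_url, header_base=None, document_base=None):
    """Resolve a relative link against the effective base."""
    base = effective_base(request_url, header_base, document_base)
    return urljoin(base, link)
```

For a request to `http://host/dir` answered with a header base of `http://host/dir/`, `resolve_link("pic.gif", ...)` gives `http://host/dir/pic.gif`; a BASE inside the document would override the header.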

......Roy
Re: sloppy URLs
A superfluous trailing slash is no different to any other PATH_INFO data.
So, whatever you do for /page.html/ you should do for /page.html/wibble
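
A small Python sketch of that equivalence: split the request path at the first prefix that names a regular file and treat the rest as PATH_INFO. The function name and the in-memory `files` set are assumptions for illustration, not httpd's actual lookup.

```python
def split_path_info(path, files):
    """Split `path` into (file, path_info) at the first prefix that
    names a regular file; return (None, None) if no prefix does."""
    parts = path.split("/")
    for i in range(1, len(parts) + 1):
        candidate = "/".join(parts[:i])
        if candidate in files:
            return candidate, path[len(candidate):]
    return None, None

files = {"/page.html"}
plain = split_path_info("/page.html", files)
slash = split_path_info("/page.html/", files)
extra = split_path_info("/page.html/wibble", files)
```

Here the trailing-slash request yields a PATH_INFO of `/` and the other yields `/wibble`; both take the same code path, so a policy for one applies to the other.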

I strongly disagree that /directory has a 'missing' trailing slash.
From a user-interface point of view, /directory is probably _more_ natural
than /directory/. Certainly the URL spec does not require these to be the
same. And most users of Unix would tend to drop trailing slashes.

The only reason it is 'wrong' is if the browser uses /directory as the base
URL for any relative URLs in the document returned, whereas the document
the server is accessing is /directory/index.html.
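
The effect on relative URLs is easy to see with Python's standard `urljoin`, which follows the usual relative-URL resolution rules. The host and file names are placeholders.

```python
from urllib.parse import urljoin

link = "picture.gif"  # a relative link inside /directory/index.html

# Base without the trailing slash: "directory" is treated as a file
# name and is replaced by the link.
without_slash = urljoin("http://host/directory", link)

# Base with the trailing slash: the link resolves inside the directory.
with_slash = urljoin("http://host/directory/", link)

print(without_slash)  # http://host/picture.gif
print(with_slash)     # http://host/directory/picture.gif
```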

Suppose httpd had a convention that the index for a directory was in fact
stored in the file /directory.index.html. In that case, it would be
/directory/ that would be incorrect, and the server would have to redirect
to remove the trailing slash.

If your only concern is the redirect, then consider adding a Base: header
to the response, containing the correct base URL for the document.

David.