At 8:00 PM -0700 6/27/2000, Chris Thorman wrote:
>In the URI escaping spec, '+' is the special (optional) escape code
>for space. But since %XX works for any character, %20 also works
>for space.
I can't find anything like that in
http://www.ietf.org/rfc/rfc2396.txt - do you have any references?
Here's what I do see:
   2.4.1. Escaped Encoding
   An escaped octet is encoded as a character triplet, consisting of the
   percent character "%" followed by the two hexadecimal digits
   representing the octet code. For example, "%20" is the escaped
   encoding for the US-ASCII space character.
      escaped     = "%" hex hex
      hex         = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                            "a" | "b" | "c" | "d" | "e" | "f"
Nothing about '+'. It's listed as a reserved character elsewhere in
the RFC, but the RFC doesn't assign it any particular meaning. My
interpretation is that '+' is simply a reserved character,
traditionally used to delimit CGI params with multiple arguments,
such as
    ?search=foo+bar
but serving no specific purpose in the URI standard beyond its
reserved-ness. Perhaps there's a tradition of using '+' and %20
interchangeably, but IMO it's better to be fully RFC-compliant on
output, while still accepting non-standard input as liberally as
possible.
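To make the "RFC-compliant on output" half concrete, here is a minimal C sketch of what I mean. It is not Embperl's code: uri_escape() is a hypothetical helper, and I'm assuming it is applied to a single parameter value (not a whole URI), so everything outside RFC 2396's "unreserved" set gets written as "%" hex hex.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Embperl or CGI.pm.
 * Escapes one parameter value from `in` into `out`, whose capacity
 * `cap` includes the terminating NUL.
 * Returns 0 on success, -1 if `out` is too small. */
int uri_escape(const char *in, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    /* RFC 2396 section 2.3: unreserved = alphanum | mark */
    static const char mark[] = "-_.!~*'()";
    size_t n = 0;

    if (cap == 0)
        return -1;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c < 0x80 && (isalnum(c) || strchr(mark, c) != NULL)) {
            /* Unreserved: copy through unchanged. */
            if (n + 1 >= cap)
                return -1;
            out[n++] = (char)c;
        } else {
            /* Everything else, space and '+' included: "%" hex hex. */
            if (n + 3 >= cap)
                return -1;
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        }
    }
    out[n] = '\0';
    return 0;
}
```

With this, "foo bar" comes out as "foo%20bar", and a literal '+' in the data comes out as "%2B", so it can't be mistaken for a space by a liberal reader.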
CGI.pm mentions a slightly relevant case:
    If the script was invoked with a parameter list (e.g.
    "name1=value1&name2=value2&name3=value3"), the param()
    method will return the parameter names as a list. If the
    script was invoked as an <ISINDEX> script and contains a
    string without ampersands (e.g. "value1+value2+value3"),
    there will be a single parameter named "keywords" containing
    the "+"-delimited keywords.
And, if you trust CGI.pm's implementation:
    use strict;
    use warnings;
    use CGI;
    # One parameter with three separate values.
    my $q = CGI->new({});
    $q->param("foo", qw/bar baz biz/);
    print $q->query_string(), "\n";
    # One parameter with a single value containing spaces.
    $q = CGI->new({});
    $q->param("foo", "bar baz biz");
    print $q->query_string(), "\n";
outputs:
    foo=bar;foo=baz;foo=biz
    foo=bar%20baz%20biz
>At 4:54 PM -0700 6/27/00, Michael Blakeley wrote:
>>With the default $escapemode, Embperl seems to encode
>> javascript('foo bar')
>>as
>> javascript('foo+bar')
>>but I would have expected
>> javascript('foo%20bar')
>>like Apache::Utils::escape_uri() does it. The '+', to me, means
>>multiple options.
>>
>>Am I misguided? Or is Embperl?
I'm convinced that the conversion really ought to be
    ' ' -> %20
The patch below (diff against 1.3b3) makes that the default. Note
that it also breaks 'make test', presumably because the expected test
output still encodes spaces as '+'.
Any RFC-compliant client must translate %20 to space, but not every
client will translate '+' to space, since that seems to be
traditional behavior rather than RFC-specified behavior. For example,
when Netscape 4.73 submits a form, it translates
    <INPUT type=text name="Text" size=20 maxsize=128 VALUE="foo bar">
to
    ?Text=foo+bar
Apache::param() also seems to be liberal about accepting '+' for %20.
But it doesn't work that way with the "thin client" I'm coding for. I
don't think I can argue that the client's wrong without a standards
doc that says so. I'd also rather that Embperl be conservative in
what it sends out, adhering to the RFC rather than to common but
non-RFC behavior.
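For what it's worth, here is roughly how I picture the two kinds of reader. This is only a sketch: uri_unescape() and its plus_is_space flag are hypothetical names of mine, not anything in Embperl, Apache, or the thin client. With the flag off it translates only "%" hex hex, which is all RFC 2396 section 2.4.1 describes; with the flag on it also reads '+' as a space, the way the liberal readers seem to.

```c
/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper. Decodes `s` in place and returns it.
 * plus_is_space == 0: strict, only "%" hex hex is translated.
 * plus_is_space != 0: liberal, '+' is also read as a space.
 * A '%' not followed by two hex digits is copied through unchanged. */
char *uri_unescape(char *s, int plus_is_space)
{
    char *r = s;   /* read position  */
    char *w = s;   /* write position */

    while (*r != '\0') {
        int hi, lo;

        if (r[0] == '%' &&
            (hi = hex_value((unsigned char)r[1])) >= 0 &&
            (lo = hex_value((unsigned char)r[2])) >= 0) {
            *w++ = (char)(hi * 16 + lo);
            r += 3;
        } else if (*r == '+' && plus_is_space) {
            *w++ = ' ';
            r++;
        } else {
            *w++ = *r++;
        }
    }
    *w = '\0';
    return s;
}
```

A strict reader (flag off) turns "foo%20bar" into "foo bar" but leaves "foo+bar" alone, which matches what I see from the thin client. Both modes handle "foo%20bar" correctly, and that is the argument for sending %20.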
-- Mike
diff -c epchar.c.orig epchar.c
*** epchar.c.orig Tue Jun 27 17:09:17 2000
--- epchar.c Tue Jun 27 17:21:55 2000
***************
*** 324,330 ****
--- 324,335 ----
{ ' ' , "%1D" }, /*  Unused */
{ ' ' , "%1E" }, /*  Unused */
{ ' ' , "%1F" }, /*  Unused */
+ /* see http://www.ietf.org/rfc/rfc2396.txt */
+ #ifdef ENCODE_SPACES_AS_PLUS
{ ' ' , "+" }, /*   Space */
+ #else
+ { ' ' , "%20" }, /*   Space */
+ #endif
{ '!' , "" }, /* ! Exclamation mark */
{ '"' , "%22" }, /* Quotation mark */
{ '#' , "" }, /* # Number sign */