Mailing List Archive

uri escaping spaces
With the default $escapemode, Embperl seems to encode
javascript('foo bar')
as
javascript('foo+bar')
but I would have expected
javascript('foo%20bar')
like Apache::Util::escape_uri() does it. The '+', to me, means
multiple options.

Am I misguided? Or is Embperl? Or is this one of those 'undefined
behavior' things?

thanks,
-- Mike
Re: uri escaping spaces
In the URI escaping spec, '+' is the special (optional) escape code for space. But since %XX works for any character, %20 also works for space.

-c


At 4:54 PM -0700 6/27/00, Michael Blakeley wrote:
>With the default $escapemode, Embperl seems to encode
> javascript('foo bar')
>as
> javascript('foo+bar')
>but I would have expected
> javascript('foo%20bar')
>like Apache::Util::escape_uri() does it. The '+', to me, means multiple options.
>
>Am I misguided? Or is Embperl? Or is this one of those 'undefined behavior' things?
>
>thanks,
>-- Mike


------------------------------------------------------------------------
Chris Thorman (413) 473-0853 e-fax
------------------------------------------------------------------------
Re: uri escaping spaces
At 8:00 PM -0700 6/27/2000, Chris Thorman wrote:
>In the URI escaping spec, '+' is the special (optional) escape code
>for space. But since %XX works for any character, %20 also works
>for space.

I can't find anything like that in
http://www.ietf.org/rfc/rfc2396.txt - do you have any references?
Here's what I do see:

2.4.1. Escaped Encoding

An escaped octet is encoded as a character triplet, consisting of the
percent character "%" followed by the two hexadecimal digits
representing the octet code. For example, "%20" is the escaped
encoding for the US-ASCII space character.

    escaped = "%" hex hex
    hex     = digit | "A" | "B" | "C" | "D" | "E" | "F" |
                      "a" | "b" | "c" | "d" | "e" | "f"

Nothing about '+'; it's mentioned as a reserved character elsewhere
in the RFC, but its significance is not linked to any particular
implementation function. My interpretation is that '+' is simply a
reserved character, traditionally used to delimit CGI params with
multiple arguments, such as
?search=foo+bar
but serving no specific purpose in the URI standard beyond its
reserved-ness. Perhaps there's a tradition of using '+' and %20
interchangeably, but IMO it's better to be fully RFC-compliant with
output, while still accepting non-standard inputs as liberally as
possible.
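
The escaped-octet rule quoted above from RFC 2396 section 2.4.1 can be sketched in a few lines of Python. This is only an illustration, not Embperl's code: the function name `escape_octets` is hypothetical, UTF-8 is an assumed byte encoding (the RFC does not pick one), and every character outside the RFC's "unreserved" set is escaped, so the sketch suits a single value rather than a whole URI.

```python
import string

# RFC 2396 "unreserved" characters: alphanumerics plus the "mark" characters.
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")


def escape_octets(text: str) -> str:
    """Percent-escape every octet that is not an unreserved character.

    Hypothetical helper; UTF-8 is an assumption made for this sketch.
    """
    out = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if ch in UNRESERVED:
            out.append(ch)                       # copied through unchanged
        else:
            out.append("%{:02X}".format(octet))  # "%" hex hex
    return "".join(out)


print(escape_octets("javascript('foo bar')"))
```

Under this rule a space can only come out as %20, and a literal '+' comes out as %2B, because '+' is reserved rather than unreserved.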

CGI.pm mentions a slightly-relevant case:
If the script was invoked with a parameter list (e.g.
"name1=value1&name2=value2&name3=value3"), the param()
method will return the parameter names as a list. If the
script was invoked as an <ISINDEX> script and contains a
string without ampersands (e.g. "value1+value2+value3") ,
there will be a single parameter named "keywords" containing
the "+"-delimited keywords.

And, if you trust CGI.pm's implementation:
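use strict;
use CGI;
$^W = 1;    # enable warnings

# One parameter with three separate values.
my $q = CGI->new({});
$q->param("foo", qw/bar baz biz/);
print $q->query_string() . "\n";

# One parameter whose single value contains spaces.
$q = CGI->new({});
$q->param("foo", "bar baz biz");
print $q->query_string() . "\n";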

outputs:
foo=bar;foo=baz;foo=biz
foo=bar%20baz%20biz
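
For comparison, the same distinction between several values and one value containing spaces can be sketched with Python's standard `urllib.parse` module. This is an illustrative parallel only, not a statement about how CGI.pm or Embperl work internally; note that Python joins pairs with '&' where the CGI.pm output above uses ';'.

```python
from urllib.parse import quote, urlencode

# One parameter with three values: each value gets its own name=value pair.
multi = urlencode({"foo": ["bar", "baz", "biz"]}, doseq=True)

# One parameter whose single value contains spaces.
# urlencode() defaults to quote_plus (space -> '+'); passing quote_via=quote
# asks for %20 instead.
single_plus = urlencode({"foo": "bar baz biz"})
single_pct = urlencode({"foo": "bar baz biz"}, quote_via=quote)

print(multi)
print(single_plus)
print(single_pct)
```

In neither library does '+' separate multiple values; multiple values always appear as repeated name=value pairs.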

>At 4:54 PM -0700 6/27/00, Michael Blakeley wrote:
>>With the default $escapemode, Embperl seems to encode
>> javascript('foo bar')
>>as
>> javascript('foo+bar')
>>but I would have expected
>> javascript('foo%20bar')
>>like Apache::Util::escape_uri() does it. The '+', to me, means
>>multiple options.
>>
>>Am I misguided? Or is Embperl?

I'm convinced that the conversion really ought to be
' '->%20
The patch below (diff against 1.3b3) fixes this. Note that it also
breaks 'make test', presumably because the expected test output still
encodes spaces as '+'.

Any RFC-compliant client must translate %20 to space, but they may
not all translate '+' to space, since that seems to be traditional
behavior rather than RFC-specified behavior. For example, when
Netscape 4.73 submits a form, it translates
<INPUT type=text name="Text" size=20 maxsize=128 VALUE="foo bar">
to
?Text=foo+bar
Apache::param() also seems to be liberal about accepting '+' for %20.
But it doesn't work that way with the "thin client" I'm coding for. I
don't think I can argue that the client's wrong without a standards
doc that says so. I'd also rather that Embperl be conservative about
what it sends out - adhering to the RFC rather than to common, but
non-RFC behavior.
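
The "conservative in what you send, liberal in what you accept" argument can be checked with a small round trip. The sketch below uses Python's `urllib.parse` as a stand-in: `unquote` plays the role of a strict client that translates only %XX triplets, and `unquote_plus` plays a liberal one that also treats '+' as a space. Which real clients behave which way is an assumption here, not something the sketch establishes.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

original = "foo bar"

# Two ways a server might emit the value.
sent_pct = quote(original)        # space -> %20
sent_plus = quote_plus(original)  # space -> '+'

# A strict client translates only %XX triplets (unquote);
# a liberal one also treats '+' as a space (unquote_plus).
results = {
    ("%20", "strict"): unquote(sent_pct),
    ("%20", "liberal"): unquote_plus(sent_pct),
    ("+", "strict"): unquote(sent_plus),
    ("+", "liberal"): unquote_plus(sent_plus),
}

for (encoding, client), decoded in results.items():
    print(f"{encoding:>3} -> {client:<7}: {decoded!r}")
```

Only the combination of '+' output with a strict client fails to recover the original string, which is the case the thin client runs into.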

-- Mike

diff -c epchar.c.orig epchar.c
*** epchar.c.orig Tue Jun 27 17:09:17 2000
--- epchar.c Tue Jun 27 17:21:55 2000
***************
*** 324,330 ****
--- 324,335 ----
  { ' ' , "%1D" }, /* &#29; Unused */
  { ' ' , "%1E" }, /* &#30; Unused */
  { ' ' , "%1F" }, /* &#31; Unused */
+ /* see http://www.ietf.org/rfc/rfc2396.txt */
+ #ifdef ENCODE_SPACES_AS_PLUS
  { ' ' , "+" }, /* &#32; Space */
+ #else
+ { ' ' , "%20" }, /* &#32; Space */
+ #endif
  { '!' , "" }, /* &#33; Exclamation mark */
  { '"' , "%22" }, /* Quotation mark */
  { '#' , "" }, /* &#35; Number sign */
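
The idea behind the patch, a per-character escape table with one switchable entry for the space, can be mimicked in Python as follows. This is a loose sketch and not a translation of epchar.c: the names `build_escape_table` and `escape` are hypothetical, only a handful of table entries are reproduced, and reading an empty string as "emit the character unchanged" is an assumption about the table's semantics.

```python
def build_escape_table(encode_spaces_as_plus: bool = False) -> dict:
    """Map characters to escape strings; "" is assumed to mean 'unchanged'."""
    # Control characters 0x00-0x1F are always percent-escaped.
    table = {chr(code): "%{:02X}".format(code) for code in range(0x20)}
    # The one entry the patch makes switchable.
    table[" "] = "+" if encode_spaces_as_plus else "%20"
    table['"'] = "%22"
    table["!"] = ""
    table["#"] = ""
    return table


def escape(text: str, table: dict) -> str:
    # Characters missing from the table, or mapped to "", are copied through.
    return "".join(table.get(ch) or ch for ch in text)


print(escape("javascript('foo bar')", build_escape_table()))
print(escape("javascript('foo bar')", build_escape_table(True)))
```

Keeping the old behaviour behind a flag mirrors the `#ifdef ENCODE_SPACES_AS_PLUS` in the diff: the default output changes to %20, while '+' remains available to anyone who depends on it.
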
RE: uri escaping spaces
>
> At 8:00 PM -0700 6/27/2000, Chris Thorman wrote:
> >In the URI escaping spec, '+' is the special (optional) escape code
> >for space. But since %XX works for any character, %20 also works
> >for space.
>
> I can't find anything like that in
> http://www.ietf.org/rfc/rfc2396.txt - do you have any references?

This code is very old (about 3 years), so I don't remember exactly where
I got the information from, but '+' as an escape for space is commonly
used. A quick test with IE 5.01 shows that entering "a b c" in a form field
is sent as a+b+c.

Anyway, as far as I can see it won't hurt to change + to %20 on the output side,
so, if nobody speaks against it, I'll include your patch in the next release.

Gerald

-------------------------------------------------------------
Gerald Richter ecos electronic communication services gmbh
Internetconnect * Webserver/-design/-datenbanken * Consulting

Post: Tulpenstrasse 5 D-55276 Dienheim b. Mainz
E-Mail: richter@ecos.de Voice: +49 6133 925151
WWW: http://www.ecos.de Fax: +49 6133 925152
-------------------------------------------------------------
Re: uri escaping spaces
The plus=space encoding is from the specification of the
application/x-www-form-urlencoded media type.

http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1
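
As a rough illustration of that media type, Python's `urllib.parse` follows the same form-encoding convention, so it can show why '+' for space is unambiguous there: a literal '+' in a field value is itself escaped as %2B. The field names below are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields; "expr" contains a literal plus sign.
fields = {"Text": "foo bar", "expr": "1+1"}

# Form encoding: space -> '+', literal '+' -> %2B.
body = urlencode(fields)

# Form decoding reverses both steps.
decoded = parse_qs(body)

print(body)
print(decoded)
```

This convention belongs to form submissions; it says nothing about how spaces should be escaped in URIs generally, which is what RFC 2396 covers.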

Bryan.

--
Bryan Thale
Motorola Labs, Networking and Infrastructure Research
mailto:thale@rsch.comm.mot.com
