Mailing List Archive

Encoding problem
Hello,

I have problems with the encoding of posted form data. I am trying to do
everything in UTF-8 (code, DB, HTML...).

I have a form on a page where the posted data IS UTF-8 (as far as I can tell)
but the values do not have Perl's UTF-8 flag set, and I wonder why.

Firefox detects the page encoding as Unicode (UTF-8). The page has this header:
<meta http-equiv="content-type" content="text/html; charset=utf-8">

But if I "print OUT $fdat{myfield}", it gets re-encoded in UTF-8 (i.e. I get two
chars like Äç for every accented letter).
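
A minimal sketch of how this kind of doubling can arise, assuming the posted value arrives as raw UTF-8 bytes and is then encoded to UTF-8 a second time on output. The sample value is an assumption for illustration, not data from the page:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The UTF-8 encoding of U+00E7 (c with cedilla) is the two bytes 0xC3 0xA7.
# Assumed sample value standing in for one posted accented letter.
my $posted = "\xC3\xA7";

# Perl is not told these bytes are UTF-8, so it treats the string as
# two separate characters, U+00C3 and U+00A7.
my $chars_seen = length($posted);

# Encoding that string as UTF-8 encodes each of the two characters
# again, giving four bytes instead of two.
my $double = encode('UTF-8', $posted);

printf "characters seen by Perl: %d\n", $chars_seen;
printf "bytes after re-encoding: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $double;
```

If this is what happens, each accented letter leaves the server as four bytes, which a UTF-8 browser displays as two characters.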

The following code makes the page work but I don't understand why I have to
do the work manually:

# Switch Perl's UTF-8 flag on for every posted value (no validation is done)
foreach my $k (keys %fdat) {
    Encode::_utf8_on($fdat{$k});
}
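
For comparison, a sketch of the same loop using Encode::decode, which validates the bytes instead of only switching the flag on. The %fdat below is a stand-in hash with an assumed sample value, since Embperl fills the real one:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for Embperl's %fdat; the field name and value are assumptions.
# The value holds raw UTF-8 bytes, as they would arrive from the form.
my %fdat = ( nom => "Fran\xC3\xA7ois" );

# Decode each value from UTF-8 bytes into a Perl character string.
# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps it
# from modifying its argument in place.
for my $k (keys %fdat) {
    $fdat{$k} = decode('UTF-8', $fdat{$k},
                       Encode::FB_CROAK() | Encode::LEAVE_SRC());
}

printf "%d characters, flag %s\n",
    length($fdat{nom}), utf8::is_utf8($fdat{nom}) ? 'on' : 'off';
```

The design difference is that _utf8_on trusts the bytes blindly, while decode() fails loudly if a client sends something that is not UTF-8.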

My apache2 conf is like this:

<Directory /var/www/sites/dynatouraine>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride None
    Order allow,deny
    allow from all
    EMBPERL_APPNAME DynaTouraine
    EMBPERL_OBJECT_BASE base.epl
    EMBPERL_ESCMODE 0
    <Files *.html>
        SetHandler perl-script
        PerlHandler Embperl::Object
        Options ExecCGI
    </Files>
</Directory>

Thanks for your help.

PS: Embperl 2.2.0-3.1 on Debian/Lenny 5.0.4 with apache 2.2.9-10+lenny6

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independent Consultant and Developer _\_V

---------------------------------------------------------------------
To unsubscribe, e-mail: embperl-unsubscribe@perl.apache.org
For additional commands, e-mail: embperl-help@perl.apache.org
RE: Encoding problem
Hi,

Setting the default encoding in httpd.conf to UTF-8 might help.


Gerald


> -----Original Message-----
> From: Jean-Christophe Boggio [mailto:embperl@thefreecat.org]
> Sent: Wednesday, April 21, 2010 5:21 PM
> To: embperl@perl.apache.org
> Subject: Encoding problem
---------------------------------------------------------------------
Re: Encoding problem
Hi Gerald,

Gerald Richter - ECOS wrote:
> setting the default encoding in the httpd.conf to utf8 might help

I already have:
AddDefaultCharset UTF-8
in my httpd.conf.

I also tried adding it to my <Directory ...> section, together with
AddCharset utf-8 .html
but with no more luck.

I found other people describing this kind of symptom, one using CGIs:
http://mail-archives.apache.org/mod_mbox/perl-modperl/200806.mbox/%3C485EA6FF.7090104@ice-sa.com%3E

Another with Mason:
http://www.cybaea.net/Blogs/TechNotes/Mason-utf-8-clean.html#h2_form_input

Both use decode() functions (which work for me too), but I guess they are
converting the encoding back and forth... I don't even know if this is related.

Are we supposed to get UTF-8-flagged $fdat{xx} values when the input contains
accented characters and the form/page is UTF-8?

Thanks for your help,

PS: I rewrote the "fix" in a more "monky" way:

Encode::_utf8_on($fdat{$_}) for keys %fdat;

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independent Consultant and Developer _\_V

---------------------------------------------------------------------
Re: Encoding problem
You should *always* return the correct charset in the HTTP header, no
matter which framework/CGI script you're using.

AddDefaultCharset in Apache is bad because it adds that charset to
every text/plain or text/html response which didn't specify one.

--
Best regards, Alex


On Thursday, 22.04.2010, at 02:53 +0200, Jean-Christophe Boggio wrote:

*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*
T-Systems Austria GesmbH Rennweg 97-99, 1030 Wien
Handelsgericht Wien, FN 79340b
*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*
Notice: This e-mail contains information that is confidential and may be privileged.
If you are not the intended recipient, please notify the sender and then
delete this e-mail immediately.
*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*"*

---------------------------------------------------------------------
Re: Encoding problem
Hi Alexander,

Alexander Hartmaier wrote:
> You should *always* return the correct charset in the http header, no
> matter which framework/cgi script you're using.

? The problem comes from the header I *receive*. The headers I send are
always good (hard-coded in base.epl). To quote myself:

> Firefox detects the page encoding as Unicode (UTF-8). The page has this header :
> <meta http-equiv="content-type" content="text/html; charset=utf-8">

Are you suggesting something else? I'm sorry, I don't understand your point.

I referred to other people having the same kind of problem only because
it might not be an Embperl-only problem but maybe an Apache/Perl
problem.

In short, the %fdat "fields" are encoded in UTF-8 but not seen
by Perl *as* UTF-8.
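
To illustrate the difference in plain Perl (a sketch with an assumed sample value, outside Embperl): the same word behaves differently depending on whether Perl holds it as UTF-8 bytes or as decoded characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The French word for "summer" as UTF-8 bytes (assumed sample input).
my $bytes = "\xC3\xA9t\xC3\xA9";

# The same word decoded into Perl characters.
my $text = decode('UTF-8', $bytes);

# The character string that Perl code would normally compare against.
my $expected = "\x{E9}t\x{E9}";

printf "bytes: length %d, equals expected: %s\n",
    length($bytes), ($bytes eq $expected ? 'yes' : 'no');
printf "text:  length %d, equals expected: %s\n",
    length($text), ($text eq $expected ? 'yes' : 'no');
```

The byte string has length 5 and does not compare equal, while the decoded string has length 3 and does.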

> AddDefaultCharset in apache is bad because it appends that header for
> every resource which didn't specify it.

I know, it was a "second chance" type of solution (suggested by Gerald).
Since I don't want anything other than UTF-8, though, it does no harm.

Any ideas?

Thanks for your help,

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independent Consultant and Developer _\_V

---------------------------------------------------------------------
Re: Encoding problem
Hi,

Since I seem to be the only one having problems with UTF-8 forms, I guess
the problem is that my expectations are wrong.

The following is a simple HTML test page with a simple form. I expect the
result to be UTF-8 but it's not (until I uncomment the Encode::_utf8_on() line).

Is this normal? Do you see the same behaviour? Can someone explain (or point me
to a doc explaining) what I am confusing?

Thanks for your help,



<!doctype html>
<html><head>
[-
use utf8;
use Encode;

# Encode::_utf8_on($fdat{$_}) for keys %fdat;

$escmode=0;
$http_headers_out{'Content-Type'}="text/html; charset=utf-8";
-]
</head>

<body>
[+ utf8::is_utf8($fdat{nom}) ? 'utf8' : 'other' +]
<br />
Received : [+ $fdat{nom} +]<br />
<form method="post" accept-charset="UTF-8">
<input type="text" id="nom" name="nom" />
<input type="submit" value="go" />
</form>
</body>

</html>


PS: In the same directory I have a base.epl file containing
[- Execute('*'); -]


--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independent Consultant and Developer _\_V

---------------------------------------------------------------------
RE: Encoding problem
Hi,

I have UTF-8 pages where, as I remember, things are handled correctly, but I have to dig a little deeper to find out how they differ from your example. I am currently on a business trip and hope to find the time to look at it near the end of the week.

Gerald




> -----Original Message-----
> From: Jean-Christophe Boggio [mailto:embperl@thefreecat.org]
> Sent: Monday, April 26, 2010 12:16 PM
> To: embperl@perl.apache.org
> Subject: Re: Encoding problem
---------------------------------------------------------------------
Re: Encoding problem
Sorry, seems I missed the point.

I'm using plain old Embperl, not Embperl::Object, so there might be a
difference there.

My old Embperl app works flawlessly with UTF-8.
Have you checked, e.g. with tcpdump, whether your browser sends the data
as UTF-8?
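
Once the raw request body has been captured, a rough check could look like the sketch below. The body string is a made-up example, and the test only shows whether the bytes are well-formed UTF-8, which accented Latin-1 input normally is not:

```perl
use strict;
use warnings;

# Hypothetical raw POST body, as it might appear in a capture for field "nom".
my $body = 'nom=Fran%C3%A7ois';

my ($name, $value) = split /=/, $body, 2;

# Undo application/x-www-form-urlencoded escaping:
# '+' stands for a space and %XX for a single byte.
$value =~ tr/+/ /;
$value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

# utf8::decode converts in place and returns true only if the bytes
# were well-formed UTF-8, so run it on a copy.
my $copy = $value;
my $is_valid_utf8 = utf8::decode($copy);

printf "%s: %d bytes, valid UTF-8: %s\n",
    $name, length($value), $is_valid_utf8 ? 'yes' : 'no';
```

A Latin-1 submission of the same name would contain %E7 instead of %C3%A7 and would fail this check.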

In general you shouldn't rely on Perl's UTF-8 flag but encode/decode
according to the charset you expect.
I'm not sure whether Embperl decodes request params into Perl's
internal UTF-8 representation by default.
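
A sketch of that approach, assuming the request is known to be UTF-8: decode once on input, work on characters, encode once on output. The helper names from_request and to_response are hypothetical, not Embperl or Encode functions:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical boundary helpers (names invented for this sketch).
sub from_request { return decode('UTF-8', $_[0]) }   # bytes -> characters
sub to_response  { return encode('UTF-8', $_[0]) }   # characters -> bytes

# A German surname with u-umlaut, as UTF-8 bytes (assumed sample input).
my $raw  = "M\xC3\xBCller";
my $name = from_request($raw);

# Work on characters: uc() upper-cases the umlaut here because
# $name is a decoded character string, not a byte string.
my $shout = uc $name;

my $out = to_response($shout);
printf "%d characters in, %d bytes out\n", length($name), length($out);
```

With this pattern the code never needs to inspect the flag itself; it only needs to know which side of the boundary a string is on.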

My app does successfully store German umlauts in our Oracle database,
but I haven't checked Perl's internal UTF-8 flag.

--
Best regards, Alex


On Thursday, 22.04.2010, at 21:05 +0200, Jean-Christophe Boggio wrote: