Mailing List Archive

Problem with file upload
Hello,

I have a fully UTF8 encoded site in which I want users to upload files
(simple <input type="file...> form).
These files are ISO-8859-1 encoded.

My problem is that sometimes the file is uploaded correctly and
sometimes its contents are corrupted:

==> correct:
$ hd sconet1.csv |head -1
00000000 4e 6f 6d 3b 50 72 e9 6e 6f 6d 20 31 3b 44 61 74 |Nom;Pr.nom 1;Dat|

==> bad:
$ hd sconet1.csv |head -1
00000000 4e 6f 6d 3b 50 72 ef bf bd 6e 6f 6d 20 31 3b 44 |Nom;Pr...nom 1;D|

The problem seems completely random; sometimes pressing F5 a few times makes
the upload work.

The code I use is this:
open(FILE,">:encoding(iso-8859-1)","sconet1.csv") or print OUT $!;
my $buffer;
while (read($fdat{efilename},$buffer,32768)) {
    print FILE $buffer;
}
close(FILE);

I have tried removing the second parameter to open(), but it changes nothing.

The problem occurs in both Firefox and IE, so I guess it is server-side.

At the beginning of _base.epl I have this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

Can I force a "pure binary" upload/save of the file?

*Any* clue is much appreciated. This is driving me mad...

Thanks,

JC

---------------------------------------------------------------------
To unsubscribe, e-mail: embperl-unsubscribe@perl.apache.org
For additional commands, e-mail: embperl-help@perl.apache.org
Re: Problem with file upload
Hello JC,

Try setting your locale inside your Embperl file:

[-
use locale;
-]

just before your upload code, and if this doesn't help try changing the
locale for this file with the POSIX module:

[!
use POSIX qw(locale_h);
!]
[-
use locale;
setlocale(LC_ALL, "pt_BR.ISO8859-1");
-]

Change pt_BR to your locale.
See man setlocale for more info.


Hope it helps.

Regards,

--
Luiz Fernando Bernardes Ribeiro
Engenho Soluções
Tel: +55 11 2122-4216
Cel: +55 11 9254-1061

Jean-Christophe Boggio wrote:
> I have a fully UTF8 encoded site in which I want users to upload files
> (simple <input type="file...> form).
> These files are ISO-8859-1 encoded.
> [...]

Re: Problem with file upload
Hi JC,

I've got a similar problem on my site, except the other way around:
everything comes in as ISO-8859-1 and I want it in UTF-8. Perl usually
tries to guess at the best encoding when it takes in the data and then
encodes it internally as best it can. You may have a problem where the
data comes in as ISO-8859-1 but Perl thinks it is UTF-8 data, encodes it
internally as UTF-8, and then prints out the UTF-8-as-ISO-8859-1 to give you
the bad results. I usually get this sort of behavior when I get
characters in the \x{80}-\x{ff} range, and it looks like the 'bad string'
starts right at the e9 character, so this is probably what you're
getting as well.
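
For what it's worth, the three bytes ef bf bd in the bad dump are the UTF-8 encoding of U+FFFD, the Unicode replacement character. One way such bytes can arise is when the ISO-8859-1 byte e9 is decoded as UTF-8 and the result is then written out as UTF-8 again. Below is a minimal sketch of that round trip, using the Encode module with its default (lenient) error handling. The variable names are illustrative, and the sketch does not show where in the upload path this actually happens.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Prénom" as ISO-8859-1 bytes: the e-acute is the single byte 0xE9.
my $latin1_bytes = "Pr\xE9nom";

# Decoding these bytes as UTF-8 with the default CHECK value replaces the
# malformed byte 0xE9 with U+FFFD instead of dying.
my $chars = decode('UTF-8', $latin1_bytes);

# Re-encoding as UTF-8 turns U+FFFD into the three bytes EF BF BD.
my $utf8_bytes = encode('UTF-8', $chars);

# Print the result as space-separated hex, in the style of the hd output.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $utf8_bytes;
print "$hex\n";
```

If the hex printed here matches the "bad" dump, that would suggest the file content is being treated as UTF-8 text somewhere before it reaches disk.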

It may be worth checking what format Perl thinks your incoming
data is in by using

$flag = utf8::is_utf8(STRING);

If Perl thinks it is UTF-8, then it is misinterpreting your incoming data and
you'll need to decode it, either with Encode's decode or with one of the other
UTF-8 utilities. This may work:

$GoodInternalString = decode("iso-8859-1", $IncomingData);
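
Note that decode is not a built-in; it has to be imported from the Encode module. A small self-contained sketch of the check and the conversion, assuming the incoming data is a plain byte string in ISO-8859-1 (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical incoming data: ISO-8859-1 bytes, as they might come from an upload.
my $incoming = "Pr\xE9nom";

# utf8::is_utf8 reports Perl's internal flag, not the real encoding of the data.
print "UTF8 flag on incoming data: ", (utf8::is_utf8($incoming) ? "on" : "off"), "\n";

# Turn the bytes into a character string, stating the source encoding explicitly.
my $text = decode('iso-8859-1', $incoming);

# Turn the characters back into bytes, this time as UTF-8.
my $utf8 = encode('UTF-8', $text);

printf "characters: %d, UTF-8 bytes: %d\n", length($text), length($utf8);
```

The flag only says how Perl stores the string internally, so a byte string with the flag off can still hold UTF-8 or ISO-8859-1 data; the encoding passed to decode has to come from what you know about the source.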

These are the pages I read over and over and over again until my pages
magically work:

http://perldoc.perl.org/utf8.html
http://perldoc.perl.org/Encode.html

Best,
Ben

Re: Problem with file upload
Ben Hiebert wrote:
> Perl usually
> tries to guess at the best encoding when it takes in the data and then
> encodes it internally as best it can. You may have a problem where the
> data comes in as ISO-8859-1 but Perl thinks it is UTF-8 data, encodes it
> internally as UTF-8, and then prints out the UTF-8-as-ISO-8859-1 to give you
> the bad results.

Yes, that is my guess too.

> It may be worth checking to see what format Perl thinks your incoming
> data is by using
> $flag = utf8::is_utf8(STRING);

Good idea. I modified the code to this:

while (read($fdat{efilename},$buffer,32768)) {
    if (utf8::is_utf8($buffer)) {
        print OUT "u";
    }
    print FILE $buffer;
}

...but in both cases (working and not) I never get the "uuuuu" output.
BUT when $buffer is written to disk, it is transformed! I tried
calling binmode FILE just after opening the file for output, but the same
thing happens.
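
For reference, a "pure binary" copy in Perl usually means making sure neither handle has an encoding layer: binmode on the input handle and a :raw layer on the output. The sketch below wraps that in a hypothetical helper, copy_raw, which is not part of Embperl; in the code above, the input handle would be $fdat{efilename}. Whether this cures the intermittent corruption described in this thread is not established. It only rules out PerlIO layers on these two handles as the cause.

```perl
use strict;
use warnings;

# Hypothetical helper: copy an already-open input handle to $path byte for byte.
sub copy_raw {
    my ($in, $path) = @_;

    # Remove any encoding or CRLF layers from the input handle.
    binmode($in) or die "binmode on input failed: $!";

    # Open the output with the :raw layer so bytes are written unchanged.
    open(my $out, '>:raw', $path) or die "Cannot open $path: $!";

    my $buffer;
    while (read($in, $buffer, 32768)) {
        print {$out} $buffer or die "Write to $path failed: $!";
    }
    close($out) or die "Cannot close $path: $!";
    return;
}
```

Unlike the original snippet, this version dies on open or write failures instead of printing $! and carrying on, which makes a failed save harder to miss.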

> If perl thinks UTF8 then it is misinterpreting your incoming data and
> you'll need to either decode it with decode or with one of the other
> UTF8 utilities. This may work:
>
> $GoodInternalString = decode("iso-8859-1", $IncomingData);

That's what I use when the file *is* iso-8859-1.

> These are the pages I read over and over and over again until my pages
> magically work:

:-) I see *exactly* what you mean. I've read these pages over and over too.

I don't get the reason for that random behaviour.

Thanks,

JC
