Mailing List Archive

Upload problem
Hello,

I have a simple upload module but I can't get a pure binary upload
to work every time. This is what I sometimes get:
- Original is 73856 bytes
- Uploaded is 33398 bytes or 27939 bytes

The file header also changes from:
0x00000000: FFD8FFE0 00104A46 49460001 0101012C ......JFIF.....,
To :
0x00000000: EFBFBDEF BFBD0010 4A464946 00010101 ........JFIF....
0x00000010: 012C .,
Or even :
0x00000000: EFBFBDEF BFBDEFBF BDEFBFBD 00104A46 ..............JF
0x00000010: 49460001 0101012C IF.....,

So it's not just a truncated file. What is doing this? Why does it work sometimes
and not at other times?

I have tried setting escmode=0, binmode FILE, and EmbperlBlocks, with no change. What is
causing this behaviour? How can I guarantee a binary upload?

Here is my code:


[$ syntax EmbperlBlocks $]
[.-
$req=shift;
$escmode=0;
$PHOTOPATH = "$ENV{DOCUMENT_ROOT}/data/img/artistes/big";
while ( ($k,$v)=each(%fdat)) {
    if ($k =~ /^upl(\d+)$/ and $v) {
        my $filename=$1;
        open(FILE,">$PHOTOPATH/$filename.jpg") or print OUT $!;
        binmode FILE;
        my $buffer;
        while (read($fdat{$k},$buffer,32768)) {
            # should I do something with $buffer here ?
            print FILE $buffer;
        }
        close(FILE);
    }
}
-]


<form method="post" ENCTYPE="multipart/form-data">

And inside a loop :
<input type="FILE" id="upl[+ $p->[0] +]" name="upl[+ $p->[0] +]" />

<input type="SUBMIT" name="Bsave" value="Enregistrer" />
</form>


Thanks for your help,

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independant Consultant and Developer _\_V

---------------------------------------------------------------------
To unsubscribe, e-mail: embperl-unsubscribe@perl.apache.org
For additional commands, e-mail: embperl-help@perl.apache.org
Re: Upload problem
I don't know why or where, but you've got some UTF-8 encoding going on.
EF BF BD is the UTF-8 encoding of U+FFFD, the "replacement character" substituted
for input that cannot be decoded (probably the initial FF).

Suggest you sniff your data stream to see if it's happening before it
reaches Embperl.
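
A minimal sketch of how bytes like these can turn into EF BF BD, assuming the data is decoded as UTF-8 somewhere and then written back out as UTF-8. It uses the core Encode module; the variable names are illustrative, and where in the upload path such a round trip would happen is not established here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# First bytes of the original JPEG, as shown in the hex dump above.
my $jpeg_head = "\xFF\xD8\xFF\xE0\x00\x10JFIF";

# Treat the bytes as UTF-8 text. With Encode's default CHECK value,
# bytes that are not valid UTF-8 are replaced by U+FFFD instead of dying.
my $as_text = decode('UTF-8', $jpeg_head);

# Writing that text back out as UTF-8 stores each U+FFFD as EF BF BD.
my $on_disk = encode('UTF-8', $as_text);

print uc(unpack('H*', $jpeg_head)), "\n";
print uc(unpack('H*', $on_disk)),   "\n";
```

How many replacement characters come out for a given run of invalid bytes depends on the decoder, so the count is not asserted here. The point is only that the original byte values are lost once this substitution has happened.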



Re: Upload problem
On 04/04/2012 18:38, Chris Allen wrote:
> Don't know why or where, but you've got some utf8 encoding going on.
> EF,BF,BD is the utf8 "replacement string" used for an unknown character (probably the initial FF).
>
> Suggest you sniff your data stream to see if it's happening before it reaches Embperl.

tcpdump prints this:

0x0500: 7570 6c33 3831 223b 2066 696c 656e 616d upl381";.filenam
0x0510: 653d 2243 686f 7269 7374 6573 4844 312e e="ChoristesHD1.
0x0520: 6a70 6722 0d0a 436f 6e74 656e 742d 5479 jpg"..Content-Ty
0x0530: 7065 3a20 696d 6167 652f 6a70 6567 0d0a pe:.image/jpeg..
0x0540: 0d0a ffd8 ffe0 0010 4a46 4946 0001 0101 ........JFIF....
0x0550: 012c 012c 0000 ffe1 33ff 4578 6966 0000 .,.,....3.Exif..

So it has to be Embperl-related.

The site's config seems clean to me:

<VirtualHost *>
    ServerName something.fr
    DocumentRoot /var/www/sites/semi
    DirectoryIndex index.html
    EMBPERL_DEBUG 0
    EMBPERL_APPNAME semiv2
    EMBPERL_OBJECT_BASE base.epl
    <FilesMatch "\.html">
        SetHandler perl-script
        PerlHandler Embperl::Object
        Options ExecCGI
    </FilesMatch>
    Options -Indexes
</VirtualHost>

<Directory "/var/www/sites/semi/admin">
    AuthUserFile /var/www/sites/semi/admin/.htpasswd
    AuthName "Administration SEMI"
    AuthType Basic
    require valid-user

    <Files *.cgi>
        AddHandler cgi-script .cgi .pl .htm
        Options ExecCGI
    </Files>
</Directory>

The /etc/apache2/conf.d/charset file is empty (all commented out).

Nothing charset-related in apache2.conf (standard Debian squeeze file)

The "base.epl" file contains this (I have UTF-8 accented characters in my pages):
use utf8;
use encoding "utf8";
$http_headers_out{'Content-Type'}="text/html; charset=utf-8";

and:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

I tried creating a "test" subdir containing only the upload module and a minimal base.epl:
[-
Execute("$ENV{DOCUMENT_ROOT}/basedb.epl");
Execute('*');
-]
(basedb.epl only creates the $req->{dbh} handle to the DB)
Still no change.
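
One thing that could be checked at this point is whether the handle being read carries an encoding layer. The `encoding` pragma is documented as also applying its encoding to STDIN and STDOUT; whether that is what affects the upload here is an assumption, not something this thread confirms. The sketch below uses a scratch file rather than a real upload handle, and only shows how `PerlIO::get_layers` and `binmode(..., ':raw')` behave.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write four raw bytes (the JPEG magic number) to a scratch file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xFF\xD8\xFF\xE0";
close $tmp_fh or die "close: $!";

# Open it with an encoding layer attached, to stand in for a handle
# that something else has switched to UTF-8 text mode.
open(my $in, '<:encoding(UTF-8)', $tmp_path) or die "open: $!";
my @before = PerlIO::get_layers($in);

# ':raw' removes layers that would alter the byte stream.
binmode($in, ':raw') or die "binmode: $!";
my @after = PerlIO::get_layers($in);

my $bytes = '';
read($in, $bytes, 4);
close $in;

print "before: @before\n";
print "after:  @after\n";
print "bytes:  ", uc(unpack('H*', $bytes)), "\n";
```

Printing the layers of the actual upload handle (for instance `$fdat{$k}` in the module above) before reading from it would show whether anything other than the default layers is present.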

Where can this encoding come from? And why are the files smaller than the originals (UTF-8 should
only enlarge the file when it encounters unknown characters, no?) And why don't they always have the
same size?

One last clue: it *seems* that after I restart Apache I can reliably do ONE upload.

I have run out of ideas, so if anyone has one, I'll take it. Thanks for your help.

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independant Consultant and Developer _\_V

Re: Upload problem
Thanks for taking the time to help me.

On 05/04/2012 08:48, Chris Allen wrote:
> Can you include all of the headers here please?

I have attached the beginning of the dump (tcpdump addresses are changed to aa.aa.aaa.aa
and bb.bbb.bb.bb but it's easy to find the real ones). I hope the list accepts attachments.
The whole dump is 2.5 MB so I won't post it to the list, but I have it handy if you need it.

> It's possible you have more than one issue here. Firstly, what happens if you
> upload several textfiles (ASCII data only)? Do they upload correctly? Or perhaps
> they upload correctly but truncated?

Uploaded the full tcpdump (2670592 bytes). It's pure 7-bit ASCII: same size, same md5sum.
Uploaded a linux-header Makefile (53 KB). Probably 7-bit ASCII: same size, same md5sum.

Uploaded a big, mostly ASCII text file containing a few accents:
1395336 original
1395118 copy
The results are... insane. Here is the diff:

diff -u 0410959v-phase2.txt 14.jpg
--- original 2011-09-05 15:18:49.000000000 +0200
+++ copy 2012-04-05 16:17:22.091080638 +0200
@@ -38,18 +38,18 @@
Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
-Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
+Use of uninitialized value in numeric eq (==) at ext-bin/do_5_gense2.pl line 1126.
+Use of uninitialized value in numeric et-bin/do_5_genfichiers_phase2.pl line 1126.
+Use ofed value in n=) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
+Use ozed value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
+Use of uninitialized valu et-bin/do_5_genfichiers_phase2.pl line 1126.
+Use of uninite in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
+Use of uninitialized value in num a_5_genfichiers_phase2.pl line 1126.
+Use of uninitialized eric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
+Use of uninitialized value in numeric eq (inhiers_phase2.pl line 1126.
+Use of uninitialized value in ==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
+Use of uninitialized value in numeric eq (==) at exense2.pl line 1126.
+Use of uninitialized value in numeric et-bichiers_phase2.pl line 1126.
Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
Use of uninitialized value in numeric eq (==) at ext-bin/do_5_genfichiers_phase2.pl line 1126.
@@ -258,7 +258,7 @@
Warning: Permanently added '[10.141.0.61]:2222' (RSA) to the list of known hosts.
Ubuntu 10.04.3 LTS
Warning: Permanently added '192.168.122.130' (RSA) to the list of known hosts.
-Arret du LDAP (patienter 10 secondes)
+Arret LDAP (patienter 10 secondes)
Stopping daemon monitor: monit.
Stopping OpenLDAP: slapd.
tar: Removing leading `/' from member names

The differences are at lines 41-52 and 261, though the file is 23818 lines long. I guess that is
because only one 32768-byte buffer gets "corrupted"?
Accents appear only in lines 2-191 (not on every line).
The accents are still there, untouched. In the original file, they are UTF-8 encoded:
iconv -f utf8 -t latin1 original >/dev/null
-> no error

Also the files are not "truncated", there are bits randomly missing in the middle.
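
To pin down where the copy first diverges from the original, and which 32768-byte read that offset falls into, something like the following could be used. `first_diff_offset` is a hypothetical helper, not part of Embperl or CGI.pm; the file names in the trailing comment are the ones from the diff above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the byte offset of the first difference
# between two files, or -1 if their contents are identical.
sub first_diff_offset {
    my ($path_a, $path_b) = @_;
    open(my $fa, '<:raw', $path_a) or die "open $path_a: $!";
    open(my $fb, '<:raw', $path_b) or die "open $path_b: $!";

    my $offset = 0;
    while (1) {
        my ($buf_a, $buf_b) = ('', '');
        my $na = read($fa, $buf_a, 32768);
        my $nb = read($fb, $buf_b, 32768);
        die "read: $!" unless defined $na && defined $nb;
        return -1 if $na == 0 && $nb == 0;

        if ($buf_a ne $buf_b) {
            # Walk the common prefix of the two chunks byte by byte.
            my $min = $na < $nb ? $na : $nb;
            my $i   = 0;
            $i++ while $i < $min
                && substr($buf_a, $i, 1) eq substr($buf_b, $i, 1);
            return $offset + $i;
        }
        $offset += $na;
    }
}

# Example use with the files from the diff:
#   my $off = first_diff_offset('0410959v-phase2.txt', '14.jpg');
#   printf "first difference at byte %d (read number %d)\n",
#          $off, int($off / 32768) if $off >= 0;
```

If the first differing offset lands on or near a multiple of 32768, that would support the idea that individual reads are being affected; if not, the damage more likely happens before the data reaches the read loop.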


So as I understand it, both problems (UTF-8 encoding + bits missing) arise only when
non-UTF-8 characters are encountered.

If you have ideas of where/what I can look next...

Thanks for your patience,

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independant Consultant and Developer _\_V
Re: Upload problem
Hi Ed,

On 05/04/2012 15:16, Ed Grimm wrote:
> If my guess is right, then I think doing a
> binmode OUT, ':encoding(UTF-8)';

Tried that: the file ends up even more encoded (its size now grows rather than
shrinks). Here is the "jpeg" header:

0x00000000: C3BFC398 C3BFC3A0 00104A46 49460001 ..........JFIF..
0x00000010: 01010048 00480000 C3BFC39B 00430001 ...H.H.......C..
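
The C3BF C398 C3BF C3A0 prefix is what results when each original byte is treated as a Latin-1 character and then encoded as UTF-8. A small sketch with the core Encode module, using the four JPEG magic bytes from the earlier dump (variable names are illustrative):

```perl
use strict;
use warnings;
use Encode qw(encode);

# The four magic bytes at the start of the original JPEG.
my $jpeg_head = "\xFF\xD8\xFF\xE0";

# encode() treats each element of the string as a code point, so the
# byte FF is read as U+00FF, D8 as U+00D8, and so on.
my $encoded = encode('UTF-8', $jpeg_head);

print uc(unpack('H*', $encoded)), "\n";   # prints C3BFC398C3BFC3A0
```

Every byte at or above 0x80 becomes two bytes this way, which is consistent with the file growing instead of shrinking.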

Thanks for your suggestion.


I wonder whether I'm the only one on this list uploading non-7-bit files over HTTP with Embperl?
If so, it *has* to come from my config.

--
Jean-Christophe Boggio -o)
embperl@thefreecat.org /\\
Independant Consultant and Developer _\_V

Re: Upload problem
> I wonder if I'm the only one on this list to upload non-7 bit files in HTTP with
> embperl ?
> If so, it *has* to come from my config.

I have an application with upload support (for any type of file). However, I have
not dealt with UTF-8 encoding; in other words, my system simply uses 8-bit
characters in Latin-1.

--
---> Dirk Jagdmann
----> http://cubic.org/~doj
-----> http://llg.cubic.org

RE: Upload problem
Hi,

The file upload is handled by CGI.pm and not by Embperl itself. It looks like CGI.pm is doing some UTF-8 conversion (or it is done when you write the file).

Perl's UTF-8 handling is a kind of mystery (at least to me). Every time I thought I had understood what was going on, I got a new surprise.

In the past, the only way I got around it was by trial and error :-(

You might specify a binary encoding in your open statement (binmode only sets the crlf <-> lf conversion, but it doesn't change charset conversion).
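
A minimal sketch of that suggestion, with the layer named explicitly on both the input and the output handle. `copy_raw` is a hypothetical helper, not an Embperl or CGI.pm function. Note that perlfunc describes a bare `binmode FH` as equivalent to `:raw`, so the explicit layer mainly makes the intent visible. It also cannot repair data that was already converted before it reached the handle.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from an already-open input handle
# to $dest_path, with byte semantics on both handles.
# Returns the number of bytes copied.
sub copy_raw {
    my ($in, $dest_path) = @_;

    # Drop any text or encoding layers from the input handle.
    binmode($in, ':raw') or die "binmode input: $!";

    # Open the destination for writing with no translation layers.
    open(my $out, '>:raw', $dest_path) or die "open $dest_path: $!";

    my $total = 0;
    while (1) {
        my $buffer = '';
        my $n = read($in, $buffer, 32768);
        die "read: $!" unless defined $n;   # undef signals a read error
        last if $n == 0;                    # 0 signals end of file
        print {$out} $buffer or die "write $dest_path: $!";
        $total += $n;
    }
    close($out) or die "close $dest_path: $!";
    return $total;
}
```

In the upload loop from the first message, the body of the `if` could then become `copy_raw($fdat{$k}, "$PHOTOPATH/$filename.jpg")`. Comparing the returned byte count with the original file size gives a quick check on whether anything was lost.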

Gerald