Mailing List Archive

Catalyst::Request::Upload->filename is not decoded.
All my upload forms have accept-charset="utf-8". We expect that uploaded
filenames could contain wide characters.

The problem I hit is that ->basename does this:

$ perl -le 'use Catalyst::Request::Upload; my $upload =
Catalyst::Request::Upload->new( { filename => q[документ обучения.pdf] } );
print $upload->basename;'
_.pdf

That's pretty mangled.


The problem is that $upload->filename is not decoded, so the substitution is
working on octets, not characters.

sub _build_basename {
    my $self = shift;
    my $basename = $self->filename;
    $basename =~ s|\\|/|g;                                        # backslashes -> slashes
    $basename = ( File::Spec::Unix->splitpath($basename) )[2];    # keep only the file part
    $basename =~ s|[^\w\.-]+|_|g;                                 # runs of "unsafe" chars -> _
    return $basename;
}


Obviously, we want \w to work on characters, not encoded octets. The
filename is character data, so it should be decoded.
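A minimal standalone reproduction, assuming only core Perl (Encode and File::Spec::Unix, no Catalyst). `filtered_basename` is a hypothetical helper that repeats the `_build_basename` steps above; it is run once on UTF-8 octets and once on the decoded character string.

```perl
use strict;
use warnings;
use utf8;                          # the string literal below is character data
use Encode qw(encode decode);
use File::Spec::Unix;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: the same three steps as _build_basename.
sub filtered_basename {
    my ($name) = @_;
    $name =~ s|\\|/|g;                                   # backslashes -> slashes
    $name = ( File::Spec::Unix->splitpath($name) )[2];   # keep only the file part
    $name =~ s|[^\w\.-]+|_|g;                            # runs of "unsafe" chars -> _
    return $name;
}

my $chars  = 'документ обучения.pdf';   # the name as the user sees it
my $octets = encode('UTF-8', $chars);     # the UTF-8 bytes a browser sends

print filtered_basename($octets), "\n";                    # _.pdf
print filtered_basename(decode('UTF-8', $octets)), "\n";   # документ_обучения.pdf
```

On an undecoded byte string, Perl's default regex rules don't count bytes above 0x7F as \w, so the whole Cyrillic run plus the space collapses into one `_`. On the decoded string, \w matches the Cyrillic letters and only the space is replaced.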

Does it make sense to do it in Engine's prepare_uploads?

For example:

my $u = Catalyst::Request::Upload->new(
    size     => $upload->{size},
    type     => scalar $headers->content_type,
    headers  => $headers,
    tempname => $upload->{tempname},
    filename => $c->_handle_unicode_decoding($upload->{filename}),
);
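A sketch of the kind of decoding proposed for prepare_uploads, assuming the browser sends the filename as UTF-8 octets (as accept-charset="utf-8" forms suggest). `decode_upload_filename` is a hypothetical helper, not Catalyst's `_handle_unicode_decoding`; returning the raw octets on malformed input instead of dying is a design choice made here, not necessarily what Catalyst does.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Catalyst): turn UTF-8 octets into a
# Perl character string. FB_CROAK makes malformed input die inside the
# eval; LEAVE_SRC keeps $octets intact so it can be returned as-is.
sub decode_upload_filename {
    my ($octets) = @_;
    return $octets unless defined $octets;
    my $chars = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return defined $chars ? $chars : $octets;
}

# "док.pdf" as it arrives on the wire: 10 bytes, 7 characters once decoded.
my $name = decode_upload_filename("\xD0\xB4\xD0\xBE\xD0\xBA.pdf");
print length($name), "\n";   # 7
```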


--
Bill Moseley
moseley@hank.org
Re: Catalyst::Request::Upload->filename is not decoded.
Any chance you can test this on the current dev release on CPAN? There are a ton of UTF-8 fixes there.
Catalyst-Runtime-5.90079_003 - The Catalyst Framework Runtime - metacpan.org

If problems remain, I'd love an issue report or, ideally, a test case. There's a big UTF-8 test case here:

perl-catalyst/catalyst-runtime

Take a look and let me know if we need more here. The file upload code is a bit confusing to me, so I'm not sure I got it all correct.

On Wednesday, December 17, 2014 7:22 PM, Bill Moseley <moseley@hank.org> wrote:


_______________________________________________
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
Re: Catalyst::Request::Upload->filename is not decoded.
Actually, you might need to check out and test the HEAD of the holland branch; there are fixes there that are not on CPAN.
-jnap

On Friday, December 19, 2014 11:15 AM, John Napiorkowski <jjn1056@yahoo.com> wrote:


Re: Catalyst::Request::Upload->filename is not decoded.
Actually, you might need to check out and test the HEAD of the holland branch; there are fixes there that are not on CPAN.
It also looks like filename is right, but basename is using a regexp that is not Unicode-friendly. I'll take a look.
jnap
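To illustrate what "not Unicode-friendly" can mean even once the filename is decoded, here is a sketch comparing the same substitution under Perl's default semantics for a character string and under the /a (ASCII-restrict) modifier, which needs Perl 5.14 or later. This does not claim which semantics Catalyst's regexp actually uses; /a is just a convenient way to show an ASCII-only \w.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

my $name = 'документ обучения.pdf';   # already-decoded character data

# Default rules on a character string: \w matches the Cyrillic letters,
# so only the space is replaced.
(my $unicode = $name) =~ s/[^\w.-]+/_/g;

# /a restricts \w to [A-Za-z0-9_], so the letters are replaced too and the
# result is as mangled as the undecoded case.
(my $ascii = $name) =~ s/[^\w.-]+/_/ga;

print "$unicode\n";   # документ_обучения.pdf
print "$ascii\n";     # _.pdf
```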

Re: Catalyst::Request::Upload->filename is not decoded.
OK, so we can't change the existing basename without breaking backward compatibility, so we added a new raw_basename that I think should do what you want. That will work its way to CPAN as dev004 shortly. Holland is expected to become stable on January 28, 2015.
john
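A rough idea of what an unfiltered basename could look like: the separator handling from `_build_basename` without the `[^\w\.-]` filter. `raw_basename_of` is a hypothetical standalone function written for illustration; it is not the raw_basename code in the holland branch.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Hypothetical: same path handling as _build_basename, no character filter.
sub raw_basename_of {
    my ($filename) = @_;
    (my $path = $filename) =~ s|\\|/|g;                  # C:\a\b.pdf -> C:/a/b.pdf
    return ( File::Spec::Unix->splitpath($path) )[2];    # keep only the file part
}

print raw_basename_of('C:\\Users\\bill\\report v2.pdf'), "\n";   # report v2.pdf
```

Because nothing is filtered, anything that turns such a name into a filesystem path has to do its own sanitizing.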


On Friday, December 19, 2014 11:40 AM, John Napiorkowski <jjn1056@yahoo.com> wrote:

