Mailing List Archive

Content Disposition filename
Anyone aware of a good, portable way in Perl to encode the filename in a
Content-Disposition header? I would like to support UTF8 filenames, but
support in browsers is unclear (if not changing).

Is this complexity something that the Catalyst framework should handle?
It's one of those areas where it's easy to get wrong (I can see many
different approaches in our own code).

http://greenbytes.de/tech/tc2231/

http://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http

Thanks,

--
Bill Moseley
moseley@hank.org
Re: Content Disposition filename
On Tue, Nov 19, 2013 at 10:32 AM, Bill Moseley <moseley@hank.org> wrote:

> Anyone aware of a good, portable way in Perl to encode the filename in a
> Content-Disposition header? I would like to support UTF8 filenames, but
> support in browsers is unclear (if not changing).
>
> Is this complexity something that the Catalyst framework should handle?
> It's one of those areas where it's easy to get wrong (I can see many
> different approaches in our own code).
>
> http://greenbytes.de/tech/tc2231/
>
>
> http://stackoverflow.com/questions/93551/how-to-encode-the-filename-parameter-of-content-disposition-header-in-http
>

I have no idea what the client can accept or what its OS uses as a
path-separator, and I don't want to go down the client-sniffing path,
anyway.

I have a user-supplied character string that I want to use as the filename,
which I have to assume can contain any unicode character since it's
user-supplied data.

From my limited tests it seems most modern browsers support the
"filename*" extension. Each browser does some special handling (like
replacing the path-separator, or adding a file extension based on
content-type if no file extension is in the filename).


All I want to do is make valid HTTP headers and let the client decide how
to handle it, but also provide a usable filename (not just underscores, for
example).


So, all I'm after is to make this a valid header:

$c->res->header( content_disposition =>
qq[attachment; filename="$ascii_file"; filename*=UTF-8''$utf8_file]
);



The filename* is easy, I'm finding:

my $utf8_file = uri_escape( Encode::encode( 'UTF-8' => $filename ) );
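One caveat with the line above: RFC 5987 only allows a small set of characters (attr-char) to appear unescaped in the "filename*" value, and as far as I know some older URI::Escape releases leave characters such as ' ( ) * alone by default. A minimal sketch that spells the allowed set out and needs only the core Encode module follows; the helper name rfc5987_value is made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn a character string into the part of an
# RFC 5987 ext-value that follows UTF-8''.
sub rfc5987_value {
    my ($filename) = @_;

    # Work on the UTF-8 bytes, not on the characters.
    my $bytes = Encode::encode( 'UTF-8', $filename );

    # Percent-encode every byte that is not an RFC 5987 attr-char.
    $bytes =~ s/([^A-Za-z0-9!#\$&+\-.^_`|~])/sprintf '%%%02X', ord $1/ge;

    return $bytes;
}

print rfc5987_value("na\x{ef}ve file.txt"), "\n";   # na%C3%AFve%20file.txt
```

The explicit character class is the design choice here: the output does not depend on which defaults the installed escaping module happens to use.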



But the $ascii_file is a bit more work. Percent-encoding doesn't work.
So, have to do a bit of filtering.


See any easier/cleaner/more-correct approach? When I see this much code I
tend to think it's the wrong approach.


# Convert to ASCII using underscore as replacement

my $ascii_file = Encode::encode( ascii => $filename, sub { '_' } );

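# Remove double quotes and backslashes, as we want to use the quoted form of
# "filename" and preserve whitespace. (A backslash starts a quoted-pair
# inside a quoted-string, so a trailing one would escape the closing quote.)

$ascii_file =~ s/["\\]/_/g;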

# Replace non-printable characters with underscore, and collapse dups

$ascii_file =~ s/[^[:print:]]/_/g;
$ascii_file =~ s/_{2,}/_/g;

# Split off the extension so can check length of filename w/o extension.

# Of course, $ext could end up as dot + underscore.

my ( $base, $ext ) = split /(\.\w+)$/, $ascii_file;

# Use default filename if we don't have more than three "meaningful"
# characters.

# very subjective.

$base = 'your_file' unless ( () = $base =~ /[A-Za-z0-9]/g ) > 3;

# Stuff the extension back on.

$ascii_file = $base;
$ascii_file .= $ext if defined $ext;
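The steps above could be gathered into a single helper, which makes the behaviour easier to check against sample names. This is only a sketch: the name ascii_fallback is hypothetical, and it differs from the snippets above in two ways: it also replaces backslashes, and it uses one regex match rather than split to peel off the extension.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper collecting the filtering steps into one sub.
sub ascii_fallback {
    my ($filename) = @_;
    $filename = '' unless defined $filename;

    # Characters outside ASCII become underscores.
    my $ascii = Encode::encode( ascii => $filename, sub { '_' } );

    # Double quotes and backslashes are special inside a quoted-string.
    $ascii =~ s/["\\]/_/g;

    # Non-printable bytes become underscores; runs of underscores collapse.
    $ascii =~ s/[^\x20-\x7E]/_/g;
    $ascii =~ s/_{2,}/_/g;

    # Peel off a trailing ".ext" so only the base name is judged.
    my ( $base, $ext ) = $ascii =~ /\A(.*?)(\.\w+)?\z/s;

    # Use a fixed name unless more than three alphanumerics remain.
    my $count = () = $base =~ /[A-Za-z0-9]/g;
    $base = 'your_file' unless $count > 3;

    return defined $ext ? $base . $ext : $base;
}

print ascii_fallback("R\x{e9}sum\x{e9} 2013.pdf"), "\n";   # R_sum_ 2013.pdf
```

A name made up entirely of non-ASCII characters, such as three CJK characters followed by ".txt", comes out as "your_file.txt" with this sketch.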



Again, "filename*" support is good, and I'm not trying to prevent buggy
clients from doing something stupid (e.g. filename=/etc/passwd), but want
to provide a reasonable fallback to "filename".

Perhaps the simple solution is to always use "filename=your_file" and hope
most clients use the filename* extension.
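That simpler variant might look like the sketch below. The helper name simple_content_disposition is invented for illustration, and the percent-encoding is written out by hand against the RFC 5987 attr-char set so that only the core Encode module is needed.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: fixed ASCII fallback plus an RFC 5987 "filename*".
sub simple_content_disposition {
    my ($filename) = @_;

    # UTF-8 bytes, with everything outside attr-char percent-encoded.
    my $bytes = Encode::encode( 'UTF-8', $filename );
    $bytes =~ s/([^A-Za-z0-9!#\$&+\-.^_`|~])/sprintf '%%%02X', ord $1/ge;

    return qq[attachment; filename="your_file"; filename*=UTF-8''$bytes];
}

print simple_content_disposition("\x{65E5}\x{672C}.txt"), "\n";
```

In a Catalyst action the result would be passed to $c->res->header( content_disposition => ... ) as in the earlier snippet. The trade-off is that a client ignoring "filename*" sees a name with no extension.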


--
Bill Moseley
moseley@hank.org
Re: Re: Content Disposition filename
From: Bill Moseley




> See any easier/cleaner/more-correct approach? When I see this much code I
> tend to think it's the wrong approach.

You can use Text::Unidecode if you want to replace special chars with ASCII chars.
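A minimal sketch of that suggestion, assuming Text::Unidecode is available from CPAN. So that the example still runs without the module, it falls back to plain underscore replacement when the module cannot be loaded. The helper name transliterated_fallback is hypothetical.

```perl
use strict;
use warnings;

# True when the optional CPAN module Text::Unidecode could be loaded.
my $have_unidecode = eval { require Text::Unidecode; 1 };

# Hypothetical helper: build an ASCII-only fallback name, transliterating
# where possible instead of replacing every non-ASCII character.
sub transliterated_fallback {
    my ($filename) = @_;

    my $ascii = $have_unidecode
        ? Text::Unidecode::unidecode($filename)
        : $filename;

    # Anything still outside printable ASCII, plus the characters that are
    # special inside a quoted-string, becomes an underscore.
    $ascii =~ s/[^\x20-\x7E]|["\\]/_/g;
    $ascii =~ s/_{2,}/_/g;

    return $ascii;
}

print transliterated_fallback("Caf\x{e9} menu.pdf"), "\n";
```

The result can be used as the quoted "filename" value, with "filename*" still carrying the original UTF-8 name.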

Octavian