Mailing List Archive: mod_perl and utf8 and CGI->param

mod_perl and utf8 and CGI->param

merlyn at stonehenge

Sep 2, 2014, 2:19 PM

Post #1 of 15 (8835 views)

Getting really frustrated with mod_perl2's apparent inability to
properly read UTF8 input.

Here's my mod_perl2 setup:

Apache 2.2.[something]
mod_perl 2.0.7 (or nearly that)
ModPerl::Registry
Perl "script" with CGI.pm

Very early in my app:

## ensure utf8 CGI params:
$CGI::PARAM_UTF8 = 1;

binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";

This works fine in CGI mode: when I ask for $foo = $cgi->param('foo'),
DBI::data_string_desc($foo) shows a UTF8 string with the proper
discrepancy between bytes and chars.

But when I try to run it under mod_perl, the returned string appears
to be the raw ascii bytes, and definitely not utf8. Of course, when I
store that in the database (using DBD::Pg), the "latin-1" is encoded
to "utf-8", and I get a bunch of weird chars on the output.
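
To make that failure mode concrete, here is a minimal sketch using only the
core Encode module. The sample octets (a UTF-8 "é") are an assumption for
illustration, not data from the application described here. It shows what
happens when undecoded UTF-8 octets are treated as Latin-1 characters and
encoded a second time:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" as a browser would send it in a UTF-8 form post: two octets.
my $octets = "\xC3\xA9";

# Correct path: decode once, so Perl sees a single character.
my $chars = decode('UTF-8', $octets);
printf "decoded: %d char(s), utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';

# Faulty path: the undecoded octets are taken as two Latin-1
# characters and encoded again on the way to the database.
my $double = encode('UTF-8', $octets);
printf "double-encoded: %d octets (%s)\n",
    length($double),
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $double));
```

The four octets C3 83 C2 A9 are what a UTF-8 terminal renders as "Ã©",
which is the kind of "weird chars" described above.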

Has anyone managed to round-trip UTF8 from form to database and back
using a setup similar to this?

I suspect part of the problem is this in CGI.pm:

'read_from_client' => <<'END_OF_FUNC',
# Read data from a file handle
sub read_from_client {
    my($self, $buff, $len, $offset) = @_;
    local $^W=0;                # prevent a warning
    return $MOD_PERL
        ? $self->r->read($$buff, $len, $offset)
        : read(\*STDIN, $$buff, $len, $offset);
}
END_OF_FUNC

Since I binmode STDIN, the non-$MOD_PERL works ok here. What's the
equivalent of $r->read() that marks the incoming stream as UTF8, so I
get chars instead of bytes? Or can I just read(\*STDIN) in mod_perl2
as well? (I know that was supported at one point...)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix consulting, Technical writing, Comedy, etc. etc.
Still trying to think of something clever for the fourth line of this .sig

Re: mod_perl and utf8 and CGI->param [ In reply to ]

Sep 3, 2014, 2:17 AM

Post #2 of 15 (8763 views)

Hi Randal.

Randal L. Schwartz wrote:
> Getting really frustrated with mod_perl2's apparent inability to
> properly read UTF8 input.

I share your frustration, as I have been dealing for a long time with multi-lingual web
applications, using perl and mod_perl.

First a very top-level comment : the basic problem here is the incompleteness of the HTTP
RFCs, and the lack of proper support for international character sets, even still today.
When a browser is POST-ing the contents of the <input> elements of a <form> to a server,
there is a set of arcane rules which, in principle, determine the character set in which
this content is encoded. The problem is that these rules are often confusing, and in
addition regularly flouted by different browser makes and versions (not to even talk
about umpteen non-browser proprietary HTTP client things).

For example, when a browser sends the content of a form in the "multipart/form-data"
"enctype", the content of each form parameter is sent as a separate section, in a form
similar to the parts in a multi-part RFC-822 email. In theory, each of these parts should
have its own "content-type" header, and if it is text, it should also contain a "charset"
attribute indicating the corresponding data's encoding.
(and if it doesn't, by virtue of the HTTP RFCs, it should be ISO-8859-1, which is still
the default HTTP character set today; quite ridiculous, but so it is).

But the sad reality is that browsers don't do that, and so in practice in many cases
the server-side application is reduced to "guessing".

By experience more than by definite code knowledge, I have to suppose that this kind of
confusion sometimes also hits developers of modules such as CGI.pm and mod_perl, so that
over the years, things have tended to vary from one version to another (versions of
browsers, versions of perl, versions of mod_perl, versions of CGI.pm). Maybe also because
of all the reasons above, there is just no "right" way of handling this, so CGI.pm always
returns "bytes" (and libapreq2 may do things differently).

In the end, rather than trying to follow the latest developments all the time and
continuously patch my programs because of all this, I have resorted to some "defensive
programming" techniques in terms of interpreting <form>-posted data, which have been
working fine for me for the last few years. It may well be that they are total
overkill, but in practice they have saved me a lot of time not spent wondering why the
data in some application suddenly started to show up as "A tilde" followed by some bizarre
graphic sign (or, conversely, as a question mark embedded in a lozenge).

(Even logging this stuff and trying to figure out what is going on is a pain, because you
have to figure out first in what encoding you are logging, and second in what encoding you
are viewing your logs).

The methodology I follow is as follows :

1) all html <form> pages of the applications should have a tag like :
<meta http-equiv="Content-Type" content="text/html; charset=.....">
2) all <form> elements in the page should have the attributes
enctype="multipart/form-data"
accept-charset="....." (the same as above)

The above 2 things do not really guarantee anything, but at least they establish some
"baseline" which helps in interpreting the rest (and slapping users when they change their
browser settings).

3) all forms contain a hidden text <input> like
<input type="hidden" name="my-UTF8-check" value="AÜÖ.."> (some known sequence of
"diacritics" characters guaranteed to have a different byte length between ISO-8859-x and
UTF-8 encoding)

The point of this one is :
- all "your" forms have this parameter, so when you receive some posted data, you can
reasonably assume that it is one of "your" forms that sent it.
- if the browser sends the data in iso-8859-1, this string will be a certain length in
bytes, and similarly for UTF-8. You can measure that length in a "use bytes;" section of
the cgi-bin script. And you can also just compare this with some carefully-crafted string
constant.

Then, on the server side, I have some code which systematically checks which is the
encoding that is *really* seen by the program (cgi-bin script or mod_perl module) for
these form input elements (using various clues from the server configuration, and the
above received hidden form parameter).
And when this code "knows" the received encoding, it then systematically "sets" or not the
perl "utf8" flag for these received cgi->param("x") values before actually using them (or
encodes/decodes them as appropriate).
The point here being that the rest of your script can assume that all the param values are
UTF-8 encoded, and known as such by Perl; and be done with it all.
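
A minimal sketch of that hidden-field check, assuming the parameters arrive
as raw octets and that only UTF-8 and ISO-8859-1 are candidates. The function
names detect_form_encoding and normalize_param and the three-character check
value are hypothetical, chosen for illustration; this is not the actual code
described above:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Known value of the hidden check field, as Perl characters:
# A, U-umlaut (U+00DC), O-umlaut (U+00D6).
my $CHECK = "A\x{DC}\x{D6}";

# What the field looks like on the wire in each candidate encoding.
my %WIRE = (
    'UTF-8'      => encode('UTF-8',      $CHECK),   # 5 octets
    'ISO-8859-1' => encode('ISO-8859-1', $CHECK),   # 3 octets
);

# Return the name of the encoding the client used, or undef when the
# received check value matches neither candidate.
sub detect_form_encoding {
    my ($raw_check) = @_;
    return undef unless defined $raw_check;
    for my $name (sort keys %WIRE) {
        return $name if $raw_check eq $WIRE{$name};
    }
    return undef;
}

# Turn one raw parameter value into Perl characters.
sub normalize_param {
    my ($value, $encoding) = @_;
    return decode($encoding, $value);
}
```

A caller would run detect_form_encoding on the check field once per request,
then pass every other text parameter through normalize_param with the result.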

I'm not saying that this is the cleverest and most elegant and most efficient way of
dealing with this, nor that it is the answer you were looking for.
But it's helped me sleep better for quite a while now.

Re: mod_perl and utf8 and CGI->param [ In reply to ]

cosimo at streppone

Sep 3, 2014, 2:23 AM

Post #3 of 15 (8732 views)

On 09/03/2014 11:17 AM, André Warnier wrote:

> 3) all forms contain a hidden text <input> like
> <input type="hidden" name="my-UTF8-check" value="AÜÖ.."> (some known
> sequence of "diacritics" characters guaranteed to have a different byte
> length between ISO-8859-x and UTF-8 encoding)
> [...]
> But it's helped me sleep better for quite a while now.

This is brilliant :-)
Thanks André.

--
Cosimo

Re: mod_perl and utf8 and CGI->param [ In reply to ]

merlyn at stonehenge

Sep 3, 2014, 12:38 PM

Post #4 of 15 (8747 views)

>>>>> "André" == André Warnier <aw@ice-sa.com> writes:

André> The methodology I follow is as follows :

André> 1) all html <form> pages of the applications should have a tag like :
André> <meta http-equiv="Content-Type" content="text/html; charset=.....">
André> 2) all <form> elements in the page should have the attributes
André> enctype="multipart/form-data"
André> accept-charset="....." (the same as above)

I've pretty much got success with CGI (and CGI.pm) doing the things I
listed above. So this isn't needed. I'm not having problems with the
browser, Apache, or Perl, or RDBO, or Postgresql. (Even that took a bit
of work to get working, and so I think none of those are the issue.)

What I need to know is what is mod_perl doing differently? Does it not
respect binmode STDIN, ":utf8"? Apparently not. So if you know of a
way to get mod_perl to "fix" reading from the browser properly, I'm
interested in that.


Re: mod_perl and utf8 and CGI->param [ In reply to ]

Sep 3, 2014, 2:34 PM

Post #5 of 15 (8749 views)

I encode a "pound sign" as a parameter, which indicates whether the
content is UTF-8, UCS or latin-1, and this seems to resolve most of the
issues... I did take a lot of effort to fix issues with utf8, and there
are a lot of these: between form and post; between requests if storing
data in sessions; between script and database; etc...

I do however not use CGI.pm but use APR instead which I know works (and
may be less error prone)

James


Re: mod_perl and utf8 and CGI->param [ In reply to ]

torsten.foertsch at gmx

Sep 4, 2014, 1:19 AM

Post #6 of 15 (8738 views)

On 03/09/14 21:38, Randal L. Schwartz wrote:
> What I need to know is what is mod_perl doing differently? Does it not
> respect binmode STDIN, ":utf8"? Apparently not. So if you know of a
> way to get mod_perl to "fix" reading from the browser properly, I'm
> interested in that.

Something along these lines:

use Apache2::RequestIO ();
use Encode ();
BEGIN {
    my $orig=\&Apache2::RequestRec::read;
    *Apache2::RequestRec::read=sub {
        my ($r, $buf, $len, $offset)=@_;
        my $_buf;
        my $rc=$r->$orig($_buf, $len);
        substr($buf, $offset, undef, Encode::decode_utf8 $_buf);
        return $rc;
    };
}

It's a bit more complicated than that because $_buf may end in the
middle of a character. But you can catch that and read a few more bytes.
Also, not sure if you expect the return value to be in octets or characters.
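
A minimal sketch of one way to handle a chunk that ends in the middle of a
character, assuming the input is UTF-8 and using the documented
Encode::FB_QUIET check mode. The helper name make_chunk_decoder is
hypothetical, and this variant holds the incomplete octets back until the next
chunk instead of reading more bytes right away:

```perl
use strict;
use warnings;
use Encode ();

# Build a decoder for UTF-8 octets that arrive in arbitrary chunks.
# Octets that do not yet form a complete character stay in $pending
# until the next chunk arrives.
sub make_chunk_decoder {
    my $pending = '';
    return sub {
        my ($chunk) = @_;
        $pending .= $chunk;
        # FB_QUIET stops at the first sequence it cannot decode and
        # leaves the unprocessed octets behind in $pending.
        return Encode::decode('UTF-8', $pending, Encode::FB_QUIET);
    };
}

my $decode = make_chunk_decoder();

# "é" (C3 A9) split across two reads.
my $text = join '', map { $decode->($_) } "caf\xC3", "\xA9!";
printf "%d characters\n", length $text;
```

Note that a genuinely invalid octet would stay in the pending buffer forever
with this sketch, so real code would need a policy for malformed input.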

Though, I wouldn't go this way. I'd either try to force CGI.pm to read
from STDIN and use the perl-script handler
(http://perl.apache.org/docs/2.0/user/config/config.html#C_perl_script_). This
pushes a PerlIO layer to STDIN so that you can read from STDIN. On top
of that you can push :utf8 then.

The other way I'd prefer over the hack above is to patch CGI.pm to
convert the data after it has read it. You can even do that in your
application. Many applications I have seen have a separate step to
sanitize the input. That would be the place to do that. However, then
you have to watch out for upload fields.
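
A minimal sketch of such a sanitizing step, assuming the parameters have
already been collected into a plain hash of octet strings and that the client
sent UTF-8. The function name decode_params and the data layout are
hypothetical; the list of binary field names is how this sketch keeps upload
fields untouched:

```perl
use strict;
use warnings;
use Encode ();

# Decode every text parameter from UTF-8 octets into Perl characters.
# %$raw maps parameter names to array refs of octet strings; names
# listed in @$binary (for example file uploads) are copied unchanged.
sub decode_params {
    my ($raw, $binary) = @_;
    my %skip = map { $_ => 1 } @{ $binary || [] };
    my %out;
    for my $name (keys %$raw) {
        if ($skip{$name}) {
            $out{$name} = [ @{ $raw->{$name} } ];
            next;
        }
        # FB_CROAK dies on invalid UTF-8; LEAVE_SRC keeps the input
        # values unmodified.
        $out{$name} = [
            map { Encode::decode('UTF-8', $_,
                      Encode::FB_CROAK | Encode::LEAVE_SRC) }
                @{ $raw->{$name} }
        ];
    }
    return \%out;
}

my $params = decode_params(
    { name => ["Andr\xC3\xA9"], photo => ["\xFF\xD8\xFF"] },
    ['photo'],
);
```

Dying on malformed input is a design choice of this sketch; an application
might prefer to reject the request or substitute a replacement character.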

So, there is no really simple solution. And I don't think this will be
"fixed" in modperl because $r has no such concept as an IO layer. The
closest thing httpd/modperl has to offer is an input filter. But that
won't help you here because brigades are handled mainly by httpd which
knows only about octets. You don't want to change the data itself. You
want to change the data's metadata.

Torsten

Re: mod_perl and utf8 and CGI->param [ In reply to ]

merlyn at stonehenge

Sep 4, 2014, 12:45 PM

Post #7 of 15 (8753 views)

>>>>> "Torsten" == Torsten Förtsch <torsten.foertsch@gmx.net> writes:

Torsten> Though, I wouldn't go this way. I'd either try to force CGI.pm to read
Torsten> from STDIN and use the perl-script handler
Torsten> (http://perl.apache.org/docs/2.0/user/config/config.html#C_perl_script_). This
Torsten> pushes a PerlIO layer to STDIN so that you can read from STDIN. On top
Torsten> of that you can push :utf8 then.

Yeah, just coded that. In a BEGIN block in my app, I monkey-patched
read_from_client:

BEGIN {
    ## monkey-patch CGI.pm so we can get proper utf8 handling
    require CGI;
    CGI::_compile_all(qw(
        read_from_client
    ));
    # warn "defined &CGI::read_from_client is ", 0 + defined &CGI::read_from_client;

    ## moose 'around' would be nice here. :)
    my $read_from_client = \&CGI::read_from_client;
    no warnings 'redefine';
    *CGI::read_from_client = sub {
        local $CGI::MOD_PERL = $CGI::MOD_PERL;
        warn "prior MOD_PERL is $CGI::MOD_PERL";
        if (our $USE_STDIN_FOR_MOD_PERL) {
            $CGI::MOD_PERL = 0;
        }
        warn "after MOD_PERL is $CGI::MOD_PERL";
        goto &$read_from_client;
    }
}

And in my toplevel, I now do this:

sub activate {
    my $self = shift;

    require Carp;
    local $SIG{__DIE__} = \&Carp::confess;

    ## ensure utf8 CGI params:
    local $CGI::PARAM_UTF8 = 1;
    ## and disable mod_perl handling during read_from_client
    local our $USE_STDIN_FOR_MOD_PERL = 1;

    binmode STDIN, ":utf8";
    binmode STDOUT, ":utf8";
    binmode STDERR, ":utf8";

    return $self->SUPER::activate(@_);
}

(This is my CGI::Prototype-based code, from the CPAN...)

I'm properly getting the $CGI::MOD_PERL set to 0, which forces
read from STDIN (via $r) instead of the native STDIN. In theory. In
practice, even though I've done a binmode STDIN, I'm still getting raw
bytes from read(\*STDIN...), not utf8-tagged strings.
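
One possible contributor, not confirmed anywhere in this thread: perlfunc
documents that goto &NAME exits the current subroutine and loses any changes
made with local(), so a value set with local just before the goto may already
be restored when the target runs. A minimal sketch with stand-in names ($MODE,
report, via_goto and via_call are all hypothetical):

```perl
use strict;
use warnings;

our $MODE = 'mod_perl';

# Stand-in for the wrapped function: reports the global it sees.
sub report { return $MODE }

# local() is undone when goto &sub leaves the wrapper, so the target
# still sees the old value.
sub via_goto {
    local $MODE = 'stdin';
    goto &report;
}

# A plain call keeps the wrapper's scope alive while the target runs.
sub via_call {
    local $MODE = 'stdin';
    return report(@_);
}

my $seen_goto = via_goto();
my $seen_call = via_call();
print "goto: $seen_goto\n";
print "call: $seen_call\n";
```

If that applies here, calling the original through a normal call (at the cost
of an extra stack frame) would keep the localized value in effect.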

Not sure what to do next. Still frustrated.

Why can't the world just use ASCII? :)

(I even tried binmode STDIN, "encoding(utf8)" just now as well.)


Re: mod_perl and utf8 and CGI->param [ In reply to ]

merlyn at stonehenge

Sep 4, 2014, 1:45 PM

Post #8 of 15 (8758 views)

>>>>> "Randal" == Randal L Schwartz <merlyn@stonehenge.com> writes:

Randal> Yeah, just coded that. In a BEGIN block in my app, I monkey-patched
Randal> read_from_client:

And then I've also tried to monkey-patch ->read just as you said.

On the first read, an empty string is apparently returned, which fails
something higher in CGI.pm. Ugh.

Update:

This monkey patch works:

*Apache2::RequestRec::read = sub {
    warn "READ CALLED";
    goto &$orig;
}

Although it doesn't do any decoding. When I replace the body of that
with your code, I'm getting these zero-byte reads. Even this fails:

    my ($r, $buff, $len, $offset)=@_;
    # my $_buff;
    # my $rc = $r->$orig($_buff, $len);
    my $rc = $r->$orig($buff, $len, $offset);
    # warn "BEFORE: ", DBI::data_string_desc($_buff);
    # utf8::decode($_buff);
    # warn "AFTER: ", DBI::data_string_desc($_buff);
    # substr($buff, $offset, undef, $_buff);
    # warn "AFTER: ", DBI::data_string_desc($buff);
    return $rc;

which should be the same as your code without the utf8 encoding still.
Still getting 0 bytes though.
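
A plausible explanation, offered as an assumption and not something verified
against mod_perl in this thread: a read-style function fills the caller's
buffer through the @_ alias, and "my (...) = @_" makes a private copy, so the
data lands in the copy while the caller's buffer stays empty. The sketch below
uses a stand-in fill_buffer function (hypothetical, not the real $r->read) to
show the difference:

```perl
use strict;
use warnings;

# Stand-in for a read-style call: fills the caller's buffer in place
# through the @_ alias and returns the number of octets "read".
sub fill_buffer {
    my $data = 'hello';
    $_[1] = $data;              # writes to the caller's variable
    return length $data;
}

# Wrapper that copies @_ first: the data lands in the private $buf.
sub wrapper_copy {
    my ($r, $buf, $len) = @_;
    return fill_buffer($r, $buf, $len);
}

# Wrapper that passes the aliases through: the caller's buffer is filled.
sub wrapper_alias {
    return fill_buffer($_[0], $_[1], $_[2]);
}

my ($buf_copy, $buf_alias) = ('', '');
my $rc_copy  = wrapper_copy(undef, $buf_copy, 5);
my $rc_alias = wrapper_alias(undef, $buf_alias, 5);
printf "copy:  rc=%d buffer='%s'\n", $rc_copy,  $buf_copy;
printf "alias: rc=%d buffer='%s'\n", $rc_alias, $buf_alias;
```

That would also be consistent with the "goto &$orig" version working, since
goto passes the original @_ through unchanged.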


Re: mod_perl and utf8 and CGI->param [ In reply to ]

Sep 8, 2014, 10:56 AM

Post #9 of 15 (8702 views)

On 9/2/14, 4:19 PM, Randal L. Schwartz wrote:

> ## ensure utf8 CGI params:
> $CGI::PARAM_UTF8 = 1;

Sorry to chime in late on this, but part of the problem with CGI.pm and
UTF-8 is that PARAM_UTF8 gets clobbered by a cleanup handler that CGI.pm
itself registers if it's running under mod_perl.

This caused major headaches for me at one time until I figured this out.

You have to make sure to set $CGI::PARAM_UTF8 early, and FOR EVERY
REQUEST, because if you just set it globally (e.g.: in a startup perl
script), then it only works for the first request.
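
A small simulation of that effect, with stand-in names only: My::FakeCGI, its
cleanup routine and handle_request are hypothetical and merely mimic a package
global that gets reset after every request. It is a sketch of the behaviour
described above, not CGI.pm's actual code:

```perl
use strict;
use warnings;

{
    # Stand-in for CGI.pm's package global and the cleanup resetting it.
    package My::FakeCGI;
    our $PARAM_UTF8 = 0;
    sub cleanup { $PARAM_UTF8 = 0 }     # runs after every request
}

# Hypothetical per-request entry point: optionally sets the flag,
# records what the request saw, then runs the cleanup.
sub handle_request {
    my ($set_flag) = @_;
    $My::FakeCGI::PARAM_UTF8 = 1 if $set_flag;
    my $seen = $My::FakeCGI::PARAM_UTF8;
    My::FakeCGI::cleanup();
    return $seen;
}

$My::FakeCGI::PARAM_UTF8 = 1;           # set once "at startup"
my @startup_only = map { handle_request(0) } 1 .. 3;
my @per_request  = map { handle_request(1) } 1 .. 3;
print "startup only: @startup_only\n";
print "per request:  @per_request\n";
```

With the flag set only at startup, just the first simulated request sees it;
setting it at the start of every request keeps it on throughout.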

Regards,
Michael Schout

Re: mod_perl and utf8 and CGI->param [ In reply to ]

Sep 8, 2014, 12:16 PM

Post #10 of 15 (8701 views)

Michael Schout wrote:
> On 9/2/14, 4:19 PM, Randal L. Schwartz wrote:
>
>> ## ensure utf8 CGI params:
>> $CGI::PARAM_UTF8 = 1;
>
> Sorry to chime in late on this, but part of the problem with CGI.pm and
> UTF-8 is that PARAM_UTF8 gets clobbered by a cleanup handler that CGI.pm
> itself registers if its running under mod_perl.
>
> This caused major headaches for me at one time until I figured this out.
>
> You have to make sure to set $CGI::PARAM_UTF8 early, and FOR EVERY
> REQUEST, because if you just set it globally (e.g.: in a startup perl
> script), then it only works for the first request.
>

Hi.
Just an addendum to the discussion :

There are really two distinct approaches to this issue, and they work at different levels :

1) is to "fix" CGI.pm so that it delivers the parameters in the way which you expect.
As shown by the previous valuable and technical contributions, this generally works, but
it requires a certain level of expertise; and it does not necessarily work backwards with
all versions of mod_perl and CGI.pm.

2) is to take whatever CGI.pm does deliver to the calling script or module, and use a
couple of tricks and some additional code in ditto script or module, to ensure that
whatever CGI.pm delivers under whatever mod_perl version, the receiving script or module
always knows in the end what it is dealing with.
That is the method which I presented early in the discussion.
As stated in that contribution, it is not necessarily the most elegant or efficient way to
deal with the issue, but it has the advantage of working always, no matter which version
of CGI.pm and/or mod_perl are in use.

The real crux of the matter is this, in my view : as things stand today in terms of
protocol and RFCs, there is no real way for CGI.pm (or any comparable framework) to be
*sure* of the encoding of the data sent by a browser or another HTTP client agent. Even
the RFCs do not really provide a way by which this can be enforced. (*)

So if you are sure of what the client is sending, and the matter consists of *forcing*
CGI.pm to always communicate POST (or GET) data as UTF-8 encoded and utf8-marked (or the
opposite) to the calling script/module, then method 1 will work, and it is more elegant
and probably more efficient than method 2.

But if the matter consists of ensuring that the receiving code in the script/module which
handles the data submitted by the HTTP client, is resilient and "does the right thing"
whatever the submitted data really was, then in my opinion method 2 is better.
(But that's only my opinion of the moment, and I stand ready to be corrected).

(*) and if you believe this not to be true, please send me some references about it,
because I am really interested. It might save me some code in all my web-facing applications.

Re: mod_perl and utf8 and CGI->param [ In reply to ]

joe_schaefer at yahoo

Sep 14, 2014, 7:41 PM

Post #11 of 15 (8684 views)

apreq validates anything it presents as utf8, otherwise it marks it as ISO-8859-1, or some Windows encoding I don't remember the name of if that fails.


Re: mod_perl and utf8 and CGI->param [ In reply to ]

merlyn at stonehenge

Jun 1, 2017, 8:34 AM

Post #12 of 15 (5404 views)

>>>>> "Randal" == Randal L Schwartz <merlyn@stonehenge.com> writes:
Randal> Getting really frustrated with mod_perl2's apparent inability to
Randal> properly read UTF8 input.


I realized that I never posted my ultimate solution. I monkey patch
CGI.pm:

require CGI;
{
    my $orig = \&CGI::param;
    no warnings 'redefine';
    *CGI::param = sub {
        $CGI::LIST_CONTEXT_WARN = 0; # workaround for backward compatibility
        $CGI::PARAM_UTF8 = 1;
        goto &$orig;
    };
}

And this has been working just fine for both CGI and mod_perl. Just for the
record.
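
For readers who want to try the same wrapping pattern without CGI.pm
installed, here is a self-contained sketch against a stand-in class.
My::Params, its $DECODE flag and its sample data are hypothetical; only the
shape of the wrapper (save the original, redefine the glob, set the global,
goto the original) mirrors the patch above:

```perl
use strict;
use warnings;

{
    package My::Params;                  # hypothetical stand-in for CGI
    our $DECODE = 0;
    sub new { return bless { name => "Andr\xC3\xA9" }, shift }
    sub param {
        my ($self, $key) = @_;
        my $value = $self->{$key};
        utf8::decode($value) if $DECODE; # octets to characters, in place
        return $value;
    }
}

{
    my $orig = \&My::Params::param;
    no warnings 'redefine';
    *My::Params::param = sub {
        $My::Params::DECODE = 1;         # force the flag on every call
        goto &$orig;                     # same @_, same calling context
    };
}

my $name = My::Params->new->param('name');
printf "%d characters\n", length $name;
```

Because the flag is a plain assignment and not a local(), it is still in
effect when the original runs after the goto.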


Re: mod_perl and utf8 and CGI->param [ In reply to ]

pyh at vodafonemail

Jun 1, 2017, 5:48 PM

Post #13 of 15 (5404 views)

good patch. thanks for sharing.


Re: mod_perl and utf8 and CGI->param [ In reply to ]

pyh at vodafonemail

Jun 1, 2017, 5:51 PM

Post #14 of 15 (5404 views)

And, can I override any method from a class via this way? is this a
general trick? thanks.


Re: mod_perl and utf8 and CGI->param [ In reply to ]

merlyn at stonehenge

Jun 2, 2017, 12:04 PM

Post #15 of 15 (5403 views)

>>>>> "Peng" == Peng Yonghua <pyh@vodafonemail.de> writes:

Peng> And, can I override any method from a class via this way? is this a general
Peng> trick? thanks.

Yes, and your downstream will hate you for it. The ruby people do this
all the time, and it makes their code brittle. I did this in my app,
and would never think of putting that into the core CGI::Prototype where
this gets used, even though it would solve the problem for everyone.
