Johan Holmberg writes:
>
> Hi !
>
> I run perl 5.001m in an environment where I use the Iso-Latin-1
> character set.
>
> When trying to do case-insensitive pattern matching with strings
> and patterns containing non-ASCII characters (the Swedish
> characters åäöÅÄÖ) strange things happened.
>
> The code looked something like:
>
>     if ($word =~ /dalaröbrygga/i) {
>         print "Found $&\n";
>     }
>
> This matched "dalaröbrygga" and "DALARöBRYGGA" but not "DALARÖBRYGGA".

Hejsan!

The pre-5.002beta1 releases of Perl 5 need this kind of trick:

    use POSIX qw(setlocale LC_CTYPE);
    setlocale(LC_CTYPE, 'whatchamacallit');

where 'whatchamacallit' is the locale you want to use for your
national characters. On most modern systems the Swedish locale is

    sv_SE.ISO8859-1

as you most probably already know. But it might be something else
depending on your vendor; try "locale -a" and/or consult your
documentation.

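Since the locale name varies by vendor, it may help to check what
setlocale() returns instead of assuming the call worked. Below is a
minimal sketch of that idea. The helper name pick_ctype_locale and
the list of candidate names are my own illustration, not anything
shipped with Perl; replace the candidates with what "locale -a"
prints on your system.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical candidate spellings of the Swedish Latin-1 locale;
# the real names on your system come from "locale -a".
my @candidates = ('sv_SE.ISO8859-1', 'sv_SE.ISO-8859-1', 'sv_SE');

# Try each name in turn and return the first one setlocale() accepts.
# setlocale() returns the new locale name on success and undef on failure.
sub pick_ctype_locale {
    my @names = @_;
    for my $name (@names) {
        my $got = setlocale(LC_CTYPE, $name);
        return $got if defined $got;
    }
    return;    # nothing matched
}

my $locale = pick_ctype_locale(@candidates);
if (defined $locale) {
    print "LC_CTYPE is now $locale\n";
} else {
    print "None of the candidate locales is installed\n";
}
```

This only selects the locale; whether case-insensitive matching then
consults it depends on the Perl version you are running.
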
Yes, I did say "pre-5.002beta1". In beta1 Larry included a patch
of mine that automagically does the above (setlocale(LC_CTYPE, ...))
in C (faster) and for every Perl script (provided the environment
variable LC_CTYPE is set). So it _should_ break nothing where
LC_CTYPE is unset and help where it is set.

You can get 5.002beta1d from:

    Sweden  ftp://ftp.sunet.se/pub/lang/perl/CPAN/

by combining these files from there:

    src/5.0/perl5.002beta1.tar.gz
    CPAN/authors/id/ANDYD/perl5.002beta1.patch.2b1[a-d].gz

(A ready-patched perl5.002beta1d.tar.gz should be available
in a couple of days.)

++jhi;