Bug: setlocale and case-insensitive pattern matching
Johan Holmberg writes:
>
> Hi !
>
> I run perl 5.001m in an environment where I use the Iso-Latin-1
> character set.
>
> When trying to do case-insensitive pattern matching with strings
> and patterns containing non-ASCII characters (the Swedish
> characters åäöÅÄÖ) strange things happened.
>
> The code looked something like:
>
> if ($word =~ /dalaröbrygga/i) {
> print "Found $&\n";
> }
>
> This matched "dalaröbrygga" and "DALARöBRYGGA" but not "DALARÖBRYGGA"

Hejsan!

The pre-5.002beta1 releases of Perl5 need this kind of trick:

use POSIX qw(setlocale LC_CTYPE);
setlocale(LC_CTYPE, 'whatchamacallit');

where 'whatchamacallit' is the locale you want to use for your
national characters. On most modern systems the Swedish locale is:

sv_SE.ISO8859-1

as you most probably already know. But it might be something else
depending on your vendor; run "locale -a" or consult your
documentation.
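
Since the locale name varies by vendor, one way to cope is to try several
spellings and keep the first one the system accepts. The following is only
a minimal sketch: the helper name first_usable_locale is hypothetical, the
candidate spellings are assumptions, and it relies on setlocale() returning
undef for a name the system does not know.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Try each candidate name in turn; setlocale() returns undef when the
# system does not know the locale, so the first defined result wins.
# (first_usable_locale is a hypothetical helper, not part of POSIX.)
sub first_usable_locale {
    my @candidates = @_;
    for my $name (@candidates) {
        my $got = setlocale(LC_CTYPE, $name);
        return $name if defined $got;
    }
    return;
}

# The candidate spellings are assumptions; "C" is the portable fallback.
my $chosen = first_usable_locale(
    'sv_SE.ISO8859-1', 'sv_SE.ISO-8859-1', 'iso_8859_1', 'C',
);
print "LC_CTYPE locale in use: $chosen\n";
```

Which name wins depends entirely on what "locale -a" lists on your machine.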

Yes, I did say "pre-5.002beta1". In the beta1 Larry included a patch
of mine that automagically does the above thing (setlocale(LC_CTYPE, ...))
in C (faster) and for every Perl (provided the environment variable
LC_CTYPE is set). So it _should_ not break in places where LC_CTYPE
is unset and be helpful in places where it is set.
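
Expressed in Perl rather than C, the startup behaviour described above
might look roughly like this. It is a sketch of the idea, not the actual
patch; the helper name adopt_env_ctype is hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Mimic the described startup step: adopt the environment's locale for
# LC_CTYPE only when the LC_CTYPE variable is set; otherwise do nothing.
# (adopt_env_ctype is a hypothetical name used for illustration.)
sub adopt_env_ctype {
    return unless defined $ENV{LC_CTYPE} && length $ENV{LC_CTYPE};
    # An empty locale name tells setlocale() to consult the environment.
    return setlocale(LC_CTYPE, '');
}

my $result = adopt_env_ctype();
if (defined $result) {
    print "LC_CTYPE now: $result\n";
}
else {
    # With one argument, setlocale() only queries the current setting.
    print "LC_CTYPE left as: ", setlocale(LC_CTYPE), "\n";
}
```

If the environment names a locale the system does not know, setlocale()
returns undef and the previous setting stays in effect.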

You can get the 5.002beta1d from:

Sweden ftp://ftp.sunet.se/pub/lang/perl/CPAN/

and in there by combining

src/5.0/perl5.002beta1.tar.gz
CPAN/authors/id/ANDYD/perl5.002beta1.patch.2b1[a-d].gz

(There should be a ready-patched perl5.002beta1d.tar.gz available
in a couple of days)

++jhi;
Re: Bug: setlocale and case-insensitive pattern matching
On Fri, 8 Dec 1995, Jarkko Hietaniemi wrote:
[...]
>
> Yes, I did say "pre-5.002beta1". In the beta1 Larry included a patch
> of mine that automagically does the above thing (setlocale(LC_CTYPE, ...))
> in C (faster) and for every Perl (provided the environment variable
> LC_CTYPE is set). So it _should_ not break in places where LC_CTYPE
> is unset and be helpful in places where it is set.
>
> You can get the 5.002beta1d from:
>
> Sweden ftp://ftp.sunet.se/pub/lang/perl/CPAN/
>
> and in there by combining
>
> src/5.0/perl5.002beta1.tar.gz
> CPAN/authors/id/ANDYD/perl5.002beta1.patch.2b1[a-d].gz
>

OK, so I did the following:

* fetched 5.002beta1 + the four patches
* built this on SunOS 4.1.x and Solaris 2.4
* checked that LC_CTYPE was set to "iso_8859_1"
* ran my test-script

I still got the same problem.
I then applied my little patch to "localize" the array "fold"
defined in "perl.h". Then it works as it should:

% ./perl5.002beta1d test-locale
version = 5.002
Cant find DALARÖBRYGGA
%
%
% ./perl5.002beta1d-jh test-locale
version = 5.002
Found DALARÖBRYGGA
%
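
To illustrate what "localizing" a fold table means, here is a rough
sketch in Perl of building a 256-entry case-folding table from the current
LC_CTYPE locale. It is not the patch itself (that changes the C source);
the name build_fold_table is hypothetical, and the demo pins the locale to
"C" only so that the result is predictable.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Build a 256-entry table mapping each byte to its other-case partner
# under the current LC_CTYPE locale; bytes without one map to themselves.
# (build_fold_table is a hypothetical name used for illustration.)
sub build_fold_table {
    use locale;    # make uc() and lc() consult LC_CTYPE
    my @fold;
    for my $code (0 .. 255) {
        my $ch = chr $code;
        my ($up, $down) = (uc $ch, lc $ch);
        my $other = $up ne $ch ? $up : $down;
        # Keep only single-byte results; anything else folds to itself.
        $fold[$code] = (length($other) == 1 && ord($other) < 256)
                     ? ord($other)
                     : $code;
    }
    return \@fold;
}

setlocale(LC_CTYPE, 'C');    # fixed locale so the demo is predictable
my $fold = build_fold_table();
printf "%s <-> %s\n", 'a', chr $fold->[ord 'a'];
```

Under an ISO 8859-1 locale the same loop would also pair up ö and Ö, which
is the pairing the failing match above needs.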

So, I'm not convinced that it "just works" in 5.002beta1d.
Maybe I am misunderstanding something fundamental here.
If so, can you point it out?

See the files attached to this mail:

- the "test-locale" script
- my patch to miniperlmain.c

Regards,
Johan Holmberg

-----------------------------------------------------------------------
Johan Holmberg Email: holmberg@upp.promotor.telia.se
Telia Promotor AB Phone: +46 18 18 94 55
Box 1218 Mobile: +46 70 528 94 55
751 42 Uppsala, SWEDEN Fax: +46 18 18 94 99
-----------------------------------------------------------------------