Hi everybody,
I've just been fighting with sorting and alphabetical ordering in
multiple languages, and I've got things to work, but I'm a little
puzzled about how. So if anybody has any insight, I'd be grateful.
This is for IFEX, on something called the "Digest." It's a
regularly-published list of items recently published on the site. You
can see an example here:
http://www.ifex.org/2010/02/12/digest/
It's a big alphabetical list of regions (OK, "International" is at the
top), and within each region is an alphabetical list of countries.
I had been doing the alphabetization with a Schwartzian transform,
looking up the name of each country according to the output channel:
    my @alphabetized_cats =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  { [ $_ => $m->scomp('/util/translations.mc', word => $_) ] }
        keys(%all_cats);
(translations.mc maps category URIs to country names based on the
current OC).
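For anyone who wants to try the pattern outside Bricolage/Mason, here is a minimal self-contained sketch of the same transform. The %name_for hash and display_name() are hypothetical stand-ins for the translations.mc lookup, and the URIs are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for translations.mc: category URI => display name.
my %name_for = (
    '/africa/rwanda/' => 'Rwanda',
    '/africa/kenya/'  => 'Kenya',
    '/africa/angola/' => 'Angola',
);

# Count lookups to show each key is translated once, not once per comparison.
my $lookups = 0;
sub display_name {
    my ($uri) = @_;
    $lookups++;
    return $name_for{$uri};
}

my @alphabetized =
    map  { $_->[0] }                        # 3. unwrap: keep only the URI
    sort { $a->[1] cmp $b->[1] }            # 2. sort on the cached name
    map  { [ $_ => display_name($_) ] }     # 1. wrap: pair URI with its name
    keys %name_for;

print "$_\n" for @alphabetized;
print "lookups: $lookups\n";
```

The point of the wrap/sort/unwrap shape is that the (possibly expensive) lookup runs once per key rather than once per comparison.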
This was mostly fine, except that the vanilla Perl sort is really only
good for asciibetical order. In Friday's Digest, "Rwanda" was coming
before "République démocratique du Congo."
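A small sketch of why that happens, assuming the names are decoded character strings: without "use locale", cmp compares code point by code point, and the second characters decide it.

```perl
use strict;
use warnings;
use utf8;   # the string literals below contain non-ASCII characters

binmode STDOUT, ':encoding(UTF-8)';

my @names  = ('République démocratique du Congo', 'Rwanda');
my @sorted = sort { $a cmp $b } @names;   # plain code point comparison

# Both names start with 'R', so the second character decides:
# 'w' is U+0077 and 'é' is U+00E9, so "Rw..." sorts before "Ré...".
printf "w = U+%04X, é = U+%04X\n", ord('w'), ord('é');
print "$_\n" for @sorted;
```

The same ordering comes out if the strings are raw UTF-8 bytes, since the first byte of an encoded 'é' (0xC3) is also greater than 'w' (0x77).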
So I've been trying to use locales, like this:
    my %ocs_to_locales = (
        'Web (French)'  => 'fr_FR.utf8',
        'Web (Spanish)' => 'es_ES.utf8',
        'Web (Russian)' => 'ru_RU.utf8',
        'Web (Arabic)'  => 'ar_EG.utf8',
    );
    use POSIX;
    use locale;
    if ($ocs_to_locales{$burner->get_oc->get_name}) {
        POSIX::setlocale(LC_COLLATE,
            $ocs_to_locales{$burner->get_oc->get_name});
    }
...then do the sort, and then add this line afterward:
no locale;
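One thing worth keeping in mind with this approach: "use locale" is lexically scoped, but setlocale() changes the collation locale for the whole process, so "no locale" by itself does not put the old setting back. Below is a rough sketch of a helper that restores it. The name sort_in_locale() is made up for illustration, and the sketch assumes the requested locale may or may not be installed on the machine:

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort @$names under $locale_name, then restore
# whatever LC_COLLATE setting was in effect before the call.
sub sort_in_locale {
    my ($locale_name, $names) = @_;

    my $previous = setlocale(LC_COLLATE);               # one-argument form only queries
    my $ok       = setlocale(LC_COLLATE, $locale_name); # undef if the locale is unavailable
    warn "locale $locale_name not available; collation unchanged\n"
        unless defined $ok;

    my @sorted;
    {
        use locale;                                      # affects cmp only inside this block
        @sorted = sort { $a cmp $b } @$names;
    }

    setlocale(LC_COLLATE, $previous) if defined $previous;
    return @sorted;
}

my $before = setlocale(LC_COLLATE);
my @sorted = sort_in_locale('fr_FR', [ 'Rwanda', 'Kenya', 'Angola' ]);
my $after  = setlocale(LC_COLLATE);

print "$_\n" for @sorted;
```

In a long-running process that serves several output channels, restoring the previous setting should keep one channel's collation from leaking into the next sort.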
Sadly, the utf8 locales seem to have the characters in completely nutty
order. "Rwanda" still came before "République démocratique du Congo."
Dropping the ".utf8" from the French locale name, and using just "fr_FR"
works, though. So I'm full of hope for Spanish and Arabic.
Now, everything on the site is UTF-8, so I'm puzzled about why the
".utf8" locales turned out to be bad choices. Does anybody have any
idea?
Thanks,
Bret
--
Bret Dawson
Producer
Pectopah Productions Inc.
(416) 895-7635
bret@pectopah.com
www.pectopah.com