Mailing List Archive

Own Wiki / Customization / UTF-8 / ISO-8859-1
Hi!

okay, I've had enough headaches, now I'm asking:

I've got a recent CVS-version, I did an install with an older copy, I'm
running a recent apache and php4 on a recent Debian unstable, and I'm
freaking out. The latter has at least a little bit to do with mediawiki.

Main problem: I set the language to german, and the messages are in
german - but all umlauts (read: special characters) are encoded
ISO-8859-1; while mediawiki (correctly, in my opinion) thinks and tells
(via meta http-equiv) we're using UTF-8 - overriding Apache/PHP saying
we're doing ISO-8859-1 .. a little bit schizophrenic after all.

I've had a look at LanguageDe.php, and (almost, looks like a bug) all
characters there are UTF-8-encoded .. the characters in de.lang aren't,
but this doesn't seem to matter too much? I don't know.
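
One way to check which encoding a file such as LanguageDe.php or de.lang really uses is to test whether its bytes decode cleanly as UTF-8. The following is a minimal Python sketch, not part of MediaWiki; the function name looks_like_utf8 and the sample word are made up for illustration.

```python
def looks_like_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same word encoded both ways (illustrative sample, not from the wiki).
utf8_bytes = "für".encode("utf-8")      # b'f\xc3\xbcr'
latin1_bytes = "für".encode("latin-1")  # b'f\xfcr'

print(looks_like_utf8(utf8_bytes))    # True
print(looks_like_utf8(latin1_bytes))  # False
```

Pure ASCII text passes this check under either encoding, so only files that actually contain umlauts tell the two apart. To test a real file, read it in binary mode and pass the bytes to the function.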

I tried changing LanguageDe.php and running rebuildMessages.php, and get
warnings as well as errors:

---- 8< ----

apache:/var/www/slop.flatline.de/fswiki/maintenance# php4 rebuildMessages.php

Warning: Invalid argument supplied for foreach() in
/var/www/slop.flatline.de/fswiki/includes/User.php on line 90

Warning: Invalid argument supplied for foreach() in
/var/www/slop.flatline.de/fswiki/includes/User.php on line 90

Warning: Invalid argument supplied for foreach() in
/var/www/slop.flatline.de/fswiki/includes/User.php on line 90

Warning: Invalid argument supplied for foreach() in
/var/www/slop.flatline.de/fswiki/includes/Setup.php on line 250

Fatal error: Undefined class name 'database' in
/var/www/slop.flatline.de/fswiki/includes/LoadBalancer.php on line 151

---- >8 ----

.. adding require_once("Database.php"); to LoadBalancer.php fixes the
error and allows me to run the script - I choose '2', delete old
messages and rebuild them from scratch. changes NOTHING, not even a text
I changed completely. quite frustrating.

So, my questions:

- where could the problem with ISO-8859-1 characters in a UTF-8 page
come from? I guess it's ISO-8859-1 characters in the database.
- how could I backup _only_ the pages from the database and scrap
everything else, so I could start with a freshly generated database?
- how can I customize other things? i.e. the menus, the picture, stuff
like that ...

while I understand mediawiki is designed for wikipedia, some
documentation for these tasks might be deemed helpful by other users
running their own wikis, too :)

kind regards,

Count

--
Andreas Kotes - ICQ: 3741366 - The views expressed herein are (only) mine!
Follow the path of the unsafe, independent thinker. Expose your ideas to the
danger of controversy. Speak your mind and fear less the label of "crackpot"
than the stigma of conformity. (Thomas J. Watson) ### OpenPGP key 0x8F94C228
Re: Own Wiki / Customization / UTF-8 / ISO-8859-1
Andreas Kotes wrote:
> Main problem: I set the language to german, and the messages are in
> german - but all umlauts (read: special characters) are encoded
> ISO-8859-1; while mediawiki (correctly, in my opinion) thinks and tells
> (via meta http-equiv) we're using UTF-8 - overriding Apache/PHP saying
> we're doing ISO-8859-1 .. a little bit schizophrenic after all.

If you are upgrading an old installation you should probably specify
"Deutsch (Latin-1)" for the language/charset rather than "Deutsch
(Unicode)".

To do this manually after setup, set
$wgUseLatin1 = true; and rebuild the messages.

> - how could I backup _only_ the pages from the database and scrap
> everything else, so I could start with a freshly generated database?

Use mysqldump to backup anything you want from the database.

> - how can I customize other things? i.e. the menus, the picture, stuff
> like that ...

Find and edit the appropriate files. Sorry, annoying answer. :)

-- brion vibber (brion @ pobox.com)
Re: Own Wiki / Customization / UTF-8 / ISO-8859-1
Heya,

* Brion Vibber <brion@pobox.com> [20040810 19:39]:
> Andreas Kotes wrote:
> >Main problem: I set the language to german, and the messages are in
> >german - but all umlauts (read: special characters) are encoded
> >ISO-8859-1; while mediawiki (correctly, in my opinion) thinks and tells
> >(via meta http-equiv) we're using UTF-8 - overriding Apache/PHP saying
> >we're doing ISO-8859-1 .. a little bit schizophrenic after all.
>
> If you are upgrading an old installation you should probably specify
> "Deutsch (Latin-1)" for the language/charset rather than "Deutsch
> (Unicode)".

But I want to use Unicode ..

> To do this manually after setup, set
> $wgUseLatin1 = true; and rebuild the messages.

.. doesn't change anything at all.

> >- how could I backup _only_ the pages from the database and scrap
> > everything else, so I could start with a freshly generated database?
>
> Use mysqldump to backup anything you want from the database.

no. everything. anything I want would be a FAR smaller selection.

> >- how can I customize other things? i.e. the menus, the picture, stuff
> > like that ...
>
> Find and edit the appropriate files. Sorry, annoying answer. :)

did that, didn't change anything at all either (at least for the
messages).

Count

Re: Own Wiki / Customization / UTF-8 / ISO-8859-1
Andreas Kotes wrote:
>>If you are upgrading an old installation you should probably specify
>>"Deutsch (Latin-1)" for the language/charset rather than "Deutsch
>>(Unicode)".
>
> But I want to use Unicode ..

You'll want to convert your data from latin-1 to UTF-8, then. Dump, run
it through iconv, and restore.
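
As a rough illustration of what the iconv step does (the equivalent of iconv -f ISO-8859-1 -t UTF-8), here is a minimal Python sketch. The function name latin1_to_utf8 and the sample bytes are made up for illustration, and the sketch assumes that every byte in the dump really is Latin-1.

```python
def latin1_to_utf8(data):
    """Re-encode bytes from ISO-8859-1 to UTF-8, treating every byte as Latin-1."""
    return data.decode("latin-1").encode("utf-8")


dump = b"Gr\xfc\xdfe"  # "Grüße" in ISO-8859-1 (illustrative sample)
print(latin1_to_utf8(dump))  # b'Gr\xc3\xbc\xc3\x9fe'

# Caveat: bytes that are already UTF-8 get encoded a second time.
already_utf8 = "ü".encode("utf-8")   # b'\xc3\xbc'
print(latin1_to_utf8(already_utf8))  # b'\xc3\x83\xc2\xbc'
```

The second call shows the limit of a blanket conversion: it is only safe when the whole dump is in one encoding.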

>>>- how could I backup _only_ the pages from the database and scrap
>>> everything else, so I could start with a freshly generated database?
>>
>>Use mysqldump to backup anything you want from the database.
>
>
> no. everything. anything I want would be a FAR smaller selection.

You can back up as much or as little as you want.
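
mysqldump accepts a database name followed by an optional list of table names, so a dump can be limited to the tables that hold page text. The sketch below only builds the command line. The helper name mysqldump_args, the database name wikidb and the user wikiuser are hypothetical, and the table names cur and old are an assumption about where this MediaWiki version stores pages; check them against your own schema.

```python
def mysqldump_args(database, tables, user):
    """Build a mysqldump command line that dumps only the given tables."""
    # --password without a value makes mysqldump prompt for the password.
    return ["mysqldump", "--user=" + user, "--password", database] + list(tables)


# Hypothetical names; 'cur' and 'old' are assumed to be the page text tables.
args = mysqldump_args("wikidb", ["cur", "old"], "wikiuser")
print(" ".join(args))  # mysqldump --user=wikiuser --password wikidb cur old
```

The list can be passed to subprocess.run with stdout redirected to a file to produce the actual dump.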

-- brion vibber (brion @ pobox.com)
Re: Own Wiki / Customization / UTF-8 / ISO-8859-1
Andreas Kotes wrote:
[snip]
> .. adding require_once("Database.php"); to LoadBalancer.php fixes the
> error and allows me to run the script - I choose '2', delete old
> messages and rebuild them from scratch. changes NOTHING, not even a text
> I changed completely. quite frustrating.

There's one more fix you have to make to get it working: move this line
from near the bottom of maintenance/commandLine.inc to near the top:

define("MEDIAWIKI",true);

Some of the include files are guarded by a check for this constant to
protect against possible attacks that load the files individually from
the outside. With the define at the bottom, important definitions were
being skipped and things didn't work correctly.

Additionally, to make the changes visible you'll have to manually clear
the objectcache table ('DELETE FROM objectcache').

(Fixes for these are in current CVS head and 1.3 branches, but anon CVS
is a little behind.)

-- brion vibber (brion @ pobox.com)
Re: Own Wiki / Customization / UTF-8 / ISO-8859-1
Heya,

* Brion Vibber <brion@pobox.com> [20040810 21:13]:
> >But I want to use Unicode ..
>
> You'll want to convert your data from latin-1 to UTF-8, then. Dump, run
> it through iconv, and restore.

iconv didn't help (mixed UTF-8 and ISO-8859-1 in the database),

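# Re-encode the Latin-1 bytes of the German special characters as UTF-8.
# The UTF-8 forms of these letters contain none of the bytes matched here,
# so copies of these letters that are already UTF-8 pass through unchanged.
while (<>) {
    s/\344/\303\244/g;  # ä
    s/\366/\303\266/g;  # ö
    s/\374/\303\274/g;  # ü
    s/\304/\303\204/g;  # Ä
    s/\326/\303\226/g;  # Ö
    s/\334/\303\234/g;  # Ü
    s/\337/\303\237/g;  # ß
    print $_;
}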

did.
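
The same idea can be generalized beyond the seven German characters. The following Python sketch is an illustration, not the script used above: it keeps byte sequences that are already well-formed UTF-8 and re-encodes any other high byte as Latin-1. The name repair_mixed and the sample data are made up, and the UTF-8 pattern is simplified (it does not reject overlong forms or surrogates).

```python
import re

# Either one multi-byte UTF-8 sequence (simplified pattern) or, failing that,
# a single stray byte in the range 0x80-0xFF.
_TOKEN = re.compile(
    rb"[\xc2-\xdf][\x80-\xbf]"
    rb"|[\xe0-\xef][\x80-\xbf]{2}"
    rb"|[\xf0-\xf4][\x80-\xbf]{3}"
    rb"|[\x80-\xff]"
)


def repair_mixed(data):
    """Re-encode stray Latin-1 bytes as UTF-8, leaving UTF-8 sequences alone."""
    def fix(match):
        chunk = match.group(0)
        if len(chunk) > 1:
            return chunk  # already a UTF-8 sequence, keep it
        return chunk.decode("latin-1").encode("utf-8")
    return _TOKEN.sub(fix, data)


mixed = b"M\xfcller und M\xc3\xbcller"  # Latin-1 and UTF-8 in one line
print(repair_mixed(mixed))  # b'M\xc3\xbcller und M\xc3\xbcller'
```

This is a heuristic: genuine Latin-1 text that happens to look like a UTF-8 sequence (for example the two characters "Ã¼") would be left untouched, so spot-check the result before restoring the dump.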

so long,

Count
