Mailing List Archive

Spanish text in new software
I'm curious: why is the Spanish text in wikiTextEs.php encoded
in utf-8? ISO-8859-1 has all the needed characters for Spanish;
The German wikiTextDe.php uses plain ISO, and the current
es.wikipedia.com is in ISO.

--
Lee Daniel Crocker <lee@piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Re: Spanish text in new software
On Tue, May 07, 2002 at 01:47:57AM -0500, Lee Daniel Crocker wrote:
> I'm curious: why is the Spanish text in wikiTextEs.php encoded
> in utf-8? ISO-8859-1 has all the needed characters for Spanish;
> The German wikiTextDe.php uses plain ISO, and the current
> es.wikipedia.com is in ISO.


All wikipedias are switching to utf-8.
Polish (normally latin2) and Esperanto (normally latin3) both decided
to do that.

I'm only wondering why English hasn't done it *yet*
and why the Germans haven't switched.

I hope that both will switch.

Reasons include:

1.
We can't use plain latinX; we can use either utf8 or latinX +
&lot_of_silly_numeric_entities;

2.
UTF-8 allows you to insert all diacritics and other characters that
turn up from time to time in proper names etc. Encoding them as html
codes is a huge pita. There is no software that facilitates it.

3.
&silly_numeric_entities; are completely unreadable, and wikipedia
markup language should be easy to read and write.

4.
Interwiki links 100% require utf-8. Making interwiki links using
%-codes is completely out of the question. You can't even use
&silly_entities; in such links, as the software doesn't know which
encoding it should use to convert non-ascii characters to %-codes.

5.
General interoperability needs that. You wouldn't even be able to copy
from one wikipedia to another if they used different charsets,
as the software wouldn't know whether it should convert diacritics to &entities;
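
To make points 1, 3 and 4 concrete, here is a rough sketch in Python (not the wiki software itself, just an illustration). The titles "España" and "Łódź" are made-up examples, and the variable names are mine. It shows that the same title turns into different %-codes depending on the charset, and that a latin1 page has to fall back to numeric entities for Polish letters.

```python
from urllib.parse import quote

# Hypothetical article titles, used only for illustration.
spanish_title = "España"
polish_title = "Łódź"

# Point 4: the same title yields different %-codes depending on the charset.
as_utf8 = quote(spanish_title, safe="", encoding="utf-8")
as_latin1 = quote(spanish_title, safe="", encoding="iso-8859-1")
print(as_utf8)    # Espa%C3%B1a
print(as_latin1)  # Espa%F1a

# Points 1 and 3: latin1 has no Ł or ź, so they fall back to numeric entities.
fallback = polish_title.encode("iso-8859-1", errors="xmlcharrefreplace")
print(fallback)   # b'&#321;\xf3d&#378;'

# In utf-8 the title is stored as typed and round-trips unchanged.
assert polish_title.encode("utf-8").decode("utf-8") == polish_title
```

A link written as Espa%F1a only makes sense if the receiving wiki also assumes latin1, which is the ambiguity point 4 complains about. If every wiki uses utf-8, there is only one possible answer.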