Mailing List Archive

Re: Wikitech-l digest, Vol 1 #332 - 14 msgs
Message: 14
Date: Sat, 11 Jan 2003 21:46:07 -0800
From: Jonathan Walther <krooger@debian.org>
To: wikitech-l@wikipedia.org
Subject: Re: [Wikitech-l] Re: Wikitech-l digest, Vol 1 #329 - 13 msgs
Reply-To: wikitech-l@wikipedia.org
On Fri, Jan 10, 2003 at 12:28:53PM -0800, Brion Vibber wrote:
>What would you prefer? That we tell Anthere to take a hike and buy a
>new computer? That I petition my uni to upgrade hundreds of machines
>in their labs? That we ignore similar conditions across the world
>where people have old machines or machines they cannot control and
>tell them, hey, fuck off, Wikipedia's not for you you whiny bitch?
Your examples are legitimate. How would you feel if
there was a user option to edit in "broken UTF-8
mode"? Then when you edited a page, you could insert
some markup to put in non-ASCII characters. I don't
know what the best way to do this would be; I am
guessing something like \xAB\xCD where \x means "an 8
bit value in hexadecimal representation follows". If
you have any other ideas, let me know.
Jonathan
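
For illustration, a minimal Python sketch of the \x markup floated above. It assumes the escapes carry the UTF-8 bytes of each non-ASCII character (the message does not say which encoding is meant), and the names to_escaped and from_escaped are made up for this example.

```python
import re

# One or more consecutive \xNN escapes, e.g. \xC3\xA9
_ESCAPE_RUN = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")

def to_escaped(text: str) -> str:
    """Replace each non-ASCII character with \\xNN escapes of its UTF-8 bytes."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through untouched
        else:
            out.append("".join(f"\\x{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)

def from_escaped(text: str) -> str:
    """Turn runs of \\xNN escapes back into characters."""
    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("\\x", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8 (assumption: leave it exactly as typed)
            return match.group(0)
    return _ESCAPE_RUN.sub(decode_run, text)

print(to_escaped("élève"))
print(from_escaped(r"\xC3\xA9l\xC3\xA8ve"))
```

Note that a two-letter French word with one accent already grows by eight characters in this form, and a literal backslash-x typed by an editor would be ambiguous.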

This would not work in French. We have accented
letters in a great many words. That would make
editing very difficult.

When is Jimbo coming back from holidays?

Re: UTF
On Sunday 12 January 2003 03:23, Anthere wrote:
> Your examples are legitimate. How would you feel if
> there was a user option to edit in "broken UTF-8
> mode"? Then when you edited a page, you could insert
> some markup to put in non-ASCII characters. I don't
> know what the best way to do this would be; I am
> guessing something like \xAB\xCD where \x means "an 8
> bit value in hexadecimal representation follows". If
> you have any other ideas, let me know.
> Jonathan
>
> This would not work in French. We have accented
> letters in a great many words. That would make
> editing very difficult.

If the character is between 128 and 255 inclusive, present it as a single
byte. If it's Greek, give the HTML character name. Else turn it into a
number.

We could have a preference for what encoding to use on edit screens. Any
character not in that encoding is represented as a number, unless it's Greek
or for some reason has a character name.

phma
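
A rough Python sketch of the rule described above, for illustration only. It assumes the edit-screen encoding is a per-user preference (Latin-1 by default here), uses the HTML 4 entity table from Python's html.entities module as the source of "character names", and falls back to a decimal numeric reference. The function name to_edit_text is hypothetical.

```python
from html.entities import codepoint2name

def to_edit_text(text: str, encoding: str = "latin-1") -> str:
    """Prepare page text for an edit box limited to `encoding`."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)      # representable: show it as-is
            out.append(ch)
        except UnicodeEncodeError:
            name = codepoint2name.get(ord(ch))
            if name:                 # e.g. Greek letters have HTML names
                out.append(f"&{name};")
            else:                    # everything else becomes a number
                out.append(f"&#{ord(ch)};")
    return "".join(out)

print(to_edit_text("é α ł"))
print(to_edit_text("é α ł", encoding="ascii"))
```

With the Latin-1 preference, é stays as typed, the Greek alpha becomes a named reference and the Polish ł becomes a number. With an ASCII preference, é also turns into a named reference.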
Re: Re: Wikitech-l digest, Vol 1 #332 - 14 msgs
On Sun, Jan 12, 2003 at 12:23:06AM -0800, Anthere wrote:
>>Your examples are legitimate. How would you feel if
>>there was a user option to edit in "broken UTF-8
>>mode"? Then when you edited a page, you could insert
>>some markup to put in non-ASCII characters. I don't
>>know what the best way to do this would be; I am
>>guessing something like \xAB\xCD where \x means "an 8
>>bit value in hexadecimal representation follows". If
>>you have any other ideas, let me know.
>
>This would not work in French. We have accented
>letters in a great many words. That would make
>editing very difficult.

What if we had a TeX mode for diacritics? TeX makes it fairly easy to
put diacritic marks over and under letters. This isn't meant to be the
default editing mode, just a mode for people with broken browsers.

Jonathan
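
To make the TeX idea concrete, here is a small Python sketch, not a proposal for the actual wiki code. It assumes only a handful of TeX accent commands (\' \` \^ \" \~ and \c{...}) and maps each to a Unicode combining mark; the name tex_to_unicode is invented for this example.

```python
import re
import unicodedata

# TeX accent command -> Unicode combining mark (a small, assumed subset)
COMBINING = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis
    "~": "\u0303",  # tilde
    "c": "\u0327",  # cedilla, written \c{c}
}

# \'e or \'{e} for symbol accents; \c{c} (braces required) for the cedilla
_ACCENT = re.compile(
    r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))|\\c\{([A-Za-z])\}"""
)

def tex_to_unicode(text: str) -> str:
    """Replace TeX-style accent commands with precomposed characters."""
    def compose(match):
        if match.group(4):
            accent, letter = "c", match.group(4)
        else:
            accent = match.group(1)
            letter = match.group(2) or match.group(3)
        # NFC merges letter + combining mark into one character if it exists
        return unicodedata.normalize("NFC", letter + COMBINING[accent])
    return _ACCENT.sub(compose, text)

print(tex_to_unicode(r"\'el\`eve fran\c{c}ais, No\"{e}l"))
```

This only covers letters with marks; characters such as the Polish stroked l would still need some other notation.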

Re: UTF
On Sun, 12 Jan 2003 03:32:20 -0500, Pierre Abbat wrote:

> If the character is between 128 and 255 inclusive, present it as a single
> byte. If it's Greek, give the HTML character name. Else turn it into a
> number.
Actually, if it's between 128 and 159, reject it outright. In ISO-8859-1 and
Unicode those code points are control characters, so they have no printable
meaning on the web at all. Unfortunately, they do have meaning in the default
"Windows" character set (Windows-1252 on Western systems), so a certain Word
processor from a very large software corporation with a poor reputation
litters its documents with &#146;, &#147; etc. in the guise of "smart quotes",
and these fail to render on some good browsers. Perhaps the input processor
could clean the text, replacing these characters with their Unicode
equivalents via a lookup table?

--
Richard Grevers
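
A possible shape for such a lookup table, sketched in Python. It assumes the stray characters come from Windows-1252 and builds the table from Python's cp1252 codec; the five byte values that Windows-1252 leaves undefined are simply dropped, which is one reading of "reject it outright". The name clean_c1 is made up for this example.

```python
import re

# Lookup table for code points 128..159: Windows-1252 meaning, or None
C1_TO_UNICODE = {}
for byte in range(0x80, 0xA0):
    try:
        C1_TO_UNICODE[byte] = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        C1_TO_UNICODE[byte] = None  # undefined in Windows-1252: drop it

_NUMERIC_REF = re.compile(r"&#(\d+);")

def clean_c1(text: str) -> str:
    """Replace raw characters and &#NNN; references in the 128..159 range."""
    # Raw characters: str.translate deletes entries mapped to None
    text = text.translate(C1_TO_UNICODE)

    def fix_reference(match):
        number = int(match.group(1))
        if 0x80 <= number <= 0x9F:
            return C1_TO_UNICODE[number] or ""
        return match.group(0)  # any other reference is left alone

    return _NUMERIC_REF.sub(fix_reference, text)

print(clean_c1("It&#146;s a &#147;smart&#148; quote"))
```

The result contains the real Unicode quotation marks (U+2019, U+201C, U+201D), which can then be stored or shown as ordinary characters.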
Re: Re: Wikitech-l digest, Vol 1 #332 - 14 msgs
On Sun, Jan 12, 2003 at 12:23:06AM -0800, Anthere wrote:
>
> > On Fri, Jan 10, 2003 at 12:28:53PM -0800, Brion Vibber wrote:
> > >What would you prefer? That we tell Anthere to take a hike and buy a
> > >new computer? That I petition my uni to upgrade hundreds of machines
> > >in their labs? That we ignore similar conditions across the world
> > >where people have old machines or machines they cannot control and
> > >tell them, hey, fuck off, Wikipedia's not for you you whiny bitch?
> > Your examples are legitimate. How would you feel if
> > there was a user option to edit in "broken UTF-8
> > mode"? Then when you edited a page, you could insert
> > some markup to put in non-ASCII characters. I don't
> > know what the best way to do this would be; I am
> > guessing something like \xAB\xCD where \x means "an 8
> > bit value in hexadecimal representation follows". If
> > you have any other ideas, let me know.
> > Jonathan
>
> This would not work in French. We have accented
> letters in a great many words. That would make
> editing very difficult.
>

French accented letters are part of Latin-1, so they
could be edited directly. Only alphabets outside Latin-1
would suffer. In an article about Lech Walesa you could
not directly input the stroked "l" of his last
name but would see something like &#322; instead.

See, for example,
http://www.wikipedia.org/w/wiki.phtml?title=List_of_Polish_prime_ministers
http://www.wikipedia.org/w/wiki.phtml?title=China
for articles on the English Wikipedia that contain
such letters.

A UTF-8 capable browser would present all letters
directly in the edit window. A non-UTF-8 browser
would present the characters outside Latin-1 in a
numeric form, as is already done in parts of the
English Wikipedia.

Making UTF-8 edits an option would make life easier for
people working on Asian, Eastern European, Hebrew ...
topics while still allowing edits from older browsers.

Best regards,

JeLuF
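
The Latin-1 fallback described above can be sketched in a few lines of Python, purely as an illustration. It relies on the standard "xmlcharrefreplace" error handler, which turns anything outside the target encoding into a decimal &#NNN; reference. The two function names are hypothetical, and converting references back on save is an assumption about how the round trip would work.

```python
import re

_NUMERIC_REF = re.compile(r"&#(\d+);")

def for_legacy_edit_box(text: str) -> str:
    """Keep Latin-1 characters; show everything else as &#NNN;."""
    return text.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")

def from_legacy_edit_box(text: str) -> str:
    """Turn decimal references back into characters when the edit is saved."""
    def to_char(match):
        number = int(match.group(1))
        # Leave out-of-range numbers as typed rather than failing
        return chr(number) if number <= 0x10FFFF else match.group(0)
    return _NUMERIC_REF.sub(to_char, text)

shown = for_legacy_edit_box("Lech Wałęsa, élève")
print(shown)
print(from_legacy_edit_box(shown))
```

The French word passes through unchanged, while the two Polish letters appear as &#322; and &#281; in the edit box and come back as real characters afterwards.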