Mailing List Archive: Re: Wikitech-l digest, Vol 1 #329

Re: Wikitech-l digest, Vol 1 #329 - 13 msgs

Jan 10, 2003, 9:54 AM

Post #1 of 11 (963 views)

From: tarquin
>Please, mac users, what do you use ????
>I Use Mac OS X 10.1 Chimera browser :-) Are you still on OS 9?

Yes. Still on OS 9. I have a G3 (a graphite iMac, 3 years old). It would be possible, but technically not very reasonable, to upgrade it to X. I might increase the memory, but the clock speed would be too low anyway.

Besides, I have no problem with any other application I use (and I already know some will likely not work under X), so I am not planning to ruin myself for at least a year, maybe 2. I would be curious to know, but I think the majority of Macs are still under 9. iMacs were sold *a lot* about 2-3 years ago, and most would not be under X.

>From: Daniel Mayer
>If it really doesn't work, try upgrading your browser.
This is a totally wrong-headed way to look at things. Most net users have out-of-date browsers and we shouldn't expect them to upgrade to the latest version of their browser just to write articles. /Very/ unWiki.

Agreed
Users are not only people at home on their personal computer, which they might or might not upgrade (because they do not necessarily *know* how to do so), but also people at work, on a business computer.
I work with a couple of big firms. To simplify administration, the tech team tries to keep all computers on exactly the same config, worldwide. Same OS, same browser, same version. And NEVER the latest one, so that the whole config is supported by "old" 2-3 year old computers, and also because they prefer to wait until bugs are really found out. That means many working people are not on the latest browsers, and cannot change that themselves.
And though my country is not one of the poorest, most computers in schools and libraries are kept for nearly 10 years. What to say of other places?

What you developers do is *very* important: the Wikipedia concept relies entirely on you, and on how suitable (readable and editable) you make it for the highest possible number of people. Not only people on the latest PC, with Internet Explorer 6. But also Mac users, and users of hardware a couple of years old, and diacritics users, and near-blind users, and users of keyboards without easy access to accented letters, and modem users, and 600*800 users. 'Cause we don't always have a choice. Sorry, I don't mean to hurt anybody, but programming is not an end, just a tool. And please don't say "here's the code, you're welcome, help yourself", I am a programming-disabled person ;-))

Well, if necessary, I think I could find figures for my country. Meanwhile, please don't switch the fr.wiki to UTF 8 before we all know what it will imply in terms of usability.

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

brion at pobox

Jan 10, 2003, 12:22 PM

Post #2 of 11 (956 views)

On ven, 2003-01-10 at 08:54, Anthere wrote:
> Meanwhile, please don't switch the fr.wiki to UTF 8 before we all know what it will imply in terms of usability.

My position continues to be that we should not switch the big Latin-1
wikis to UTF-8 until we have automatic conversion to handle common
broken browsers.

-- brion vibber (brion @ pobox.com)

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

taw at users

Jan 10, 2003, 12:45 PM

Post #3 of 11 (957 views)

On Fri, Jan 10, 2003 at 11:22:43AM -0800, Brion Vibber wrote:
> On ven, 2003-01-10 at 08:54, Anthere wrote:
> > Meanwhile, please don't switch the fr.wiki to UTF 8 before we all know what it will imply in terms of usability.
>
> My position continues to be that we should not switch the big Latin-1
> wikis to UTF-8 until we have automatic conversion to handle common
> broken browsers.

All common browsers support UTF-8, so what do you mean by
"common broken browsers" ?

Autodetection won't work (all popular browsers, and most of less
popular ones, support UTF-8, and we don't want to break any of them,
list of broken browsers doesn't exist). There should be a link
"my browser is completely broken" which would set a cookie and
software, seeing that cookie, would convert page to "safer" version.

But what would that "safer" version be ?
* ISO-8859-1 + &codes;
* ISO-8859-1 + `?' marks
* ISO-8859-1 + rendered PNGs (do these browsers support PNG ?)
?

I don't think we should support editing that way.

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

brion at pobox

Jan 10, 2003, 1:28 PM

Post #4 of 11 (955 views)

On ven, 2003-01-10 at 11:45, Tomasz Wegrzanowski wrote:
> On Fri, Jan 10, 2003 at 11:22:43AM -0800, Brion Vibber wrote:
> > My position continues to be that we should not switch the big Latin-1
> > wikis to UTF-8 until we have automatic conversion to handle common
> > broken browsers.
>
> All common browsers support UTF-8, so what do you mean by
> "common broken browsers" ?

Oh, how I wish that were true. *Current* versions of the most common
browsers support UTF-8, but older versions which are still used by
actual, real, Wikipedia contributors whose complaints reach my ears and
whose broken edits reach my eyes, are in fact broken.

"Common broken browsers" are the ones commonly turning up as broken. See
http://meta.wikipedia.org/wiki/Meta.wikipedia.org_technical_issues

This hits real people, including heavy contributors like Anthere; as for
myself, I can't edit UTF-8 pages on the Macs in my university's computer
lab without breaking the content, because they have only an old IE 5.0
and Netscape 4.x.

> Autodetection won't work (all popular browsers, and most of less
> popular ones, support UTF-8, and we don't want to break any of them,
> list of broken browsers doesn't exist).

Checking the 'Accept-charset' header plus a blacklist of known bad
user-agents should do well enough.
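
A minimal Python sketch of that check, for illustration only. The patterns in `BAD_AGENTS` are hypothetical examples modelled on the browsers named in this thread (old Mac IE 5.0, Netscape 4.x), not a vetted list, and `needs_safe_mode` is a made-up name. It assumes a missing Accept-Charset header means "anything goes", as in HTTP/1.1.

```python
import re

# Hypothetical blacklist; a real one would have to be built from bug reports.
BAD_AGENTS = [
    re.compile(r"MSIE 5\.0.*Mac"),        # e.g. IE 5.0 on Mac OS
    re.compile(r"^Mozilla/4\.[0-9.]+ \["),  # Netscape 4.x style, e.g. "Mozilla/4.7 [en]"
]


def accepts_utf8(accept_charset: str) -> bool:
    """Return True if the header lists utf-8 or * with a non-zero q-value."""
    for item in accept_charset.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        params = params.strip()
        q = 1.0
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        if name in ("utf-8", "*") and q > 0:
            return True
    return False


def needs_safe_mode(user_agent: str, accept_charset: str) -> bool:
    """Decide whether to serve the 'safer' non-UTF-8 edit form."""
    if any(p.search(user_agent) for p in BAD_AGENTS):
        return True
    # An empty or absent header is treated as "no stated preference".
    if accept_charset and not accepts_utf8(accept_charset):
        return True
    return False
```

The blacklist is checked first because a broken browser may still advertise UTF-8 in its headers.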

> There should be a link
> "my browser is completely broken" which would set a cookie and
> software, seeing that cookie, would convert page to "safer" version.

Better to do the safer thing by default when we know we're going to need
it.

> But what would that "safer" version be ?
> * ISO-8859-1 + &codes;

Ugly, but workable in most cases. (Or another base charset could be used
for some languages.) Automatic conversion of input &codes; to UTF-8
storage internally would additionally help with searching.
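
As a rough illustration of the "ISO-8859-1 + &codes;" round trip, here is a Python sketch using only the standard library. The function names are invented for the example, and note one assumption that may not suit wikitext: `html.unescape` also decodes references such as `&amp;` and `&lt;`, which a real implementation might want to leave alone.

```python
import html


def to_storage(form_text: str) -> bytes:
    """Turn submitted text containing &codes; into UTF-8 for internal storage."""
    return html.unescape(form_text).encode("utf-8")


def to_safe_page(stored: bytes) -> bytes:
    """Serve stored UTF-8 as ISO-8859-1, using &#NNN; for everything else."""
    return stored.decode("utf-8").encode("latin-1", errors="xmlcharrefreplace")


stored = to_storage("&#321;&oacute;d&#378;")
print(stored.decode("utf-8"))   # Łódź
print(to_safe_page(stored))     # b'&#321;\xf3d&#378;'
```

Because the stored form is plain UTF-8, a search for "Łódź" matches no matter which mix of entities the editor typed.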

> * ISO-8859-1 + `?' marks
> * ISO-8859-1 + rendered PNGs (do these browsers support PNG ?)

That would be useless for editing, which is the problem. A browser that
won't _display_ UTF-8 text (whether able to show all the glyphs or not)
is damn near impossible to find. (Some shitty text browsers may manage
it.)

> I don't think we should support editing that way.

What would you prefer? That we tell Anthere to take a hike and buy a new
computer? That I petition my uni to upgrade hundreds of machines in
their labs? That we ignore similar conditions across the world where
people have old machines or machines they cannot control and tell them,
hey, fuck off, Wikipedia's not for you you whiny bitch?

*I don't think so.*

We should make at least a good-faith effort to make our site usable. I
don't mind so much if joe's buggy browser overlaps something in the
header from time to time or has an ugly border look, but if it's
damaging the content of our site because of bad interactions with
editing, then that hurts the very core of what a wiki is about.

-- brion vibber (brion @ pobox.com)

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

krooger at debian

Jan 11, 2003, 10:46 PM

Post #5 of 11 (961 views)

On Fri, Jan 10, 2003 at 12:28:53PM -0800, Brion Vibber wrote:
>What would you prefer? That we tell Anthere to take a hike and buy a new
>computer? That I petition my uni to upgrade hundreds of machines in
>their labs? That we ignore similar conditions across the world where
>people have old machines or machines they cannot control and tell them,
>hey, fuck off, Wikipedia's not for you you whiny bitch?

Your examples are legitimate. How would you feel if there was a user
option to edit in "broken UTF-8 mode"? Then when you edited a page, you
could insert some markup to put in non-ASCII characters. I don't know
what the best way to do this would be; I am guessing something like
\xAB\xCD where \x means "an 8 bit value in hexadecimal representation
follows". If you have any other ideas, let me know.

Jonathan

--
Geek House Productions, Ltd.

Providing Unix & Internet Contracting and Consulting,
QA Testing, Technical Documentation, Systems Design & Implementation,
General Programming, E-commerce, Web & Mail Services since 1998

Phone: 604-435-1205
Email: djw@reactor-core.org
Webpage: http://reactor-core.org
Address: 2459 E 41st Ave, Vancouver, BC V5R2W2

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

toby+wikipedia at math

Jan 12, 2003, 3:59 AM

Post #6 of 11 (958 views)

Jonathan Walther wrote:

>Your examples are legitimate. How would you feel if there was a user
>option to edit in "broken UTF-8 mode"? Then when you edited a page, you
>could insert some markup to put in non-ASCII characters. I don't know
>what the best way to do this would be; I am guessing something like
>\xAB\xCD where \x means "an 8 bit value in hexadecimal representation
>follows". If you have any other ideas, let me know.

The special markup could simply be the &#...; codes themselves.
Internally, we would store UTF-8, not an ampersand, semicolon, etc.
But these would be presented in the edit box as HTML entities
(names if they exist, numbers otherwise), when that option is checked.
(But you could *input* either HTML entities or direct UTF-8 regardless.)

Then we could even let [[fr:]] (say) choose to make that option the default,
while letting [[pl:]] (say) eschew it.

-- Toby

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

krooger at debian

Jan 12, 2003, 4:46 AM

Post #7 of 11 (956 views)

On Sun, Jan 12, 2003 at 02:59:54AM -0800, Toby Bartels wrote:
>Jonathan Walther wrote:
>
>>Your examples are legitimate. How would you feel if there was a user
>>option to edit in "broken UTF-8 mode"? Then when you edited a page, you
>>could insert some markup to put in non-ASCII characters. I don't know
>>what the best way to do this would be; I am guessing something like
>>\xAB\xCD where \x means "an 8 bit value in hexadecimal representation
>>follows". If you have any other ideas, let me know.
>
>The special markup could simply be the &#...; codes themselves.
>Internally, we would store UTF-8, not an ampersand, semicolon, etc.
>But these would be presented in the edit box as HTML entities
>(names if they exist, numbers otherwise), when that option is checked.
>(But you could *input* either HTML entities or direct UTF-8 regardless.)
>
>Then we could even let [[fr:]] (say) choose to make that option the default,
>while letting [[pl:]] (say) eschew it.

I like it. I second your proposal.

Jonathan

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

toby+wikipedia at math

Jan 12, 2003, 7:48 AM

Post #8 of 11 (960 views)

I wrote:

>The special markup could simply be the &#...; codes themselves.
>Internally, we would store UTF-8, not an ampersand, semicolon, etc.
>But these would be presented in the edit box as HTML entities
>(names if they exist, numbers otherwise), when that option is checked.
>(But you could *input* either HTML entities or direct UTF-8 regardless.)

Actually, we need 3 options if everybody is to be satisfied:

Option    Edit box presents as        Edit box accepts input as
-------------------------------------------------------------------------------
UTF-8     UTF-8                       UTF-8   &name; &#decimal; &#Xhexadecimal;
Latin-1   Latin-1 &name; &#decimal;   Latin-1 &name; &#decimal; &#Xhexadecimal;
ASCII     ASCII &name; &#decimal;     Latin-1 &name; &#decimal; &#Xhexadecimal;

When presenting the edit box (middle column),
use the first version listed that applies to the character in question;
when accepting input from the edit box (last column),
accept anything that we get, with the default encoding listed.
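
The "presents as" column can be sketched in Python roughly as follows. This is an illustration under assumptions: `present` is a made-up name, the mode strings are arbitrary, and the named entities come from Python's `html.entities.codepoint2name` table (the HTML 4 set), which may not be the table the wiki would actually use.

```python
from html.entities import codepoint2name


def present(text: str, mode: str) -> str:
    """Render stored Unicode text for the edit box.

    mode is 'utf-8', 'latin-1' or 'ascii'. For each character the first
    form that applies is used: the raw character, then &name;, then &#decimal;.
    """
    if mode == "utf-8":
        return text
    limit = 0xFF if mode == "latin-1" else 0x7F
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= limit:
            out.append(ch)                         # representable directly
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity exists
        else:
            out.append(f"&#{cp};")                 # fall back to decimal
    return "".join(out)


print(present("café Łódź €", "latin-1"))  # café &#321;ód&#378; &euro;
print(present("café Łódź €", "ascii"))    # caf&eacute; &#321;&oacute;d&#378; &euro;
```

One consequence of this scheme is that text which already contains a literal entity such as `&eacute;` is indistinguishable from a presented character once it comes back from the edit box.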

>Then we could even let [[fr:]] (say) choose to make that option the default,
>while letting [[pl:]] (say) eschew it.

Presumably, of the above, [[pl:]] would set the default to UTF-8,
[[fr:]] would set the default to Latin-1 (for anthere's sake),
and [[en:]] would set the default to ASCII (for mav's sake ^_^).
But everything in the databases would be UTF-8 internally.

Of course, once the system is set up to run in general,
then it'll be no great trick to let [[pl:]] set their default to Latin-2 --
since it's a default of how to present Unicode to editors,
not a limitation on the available Unicode characters.

As for the forbidden numerical character entities from &#128; to &#159;,
we can interpret them as if they came from Micro$oft (most likely)
and convert them to whatever they should be (by table).
(If any other forbidden numerical entities have common nonstandard uses,
then we can adopt those as well as long as they translate to good Unicode.)

Gee, I guess that I'm getting involved in this after all.
But it now seems like there might be a solution, not just an argument!

-- Toby

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

taw at users

Jan 12, 2003, 8:07 AM

Post #9 of 11 (953 views)

On Sun, Jan 12, 2003 at 06:48:08AM -0800, Toby Bartels wrote:
> As for the forbidden numerical character entities from &#128; to &#159;,
> we can interpret them as if they came from Micro$oft (most likely)
> and convert them to whatever they should be (by table).
> (If any other forbidden numerical entities have common nonstandard uses,
> then we can adopt those as well as long as they translate to good Unicode.)

They translate to Unicode 128-154.
Unicode 0-255 is identical with ISO-8859-1.

There are many other Unicode (and ISO-8859) characters that mean nothing,
so this is not a problem.

Re: UTF [ In reply to ]

phma at webjockey

Jan 12, 2003, 8:38 AM

Post #10 of 11 (957 views)

On Sunday 12 January 2003 10:07, Tomasz Wegrzanowski wrote:
> On Sun, Jan 12, 2003 at 06:48:08AM -0800, Toby Bartels wrote:
> > As for the forbidden numerical character entities from &#128; to &#159;,
> > we can interpret them as if they came from Micro$oft (most likely)
> > and convert them to whatever they should be (by table).
> > (If any other forbidden numerical entities have common nonstandard uses,
> > then we can adopt those as well as long as they translate to good
> > Unicode.)
>
> They translate to Unicode 128-154.
> Unicode 0-255 is identical with ISO-8859-1.

The problem is that some contributors, apparently copy-pasting from some word
processor in Windows or from 1911, enter those characters as if they mean
something. Then I see an article with boxes in it, try to guess what the
boxes are supposed to be, and often get it wrong. 80-9f in ISO-8859-1 map to
0080-009f in Unicode, but they are invalid characters in both and display as
boxes, slugs, spaces, or nothing. (9f is 159, btw.)

phma

Re: Re: Wikitech-l digest, Vol 1 #329 - 13 msgs [ In reply to ]

brion at pobox

Jan 12, 2003, 1:25 PM

Post #11 of 11 (962 views)

On dim, 2003-01-12 at 07:07, Tomasz Wegrzanowski wrote:
> On Sun, Jan 12, 2003 at 06:48:08AM -0800, Toby Bartels wrote:
> > As for the forbidden numerical character entities from &#128; to &#159;,
> > we can interpret them as if they came from Micro$oft (most likely)
> > and convert them to whatever they should be (by table).
> > (If any other forbidden numerical entities have common nonstandard uses,
> > then we can adopt those as well as long as they translate to good Unicode.)
>
> They translate to Unicode 128-154.
> Unicode 0-255 is identical with ISO-8859-1.
>
> There are many other Unicode (and ISO-8859) characters that mean nothing,
> so this is not a problem.

Well, it's a problem when people trying to write the euro symbol, the
French oe-ligature, or Slovene/Czech accented letters get mysterious
high control characters instead of the characters that they typed
legally in CP1252 (even though their browser shouldn't have given them
the option, since it was told to use ISO 8859-1, it's not the users'
fault).

I'd rather silently do input conversion from CP1252 to UTF-8 (thus
preserving those nasty Microsoft extensions as good Unicode characters),
and output conversion to ISO 8859-1 to keep with standards.

There's no legitimate use of ISO 8859-1's or Unicode's 128-154 range
that I know of except conceivably in terminal control. In plaintext on
the web, they're 100% useless, so if they show up it's safe to assume
they're really CP1252.
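
The input side of that conversion might look something like this in Python. It is only a sketch, not the code Wikipedia runs; `repair_c1` and `store_form_input` are names invented for the example. It relies on the fact that Python's `cp1252` codec leaves five bytes in the 0x80-0x9F range unassigned, so those are passed through unchanged.

```python
def repair_c1(text: str) -> str:
    """Reinterpret C1 control characters (U+0080..U+009F) as CP1252."""
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                ch = bytes([ord(ch)]).decode("cp1252")
            except UnicodeDecodeError:
                pass  # unassigned in CP1252; keep the original character
        out.append(ch)
    return "".join(out)


def store_form_input(raw: bytes) -> bytes:
    """Decode a submission declared as ISO 8859-1 and return UTF-8 for storage."""
    return repair_c1(raw.decode("latin-1")).encode("utf-8")


# A browser told to use ISO 8859-1 but actually sending CP1252 bytes:
print(store_form_input(b"5 \x80, c\x9cur").decode("utf-8"))  # 5 €, cœur
```

Decoding as Latin-1 first never fails, since every byte maps to a code point, so the repair step only has to touch the 32 positions where the two charsets differ.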

-- brion vibber (brion @ pobox.com)
