Mailing List Archive

Why not listen to W3C? Invalid HTML on www.wikipedia.org!
Wiki pages generated by the server do not follow the W3C recommendations.
Now that we finally have an open standard for HTML, why not use it?

One of the reasons why http://www.wikipedia.org does not validate is that
the character set is not specified; you should include a line like this:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

You can easily see what other errors are made when you type a URI
into http://validator.w3.org/. The page http://www.wikipedia.org/,
for example, is not Valid HTML 4.01 Transitional! Below are the
results of attempting to parse this document with an SGML parser.

1. Line 7, column 7: required attribute "TYPE" not specified.

<SCRIPT>
^

2. Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.

<th colspan="2" align=center><big> Selected Articles </big></th></tr>
^

Kind regards,
Pieter Suurmond
Re: Why not listen to W3C? Invalid HTML on www.wikipedia.org! [ In reply to ]
On Mon, 2003-01-20 at 16:22, Pieter Suurmond wrote:
> Wiki pages generated by the server do not follow the W3C recommendations.
> Now that we finally have an open standard for HTML, why not use it?

Forgetfulness?

> One of the reasons why http://www.wikipedia.org does not validate is that
> the character set is not specified; you should include a line like this:
>
> <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

This is set in the HTTP headers. I assume you were validating a copy
that lacks the header?
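
For reference, the HTTP-level equivalent of that META element is a
Content-Type response header; the exact value the server sends is not
shown in this thread, but would be along these lines:

Content-Type: text/html; charset=ISO-8859-1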

> You can easily see what other errors are made when you type a URI
> into http://validator.w3.org/. The page http://www.wikipedia.org/,
> for example, is not Valid HTML 4.01 Transitional! Below are the
> results of attempting to parse this document with an SGML parser.
>
> 1. Line 7, column 7: required attribute "TYPE" not specified.
>
> <SCRIPT>
> ^

Grr... I blame Magnus. Fixed.

> 2. Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.
>
> <th colspan="2" align=center><big> Selected Articles </big></th></tr>
> ^

That's an error in the wiki page; our parser can correct some errors,
but isn't smart enough to fix all of them. Fixed.
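
The repaired row just adds the start tag the validator asked for
(assuming the cell was meant to open a new row, as the stray </tr>
suggests):

<tr><th colspan="2" align="center"><big> Selected Articles </big></th></tr>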

The front page now validates.

-- brion vibber (brion @ pobox.com)
Re: Why not listen to W3C? Invalid HTML on www.wikipedia.org! [ In reply to ]
On Mon, Jan 20, 2003 at 05:31:02PM -0800, Brion Vibber wrote:
> That's an error in the wiki page; our parser can correct some errors,
> but isn't smart enough to fix all of them. Fixed.
>
> The front page now validates.
>
> -- brion vibber (brion @ pobox.com)

Is it possible to run some script to check all wiki pages?
Most of the problems will be silly things like a missing <tr>, and
such a script would help to fix them.

It would also help with switching to XHTML (+ MathML) some day.
XHTML parsers are much less forgiving than HTML parsers.
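
A minimal sketch of such a checker, assuming a plain list of page URLs
and the W3C validator's public "uri" query interface (the URL list and
the crude string test on the result page are assumptions here, not an
existing script):

import time
import urllib.parse
import urllib.request

VALIDATOR = "http://validator.w3.org/check?uri="

def is_valid(page_url):
    # Ask the validator about one page.  Crude check: the result page
    # says "not Valid" in its heading when there are errors.
    quoted = urllib.parse.quote(page_url, safe="")
    with urllib.request.urlopen(VALIDATOR + quoted) as response:
        body = response.read().decode("utf-8", "replace")
    return "not Valid" not in body

for url in ["http://www.wikipedia.org/", "http://nl.wikipedia.org/"]:
    print(url, "OK" if is_valid(url) else "INVALID")
    time.sleep(1)  # be polite to the shared validator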
Re: Thanks for W3C-valid HTML 4.01 so quickly! [ In reply to ]
That's great, Brion Vibber!
Now http://www.wikipedia.org/ is indeed 100% valid.
The Dutch site still contains some errors, but that's
mainly due to things like "Русский">Russkiy</a>"
on http://nl.wikipedia.org. I'll try to repair those
manually...
I have another wish: could a script on the wiki server
do automatic character-to-entity conversion, like this?
à --> &agrave;
ë --> &euml;
Well, I'm only suggesting... Anyhow, thanks
for fixing the English front page so very quickly!

Pieter Suurmond
[Let's promote [the use of] open standards like those of the W3C]


Re: Thanks for W3C-valid HTML 4.01 so quickly! [ In reply to ]
On Mon, 2003-01-20 at 18:53, Pieter Suurmond wrote:
> That's great, Brion Vibber!
> Now http://www.wikipedia.org/ is indeed 100% valid.
> The Dutch site still contains some errors, but that's
> mainly due to things like "Русский">Russkiy</a>"
> on http://nl.wikipedia.org. I'll try to repair those
> manually...

Another problem is the inclusion of external links with ampersands in
them. Strictly speaking, ampersands in links need to be '&amp;' instead
of '&', because entity interpretation is done before the tag attributes
are parsed; all our generated links take this into account (I think!),
but we don't presently provide such conversion for inline external
links in a wiki page (in part because we don't know what the author
intended...)

It would, I think, not be unreasonable for the wiki parser to
automatically convert &s not followed by an entity body to &amp;...
should we let things that do look like entities go through intact,
though? Or just escape all &s? (Upside: simple, consistent behavior.
Downside: inconsistent with wiki links, where entities are allowed.)
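
A sketch of that conversion in Python; the "entity body" pattern below
is an approximation for illustration, not the wiki parser's actual
rule:

import re

# Entity-shaped things after an &: a name like "amp;" or a numeric
# form like "#160;" or "#xA0;".
ENTITY_BODY = r'[A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;'

def escape_bare_ampersands(text):
    # Replace every & that is NOT followed by an entity body.
    return re.sub('&(?!' + ENTITY_BODY + ')', '&amp;', text)

print(escape_bare_ampersands("?title=Foo&action=edit"))
# -> ?title=Foo&amp;action=edit
print(escape_bare_ampersands("fish &amp; chips"))
# -> fish &amp; chips (unchanged; &amp; already looks like an entity)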

> I have another wish: could a script on the wiki server
> do automatic character-to-entity conversion, like this?
> à --> &agrave;
> ë --> &euml;

I'd rather do it the other way around: convert input entities to real
characters and keep them that way. Entities are bandwidth hogs and not
really particularly helpful; text on the Chinese and Japanese wikis,
for instance, would take about three times the bandwidth it presently
does if it used numeric entities instead of UTF-8.
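
The arithmetic behind that figure, as a quick illustration (the sample
character is mine, not from the thread):

# One CJK character costs 3 bytes as UTF-8 but 8 as a numeric entity.
ch = "\u6f22"                  # the character 漢
utf8 = ch.encode("utf-8")
entity = "&#%d;" % ord(ch)     # "&#28450;"
print(len(utf8), len(entity))  # 3 vs 8 -- roughly the 3x factor above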

If you want to see the names of the entities in the textarea (so as
to avoid the editors that damage non-ASCII text), they have to be
escaped again with an &amp;, and any text that uses a non-trivial
amount of non-ASCII characters becomes illegible. As an option, and
for known unfriendly user agents, it may be helpful, but I'd avoid it
if I could.

> Well, I'm only suggesting... Anyhow, thanks
> for fixing the English front page so very quickly!

You're welcome!

-- brion vibber (brion @ pobox.com)