Mailing List Archive

Why not listen to W3C? Invalid HTML on www.wikipedia.org!
Wiki pages generated by the server do not follow the W3C recommendations.
Now that we finally have an open standard for HTML, why not use it?

One of the reasons why http://www.wikipedia.org does not validate is that
the character set is not specified; you should include a line like this:

<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

You can easily see what other errors are made when you type a URI
into http://validator.w3.org/. The page http://www.wikipedia.org/,
for example, is not Valid HTML 4.01 Transitional! Below are the
results of attempting to parse this document with an SGML parser.

1. Line 7, column 7: required attribute "TYPE" not specified.

<SCRIPT>
^

2. Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.

<th colspan="2" align=center><big> Selected Articles </big></th></tr>
^

Kind regards,
Pieter Suurmond
Re: Why not listen to W3C? Invalid HTML on www.wikipedia.org! [ In reply to ]
On Mon, 2003-01-20 at 16:22, Pieter Suurmond wrote:
> Wiki pages generated by the server do not follow the W3C recommendations.
> Now that we finally have an open standard for HTML, why not use it?

Forgetfulness?

> One of the reasons why http://www.wikipedia.org does not validate is that
> the character set is not specified; you should include a line like this:
>
> <META HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

This is set in the HTTP headers. I assume you were validating a copy
that lacks the header?
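
For reference, the HTTP-level equivalent of that META element is a
Content-Type response header; the exact value the server sends is not
shown in this thread, but would be along these lines:

Content-Type: text/html; charset=ISO-8859-1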

> You can easily see what other errors are made when you type a URI
> into http://validator.w3.org/. The page http://www.wikipedia.org/,
> for example, is not Valid HTML 4.01 Transitional! Below are the
> results of attempting to parse this document with an SGML parser.
>
> 1. Line 7, column 7: required attribute "TYPE" not specified.
>
> <SCRIPT>
> ^

Grr... I blame Magnus. Fixed.

> 2. Line 87, column 28: start tag for "TR" omitted, but its declaration does not permit this.
>
> <th colspan="2" align=center><big> Selected Articles </big></th></tr>
> ^

That's an error in the wiki page; our parser can correct some errors,
but isn't smart enough to fix all of them. Fixed.
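
The repaired row just adds the start tag the validator asked for
(assuming the cell was meant to open a new row, as the stray </tr>
suggests):

<tr><th colspan="2" align="center"><big> Selected Articles </big></th></tr>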

The front page now validates.

-- brion vibber (brion @ pobox.com)
Re: Why not listen to W3C? Invalid HTML on www.wikipedia.org! [ In reply to ]
On Mon, Jan 20, 2003 at 05:31:02PM -0800, Brion Vibber wrote:
> That's an error in the wiki page; our parser can correct some errors,
> but isn't smart enough to fix all of them. Fixed.
>
> The front page now validates.
>
> -- brion vibber (brion @ pobox.com)

Is it possible to run some script to check all wiki pages?
Most of the problems will be silly things like a missing <tr>, and
such a script would help to fix them.

It would also help with switching to XHTML (+ MathML) some day.
XHTML parsers are much less forgiving than HTML parsers.
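
A minimal sketch of such a checker, assuming a plain list of page URLs
and the W3C validator's public "uri" query interface (the URL list and
the crude string test on the result page are assumptions here, not an
existing script):

import time
import urllib.parse
import urllib.request

VALIDATOR = "http://validator.w3.org/check?uri="

def is_valid(page_url):
    # Ask the validator about one page.  Crude check: the result page
    # says "not Valid" in its heading when there are errors.
    quoted = urllib.parse.quote(page_url, safe="")
    with urllib.request.urlopen(VALIDATOR + quoted) as response:
        body = response.read().decode("utf-8", "replace")
    return "not Valid" not in body

for url in ["http://www.wikipedia.org/", "http://nl.wikipedia.org/"]:
    print(url, "OK" if is_valid(url) else "INVALID")
    time.sleep(1)  # be polite to the shared validator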
Re: Thanks for W3C-valid HTML 4.01 so quickly! [ In reply to ]
That's great, Brion Vibber!
Now http://www.wikipedia.org/ is indeed 100% valid.
The Dutch site still contains some errors, but that's
mainly due to things like "Русский">Russkiy</a>"
on http://nl.wikipedia.org. I'll try to repair those
manually...
I have another wish: could a script on the wiki server
do automatic character-to-entity conversion, like this?
à --> &agrave;
ë --> &euml;
Well, I'm only suggesting... Anyhow, thanks
for fixing the English front page so very quickly!

Pieter Suurmond
[Let's promote [the use of] open standards like those of the W3C]


Re: Thanks for W3C-valid HTML 4.01 so quickly! [ In reply to ]
On Mon, 2003-01-20 at 18:53, Pieter Suurmond wrote:
> That's great, Brion Vibber!
> Now http://www.wikipedia.org/ is indeed 100% valid.
> The Dutch site still contains some errors, but that's
> mainly due to things like "Русский">Russkiy</a>"
> on http://nl.wikipedia.org. I'll try to repair those
> manually...

Another problem is the inclusion of external links with ampersands in
them. Strictly speaking, ampersands in links need to be '&amp;' instead
of '&', because entity interpretation is done before the tag attributes
are parsed; all our generated links take this into account (I think!),
but we don't presently provide such conversion for inline external
links in a wiki page (in part because we don't know what the author
intended...)

It would, I think, not be unreasonable for the wiki parser to
automatically convert &s not followed by an entity body to &amp;...
should we let things that do look like entities go through intact,
though? Or just escape all &s? (Upside: simple, consistent behavior.
Downside: inconsistent with wiki links, where entities are allowed.)
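
A sketch of that conversion in Python; the "entity body" pattern below
is an approximation for illustration, not the wiki parser's actual
rule:

import re

# Entity-shaped things after an &: a name like "amp;" or a numeric
# form like "#160;" or "#xA0;".
ENTITY_BODY = r'[A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;'

def escape_bare_ampersands(text):
    # Replace every & that is NOT followed by an entity body.
    return re.sub('&(?!' + ENTITY_BODY + ')', '&amp;', text)

print(escape_bare_ampersands("?title=Foo&action=edit"))
# -> ?title=Foo&amp;action=edit
print(escape_bare_ampersands("fish &amp; chips"))
# -> fish &amp; chips (unchanged; &amp; already looks like an entity)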

> I have another wish: could a script on the wiki server
> do automatic character-to-entity conversion, like this?
> à --> &agrave;
> ë --> &euml;

I'd rather do it the other way around: convert input entities to real
characters and keep them that way. Entities are bandwidth hogs and not
really particularly helpful; text on the Chinese and Japanese wikis,
for instance, would take about three times the bandwidth it presently
does if it used numeric entities instead of UTF-8.
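
The arithmetic behind that figure, as a quick illustration (the sample
character is mine, not from the thread):

# One CJK character costs 3 bytes as UTF-8 but 8 as a numeric entity.
ch = "\u6f22"                  # the character 漢
utf8 = ch.encode("utf-8")
entity = "&#%d;" % ord(ch)     # "&#28450;"
print(len(utf8), len(entity))  # 3 vs 8 -- roughly the 3x factor above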

If you want to see the names of the entities in the textarea (so as
to avoid the editors that damage non-ASCII text), they have to be
escaped again with an &amp;, and any text that uses a non-trivial
amount of non-ASCII characters becomes illegible. As an option, and
for known unfriendly user agents, it may be helpful, but I'd avoid it
if I could.

> Well, I'm only suggesting... Anyhow, thanks
> for fixing the English front page so very quickly!

You're welcome!

-- brion vibber (brion @ pobox.com)