Mailing List Archive

Article meta information proposal
<anotherReallyGreatIdea>

I have noticed that the LanguageLinks, while working great, start
looking a little odd in the articles; they are also hard to manage once
the need arises, and it will, as soon as more wikipedias are switched to
the new software. LanguageLinks should then be cross-checked and
maintained (semi-)automatically; otherwise, they might drift into utter
chaos.

Now, if anyone read my recent mail on wikipedia-l ("And now for
something completely different"), I actually argued for *even more*
links like that, for "technical categories" so to speak, like
"biography" etc.

So, why not add a new field to the database, for each article, where
LanguageLinks and the like can be "collected"? It would be just another
text field on the edit page; it could be parsed on each Save into a more
machine-readable form, and be "regenerated" into text for editing. Such
a field could well hold "This article is based on..." information as
well. And it could store tags like "stub", if we want to.

</anotherReallyGreatIdea>
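
A minimal sketch of the parse-on-Save / regenerate-for-editing round trip
proposed above. The proposal does not specify a syntax, so the one-item-per-line
"key: value" format, the treatment of bare words such as "stub" as tags, and the
names parse_meta and render_meta are all assumptions made for illustration.

```python
def parse_meta(text):
    """Parse the free-text meta field into {key: [values]}.

    Hypothetical format: 'key: value' lines, plus bare words that act as tags.
    """
    meta = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep:
            # "lang: de:Zeug" -> key "lang", value "de:Zeug"
            meta.setdefault(key.strip().lower(), []).append(value.strip())
        else:
            # A bare word such as "stub" is stored under the "tag" key.
            meta.setdefault("tag", []).append(line.lower())
    return meta


def render_meta(meta):
    """Regenerate editable text from the parsed form (keys in sorted order)."""
    lines = []
    for key in sorted(meta):
        for value in meta[key]:
            lines.append(value if key == "tag" else f"{key}: {value}")
    return "\n".join(lines)
```

Parsing the regenerated text gives back the same structure, so an editor would
always see a normalized version of whatever was typed in. A tag that itself
contains a colon would be misread as a "key: value" line in this sketch.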

Magnus
Re: Article meta information proposal
On Sun, Sep 08, 2002 at 06:16:08PM +0200, Magnus Manske wrote:
>
> So, why not add a new field to the database, for each article, where
> LanguageLinks and the like can be "collected"?

Please put the language links in a separate table, otherwise you are
violating the first normal form of database design theory. Any introductory
text on database design will tell you why that is bad.

I like the idea of links for categories ('biography', 'country',
'mathematical theorem', 'book', 'movie', ...) by the way.

-- Jan Hidders
Re: Article meta information proposal
Jan Hidders wrote:

>Magnus Manske wrote:
>
>>So, why not add a new field to the database, for each article, where
>>LanguageLinks and the like can be "collected"?
>>
>Please put the language links in a separate table, otherwise you are
>violating the first normal form of database design theory. Any introductory
>text on database design will tell you why that is bad.
>
I doubt that many of us have such a text in their libraries. OTOH
Wikipedia is a real database rather than a theoretical one so the rules
for theoretical databases shouldn't apply. :-)

Eclecticology
Re: Article meta information proposal
>> Please put the language links in a separate table, otherwise you are
>> violating the first normal form of database design theory. Any
>> introductory
>> text on database design will tell you why that is bad.
>>
> I doubt that many of us have such a text in their libraries. OTOH
> Wikipedia is a real database rather than a theoretical one so the
> rules for theoretical databases shouldn't apply. :-)

IMHO, a separate *database* would be even better than a separate
*table*. The database would then consist of a single table, with items
like :
lang_from, lang_to, title_from, title_to

If we stored the language links in each "source" language database,
cross-referencing would have to compare two different databases. I am
uncertain whether that can be done in a single SQL query; probably not.
Think of "all articles that link to the en: wikipedia, but are not
linked back from there". How would you do that if your links are
scattered in two dozen different databases?
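
A sketch of how that question could look against the single shared table
described above. It uses Python's sqlite3 module as a stand-in for MySQL; the
table name langlinks, the sample rows and the function name missing_backlinks
are hypothetical, and only the four column names come from this message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE langlinks (
        lang_from  TEXT NOT NULL,
        lang_to    TEXT NOT NULL,
        title_from TEXT NOT NULL,
        title_to   TEXT NOT NULL,
        PRIMARY KEY (lang_from, lang_to, title_from, title_to)
    )
""")
conn.executemany(
    "INSERT INTO langlinks (lang_from, lang_to, title_from, title_to) "
    "VALUES (?, ?, ?, ?)",
    [
        ("de", "en", "Zeug", "Stuff"),   # de -> en ...
        ("en", "de", "Stuff", "Zeug"),   # ... and the link back
        ("de", "en", "Recht", "Law"),    # de -> en with no link back
    ],
)


def missing_backlinks(conn, target_lang):
    """Links into target_lang whose target does not link back to the source."""
    rows = conn.execute("""
        SELECT f.lang_from, f.title_from, f.title_to
        FROM langlinks AS f
        LEFT JOIN langlinks AS b
               ON b.lang_from  = f.lang_to
              AND b.title_from = f.title_to
              AND b.lang_to    = f.lang_from
              AND b.title_to   = f.title_from
        WHERE f.lang_to = ?
          AND b.lang_from IS NULL      -- no matching reverse row was found
        ORDER BY f.lang_from, f.title_from
    """, (target_lang,))
    return rows.fetchall()
```

Calling missing_backlinks(conn, "en") lists the links into en: that have no
counterpart coming back. With all links in one table this is a single self-join;
whether the same is practical across separate per-language databases depends on
the server setup.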

Comments from the database wizard? :)

Magnus
Re: Re: Article meta information proposal
>> So, why not add a new field to the database, for each
>> article, where LanguageLinks and the like can be "collected"?

> Please put the language links in a separate table, otherwise
> you are violating the first normal form of database design
> theory. Any introductory text on database design will tell you
> why that is bad.

The Wikipedia database layout is already pretty far from
normalized (mainly because MySQL is too slow on joins), but if
I did add this feature I would make it a proper one-to-one
mapping table. I'm not yet convinced that it would be worth
the effort, though. I'm more inclined to think that the
international wikis should be more independent and encapsulated.

>I like the idea of links for categories ('biography', 'country',
>'mathematical theorem', 'book', 'movie', ...) by the way.

We discussed things like that early on, and initially rejected it
as an attempt to categorize articles in a non-wiki way; we wanted
different organization schemes to evolve out of normal wiki
editing and linking, rather than imposing order from the outside.

But it might be time to revisit the idea, because at 2,500 edits
a day and growing, it's just no longer possible for one person to
keep track of edits to articles he's interested in, and subject
is the only filter that really makes sense for reducing that data
to a manageable level. I'd still like to see if we couldn't
build those subjects automatically in some way based on links in
the database.
Re: Article meta information proposal
Magnus Manske wrote:

> lcrocker@nupedia.com wrote:
>
>> I'm not yet convinced that it would be worth
>> the effort, though. I'm more inclined to think that the
>> international wikis should be more independent and encapsulated.
>>
> If someone made a link from de: to en:, I (and many other Germans, I'm
> sure) would very much like for a link from en: to de: to be added
> automatically, if there isn't one already.

The automatic linking would seem to have one big problem: translation.
Depending on machine translation of article titles could give
frighteningly unpredictable results. For the English word "law" my
English/German dictionary gives "Gesetz", "Recht", "Jura", and
"Jurisprudenz" as possible translations. A native German-speaking human
may have no problem sorting this out, but a machine picking among the
four at random has only a 25% chance of getting it right.

A possible solution might be to have the source language term as an article
title in the target language Wikipedia, perhaps titled in the form
[[Topic (source language)]] to avoid inadvertent ambiguity. The
source-language-titled article would be nothing more than a redirect to
the correctly titled article in the target language or, in the absence
of such an article, to a single article, perhaps named [[Articles to be
translated from (source language)]], where all of these would be listed.

All this of course depends on whether the idea is technically
implementable in the first place.

Eclecticology
Re: Re: Article meta information proposal
>> Please put the language links in a separate table, otherwise
>> you are violating the first normal form of database design
>> theory. Any introductory text on database design will tell you
>> why that is bad.

> I doubt that many of us have such a text in their libraries.
> OTOH Wikipedia is a real database rather than a theoretical one
> so the rules for theoretical databases shouldn't apply. :-)

I don't have Jan's experience, but I do have Date's book on my
shelf, and a few others, and I do take database design seriously.
Unfortunately, history and performance sometimes force decisions
that one might not have otherwise made for data integrity reasons.
Re: Article meta information proposal
lcrocker@nupedia.com wrote:

>I'm not yet convinced that it would be worth
>the effort, though. I'm more inclined to think that the
>international wikis should be more independent and encapsulated.
>
On the de: wikipedia, they're adding en: links like crazy. eo: and pl:
are also very popular ;)
If someone made a link from de: to en:, I (and many other Germans, I'm
sure) would very much like for a link from en: to de: to be added
automatically, if there isn't one already.
Yes, you don't like the software altering article text, but it wouldn't
be altering, it would be the special case "appending", which should be
safe enough.
Also, we'd like to automatically list all the pages that don't link to
en:, for example; either to insert the link, or to write the en:
counterpart, based on the German article.
By the way, there are a dozen or so links in the German "links"
table that start with "en:"...
(SELECT bl_to FROM brokenlinks WHERE bl_to LIKE "en:%")

>>I like the idea of links for categories ('biography', 'country',
>>'mathematical theorem', 'book', 'movie', ...) by the way.
>>
>>
>
>We discussed things like that early on, and initially rejected it
>as an attempt to categorize articles in a non-wiki way; we wanted
>different organization schemes to evolve out of normal wiki
>editing and linking, rather than imposing order from the outside.
>
As far as I remember, the categories we discussed were more like
"Biology" or "Chemistry". What I suggest is a more technical thing, like
"biography", "year", "day", "city", "book", "movie", etc. There could
also be tags like "plant" and "animal", because these are easy to tell.
Whether an article belongs to "Chemistry", "Biology", "Biochemistry",
"Genetics" or "Molecular Biology" is *way* harder to tell.

The obvious way (to me;) here is a new namespace, one that "doesn't
exist", but is generated on-the-fly, like "special:". I suggest "type:".

Magnus
Re: Article meta information proposal
Ray Saintonge wrote:

> The automatic linking would seem to have one big problem:
> translation. Depending on machine translation of article titles
> could give frighteningly unpredictable results. For the English word
> "law" my English/German dictionary gives "Gesetz", "Recht", "Jura",
> and "Jurisprudenz" as possible translations. A native German-speaking
> human may have no problem sorting this out, but a machine picking
> among the four at random has only a 25% chance of getting it right.
>
> A possible solution might be to have the source language term as an
> article title in the target language Wikipedia, perhaps titled in the
> form [[Topic (source language)]] to avoid inadvertent ambiguity.
> The source-language-titled article would be nothing more than a
> redirect to the correctly titled article in the target language or,
> in the absence of such an article, to a single article, perhaps named
> [[Articles to be translated from (source language)]], where all of
> these would be listed.

Nope. I mean that if, in German, [[Zeug]] links to [[en:Stuff]], _then_
the en: article [[Stuff]] should automatically link back to [[de:Zeug]].
Now, if someone linked the [[Stuff]] article to its Esperanto
counterpart, then the eo: article should link back to [[Stuff]], and
"following" the language links from there, link to [[Zeug]] as well,
which then would link back to the eo: article as well.

Basically, it should just logically complete a "network" that is already
up manually, but in parts.
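
A small sketch of that "logical completion", under the assumption made in this
message that language links are symmetric and transitive. Articles are written
as "lang:Title" strings; the function name complete_links and the Esperanto
title in the usage note are hypothetical. Articles connected by any chain of
links are grouped with a union-find structure, and every member of a group is
then linked to every other member.

```python
from itertools import permutations


def complete_links(links):
    """Return all directed links implied by the given (source, target) pairs,
    treating links as symmetric and transitive."""
    parent = {}

    def find(x):
        # Root of x's group, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # Merge the groups of the two ends of every existing link.
    for a, b in links:
        parent[find(a)] = find(b)

    # Collect the members of each group.
    groups = {}
    for article in list(parent):
        groups.setdefault(find(article), set()).add(article)

    # Every ordered pair of distinct members becomes a link.
    completed = set()
    for members in groups.values():
        completed.update(permutations(sorted(members), 2))
    return completed
```

Given only ("de:Zeug", "en:Stuff") and ("en:Stuff", "eo:Ajo"), the result
contains all six links among the three articles, including eo: to de: and back.
Note that this sketch would also link two articles in the same language to each
other if they end up in the same group, which a real implementation would
probably want to report rather than apply.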
Re: Re: Article meta information proposal
> Nope. I mean that if, in German, [[Zeug]] links to [[en:Stuff]],
> _then_ the en: article [[Stuff]] should automatically link back
> to [[de:Zeug]]. Now, if someone linked the [[Stuff]] article to
> its Esperanto counterpart, then the eo: article should link back
> to [[Stuff]], and "following" the language links from there, link
> to [[Zeug]] as well, which then would link back to the eo: article
> as well.

I think Ray understood what you were suggesting; I think you
missed what he was saying--namely, that it is incorrect to
assume that because English article X links to German
article Y, German article Y should therefore link back
to English article X. Different wikis will divide up the
space of ideas differently, as do different languages and
different cultures. Disambiguation pages will have entirely
different sets of links in each language, yet it still might
make sense for a language link to go to a disambiguation page
if there's some overlap. It's very likely that many one-page
topics in the foreign wikis will be split into several pages
on the English wiki, and some topics will be split further on
foreign wikis than they are on the English one. At any rate,
the simplistic one-to-one link idea will probably create as
many problems as it solves.
Re: Re: Article meta information proposal
On Mon, Sep 09, 2002 at 10:30:50AM -0700, lcrocker@nupedia.com wrote:
>
> > Please put the language links in a separate table, otherwise you are
> > violating the first normal form of database design theory. Any
> > introductory text on database design will tell you why that is bad.
>
> The Wikipedia database layout is already pretty far from
> normalized (mainly because MySQL is too slow on joins), but if
> I did add this feature I would make it a proper one-to-one
> mapping table.

Ok. It's just that denormalization is a bad idea in this case because
you have a "two-way access pattern" here, i.e., you are going to look up the
Bs for a certain A and the As for a certain B. Nesting the Bs with an A or the
As with a B is then almost always a bad idea.
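
To illustrate the two-way access pattern, here is a sketch of a flat link table
that can be read in either direction. It uses Python's sqlite3 module rather
than MySQL, and the table name link, the index name link_by_b, the sample rows
and the helper names bs_for and as_for are all made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per link. The primary key serves lookups by a,
    -- the extra index serves lookups by b.
    CREATE TABLE link (
        a TEXT NOT NULL,
        b TEXT NOT NULL,
        PRIMARY KEY (a, b)
    );
    CREATE INDEX link_by_b ON link (b, a);
""")
conn.executemany("INSERT INTO link (a, b) VALUES (?, ?)", [
    ("de:Zeug", "en:Stuff"),
    ("eo:Ajo", "en:Stuff"),
    ("de:Zeug", "eo:Ajo"),
])


def bs_for(a):
    """All Bs for a certain A."""
    rows = conn.execute("SELECT b FROM link WHERE a = ? ORDER BY b", (a,))
    return [row[0] for row in rows]


def as_for(b):
    """All As for a certain B."""
    rows = conn.execute("SELECT a FROM link WHERE b = ? ORDER BY a", (b,))
    return [row[0] for row in rows]
```

Both directions are a simple indexed lookup here. If the Bs were instead nested
inside a text field on each A, finding the As for a given B would mean reading
and parsing that field for every article.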

> I'm not yet convinced that it would be worth the effort, though. I'm more
> inclined to think that the international wikis should be more independent
> and encapsulated.

Why? Until now this approach has resulted in the non-English Wikipedias
feeling highly neglected. I still wonder what Jimbo's opinion on this is.

> >I like the idea of links for categories ('biography', 'country',
> >'mathematical theorem', 'book', 'movie', ...) by the way.
>
> We discussed things like that early on, and initially rejected it
> as an attempt to categorize articles in a non-wiki way; we wanted
> different organization schemes to evolve out of normal wiki
> editing and linking, rather than imposing order from the outside.

It should of course be as non-constraining, open and as editable as
possible. All I'm thinking of here is an extra edit field called "category"
that will result in a link to "category:book" if you typed in "book". On
that page we could then put tips, tricks, hints, policies et cetera on book
descriptions. If the category doesn't exist you get a ... big surprise ...
edit link.
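
A minimal sketch of that behaviour, assuming the field value is trimmed and
lowercased and that the set of existing page titles is available. The function
name category_link and the "view"/"edit" labels are hypothetical.

```python
def category_link(field_value, existing_pages):
    """Map the contents of the "category" edit field to a link target.

    Returns (target, action), where action is "view" if the category page
    exists and "edit" if it does not, or None if the field is empty.
    """
    name = field_value.strip().lower()
    if not name:
        return None
    target = f"category:{name}"
    action = "view" if target in existing_pages else "edit"
    return target, action
```

Typing "book" when category:book already exists gives a normal link; typing a
category nobody has described yet gives an edit link, just as for any other
missing page.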

> But it might be time to revisit the idea, because at 2,500 edits
> a day and growing, it's just no longer possible for one person to
> keep track of edits to articles he's interested in,

That reminds me. Whatever happened to the code for grouping edits on the
recent changes page? I remember you made a test site and then ... nothing?

> and subject is the only filter that really makes sense for reducing that
> data to a manageable level. I'd still like to see if we couldn't build
> those subjects automatically in some way based on links in the database.

Ah, you mean a real subject hierarchy, like in ... er ... that other big
encyclopedia, so you could simply watch a subject area? I don't see how that
can be derived. Having an editable list of "subject areas" for articles
would be great. It could give us a nice subject tree that you could navigate
through. Actually I never liked having both a description of mathematics and
a list of subjects in the same article.

-- Jan Hidders
Re: Re: Re: Article meta information proposal
> That reminds me. Whatever happened to the code for grouping edits
> on the recent changes page? I remember you made a test site and
> then ... nothing?

I'm still not happy with my latest code, and I've been working
on other matters. But I'll get back to that shortly, because I
think I have a solution to the one thing I didn't like.