I think the current handling of interlanguage links is problematic and
not very scalable. If we have n copies of an article, we need n*(n-1)
interlanguage links. For 10 languages, that would be 90 links! All of
these links have to be added to separate pages, by people speaking
different languages, who often don't even have an account on the
Wikipedia in question.
As should be obvious, we are already missing interlanguage links for
many, if not most, of the translations we have.
The scalable solution requires us to have a meta-table for interlanguage
links that can be accessed by all Wikipedias. This table could look like
this:
language1  article1   language2  article2
------------------------------------------------------------
en         Main Page  de         Hauptseite
fr         Accueil    en         Main Page
fr         Accueil    es         Portada
...
Let's call it shared.ilinks for the moment.
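A minimal sketch of such a table, assuming SQLite purely for illustration (the proposal does not name a database engine). The column types, the UNIQUE constraint and the in-memory database attached under the name "shared" are assumptions, not part of the proposal:

```python
import sqlite3

# Hypothetical setup: an in-memory SQLite database attached as "shared",
# so that the table can be addressed as shared.ilinks.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS shared")
conn.execute(
    """
    CREATE TABLE shared.ilinks (
        language1 TEXT NOT NULL,
        article1  TEXT NOT NULL,
        language2 TEXT NOT NULL,
        article2  TEXT NOT NULL,
        UNIQUE (language1, article1, language2, article2)
    )
    """
)

# The three example rows from the table above.
rows = [
    ("en", "Main Page", "de", "Hauptseite"),
    ("fr", "Accueil", "en", "Main Page"),
    ("fr", "Accueil", "es", "Portada"),
]
conn.executemany("INSERT INTO shared.ilinks VALUES (?, ?, ?, ?)", rows)

stored = conn.execute("SELECT COUNT(*) FROM shared.ilinks").fetchone()[0]
print(stored)
```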
Instead of adding interlanguage links on top of articles, we would have
a separate text line below article bodies:
Interlanguage links (syntax: [[<code>:<article name>]])
The syntax would remain the same so that the link line can be cut and
pasted from the body. But this line would not be stored in that form in
the database.
Display of interlanguage links
------------------------------
Say I visit [[Main Page]] on en.wikipedia.org. Now, in order to show the
list of links, the shared.ilinks table is queried:
    SELECT * FROM shared.ilinks
     WHERE (language1 = 'en' AND article1 = 'Main Page')
        OR (language2 = 'en' AND article2 = 'Main Page')
That is, a single SELECT finds all known translations of the page
"Main Page". But don't we save relatively little time, as we still
have to tell *every* Wikipedia that its homepage corresponds to "Main
Page" in English? No, because we can now leave this to the code.
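The display lookup could be sketched as follows. This is an illustration under assumptions: SQLite stands in for the real database, the helper name translations() is hypothetical, and the query is written as a UNION (a variant of the single SELECT above) so that the "other side" of each row always comes back in the same two columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS shared")
conn.execute(
    "CREATE TABLE shared.ilinks "
    "(language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO shared.ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)


def translations(conn, lang, title):
    """Return the (language, article) pairs linked to a page in either column."""
    cur = conn.execute(
        """
        SELECT language2, article2 FROM shared.ilinks
         WHERE language1 = ? AND article1 = ?
        UNION
        SELECT language1, article1 FROM shared.ilinks
         WHERE language2 = ? AND article2 = ?
        """,
        (lang, title, lang, title),
    )
    return sorted(cur.fetchall())


links = translations(conn, "en", "Main Page")
print(links)
```

Bound parameters (the ? placeholders) also avoid quoting problems with article titles that contain apostrophes.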
When a user edits a page, the same list of links is generated, but this
time in the wiki syntax ([[fr:Accueil]] [[de:Hauptseite]] and so on).
This can be edited by anyone. When the list has been edited, and the
page is saved, the following is done:
1) The same SELECT as above is run:
    SELECT * FROM shared.ilinks
     WHERE (language1 = 'en' AND article1 = 'Main Page')
        OR (language2 = 'en' AND article2 = 'Main Page')
2) For each translation returned, a similar SELECT is run, so that we
find further translations into other languages.
3) Every newly discovered translation is stored as a new row pairing the
English page (in our example) with that translation, so that a single
quick SELECT is enough to display the interlanguage links.
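Steps 1 to 3 can be sketched in plain Python, with a set of page pairs standing in for shared.ilinks. The function names linked() and discover() are hypothetical, and the sketch follows links one hop beyond the direct translations, as the steps describe; whether the real code should repeat this until nothing new turns up is left open:

```python
def linked(rows, page):
    """Pages directly linked to `page`, looking at both sides of each row."""
    found = set()
    for left, right in rows:
        if left == page:
            found.add(right)
        elif right == page:
            found.add(left)
    return found


def discover(rows, page):
    """Return the new rows that saving `page` would add (steps 1 to 3)."""
    direct = linked(rows, page)                    # step 1
    new_rows = set()
    for translation in direct:                     # step 2
        for further in linked(rows, translation):
            # Skip the page itself and links we already know about.
            if further == page or further in direct:
                continue
            new_rows.add((page, further))          # step 3
    return new_rows


# Each page is a (language, article) pair; rows mirror the example table.
rows = {
    (("en", "Main Page"), ("de", "Hauptseite")),
    (("fr", "Accueil"), ("en", "Main Page")),
    (("fr", "Accueil"), ("es", "Portada")),
}
added = discover(rows, ("en", "Main Page"))
rows |= added
print(added)
```

Here the Spanish page is found via the French one, so one new English/Spanish row is stored.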
The result: If we have a page in 10 languages, the minimum effort is
for each new language version to link to one existing version. That
is a minimum of 9 links as opposed to 90! The other translations are
automatically discovered.
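The arithmetic behind these numbers, as a small sketch (the function names are only illustrative): with manual linking, each of n language versions links to the other n-1; with discovery, each version after the first needs just one link to an existing version.

```python
def links_manual(n):
    """Every one of n language versions links to each of the other n - 1."""
    return n * (n - 1)


def links_minimum(n):
    """Each version after the first adds one link to an existing version."""
    return n - 1


for n in (2, 10, 50):
    print(n, links_manual(n), links_minimum(n))
```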
Example:
Someone creates a new page about Phil Collins on fr.wikipedia.org. This
person knows that there's already an English page about him on
en.wikipedia.org, so they type [[en:]] (suggested short syntax for "same
name as here"). "fr:Phil Collins->en:Phil Collins" is inserted into the
shared.ilinks table. This already means that the link is also shown on
en.wikipedia.org. But it gets better: Now someone on de.wikipedia.org
creates a Phil Collins page as well and adds an [[en:]] link to the
English page. Zap! After saving the entry, the French translation is
automatically discovered. Now the French translation has a link to the
German page and vice versa as well.
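Parsing the link line, including the suggested [[en:]] short form, might look like this. It is a sketch under assumptions: the regular expression, the function name parse_link_line() and the restriction to titles without colons are simplifications of mine, not part of the proposal:

```python
import re

# Matches [[code:title]]; the title may be empty (the suggested short form).
# Simplification: titles containing a colon are not handled here.
LINK_RE = re.compile(r"\[\[([a-z-]+):([^\]:]*)\]\]")


def parse_link_line(line, current_title):
    """Return (language, title) pairs; an empty title means "same name as here"."""
    links = []
    for code, title in LINK_RE.findall(line):
        links.append((code, title.strip() or current_title))
    return links


# Link line as it might be typed on the French "Phil Collins" page.
parsed = parse_link_line("[[en:]] [[de:Phil Collins]]", "Phil Collins")
print(parsed)
```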
Editing links
-------------
What happens if the folks on fr.wikipedia.org move one of their pages?
The "Move this page" command now needs to automatically rename every
instance of the page (e.g. Accueil->Homepage) in both columns of the
shared.ilinks table.
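A sketch of what the move hook would have to do, again assuming SQLite for illustration and a hypothetical helper move_page(). Both columns are updated in one transaction so that a half-renamed page is never visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS shared")
conn.execute(
    "CREATE TABLE shared.ilinks "
    "(language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO shared.ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)


def move_page(conn, lang, old_title, new_title):
    """Rename a page wherever it appears, on either side of a row."""
    with conn:  # commits on success, rolls back if an UPDATE fails
        conn.execute(
            "UPDATE shared.ilinks SET article1 = ? "
            "WHERE language1 = ? AND article1 = ?",
            (new_title, lang, old_title),
        )
        conn.execute(
            "UPDATE shared.ilinks SET article2 = ? "
            "WHERE language2 = ? AND article2 = ?",
            (new_title, lang, old_title),
        )


move_page(conn, "fr", "Accueil", "Homepage")
remaining = conn.execute(
    "SELECT COUNT(*) FROM shared.ilinks "
    "WHERE article1 = 'Accueil' OR article2 = 'Accueil'"
).fetchone()[0]
renamed = conn.execute(
    "SELECT COUNT(*) FROM shared.ilinks "
    "WHERE language1 = 'fr' AND article1 = 'Homepage'"
).fetchone()[0]
print(remaining, renamed)
```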
What happens if someone on en.wikipedia.org decides that they do not
want to link to a page on nl.wikipedia.org because it contains obsolete
information, or because of "link-vandalism"? To unilaterally remove a
link to one translation, there would have to be a special interlanguage
link, like [[nl::]]. When saved, the link would be cleared and not
"rediscovered" until someone removed the [[nl::]] link. Such empty links
would not be copied.
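One way the [[nl::]] marker could be honoured during discovery is sketched below. The names are hypothetical, and the sketch assumes the blocked language codes for a page are available from its link line; how they would be stored is not settled in this proposal:

```python
import re

# Proposed suppression form: [[nl::]] blocks links to that language.
BLOCK_RE = re.compile(r"\[\[([a-z-]+)::\]\]")


def blocked_languages(link_line):
    """Language codes that the page has explicitly opted out of."""
    return set(BLOCK_RE.findall(link_line))


def filter_discovered(candidates, link_line):
    """Drop discovered (language, title) pairs whose language is blocked."""
    blocked = blocked_languages(link_line)
    return [(lang, title) for lang, title in candidates if lang not in blocked]


candidates = [("fr", "Accueil"), ("nl", "Hoofdpagina"), ("de", "Hauptseite")]
kept = filter_discovered(candidates, "[[fr:Accueil]] [[nl::]]")
print(kept)
```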
If [[nl:Hoofdpagina]] is deleted, all instances of it in the
shared.ilinks table are removed as well.
What about links where there is no 1:1 relationship? Say one wiki
(English) has separate pages about "evolution" and "theory of
evolution", and another (French) has only a single page on the subject.
So I add the following to both pages on en.wikipedia.org:
[[fr:Théorie de l'évolution]]
In the shared.ilinks table, I therefore get entries:
en  Evolution            fr  Théorie de l'évolution
en  Theory of Evolution  fr  Théorie de l'évolution
When I visit the "Evolution" page, I get a clear match: Théorie de
l'évolution. But when I visit "Théorie de l'évolution", I get two
matches. In this case, we could show both links on the French page:
English: [1],[2]
Or in edit mode:
[[en:Evolution]][[en:Theory of Evolution]]
It may not be desirable to autocopy these duplicate links. So, if we
cannot discover an exact match, we may want to wait until someone
specifies a precise translation.
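The "wait for an exact match" rule could be sketched like this: group the matches by language and autocopy only where a language has exactly one candidate. The function names and the German title are hypothetical, used only to show an unambiguous case next to the ambiguous English one:

```python
from collections import defaultdict


def group_by_language(matches):
    """Map each language code to the list of candidate titles found for it."""
    grouped = defaultdict(list)
    for lang, title in matches:
        grouped[lang].append(title)
    return dict(grouped)


def safe_to_autocopy(matches):
    """Keep only languages with exactly one candidate translation."""
    return {
        lang: titles[0]
        for lang, titles in group_by_language(matches).items()
        if len(titles) == 1
    }


# Matches found for fr:"Théorie de l'évolution"; the de title is made up.
matches = [
    ("en", "Evolution"),
    ("en", "Theory of Evolution"),
    ("de", "Evolutionstheorie"),
]
grouped = group_by_language(matches)
safe = safe_to_autocopy(matches)
print(grouped)
print(safe)
```

The two English candidates would still be displayed side by side, but not copied to other wikis until someone picks one.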
Discussion
----------
The process described above is complex from a technical perspective,
because it has to be respected during all changes to articles (move,
delete, edit, etc.). It also requires us to run a separate database server
specifically for this shared information. There may be scenarios that I
have not yet covered in the above proposal, although I am sure solutions
can be found for every problem.
There are numerous advantages to this approach. Compared with the
current handling, we should quickly get an accurate representation of
interlanguage links on all wikis. We do not have to pick a single
language as "key" language, which would require a key entry in that
language to exist for all pages. [1]
There may be simpler solutions that I cannot see - if so, I would love
to hear about them. But I really think we should consider redesigning
the interlanguage links before the problem grows out of control.
Regards,
Erik
[1] Although that would expose us to charges of anglocentrism, I am open
to discussing this alternative.
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de