Mailing List Archive

Interlanguage links redesign proposal
I think the current handling of interlanguage links is problematic and
not very scalable. If we have n copies of an article, we need n*(n-1)
interlanguage links. For 10 languages, that would be 90 links! All of
these links have to be added to separate pages, by people speaking
different languages, who often don't even have an account on the
Wikipedia in question.
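
To make the arithmetic concrete, here is a tiny Python sketch. The
function name is made up for illustration; it only counts links under the
assumption that every language version links to every other one.

```python
def pairwise_links(n: int) -> int:
    """Links needed when each of n language versions links to all others."""
    return n * (n - 1)


for n in (2, 5, 10):
    print(n, "languages ->", pairwise_links(n), "links")
```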

As should be obvious, we are already missing interlanguage links for
many, if not most, of the translations we have.

The scalable solution requires us to have a meta-table for interlanguage
links that can be accessed by all Wikipedias. This table could look like
this:


language1  article1   language2  article2
------------------------------------------------------------
en         Main Page  de         Hauptseite
fr         Accueil    en         Main Page
fr         Accueil    es         Portada
...

Let's call it shared.ilinks for the moment.
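
A minimal sketch of such a table, using Python's sqlite3 module as a
stand-in for whatever shared database would really be used. The plain
table name "ilinks", the column types and the primary key are my
assumptions, not part of the proposal.

```python
import sqlite3


def create_ilinks(conn: sqlite3.Connection) -> None:
    # One row per known pair of translations; the key forbids exact duplicates.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ilinks (
            language1 TEXT NOT NULL,
            article1  TEXT NOT NULL,
            language2 TEXT NOT NULL,
            article2  TEXT NOT NULL,
            PRIMARY KEY (language1, article1, language2, article2)
        )
        """
    )


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
create_ilinks(conn)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)
row_count = conn.execute("SELECT COUNT(*) FROM ilinks").fetchone()[0]
```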

Instead of adding interlanguage links on top of articles, we would have
a separate text line below article bodies:

Interlanguage links (syntax: [[<code>:<article name>]])

The syntax would remain the same so that the link line can be cut and
pasted from the body. But this line would not be stored in that form in
the database.

Display of interlanguage links
------------------------------
Say I visit [[Main Page]] on en.wikipedia.org. Now, in order to show the
list of links, the shared.ilinks table is queried:

SELECT * FROM shared.ilinks
WHERE (language1 = 'en' AND article1 = 'Main Page')
   OR (language2 = 'en' AND article2 = 'Main Page')

That is, a single SELECT allows us to find all stored translations of
"Main Page". But do we really save much effort, given that we still
have to tell *every* Wikipedia what "Main Page" is called in each other
language? Yes, because we can now leave this to the code.
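
The lookup could be sketched like this in Python with sqlite3. This is
only an illustration under assumptions: the table is called "ilinks"
here, and the function name translations() is made up. Note that the
page we ask about can sit on either side of a row, so the code has to
pick the "other" side each time.

```python
import sqlite3


def translations(conn, lang, title):
    """Return the pages directly linked to (lang, title), from either column pair."""
    rows = conn.execute(
        "SELECT language1, article1, language2, article2 FROM ilinks "
        "WHERE (language1 = ? AND article1 = ?) "
        "   OR (language2 = ? AND article2 = ?)",
        (lang, title, lang, title),
    ).fetchall()
    here = (lang, title)
    found = set()
    for l1, a1, l2, a2 in rows:
        # Keep whichever side of the row is not the page we asked about.
        found.add((l2, a2) if (l1, a1) == here else (l1, a1))
    return sorted(found)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ilinks (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("fr", "Accueil", "es", "Portada"),
    ],
)
main_page_links = translations(conn, "en", "Main Page")
```

With the three sample rows, the Spanish page is not found by this single
query, because it is only linked to the French page.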

When a user edits a page, the same list of links is generated, but this
time in the wiki syntax ([[fr:Accueil]] [[de:Hauptseite]] and so on).
This can be edited by anyone. When the list has been edited, and the
page is saved, the following is done:

1)
The same SELECT as above is run:
SELECT * FROM shared.ilinks
WHERE (language1 = 'en' AND article1 = 'Main Page')
   OR (language2 = 'en' AND article2 = 'Main Page')

2)
Now, for each translation we get, another similar SELECT is run, so that
we find further translations into other languages.

3)
Every new translation we discover is stored in a new English (in our
example)/<new translation> table row, so that we can do the quick,
one-time SELECT to display the interlanguage links.

The result: If we have a page in 10 translations, the minimum effort we
have to go to is to add exactly one translation on every language
Wikipedia. That is, a minimum of 9 as opposed to 90 links! The other
translations are automatically discovered.
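
Steps 1) to 3) can be sketched in plain Python, with a set of pairs
standing in for the table. All names are made up for illustration. One
assumption to flag: the sketch keeps following links until nothing new
turns up, which goes a little further than the single extra round of
SELECTs described in step 2.

```python
from collections import deque

Page = tuple[str, str]  # (language code, article title)


def neighbours(links: set[tuple[Page, Page]], page: Page) -> set[Page]:
    """Pages stored in the same row as `page`, on either side."""
    out = set()
    for a, b in links:
        if a == page:
            out.add(b)
        elif b == page:
            out.add(a)
    return out


def save_page(links: set[tuple[Page, Page]], page: Page) -> set[tuple[Page, Page]]:
    """Discover indirect translations of `page` and store a direct row for each."""
    seen = {page}
    queue = deque([page])
    while queue:
        current = queue.popleft()
        for other in neighbours(links, current):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    added = set()
    for other in seen - {page}:
        # Only store pairs that are not already present in either direction.
        if (page, other) not in links and (other, page) not in links:
            links.add((page, other))
            added.add((page, other))
    return added


links = {
    (("en", "Main Page"), ("de", "Hauptseite")),
    (("fr", "Accueil"), ("en", "Main Page")),
    (("fr", "Accueil"), ("es", "Portada")),
}
added = save_page(links, ("en", "Main Page"))
```

Here the only new row is English/Spanish, discovered via the French page.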

Example:
Someone creates a new page about Phil Collins on fr.wikipedia.org. This
person knows that there's already an English page about him on
en.wikipedia.org, so they type [[en:]] (suggested short syntax for "same
name as here"). "fr:Phil Collins->en:Phil Collins" is inserted into the
shared.ilinks table. This already means that the link is also shown on
en.wikipedia.org. But it gets better: Now someone on de.wikipedia.org
creates a Phil Collins page as well. He links to en.wikipedia.org's
[[en:]] entry. Zap!, after saving the entry, the French translation is
automatically discovered. Now the French translation has a link to the
German page and vice versa as well.
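
Reading the link line, including the suggested [[en:]] short form, could
look roughly like this. The regular expression and the function name are
my own assumptions; in particular I assume language codes consist of
lowercase letters and hyphens, and other special forms are not handled.

```python
import re

# [[code:title]], where the title may be empty (short form "same name as here").
LINK_RE = re.compile(r"\[\[([a-z][a-z-]*):([^\[\]]*)\]\]")


def parse_link_line(line: str, current_title: str) -> list[tuple[str, str]]:
    """Turn a line of interlanguage links into (language, title) pairs."""
    links = []
    for code, title in LINK_RE.findall(line):
        title = title.strip()
        # An empty title means the target has the same name as the current page.
        links.append((code, title or current_title))
    return links


parsed = parse_link_line("[[en:]] [[de:Phil Collins]]", "Phil Collins")
```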

Editing links
-------------
What happens if the folks on fr.wikipedia.org move one of their pages?
The "Move this page" command now needs to automatically change every
instance of the page to something else (e.g. Accueil->Homepage) in the
shared.ilinks table.

What happens if someone on en.wikipedia.org decides that they do not
want to link to a page on nl.wikipedia.org because it contains obsolete
information, or because of "link-vandalism"? To unilaterally remove a
link to one translation, there would have to be a special interlanguage
link, like [[nl::]]. When saved, the link would be cleared and not
"rediscovered" until someone removed the [[nl::]] link. Such empty links
would not be copied.
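
A sketch of how the [[nl::]] marker could be honoured when the link list
is built. The helper names are hypothetical, and I assume the marker is
kept with the local page only, so it never ends up in the shared table.

```python
import re

# Matches the suggested "do not link" marker, e.g. [[nl::]].
BLOCK_RE = re.compile(r"\[\[([a-z][a-z-]*)::\]\]")


def blocked_languages(link_line: str) -> set[str]:
    """Languages the editors of this page explicitly do not want to link to."""
    return set(BLOCK_RE.findall(link_line))


def visible_links(discovered, link_line):
    """Drop discovered links whose language is blocked on this page."""
    blocked = blocked_languages(link_line)
    return [(lang, title) for lang, title in discovered if lang not in blocked]


shown = visible_links(
    [("nl", "Hoofdpagina"), ("de", "Hauptseite")],
    "[[de:Hauptseite]] [[nl::]]",
)
```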

If [[nl:Hoofdpagina]] is deleted, all instances of it in the
shared.ilinks table are removed as well.
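
Moving and deleting pages, as described above, might translate into
something like the following. Again a sketch with sqlite3 and a table
simply called "ilinks"; the function names are made up, and both column
pairs have to be touched because a page can appear on either side.

```python
import sqlite3


def move_page(conn, lang, old_title, new_title):
    """Rename a page wherever it occurs in the link table."""
    with conn:  # commit both updates together
        conn.execute(
            "UPDATE ilinks SET article1 = ? WHERE language1 = ? AND article1 = ?",
            (new_title, lang, old_title),
        )
        conn.execute(
            "UPDATE ilinks SET article2 = ? WHERE language2 = ? AND article2 = ?",
            (new_title, lang, old_title),
        )


def delete_page(conn, lang, title):
    """Remove every row that mentions the deleted page."""
    with conn:
        conn.execute(
            "DELETE FROM ilinks "
            "WHERE (language1 = ? AND article1 = ?) "
            "   OR (language2 = ? AND article2 = ?)",
            (lang, title, lang, title),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ilinks (language1 TEXT, article1 TEXT, language2 TEXT, article2 TEXT)"
)
conn.executemany(
    "INSERT INTO ilinks VALUES (?, ?, ?, ?)",
    [
        ("en", "Main Page", "de", "Hauptseite"),
        ("fr", "Accueil", "en", "Main Page"),
        ("en", "Main Page", "nl", "Hoofdpagina"),
    ],
)
move_page(conn, "fr", "Accueil", "Homepage")
delete_page(conn, "nl", "Hoofdpagina")
rows = sorted(conn.execute("SELECT * FROM ilinks").fetchall())
```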

What about links where there is no 1:1 relationship? Say I have pages
about "evolution" and "theory of evolution" on one wiki (English) and
only a page about "evolution" on another (French). So I add the
following to en.wikipedia.org on both pages:

[[fr:Théorie de l'évolution]]

In the shared.ilinks table, I therefore get entries:

Evolution            Théorie de l'évolution
Theory of Evolution  Théorie de l'évolution

When I visit the "Evolution" page, I get a clear match: Théorie de
l'évolution. But when I visit "Théorie de l'évolution", I get two
matches. In this case, we could actually show both links on the French
page:

English: [1],[2]

Or in edit mode:

[[en:Evolution]][[en:Theory of Evolution]]

It may not be desirable to autocopy these duplicate links. So, if we
cannot discover an exact match, we may want to wait until someone
specifies a precise translation.
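
One way to tell exact matches from ambiguous ones is to group the
matches by language and only treat a language as "exact" if it has a
single candidate. A rough sketch, with made-up function names:

```python
from collections import defaultdict


def group_by_language(matches):
    """Collect the matched titles per language code."""
    grouped = defaultdict(list)
    for lang, title in matches:
        grouped[lang].append(title)
    return dict(grouped)


def split_exact_and_ambiguous(matches):
    """Exact: one candidate in a language. Ambiguous: several, not autocopied."""
    exact, ambiguous = {}, {}
    for lang, titles in group_by_language(matches).items():
        if len(titles) == 1:
            exact[lang] = titles[0]
        else:
            ambiguous[lang] = sorted(titles)
    return exact, ambiguous


# Seen from the French page: two English candidates.
french_exact, french_ambiguous = split_exact_and_ambiguous(
    [("en", "Evolution"), ("en", "Theory of Evolution")]
)
# Seen from the English "Evolution" page: one French candidate.
english_exact, english_ambiguous = split_exact_and_ambiguous(
    [("fr", "Théorie de l'évolution")]
)
```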

Discussion
----------
The process described above is complex from a technical perspective,
because it has to be respected during all changes to articles (move,
delete, edit etc.). It also requires us to run a separate database server
specifically for this shared information. There may be scenarios that I
have not yet covered in the above proposal, although I am sure solutions
can be found for every problem.

There are numerous advantages to this approach. Compared with the
current handling, we should quickly get an accurate representation of
interlanguage links on all wikis. We do not have to pick a single
language as "key" language, which would require a key entry in that
language to exist for all pages. [1]

There may be simpler solutions that I cannot see - if so, I would love
to hear about them. But I really think we should consider redesigning
the interlanguage links before the problem grows out of control.

Regards,

Erik

[1] Although that would expose us to charges of anglocentrism, I am open
to discussing this alternative.
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
Re: Interlanguage links redesign proposal
> I think the current handling of interlanguage links is problematic and
> not very scalable. If we have n copies of an article, we need n*(n-1)
> interlanguage links. For 10 languages, that would be 90 links! All of
> these links have to be added to separate pages, by people speaking
> different languages, who often don't even have an account on the
> Wikipedia in question.

It's not as bad as you make it appear here; those go to 10, not 90 different
pages. Still, the person who added the tenth language would, if (s)he did it
the 'proper' way, have to add a total of 18 links on a total of 10 different
pages. Not a desirable situation.

> As should be obvious, we are already missing interlanguage links for
> many, if not most, of the translations we have.

There's certainly already a wealth of missing back- and through-links
available, as well as a number of interlanguage links to non-existing pages.

> The scalable solution requires us to have a meta-table for interlanguage
> links that can be accessed by all Wikipedias. This table could look like
> this:
>
>
> language1  article1   language2  article2
> ------------------------------------------------------------
> en         Main Page  de         Hauptseite
> fr         Accueil    en         Main Page
> fr         Accueil    es         Portada
> ...

My preference would be to have a different type of metatable, namely one
where each subject gets an indication (an English name, another name, or
just a number), and the articles are then stored by these (the 'key language
approach' as you call it). Your above case would then look like this:

1 en Main Page
1 de Hauptseite
1 fr Accueil
1 es Portada

or even:
1 [[en:Main Page]][[de:Hauptseite]][[fr:Accueil]][[es:Portada]]

The reason I prefer this is for what you mention below:

> What about links where there is no 1:1 relationship? Say I have a page
> about "evolution" and "theory of evolution" on one wiki (English) and
> only a page about "evolution" on another (French). So I add the
> following to en.wikipedia.org on both pages:
>
> [[fr:Théorie de l'évolution]]
>
> In the shared.ilinks table, I therefore get entries:
>
> Evolution            Théorie de l'évolution
> Theory of Evolution  Théorie de l'évolution
>
> When I visit the "Evolution" page, I get a clear match: Théorie de
> l'évolution. But when I visit the "Théorie de l'évolution", I get two
> matches. In this case, we could actually show both links on the French
> page:
>
> English: [1],[2]
>
> Or in edit mode:
>
> [[en:Evolution]][[en:Theory of Evolution]]
>
> It may not be desirable to autocopy these duplicate links. So, if we
> cannot discover an exact match, we may want to wait until someone
> specifies a precise translation.

In the alternative method, it will be possible to have one page connected
to multiple 'interlanguage rings'. For example, some Wikipedias have a page
'astronomy and astrophysics', while others have a page 'astronomy' and a
page 'astrophysics'. In my proposed way of working, the 'astronomy and
astrophysics' pages could be linked to both rings, so that they are linked
(in both directions) to 'astronomy' and 'astrophysics' pages without
'astronomy' pages being linked to 'astrophysics' pages.
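
To illustrate, here is a small Python sketch of rings as sets of pages.
The language codes "x1" and "x2" are placeholders for two unspecified
Wikipedias, and the function name is made up. A page is linked to every
other member of every ring it belongs to, and to nothing else.

```python
Page = tuple[str, str]  # (language code, article title)


def linked_pages(rings: dict[int, set[Page]], page: Page) -> set[Page]:
    """All pages sharing at least one ring with `page`, excluding the page itself."""
    result: set[Page] = set()
    for members in rings.values():
        if page in members:
            result |= members
    result.discard(page)
    return result


combined = ("x2", "astronomy and astrophysics")
rings = {
    1: {("x1", "astronomy"), combined},
    2: {("x1", "astrophysics"), combined},
}
from_combined = linked_pages(rings, combined)
from_astronomy = linked_pages(rings, ("x1", "astronomy"))
```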

> There may be scenarios that I have not yet covered in the above proposal,
> although I am sure solutions can be found for every problem.

The main problem I have found is the one you try to solve with the [[nl::]]
links: when a page is linked in a way it should not be, removing all
inappropriate links will nevertheless bring the links back, because the
appropriately linked pages will also be linked.

> There are numerous advantages to this approach. Compared with the
> current handling, we should quickly get an accurate representation of
> interlanguage links on all wikis. We do not have to pick a single
> language as "key" language, which would require a key entry in that
> language to exist for all pages. [1]

My preference would be to have a 'neutral' key language, for example simply
a numbering of all link groups that exist. The disadvantage of that method is
that inappropriate links can still come into existence (something one could
hope to avoid by using a real key language, which however increases the risk
of several 'rings' coming into existence around the same subject). They would,
however, be easier to repair than in your proposal.

My proposal would look something like this:

A table of terms with their pages in various languages, as described above.
There are no interlanguage links in the box below the page, instead there is
a table number or list of table numbers. However, there is an option for
users to specify that a page should be connected to a specific page in another
language. If that is done, the following can be the case:

1. Neither page is language-connected yet. In that case a new ring would
   be formed with the two pages in it.
2. Both pages are already in a ring. Then the user gets a new screen, where
   the two rings are given, with the pages in the two rings. The user has
   the following options:
   * Melt the rings together into one ring
   * Add the first page to the second ring
   * Add the second page to the first ring

   In the latter two cases, this will cause the added page to become part of
   two rings.

3. One page is in more than one ring. Then the user again gets to see all
   the rings with the pages in them; he can select one ring, causing the
   other page to be added to that ring, or two rings, causing those two
   rings to be melted together.
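
The ring operations behind these choices could be sketched as follows.
This covers only the data side, not the screens; all function names are
made up, rings are numbered with plain integers, and the language codes
"x1" to "x3" are placeholders.

```python
Page = tuple[str, str]  # (language code, article title)


def rings_of(rings: dict[int, set[Page]], page: Page) -> list[int]:
    """Numbers of all rings that contain `page`."""
    return sorted(rid for rid, members in rings.items() if page in members)


def new_ring(rings: dict[int, set[Page]], a: Page, b: Page) -> int:
    """Case 1: neither page is connected yet, so start a fresh ring."""
    ring_id = max(rings, default=0) + 1
    rings[ring_id] = {a, b}
    return ring_id


def merge_rings(rings: dict[int, set[Page]], keep: int, absorb: int) -> None:
    """'Melt' two rings together; the absorbed ring disappears."""
    rings[keep] |= rings.pop(absorb)


def add_to_ring(rings: dict[int, set[Page]], page: Page, ring_id: int) -> None:
    """Add a page to a ring; it may end up in several rings."""
    rings[ring_id].add(page)


rings: dict[int, set[Page]] = {}
first = new_ring(rings, ("x1", "astronomy"), ("x2", "astronomy"))
second = new_ring(rings, ("x1", "astrophysics"), ("x2", "astrophysics"))
combined = ("x3", "astronomy and astrophysics")
add_to_ring(rings, combined, first)
add_to_ring(rings, combined, second)
combined_rings = rings_of(rings, combined)
```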

When a page already has interlanguage links, a user has a third option to
change its international links (apart from adding as specified above and
changing the ring memberships by hand), namely specifying that a link should
not exist. He then gets to see all pages (in the various languages) that
are part of the ring, and creates two rings, ring A and ring B, from them.
For each page in the ring, he specifies whether it should belong to ring A,
ring B or both.
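
Splitting a ring in this way might look like the sketch below. The
function name and the "A"/"B"/"both" labels are my own choice, and the
language codes are placeholders.

```python
Page = tuple[str, str]  # (language code, article title)


def split_ring(members: set[Page], assignment: dict[Page, str]):
    """Split one ring into ring A and ring B; 'both' puts a page in each."""
    ring_a: set[Page] = set()
    ring_b: set[Page] = set()
    for page in members:
        choice = assignment[page]
        if choice not in ("A", "B", "both"):
            raise ValueError(f"unknown choice for {page}: {choice!r}")
        if choice in ("A", "both"):
            ring_a.add(page)
        if choice in ("B", "both"):
            ring_b.add(page)
    return ring_a, ring_b


combined = ("x2", "astronomy and astrophysics")
wrong_ring = {("x1", "astronomy"), ("x1", "astrophysics"), combined}
ring_a, ring_b = split_ring(
    wrong_ring,
    {
        ("x1", "astronomy"): "A",
        ("x1", "astrophysics"): "B",
        combined: "both",
    },
)
```

After the split, 'astronomy' and 'astrophysics' are no longer linked to
each other, but both are still linked to the combined page.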

The advantage of my system (compared to yours) is that it makes the handling
of multiple-subject pages and the correction of unwanted links easier. The
disadvantages are that it is probably even harder to implement and that it
is less wiki-like, using form-like entities rather than markup language.

Andre Engels
Re: Interlanguage links redesign proposal
I proposed enhanced interlanguage link manipulation/storage several
times before (check the archive;-), and gave that some thought.

I propose a new, independent *database*, instead of just a table. We'll
have to access another database from all but one wikis, anyway (we can't
store a complete consistent copy of that table in all databases, now can
we?).

That database could also hold other information, mainly a central user
database, so multi-language users won't have to create new user accounts
in every language.

Later, it could also be a place for the translations that are currently
in the LanguageXX.php files.

Magnus
Re: Interlanguage links redesign proposal
On Tue, Jan 07, 2003 at 07:44:55PM +0100, Magnus Manske wrote:
> Later, it could also be a place for the translations that are currently
> in the LanguageXX.php files.

Please, don't put things that are static into the database.
It is slow enough now.
Re: Interlanguage links redesign proposal
Tomasz Wegrzanowski wrote:

> On Tue, Jan 07, 2003 at 07:44:55PM +0100, Magnus Manske wrote:
>> Later, it could also be a place for the translations that are currently
>> in the LanguageXX.php files.
>
> Please, don't put things that are static into the database.
> It is slow enough now.

I thought of an online interface for translators to add/change items,
then an "update" button that creates a new LanguageXX.php file for the
edited language.

Magnus
Re: Interlanguage links redesign proposal
On Tue, 2003-01-07 at 19:44, Magnus Manske wrote:
> I proposed enhanced interlanguage link manipulation/storage several
> times before (check the archive;-), and gave that some thought.
>
> I propose a new, independent *database*, instead of just a table. We'll
> have to access another database from all but one wikis, anyway (we can't
> store a complete consistent copy of that table in all databases, now can
> we?).
>
> That database could also hold other information, mainly a central user
> database, so multi-language users won't have to create new user accounts
> in every language.

Magnus,

of course the table I suggested would have to reside in some shared
database. This DB could then later also hold user data and other
information we want shared.

I would, however, want this approach to be limited to allow for
Wikipedias that reside on foreign servers. If the central shared
database fails, these Wikipedias should continue to function.
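
One possible way to get there, purely as a sketch: each wiki keeps the
last links it saw and falls back to them when the shared database cannot
be reached. The local cache is my assumption, not something decided
above, and shared_lookup stands for whatever function would query the
shared database.

```python
def links_with_fallback(page, shared_lookup, local_cache):
    """Ask the shared database; if it is unreachable, use the last known links."""
    try:
        links = shared_lookup(page)
    except ConnectionError:
        # Shared database is down: the wiki keeps working with stale links.
        return local_cache.get(page, [])
    local_cache[page] = links
    return links


def working_lookup(page):
    # Stand-in for a successful query against the shared database.
    return [("de", "Hauptseite")]


def broken_lookup(page):
    # Stand-in for a shared database that cannot be reached.
    raise ConnectionError("shared database unreachable")


cache = {}
page = ("en", "Main Page")
first = links_with_fallback(page, working_lookup, cache)
second = links_with_fallback(page, broken_lookup, cache)
```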

I would also like to suggest that we concentrate on fixing one problem
at a time. If we try to do too many different things at once, we may not
get anything done right in the end. The interlanguage links are an
important problem to start working on.

Regards,

Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de