Mailing List Archive

Cache vs. other-language links
I noticed that the other-language links (links in the form [[fr:Japon]]
[[en:Japan]] [[eo:Japanio]] etc which are hidden in the article body but
listed by language name in the header bar, pointing to the article on
the current subject in the other-language wikis) are vanishing on cached
pages, because they're scanned and listed during the wiki->html link
parsing which of course doesn't occur when loading a cached page.

I've a hackish fix for that which explicitly seeks out the other-
language links for cached pages, but I don't like it very much. It's
inelegant, and two sets of code have to be maintained to do the same
thing in different contexts.

What I'd like to do is add a column to the cur table, something like
cur_links_languages which would be analogous to cur_links_linked and
cur_links_unlinked. The list of inter-language links for a page would be
stored when the page is saved, then easily loaded up again along with
the cache. This would also make it easy to provide statistics on the
degree of linkage between language wikis. (No change in current
user-visible behavior except in fixing the obvious bug of vanishing
links, and potentially providing more information in special:Statistics
etc.)
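
A minimal sketch of the save-time extraction, written in Python for brevity rather than in the wiki's PHP. The prefix set, the function names, and the one-link-per-line storage format are assumptions made for illustration; only the column name cur_links_languages comes from the proposal above.

```python
import re

# Hypothetical set of language prefixes; a real wiki would read this from its configuration.
LANGUAGE_PREFIXES = {"en", "fr", "eo", "es", "de"}

# Matches [[prefix:Title]] where prefix is two or three lowercase letters.
_LINK_RE = re.compile(r"\[\[([a-z]{2,3}):([^\]|]+)\]\]")


def extract_language_links(wikitext):
    """Return (prefix, title) pairs for other-language links, in order of appearance."""
    return [
        (prefix, title.strip())
        for prefix, title in _LINK_RE.findall(wikitext)
        if prefix in LANGUAGE_PREFIXES
    ]


def serialize_language_links(links):
    """Pack the links into one string that fits a single text column."""
    return "\n".join(f"{prefix}:{title}" for prefix, title in links)


def deserialize_language_links(stored):
    """Inverse of serialize_language_links, for use when serving a cached page."""
    return [tuple(line.split(":", 1)) for line in stored.splitlines() if line]
```

The serialized string would be written to the new column when the page is saved. When a cached page is served, it would be read back to rebuild the header bar, so the link scan lives in one place only.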

Alternatively, we might have a separate database which contains nothing
but lists of connected articles. This could facilitate keeping the
other-language links consistent; if somebody adds an article "Japón" to
the Spanish wikipedia, it shouldn't be necessary to separately add
[[es:Jap%f3n]] to the English, French, Esperanto, etc. articles. Keeping
a central repository would mean that it only needs to be linked in with
the others once, and all linked articles will immediately benefit by
being able to list it without manual editing. Upside: added simplicity
for article writers, who don't have to maintain as many links. Downside:
added complexity for site maintainers, who have to run a second database
or not get all the other-language links. Also might be more difficult to
remove incorrectly linked articles.
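
To make the central repository concrete, here is a small in-memory sketch in Python. The class and method names are hypothetical, and a real version would sit on a shared database rather than on dictionaries. It shows that linking a new article to any one member of a group makes it visible to all members, and that removing a wrongly linked article is one operation.

```python
class LinkRepository:
    """Groups of articles on the same subject across language wikis (illustrative only)."""

    def __init__(self):
        self._group_of = {}  # (lang, title) -> group id
        self._members = {}   # group id -> set of (lang, title)
        self._next_id = 0

    def _ensure(self, article):
        # Give a previously unseen article a group of its own.
        if article not in self._group_of:
            self._group_of[article] = self._next_id
            self._members[self._next_id] = {article}
            self._next_id += 1
        return self._group_of[article]

    def link(self, a, b):
        """Declare that articles a and b cover the same subject."""
        ga, gb = self._ensure(a), self._ensure(b)
        if ga == gb:
            return
        # Merge the smaller group into the larger one.
        if len(self._members[ga]) < len(self._members[gb]):
            ga, gb = gb, ga
        for article in self._members.pop(gb):
            self._group_of[article] = ga
            self._members[ga].add(article)

    def unlink(self, article):
        """Remove an incorrectly linked article from its group."""
        gid = self._group_of.pop(article, None)
        if gid is not None:
            self._members[gid].discard(article)
            if not self._members[gid]:
                del self._members[gid]

    def other_languages(self, article):
        """Return every other article in the same group, sorted."""
        gid = self._group_of.get(article)
        if gid is None:
            return []
        return sorted(m for m in self._members[gid] if m != article)
```

For example, after linking ("es", "Japón") to ("fr", "Japon") once, other_languages(("en", "Japan")) would list the Spanish article too, with no edit to the English page.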

An alternative to the separate link database might be a robot/automatic
process that occasionally looks through all the wikipedias checking for
consistency in the other-language links and automatically adding (or
alerting a human that one ought to add) new other-language links where
needed.
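
The core of such a robot could look roughly like the Python sketch below. It assumes the other-language links of every article have already been collected into a dictionary; the function name and data shape are invented for illustration. Links are treated as undirected, and for each group of connected articles the function reports which links each member still lacks.

```python
from collections import deque


def find_missing_links(declared):
    """declared maps (lang, title) -> set of (lang, title) that the article links to.

    Returns a dict mapping each article to the sorted list of links it should gain.
    """
    # Build an undirected graph: a link in either direction connects two articles.
    neighbours = {}
    for article, targets in declared.items():
        neighbours.setdefault(article, set())
        for target in targets:
            neighbours[article].add(target)
            neighbours.setdefault(target, set()).add(article)

    missing = {}
    seen = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects one group of connected articles.
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        # Every member should link to every other member of its group.
        for article in component:
            wanted = component - {article} - declared.get(article, set())
            if wanted:
                missing[article] = sorted(wanted)
    return missing
```

A group that ends up with two titles from the same language probably contains a wrong link, so a real robot would more safely alert a human in that case than add links automatically.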

So what do people think? Should we try one of these, or should I just
check in my hackish fix in the meantime?

-- brion vibber (brion @ pobox.com)
Re: Cache vs. other-language links
From: "Brion Vibber" <brion@pobox.com>
>
> I noticed that the other-language links (links in the form [[fr:Japon]]
> [[en:Japan]] [[eo:Japanio]] etc which are hidden in the article body but
> listed by language name in the header bar, pointing to the article on
> the current subject in the other-language wikis) are vanishing on cached
> pages, because they're scanned and listed during the wiki->html link
> parsing which of course doesn't occur when loading a cached page.

Can I suggest we simply stop with the whole caching thing? It complicates
things unnecessarily. Keeping the code simple should be one of our top
priorities. Jimbo doesn't have it turned on at the moment anyway, and
Wikipedia seems to be fine on non-generated pages. I also expect that we
can do a lot of optimization on generated pages comparable to Recent
Changes (which is also not cached at the moment), and there is a whole
bunch of very inefficient (especially in terms of memory use) programming
going on in the current parser.

> Alternatively, we might have a separate database which contains nothing
> but lists of connected articles. This could facilitate keeping the
> other-language links consistent; [...]

*sigh* It's a very nice idea, but I don't believe that phpwiki is really
out of the woods yet. First the current functionality has to be correct and
efficient, and the code has to be well organized and documented. Only then
can we start thinking about such fancy extensions of functionality.

-- Jan Hidders
Re: Cache vs. other-language links
From: "Magnus Manske" <Magnus.Manske@epost.de>
>
> The parser has to be brought up to speed. I'll also have a look into
> connecting the PHP script with the C++ parser I wrote (did I mention 0.05
> secs for rendering "Signal transduction", with fetching it from the
> database, searching the database for existing topics, and adding the
> "framework"?;)

As I said I'd rather keep it in PHP, but it's your project of course. Does
your parser put any requirements on the syntax? Should it be LL(1) or
LALR(1)? Are you going to use yacc, or is it just a simple recursive descent
parser?

One thing we could improve in PHP, for example, is to make the current
parser process the string paragraph by paragraph. (But please don't use the
function explode() for that because that is a memory killer.) Most replace
functions could be limited to only one paragraph, and the rest can be dealt
with by making the parser a little context-sensitive. Standard Wiki markup
is supposed to be limited to one paragraph anyway. The HTML markup is a bit
harder, but there you can remember the nesting depth and type of nesting,
and once you see that the tags are not balanced you go back in the string
and replace the < and > with entities. This will be expensive, but it is an
exception, so it won't hurt.
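
The approach described above can be sketched as follows, in Python rather than PHP. All names here are hypothetical, and the tag pattern is a deliberate simplification, not a full HTML tokenizer. Paragraphs are yielded one at a time instead of being split into a list up front, and a paragraph whose tags do not balance has its < and > replaced with entities.

```python
import re

# Simplified tag pattern: optional "/", a tag name, attributes, optional trailing "/".
_TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^<>]*?(/?)>")

# Tags that never take a closing tag (assumed subset, for illustration).
VOID_TAGS = {"br", "hr"}


def iter_paragraphs(text):
    """Yield paragraphs one by one, separated by blank lines, without building a list."""
    start = 0
    for match in re.finditer(r"\n\s*\n", text):
        if match.start() > start:
            yield text[start:match.start()]
        start = match.end()
    if start < len(text):
        yield text[start:]


def tags_balanced(paragraph):
    """Track the nesting depth and type of tags; report whether they all close properly."""
    stack = []
    for closing, name, self_closing in _TAG_RE.findall(paragraph):
        name = name.lower()
        if self_closing or name in VOID_TAGS:
            continue
        if closing:
            if not stack or stack.pop() != name:
                return False
        else:
            stack.append(name)
    return not stack


def render_paragraph(paragraph):
    """Keep balanced HTML as it is; otherwise go back and neutralize the angle brackets."""
    if tags_balanced(paragraph):
        return paragraph
    return paragraph.replace("<", "&lt;").replace(">", "&gt;")
```

Only the unbalanced paragraphs are scanned a second time, which matches the point that the expensive path is the exception.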

-- Jan Hidders
Re: Cache vs. other-language links
From: "Jimmy Wales" <jwales@bomis.com>


> Jan Hidders wrote:
> > Can I suggest we simply stop with the whole caching thing? It complicates
> > things unnecessarily. Keeping the code simple should be one of our top
> > priorities. Jimbo doesn't have it turned on at the moment anyway,
>
> I have no strong opinion about this, but I wanted to say that I
> thought I did have it turned on. If it's off, that's a mistake.

My mistake. I saw that the language links worked, but hadn't realized I was
looking at the page just after I had edited it. I see now that you do have it
on, because upon reloading the page the language links are missing, so
apparently I am getting the cached page.

> Tell you what, I'll benchmark with and without, on the live server,
> and report the numbers.

Yes, that would be very very welcome.

-- Jan Hidders