Mailing List Archive

Importing Polish Wikipedia
Generally it worked, but there were some problems:

1)
Conversion script tried to convert .db.orig files:

Now there are articles like "Propozycje Tematow.db.o"
with contents "There was an error converting this file : file not found."

2)
Article q{"Casablanca"} (WITH quotes; I don't know how to write
it down, let's say that q{} are stronger quotes ;)), currently
a redirect to q{Casablanca (film)}, was converted to q{\"Casablanca"}.
I wouldn't mind if it was just removed from the database.

3)
Some articles are weird, like:
Polimery
12c12 < [Średnia masa cząsteczkowa polimerów]? --- > Masa cząsteczkowa polimerów

That looks like a diff which was imported as an article.
I have no idea what could cause it.

Re: Importing Polish Wikipedia
If you're reading this, I hit "send" by mistake.

On mar, 2002-03-12 at 03:29, Tomasz Wegrzanowski wrote:
> Generally it worked, but there were some problems
>
> 1)
> Conversion script tried to convert .db.orig files:
>
> Now there are articles like "Propozycje Tematow.db.o"
> with contents "There was an error converting this file : file not found."

Should be harmless...
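
For what it's worth, skipping those leftovers could be as simple as only
accepting files whose final extension is exactly .db. A minimal sketch in
Python (the real conversion script is PHP; the helper name is_page_file and
the sample file names are made up for illustration):

```python
from pathlib import Path


def is_page_file(path: Path) -> bool:
    """Accept only files whose final extension is exactly .db.

    Hypothetical helper: "Foo.db.orig" has the suffix ".orig",
    so backup copies are rejected.
    """
    return path.suffix == ".db"


# Illustrative names only; they are not read from a real database directory.
names = ["Propozycje Tematow.db", "Propozycje Tematow.db.orig", "Polimery.db"]
page_files = [n for n in names if is_page_file(Path(n))]
print(page_files)
```

With a check like this the .db.orig copies would never reach the converter,
so no "file not found" placeholder articles would be created.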

> 2)
> Article q{"Casablanca"} (WITH quotes; I don't know how to write
> it down, let's say that q{} are stronger quotes ;)), currently
> a redirect to q{Casablanca (film)}, was converted to q{\"Casablanca"}.
> I wouldn't mind if it was just removed from the database.

!! Okay, that's just wrong. Hmm.

Actually, on my machine I don't even see this article converted. I'm not
sure what's going on there; I seem to have a lot of articles missing.
Possibly some weird PHP problem: as near as I can tell, it's simply not
reading all the files in every directory.

> 3)
> Some articles are weird, like:
> Polimery
> 12c12 < [Średnia masa cząsteczkowa polimerów]? --- > Masa cząsteczkowa polimerów
>
> That looks like a diff which was imported as an article.
> I have no idea what could cause it.

The usemod db format doesn't seem to be quite as consistent as the
conversion script wants... I'll look it over again in the morning.

> Major problem: the z-with-dot letter seems to be screwed up in links.
> Links from http://local_copy_of_wikipedia/wiki.phtml?title=Wy%C5%BCsze_uczelnie_w_Polsce
> to subpages are broken.
> They work on Polish UseMod Wikipedia, so it must be a problem with the conversion script.
> Maybe there are some mistakes in the latin2 -> utf8 table?

Ah, looks like parts of the links are accidentally converted from latin2
to utf-8 twice. (Thanks for catching that! It doesn't show up on the
Esperanto database I usually test with, which is already mostly UTF-8
and doesn't change on a second conversion.) I've moved the conversion to
before the link extraction; it seems better now.
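
To illustrate the failure mode, here is a minimal sketch in Python of what
happens when latin2 text is run through a latin2 -> UTF-8 step twice. This is
not the conversion script itself (which is PHP and has its own table); the
function name latin2_to_utf8 is hypothetical and it leans on Python's built-in
ISO-8859-2 codec:

```python
def latin2_to_utf8(raw: bytes) -> bytes:
    """Interpret raw bytes as ISO-8859-2 (latin2) and re-encode as UTF-8."""
    return raw.decode("iso-8859-2").encode("utf-8")


title = "Wyższe uczelnie w Polsce"
latin2_bytes = title.encode("iso-8859-2")  # how the old database stores it

once = latin2_to_utf8(latin2_bytes)  # correct: z-with-dot becomes C5 BC
twice = latin2_to_utf8(once)         # wrong: C5 and BC are converted again

print(once.decode("utf-8"))
print(twice.decode("utf-8"))

# Pure ASCII is unaffected by a second pass, so text without accented
# letters will not reveal the bug.
ascii_only = b"Polimery"
assert latin2_to_utf8(latin2_to_utf8(ascii_only)) == ascii_only
```

The page title went through the step once and the link text twice, so the two
no longer matched and the subpage links broke.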

-- brion vibber (brion @ pobox.com)

Re: Importing Polish Wikipedia
On mar, 2002-03-12 at 06:03, Brion L. VIBBER wrote:
> If you're reading this, I hit "send" by mistake.

Oops. :)

> On mar, 2002-03-12 at 03:29, Tomasz Wegrzanowski wrote:
> > 3)
> > Some articles are weird, like:
> > Polimery
> > 12c12 < [Średnia masa cząsteczkowa polimerów]? --- > Masa cząsteczkowa polimerów
> >
> > That looks like a diff which was imported as an article.
> > I have no idea what could cause it.
>
> The usemod db format doesn't seem to be quite as consistent as the
> conversion script wants... I'll look it over again in the morning.

Fixed now.

It should now also import old versions unchanged (without converting
subpage links) up to and including the current version, then add the
conversion.

Also, I added localisable strings $wikiSeeAlso, $wikiAutomatedConversion
and $wikiConversionScript.

-- brion vibber (brion @ pobox.com)