Mailing List Archive

Creating a Plucker version (was: Static html dump)
> On Tuesday 20 May 2003 03:53, Alfio Puglisi wrote:
> > I just subscribed (I'm the Wikipedia user At18) to ask about the
> > automatic HTML dump function. ...
>
> > If anyone is interested, I have a rudimentary Perl script that can
> > read the downloadable SQL dump and output all the articles as
> > separate files in a number of alphabetical directories.
> > It's not very fast, but it works.

That's great!

> > What's missing from the script: wikimarkup -> HTML conversion,

You should be able to call the existing PHP code that generates
HTML to do this.
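
For instance, the Perl side could just pipe each article's wikitext
through the PHP code and drop the result into one-letter directories.
A rough sketch only (parse.php is a hypothetical command-line wrapper
around the existing PHP code, which would still have to be written):

  # Rough sketch: pipe each article's wikitext through a hypothetical
  # command-line wrapper (parse.php) around the existing PHP code and
  # write the resulting HTML into one-letter directories (A/, B/, ...).
  use strict;
  use warnings;
  use File::Path qw(mkpath);
  use IPC::Open2;

  sub wikitext_to_html {
      my ($wikitext) = @_;
      # parse.php is assumed to read wikitext on stdin and print HTML
      # on stdout; no such wrapper exists yet.  (For very large pages
      # a temp file would be safer than a pipe.)
      my $pid = open2(my $out, my $in, 'php', 'parse.php');
      print {$in} $wikitext;
      close $in;
      local $/;                       # slurp the wrapper's whole output
      my $html = <$out>;
      waitpid $pid, 0;
      return $html;
  }

  sub save_article {
      my ($title, $wikitext) = @_;
      my $letter = uc substr($title, 0, 1);
      $letter = '0' unless $letter =~ /[A-Z]/;   # digits/symbols together
      mkpath($letter) unless -d $letter;
      (my $file = $title) =~ s/[^A-Za-z0-9_-]/_/g;  # crude name sanitizing
      open my $fh, '>', "$letter/$file.html"
          or die "can't write $letter/$file.html: $!";
      print {$fh} wikitext_to_html($wikitext);
      close $fh;
  }

Spawning one php process per article would be slow over the whole
dump, so batching many pages per process would be the obvious
refinement.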


A tool that generated the entire Wikipedia in static HTML format
would make it trivial to generate the "Plucker" format for Palm PDAs.
Plucker is an offline web browser for Palm PDAs; it's open source
software/Free Software (OSS/FS) released under the GPL.
It can handle HTML, as well as PNG, GIF, JPEG, plain text, and a few
other formats; HTML is usually rendered as you'd expect (hypertext,
italics, bold, font size changes, lists, and indenting all work).
It would be very nice if the Wikipedia were available in Plucker
format; that would mean an OSS/FS reader could be used to view the
text on a Palm PDA.

Plucker is available at: "http://www.plkr.org".
I have a Palm, and it is the MOST important program I use by far.

One minor problem is that Plucker doesn't have an index
facility. That could be solved by creating HTML pages that link to
the sorted articles: e.g., a "Master Index" page could list
"A, B, C..."; clicking on "A" would lead to "Index A", which would
list "AA, AB, AC...", and so on down to the articles themselves.
Then the static version of the main page could be modified so you
could quickly jump to the master index.
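
Something along these lines would do (just a sketch; it assumes a
per-letter A/Foo.html file layout, which is my own guess, and reads a
list of article titles on stdin):

  # Hypothetical sketch: build "Master Index" -> "Index A" -> articles
  # from a plain list of article titles, one per line on stdin.
  use strict;
  use warnings;

  my %by_letter;
  while (my $title = <STDIN>) {
      chomp $title;
      my $letter = uc substr($title, 0, 1);
      $letter = '0' unless $letter =~ /[A-Z]/;
      push @{ $by_letter{$letter} }, $title;
  }

  # Master index: one link per letter.
  open my $master, '>', 'master_index.html' or die "master_index: $!";
  print {$master} "<html><body><h1>Master Index</h1>\n";
  for my $letter (sort keys %by_letter) {
      print {$master} qq{<a href="index_$letter.html">$letter</a>\n};
  }
  print {$master} "</body></html>\n";
  close $master;

  # One index page per letter, linking to the per-letter article files
  # (the A/Foo.html layout is an assumption, not anything settled here).
  for my $letter (sort keys %by_letter) {
      open my $fh, '>', "index_$letter.html" or die "index_$letter: $!";
      print {$fh} "<html><body><h1>Index $letter</h1>\n";
      for my $title (sort @{ $by_letter{$letter} }) {
          (my $file = $title) =~ s/[^A-Za-z0-9_-]/_/g;
          print {$fh} qq{<a href="$letter/$file.html">$title</a><br>\n};
      }
      print {$fh} "</body></html>\n";
      close $fh;
  }

The sketch only builds one level; splitting "Index A" further into
AA, AB, ... would follow the same pattern using the first two letters.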

Internally, Plucker will break long pages (>32K) into multiple
pages with forward and back links - but that happens automatically
and won't affect anything.

I don't know of an automatic way to download the Wikipedia
images (which, in my mind, is a serious problem). Hopefully there
will soon be a way to download the images other than trawling.
However, for a Palm you'd have to drop the images in general anyway,
so for that particular use it wouldn't matter.

RE: Creating a Plucker version (was: Static html dump) [ In reply to ]
> If anyone is interested, I have a rudimentary Perl script that can
> read the downloadable SQL dump and output all the articles as
> separate files in a number of alphabetical directories.
> It's not very fast, but it works.
> What's missing from the script: wikimarkup -> HTML conversion,

Mr David A. Wheeler,
Have you seen my Perl script for converting the SQL dump to a
TomeRaider database? You might find useful code there.

It renders all pages in HTML, checks all hyperlinks, and unlinks half
a million orphaned ones. It edits the wiki code to remove redundant
tags, fixes some badly coded HTML tables, and adds statistics and a
language-specific introduction. It replaces HTML tags with extended
ASCII (which saves a lot of space). It resolves redirects, thus making
hyperlinks point directly to the proper article. It also removes
tables that contained only an image (plus possibly a single footer
text).
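
The redirect step, for instance, boils down to something like this
(a simplified sketch, not the actual WikiToTome.pl code):

  # Simplified sketch of the redirect step (not the actual
  # WikiToTome.pl code): build a map redirect-title => target-title,
  # then rewrite every [[link]] through it and drop the redirect stubs.
  use strict;
  use warnings;

  sub resolve_redirects {
      my ($articles) = @_;              # hashref: title => raw wikitext

      my %redirect;
      for my $title (keys %$articles) {
          if ($articles->{$title} =~ /^#REDIRECT\s*\[\[([^\]|#]+)/i) {
              $redirect{$title} = $1;
          }
      }

      my $resolve = sub {               # follow chains, guard against loops
          my ($title) = @_;
          my %seen;
          $title = $redirect{$title}
              while exists $redirect{$title} && !$seen{$title}++;
          return $title;
      };

      for my $title (keys %$articles) {
          if (exists $redirect{$title}) {
              delete $articles->{$title};   # the stub itself is dropped
              next;
          }
          $articles->{$title} =~
              s/\[\[([^\]|]+)(\|[^\]]*)?\]\]/
                '[[' . $resolve->($1) . (defined $2 ? $2 : '') . ']]'/ge;
      }
  }

In the sketch it would be called on the in-memory title => wikitext
hash right after the dump has been read, before any HTML is generated.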

In fact, I think the script could be extended to generate separate
HTML pages in a few hours, Plucker specifics not taken into account.

Script: http://members.chello.nl/epzachte/Wikipedia/WikiToTome.pl
More info: http://members.chello.nl/epzachte/Wikipedia

Erik Zachte

Re: RE: Creating a Plucker version (was: Static html dump) [ In reply to ]
Hey, it seems that a lot of people are doing the same thing here.
Well, thanks for all the input about the code and the Plucker
possibility. For now I think I'll go on as before (I like my
spaghetti code :-). When I have a version that works sufficiently
well, I'll make things public.

Ciao,
Alfio
