Mailing List Archive: Wikipedia on CD: searching?

axelboldt at yahoo

Feb 7, 2003, 10:50 AM

Post #1 of 20 (1508 views)

--- Erik Moeller <e.moeller@fokus.gmd.de> wrote:

> Of course, having quarterly Wikipedia CDs or DVDs would also
> be damn cool.

I know that I asked this question before, but I don't remember whether
we found a satisfying answer. How should the searching be handled once
we distribute a static HTML tree on CD? Is there anything better than a
javascript hack? Distribute a personal web server along with the files?
What do other projects do in this situation?

Axel

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com

Re: Wikipedia on CD: searching? [ In reply to ]

vibber at aludra

Feb 7, 2003, 12:14 PM

Post #2 of 20 (1506 views)

On Fri, 7 Feb 2003, Axel Boldt wrote:
> I know that I asked this question before, but I don't remember whether
> we found a satisfying answer. How should the searching be handled once
> we distribute a static HTML tree on CD? Is there anything better than a
> javascript hack? Distribute a personal web server along with the files?

Several thoughts:

* Rather than a static HTML tree (which will grow too big for a CD without
compression sooner or later), store compressed and indexed wikitext, and
distribute it with a dedicated reader, which will have a search mechanism
built in to it. (Related to or the same as a dedicated writer... see
discussions on meta)

* A Java hack

* Write a helper program to do searching, include compiled versions for
major platforms. (But how to integrate with browser?)

-- brion vibber (brion @ pobox.com)
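
Editorial sketch of the first suggestion above (compressed and indexed wikitext read by a dedicated reader). This is only an illustration under stated assumptions, not anything proposed in the thread: the function names, the per-article zlib compression, and the in-memory title index are all hypothetical choices made for the example.

```python
import zlib


def build_store(articles):
    """Pack {title: wikitext} into one blob plus a title -> (offset, length) index.

    Each article is compressed separately so a reader can fetch one article
    without decompressing the whole archive. (Hypothetical layout.)
    """
    blob = bytearray()
    index = {}
    for title, text in sorted(articles.items()):
        packed = zlib.compress(text.encode("utf-8"))
        index[title] = (len(blob), len(packed))
        blob.extend(packed)
    return bytes(blob), index


def read_article(blob, index, title):
    """Decompress a single article using its offset and length from the index."""
    offset, length = index[title]
    return zlib.decompress(blob[offset:offset + length]).decode("utf-8")


def search_titles(index, term):
    """Case-insensitive substring search over the article titles."""
    term = term.lower()
    return sorted(title for title in index if term in title.lower())
```

Compressing per article trades some compression ratio for random access; a real reader would also need a full-text index, which this sketch leaves out.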

Re: Wikipedia on CD: searching? [ In reply to ]

Feb 7, 2003, 1:10 PM

Post #3 of 20 (1510 views)

Brion Vibber wrote:
>
> On Fri, 7 Feb 2003, Axel Boldt wrote:
> > I know that I asked this question before, but I don't remember whether
> > we found a satisfying answer. How should the searching be handled once
> > we distribute a static HTML tree on CD? Is there anything better than a
> > javascript hack? Distribute a personal web server along with the files?
>
> Several thoughts:
>
> * Rather than a static HTML tree (which will grow too big for a CD without
> compression sooner or later), store compressed and indexed wikitext, and
> distribute it with a dedicated reader, which will have a search mechanism
> built in to it. (Related to or the same as a dedicated writer... see
> discussions on meta)
>
> * A Java hack
>
> * Write a helper program to do searching, include compiled versions for
> major platforms. (But how to integrate with browser?)
>
> -- brion vibber (brion @ pobox.com)

Sounds great!!
Maybe use DVD or a set of several CD-roms...
Instead of a 'Java hack', it may also be possible to write
a 'decent program' in C++ with the help of the FLTK or wxWindows libraries or ...

Or, for unix-users, also distribute a mini-webserver (I am writing
mini-WikipediaSoftware in pure C right at this moment..), the edit-button
will immediately link to the online wikipedia.... (?...)
Hmmm,... Lars Aronsson is asking good questions on how to present these
'wiki-mirrors'...

Technical question: I'm considering 'stealing' some wikipedia-source
(to mirror it on my own server) by generating queries like
"http://www.wikipedia.org/w/wiki.phtml?title=Chemistry&action=edit"
then read what your server sent me, reading only the wiki-source between
"<textarea tabindex=1 name='wpTextbox1' rows=25 cols=80 wrap=virtual>"
and "</textarea>".
If I did this on too large a scale, I would probably disturb your server (wouldn't I?),
so is there maybe a simpler way to retrieve wikisources?
(I don't need the HTML around it.)

Thanks (also thanks to Jimmy Wales for informing me about 'the foundation'),
Pieter Suurmond

If CD-ROM becomes outdated one maybe can use it as a shaving-mirror
instead of a Wikipedia-mirror :-)
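
Editorial sketch of the textarea extraction described in this message. It assumes the edit page has already been fetched into a string (no network access is shown) and that the wikitext inside the textarea is HTML-escaped; the function name and the regular expression are illustrative assumptions, not part of any Wikipedia software.

```python
import html
import re

# Matches the edit box named wpTextbox1 regardless of attribute order or quoting.
TEXTAREA_RE = re.compile(
    r"<textarea\b[^>]*\bname=['\"]?wpTextbox1['\"]?[^>]*>(.*?)</textarea>",
    re.DOTALL | re.IGNORECASE,
)


def extract_wikitext(edit_page_html):
    """Return the wiki source found in the edit box, or None if there is no edit box."""
    match = TEXTAREA_RE.search(edit_page_html)
    if match is None:
        return None
    # Assumption: the page escapes &, < and > inside the textarea, so undo that here.
    return html.unescape(match.group(1))
```

As the replies note, fetching the database dump is kinder to the server than requesting edit pages one at a time.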

Re: Wikipedia on CD: searching? [ In reply to ]

vibber at aludra

Feb 7, 2003, 1:22 PM

Post #4 of 20 (1506 views)

On Fri, 7 Feb 2003, Pieter Suurmond wrote:
> Technical question: I'm considering 'stealing' some wikipedia-source
> (to mirror it on my own server) by generating queries like
> "http://www.wikipedia.org/w/wiki.phtml?title=Chemistry&action=edit"
...
> so is there maybe a simpler way to retrieve wikisources?

This is planned; see
http://meta.wikipedia.org/wiki/Machine-friendly_wiki_interface

> If CD-ROM becomes outdated one maybe can use it as a shaving-mirror
> instead of a Wikipedia-mirror :-)

;) Another reason in favor of a 'smart' program that can download updates
and seamlessly jump between the versions on disc and the new updated
versions.

-- brion vibber (brion @ pobox.com)

Re: Wikipedia on CD: searching? [ In reply to ]

Feb 7, 2003, 1:34 PM

Post #5 of 20 (1506 views)

On Fri, Feb 07, 2003 at 11:14:03AM -0800, Brion Vibber wrote:
> On Fri, 7 Feb 2003, Axel Boldt wrote:
> > I know that I asked this question before, but I don't remember whether
> > we found a satisfying answer. How should the searching be handled once
> > we distribute a static HTML tree on CD? Is there anything better than a
> > javascript hack? Distribute a personal web server along with the files?
>
> Several thoughts:
>
> * Rather than a static HTML tree (which will grow too big for a CD without
> compression sooner or later), store compressed and indexed wikitext, and
> distribute it with a dedicated reader, which will have a search mechanism
> built in to it. (Related to or the same as a dedicated writer... see
> discussions on meta)
>
> * A Java hack
>
> * Write a helper program to do searching, include compiled versions for
> major platforms. (But how to integrate with browser?)

Distributing any code to be run on the client machine is a really bad idea.
A bit of Javascript to do searching in some index is about as much
as can be reasonably done. Distributing a renderer is a really really bad idea.
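
Editorial sketch of what "searching in some index" could look like: a word-to-pages inverted index built once when the CD is mastered, then consulted at lookup time. Python stands in here for whatever script would run in the browser; the page names, the tokenizer, and the AND semantics for multi-word queries are assumptions made for the example.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9]+")


def build_index(pages):
    """Map each lowercase word to the set of page paths that contain it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in WORD_RE.findall(text.lower()):
            index[word].add(path)
    return index


def search(index, query):
    """Return the sorted page paths that contain every word of the query."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

An index like this could be written out as a data file on the disc, so that the client only performs lookups and never has to scan the articles.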

Re: Wikipedia on CD: searching? [ In reply to ]

magnus.manske at epost

Feb 7, 2003, 1:41 PM

Post #6 of 20 (1513 views)

Pieter Suurmond wrote:

>Instead of a 'Java hack' maybe also possible to write
>a 'decent program' in C++ with help of FLTK or wxWindows libraries or ...
>
I know wxWindows, and it seems very suited for this. The same source,
with minor modifications if any, will compile on Windows, *nix, Mac, and
more IIRC.

>Technical question: I'm considering 'stealing' some wikipedia-source
>(to mirror it on my own server) by generating queries like
>"http://www.wikipedia.org/w/wiki.phtml?title=Chemistry&action=edit"
>then read what your server sent me, reading only the wiki-source between
>"<textarea tabindex=1 name='wpTextbox1' rows=25 cols=80 wrap=virtual>"
>and "</textarea>".
>
That's what I do in my first "Sifter" version :-)

Did you consider downloading the database dump? Link is on the Main Page...

Magnus

Re: 'database dump'? [ In reply to ]

Feb 7, 2003, 2:19 PM

Post #7 of 20 (1509 views)

Magnus Manske wrote:
> Did you consider downloading the database dump? Link is on the Main Page...
>
> Magnus

Sorry but I cannot find any 'database dump' on http://www.wikipedia.org/
or on meta.
Pieter

Re: Wikipedia on CD: searching? [ In reply to ]

jasonr at bomis

Feb 7, 2003, 2:22 PM

Post #8 of 20 (1513 views)

Tomasz Wegrzanowski wrote:

> Distributing any code to be run on client machine is really bad
> idea.

If you're going to say that someone's idea is "really bad", you should
give a "really good" reason why this is so...

I can see that some difficulties might arise from creating programs to
run client-side, but I can also see that such a thing would be
somewhat useful to many users.

I am also of the opinion that everyone who would work on such a
project would be doing so on a volunteer basis with an interest
towards building the wikipedia community stronger. I don't see the
harm in that. I would expect you of all people to recognize that it
is somewhat frustrating when someone tells you that your idea sucks.

> A bit of Javascript to do searching in some index is about as much
> as can be reasonably done. Distributing renderer is really really bad idea.

Likewise, I was expecting to see a "really really good" reason here...

> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@wikipedia.org
> http://www.wikipedia.org/mailman/listinfo/wikitech-l

--
"Jason C. Richey" <jasonr@bomis.com>

Re: Re: 'database dump'? [ In reply to ]

vibber at aludra

Feb 7, 2003, 2:29 PM

Post #9 of 20 (1503 views)

On Fri, 7 Feb 2003, Pieter Suurmond wrote:
> Magnus Manske wrote:
> > Did you consider downloading the database dump? Link is on the Main Page...
>
> Sorry but I cannot find any 'database dump' on http://www.wikipedia.org/
> or on meta.

Here it is:
http://www.wikipedia.org/wiki/Wikipedia:Database_download

-- brion vibber (brion @ pobox.com)

Re: Re: 'database dump'? [ In reply to ]

magnus.manske at epost

Feb 7, 2003, 2:33 PM

Post #10 of 20 (1511 views)

Pieter Suurmond wrote:

>Magnus Manske wrote:
>
>
>>Did you consider downloading the database dump? Link is on the Main Page...
>>
>>Magnus
>>
>>
>
>Sorry but I cannot find any 'database dump' on http://www.wikipedia.org/
>or on meta.
>
>
Try "Database download" on www.wikipedia.org, which leads to

http://www.wikipedia.org/wiki/Wikipedia%3ADatabase_download

The direct link to the "cur" table dump is

http://www.wikipedia.org/tarballs/cur_table.sql.bz2

Magnus
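
Editorial sketch of reading a bzip2-compressed SQL dump such as the one linked above without unpacking it to disk first. The local file path, the Latin-1 encoding, and the assumption that rows appear in lines beginning with INSERT INTO are guesses for illustration; the actual dump layout is not described in this thread.

```python
import bz2


def iter_dump_lines(path, encoding="latin-1"):
    """Yield the lines of a .sql.bz2 file one at a time, decompressing on the fly."""
    # Assumption: the dump is text in the given encoding; undecodable bytes are replaced.
    with bz2.open(path, mode="rt", encoding=encoding, errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def count_insert_statements(path):
    """Count the lines that start an INSERT statement (a rough size check)."""
    return sum(1 for line in iter_dump_lines(path) if line.startswith("INSERT INTO"))
```

Streaming the file keeps memory use flat, which matters because the uncompressed dump is much larger than the download.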

Re: Re: 'database dump'? [ In reply to ]

Feb 7, 2003, 2:41 PM

Post #11 of 20 (1510 views)

Aha, thanks very much Magnus and Brion, great!!!
(Even a search on 'database dump' did not yield these URLs.)

Really great, I now also see some tarballs
(I know tarballs, I'm still a bit unfamiliar with the sql-format
but now I can study it / learn more about it...)

Thanks for pointing them out, :-)
Pieter

Magnus Manske wrote:
>
> Pieter Suurmond wrote:
>
> >Magnus Manske wrote:
> >
> >
> >>Did you consider downloading the database dump? Link is on the Main Page...
> >>
> >>Magnus
> >>
> >>
> >
> >Sorry but I cannot find any 'database dump' on http://www.wikipedia.org/
> >or on meta.
> >
> >
> Try "Database download" on www.wikipedia.org, which leads to
>
> http://www.wikipedia.org/wiki/Wikipedia%3ADatabase_download
>
> The direct link to the "cur" table dump is
>
> http://www.wikipedia.org/tarballs/cur_table.sql.bz2
>
> Magnus
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@wikipedia.org
> http://www.wikipedia.org/mailman/listinfo/wikitech-l

Re: Wikipedia on CD: searching? [ In reply to ]

Feb 7, 2003, 2:42 PM

Post #12 of 20 (1514 views)

On Fri, Feb 07, 2003 at 01:22:01PM -0800, Jason Richey wrote:
> Tomasz Wegrzanowski wrote:
>
> > Distributing any code to be run on client machine is really bad
> > idea.
>
> If you're going to say that someone's idea is "really bad", you should
> give a "really good" reason why this is so...
>
> I can see that some difficulties might arise from creating programs to
> run client-side, but I can also see that such a thing would be
> somewhat useful to many users.
>
> I am also of the opinion that everyone who would work on such a
> project would be doing so on a volunteer basis with an interest
> towards building the wikipedia community stronger. I don't see the
> harm in that. I would expect you of all people to recognize that it
> is somewhat frustrating when someone tells you that your idea sucks.
>
> > A bit of Javascript to do searching in some index is about as much
> > as can be reasonably done. Distributing renderer is really really bad idea.
>
> Likewise, I was expecting to see a "really really good" reason here...

Because:
* it's not safe for the user
* there's no chance it will run on every machine people would want
to see wikipedia on.

Is this good enough?

Re: Wikipedia on CD: searching? [ In reply to ]

magnus.manske at epost

Feb 7, 2003, 3:01 PM

Post #13 of 20 (1516 views)

Tomasz Wegrzanowski wrote:

>On Fri, Feb 07, 2003 at 01:22:01PM -0800, Jason Richey wrote:
>
>
>>Likewise, I was expecting to see a "really really good" reason here...
>>
>>
>
>Because:
>* it's not safe for user
>
I'm afraid I don't know what you mean by that...

>* there's no chance it will run on every machine people would want
> to see wikipedia on.
>
Right. But a thing like wxWindows will run on most of them, and there's
no reason not to distribute an HTML/JavaScript version as well for those few.

>Is this good enough ?
>
Not IMHO. "Real" programs can have many advantages over JavaScript
hacks, for example the "updater" Brion mentioned.

Magnus

Re: Wikipedia on CD: searching? [ In reply to ]

Feb 7, 2003, 3:03 PM

Post #14 of 20 (1511 views)

Tomasz Wegrzanowski wrote:
>
> On Fri, Feb 07, 2003 at 01:22:01PM -0800, Jason Richey wrote:
> > Tomasz Wegrzanowski wrote:
> >
> > > Distributing any code to be run on client machine is really bad
> > > idea.
> >
> > If you're going to say that someone's idea is "really bad", you should
> > give a "really good" reason why this is so...
> >
> > I can see that some difficulties might arise from creating programs to
> > run client-side, but I can also see that such a thing would be
> > somewhat useful to many users.
> >
> > I am also of the opinion that everyone who would work on such a
> > project would be doing so on a volunteer basis with an interest
> > towards building the wikipedia community stronger. I don't see the
> > harm in that. I would expect you of all people to recognize that it
> > is somewhat frustrating when someone tells you that your idea sucks.
> >
> > > A bit of Javascript to do searching in some index is about as much
> > > as can be reasonably done. Distributing renderer is really really bad idea.
> >
> > Likewise, I was expecting to see a "really really good" reason here...
>
> Because:
> * it's not safe for user
> * there's no chance it will run on every machine people would want
> to see wikipedia on.
>
> Is this good enough ?

Sorry, I'm afraid not.
If wiki-syntax and -semantics are clear, it must be possible to render it
on ANY machine. Why not?? (this is what OPEN SOURCE is about)

The only thing that matters is the wikisource (which I can now download,
which is really great!!); I'll find my own alternative way(s) to render it :-).
Am I not free?
One could use JavaScript, Java, C++ in combination with wxWindows,
or whatever... Let us develop as many different viewers and renderers as possible for
all different machines, operating-systems, etc... But first (I'm afraid)
Wikisyntax needs to be finished/completed (a matter of specification, not
of implementation).

Sorry, in my opinion "Distributing renderer" is a really really very good
idea (it enables me to read articles at times that the wikipedia-server is
very slow or down).

Pieter

Re: Wikipedia on CD: searching? [ In reply to ]

axelboldt at yahoo

Feb 7, 2003, 3:49 PM

Post #15 of 20 (1518 views)

--- Brion Vibber <vibber@aludra.usc.edu> wrote:

> * Rather than a static HTML tree (which will grow too big for a CD
> without compression sooner or later), store compressed and indexed
> wikitext, and distribute it with a dedicated reader, which will have
> a search mechanism built in to it.

Since we probably don't want to write a whole renderer though, maybe we
should start with Mozilla as display engine. Does Mozilla allow search
plugins that are more intelligent than just handing the request off to
some website?

> * A Java hack

I don't think a Java applet can return the search results in the form
of an HTML page though. Maybe a separate standalone Java program that
somehow communicates with a browser?

It seems like such a generic problem, somebody must have solved it
before.

Axel


Re: Wikipedia on CD: searching? [ In reply to ]

wikipedia at wp

Feb 7, 2003, 4:26 PM

Post #16 of 20 (1513 views)

On 7 Feb 2003 at 14:49, Axel Boldt wrote:

> Since we probably don't want to write a whole renderer though, maybe we
> should start with Mozilla as display engine. Does Mozilla allows search
> plugins that are more intelligent then just handing the request off to
> some website?

What about starting with Amaya? http://www.w3.org/Amaya/

Disclaimer: I'm not a member of the W3C community or of the Amaya
development team, and this is not an ad.

Yesterday I downloaded the fresh (Feb 03) version 7.2,
and the stupid smile on my face is getting bigger and bigger with every minute :)

Of course using Amaya won't be a solution for the searching problem, but Amaya:
* supports MathML rendering (so one only needs to transform
our TeX <math> syntax into MathML and Amaya will do the rest)
* supports SVG (anybody still thinking about future support of SVG in WP?)
* is a browser/editor - you have to click only one icon to change the mode
(so if one smartly designs the rest, i.e. handling the "source code" of articles,
like transforming it to XHTML and back, then every person could edit
the page even without knowing wikisyntax or Wikipedia software;
Amaya is WYSIWYG)
* supports annotations, which can be held in a local file or posted
to a server (just imagine Wikipedia as a server for such annotations!
If one browses off-line and sees an error, then he/she adds an annotation
and saves it locally, and when he/she has access to the Net, posts it
to Wikipedia and Wikipedians correct it)

Disadvantages that I've found after one day of playing with Amaya:
no elaborate HTML, no scripts, and a somewhat harsh interface,
but Amaya is a free and open source project.

User:Youandme

Re: Wikipedia on CD: searching? [ In reply to ]

wikipedia at wp

Feb 7, 2003, 4:49 PM

Post #17 of 20 (1516 views)

On 8 Feb 2003 at 0:26, Youandme wrote:

> * is a browser/editor - you have to click only one icon to change the mode
> (so if one smartly design the rest, ie. handling "source code" of articles,
> like transforming it to XHTML and back, then every person could edit
> the page even without knowing wikisyntax or Wikipedia software,
> Amaya is WYSIWYG)

Oh... one more thing: I've started playing with Amaya through
editing the documentation of... Amaya - it contains some typos
and formatting errors, does it ring a bell? :)

User:Youandme

Re: Wikipedia on CD: searching? [ In reply to ]

erik_moeller at gmx

Feb 7, 2003, 6:33 PM

Post #18 of 20 (1522 views)

> I know that I asked this question before, but I don't remember whether
> we found a satisfying answer. How should the searching be handled once
> we distribute a static HTML tree on CD? Is there anything better than a
> javascript hack? Distribute a personal web server along with the files?
> What do other projects do in this situation?

For indexing:
htdig or Berkeley DB might do the trick. (htdig indices are fairly
bloated, though, and I'm not sure about the fulltext indexing capabilities
of Berkeley.)

You could use a small cross-platform app (no Java, please) to generate the
results, or query the datastore through a CGI-enabled small webserver like
Fnord:
http://www.fbunet.de/fnord.shtml

The early Britannica used a similar solution.

Regards,

Erik
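
Editorial sketch of the "small webserver answering search queries" idea. Python's standard http.server stands in for Fnord plus a CGI script, which is a swapped-in technique and not what the message proposes verbatim; the ARTICLES data, the /search path, the q parameter, the port, and the link layout are all hypothetical.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote, urlparse

# Hypothetical stand-in for the datastore on the disc.
ARTICLES = {
    "Chemistry": "Chemistry is the science of matter.",
    "Physics": "Physics studies matter and energy.",
}


def render_results(query):
    """Build an HTML result page listing the articles that mention the query."""
    term = query.strip().lower()
    titles = sorted(
        title for title, text in ARTICLES.items()
        if term and (term in title.lower() or term in text.lower())
    )
    items = "".join(
        '<li><a href="/wiki/{0}.html">{1}</a></li>'.format(quote(t), html.escape(t))
        for t in titles
    )
    return "<html><body><h1>Results for {0}</h1><ul>{1}</ul></body></html>".format(
        html.escape(query), items
    )


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        if url.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(url.query).get("q", [""])[0]
        body = render_results(query).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Listen on localhost only, since the server is meant for the CD's own user."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

Calling serve() and opening http://127.0.0.1:8080/search?q=matter in any browser would show the result page, which is one answer to the earlier question of how a helper program integrates with the browser.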

Re: http://www.fbunet.de/fnord.shtml [ In reply to ]

Feb 7, 2003, 7:50 PM

Post #19 of 20 (1521 views)

Erik Moeller wrote:
>
> > I know that I asked this question before, but I don't remember whether
> > we found a satisfying answer. How should the searching be handled once
> > we distribute a static HTML tree on CD? Is there anything better than a
> > javascript hack? Distribute a personal web server along with the files?
> > What do other projects do in this situation?
>
> For indexing:
> htdig or Berkeley DB might do the trick. (htdig indices are fairly
> bloated, though, and I'm not sure about the fulltext indexing capabilities
> of Berkeley.)
>
> You could use a small cross-platform app (no Java, please) to generate the
> results, or query the datastore through a CGI-enabled small webserver like
> Fnord:
> http://www.fbunet.de/fnord.shtml
>
> The early Britannica used a similar solution.
>
> Regards,
>
> Erik

Ah nice (I have seen a lot of tiny httpd's lately but this one is new to me).
Indeed, it must be possible to do it all via normal CGI-queries.

Build upon 'fnord' or grab the source and add wiki-features to it... (?)
Just writing CGI seems best because that is standardised and would
enable migration to apache or some other server... there are probably many
servers out there that can do the job... I think on most Operating Systems
nowadays some kind of 'personal webserver' is already installed... ?

(I also object to Java, it's in the hands of Sun and Microsoft, it seems)

Greetings / thanks for the link,
Pieter

Re: Wikipedia on CD: searching? [ In reply to ]

Feb 7, 2003, 10:50 PM

Post #20 of 20 (1515 views)

On Sat, Feb 08, 2003 at 12:26:28AM +0100, Youandme wrote:
> Of course using Amaya won't be a solution for searching problem but Amaya:
> * supports MathML rendering (so one only needs to transform
> our TeX <math> syntax into MathML and Amaya will do the rest)

I checked that, but what it renders looks horrible.
It completely fails to do proper vertical alignment.

For example \int \int_p x looks like:
/ /
| |
| | x
| /
/ p

With the second integral sign smaller than the first, and their centers at different
heights.

Another example: it doesn't place operators at the same height as the fraction line.