There seems to be some interest in creating a static HTML distribution
(dump) of Wikipedia. Most notably, it has been requested on the
Wikipedia:Database_download page and in Feature Request #596830 on
SourceForge. This would allow people to download Wikipedia for offline
use, for example from a CD-ROM.
So, I have started work and made my initial version (English only)
available online for anyone on this list to evaluate and test. I am
looking for feedback, suggestions, bug reports and general comments.
http://www.rawlinson.ca:8080/wikipedia/index.html
Please do not attempt to mirror the site, as my server and bandwidth won't
be able to handle it. The site is intended only for developers to try out
and give feedback on. Once everyone is happy with it, I will make .tar and
.iso packages available for distribution.
At the moment, the method I use to create the static HTML version takes a
long time to run and requires a number of manual steps. It takes my 1 GHz
machine about 5 hours to generate all the pages. Ideally, I'll make it more
automated and efficient as time goes on.
My plan is to produce an updated static HTML version every few months or
at significant milestones.
I'm not sure how to distribute this static HTML version when it's ready
for a public release. Currently it's about 500 MB in size (including
everything). As I mentioned above, I have limited server resources. For
distribution, maybe it could be put on the SourceForge download page, or
somewhere on the Wikipedia.org server (/tarballs)?
Finally, since I am new to Wikipedia and this list, please excuse me while
I learn how things work around here. I am open to criticism, suggestions
and discussion. I am looking forward to working with everyone on
Wikipedia and contributing where I can.
Some Technical Details (for those interested):
- English only (currently)
- uses "printable" pages, no top or side navigation bars
- added links to home, back, copyright and Wikipedia.org to the bottom of
  all pages (TODO: if a talk page exists, a link to it should be added)
- pages are stored in directories based on the first two characters of
  their MD5 hash, the same scheme used for image storage
- includes all namespaces (talk, users, users_talk, wikipedia_talk, etc.)
- created a list linking to every page in each namespace, to allow basic
  searching of page titles
- redirects replaced with direct links to the target article
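For anyone curious, here is a rough Python sketch of two of the steps
above: computing a hashed storage path for a page, and following redirects
so links can point straight at the final article. It is an illustration
under stated assumptions, not the actual generation code. The function
names page_path and resolve_redirect are hypothetical; the sketch assumes
the MD5 is taken over the UTF-8 title with spaces turned into underscores,
and a two-level a/ab/ layout like the one MediaWiki uses for uploaded
images.

```python
from __future__ import annotations

import hashlib
from typing import Dict, Optional
from urllib.parse import quote


def page_path(title: str) -> str:
    """Return a relative path such as 'x/xy/Main_Page.html' for a title.

    Assumption (hypothetical layout): hash the UTF-8 title with spaces
    replaced by underscores, then store under <h[0]>/<h[0:2]>/.
    """
    name = title.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # Percent-encode the file name so a title like "AC/DC" does not
    # create an extra directory level.
    return f"{digest[0]}/{digest[:2]}/{quote(name, safe='')}.html"


def resolve_redirect(title: str, redirects: Dict[str, str],
                     max_hops: int = 10) -> Optional[str]:
    """Follow a chain of redirects to the final article title.

    Returns None if the chain loops or is longer than max_hops,
    so broken redirects can be reported instead of linked.
    """
    seen = set()
    while title in redirects:
        if title in seen or len(seen) >= max_hops:
            return None
        seen.add(title)
        title = redirects[title]
    return title
```

A link to a redirect page would then be written as
page_path(resolve_redirect(title, redirects)), skipping the redirect
page entirely.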
Regards,
Steve Rawlinson