Mailing List Archive

WMDumper
I tried using MWDumper to load the content of Wikipedia into a MySQL 5
database. I used tables.sql to create the tables. I then tried writing
the data to MySQL using MWDumper and got the following results.

C:\Downloads>set class=mwdumper.jar;mysql-connector-java-3.0.11-stable-bin.jar

C:\Downloads>set data="C:\Downloads\enwiki-20070206-pages-articles.xml.bz2"

C:\Downloads>java -client -classpath mwdumper.jar;mysql-connector-java-3.0.11-stable-bin.jar org.mediawiki.dumper.Dumper "--output=mysql://127.0.0.1/enwiki?user=xxxx&password=xxxxxxx" "--format=sql:1.5" "C:\Downloads\enwiki-20070206-pages-articles.xml.bz2"
1.000 pages (148,148/sec), 1.000 revs (148,148/sec)
2.000 pages (156,104/sec), 2.000 revs (156,104/sec)
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.substring(Unknown Source)
    at com.mysql.jdbc.EscapeProcessor.escapeSQL(EscapeProcessor.java:151)
    at com.mysql.jdbc.Statement.execute(Statement.java:845)
    at org.mediawiki.importer.SqlServerStream.writeStatement(Unknown Source)
    at org.mediawiki.importer.SqlWriter.flushInsertBuffer(Unknown Source)
    at org.mediawiki.importer.SqlWriter.bufferInsertRow(Unknown Source)
    at org.mediawiki.importer.SqlWriter15.writeRevision(Unknown Source)
    at org.mediawiki.importer.MultiWriter.writeRevision(Unknown Source)
    at org.mediawiki.importer.PageFilter.writeRevision(Unknown Source)
    at org.mediawiki.dumper.ProgressFilter.writeRevision(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.closeRevision(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.endElement(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanEndElement(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at javax.xml.parsers.SAXParser.parse(Unknown Source)
    at org.mediawiki.importer.XmlDumpReader.readDump(Unknown Source)
    at org.mediawiki.dumper.Dumper.main(Unknown Source)

--
________________________________________________________________________

Axel Ngonga University of Leipzig, Dpt. Computer Sciences
M.Sc. Business Information Systems Group
http://bis.informatik.uni-leipzig.de
Johannisgasse 26, Room 5-22
D-04103 Leipzig
fon: +49-341-9732341 * fax: +49-341-9732239 * mobile: +49-176-23517631
________________________________________________________________________


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: WMDumper [ In reply to ]
Axel Ngonga wrote:
> I tried using MWDumper to load the content of Wikipedia into a MySQL 5
> database. I used tables.sql to create the tables. I then tried writing
> the data to MySQL using MWDumper and got the following results.

You'll probably get the answer that the MediaWiki developers don't
support third-party extensions, so you'd better give it a try with the
maintenance/importDump.php script first (run it from the command-line).

Good luck,

Boris

Re: WMDumper [ In reply to ]
Boris Eetgerink wrote:
> You'll probably get the answer that the MediaWiki developers don't
> support third-party extensions, so you'd better give it a try with the
> maintenance/importDump.php script first (run it from the command-line).
>
> Good luck,
>
> Boris

MWDumper is supported by MediaWiki. I don't know the author, but it's
certainly not a "third party extension", but the recommended tool.
importDump.php is too slow for using it with a full wiki dump (it
renders each page).


http://www.mediawiki.org/wiki/MWDumper
http://download.wikimedia.org/tools/


Re: WMDumper [ In reply to ]
Axel Ngonga wrote:
> C:\Downloads>set class=mwdumper.jar;mysql-connector-java-3.0.11-stable-bin.jar
[snip]
> 1.000 pages (148,148/sec), 1.000 revs (148,148/sec)
> 2.000 pages (156,104/sec), 2.000 revs (156,104/sec)
> Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>     at java.lang.String.substring(Unknown Source)
>     at com.mysql.jdbc.EscapeProcessor.escapeSQL(EscapeProcessor.java:151)
>     at com.mysql.jdbc.Statement.execute(Statement.java:845)

I can't reproduce this problem with MySQL Connector/J 3.0.14 or 3.1.11
(Java 1.5.0_06-113 on Mac OS X 10.4/Intel; I tried both the current
mwdumper build and the snapshot on download.wikimedia.org). Try grabbing
3.0.14 from http://dev.mysql.com/.

-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
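
A note on the trace quoted above: the failing frame is
com.mysql.jdbc.EscapeProcessor.escapeSQL, which belongs to the JDBC driver
rather than to mwdumper, and that fits the advice to try a newer
Connector/J. The Java sketch below is NOT Connector/J's code. It only
illustrates, under the assumption that the driver scans each statement for
JDBC {...} escape sequences, how an unbalanced brace (common in wikitext,
e.g. "{{stub") could turn an indexOf() result of -1 into a substring()
failure. The names NaiveEscapeScanner, firstEscapeBody and
firstEscapeBodySafe are hypothetical and exist only for this illustration.

```java
// Hypothetical illustration only; this is not the Connector/J implementation.
final class NaiveEscapeScanner {

    // Returns the text between the first '{' and the next '}',
    // or null when the statement contains no '{' at all.
    // Deliberately naive: it does not check that a closing brace exists.
    static String firstEscapeBody(String sql) {
        int open = sql.indexOf('{');
        if (open < 0) {
            return null;
        }
        int close = sql.indexOf('}', open); // -1 when the brace is never closed
        return sql.substring(open + 1, close); // throws if close == -1
    }

    // Same scan, but an unclosed brace is treated as "no escape sequence".
    static String firstEscapeBodySafe(String sql) {
        int open = sql.indexOf('{');
        if (open < 0) {
            return null;
        }
        int close = sql.indexOf('}', open);
        return close < 0 ? null : sql.substring(open + 1, close);
    }

    public static void main(String[] args) {
        // A balanced JDBC escape sequence is extracted without trouble.
        System.out.println(firstEscapeBody("SELECT {fn NOW()}"));

        // Wikitext with an unclosed brace makes the naive scan fail.
        String insert = "INSERT INTO text VALUES ('{{stub')";
        try {
            firstEscapeBody(insert);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("naive scan failed: " + e.getClass().getName());
        }

        // The guarded variant simply reports that nothing was found.
        System.out.println(firstEscapeBodySafe(insert));
    }
}
```

JDBC itself provides Statement.setEscapeProcessing(false) to switch escape
scanning off for a statement; whether mwdumper uses it is not stated in
this thread.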

Re: WMDumper [ In reply to ]
Platonides wrote:
> Boris Eetgerink wrote:
>> You'll probably get the answer that the MediaWiki developers don't
>> support third-party extensions, so you'd better give it a try with the
>> maintenance/importDump.php script first (run it from the command-line).
>>
>> Good luck,
>>
>> Boris
>
> MWDumper is supported by MediaWiki. I don't know the author, but it's
> certainly not a "third party extension", but the recommended tool.
> importDump.php is too slow for using it with a full wiki dump (it
> renders each page).

My apologies then, and thanks for the information.

Boris

Re: WMDumper [ In reply to ]
On 14/02/07, Platonides <Platonides@gmail.com> wrote:
> MWDumper is supported by MediaWiki. I don't know the author, but it's
> certainly not a "third party extension", but the recommended tool.
> importDump.php is too slow for using it with a full wiki dump (it
> renders each page).

Brion Vibber, I thought?


Rob Church

Re: WMDumper [ In reply to ]
| -----Original Message-----
| From: wikitech-l-bounces@lists.wikimedia.org
| [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of
| Rob Church
| Sent: Wednesday, February 14, 2007 6:38 PM
/
| On 14/02/07, Platonides <Platonides@gmail.com> wrote:
| > MWDumper is supported by MediaWiki. I don't know the author,
?
/
| Brion Vibber, I thought?

Without doubt! :-))

Reg., Janusz 'Ency' Dorozynski

