Mailing List Archive

XMLDirectory for index storage as XML fails during segment merge
I'm using lucene-1.2-rc4 to index XML elements stored in a native XML
database. An Index class is used to generate an index for a set of nodes that
is then persisted to the DOM tree.

Following the pattern of the RAMDirectory class I've created an XMLDirectory
that has operations for writing index files as elements and others for
reading index content given an element representing an index file.

This class works fine on test datasets but fails after approx. 3000 documents
have been indexed.

An EOF error is thrown during the refill() operation of the InputStream when
the segments are being merged, I've attached the relevant stack-trace below.

[Index.java:78] java.io.IOException: read past EOF
java.io.IOException: read past EOF
at org.apache.lucene.store.InputStream.refill(Unknown Source)
at org.apache.lucene.store.InputStream.readByte(Unknown Source)
at org.apache.lucene.store.InputStream.readChars(Unknown Source)
at org.apache.lucene.store.InputStream.readString(Unknown Source)
at org.apache.lucene.index.FieldsReader.doc(Unknown Source)
at org.apache.lucene.index.SegmentReader.document(Unknown Source)
at org.apache.lucene.index.SegmentMerger.mergeFields(Unknown Source)
at org.apache.lucene.index.SegmentMerger.merge(Unknown Source)
at org.apache.lucene.index.IndexWriter.mergeSegments(Unknown Source)
at org.apache.lucene.index.IndexWriter.maybeMergeSegments(Unknown Source)
at org.apache.lucene.index.IndexWriter.addDocument(Unknown Source)
at com.parochus.search.Index.create(Index.java:72)
at com.parochus.search.TstIndex.main(TstIndex.java:43)

I've reviewed the Lucene and JGuru FAQs and conclude that Lucene should be
comfortable with indexing millions of documents within a single index so this
error wouldn't appear to be due to any upper-limit of Lucene's handling
capablility being reached.

I'm wondering if this is indicating that the length count for the segment is
not matching up with the number of bytes actually written for it.

Anybody got any ideas for tests that I could run that would determine if this
is the cause of the problem?

Regards,
Adam Ratcliffe
adam@prema.co.nz



--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>