Hi!
I recently posted about profiling the nightly benchmarks, and in the
process of doing that I came across an exception that looks rather nasty. I
was about to create a Jira issue, but the nice Jira submit form told me to
post it here or on IRC first.
The Lucene commit where I saw this was
eb24e95731b9f865b95b821c1745264fdc58119, which was the master branch's HEAD
this Saturday. It is an EOF error, which seems to happen when reading
vector values:
Exception in thread "Thread-1" java.lang.RuntimeException: java.lang.RuntimeException: java.io.EOFException: seek past EOF: MMapIndexInput(path="/home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-17_eb24e95_medium_1thread/index/_32.vec") [slice=vector-data]
    at perf.TaskThreads$TaskThread.run(TaskThreads.java:105)
Caused by: java.lang.RuntimeException: java.io.EOFException: seek past EOF: MMapIndexInput(path="/home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-17_eb24e95_medium_1thread/index/_32.vec") [slice=vector-data]
    at perf.SearchTask.go(SearchTask.java:322)
    at perf.TaskThreads$TaskThread.run(TaskThreads.java:91)
Caused by: java.io.EOFException: seek past EOF: MMapIndexInput(path="/home/anton/dev/lucene-bench-home/indices/lucene_bench_2021-01-17_eb24e95_medium_1thread/index/_32.vec") [slice=vector-data]
    at org.apache.lucene.store.ByteBufferIndexInput.seek(ByteBufferIndexInput.java:255)
    at org.apache.lucene.store.ByteBufferIndexInput$MultiBufferImpl.seek(ByteBufferIndexInput.java:575)
    at org.apache.lucene.codecs.lucene90.Lucene90VectorReader$OffHeapVectorValues.vectorValue(Lucene90VectorReader.java:432)
    at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:118)
    at org.apache.lucene.codecs.lucene90.Lucene90VectorReader$OffHeapVectorValues.search(Lucene90VectorReader.java:409)
    at perf.KnnQuery$KnnWeight.scorer(KnnQuery.java:88)
    at org.apache.lucene.search.Weight.bulkScorer(Weight.java:166)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:743)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:533)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:664)
    at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:510)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:520)
    at perf.SearchTask.go(SearchTask.java:263)
Sorry about the long stack trace in an email. I'm still not quite up to speed
with the etiquette on this mailing list. Should I have attached a file instead?
I have reproduced it 3 times on my machine, running the nightly benchmark's
search benchmark with the competition.WIKI_MEDIUM_ALL source. It does not
occur with e.g. the competition.WIKI_MEDIUM_10M source. The source of the
competition file is in
https://github.com/mikemccand/luceneutil/blob/master/src/python/competition.py.
I ran the indexing with only 1 thread; I'm not sure whether that matters.
I'll happily provide more information regarding system setup etc. if that
can help in figuring things out.
best regards,
Anton Hägerstrand