Hi All,
I am working on a patch that would leverage the MergePolicy and
MergeScheduler to run merges triggered by addIndexes(CodecReader...)
concurrently (LUCENE-10216
<https://issues.apache.org/jira/browse/LUCENE-10216>, WIP PR
<https://github.com/apache/lucene/pull/633>). I had some general questions
about the API's current implementation.
At the start of the API, we trigger a flush(triggerMerge: false,
applyAllDeletes: true)
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L3132>.
I was wondering why we need this. My understanding is that the readers
brought in by the addIndexes() API would be unrelated to any pending updates
or deletes.
I tried removing this call, and testExistingDeletes
<https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java#L1022-L1052>
failed. This leads me to think that we flush and apply all deletes so that
a pending delete-by-term does not affect incoming readers that happen to
contain docs with the same term.
Is this correct?
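To make sure I am describing the same thing, here is a toy model of that
ordering. It is not Lucene's IndexWriter: ToyWriter, Segment and run are
hypothetical names, each "doc" is just a single term, and buffered deletes
are resolved against every segment the writer holds at the moment they are
applied. Under those assumptions, applying pending deletes before adding the
incoming segment keeps its "id:1" doc alive; skipping that step deletes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddIndexesDeleteOrder {

    // Hypothetical toy segment: each doc is represented by one term, e.g. "id:1".
    static final class Segment {
        final List<String> terms;
        final boolean[] live;

        Segment(List<String> terms) {
            this.terms = new ArrayList<>(terms);
            this.live = new boolean[terms.size()];
            Arrays.fill(live, true);
        }

        int liveCount() {
            int n = 0;
            for (boolean b : live) if (b) n++;
            return n;
        }
    }

    // Hypothetical toy writer (NOT Lucene's IndexWriter): buffers delete-by-term
    // and, when applied, resolves it against all segments it currently holds.
    static final class ToyWriter {
        final List<Segment> segments = new ArrayList<>();
        final List<String> pendingDeleteTerms = new ArrayList<>();

        void deleteDocuments(String term) {
            pendingDeleteTerms.add(term);
        }

        void applyAllDeletes() {
            for (String term : pendingDeleteTerms) {
                for (Segment s : segments) {
                    for (int i = 0; i < s.terms.size(); i++) {
                        if (s.terms.get(i).equals(term)) s.live[i] = false;
                    }
                }
            }
            pendingDeleteTerms.clear();
        }

        // Models the flush(false, true) step at the start of addIndexes.
        void addIndexes(Segment incoming, boolean applyDeletesFirst) {
            if (applyDeletesFirst) applyAllDeletes();
            segments.add(incoming);
        }

        // Any still-pending deletes are applied here, e.g. at commit/refresh.
        int liveDocs() {
            applyAllDeletes();
            int n = 0;
            for (Segment s : segments) n += s.liveCount();
            return n;
        }
    }

    static int run(boolean applyDeletesFirst) {
        ToyWriter w = new ToyWriter();
        w.segments.add(new Segment(List.of("id:1", "id:2")));
        w.deleteDocuments("id:1"); // pending delete on the existing index
        w.addIndexes(new Segment(List.of("id:1", "id:3")), applyDeletesFirst);
        return w.liveDocs();
    }

    public static void main(String[] args) {
        System.out.println("apply deletes first: " + run(true));  // 3 live docs
        System.out.println("skip the apply step: " + run(false)); // 2 live docs
    }
}
```

In the second run the incoming "id:1" doc is deleted by a delete that was
issued before addIndexes was even called, which is the behavior I assume
testExistingDeletes guards against.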
Also, since such a delete could still arrive before the API completes, and
those deletes would then get applied, this is likely a best-effort
guarantee, right?
On a related note, the regular merge of existing segments writes all
pending doc values (DV) updates before merging, but we skip this in the
addIndexes API. Should we be doing this in both places?
Thanks,
Vigya