Question about setIsMerging and getReaderForMerge for ReadersAndUpdates
Hi lucene dev,

In Lucene's merge logic, I see the following code (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L5075):


    merge.initMergeReaders(
        sci -> {
          final ReadersAndUpdates rld = getPooledInstance(sci, true);
          rld.setIsMerging();
          return rld.getReaderForMerge(context);
        });

It looks like, if ReadersAndUpdates#addDVUpdate is invoked by another thread between rld.setIsMerging() and rld.getReaderForMerge(context), mergingDVUpdates in ReadersAndUpdates could end up with a duplicated del gen for the same field. It happens as follows:

Merge thread:                            Another thread:
1. rld.setIsMerging()
                                         2. rld.addDVUpdate(update)
                                            places the update in both
                                            pendingDVUpdates and mergingDVUpdates
3. rld.getReaderForMerge(context)
   carries over all pendingDVUpdates
   to mergingDVUpdates

Would it make more sense to invoke rld.setIsMerging() and rld.getReaderForMerge(context) atomically? That would avoid the issue above.
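
To make the interleaving concrete, here is a minimal toy model of it. This is not Lucene's actual code: ToyReadersAndUpdates, MergeRaceDemo and setIsMergingAndGetReaderForMerge() are hypothetical names, each update is reduced to a (field, del gen) pair, and the toy getReaderForMerge() only does the carry-over step and returns nothing. The sketch assumes that addDVUpdate records an update in both maps while isMerging is set, and that getReaderForMerge copies every pending update into the merging map, as described in the steps above. The interleaving is replayed on a single thread so the result is deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for ReadersAndUpdates (hypothetical, heavily simplified).
class ToyReadersAndUpdates {
  private final Map<String, List<Long>> pendingDVUpdates = new HashMap<>();
  private final Map<String, List<Long>> mergingDVUpdates = new HashMap<>();
  private boolean isMerging;

  synchronized void setIsMerging() {
    isMerging = true;
  }

  // While a merge is running, the update is recorded in both maps.
  synchronized void addDVUpdate(String field, long delGen) {
    pendingDVUpdates.computeIfAbsent(field, k -> new ArrayList<>()).add(delGen);
    if (isMerging) {
      mergingDVUpdates.computeIfAbsent(field, k -> new ArrayList<>()).add(delGen);
    }
  }

  // Carry-over step only: copies every pending update into the merging map.
  synchronized void getReaderForMerge() {
    for (Map.Entry<String, List<Long>> e : pendingDVUpdates.entrySet()) {
      mergingDVUpdates.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
    }
  }

  // Hypothetical combined call: both steps run under one monitor acquisition,
  // so no addDVUpdate can slip in between them.
  synchronized void setIsMergingAndGetReaderForMerge() {
    setIsMerging();
    getReaderForMerge();
  }

  // Snapshot of the del gens recorded for a field in the merging map.
  synchronized List<Long> mergingGens(String field) {
    return new ArrayList<>(mergingDVUpdates.getOrDefault(field, new ArrayList<>()));
  }
}

class MergeRaceDemo {
  // Replays steps 1, 2, 3 from the timeline above, in that order.
  static List<Long> racyInterleaving() {
    ToyReadersAndUpdates rld = new ToyReadersAndUpdates();
    rld.setIsMerging();          // 1. merge thread
    rld.addDVUpdate("price", 7); // 2. another thread: goes into both maps
    rld.getReaderForMerge();     // 3. merge thread: carries it over again
    return rld.mergingGens("price");
  }

  // The update arrives after the combined call has finished.
  static List<Long> atomicThenUpdate() {
    ToyReadersAndUpdates rld = new ToyReadersAndUpdates();
    rld.setIsMergingAndGetReaderForMerge();
    rld.addDVUpdate("price", 7);
    return rld.mergingGens("price");
  }

  // The update arrives before the combined call starts.
  static List<Long> updateThenAtomic() {
    ToyReadersAndUpdates rld = new ToyReadersAndUpdates();
    rld.addDVUpdate("price", 7);
    rld.setIsMergingAndGetReaderForMerge();
    return rld.mergingGens("price");
  }

  public static void main(String[] args) {
    System.out.println(racyInterleaving()); // prints [7, 7]
    System.out.println(atomicThenUpdate()); // prints [7]
    System.out.println(updateThenAtomic()); // prints [7]
  }
}
```

In this model, whichever side of the combined call the update lands on, del gen 7 appears once in the merging map, whereas the separate calls leave it there twice.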

Please correct me if I am missing something.

Thanks,
Will