Hi lucene dev,
In Lucene's merge logic, I see the following code (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L5075):
    merge.initMergeReaders(
        sci -> {
          final ReadersAndUpdates rld = getPooledInstance(sci, true);
          rld.setIsMerging();
          return rld.getReaderForMerge(context);
        });
It looks like, if ReadersAndUpdates#addDVUpdate is invoked by another thread between rld.setIsMerging() and rld.getReaderForMerge(context), mergingDVUpdates in ReadersAndUpdates could end up with a duplicated del gen for the same field. It happens as follows:
    Merge thread:                               Another thread:
    1. rld.setIsMerging()
                                                2. rld.addDVUpdate(update)
                                                   (places the update in both pendingDVUpdates
                                                   and mergingDVUpdates)
    3. rld.getReaderForMerge(context)
       (carries over all pendingDVUpdates
       to mergingDVUpdates)
Would it make more sense to invoke rld.setIsMerging() and rld.getReaderForMerge(context) atomically? Doing so would avoid the issue above.
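To make the interleaving concrete, here is a minimal sketch in Java. It is a simplified stand-in and not the real ReadersAndUpdates: the class ToyReadersAndUpdates, the combined method setIsMergingAndGetReader, the field name "price", and the del gen value 7 are all made up for illustration, and the toy models only the behavior described above (del gens are plain longs, and no reader is actually returned).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergingUpdatesSketch {

    // Hypothetical stand-in for ReadersAndUpdates; keeps only the state discussed above.
    static class ToyReadersAndUpdates {
        private final Map<String, List<Long>> pendingDVUpdates = new HashMap<>();
        private final Map<String, List<Long>> mergingDVUpdates = new HashMap<>();
        private boolean isMerging;

        synchronized void setIsMerging() {
            isMerging = true;
        }

        // Records the update as pending, and also as merging if a merge has started.
        synchronized void addDVUpdate(String field, long delGen) {
            pendingDVUpdates.computeIfAbsent(field, k -> new ArrayList<>()).add(delGen);
            if (isMerging) {
                mergingDVUpdates.computeIfAbsent(field, k -> new ArrayList<>()).add(delGen);
            }
        }

        // Carries over every pending update into mergingDVUpdates.
        synchronized void getReaderForMerge() {
            for (Map.Entry<String, List<Long>> e : pendingDVUpdates.entrySet()) {
                mergingDVUpdates
                    .computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .addAll(e.getValue());
            }
        }

        // Assumed shape of the proposal: both steps run under one lock acquisition,
        // so no addDVUpdate call can slip in between them.
        synchronized void setIsMergingAndGetReader() {
            setIsMerging();
            getReaderForMerge();
        }

        synchronized List<Long> mergingGens(String field) {
            return new ArrayList<>(mergingDVUpdates.getOrDefault(field, new ArrayList<>()));
        }
    }

    // Replays steps 1-3 above in order on a single thread.
    static List<Long> racyInterleaving() {
        ToyReadersAndUpdates rld = new ToyReadersAndUpdates();
        rld.setIsMerging();           // step 1 (merge thread)
        rld.addDVUpdate("price", 7L); // step 2 (another thread)
        rld.getReaderForMerge();      // step 3 (merge thread)
        return rld.mergingGens("price");
    }

    // With the combined call, the update can only land before or after it.
    static List<Long> atomicVariant(boolean updateFirst) {
        ToyReadersAndUpdates rld = new ToyReadersAndUpdates();
        if (updateFirst) {
            rld.addDVUpdate("price", 7L);
        }
        rld.setIsMergingAndGetReader();
        if (!updateFirst) {
            rld.addDVUpdate("price", 7L);
        }
        return rld.mergingGens("price");
    }

    public static void main(String[] args) {
        System.out.println("racy:           " + racyInterleaving());
        System.out.println("atomic, before: " + atomicVariant(true));
        System.out.println("atomic, after:  " + atomicVariant(false));
    }
}
```

In this toy model the replayed interleaving leaves the same del gen in mergingDVUpdates twice, while the combined call leaves it there once whether the update arrives before or after it. Whether the real class behaves the same way depends on details the sketch leaves out.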
Please correct me if I'm missing something.
Thanks,
Will