Hello,
I am trying to run a program that reads documents segment by segment and
reindexes them into the same index. I am reading with the Lucene APIs and
indexing with the Solr API (in a core that is currently loaded).
What I am observing is that even after a segment has been fully processed
and an autoCommit (as well as an autoSoftCommit) has kicked in, the segment
with 0 live docs gets left behind. *Upon Solr restart, the segment does get
cleared successfully.*
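A minimal sketch of one way to check from outside Solr which segments
still have files on disk before and after a commit. It only looks at file
names and assumes Lucene's usual naming, where a segment's files start with
"_<name>" followed by "." or "_". The class and method names
(SegmentFiles, segmentName, groupBySegment) are made up for illustration
and are not part of Lucene or Solr.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups the files of an index directory by segment name.
class SegmentFiles {

    // "_0.cfs" -> "_0", "_a_1.liv" -> "_a", "_3_Lucene50_0.doc" -> "_3".
    // Returns null for files that are not tied to a single segment
    // (for example "segments_2" or "write.lock").
    static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.indexOf('_', 1);
        if (end == -1) {
            end = fileName.indexOf('.');
        }
        return end == -1 ? fileName : fileName.substring(0, end);
    }

    // Maps each segment name to its files, keeping the input order per segment.
    static Map<String, List<String>> groupBySegment(Iterable<String> fileNames) {
        Map<String, List<String>> bySegment = new TreeMap<>();
        for (String name : fileNames) {
            String segment = segmentName(name);
            if (segment != null) {
                bySegment.computeIfAbsent(segment, k -> new ArrayList<>()).add(name);
            }
        }
        return bySegment;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java SegmentFiles <indexDir>");
            return;
        }
        Path indexDir = Paths.get(args[0]);
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        groupBySegment(names).forEach(
            (segment, files) -> System.out.println(segment + " -> " + files));
    }
}
```

Running it against the core's index directory before the reindex, after
the commit, and after a restart shows which segment names disappear at
each step. It does not say why a segment's files are still being kept.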
I tried to replicate the same thing without the code by indexing 3 docs on
an empty test core and then reindexing the same docs. There, the older
segment gets deleted as soon as the softCommit interval hits or an explicit
commit=true is called.
Here are the two approaches that I have tried. Approach 2 is modeled on
how the merge logic accesses segments, in case opening a DirectoryReader
externally (Approach 1) is what causes this issue.
But both approaches leave undeleted segments behind until I restart Solr
and load the core again. What am I missing? I don't have any more brain
cells left to fry on this!
Approach 1:
=========
try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
     IndexReader reader = DirectoryReader.open(dir)) {
    for (LeafReaderContext lrc : reader.leaves()) {
        // read live docs from each leaf, create a SolrInputDocument
        // out of each Document and index it using the Solr API
    }
} catch (Exception e) {
    // exceptions are currently ignored
}
Approach 2:
==========
ReadersAndUpdates rld = null;
SegmentReader segmentReader = null;
RefCounted<IndexWriter> iwRef =
    core.getSolrCoreState().getIndexWriter(core);
IndexWriter iw = iwRef.get();
try {
    for (SegmentCommitInfo sci : segmentInfos) {
        rld = iw.getPooledInstance(sci, true);
        segmentReader = rld.getReader(IOContext.READ);
        // process all live docs similar to above using the segmentReader
        rld.release(segmentReader);
        iw.release(rld);
    }
} finally {
    if (iwRef != null) {
        iwRef.decref();
    }
}
Help would be much appreciated!
Thanks,
Rahul