Mailing List Archive

Need suggestion on replacing forceMerge(1) with an alternative that consumes less space
Hi Team,



1. We upgraded Lucene from 4.6 to 8+. After upgrading, we are facing an issue with Lucene index creation.

We are indexing in a multi-threaded environment. When we create bulk indexes, Lucene Documents are getting corrupted (data is not updated correctly; data from different rows gets merged together).

2. When we call the updateDocument method for a single record, the change is not visible to the IndexReader until the count reaches 8. Once the count exceeds that, the records become visible to the IndexReader (creating 8 segment files). Is there any alternative for reducing this segment file creation?

3. The above two issues are resolved by forceMerge(1), but it is not feasible for our use case because it takes 3X memory. We are creating indexes for huge amounts of data.



Please suggest some alternatives to forceMerge, along with advice on dealing with IndexWriter.commit in a multi-threaded environment and on committing data while updating a single record.





Thanks,

Jyothsna
Need suggestion on replacing forceMerge(1) with an alternative that consumes less space [ In reply to ]
Hi,



1. We upgraded Lucene from 4.6 to 8+. After upgrading, we are facing an issue with Lucene index creation.

We are indexing in a multi-threaded environment. When we create bulk indexes, Lucene Documents are getting corrupted (data is not updated correctly; data from different rows gets merged together).

2. When we call the updateDocument method for a single record, the change is not visible to the IndexReader until the count reaches 8. Once the count exceeds that, the records become visible to the IndexReader (creating 8 segment files). Is there any alternative for reducing this segment file creation?

3. The above two issues are resolved by forceMerge(1), but it is not feasible for our use case because it takes 3X memory. We are creating indexes for huge amounts of data.



4. IndexWriter Config:
analyzer=com.datanomic.director.casemanagement.indexing.AnalyzerFactory$MA

ramBufferSizeMB=64.0

maxBufferedDocs=-1

mergedSegmentWarmer=null

delPolicy=com.datanomic.director.casemanagement.indexing.engines.TimedDeletionPolicy

commit=null

openMode=CREATE_OR_APPEND

similarity=org.apache.lucene.search.similarities.BM25Similarity

mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=-1, maxMergeCount=-1, ioThrottle=true

codec=Lucene80

infoStream=org.apache.lucene.util.InfoStream$NoOutput

mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.1, deletesPctAllowed=33.0]

indexerThreadPool=org.apache.lucene.index.DocumentsWriterPerThreadPool@24348e05

readerPooling=true

perThreadHardLimitMB=1945

useCompoundFile=false

commitOnClose=true

indexSort=null

checkPendingFlushOnUpdate=true

softDeletesField=null

readerAttributes={}

writer=org.apache.lucene.index.IndexWriter@23a84a99



Please suggest some alternatives to forceMerge, along with advice on dealing with IndexWriter.commit in a multi-threaded environment and on committing data while updating a single record.





Thanks,

Jyothsna
RE: Need suggestion on replacing forceMerge(1) with an alternative that consumes less space [ In reply to ]
Hi,

From what you are describing, it is not clear what you are seeing. Asking the question about "forceMerge(1)" seems like an XY problem (https://en.wikipedia.org/wiki/XY_problem).

(1) forceMerge(1) should never be used, except under some very special circumstances (like indexes that are read-only and will never be updated again). If you forceMerge an index, its "internal structure" gets corrupted and later merging never works again like it should. This requires you to forceMerge it over and over.

(2) forceMerge does not solve the problem you are asking about! What you see might just be a side effect of something else!

(3) you say:

> Lucene Documents are getting corrupted (data is not updated correctly;
> data from different rows gets merged together).

This looks like an issue in your code. Be sure to create new Documents and pass them to the IndexWriter. Documents may be indexed asynchronously (depending on how you set everything up), so it looks like you are changing already created/existing Documents while they are being indexed.
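A minimal sketch of the safe pattern (the "id" and "body" field names and the RowIndexer class are illustrative, not from the original code): build a brand-new Document inside each indexing call, and never reuse or mutate a shared Document instance across threads.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public final class RowIndexer {
    private final IndexWriter writer; // IndexWriter itself is safe to share across threads

    public RowIndexer(IndexWriter writer) {
        this.writer = writer;
    }

    // Each call builds its own Document; nothing mutable is shared between threads.
    public void indexRow(String id, String body) throws Exception {
        Document doc = new Document();               // fresh instance per row
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new TextField("body", body, Field.Store.NO));
        // updateDocument is an atomic delete-by-term plus add
        writer.updateDocument(new Term("id", id), doc);
    }
}
```

The bug pattern to look for is a Document (or Field) instance held as a member variable or handed between worker threads: while one thread is still indexing it, another thread overwrites its field values, which produces exactly the "merged row data" symptom described above.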

> 2. When we call the updateDocument method for a single record, the change is
> not visible to the IndexReader until the count reaches 8. Once the count
> exceeds that, the records become visible to the IndexReader (creating 8
> segment files). Is there any alternative for reducing this segment file creation?

Segments are perfectly fine and required to make incremental updates work correctly. What you say about "up to 8" does not make sense: Lucene has no mechanism that makes visibility depend on the number of segments. The issue you are seeing is more likely related to wrong usage of the real-time readers. IndexReaders are point-in-time snapshots: when you call getReader on the writer, you get a reader that does not change anymore (a point-in-time snapshot). To see the updates, you have to open a new reader. There is SearcherManager to help with that: it manages a pool of searchers/IndexReaders and takes care of reopening them when the underlying index data changes.
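A sketch of that SearcherManager pattern (the NrtSearch class name and the "id" field are illustrative): the manager is opened once on top of the IndexWriter, refreshed after updates, and searchers are always acquired and released in pairs.

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

public final class NrtSearch {
    private final IndexWriter writer;
    private final SearcherManager manager;

    public NrtSearch(IndexWriter writer) throws IOException {
        this.writer = writer;
        // Near-real-time manager on top of the writer:
        // sees flushed-but-uncommitted changes after a refresh.
        this.manager = new SearcherManager(writer, new SearcherFactory());
    }

    public void update(String id, Document doc) throws IOException {
        writer.updateDocument(new Term("id", id), doc);
        // Make the update visible to newly acquired searchers.
        // commit() is only needed for durability, not for visibility.
        manager.maybeRefresh();
    }

    public long countById(String id) throws IOException {
        IndexSearcher searcher = manager.acquire(); // point-in-time snapshot
        try {
            TopDocs hits = searcher.search(new TermQuery(new Term("id", id)), 1);
            return hits.totalHits.value;
        } finally {
            manager.release(searcher); // always release what you acquire
        }
    }
}
```

Readers acquired before maybeRefresh() keep seeing the old snapshot until they are released and re-acquired, which is exactly the point-in-time behavior described above.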

> 3. The above two issues are resolved by forceMerge(1), but it is not feasible for
> our use case because it takes 3X memory. We are creating indexes for huge amounts of data.

Don't use forceMerge, especially not to work around an issue that comes from incorrect multi-threading code and a basic misunderstanding of IndexReaders and their relationship to IndexWriters.
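If the concern is segment count rather than correctness, the normal merge machinery can be made more aggressive by tuning TieredMergePolicy on the IndexWriterConfig instead of calling forceMerge(1). A sketch (the factory class and the concrete values are illustrative only, not recommendations):

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public final class WriterConfigFactory {
    // Illustrative values; tune against your own segment sizes and hardware.
    public static IndexWriterConfig create(Analyzer analyzer) {
        TieredMergePolicy mergePolicy = new TieredMergePolicy();
        mergePolicy.setSegmentsPerTier(5.0);         // fewer segments per tier => merges happen sooner
        mergePolicy.setMaxMergedSegmentMB(5 * 1024); // cap the size of merged segments
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        config.setMergePolicy(mergePolicy);
        return config;
    }
}
```

Unlike forceMerge(1), this keeps merging incremental: the ConcurrentMergeScheduler does the work in the background, and the index never needs the one-big-segment rewrite that costs the extra disk space and memory.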

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Jyothsna Bavisetti <jyothsna.bavisetti@oracle.com>
> Sent: Tuesday, April 14, 2020 7:56 AM
> To: java-user@lucene.apache.org
> Subject: Need suggestion on replacing forceMerge(1) with an alternative that
> consumes less space


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org