Force-merge performance degrading after upgrade to Lucene 8.0
Hi!

After upgrading our ES cluster from version 6.2 to 7.9, we found that force merge takes much longer than before, about double the previous latency.

Based on our investigation, we found that the following is the main cause of the force-merge slowdown:

* Starting with Lucene 8.0, a NormsProducer is passed as an input parameter to the mergeTerms function in org.apache.lucene.index.SegmentMerger.java.

Cause analysis
Starting with Lucene 8.0, a NormsProducer is passed as an input parameter to mergeTerms in org.apache.lucene.index.SegmentMerger.java.
mergeTerms writes the .tim, .tip, .doc, .pos and .pay files for each term.
This change ties the merge of the terms to the norms settings of the fields being merged.
merge() function before Lucene 8.0:

    mergeTerms(segmentWriteState);

merge() function in Lucene 8.0:

    try (NormsProducer norms = mergeState.mergeFieldInfos.hasNorms()
            ? codec.normsFormat().normsProducer(segmentReadState)
            : null) {
        NormsProducer normsMergeInstance = null;
        if (norms != null) {
            // Use the merge instance in order to reuse the same IndexInput for all terms
            normsMergeInstance = norms.getMergeInstance();
        }
        mergeTerms(segmentWriteState, normsMergeInstance);
    }
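To illustrate why handing a NormsProducer to mergeTerms could add work, here is a simplified, hypothetical model. It assumes (our reading of the Lucene 8.x postings writer, not something established in this thread) that while writing each term's postings the writer looks up the norm of every document in that term's postings list, in order to record competitive (freq, norm) "impacts" used for block-max WAND. The class and method names are made up for illustration; this is not Lucene code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model (not Lucene code) of per-posting norm lookups
// while a postings writer records competitive (freq, norm) impacts for each term.
public class NormsDuringMergeSketch {

    // Counts norm lookups so the extra per-posting work is visible.
    static long normLookups = 0;

    // For one term's postings, returns the highest term frequency seen for each norm value.
    // docs[i] is a doc id, freqs[i] its term frequency, norms[doc] that document's norm.
    // A real implementation would also drop dominated (freq, norm) pairs; this sketch does not.
    static Map<Long, Integer> competitiveImpacts(int[] docs, int[] freqs, long[] norms) {
        Map<Long, Integer> maxFreqPerNorm = new TreeMap<>();
        for (int i = 0; i < docs.length; i++) {
            long norm = norms[docs[i]]; // one norm lookup per posting, not per document
            normLookups++;
            maxFreqPerNorm.merge(norm, freqs[i], Integer::max);
        }
        return maxFreqPerNorm;
    }

    public static void main(String[] args) {
        long[] norms = {3, 5, 3, 8};                   // norms of 4 documents
        int[][] docs = {{0, 1, 2}, {1, 3}, {0, 2, 3}}; // postings of 3 terms
        int[][] freqs = {{2, 1, 4}, {1, 1}, {1, 1, 6}};
        for (int t = 0; t < docs.length; t++) {
            System.out.println("term " + t + ": " + competitiveImpacts(docs[t], freqs[t], norms));
        }
        // 8 postings over 4 documents -> 8 norm lookups; a field without norms needs none.
        System.out.println("norm lookups: " + normLookups);
    }
}
```

Running main prints `term 0: {3=4, 5=1}`, `term 1: {5=1, 8=1}`, `term 2: {3=1, 8=6}` and `norm lookups: 8`. In this model the extra cost grows with the total number of postings rather than the number of documents, and disappears for fields that omit norms.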

Test cases and results

To validate that the change above is the main cause of the force-merge slowdown, we designed the following test cases.

Test environment

* ES cluster: 3 master nodes / 1 client node / 3 data nodes with i3.2xlarge
* Data: 13,216,068 docs
* Index: 3 primary shards, 0 replicas

Test steps

1. Modify the merge policy settings and norms settings in the ES mapping file.
2. Load data into the ES cluster and record the running duration.
3. Run index_name/_flush.
4. Run _cat/segments and save the output.
5. Run _forcemerge.
6. Run _cat/segments and save the output.
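The test steps can be sketched as a list of Elasticsearch 7.x REST calls. This is a minimal sketch under stated assumptions: it only builds the requests and does not send them; "index_name" is kept from the steps above, while the field names "title" and "tag" are hypothetical placeholders; the merge policy values and the bulk-loading step are left out because the thread does not give them. The mapping uses the 7.x typeless format (6.x would need a type name). Note that Elasticsearch already disables norms on keyword fields by default, so the explicit setting mainly matters for text fields.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the test steps as Elasticsearch 7.x REST calls; it builds the requests
// but does not send them. "title" and "tag" are hypothetical placeholder fields.
public class ForceMergeTestPlan {

    static final String INDEX = "index_name";

    // One REST call: HTTP method, path and optional JSON body.
    static final class Step {
        final String method;
        final String path;
        final String body;

        Step(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path;
        }
    }

    // Mapping with one text and one keyword field; "norms": false disables norms on each.
    static String mapping(boolean omitNorms) {
        String norms = omitNorms ? ",\"norms\":false" : "";
        return "{\"mappings\":{\"properties\":{"
                + "\"title\":{\"type\":\"text\"" + norms + "},"
                + "\"tag\":{\"type\":\"keyword\"" + norms + "}}}}";
    }

    static List<Step> plan(boolean omitNorms) {
        List<Step> steps = new ArrayList<>();
        // Step 1: create the index (merge policy settings omitted; not given in the thread).
        steps.add(new Step("PUT", "/" + INDEX, mapping(omitNorms)));
        // Step 2 (bulk loading and timing) is not modeled here.
        steps.add(new Step("POST", "/" + INDEX + "/_flush", null));         // step 3
        steps.add(new Step("GET", "/_cat/segments/" + INDEX + "?v", null)); // step 4
        steps.add(new Step("POST", "/" + INDEX + "/_forcemerge", null));    // step 5
        steps.add(new Step("GET", "/_cat/segments/" + INDEX + "?v", null)); // step 6
        return steps;
    }

    public static void main(String[] args) {
        for (Step s : plan(true)) {
            System.out.println(s + (s.body != null ? " " + s.body : ""));
        }
    }
}
```

Building the plan separately from sending it keeps the omit-norms and default runs identical except for the mapping, which is the one variable the test is meant to change.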

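Test results

No. | ES version | Lucene version | Omit norms                               | Force-merge time
----|------------|----------------|------------------------------------------|-----------------
1.1 | 6.8.13     | 7.7.2          | no                                       | 13 min
1.2 | 6.8.13     | 7.7.2          | yes, for all text and keyword fields     | 14 min
2.1 | 7.9.1      | 8.6.2          | no                                       | 31 min
2.2 | 7.9.1      | 8.6.2          | yes, for all text and keyword fields     | 13 min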
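As a quick check of the "about double" figure, the snippet below computes each run's force-merge time relative to run 1.1 from the table above. Run 2.1 comes out at about 2.38 times the baseline, while run 2.2 (Lucene 8.6.2 with norms omitted) is back at 1.00.

```java
import java.util.Locale;

// Relative force-merge time of each run in the table, versus run 1.1.
public class ForceMergeSlowdown {

    // Ratio of a run's force-merge time to the baseline run's time.
    static double ratio(double minutes, double baselineMinutes) {
        return minutes / baselineMinutes;
    }

    public static void main(String[] args) {
        String[] runs = {"1.1", "1.2", "2.1", "2.2"};
        double[] minutes = {13, 14, 31, 13}; // force-merge times from the table
        for (int i = 0; i < runs.length; i++) {
            // Locale.ROOT keeps "." as the decimal separator regardless of system locale.
            System.out.println(String.format(Locale.ROOT, "run %s: %.2fx of run 1.1",
                    runs[i], ratio(minutes[i], minutes[0])));
        }
    }
}
```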

My questions are:

1. Why does this norms-related change cause such a noticeable drop in force-merge performance?
2. Is there any way to resolve this and improve force-merge performance on Lucene 8.0+?
We look forward to your answer, and thanks a lot for your help.
Eileen Xie

Confidentiality note: This e-mail may contain confidential information from Clarivate. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this e-mail is strictly prohibited. If you have received this e-mail in error, please delete this e-mail and notify the sender immediately.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org