Hello everyone,
We are in the process of upgrading from Lucene 8.5.0, and on the latest
version our query performance tests show a significant latency regression
for one of our important use cases. In this test, each query retrieves a
relatively large result set of 40k documents, each with a small stored
fields payload (< 100 bytes per doc).
It looks like the change affecting this use case was introduced in
LUCENE-9486 <https://issues.apache.org/jira/browse/LUCENE-9486> (Lucene
8.7): on that version our tests show almost 3x higher latency. Later, in
LUCENE-9917 <https://issues.apache.org/jira/browse/LUCENE-9917>, the block
size for BEST_SPEED was reduced, and since Lucene 8.10 we see about a 30%
degradation.
That is still a significant performance regression, and in our case query
latency matters more than index size. Unless I'm missing something, the
only way to fix this today is to introduce our own Codec,
StoredFieldsFormat, and CompressionMode: an experiment with the preset
dictionary disabled and a lower block size showed that these changes let us
reach the query latency we need on Lucene 9.2. While this can solve the
problem, we are concerned about maintaining our own version of the codec
and about more complicated upgrades in the future.
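For context, this is roughly the shape of the experiment. It's only a sketch: the format name, chunk size, and doc-count values below are illustrative, and the exact constructor signature of Lucene90CompressingStoredFieldsFormat may differ between 9.x releases, so treat the details as assumptions rather than a drop-in implementation.

```java
import org.apache.lucene.codecs.FilterCodec;
import org.apache.lucene.codecs.StoredFieldsFormat;
import org.apache.lucene.codecs.compressing.CompressionMode;
import org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat;
import org.apache.lucene.codecs.lucene92.Lucene92Codec;

// Sketch of a codec that swaps in a stored fields format tuned for
// retrieval latency: plain LZ4 (no preset dictionary) and smaller blocks.
public final class FastStoredFieldsCodec extends FilterCodec {

  public FastStoredFieldsCodec() {
    // Delegate everything except stored fields to the default 9.2 codec.
    super("FastStoredFieldsCodec", new Lucene92Codec());
  }

  @Override
  public StoredFieldsFormat storedFieldsFormat() {
    // CompressionMode.FAST is LZ4 without a preset dictionary, so each
    // block decompresses independently; the smaller chunk size (8 KB here,
    // an illustrative value) reduces the amount of data decompressed per
    // retrieved document when payloads are tiny.
    return new Lucene90CompressingStoredFieldsFormat(
        "FastStoredFields",        // segment file format name
        CompressionMode.FAST,      // LZ4, no preset dict
        8 * 1024,                  // chunkSize in bytes
        1024,                      // maxDocsPerChunk
        10);                       // blockShift
  }
}
```

The codec also has to be registered via SPI (a META-INF/services/org.apache.lucene.codecs.Codec entry) so segments written with it can be read back, which is exactly the kind of internal plumbing we'd rather not own long-term.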
Are there any less obvious ways to improve the situation for this use case?
If not, does it make sense to expose related settings so users can tune the
compression without copying several internal classes?
Thank you,
Alex