Mailing List Archive: Performance changes within the Lucene 8 branch

Hello,

We have a search application built around Lucene 8. Motivated by the list
of performance enhancements and optimizations in the change notes, we
upgraded from 8.1 to 8.11.2. We track the performance of different
activities within our application and can clearly see an improvement in our
facet count queries (about 15% on our p50 to 5% on our p95 execution times).

But the calls that retrieve matching documents, specifically the parts of
our application where we allow the retrieval of many top hits (up to 10k),
have really suffered. For instance, when the number of documents matching our
query exceeds 5k, I see the performance of the doc matching degrade by 50% on
the p50 and 35% on the p95. The facet collection still shows improvements
of 1-5%. The counterpoint is that when a query matches 500 documents,
we see across-the-board improvements in both document matching and
facet count generation.

Has anyone else seen this sort of performance change in applications that
allow such a high number of top docs to be returned from
IndexSearcher.search()? Is there some tuning or flag setting that I should
consider when allowing such a large number of top documents to be
returned?

Thank you,
Marc Davenport

Hi Marc,

How are you retrieving your hits? Lucene's stored fields, or doc values,
or both?

Do you sort the hits' docids and then retrieve them in docid order (NOT in
the sorted order Lucene returned them in)? I think that might be faster:
Lucene's stored fields use block compression, so if multiple docids fall
in one underlying block, you amortize the overhead of that
decompression over the N docs in it.
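
A minimal sketch of the docid-order idea, written against plain Java so it
runs without Lucene on the classpath. The class name DocIdOrderFetch, the
method fetchInDocIdOrder, and the loader callback are hypothetical names
chosen for illustration. In a real application the docids would presumably
come from ScoreDoc.doc and the loader would be something like
IndexSearcher.doc(int). Whether this helps in a given setup is something to
measure rather than assume.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

final class DocIdOrderFetch {

    /**
     * Loads one value per hit, visiting hits in ascending docid order so
     * that docs sharing a compressed block are read back to back, and
     * returns the values in the original ranked order.
     *
     * rankedDocIds: docids in the order the search returned them.
     * loader: hypothetical stand-in for a stored-fields lookup.
     */
    static <T> List<T> fetchInDocIdOrder(int[] rankedDocIds, IntFunction<T> loader) {
        // Positions 0..n-1 into the ranked array, to be sorted by docid.
        Integer[] positions = new Integer[rankedDocIds.length];
        for (int i = 0; i < positions.length; i++) {
            positions[i] = i;
        }
        Arrays.sort(positions, Comparator.comparingInt((Integer p) -> rankedDocIds[p]));

        // Pre-size the result so each value can be placed at its rank.
        List<T> results =
            new ArrayList<>(Collections.<T>nCopies(rankedDocIds.length, null));
        for (int p : positions) {
            results.set(p, loader.apply(rankedDocIds[p]));
        }
        return results;
    }

    public static void main(String[] args) {
        int[] ranked = {42, 7, 19, 3}; // docids in score order (made-up values)
        List<Integer> visited = new ArrayList<>();
        List<String> docs = fetchInDocIdOrder(ranked, id -> {
            visited.add(id);           // record the order of the lookups
            return "doc-" + id;
        });
        System.out.println("lookup order: " + visited);
        System.out.println("ranked order: " + docs);
    }
}
```

The lookups happen in ascending docid order while the returned list keeps
the ranking, so the caller does not need to re-sort anything afterwards.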

Also, make sure you are using Mode.BEST_SPEED when you create the Codec at
IndexWriter write time. It's the default, so if you are not changing that
to BEST_COMPRESSION, great.

Separately: why are you retrieving so many results? Are you doing some
multi-phased ranking/filtering or so? It's best to push any/all
filters/ranks as deep into the original Lucene search as possible/feasible
so that you don't have problems like this.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Dec 12, 2023 at 4:36 PM Marc Davenport
<madavenport@cargurus.com.invalid> wrote:

> Hello,
>
> We have a search application built around Lucene 8. Motivated by the list
> of performance enhancements and optimizations in the change notes, we
> upgraded from 8.1 to 8.11.2. We track the performance of different
> activities within our application and can clearly see an improvement in our
> facet count queries (about 15% on our p50 to 5% on our p95 execution
> times).
>
> But the calls that retrieve matching documents, specifically the parts of
> our application where we allow the retrieval of many top hits (up to 10k),
> have really suffered. For instance, when the number of documents matching our
> query exceeds 5k, I see the performance of the doc matching degrade by 50% on
> the p50 and 35% on the p95. The facet collection still shows improvements
> of 1-5%. The counterpoint is that when a query matches 500 documents,
> we see across-the-board improvements in both document matching and
> facet count generation.
>
> Has anyone else seen this sort of performance change in applications that
> allow such a high number of top docs to be returned from
> IndexSearcher.search()? Is there some tuning or flag setting that I should
> consider when allowing such a large number of top documents to be
> returned?
>
> Thank you,
> Marc Davenport
>