Hello everyone,
For our use case, we need to run queries which return the full
matched result set. In some cases, this result set can be large (50k+
results out of 4 million total documents).
A perf test showed that just 4 threads running random queries returning 50k
results make Lucene utilize 100% CPU on a 4-core machine (profiler
screenshot
<https://user-images.githubusercontent.com/6069066/157188814-fbd9d205-c2e4-45b6-b98d-b7622b6ac801.png>).
The query is very simple and contains only a single-term filter clause. All
unrelated parts of the application are disabled, no stored fields are
fetched, and GC is doing a minimal amount of work
<https://user-images.githubusercontent.com/6069066/157191646-eb8c5ccc-41c1-4af1-afcf-37d0c5f86054.png>.
My understanding is that fetching a large result set is not exactly
the best use case for Lucene, as explained here
<http://philosophyforprogrammers.blogspot.com/2010/09/lucene-performance.html>.
But I wonder if there are ways to optimize this, for example by using a
special type of collector, in order to reduce CPU utilization?
Thank you,
Alex