Hi All,
For concurrent segment search, Lucene uses the *slices* method to compute
the number of work units that can be processed concurrently.
a) It calculates *slices* in the constructor of *IndexSearcher*
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L239>
using default thresholds for document count and segment count.
b) Based on the executor type, it provides an implementation of
*SliceExecutor* (i.e. QueueSizeBasedExecutor
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L1008>)
which applies backpressure during concurrent execution, using a limiting
factor of 1.5 times the maxPoolSize of the passed-in threadpool.
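As a rough illustration of the threshold-based grouping in (a), here is a simplified Java model. It is a sketch, not Lucene's code: each segment is reduced to its document count, `ThresholdSlicer` is a hypothetical name, and the thresholds are plain parameters rather than Lucene's constants.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of threshold-based slicing. Each segment is
 * represented only by its document count; this is not Lucene's API.
 */
final class ThresholdSlicer {

  private ThresholdSlicer() {}

  /** Groups segment doc counts into slices and returns the doc counts per slice. */
  static List<List<Integer>> slices(
      int[] segmentDocCounts, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    // Visit the largest segments first.
    Integer[] sorted = Arrays.stream(segmentDocCounts).boxed().toArray(Integer[]::new);
    Arrays.sort(sorted, (a, b) -> Integer.compare(b, a));

    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> group = new ArrayList<>();
    long docsInGroup = 0;
    for (int docs : sorted) {
      if (docs > maxDocsPerSlice) {
        // A segment above the document threshold gets a slice of its own.
        slices.add(List.of(docs));
        continue;
      }
      group.add(docs);
      docsInGroup += docs;
      // Close the current slice once either threshold is reached.
      if (docsInGroup > maxDocsPerSlice || group.size() >= maxSegmentsPerSlice) {
        slices.add(group);
        group = new ArrayList<>();
        docsInGroup = 0;
      }
    }
    if (!group.isEmpty()) {
      slices.add(group); // leftover small segments form the last slice
    }
    return slices;
  }
}
```

For example, `ThresholdSlicer.slices(new int[] {300_000, 100_000, 100_000, 60_000, 10_000, 10_000}, 250_000, 5)` yields three slices: `[300000]`, `[100000, 100000, 60000]` and `[10000, 10000]`. The oversized segment gets its own slice, and the second slice closes once its document sum passes the threshold.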
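The backpressure in (b) can be modeled with a plain `ThreadPoolExecutor`, as below. This is a hypothetical sketch of the behavior described above, not the QueueSizeBasedExecutor source: the class name and an `execute` method that reports whether the task was pooled are assumptions made for illustration.

```java
import java.util.concurrent.ThreadPoolExecutor;

/**
 * Hypothetical sketch of queue-size-based backpressure: a task goes to the
 * pool only while the pool's queue is shorter than LIMITING_FACTOR times its
 * maximum pool size; otherwise the calling thread runs it.
 */
final class QueueSizeBackpressure {
  static final double LIMITING_FACTOR = 1.5;

  private final ThreadPoolExecutor pool;

  QueueSizeBackpressure(ThreadPoolExecutor pool) {
    this.pool = pool;
  }

  /** Returns true if the task was handed to the pool, false if it ran inline. */
  boolean execute(Runnable task) {
    int queued = pool.getQueue().size();
    if (queued < pool.getMaximumPoolSize() * LIMITING_FACTOR) {
      pool.execute(task);
      return true;
    }
    // Backpressure: the caller does the work itself instead of growing the queue.
    task.run();
    return false;
  }
}
```

With a pool of one thread whose first task blocks, the next two tasks are queued, and the fourth runs on the caller's thread because a queue length of 2 is not below 1 * 1.5.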
In OpenSearch, we have a search threadpool which serves the search requests
to all the Lucene indices (or OpenSearch shards) assigned to a node. Each
node can receive requests for some or all of the indices on that node.
I am exploring a mechanism to dynamically control the maximum number of
slices for each Lucene index search request. For example, search requests
to some indices on a node could have at most 4 slices each, while others
have at most 2 slices each. The threadpool shared to execute these slices
would then not need any limiting factor: in this model, the top-level
search threadpool limits the number of active search requests, which in
turn limits the number of work units in the SliceExecutor threadpool.
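A per-index cap like the one described above might look roughly like this. Everything here is hypothetical: the `maxSlicesByIndex` map, the index names, and the greedy balancing (each segment, largest first, goes to the slice with the fewest documents so far) are illustrative choices, not an existing Lucene or OpenSearch API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a per-index slice cap. The config map, index names
 * and greedy balancing strategy are illustrative, not an existing API.
 */
final class MaxSliceSlicer {

  private MaxSliceSlicer() {}

  /** Looks up the slice cap for an index, falling back to a default. */
  static int maxSlicesFor(Map<String, Integer> maxSlicesByIndex, String index, int defaultMax) {
    return maxSlicesByIndex.getOrDefault(index, defaultMax);
  }

  /** Spreads segment doc counts over at most maxSlices slices, balancing total docs. */
  static List<List<Integer>> slices(int[] segmentDocCounts, int maxSlices) {
    if (maxSlices < 1) {
      throw new IllegalArgumentException("maxSlices must be >= 1");
    }
    int n = Math.min(maxSlices, segmentDocCounts.length);
    List<List<Integer>> slices = new ArrayList<>();
    long[] totals = new long[n];
    for (int i = 0; i < n; i++) {
      slices.add(new ArrayList<>());
    }
    int[] sorted = segmentDocCounts.clone();
    Arrays.sort(sorted);
    // Largest segment first, each assigned to the slice with the fewest docs so far.
    for (int i = sorted.length - 1; i >= 0; i--) {
      int lightest = 0;
      for (int s = 1; s < n; s++) {
        if (totals[s] < totals[lightest]) {
          lightest = s;
        }
      }
      slices.get(lightest).add(sorted[i]);
      totals[lightest] += sorted[i];
    }
    return slices;
  }
}
```

With `maxSlices = 2`, segments `{500, 400, 300, 200, 100}` are split into `[500, 200, 100]` and `[400, 300]`, and a lookup such as `maxSlicesFor(Map.of("logs", 4, "metrics", 2), "metrics", 4)` returns 2. Since no request produces more slices than its cap, the number of concurrent work units stays bounded by the active search requests times their caps, which is the bound this model relies on.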
For this, the derived implementation of IndexSearcher can take an input
value in its constructor to control the slice count computation. However,
even though the *slices* method is protected, it is called from the
constructor of the base IndexSearcher class. At that point the derived
class's constructor has not run yet, so its fields still hold their default
values and the passed-in input cannot be used.
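The constructor-ordering problem can be reproduced without Lucene. In this minimal sketch, `BaseSearcher` and `CappedSearcher` are hypothetical stand-ins for IndexSearcher and a derived class, and each leaf string stands for one slice: the overridden `slices` method runs inside `super(...)`, before `maxSlices` is assigned, so it sees the default value 0.

```java
import java.util.List;

/** Hypothetical stand-in for IndexSearcher: computes slices in its constructor. */
class BaseSearcher {
  final List<String> leafSlices;

  BaseSearcher(List<String> leaves) {
    // Calls an overridable method before any subclass constructor body runs.
    this.leafSlices = slices(leaves);
  }

  protected List<String> slices(List<String> leaves) {
    return leaves;
  }
}

/** Hypothetical subclass that wants to cap the slices using a constructor argument. */
class CappedSearcher extends BaseSearcher {
  private final int maxSlices;

  CappedSearcher(List<String> leaves, int maxSlices) {
    super(leaves); // slices() runs here, while this.maxSlices is still 0
    this.maxSlices = maxSlices;
  }

  @Override
  protected List<String> slices(List<String> leaves) {
    // Reads the field's default value (0), not the constructor argument.
    return leaves.subList(0, Math.min(maxSlices, leaves.size()));
  }

  int maxSlices() {
    return maxSlices;
  }
}
```

Constructing `new CappedSearcher(List.of("seg0", "seg1", "seg2"), 2)` leaves `leafSlices` empty, even though `maxSlices()` later returns 2.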
To achieve this, I can think of the following approaches (in order of
preference) and would like to submit a pull request. But first I wanted to
get some feedback on whether option 1 looks fine or whether I should take
some other approach.
1. Provide another constructor in IndexSearcher which takes 4 input
parameters:
   protected IndexSearcher(IndexReaderContext context, Executor executor,
       SliceExecutor sliceExecutor,
       Function<List<LeafReaderContext>, LeafSlice[]> sliceProvider)
2. Make the `leafSlices` member protected and non-final. After it is
initialized by IndexSearcher (using the default mechanism in Lucene), the
derived implementation can update it again if needed (e.g. based on some
input parameter to its own constructor). Also make the constructor that
takes a SliceExecutor protected, so that a derived implementation can
provide its own SliceExecutor. The drawback of this approach is the
redundant computation of leafSlices.
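Option 1 can be illustrated with a similar plain-Java model. This is a sketch under assumptions, not the proposed Lucene signature: `ProviderSearcher` and `CappedProviderSearcher` are hypothetical, strings stand in for LeafReaderContext, a `List<List<String>>` stands in for `LeafSlice[]`, and round-robin assignment is just one possible way to cap the slice count. The key point is that the lambda passed to `super(...)` captures the constructor parameter, which already holds its value when the base constructor calls it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical model of option 1: the base constructor receives a slice
 * provider instead of calling an overridable method. Strings stand in for
 * LeafReaderContext and List<List<String>> for LeafSlice[].
 */
class ProviderSearcher {
  final List<List<String>> leafSlices;

  ProviderSearcher(List<String> leaves, Function<List<String>, List<List<String>>> sliceProvider) {
    this.leafSlices = sliceProvider.apply(leaves);
  }
}

/** Hypothetical subclass that caps the slice count via its constructor argument. */
class CappedProviderSearcher extends ProviderSearcher {

  CappedProviderSearcher(List<String> leaves, int maxSlices) {
    // The lambda captures the constructor parameter, which already holds its
    // value when the base constructor invokes the provider.
    super(leaves, ls -> roundRobin(ls, maxSlices));
  }

  /** Assigns leaves round-robin to at most maxSlices slices (at least one). */
  static List<List<String>> roundRobin(List<String> leaves, int maxSlices) {
    int n = Math.max(1, Math.min(maxSlices, leaves.size()));
    List<List<String>> slices = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      slices.add(new ArrayList<>());
    }
    for (int i = 0; i < leaves.size(); i++) {
      slices.get(i % n).add(leaves.get(i));
    }
    return slices;
  }
}
```

For leaves `a` to `e` and `maxSlices = 2`, this yields `[a, c, e]` and `[b, d]`. Under option 2, the base constructor would instead compute the default slices first and the subclass would overwrite them afterwards, which is where the redundant computation comes from.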
Thanks,
Sorabh