Mailing List Archive

Custom SliceExecutor and slices computation in IndexSearcher
Hi All,

For concurrent segment search, Lucene uses the *slices* method to compute
the work units (slices) which can be processed concurrently.

a) It calculates *slices* in the constructor of *IndexSearcher*
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L239>
using default thresholds for document and segment counts (a rough sketch of
this grouping follows after this list).
b) It provides an implementation of *SliceExecutor* (i.e.
QueueSizeBasedExecutor)
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L1008>
chosen based on the executor type, which applies backpressure to concurrent
execution using a limiting factor of 1.5 times the passed-in threadpool's
maxPoolSize.
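
To make a) concrete, here is a rough sketch of threshold-based grouping of
segments into slices. The class name and threshold values below are only
illustrative, not Lucene's exact implementation:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.index.LeafReaderContext;

final class SliceGroupingSketch {

  // Hypothetical thresholds, for illustration only.
  private static final int MAX_DOCS_PER_SLICE = 250_000;
  private static final int MAX_SEGMENTS_PER_SLICE = 5;

  // Groups leaf contexts until a doc-count or segment-count limit is hit.
  // Each resulting group would become one work unit (IndexSearcher.LeafSlice)
  // that can be searched concurrently.
  static List<List<LeafReaderContext>> group(List<LeafReaderContext> leaves) {
    List<LeafReaderContext> sorted = new ArrayList<>(leaves);
    // Largest segments first, so the biggest segments fill slices early.
    sorted.sort(
        Comparator.comparingInt((LeafReaderContext ctx) -> ctx.reader().maxDoc()).reversed());

    List<List<LeafReaderContext>> groups = new ArrayList<>();
    List<LeafReaderContext> current = new ArrayList<>();
    long docsInCurrent = 0;
    for (LeafReaderContext ctx : sorted) {
      boolean full =
          !current.isEmpty()
              && (docsInCurrent + ctx.reader().maxDoc() > MAX_DOCS_PER_SLICE
                  || current.size() >= MAX_SEGMENTS_PER_SLICE);
      if (full) {
        groups.add(current);
        current = new ArrayList<>();
        docsInCurrent = 0;
      }
      current.add(ctx);
      docsInCurrent += ctx.reader().maxDoc();
    }
    if (!current.isEmpty()) {
      groups.add(current);
    }
    return groups;
  }
}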

In OpenSearch, we have a search threadpool which serves search requests to
all the Lucene indices (or OpenSearch shards) assigned to a node. Each node
can receive requests for some or all of the indices on that node.
I am exploring a mechanism to dynamically control the maximum number of
slices for each Lucene index search request. For example, search requests
to some indices on a node could have a maximum of 4 slices each while
others have 2 slices each, and the threadpool shared to execute these
slices would not apply any limiting factor. In this model the top-level
search threadpool limits the number of active search requests, which in
turn limits the number of work units in the SliceExecutor threadpool.
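
A hypothetical wiring of this two-level model, just to illustrate the
shape (class name, pool sizes and the elided query are made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

final class TwoLevelSearchPools {
  // Caps the number of search requests in flight on the node.
  static final ExecutorService SEARCH_POOL = Executors.newFixedThreadPool(16);
  // Shared pool the searchers fan slice work units out to; no per-request limit.
  static final ExecutorService SLICE_POOL = Executors.newCachedThreadPool();

  static void submitSearch(Directory dir) {
    SEARCH_POOL.submit(() -> {
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        // The searcher executes its leaf slices on the shared slice pool.
        IndexSearcher searcher = new IndexSearcher(reader, SLICE_POOL);
        // ... run the request's query with searcher.search(...) ...
        return null;
      }
    });
  }
}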

For this, the derived implementation of IndexSearcher could take an input
value in its constructor to control the slice count computation. However,
even though the slices method is protected, it is called from the
constructor of the base IndexSearcher class, which prevents the derived
class from using the passed-in input.

To achieve this I can think of the following ways (in order of preference)
and would like to submit a pull request. But first I wanted some feedback
on whether option 1 looks fine or whether I should take some other approach.

1. Provide another constructor in IndexSearcher which takes 4 input
parameters (a usage sketch follows after option 2 below):

protected IndexSearcher(IndexReaderContext context, Executor executor,
    SliceExecutor sliceExecutor,
    Function<List<LeafReaderContext>, LeafSlice[]> sliceProvider)

2. Make the `leafSlices` member protected and non-final. After it is
initialized by IndexSearcher (using Lucene's default mechanism), the
derived implementation can update it again if need be (for example, based
on an input parameter to its own constructor). Also make the constructor
taking a SliceExecutor input protected, so that a derived implementation
can provide its own SliceExecutor. This mechanism involves redundant
computation of leafSlices.
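
If option 1 were added, an extension could pass a lambda that captures its
own constructor parameters, which sidesteps the problem of slices() being
invoked from the base constructor. Rough sketch only: the 4-argument
constructor and a public SliceExecutor are the proposal, not existing API,
and the class and helper names below are made up.

import java.util.List;
import java.util.concurrent.Executor;

import org.apache.lucene.index.IndexReaderContext;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;

public class SliceCappedIndexSearcher extends IndexSearcher {

  public SliceCappedIndexSearcher(IndexReaderContext context, Executor executor,
      SliceExecutor sliceExecutor, int maxSliceCount) {
    // The slice provider is a plain lambda, so it can capture maxSliceCount
    // even though it runs while the base constructor is still executing.
    super(context, executor, sliceExecutor,
        (List<LeafReaderContext> leaves) -> computeSlices(leaves, maxSliceCount));
  }

  private static LeafSlice[] computeSlices(List<LeafReaderContext> leaves, int maxSliceCount) {
    // ... custom grouping bounded by maxSliceCount ...
    return new LeafSlice[0]; // placeholder
  }
}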


Thanks,
Sorabh
Re: Custom SliceExecutor and slices computation in IndexSearcher
Hi Sorabh,
You'll want to override the protected slices method to include your custom
logic for creating the leaf slices. Your IndexSearcher extension can also
retrieve the slices through the public getSlices method. I don't understand
what makes the additional constructor necessary; could you clarify that for
me?

One thing that may make sense is making the SliceExecutor extensible.
Currently it is package-private, and I can see how users may want to
provide their own implementation when it comes to handling rejections or
executing on the caller thread in certain scenarios. Possibly even the task
creation and the coordination of task execution could be moved to the
SliceExecutor too.
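
For instance, one policy a user-supplied implementation could encode is
falling back to the caller thread when the pool rejects a task. A minimal
sketch against plain java.util.concurrent, since SliceExecutor itself is
package-private today (the class name is made up):

import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

final class CallerRunsOnRejection implements Executor {
  private final Executor delegate;

  CallerRunsOnRejection(Executor delegate) {
    this.delegate = delegate;
  }

  @Override
  public void execute(Runnable task) {
    try {
      delegate.execute(task);
    } catch (RejectedExecutionException e) {
      // Apply backpressure by running the work on the submitting thread.
      task.run();
    }
  }
}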

Cheers
Luca

Re: Custom SliceExecutor and slices computation in IndexSearcher
Hi Luca,
Thanks for your reply. Sharing an example below for clarity.

import java.util.List;
import java.util.concurrent.Executor;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;

public class CustomIndexSearcher extends IndexSearcher {

  public CustomIndexSearcher(IndexReader reader, Executor executor, int maxSliceCount) {
    super(reader, executor);
  }

  @Override
  protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
    // Cannot use maxSliceCount here to control the custom logic, as this is
    // called from the constructor of the super class.
    // I want to use parameter[s] from the constructor input to control this
    // slice computation.
    return super.slices(leaves);
  }
}

Yes, the SliceExecutor class would become public. We will also need a
constructor in IndexSearcher which can take an implementation from the
extensions.

Thanks,
Sorabh


Re: Custom SliceExecutor and slices computation in IndexSearcher
Hi Sorabh,
thanks for explaining. I see what you mean; it does get awkward to
customize how slices are created. If the plan is to make SliceExecutor
public and extensible, would it make sense to figure out what its public
methods should be, and include the slice creation in there so it is
detached from the IndexSearcher? I think that in practice there is a
correlation between how slices are created and how they are executed (e.g.
you may want to limit the number of slices created and execute each one on
a separate thread, or possibly have more slices than threads and reuse the
same thread to execute multiple slices). Moving the slice creation to the
component responsible for their execution would give users the desired
flexibility to customize their logic without having to poke at
IndexSearcher constructors. What do you think? I personally prefer this
option over adding a function argument to the existing searcher
constructor, or over making the slices non-final, which affects the
inter-segment concurrency design (slices should not be mutable).
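
To make that concrete, the public surface of such a component might look
roughly like this. The interface name and methods are hypothetical, just
to show slice creation and execution living side by side:

import java.util.List;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher.LeafSlice;

public interface SliceExecutionPolicy {

  // Decide how leaf contexts are grouped into work units.
  LeafSlice[] createSlices(List<LeafReaderContext> leaves);

  // Run the per-slice tasks, e.g. one task per thread, or several slices
  // reusing the same thread when there are more slices than threads.
  void invokeAll(List<Runnable> sliceTasks);
}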

Cheers
Luca

Re: Custom SliceExecutor and slices computation in IndexSearcher
Hi Luca,
Thanks for the suggestion. Let me explore it further and I will get back.
I agree that making the slices non-final would require consumers to ensure
slices are not mutated after construction time, so it is not preferred.

Thanks,
Sorabh



Re: Custom SliceExecutor and slices computation in IndexSearcher
Hi Luca,
I looked into moving the slice computation to SliceExecutor and using that
in the default case as well. This way the package-private constructor
taking a SliceExecutor can be exposed and utilized by different extensions
to customize both the slice computation and its execution. I have created a
GitHub issue[1] and PR[2] to share the changes and will look forward to the
feedback.

[1]: https://github.com/apache/lucene/issues/12347
[2]: https://github.com/apache/lucene/pull/12348

Thanks,
Sorabh

Re: Custom SliceExecutor and slices computation in IndexSearcher
Thanks Sorabh,
I will have a look in the coming days.
