Hi,
Regarding the earlier code references and the discussion among other
Lucene committers, I have to clarify a few points:
* If you run the query multithreaded (per segment), that is, when you
pass an Executor to IndexSearcher, the order is not predictable,
plain and simple.
* If you use Solr, a single query is not multithreaded. Solr works on
shards and parallelizes across them, but it does not parallelize the
search on a single index.
* If you want to control the order of segments when searching, there's
an easy way with pure Lucene; Solr would need to be patched:
o Don't pass an Executor (see above).
o When constructing the IndexSearcher, don't simply pass the
IndexReader, but "customize" it first. There are two ways to do
this: (a) Take the existing IndexReader and get all leaf
segments from it (IndexReader#leaves() call). Sort the leaves in
the order you want them searched, and then create a MultiReader
on those sorted segments. (b) Alternatively, use
DirectoryReader#open() with a Comparator to sort the segments.
You could order them in reverse by their segment ID.
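
A minimal sketch of both options in plain Lucene could look like the
following. It is an illustration under stated assumptions, not a tested
patch: the class name NewestSegmentFirst and its helper methods are made
up for this example, and "segment ID" is interpreted here as the base-36
counter in the segment name (e.g. "_a3"), which assumes every leaf is a
SegmentReader. Option (b) also assumes a Lucene version that has the
DirectoryReader#open(Directory, Comparator<LeafReader>) overload.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.SegmentReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;

// Hypothetical helper class, not part of Lucene or Solr.
final class NewestSegmentFirst {

  // Assumption: segment names are "_" plus a base-36 counter, so a larger
  // number means the segment was created later.
  static long segmentCounter(LeafReader leaf) {
    if (leaf instanceof SegmentReader) {
      String name = ((SegmentReader) leaf).getSegmentName();
      return Long.parseLong(name.substring(1), Character.MAX_RADIX);
    }
    return -1L; // unknown leaf type: sorts last
  }

  // Highest counter (most recently created segment) first.
  static final Comparator<LeafReader> NEWEST_FIRST =
      (a, b) -> Long.compare(segmentCounter(b), segmentCounter(a));

  // Option (a): re-sort the leaves of an existing reader via MultiReader.
  static IndexSearcher viaMultiReader(IndexReader reader) throws IOException {
    List<LeafReader> leaves = new ArrayList<>();
    for (LeafReaderContext ctx : reader.leaves()) {
      leaves.add(ctx.reader());
    }
    leaves.sort(NEWEST_FIRST);
    // closeSubReaders=false: the original reader still owns the segments.
    MultiReader sorted = new MultiReader(leaves.toArray(new IndexReader[0]), false);
    // No Executor is passed, so the leaves are searched one after another.
    return new IndexSearcher(sorted);
  }

  // Option (b): let DirectoryReader sort the segments when opening.
  static IndexSearcher viaLeafSorter(Directory dir) throws IOException {
    DirectoryReader reader = DirectoryReader.open(dir, NEWEST_FIRST);
    return new IndexSearcher(reader);
  }
}
```

Note that a freshly merged segment also gets a new, higher counter, so
with this comparator a large merged segment may sort ahead of smaller
segments that hold more recent updates; a different comparator may fit
better in that case.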
Either way, Solr needs to be patched; there are no API hooks for this.
You may be able to subclass SolrIndexSearcher, but you would still need
to hook it into the Solr control flow.
Uwe
On 2023-05-08 at 16:47, Wei wrote:
> Hi Michael,
>
> I am applying early termination with Solr's EarlyTerminatingCollector
> (https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java),
> which triggers EarlyTerminatingCollectorException in SolrIndexSearcher
> (https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281).
>
> Thanks,
> Wei
>
>
> On Thu, May 4, 2023 at 11:47 AM Michael Sokolov <msokolov@gmail.com> wrote:
>
>> Yes, sorry I didn't mean to imply you couldn't control this if you
>> want to. I guess in the typical setup it is not predictable. How are
>> you applying early termination? Are you using a standard Lucene
>> Collector or do you have your own?
>>
>> On Thu, May 4, 2023 at 2:03 PM Patrick Zhai <zhai7631@gmail.com> wrote:
>>> Hi Mike,
>>> Just want to mention that if the user chooses to use a single thread to
>>> index and uses LogXXMergePolicy, then the document order will be
>>> preserved as index order.
>>>
>>>
>>>
>>> On Thu, May 4, 2023 at 10:04 AM Wei <weiwang19@gmail.com> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> We are interested in the segment sequence for early termination. In our
>>>> case there is always a large dominant segment after an index rebuild,
>>>> then many small segments are generated by continuous updates as time
>>>> goes by. When early termination is applied, the limit can be reached
>>>> just by traversing the dominant segment alone, and the newer, smaller
>>>> segments don't get a chance. If we can control the segment sequence so
>>>> that the newer segments are visited first, the documents with recent
>>>> updates can be retrieved with early termination. Do you think this
>>>> makes sense? Any suggestion is appreciated.
>>>>
>>>> Thanks,
>>>> Wei
>>>>
>>>> On Thu, May 4, 2023 at 3:33 AM Michael Sokolov <msokolov@gmail.com> wrote:
>>>>> There is no meaning to the sequence. The segments are created
>>>>> concurrently by many threads, and the merge process will merge them
>>>>> without regard to any ordering.
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 3, 2023, 1:09 PM Patrick Zhai <zhai7631@gmail.com> wrote:
>>>>>> For that part I'm not entirely sure; if other folks know, please
>>>>>> chime in :)
>>>>>>
>>>>>> On Wed, May 3, 2023 at 8:48 AM Wei <weiwang19@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Patrick! In the default case, when no LeafSorter is provided,
>>>>>>> are the segments traversed in the order of creation time, i.e. is the
>>>>>>> oldest segment always visited first?
>>>>>>>
>>>>>>> Wei
>>>>>>>
>>>>>>> On Tue, May 2, 2023 at 7:22 PM Patrick Zhai <zhai7631@gmail.com> wrote:
>>>>>>>> Hi Wei,
>>>>>>>> Lucene in general iterates through the index in the order of what is
>>>>>>>> recorded in the SegmentInfos
>>>>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L140>.
>>>>>>>> And at search time, you can specify the order using LeafSorter
>>>>>>>> <https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java#L75>
>>>>>>>> when you're opening the IndexReader.
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>> On Tue, May 2, 2023 at 5:28 PM Wei <weiwang19@gmail.com> wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> We have an index that has multiple segments generated by continuous
>>>>>>>>> updates. Does Lucene have a specific order when iterating through
>>>>>>>>> the segments (assuming a single query thread)? Can the order be
>>>>>>>>> customized so that the most recently generated segments are searched
>>>>>>>>> first?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Wei
>>>>>>>>>
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de eMail:uwe@thetaphi.de