Mailing List Archive

Question about index segment search order
Hello,

We have an index that has multiple segments generated by continuous
updates. Does Lucene follow a specific order when iterating through the
segments (assuming a single query thread)? Can the order be customized so
that the most recently generated segments are searched first?

Thanks,
Wei
Re: Question about index segment search order
Hi Wei,
Lucene in general iterates through the index in the order recorded in
the SegmentInfos
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java#L140>.
At search time, you can specify the order using a LeafSorter (a
Comparator<LeafReader>)
<https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DirectoryReader.java#L75>
when you're opening the IndexReader.
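
As a rough illustration of the leaf sorter mentioned above, the sketch below opens a reader through DirectoryReader.open(Directory, Comparator<LeafReader>). It assumes a Lucene version that exposes this overload (the linked source does). Ordering by maxDoc, smallest segment first, is only an assumed heuristic for "newer segments first"; the class name, method name, and the index path taken from args[0] are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Comparator;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class LeafSorterExample {

    /** Opens a reader whose leaves are ordered smallest segment first (assumed heuristic). */
    static DirectoryReader openSmallestFirst(Directory dir) throws IOException {
        Comparator<LeafReader> smallestFirst = Comparator.comparingInt(LeafReader::maxDoc);
        return DirectoryReader.open(dir, smallestFirst);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a hypothetical path to an existing index directory.
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
             DirectoryReader reader = openSmallestFirst(dir)) {
            // No Executor is passed, so the search visits leaves sequentially in this order.
            IndexSearcher searcher = new IndexSearcher(reader);
            for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
                System.out.println(ctx.ord + ": maxDoc=" + ctx.reader().maxDoc());
            }
        }
    }
}
```

Any other Comparator<LeafReader> can be substituted if segment size is not a good proxy for recency in a given index.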

Patrick

On Tue, May 2, 2023 at 5:28 PM Wei <weiwang19@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Thanks Patrick! In the default case when no LeafSorter is provided, are the
segments traversed in the order of creation time, i.e. the oldest segment
is always visited first?

Wei

On Tue, May 2, 2023 at 7:22 PM Patrick Zhai <zhai7631@gmail.com> wrote:
> [...]
Re: Question about index segment search order
I'm not entirely sure about that part. If other folks know, please chime
in :)

On Wed, May 3, 2023 at 8:48 AM Wei <weiwang19@gmail.com> wrote:
> [...]
Re: Question about index segment search order
There is no meaning to the sequence. The segments are created concurrently
by many threads, and the merge process will merge them without regard to
any ordering.



On Wed, May 3, 2023, 1:09 PM Patrick Zhai <zhai7631@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Hi Michael,

We are interested in the segment sequence for early termination. In our
case there is always a large dominant segment after an index rebuild, and
then many small segments are generated by continuous updates as time goes
by. When early termination is applied, the limit can be reached while
traversing the dominant segment alone, so the newer, smaller segments
never get a chance. If we can control the segment sequence so that the
newer segments are visited first, the documents with recent updates can be
retrieved even with early termination. Do you think this makes sense? Any
suggestions are appreciated.
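
To make the concern above concrete, here is a toy model in plain Java. It is not Lucene or Solr code: the Segment class, the collect method, and the document names are all hypothetical. It only mimics a collector that counts hits across all segments and stops once a limit is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of early termination over segments; not Lucene code. */
class EarlyTerminationDemo {

    /** Hypothetical stand-in for a segment: a name plus the matching documents it holds. */
    static final class Segment {
        final String name;
        final List<String> matches;

        Segment(String name, List<String> matches) {
            this.name = name;
            this.matches = matches;
        }
    }

    /**
     * Visits segments in the given order and stops as soon as maxHits
     * documents have been collected in total, counted across all segments.
     */
    static List<String> collect(List<Segment> segments, int maxHits) {
        List<String> collected = new ArrayList<>();
        for (Segment segment : segments) {
            for (String doc : segment.matches) {
                if (collected.size() >= maxHits) {
                    return collected; // early termination
                }
                collected.add(doc);
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        Segment big = new Segment("big", Arrays.asList("old-1", "old-2", "old-3", "old-4"));
        Segment small1 = new Segment("small-1", Arrays.asList("new-1"));
        Segment small2 = new Segment("small-2", Arrays.asList("new-2"));

        // Dominant segment first: the limit is used up before the small segments are seen.
        List<Segment> dominantFirst = Arrays.asList(big, small1, small2);
        System.out.println(collect(dominantFirst, 3)); // [old-1, old-2, old-3]

        // Smallest segments first: the recent documents are collected before the limit is hit.
        List<Segment> smallestFirst = new ArrayList<>(dominantFirst);
        smallestFirst.sort(Comparator.comparingInt((Segment s) -> s.matches.size()));
        System.out.println(collect(smallestFirst, 3)); // [new-1, new-2, old-1]
    }
}
```

With a limit of 3, the first ordering never reaches the two small segments, while the second returns both recent documents.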

Thanks,
Wei

On Thu, May 4, 2023 at 3:33 AM Michael Sokolov <msokolov@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Hi Mike,
Just want to mention that if the user indexes with a single thread and
uses a LogXXMergePolicy (e.g. LogByteSizeMergePolicy or LogDocMergePolicy),
then the document order will be preserved as the index order.



On Thu, May 4, 2023 at 10:04 AM Wei <weiwang19@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Yes, sorry I didn't mean to imply you couldn't control this if you
want to. I guess in the typical setup it is not predictable. How are
you applying early termination? Are you using a standard Lucene
Collector or do you have your own?

On Thu, May 4, 2023 at 2:03 PM Patrick Zhai <zhai7631@gmail.com> wrote:
> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Question about index segment search order
Hi Michael,

I am applying early termination with Solr's EarlyTerminatingCollector
<https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java>,
which triggers EarlyTerminatingCollectorException in SolrIndexSearcher
<https://github.com/apache/solr/blob/d9ddba3ac51ece953d762c796f62730e27629966/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L281>.

Thanks,
Wei


On Thu, May 4, 2023 at 11:47 AM Michael Sokolov <msokolov@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Maybe raise this issue on solr-dev then? I'm not familiar with how that
collector works. Does it count hits across all segments, or only within a
single segment?

On Tue, May 9, 2023 at 1:36 PM Wei <weiwang19@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Hi Michael,

Yes, the collector counts hits across all segments. Thanks for the
suggestion; I'm also asking the question on solr-dev.

Wei

On Thu, May 11, 2023 at 11:57 AM Michael Sokolov <msokolov@gmail.com> wrote:
> [...]
Re: Question about index segment search order
Hi,

in reference to the previous code references and discussions from other
Lucene committers, I have to clarify:

  * If you run the query multithreaded (per segment), which is what
    happens when you pass an Executor to IndexSearcher, the order is not
    predictable, plain and simple.
  * If you use Solr, a single query is not multithreaded. Solr works on
    shards and parallelizes across them, but it does not parallelize the
    search on a single index.
  * If you want to control the order of segments when searching,
    there's an easy way with pure Lucene (Solr would need to be patched):
      o Don't pass an Executor (see above).
      o When constructing the IndexSearcher, don't simply pass the
        IndexReader but "customize" it instead. There are two ways to do
        this: (a) Take the existing IndexReader and get all leaf
        segments from it (IndexReader#leaves() call). Sort the leaves
        in the order you would like them to be searched and then create
        a MultiReader on those sorted segments. (b) Alternatively, use
        DirectoryReader#open() with a Comparator to sort the segments.
        You could order them in reverse by their segment ID.

Anyway, Solr needs to be patched; there are no API hooks for this. You may
be able to subclass SolrIndexSearcher, but you would still need to hook it
into the Solr control flow.
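
Option (a) above could be sketched roughly as follows. The class and method names are hypothetical. Sorting by maxDoc, smallest segment first, is an assumption chosen for illustration, not the reverse segment-ID order suggested above. The sketch relies on the MultiReader(IndexReader[], boolean) constructor.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.MultiReader;

class SortedLeavesExample {

    /** Wraps the leaves of an existing reader in a MultiReader, smallest segment first. */
    static MultiReader smallestFirst(IndexReader reader) throws IOException {
        List<IndexReader> leaves = new ArrayList<>();
        for (LeafReaderContext ctx : reader.leaves()) {
            leaves.add(ctx.reader());
        }
        // Assumed ordering: ascending by maxDoc, so small (often newer) segments come first.
        leaves.sort(Comparator.comparingInt(IndexReader::maxDoc));
        // closeSubReaders=false: the original reader stays responsible for closing the leaves.
        return new MultiReader(leaves.toArray(new IndexReader[0]), false);
    }
}
```

The result can be passed to new IndexSearcher(...) without an Executor, so the leaves are searched sequentially in the sorted order. The original reader has to stay open for as long as the MultiReader is in use.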

Uwe

On 08.05.2023 at 16:47, Wei wrote:
> [...]
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:uwe@thetaphi.de