Mailing List Archive: Consistent NRT searching with SearcherLifetimeManager and multiple instances

Dec 13, 2023, 1:46 PM

Hi lucene-users,

We use the lucene-replicator to have a single indexing node push commits and NRT updates
to a set of replicas.

Currently, each replica has the full dataset - there is no sharding.

We use a SearcherLifetimeManager to try to provide consistent pagination over results.

So when we present the first page of results, we return the result of `record(IndexSearcher)` (call it version X) to the client,
with the expectation that at a later time (but not too much later) they might request page 2 of results with version X.

This works fine for a single instance, since the SearcherLifetimeManager keeps the remembered version around.
However, with multiple instances, this doesn't seem to work at all -
your first request goes to replica A, who calls `record(searcher) -> X`.

The second request likely goes to a different instance B,
whose lifetime manager never saw a call to `record` at all - so the `acquire(X)` fails and returns null.

Surely there must be a way to solve this -
how do you implement consistent versioned searching like SearcherLifetimeManager, but with multiple Lucene replicas
who otherwise do not coordinate about which NRT versions get opened or recorded?

Thanks for any advice,
Steven

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Consistent NRT searching with SearcherLifetimeManager and multiple instances

lucene at mikemccandless

Dec 14, 2023, 2:57 AM

Hi Steven,

Great question! I'm so glad to hear your app is providing consistent
pagination :) I've long felt Lucene (with NRT segment replication) could
do a great job at this, yet so few apps manage to implement it. Every time
I interact with a search engine and go to the next page it irks me that I
might be missing some results...

First off, the point-in-time IndexSearchers are keyed into the
SearcherLifetimeManager by their underlying DirectoryReader.getVersion(),
which returns the long value from the underlying SegmentInfos.getVersion().

This is good news because it means all of your replicas will see the same
long version mapping to the same point-in-time view of the index,
even across replicas, since that same SegmentInfos is sent to all replicas
by the primary node.

Second, each of your replicas should simply assume that every point-in-time
IndexSearcher may be used at any time by an incoming search request, and
enroll all refreshed IndexSearchers into the local
SearcherLifetimeManager. This way, no matter where the follow-on requests
go, that replica will have that IndexSearcher version. This is not as
costly as it sounds because a refreshed IndexSearcher will in general share
nearly all of its segments with the prior one(s).
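
A minimal sketch of the "record every refresh" idea, using only the JDK.
The `Replica` class and its `onRefresh` hook are hypothetical stand-ins for
a replica's SearcherLifetimeManager and refresh path, not Lucene classes,
and the String values merely represent point-in-time searchers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RecordEveryRefresh {
    /** Hypothetical stand-in for one replica's version-keyed searcher registry. */
    public static final class Replica {
        private final Map<Long, String> searchersByVersion = new ConcurrentHashMap<>();

        /** Called after every refresh, whether or not a query ever used this view. */
        public void onRefresh(long version) {
            searchersByVersion.putIfAbsent(version, "searcher@" + version);
        }

        /** Returns the point-in-time view for the version, or null if it is unknown. */
        public String acquire(long version) {
            return searchersByVersion.get(version);
        }
    }

    public static void main(String[] args) {
        Replica a = new Replica();
        Replica b = new Replica();
        // The primary publishes versions 7 and 8; both replicas record each on refresh.
        for (long version : new long[] {7L, 8L}) {
            a.onRefresh(version);
            b.onRefresh(version);
        }
        // Page 1 was served by A at version 7; page 2 lands on B and still resolves.
        System.out.println(b.acquire(7L));
        // A version that no replica ever refreshed to is still a miss.
        System.out.println(b.acquire(99L));
    }
}
```

Because every replica enrolls every version it refreshes to, a lookup no
longer depends on which replica served the first page.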

This requires a periodic refresh schedule, and all replicas should quickly
refresh when the primary publishes a new point-in-time SegmentInfos.

There is some small risk if replicas do not refresh consistently around
the same time, and page 2 for a query goes to a replica that has not yet
refreshed. This ought to be rare, since it'd mean a human loaded page 1
from a replica that had already refreshed, consumed the results, then
clicked on page 2, and by then replicas should (typically) all have
refreshed. When it happens, you could either have the query wait for the
refresh to complete (somewhat dangerous since such queries could pile up
if something is seriously wrong with that node and its refreshing is
sluggish), or, simply retry the query to another replica: eventually it
will find a replica that has the point-in-time IndexSearcher already
refreshed.
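
The retry option could look roughly like the sketch below. It assumes each
replica can be modeled as a function from a version to a page of results
that returns null when the replica does not hold that version; the names
`searchAtVersion`, `stale` and `fresh` are made up for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.LongFunction;

public class RetryAcrossReplicas {
    /**
     * Tries each replica in turn and returns the first non-null result.
     * A null result means the replica does not (yet) hold that version.
     */
    public static Optional<String> searchAtVersion(
            List<LongFunction<String>> replicas, long version) {
        for (LongFunction<String> replica : replicas) {
            String page = replica.apply(version);
            if (page != null) {
                return Optional.of(page);
            }
        }
        // No replica has this version; the caller decides how to degrade.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // One replica has only refreshed up to version 7, the other up to 8.
        LongFunction<String> stale = v -> v <= 7 ? "stale-replica page @" + v : null;
        LongFunction<String> fresh = v -> v <= 8 ? "fresh-replica page @" + v : null;
        System.out.println(searchAtVersion(List.of(stale, fresh), 8L));
    }
}
```

Bounding the loop to one pass over the replica list keeps a sick cluster
from turning a single query into an unbounded retry storm.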

Mike McCandless

http://blog.mikemccandless.com

Re: Consistent NRT searching with SearcherLifetimeManager and multiple instances

stevenschlansker at gmail

Dec 14, 2023, 11:13 AM

Thanks for the reply!

On Thu, Dec 14, 2023 at 2:58 AM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> Hi Steven,
>
> Great question! I'm so glad to hear your app is providing consistent pagination :) I've long felt Lucene (with NRT segment replication) could do a great job at this, yet so few apps manage to implement it. Every time I interact with a search engine and go to the next page it irks me that I might be missing some results...

Yes, this bugged me too. Everyone else looks at me like I'm crazy but
darnit my search engine needs to be correct to start with or what hope
do we have with the higher level features! :)

>
> First off, the point-in-time IndexSearchers are keyed into the SearcherLifetimeManager by their underlying DirectoryReader.getVersion(), which returns the long value from the underlying SegmentInfos.getVersion().
>
> This is good news because it means all of your replicas will see the same long version mapping to the same point-in-time view of the index, even across replicas, since that same SegmentInfos is sent to all replicas by the primary node.

Good news indeed.

>
> Second, each of your replicas should simply assume that every point-in-time IndexSearcher may be used at any time by an incoming search request, and enroll all refreshed IndexSearchers into the local SearcherLifetimeManager. This way, no matter where the follow-on requests go, that replica will have that IndexSearcher version. This is not as costly as it sounds because a refreshed IndexSearcher will in general share nearly all of its segments with the prior one(s).

OK. Right now we update about every second, and retain 15 minutes of
memory, so that would be 900 searchers right now for us.
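
The arithmetic behind that figure, as a small sketch. The helper name
`retainedSearchers` is made up. On the Lucene side, my understanding is that
the application has to call `SearcherLifetimeManager.prune(...)` periodically
itself (for example with `SearcherLifetimeManager.PruneByAge`) to enforce the
window, so treat the retention period here as an assumption about how pruning
is configured.

```java
import java.time.Duration;

public class RetentionEstimate {
    /** Upper bound on live searchers when one is recorded per refresh. */
    public static long retainedSearchers(Duration refreshInterval, Duration retention) {
        return retention.toMillis() / refreshInterval.toMillis();
    }

    public static void main(String[] args) {
        // One refresh per second, 15 minutes of history: 15 * 60 = 900.
        long count = retainedSearchers(Duration.ofSeconds(1), Duration.ofMinutes(15));
        System.out.println(count); // 900
    }
}
```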

>
> This requires a periodic refresh schedule, and all replicas should quickly refresh when the primary publishes a new point-in-time SegmentInfos.
>
> There is some small risk if replicas do not refresh consistently around the same time, and page 2 for a query goes to a replica that has not yet refreshed. This ought to be rare, since it'd mean a human loaded page 1 from a replica that had already refreshed, consumed the results, then clicked on page 2, and by then replicas should (typically) all have refreshed. When it happens, you could either have the query wait for the refresh to complete (somewhat dangerous since such queries could pile up if something is seriously wrong with that node and its refreshing is sluggish), or, simply retry the query to another replica: eventually it will find a replica that has the point-in-time IndexSearcher already refreshed.

Our search engine is expected to be correct at faster than human speed
- for example we have end-to-end tests that drive our app, make
changes, and expect to immediately take further actions based on the
result of those changes. We're trying to avoid "sleep for a while to
let search catch up" states, or at least have them look more like a
condition with wakeup rather than a timed sleep.
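
For illustration, a "condition with wakeup" could be sketched as below using
only `java.util.concurrent`. `VersionGate`, `publish` and `awaitVersion` are
hypothetical names rather than Lucene or application APIs; the assumption is
that the refresh path calls `publish` once a version is searchable.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class VersionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition advanced = lock.newCondition();
    private long currentVersion = -1L;

    /** Called by the refresh path once a new version is searchable. */
    public void publish(long version) {
        lock.lock();
        try {
            if (version > currentVersion) {
                currentVersion = version;
                advanced.signalAll(); // wake every waiter so each can re-check
            }
        } finally {
            lock.unlock();
        }
    }

    /** Waits until the wanted version is searchable; false on timeout or interrupt. */
    public boolean awaitVersion(long wanted, long timeout, TimeUnit unit) {
        long nanosLeft = unit.toNanos(timeout);
        lock.lock();
        try {
            while (currentVersion < wanted) {
                if (nanosLeft <= 0L) {
                    return false; // bounded wait, so stuck refreshes cannot pile up queries
                }
                nanosLeft = advanced.awaitNanos(nanosLeft);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        VersionGate gate = new VersionGate();
        Thread refresher = new Thread(() -> gate.publish(42L));
        refresher.start();
        System.out.println(gate.awaitVersion(42L, 5, TimeUnit.SECONDS));
        refresher.join();
    }
}
```

The timeout keeps the wait bounded, which addresses the concern about
queries piling up behind a node whose refreshing is sluggish.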

Right now, each replica sends an HTTP long-poll to the primary in a
loop. When the primary refreshes, we wake all the long-pollers and
send them the new SegmentInfos.

It looks like after we push the new NRT point, the replica node will
swap the SegmentInfos into its SearcherManager, and then try to
refresh. This looks like it will call refresh listeners.
So, I think our implementation could be as simple as adding a
SearcherManager refresh listener to each replica that acquires and
records every time we load new infos - and it might all just work! Maybe.
We'll see... If there are still troubles, we can then add
retry-to-different-instance as well.

Thanks for the help :)
