On Sun, Apr 23, 2023 at 6:19 AM Uwe Schindler <uwe@thetaphi.de> wrote:
> Having the sequence number public in API does not bring any benefit, as
> you cannot use it for anything.
>
Actually there are some interesting use cases for sequence numbers:
They let the caller know the effective order of concurrent indexing
operations. This is useful for applications that sometimes update the same
document from several threads at once. With optimistic concurrency, such an
application can re-index the document whenever the effective order
contradicts its own external version tracking (i.e. an older version landed
after a newer one). OpenSearch has an array of locks to implement
pessimistic concurrency (ensuring that the same id is never updated
concurrently), but when conflicts are rare, an optimistic implementation
based on Lucene's sequence numbers is likely more efficient.
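A minimal sketch of the optimistic approach follows. It is plain Java so it runs without Lucene on the classpath. `FakeIndex`, `Write`, and `OptimisticUpdater` are hypothetical names. `FakeIndex.updateDocument` stands in for `IndexWriter.updateDocument`, which returns a `long` sequence number in Lucene versions that have sequence numbers. For each id, the application remembers the write with the highest sequence number it has seen. If that write carries an older external version than a write with a lower sequence number, the newer version is re-indexed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for IndexWriter: applies an update and returns a
// sequence number. Doing both under one lock keeps the numbers consistent
// with the order in which the writes actually took effect.
final class FakeIndex {
    private long nextSeqNo = 1;
    private final Map<String, String> docs = new HashMap<>();

    synchronized long updateDocument(String id, String body) {
        docs.put(id, body);
        return nextSeqNo++;
    }

    synchronized String get(String id) { return docs.get(id); }
}

// One applied update: the index's sequence number, the application's
// external version, and the document body (needed if we must re-index).
final class Write {
    final long seqNo;
    final long version;
    final String body;

    Write(long seqNo, long version, String body) {
        this.seqNo = seqNo;
        this.version = version;
        this.body = body;
    }
}

// Hypothetical optimistic updater: no locks around indexing, conflicts are
// detected afterwards by comparing sequence numbers with external versions.
final class OptimisticUpdater {
    private final FakeIndex index;
    // Per id: the highest-seqNo write observed so far, i.e. what the index holds.
    private final Map<String, Write> latest = new ConcurrentHashMap<>();

    OptimisticUpdater(FakeIndex index) { this.index = index; }

    void update(String id, long version, String body) {
        long seqNo = index.updateDocument(id, body);
        onApplied(id, new Write(seqNo, version, body));
    }

    // If the write that took effect last (larger seqNo) has the older external
    // version, the index holds stale content, so re-index the newer version.
    void onApplied(String id, Write mine) {
        Write[] reindex = new Write[1];
        latest.compute(id, (k, prev) -> {
            if (prev == null) return mine;
            Write later = prev.seqNo > mine.seqNo ? prev : mine;
            Write earlier = (later == prev) ? mine : prev;
            if (earlier.version > later.version) reindex[0] = earlier;
            return later;
        });
        // Re-index outside compute() so we never re-enter the map for this key.
        if (reindex[0] != null) update(id, reindex[0].version, reindex[0].body);
    }
}
```

With this design, the common no-conflict path costs only one map update per write. The price is an extra re-index on the rare occasions when two updates of the same id take effect in the wrong order.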
Another use case is precise replay of indexing operations (e.g. from a
Kinesis queue, a transaction log, or whatever) when recovering from a commit
point. Upon commit, you know exactly which indexing events the commit
captured, so on recovery you can resume indexing from precisely the next
indexing event. This doesn't matter for idempotent updates, but for other
cases, like append-only indexing, it is useful and performant.
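Below is a sketch of the replay bookkeeping, again using hypothetical names (`ReplayTracker`, `onIndexed`, `onCommit`, `resumeOffset`). It assumes a single consumer applies queue events in offset order, so sequence numbers increase along with queue offsets. It relies on `IndexWriter.commit()` returning a sequence number such that every operation with a sequence number at or below it is included in the commit. How the resulting offset is persisted across a crash is left out.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical tracker mapping index sequence numbers back to queue offsets,
// so that after a commit we know exactly where replay must resume.
final class ReplayTracker {
    // seqNo returned by the index for an event -> that event's queue offset
    private final NavigableMap<Long, Long> offsetBySeqNo = new TreeMap<>();
    // Offset of the last event known to be in a commit (-1: nothing committed)
    private long durableOffset = -1;

    synchronized void onIndexed(long seqNo, long queueOffset) {
        offsetBySeqNo.put(seqNo, queueOffset);
    }

    // commitSeqNo: all operations with seqNo <= commitSeqNo are in the commit.
    synchronized void onCommit(long commitSeqNo) {
        Map.Entry<Long, Long> last = offsetBySeqNo.floorEntry(commitSeqNo);
        if (last != null) durableOffset = last.getValue();
        // Entries at or below the commit point are no longer needed.
        offsetBySeqNo.headMap(commitSeqNo, true).clear();
    }

    // Resume from the event right after the last committed one.
    synchronized long resumeOffset() { return durableOffset + 1; }
}
```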
I also don't see why flush should return a sequence number -- it is not an
externally visible event. Patrick, maybe you had an interesting use case in
mind? Note that commit also writes (and fsyncs) the next segments_N file,
which is what makes all the newly written/fsync'd segments visible to the
next reader that is opened.
Mike McCandless
http://blog.mikemccandless.com