Mailing List Archive

Faster advance on Vector Values
Hi,

Our team is using the vector support recently introduced with
Lucene90Codec. We have a use case that needs to quickly scan a segment for
documents that have vectors. While implementing it, we noticed that the
advance function in the class Lucene90VectorReader does a linear search for
the target document. I propose making it faster by implementing a binary
search over the "ordToDoc" array, which would let advance find the target
in logarithmic rather than linear time.
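
For illustration only, here is a minimal sketch of the idea. It is not the
code from the fork or from Lucene: the class OrdToDocIterator and its
fields are hypothetical stand-ins, and it assumes "ordToDoc" is an int
array sorted in ascending order that maps each vector ordinal to its doc ID.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a vector-values iterator; not Lucene's actual class. */
final class OrdToDocIterator {
  /** Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS. */
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Assumed layout: ordToDoc[ord] is the doc ID of the ord-th vector, sorted ascending.
  private final int[] ordToDoc;
  private int ord = -1; // ordinal of the current vector, -1 before iteration starts
  private int doc = -1; // current doc ID, -1 before iteration starts

  OrdToDocIterator(int[] ordToDoc) {
    this.ordToDoc = ordToDoc;
  }

  int docID() {
    return doc;
  }

  /** Moves to the first document >= target that has a vector, in O(log n) time. */
  int advance(int target) {
    // advance only moves forward, so search the ordinals after the current one.
    // Clamp the start so that calls after exhaustion stay within bounds.
    int from = Math.min(ord + 1, ordToDoc.length);
    int idx = Arrays.binarySearch(ordToDoc, from, ordToDoc.length, target);
    if (idx < 0) {
      // Not found: binarySearch returns -(insertionPoint) - 1.
      idx = -(idx + 1);
    }
    ord = idx;
    doc = (ord < ordToDoc.length) ? ordToDoc[ord] : NO_MORE_DOCS;
    return doc;
  }
}
```

With ordToDoc = {2, 5, 9, 40}, advance(5) lands on doc 5 and a following
advance(10) lands on doc 40, each without visiting the ordinals in between.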

I would welcome ideas and suggestions from the community. I have an
implementation of this idea on my private fork and can open a PR if it
sounds reasonable.

Thanks!
Anand Kotriwal
Re: Faster advance on Vector Values
Thanks for the suggestion! This will be a nice improvement for use
cases that retrieve vectors for a sparse set of documents, e.g.
when incorporating a vector-based score as a scoring signal. Would you
mind opening an issue, Anand?

On Sat, Jan 16, 2021 at 9:07 AM Anand Kotriwal <anand.kotriwal@gmail.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Faster advance on Vector Values
Sure! I created https://issues.apache.org/jira/browse/LUCENE-9674 and
attached a PR to the issue.

Thanks,
Anand

On Mon, Jan 18, 2021 at 6:14 AM Michael Sokolov <msokolov@gmail.com> wrote:
