Mailing List Archive

Filtering before a vector search.
Hi,

Thanks for all the work on the vector search!

I was wondering if there was a way using KnnVectorQuery to filter the docs
this query looks at. Right now the searchLeaf method passes in the liveDocs
to LeafReader.searchNearestVectors, but there appears to be no way to have
the KnnVectorQuery operate on a subset of liveDocs.

Thanks,

Joel Bernstein
http://joelsolr.blogspot.com/
Re: Filtering before a vector search.
+1 we should extend the functionality to support any Bits, not just
liveDocs; we need to propose an API. The implementation should not be
too hard - we need to intersect the user-supplied Bits with liveDocs
and use that to filter.
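
As a rough illustration of the intersection described above, here is a minimal Java sketch. It is not a proposed Lucene API: the `Bits` interface below is a local stand-in that mirrors the two methods of Lucene's `org.apache.lucene.util.Bits`, and `AcceptDocsSketch`, `intersect` and `fromBitSet` are hypothetical names chosen for the example. It assumes Lucene's convention that a null liveDocs means the segment has no deletions.

```java
import java.util.BitSet;

final class AcceptDocsSketch {

  /**
   * Hypothetical helper (not a Lucene API): a doc is accepted only if it is
   * live AND matched by the user-supplied filter.
   */
  static Bits intersect(Bits liveDocs, Bits filter) {
    if (liveDocs == null) {
      return filter; // Lucene uses null liveDocs to mean "no deletions"
    }
    if (filter == null) {
      return liveDocs; // assumption: a null filter means "no restriction"
    }
    if (liveDocs.length() != filter.length()) {
      throw new IllegalArgumentException("both Bits must cover the same doc ID range");
    }
    return new Bits() {
      @Override
      public boolean get(int index) {
        return liveDocs.get(index) && filter.get(index);
      }

      @Override
      public int length() {
        return liveDocs.length();
      }
    };
  }

  /** Adapts a java.util.BitSet to the Bits stand-in, for the demo only. */
  static Bits fromBitSet(BitSet bits, int size) {
    return new Bits() {
      @Override
      public boolean get(int index) {
        return bits.get(index);
      }

      @Override
      public int length() {
        return size;
      }
    };
  }

  public static void main(String[] args) {
    int maxDoc = 6;
    BitSet live = new BitSet(maxDoc);
    live.set(0, maxDoc);
    live.clear(2); // doc 2 was deleted
    BitSet filter = new BitSet(maxDoc);
    filter.set(1);
    filter.set(2);
    filter.set(4);

    Bits acceptDocs = intersect(fromBitSet(live, maxDoc), fromBitSet(filter, maxDoc));
    for (int doc = 0; doc < acceptDocs.length(); doc++) {
      if (acceptDocs.get(doc)) {
        System.out.println("accepted doc " + doc); // docs 1 and 4
      }
    }
  }
}

/** Stand-in for Lucene's org.apache.lucene.util.Bits so the sketch compiles without Lucene. */
interface Bits {
  boolean get(int index);

  int length();
}
```

The wrapper evaluates the intersection lazily, one doc at a time, so nothing is materialized up front; whether that or a precomputed bit set is preferable would depend on how often the search consults it.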

On Wed, Jan 19, 2022 at 1:42 PM Joel Bernstein <joelsolr@gmail.com> wrote:
>
> Hi,
>
> Thanks for all the work on the vector search!
>
> I was wondering if there was a way using KnnVectorQuery to filter the docs this query looks at. Right now the searchLeaf method passes in the liveDocs to LeafReader.searchNearestVectors, but there appears to be no way to have the KnnVectorQuery operate on a subset of liveDocs.
>
> Thanks,
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Filtering before a vector search.
Ok, I can create the jira.



Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 19, 2022 at 2:49 PM Michael Sokolov <msokolov@gmail.com> wrote:

> +1 we should extend the functionality to support any Bits, not just
> liveDocs; we need to propose an API. The implementation should not be
> too hard - we need to intersect the user-supplied Bits with liveDocs
> and use that to filter.
Re: Filtering before a vector search.
https://issues.apache.org/jira/browse/LUCENE-10382


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Jan 19, 2022 at 2:59 PM Joel Bernstein <joelsolr@gmail.com> wrote:

> Ok, I can create the jira.
>
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
Re: Filtering before a vector search.
+1 from me too, this will be a really helpful feature. I've done some
background research and found a couple of aspects that are tricky. If the
filter matches only a small percentage of documents, HNSW can quickly
degrade to a brute-force scan, because the graph traversal has to visit
many non-matching nodes before it collects k matching ones. With live docs
this isn't a big problem, because our merge policies help keep deleted docs
down to a reasonable percentage. But with an arbitrary query, you could
easily filter away most documents, leading to a surprisingly slow kNN
search. This blog post from the Weaviate engine has a graph showing a
slowdown past ~20% filter selectivity:
https://towardsdatascience.com/effects-of-filtered-hnsw-searches-on-recall-and-latency-434becf8041c
Looking forward to discussing more on the issue.

Julie
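
To make the selectivity concern above concrete, here is a small Java sketch. It computes the fraction of documents a filter accepts and runs an exact (brute-force) top-k scan over only the accepted docs, whose cost grows with the number of accepted docs rather than with the segment size. This is an illustration under assumptions, not Lucene code and not something proposed in the thread: `FilterSelectivitySketch`, `selectivity`, `exactTopK` and the squared Euclidean distance are all hypothetical choices made for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class FilterSelectivitySketch {

  /** Fraction of the segment's documents that the filter accepts. */
  static double selectivity(BitSet accepted, int maxDoc) {
    return maxDoc == 0 ? 0.0 : (double) accepted.cardinality() / maxDoc;
  }

  /** Squared Euclidean distance; assumed here only as an example similarity. */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Exact top-k over accepted docs only, returned nearest first. */
  static List<Integer> exactTopK(float[][] vectors, BitSet accepted, float[] query, int k) {
    Comparator<Integer> byDistance =
        Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], query));
    // Max-heap on distance: the worst of the current top-k sits on top and is evicted first.
    PriorityQueue<Integer> heap = new PriorityQueue<>(byDistance.reversed());
    for (int doc = accepted.nextSetBit(0); doc >= 0; doc = accepted.nextSetBit(doc + 1)) {
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll();
      }
    }
    List<Integer> result = new ArrayList<>(heap);
    result.sort(byDistance);
    return result;
  }

  public static void main(String[] args) {
    float[][] vectors = {{0f, 0f}, {1f, 0f}, {0f, 2f}, {3f, 3f}, {0.5f, 0.5f}};
    BitSet accepted = new BitSet(vectors.length);
    accepted.set(1);
    accepted.set(2);
    accepted.set(3);

    System.out.println("selectivity = " + selectivity(accepted, vectors.length));
    System.out.println("top-2 = " + exactTopK(vectors, accepted, new float[] {0f, 0f}, 2));
  }
}
```

When the filter accepts only a handful of docs, a scan like this touches just those docs, which is the baseline any filtered graph search would have to beat.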

On Wed, Jan 19, 2022 at 12:10 PM Joel Bernstein <joelsolr@gmail.com> wrote:

> https://issues.apache.org/jira/browse/LUCENE-10382
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/