+1 from me too; this will be a really helpful feature. I've done some
background research and found a couple of aspects that are tricky. If the
filter matches only a small percentage of documents, HNSW search can quickly
degrade to a brute-force scan. With live docs this isn't a big problem,
because our merge policies help keep deleted docs down to a reasonable
percentage. But with an arbitrary query, you could easily filter away most
documents, leading to a surprisingly slow kNN search. This blog post about
the Weaviate engine has a graph showing a slowdown past ~20% filter
selectivity:
https://towardsdatascience.com/effects-of-filtered-hnsw-searches-on-recall-and-latency-434becf8041c

Looking forward to discussing this further on the issue.
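A minimal sketch of how the two ideas in this thread fit together, under stated assumptions. A user-supplied filter is intersected with liveDocs (Michael's proposal further down), and the fraction of documents that survive is the selectivity that matters for HNSW. To keep it compilable without Lucene, it uses a stand-in interface, DocBits, that mirrors the get(int)/length() shape of Lucene's org.apache.lucene.util.Bits. FilterSketch, fromBitSet, intersect, and matchFraction are hypothetical names, not Lucene API.

```java
import java.util.BitSet;

// Stand-in for Lucene's org.apache.lucene.util.Bits (same get/length shape),
// defined here so the sketch compiles without Lucene on the classpath.
interface DocBits {
    boolean get(int index);
    int length();
}

// Hypothetical helper names; this is NOT Lucene API.
final class FilterSketch {
    private FilterSketch() {}

    // Wraps a java.util.BitSet as DocBits covering docs [0, maxDoc).
    static DocBits fromBitSet(BitSet set, int maxDoc) {
        return new DocBits() {
            @Override public boolean get(int index) { return set.get(index); }
            @Override public int length() { return maxDoc; }
        };
    }

    // Accepts a doc only if it passes the user filter AND is live.
    // A null liveDocs means "no deletions" (the Lucene convention),
    // so the filter is returned unchanged.
    static DocBits intersect(DocBits filter, DocBits liveDocs) {
        if (liveDocs == null) {
            return filter;
        }
        return new DocBits() {
            @Override public boolean get(int index) {
                return filter.get(index) && liveDocs.get(index);
            }
            @Override public int length() { return filter.length(); }
        };
    }

    // Fraction of docs in [0, length()) accepted by the given bits.
    // A brute-force O(maxDoc) count, fine for illustration only.
    static double matchFraction(DocBits accepted) {
        int n = accepted.length();
        if (n == 0) {
            return 0.0;
        }
        int count = 0;
        for (int doc = 0; doc < n; doc++) {
            if (accepted.get(doc)) {
                count++;
            }
        }
        return (double) count / n;
    }
}
```

For example, a filter matching docs 0-9 in a 100-doc segment where doc 5 is deleted yields a match fraction of 0.09. A low fraction like this is the situation described above, where HNSW search can degrade toward a brute-force scan.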
Julie
On Wed, Jan 19, 2022 at 12:10 PM Joel Bernstein <joelsolr@gmail.com> wrote:
> https://issues.apache.org/jira/browse/LUCENE-10382
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Jan 19, 2022 at 2:59 PM Joel Bernstein <joelsolr@gmail.com> wrote:
>
>> OK, I can create the Jira issue.
>>
>> Joel Bernstein
>>
>> On Wed, Jan 19, 2022 at 2:49 PM Michael Sokolov <msokolov@gmail.com> wrote:
>>
>>> +1, we should extend the functionality to support any Bits, not just
>>> liveDocs; we need to propose an API. The implementation should not be
>>> too hard: we need to intersect the user-supplied Bits with liveDocs
>>> and use the result to filter.
>>>
>>> On Wed, Jan 19, 2022 at 1:42 PM Joel Bernstein <joelsolr@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > Thanks for all the work on the vector search!
>>> >
>>> > I was wondering whether there is a way to use KnnVectorQuery to filter
>>> > the docs the query looks at. Right now the searchLeaf method passes the
>>> > liveDocs to LeafReader.searchNearestVectors, but there appears to be no
>>> > way to have KnnVectorQuery operate on a subset of liveDocs.
>>> >
>>> > Thanks,
>>> >
>>> > Joel Bernstein