Mailing List Archive: Top docs depend on value of K nearest neighbour

Aug 2, 2023, 2:19 PM

Post #1 of 3

Hi

I use Lucene 9.7.0 but experienced the same behaviour with Lucene 9.6.0
when doing vector search as follows:

I have indexed about 200 vectors (dimension 768)

I build the query as follows

Query query = new KnnFloatVectorQuery("vector-field-name",
queryVector, k);

and do the search as follows:

TopDocs topDocs = searcher.search(query, k);

When I set k=27 then the top doc has a score of 0.7757

When I set the "k" value a little lower, e.g. k=24 then the top doc has
a score of 0.7319 and is not the same document as the one with the score
of 0.7757

Any idea what I might be doing wrong or what I misunderstand?

Why does the value of k have an effect on the returned top doc?

Thanks

Michael

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Top docs depend on value of K nearest neighbour

msokolov at gmail

Aug 3, 2023, 10:49 AM

Post #2 of 3

Well, it is "approximate" KNN and can get caught in local minima
(maxima?). Increasing K has, indirectly, the effect of expanding the
search space, because the minimum score in the priority queue (the score
of the Kth item) is used as a threshold for deciding when to terminate
the search.
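
The termination rule described above can be illustrated with a toy best-first graph search. This is only a sketch under stated assumptions, not Lucene's HNSW code: the class name, the five-node graph, and the 1-D positions are all made up for illustration, and it uses distances (lower is better) instead of similarity scores. The result queue holds at most k entries, and the search stops once the closest unexplored candidate is worse than the worst entry in a full result queue.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical toy example; not Lucene's implementation.
class KnnTerminationSketch {
    // 1-D positions; the query sits at 0, so distance = |position|.
    static final double[] POS = {10, 6, 5, 9, 1};
    // Undirected toy graph with edges 0-1, 0-3, 1-2, 3-4.
    // Node 4 is the true nearest neighbour, but it is only reachable
    // through node 3, which looks unpromising from the entry point.
    static final int[][] NEIGHBOURS = {{1, 3}, {0, 2}, {1}, {0, 4}, {3}};

    static double dist(int node) {
        return Math.abs(POS[node]);
    }

    /** Best-first search from node 0 keeping at most k results; returns the closest node found. */
    static int search(int k) {
        Comparator<Integer> byDist = Comparator.comparingDouble(KnnTerminationSketch::dist);
        PriorityQueue<Integer> candidates = new PriorityQueue<>(byDist);            // closest on top
        PriorityQueue<Integer> results = new PriorityQueue<>(byDist.reversed());   // worst on top
        Set<Integer> visited = new HashSet<>();
        candidates.add(0);
        results.add(0);
        visited.add(0);

        while (!candidates.isEmpty()) {
            int c = candidates.poll();
            // Threshold check: stop when the best remaining candidate is worse
            // than the worst (k-th) entry of a full result queue.
            if (results.size() >= k && dist(c) > dist(results.peek())) {
                break;
            }
            for (int n : NEIGHBOURS[c]) {
                if (!visited.add(n)) {
                    continue; // already seen
                }
                // A neighbour is only explored further if it can enter the result queue.
                if (results.size() < k || dist(n) < dist(results.peek())) {
                    candidates.add(n);
                    results.add(n);
                    if (results.size() > k) {
                        results.poll(); // drop the current worst
                    }
                }
            }
        }

        int best = -1;
        for (int r : results) {
            if (best < 0 || dist(r) < dist(best)) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(search(1)); // prints 2 (stuck in a local minimum)
        System.out.println(search(3)); // prints 4 (the true nearest neighbour)
    }
}
```

With k=1 node 3 never qualifies for the result queue, so the path to node 4 is never explored; with k=3 the looser threshold keeps node 3 around long enough to reach node 4.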


Re: Top docs depend on value of K nearest neighbour

michael.wechner at wyona

Aug 4, 2023, 6:38 AM

Post #3 of 3

Thank you very much for your feedback!

Does there also exist an "ef" value inside Lucene?

"ef - *the size of the dynamic list for the nearest neighbors* (used
during the search). Higher ef leads to more accurate but slower search.
ef cannot be set lower than the number of queried nearest neighbors k.
The value of ef can be anything between k and the size of the dataset."

or is the "k" value *the* "ef" value?

IIUC the relevant "k" value (k1) for the HNSW algorithm is what is set at

Query query = new KnnFloatVectorQuery("vector-field-name",queryVector, k1);

and not the max top docs value (k2) at

TopDocs topDocs = searcher.search(query, k2);

right?

But it should be k2 <= k1, right? Or would you set k1 == k2?
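
One way to explore the k1/k2 question empirically on an index this small (about 200 vectors) might be to compute the exact top k2 by brute force and then check how large k1 has to be before the approximate search returns the same documents. A minimal sketch in plain Java follows; it makes no Lucene calls, the class and method names are hypothetical, and cosine similarity is assumed as the metric (the thread does not say which similarity function the field uses).

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for sanity checks; not part of Lucene.
class ExactKnn {
    /** Cosine similarity; assumes both vectors are non-zero and of equal length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k vectors most similar to the query, best first. */
    static int[] topK(float[][] vectors, float[] query, int k) {
        // Score every vector once (exhaustive scan, no graph involved).
        double[] scores = new double[vectors.length];
        Integer[] ids = new Integer[vectors.length];
        for (int i = 0; i < vectors.length; i++) {
            scores[i] = cosine(vectors[i], query);
            ids[i] = i;
        }
        // Sort indices by descending score.
        Arrays.sort(ids, Comparator.comparingDouble((Integer i) -> -scores[i]));
        int n = Math.min(k, ids.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = ids[i];
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] vectors = {{1f, 0f}, {0f, 1f}, {1f, 1f}, {-1f, 0f}};
        float[] query = {1f, 0.1f};
        System.out.println(Arrays.toString(topK(vectors, query, 2))); // prints [0, 2]
    }
}
```

Comparing this exact list against the documents returned by the approximate query for increasing k1 shows at which point the two agree for a given k2.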

Thanks

Michael
