Mailing List Archive

Exact KNN
Is there a way of directly executing an exact nearest neighbor search? It
seems like the API provides some general functionality, and we can force
Lucene to execute exact nearest neighbor search by providing a high K
value, but I'm wondering if there's an exposed way to simply execute an
exact search without doing this "hack".

--

*{* name : "William Zhou",
title : "Software Engineer",
phone : "818-307-2334",
location : "Berkeley, CA",
twitter : "@MongoDB",
facebook : "MongoDB" *}*
Re: Exact KNN
William,

When I first read this question, I thought to myself, "why would anyone
want to do that?" Then I remembered that with search engines everyone wants
everything. Without thinking about it too much, I think you have plenty of
options, as always. Two come to mind:

1. To choose k, you could use IndexReader.numDocs(). Check which reader you
call it on, though: on the top-level reader (e.g. a DirectoryReader) it
counts live documents across all segments, but on a LeafReader it only
counts the documents in that one segment.

2. Another option is to reach the same goal through Lucene's more
established primitives, for example by extending ScoreFunction to do the
similarity math yourself.
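For reference, an exact search is just a linear scan: score the query against every stored vector and keep the best k. The sketch below is plain Java with no Lucene dependency. The class and method names (`ExactKnn`, `search`) are hypothetical, and it assumes dot-product similarity over vectors already loaded into memory; it is not a Lucene API, only an illustration of the brute-force technique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical brute-force (exact) k-nearest-neighbor search over in-memory vectors. */
class ExactKnn {

    /** A document id paired with its similarity score. */
    record Hit(int doc, float score) {}

    /** Dot-product similarity; assumes both vectors have the same length. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Scores every vector and returns the k highest-scoring hits, best first. */
    static List<Hit> search(float[][] vectors, float[] query, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on score: the root is the weakest hit among the current top k.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>((x, y) -> Float.compare(x.score(), y.score()));
        for (int doc = 0; doc < vectors.length; doc++) {
            float score = dotProduct(vectors[doc], query);
            if (heap.size() < k) {
                heap.add(new Hit(doc, score));
            } else if (score > heap.peek().score()) {
                // Evict the weakest hit to make room for a better one.
                heap.poll();
                heap.add(new Hit(doc, score));
            }
        }
        // Drain into a list sorted by descending score.
        List<Hit> result = new ArrayList<>(heap);
        result.sort((x, y) -> Float.compare(y.score(), x.score()));
        return result;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {1f, 0f}, {0f, 1f}, {0.7f, 0.7f}, {-1f, 0f}
        };
        for (Hit h : search(vectors, new float[] {1f, 0.2f}, 2)) {
            System.out.println(h.doc() + " " + h.score());
        }
    }
}
```

The cost is one similarity computation per document, so it grows linearly with the index size; avoiding that full scan is exactly what HNSW-based approximate search trades some recall for.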

My question for the group is: would it be faster to store the vectors as
plain float values rather than in a vector field? If you did that, you
might have more flexible options for extending ScoreFunction and put less
burden on the overall system. That may be wrong, though, given Lucene's
columnar structure and the work needed to index individual items in a
list/array. My uninformed (and likely less intelligent) assumption is that
the dense vector field, in its current implementation, is intended for
approximate nearest neighbor search. I probably need to re-read the paper
and cross-reference it with the Lucene class, but I'd welcome more informed
thoughts.
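To make the "store the vectors as plain floats" idea concrete, one hypothetical layout is to serialize each vector into a byte array (for instance, the payload of a binary field) and decode it again at scoring time. The class and helper names below are made up for illustration and none of this is a Lucene API; the sketch only shows the extra decode step a custom scorer would pay for every document it scores.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helpers for storing a float vector as raw bytes and scoring it later. */
class FloatVectorCodec {

    /** Packs the vector into little-endian bytes, 4 bytes per component. */
    static byte[] encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : vector) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Unpacks bytes produced by encode() back into a float array. */
    static float[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buf.getFloat();
        }
        return vector;
    }

    /** Decodes a stored vector, then computes its squared Euclidean distance to the query. */
    static float squaredDistance(byte[] stored, float[] query) {
        float[] v = decode(stored);
        float sum = 0f;
        for (int i = 0; i < v.length; i++) {
            float d = v[i] - query[i];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] stored = encode(new float[] {1f, 2f, 3f});
        System.out.println(stored.length);                                      // 12
        System.out.println(squaredDistance(stored, new float[] {1f, 2f, 5f}));  // 4.0
    }
}
```

Whether this beats a dedicated vector field is exactly the open question above; the decode-per-document cost shown here is one of the things it would have to pay for.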

Good luck, brother!

Best,

Marcus



On Tue, Jan 30, 2024 at 11:50 AM William Zhou
<william.zhou@mongodb.com.invalid> wrote:

> Is there a way of directly executing an exact nearest neighbor search? It
> seems like the API provides some general functionality, and we can force
> Lucene to execute exact nearest neighbor search by providing a high K
> value, but I'm wondering if there's an exposed way to simply execute an
> exact search without doing this "hack".


--
Marcus Eagan
Re: Exact KNN
Isn’t that what Semantic-Vectors is doing?
E.g. https://github.com/Ontotext-AD/semanticvectors

Paul

On 30 Jan 2024, at 20:50, William Zhou wrote:

> Is there a way of directly executing an exact nearest neighbor search? It
> seems like the API provides some general functionality, and we can force
> Lucene to execute exact nearest neighbor search by providing a high K
> value, but I'm wondering if there's an exposed way to simply execute an
> exact search without doing this "hack".

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org