Mailing List Archive

hnsw parameters for vector search
Hi,

The HNSW documentation for the Lucene HnswGraph and Solr vector search is not very detailed, especially with regard to the parameters hnswMaxConn and hnswBeamWidth.
I find it hard to come up with sensible values for these parameters by reading the paper from 2018.
Does anyone have experience with how these parameters influence the results? As far as I understand the code, the graph is built at indexing time, so finding the optimal values for a specific use case by trial and error would be time-consuming.

We have a Solr index with roughly 100 million embeddings, and in a synthetic randomized benchmark around 14% of requests return a suboptimal answer (based on cosine vector similarity).
I expected this "error" rate to be much smaller. I would love to hear your experiences.

Best regards

Andreas Moll
Re: hnsw parameters for vector search
Regarding your second question about suboptimal results: I think Nils
Reimers explains quite nicely why this might happen. See, for example:

https://www.youtube.com/watch?v=Abh3YCahyqU

HTH

Michael



On 30.01.24 at 15:48, Moll, Dr. Andreas wrote:
> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: hnsw parameters for vector search
To get the best results, it's necessary to tune these parameters for each
vector model. My suggestion is to use a subset of your 100M vectors for
parameter optimization. That saves time while iterating through the
parameter space, since you will indeed need to reindex to measure the
effect of each setting.

Generally speaking, increasing maxConn and beamWidth will lead to higher
recall, but also higher latency.

You can use the KnnGraphTester tool in the luceneutil package to get started.
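
The measurement itself can be sketched independently of Lucene: compute the exact nearest neighbours on the sample by brute-force cosine similarity and compare them with what the index returns for the same queries. The Python below is only a sketch under the assumption that the sample fits in memory. `approx_search` is a hypothetical callable standing in for a query against your own index; it is not a Solr or Lucene API, and all function names here are made up for illustration.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query, vectors, k):
    """Brute-force ground truth: positions of the k vectors most similar to the query."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(queries, vectors, approx_search, k=10):
    """Average fraction of the true top-k positions that approx_search returns.

    approx_search(query, k) is a hypothetical stand-in for the HNSW index;
    it must return positions into `vectors`.
    """
    if not queries or not vectors or k <= 0:
        raise ValueError("need queries, vectors and a positive k")
    total = 0.0
    for q in queries:
        truth = set(exact_top_k(q, vectors, k))
        found = set(approx_search(q, k))
        total += len(truth & found) / len(truth)
    return total / len(queries)


def sample_subset(vectors, size, seed=0):
    """Fixed-seed random sample, so every parameter setting is measured on the same data."""
    rng = random.Random(seed)
    return rng.sample(vectors, min(size, len(vectors)))
```

For each combination of maxConn and beamWidth you would reindex the same sample, wrap the index query as `approx_search`, and record `recall_at_k` next to the observed latency. With `k=1`, `1 - recall_at_k(...)` is roughly the share of queries whose top hit is not the true nearest neighbour, which is the kind of "error" rate described in the original question.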

On Tue, Jan 30, 2024, 11:05 AM Michael Wechner <michael.wechner@wyona.com>
wrote:

> [...]