Mailing List Archive

Other vector similarity metric than provided by VectorSimilarityFunction
Hi

IIUC Lucene currently supports

VectorSimilarityFunction.COSINE
VectorSimilarityFunction.DOT_PRODUCT
VectorSimilarityFunction.EUCLIDEAN

whereas some embedding models have been trained with other metrics.
Also see
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
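
For readers less familiar with the three built-in functions, the quantities behind them can be sketched in plain Java. This is an illustration only, not Lucene's own code: the class name VectorMetrics and its method names are made up for this example, and Lucene may scale or shift these raw values before using them as scores.

```java
/** Plain-Java versions of the quantities behind the three built-in metrics (illustrative only). */
final class VectorMetrics {
  private VectorMetrics() {}

  /** Dot product: sum of element-wise products. */
  static float dotProduct(float[] a, float[] b) {
    checkSameLength(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Cosine of the angle between a and b; the result is NaN if either vector is all zeros. */
  static float cosine(float[] a, float[] b) {
    checkSameLength(a, b);
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt(normA * normB));
  }

  /** Squared Euclidean (L2) distance: sum of squared element-wise differences. */
  static float squaredEuclidean(float[] a, float[] b) {
    checkSameLength(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  private static void checkSameLength(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
  }
}
```

Note that for vectors normalized to unit length, cosine and dot product give the same value, which is why the choice between those two often depends only on whether the embeddings are normalized.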

How can I best implement another metric?

Thanks

Michael





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Other vector similarity metric than provided by VectorSimilarityFunction
Hi Michael,

You could create a custom KNN vectors format that ignores the vector
similarity configured on the field and uses its own.

On Sat, Jan 14, 2023, 21:33, Michael Wechner <michael.wechner@wyona.com>
wrote:

> How can I best implement another metric?
Re: Other vector similarity metric than provided by VectorSimilarityFunction
Hi Adrien

Thanks for your feedback! However, I am not sure I fully understand what
you mean.

At the moment I am using something like:

float[] vector = ...;
FieldType vectorFieldType = KnnVectorField.createFieldType(vector.length, VectorSimilarityFunction.COSINE);
KnnVectorField vectorField = new KnnVectorField("vector_field", vector, vectorFieldType);
doc.add(vectorField);

Could you give me some sample code showing what you mean by "custom KNN
vectors format"?

Thanks

Michael

On 14.01.23 at 22:14, Adrien Grand wrote:
> You could create a custom KNN vectors format that ignores the vector
> similarity configured on the field and uses its own.
Re: Other vector similarity metric than provided by VectorSimilarityFunction
I would suggest building Lucene from source and adding your own
similarity function to VectorSimilarityFunction. That is the proper
extension point for similarity functions. If you find there is some
substantial benefit, it wouldn't be a big lift to add something like
that. However, I'm dubious about the likely benefit; just because scipy
supports lots of functions doesn't mean you will get substantially
better results with, say, an L3 metric versus an L2 metric. I think
you'd probably find this community receptive to a metric that *doesn't
lose* accuracy and provides a more efficient computation -- maybe L1
would do that?
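
The L1, L2 and L3 metrics mentioned above are all members of the Minkowski (Lp) family, which differ only in the exponent p. A minimal plain-Java sketch follows for comparing them on your own vectors; the class name MinkowskiDistance and its method are hypothetical and are not part of Lucene.

```java
/** Minkowski (Lp) distance; p = 1 is Manhattan, p = 2 is Euclidean. Illustrative only. */
final class MinkowskiDistance {
  private MinkowskiDistance() {}

  /**
   * Returns (sum_i |a[i] - b[i]|^p)^(1/p).
   * Requires p >= 1 so that the result is a proper metric.
   */
  static double distance(float[] a, float[] b, double p) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
    if (!(p >= 1)) { // also rejects NaN
      throw new IllegalArgumentException("p must be >= 1, got " + p);
    }
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += Math.pow(Math.abs((double) a[i] - b[i]), p);
    }
    return Math.pow(sum, 1.0 / p);
  }
}
```

For nearest-neighbor ranking the final root can be skipped, since it does not change the order of the results. For p = 1 the per-element work reduces to a subtraction and an absolute value, which is the reason L1 is a plausible candidate for a cheaper computation; whether it keeps accuracy for a given embedding model would have to be measured.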

On Sat, Jan 14, 2023 at 6:04 PM Michael Wechner
<michael.wechner@wyona.com> wrote:
>
> Could you give me some sample code showing what you mean by "custom KNN
> vectors format"?

Re: Other vector similarity metric than provided by VectorSimilarityFunction
On 15.01.23 at 16:36, Michael Sokolov wrote:
> I would suggest building Lucene from source and adding your own
> similarity function to VectorSimilarityFunction. That is the proper
> extension point for similarity functions. If you find there is some
> substantial benefit, it wouldn't be a big lift to add something like
> that. However, I'm dubious about the likely benefit; just because scipy
> supports lots of functions doesn't mean you will get substantially
> better results with, say, an L3 metric versus an L2 metric. I think
> you'd probably find this community receptive to a metric that *doesn't
> lose* accuracy and provides a more efficient computation -- maybe L1
> would do that?

Yes, I think L1 (Manhattan) could be one of them :-)

By the way, Weaviate has quite nice documentation on vector distances:

https://weaviate.io/blog/2022/09/Distance-Metrics-in-Vector-Search.html

Yes, maybe it is easier to just contribute another metric as part of the
source than to make it configurable dynamically with a custom
implementation.
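
As a starting point for such a contribution, here is a minimal sketch of what an L1 (Manhattan) similarity could compute. It is plain Java and not an actual Lucene patch: the class name ManhattanSimilarity is hypothetical, and the 1 / (1 + d) mapping from distance to score is an assumption, chosen so that scores are positive and larger means more similar, in the spirit of how a distance such as Euclidean can be turned into a score.

```java
/** Hypothetical L1 (Manhattan) similarity; a sketch, not part of Lucene. */
final class ManhattanSimilarity {
  private ManhattanSimilarity() {}

  /** L1 distance: sum of absolute element-wise differences. */
  static float distance(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += Math.abs(a[i] - b[i]);
    }
    return sum;
  }

  /**
   * Maps the distance to a score in (0, 1]: identical vectors score 1,
   * and the score decreases as the distance grows.
   * The mapping itself is an assumption made for this sketch.
   */
  static float compare(float[] a, float[] b) {
    return 1f / (1f + distance(a, b));
  }
}
```

The inner loop uses only subtraction, absolute value and addition, so it avoids the multiplications needed for dot product or squared Euclidean distance; any real speed-up would need to be benchmarked.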

Thanks

Michael

