Mailing List Archive

How to Increase max vector size?
Hi

I just tried to test the OpenAI model "text-similarity-davinci-001" with
12288 dimensions and received the following error:

java.lang.IllegalArgumentException: vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (=1024); got 12288
        at org.apache.lucene.document.FieldType.setVectorDimensionsAndSimilarityFunction(FieldType.java:381) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
        at org.apache.lucene.document.KnnVectorField.createFieldType(KnnVectorField.java:69) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]

IIUC, I cannot programmatically increase the max vector size, which is
set inside lucene/core/src/java/org/apache/lucene/index/VectorValues.java

  public static int MAX_DIMENSIONS = 1024;

right?
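
For reference, the check behind this exception can be approximated outside
Lucene. The following is only a minimal standalone sketch, not the Lucene
source: the class and method names are hypothetical, and the limit is assumed
to be a fixed constant of 1024, as the exception message reports.

```java
// Standalone sketch; does not depend on Lucene. All names are hypothetical,
// only the exception message mirrors the one reported above.
final class DimensionCheckSketch {
    // Assumed to match the limit reported by the exception in Lucene 9.0.0.
    static final int MAX_DIMENSIONS = 1024;

    /** Throws if numDimensions is outside the range 1..MAX_DIMENSIONS. */
    static void checkDimensions(int numDimensions) {
        if (numDimensions <= 0) {
            throw new IllegalArgumentException(
                "vector numDimensions must be > 0; got " + numDimensions);
        }
        if (numDimensions > MAX_DIMENSIONS) {
            throw new IllegalArgumentException(
                "vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (="
                    + MAX_DIMENSIONS + "); got " + numDimensions);
        }
    }

    public static void main(String[] args) {
        checkDimensions(1024); // at the limit, accepted
        try {
            checkDimensions(12288); // davinci-sized vector, rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a constant like this, the only way to accept a 12288-dimensional vector
is to change the constant and recompile.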

I guess I could rebuild Lucene with a larger limit, or are there other
possibilities to increase the max vector size?

Thanks

Michael
Re: How to Increase max vector size?
Hi Michael,
let's create a Jira ticket to raise the limit (if you haven't already).
I would be happy to review a patch or write it myself, but only after 10/03.
Once the pull request is ready (including Javadoc that clearly states that
going above X is at your own risk), we'll also involve Michael Sokolov and
the other committers familiar with this area of the code.

Cheers

--------------------------
Alessandro Benedetti
Apache Lucene/Solr PMC member and Committer
Director, R&D Software Engineer, Search Consultant

www.sease.io


On Sat, 12 Feb 2022 at 22:53, Michael Wechner <michael.wechner@wyona.com>
wrote:

> I guess I could rebuild Lucene with a greater size or what are the
> possibilities to increase the max vector size?
Re: How to Increase max vector size?
Hi Alessandro

No, I have not created a Jira ticket yet, but I would be happy to create
one. Just let me know, or feel free to create one yourself.

I understand the concerns about limits in general, and I think it makes
sense to have a default maximum number of dimensions. But I imagine it will
need to be increased eventually, and being able to increase it
programmatically, at your own risk, would help people using Lucene.
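
To illustrate what "a default limit that can be raised programmatically"
could look like, here is a minimal standalone sketch. It is not Lucene code
and does not describe an existing Lucene API; the class name, the setter and
the value 16384 are all hypothetical.

```java
// Hypothetical sketch; not Lucene code. All names are illustrative.
final class ConfigurableLimitSketch {
    /** Default limit, matching the 1024 reported in the exception. */
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    // volatile so that a change made by one thread is visible to the others.
    private static volatile int maxDimensions = DEFAULT_MAX_DIMENSIONS;

    static int getMaxDimensions() {
        return maxDimensions;
    }

    /** Overrides the limit for the whole JVM; going above the default is at the caller's own risk. */
    static void setMaxDimensions(int newMax) {
        if (newMax <= 0) {
            throw new IllegalArgumentException("max dimensions must be > 0; got " + newMax);
        }
        maxDimensions = newMax;
    }

    /** Returns true if a vector with this many dimensions would be accepted. */
    static boolean accepts(int numDimensions) {
        return numDimensions > 0 && numDimensions <= maxDimensions;
    }

    public static void main(String[] args) {
        System.out.println(accepts(12288)); // false with the default limit
        setMaxDimensions(16384);            // explicit opt-in by the application
        System.out.println(accepts(12288)); // true after raising the limit
    }
}
```

The default stays conservative, and an application has to opt in explicitly
before larger vectors are accepted.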

Thanks

Michael

On 15.02.22 at 19:22, Alessandro Benedetti wrote:
> let's create a Jira ticket to use a higher value (if you haven't already).
Re: How to Increase max vector size?
I don't think it makes sense to have a static variable maximum that you can
change by calling a method. What purpose would it serve?

On Tue, Feb 15, 2022, 2:39 PM Michael Wechner <michael.wechner@wyona.com>
wrote:

> I understand the concerns about the limits in general and I think it makes
> sense to have a default max dimensions limit, but I could imagine it needs
> to be increased eventually and being able to increase it programmatically
> and at your own risk will help people using Lucene.
Re: How to Increase max vector size?
True :-) When you are the one controlling the vectors that go in, a
method to disable the maximum limit would be sufficient.

But I could imagine a case where you offer Lucene as a service, people can
for example configure their own "sentence embedding models", and you
would like to offer a maximum limit other than the default of 1024.
Then I think a method to reset the maximum limit would make sense.
Examples could be a service from OpenAI, or vector search databases such as
Weaviate or Pinecone.
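
One way to picture the service scenario is a limit that belongs to each
tenant rather than to a single static field. The sketch below is purely
illustrative and independent of Lucene; the class, the tenant names and the
limits are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-tenant limits; not part of Lucene.
final class TenantLimitSketch {
    /** Limit applied to tenants that have not configured their own. */
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    private final Map<String, Integer> limits = new ConcurrentHashMap<>();

    /** Lets the service operator set a different limit for one tenant. */
    void setLimit(String tenant, int maxDimensions) {
        if (maxDimensions <= 0) {
            throw new IllegalArgumentException("limit must be > 0; got " + maxDimensions);
        }
        limits.put(tenant, maxDimensions);
    }

    int limitFor(String tenant) {
        return limits.getOrDefault(tenant, DEFAULT_MAX_DIMENSIONS);
    }

    /** Returns true if this tenant may index vectors of the given size. */
    boolean accepts(String tenant, int numDimensions) {
        return numDimensions > 0 && numDimensions <= limitFor(tenant);
    }

    public static void main(String[] args) {
        TenantLimitSketch service = new TenantLimitSketch();
        service.setLimit("tenant-with-davinci", 12288);
        System.out.println(service.accepts("tenant-with-davinci", 12288)); // true
        System.out.println(service.accepts("some-other-tenant", 12288));   // false
    }
}
```

Because the limit is instance state, raising it for one tenant does not
change what the others are allowed to index.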

Thanks

Michael




On 15.02.22 at 23:34, Michael Sokolov wrote:
> I don't think it makes sense to have a static variable maximum that
> you can change by calling a method. What purpose would it serve?
Re: How to Increase max vector size?
Fair enough - are you planning to offer such a service? ;) Sounds exciting!

-Mike

On Tue, Feb 15, 2022 at 6:00 PM Michael Wechner <michael.wechner@wyona.com>
wrote:

> But I could imagine when you offer Lucene as a service where people can
> for example configure their own "sentence embedding models" and you would
> like to offer a different maximum limit than the default of 1024, then I
> think a method to reset the maximum limit would make sense.
Re: How to Increase max vector size?
Not at the moment :-)

I am using Lucene's vector search for https://ukatie.com to detect
duplicate questions. I am currently refactoring it so that
you can connect Katie to your own similarity search implementation,
and I have built a very first prototype of a connector for Weaviate:

https://github.com/wyona/spring-boot-hello-world-rest/blob/master/src/main/java/org/wyona/webapp/controllers/v2/KatieMockupConnectorController.java

Weaviate itself now supports the OpenAI embeddings, and I wanted to
see how well they work together with Lucene. I would also like to
make the embeddings configurable.
So far the Katie Lucene implementation supports the various sbert
transformer models (https://www.sbert.net/docs/pretrained_models.html) and
OpenAI text-similarity-ada-001.
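
When embeddings are configurable, it helps to check a model's output size
against the limit before indexing anything. The sketch below is only an
illustration and not code from Katie or Lucene. The 12288 dimensions for
text-similarity-davinci-001 come from this thread; the other two sizes
(384 for the sbert model all-MiniLM-L6-v2, 1024 for text-similarity-ada-001)
are assumptions and should be verified against the model documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; the model table is illustrative and partly assumed.
final class EmbeddingModelsSketch {
    /** Limit reported by the exception earlier in this thread. */
    static final int MAX_DIMENSIONS = 1024;

    /** Model name to vector size; only the davinci entry is taken from the thread. */
    static Map<String, Integer> knownModels() {
        Map<String, Integer> models = new LinkedHashMap<>();
        models.put("all-MiniLM-L6-v2", 384);               // assumed
        models.put("text-similarity-ada-001", 1024);       // assumed
        models.put("text-similarity-davinci-001", 12288);  // from this thread
        return models;
    }

    /** Returns true if the model's vectors fit under the default limit. */
    static boolean fitsDefaultLimit(String model) {
        Integer dimensions = knownModels().get(model);
        if (dimensions == null) {
            throw new IllegalArgumentException("unknown model: " + model);
        }
        return dimensions <= MAX_DIMENSIONS;
    }

    public static void main(String[] args) {
        for (String model : knownModels().keySet()) {
            System.out.println(model + " fits: " + fitsDefaultLimit(model));
        }
    }
}
```

Under these assumptions, the ada model sits exactly at the limit and the
davinci model exceeds it by a factor of twelve.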

I will need some more time for the refactoring, but I will make the Lucene
connector available under the Apache license.

Thanks

Michael

On 16.02.22 at 19:51, Michael Sokolov wrote:
> Fair enough - are you planning to offer such a service;) sounds exciting!