Not at the moment :-)
I am using Lucene's vector search for
https://ukatie.com to detect
duplicate questions. I am currently refactoring it so that
you can connect Katie with your own similarity search implementation,
and I have built a very first prototype of a connector for Weaviate:
https://github.com/wyona/spring-boot-hello-world-rest/blob/master/src/main/java/org/wyona/webapp/controllers/v2/KatieMockupConnectorController.java
Weaviate itself now supports the OpenAI embeddings, and I wanted to
see how well this works together with Lucene, which is why I would like to
make the embeddings configurable.
So far the Katie Lucene implementation supports the various sbert
transformer models
(https://www.sbert.net/docs/pretrained_models.html) and
OpenAI's text-similarity-ada-001.
I will need some more time for the refactoring, but I will make the Lucene
connector available under the Apache license.
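To illustrate what "connect Katie with your own similarity search implementation" could look like, here is a minimal sketch of a pluggable connector interface with a brute-force, in-memory cosine-similarity backend. The names `SimilaritySearchConnector` and `InMemoryConnector` are hypothetical and are not Katie's actual API; a Lucene- or Weaviate-backed connector would be another implementation of the same assumed interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical connector abstraction (illustrative names, not Katie's API).
interface SimilaritySearchConnector {
    void index(String id, float[] vector);
    List<String> findSimilar(float[] query, int k);
}

// Brute-force in-memory backend: scores every stored vector against the query.
class InMemoryConnector implements SimilaritySearchConnector {
    private final List<String> ids = new ArrayList<>();
    private final List<float[]> vectors = new ArrayList<>();

    @Override
    public void index(String id, float[] vector) {
        ids.add(id);
        vectors.add(vector.clone()); // defensive copy
    }

    @Override
    public List<String> findSimilar(float[] query, int k) {
        // Compute each score once, then sort indices by descending similarity
        double[] scores = new double[ids.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            scores[i] = cosine(query, vectors.get(i));
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> -scores[i]));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, order.size()); i++) {
            result.add(ids.get(order.get(i)));
        }
        return result;
    }

    // Cosine similarity; returns 0 for zero-length vectors to avoid NaN
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

Keeping the interface this narrow means the caller never needs to know whether results come from Lucene, Weaviate or an in-memory list.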
Thanks
Michael
On 16.02.22 at 19:51, Michael Sokolov wrote:
> Fair enough. Are you planning to offer such a service? ;) Sounds exciting!
>
> -Mike
>
> On Tue, Feb 15, 2022 at 6:00 PM Michael Wechner
> <michael.wechner@wyona.com> wrote:
>
> True :-) When you are the one controlling the input vectors,
> a method to disable the maximum limit would be sufficient.
>
> But I could imagine that when you offer Lucene as a service, where
> people can, for example, configure their own "sentence embedding
> models", and you would like to allow a different maximum limit than
> the default of 1024, a method to reset the maximum limit would
> make sense. Examples could be a service like OpenAI's or
> vector search databases such as Weaviate or Pinecone.
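A minimal sketch of how a service could offer a different maximum per deployment without a JVM-wide mutable static: the limit is an immutable instance setting that defaults to 1024. `VectorDimensionPolicy` is a hypothetical class, not part of Lucene; in Lucene 9.0 the limit is the constant `VectorValues.MAX_DIMENSIONS` and cannot be changed at runtime.

```java
// Hypothetical per-service dimension policy (not a Lucene API).
final class VectorDimensionPolicy {
    static final int DEFAULT_MAX_DIMENSIONS = 1024; // same default as Lucene 9.0

    private final int maxDimensions;

    VectorDimensionPolicy() {
        this(DEFAULT_MAX_DIMENSIONS);
    }

    // Each service instance chooses its own limit, at its own risk
    VectorDimensionPolicy(int maxDimensions) {
        if (maxDimensions <= 0) {
            throw new IllegalArgumentException("maxDimensions must be > 0; got " + maxDimensions);
        }
        this.maxDimensions = maxDimensions;
    }

    int maxDimensions() {
        return maxDimensions;
    }

    // Rejects vectors longer than this policy allows
    void check(float[] vector) {
        if (vector.length > maxDimensions) {
            throw new IllegalArgumentException(
                "vector numDimensions must be <= " + maxDimensions + "; got " + vector.length);
        }
    }
}
```

With this shape, a deployment serving 12288-dimensional embeddings would construct `new VectorDimensionPolicy(12288)`, while every other deployment keeps the default.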
>
> Thanks
>
> Michael
>
> On 15.02.22 at 23:34, Michael Sokolov wrote:
>> I don't think it makes sense to have a static maximum variable
>> that you can change by calling a method. What purpose would it
>> serve?
>>
>> On Tue, Feb 15, 2022, 2:39 PM Michael Wechner
>> <michael.wechner@wyona.com> wrote:
>>
>> Hi Alessandro
>>
>> No, I have not created a Jira ticket, but I would be happy to
>> create one; just let me know, or feel free to create one yourself.
>>
>> I understand the concerns about limits in general, and I
>> think it makes sense to have a default max dimensions limit,
>> but I could imagine it will need to be increased eventually, and
>> being able to increase it programmatically, at your own
>> risk, would help people using Lucene.
>>
>> Thanks
>>
>> Michael
>>
>> On 15.02.22 at 19:22, Alessandro Benedetti wrote:
>>> Hi Michael,
>>> let's create a Jira ticket to use a higher value (if you
>>> haven't already).
>>> I would be happy to review the patch, or do it myself, but only
>>> after 10/03.
>>> Once the pull request is ready (including Javadoc
>>> documentation that clearly states that going above X
>>> is at your own risk), we'll also involve Michael Sokolov
>>> and the other committers familiar with this area of the code.
>>>
>>> Cheers
>>>
>>> --------------------------
>>> Alessandro Benedetti
>>> Apache Lucene/Solr PMC member and Committer
>>> Director, R&D Software Engineer, Search Consultant
>>>
>>> www.sease.io
>>>
>>>
>>> On Sat, 12 Feb 2022 at 22:53, Michael Wechner
>>> <michael.wechner@wyona.com> wrote:
>>>
>>> Hi
>>>
>>> I just tried to test the OpenAI model
>>> "text-similarity-davinci-001" with 12288 dimensions and
>>> received the following error:
>>>
>>> java.lang.IllegalArgumentException: vector numDimensions must be <= VectorValues.MAX_DIMENSIONS (=1024); got 12288
>>>     at org.apache.lucene.document.FieldType.setVectorDimensionsAndSimilarityFunction(FieldType.java:381) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
>>>     at org.apache.lucene.document.KnnVectorField.createFieldType(KnnVectorField.java:69) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
>>>
>>> IIUC I cannot programmatically increase the max vector
>>> size, which is set inside
>>> lucene/core/src/java/org/apache/lucene/index/VectorValues.java
>>>
>>>
>>> public static final int MAX_DIMENSIONS = 1024;
>>>
>>> right?
>>>
>>> I guess I could rebuild Lucene with a larger value, but
>>> what are the possibilities to increase the max vector size?
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>>
>>
>