Mailing List Archive

Field[vector]vector's dimensions must be <= [1024]; got 1536
Hi

I recently upgraded Lucene to 9.8.0 and ran tests with OpenAI's
embedding model, which produces vectors of dimension 1536, and received
the following error:

Field[vector]vector's dimensions must be <= [1024]; got 1536

whereas this worked previously with the hack of overriding the vector
dimension using a custom field type

float[] vector = ...
FieldType vectorFieldType =
    new CustomVectorFieldType(vector.length, VectorSimilarityFunction.COSINE);

and setting

KnnFloatVectorField vectorField =
    new KnnFloatVectorField("VECTOR_FIELD", vector, vectorFieldType);

But this no longer seems to work with Lucene 9.8.0.

Is this hack now prevented by the Lucene code itself, or is there any way
to make it work again?

Whatever one thinks of OpenAI, the embedding model
"text-embedding-ada-002" is really good, and it is sad that one cannot
use it with Lucene because of the 1024 dimension restriction.

Thanks

Michael



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Field[vector]vector's dimensions must be <= [1024]; got 1536
I forgot to mention that using the custom FieldType with a vector
dimension of 1536 does work with Lucene 9.7.0.

Thanks

Michael



(In reply to Michael Wechner's message of 19 Oct 2023, 10:39.)

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536
Hi Michael,

The max vector dimension limit is no longer checked in the field type, as
it is the responsibility of the codec to enforce it.

You need to build your own codec that returns a different setting so it
can be enforced by IndexWriter. See Apache Solr's code for how to wrap the
existing KnnVectorsFormat so it returns another limit:
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L159-L183>

Basically you need to subclass Lucene95Codec as done here:
<https://github.com/apache/solr/blob/6d50c592fb0b7e0ea2e52ecf1cde7e882e1d0d0a/solr/core/src/java/org/apache/solr/core/SchemaCodecFactory.java#L99-L146>
and return a different vectors format, such as a delegator as described above.

The responsibility was shifted to the codec because there may be better
alternatives to HNSW that have different limits, especially with regard
to performance during merging and query response times, e.g. BKD trees.
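
To make the delegation idea concrete, here is a minimal sketch that
deliberately does not use Lucene at all. VectorsFormat, StockFormat,
RaisedLimitFormat and checkDimension are hypothetical stand-ins invented
for this illustration (the real types are KnnVectorsFormat and friends);
the only point is that the wrapper keeps the delegate's identity and
reports a different limit, and that the writer asks the format for it.

```java
import java.util.Objects;

public class DelegationSketch {

    /** Hypothetical stand-in for a vectors format; NOT the Lucene API. */
    interface VectorsFormat {
        String name();
        int maxDimensions(String fieldName);
    }

    /** Stand-in for a stock format with a fixed limit of 1024. */
    static final class StockFormat implements VectorsFormat {
        @Override public String name() { return "StockFormat"; }
        @Override public int maxDimensions(String fieldName) { return 1024; }
    }

    /** Wrapper: keeps the delegate's name but reports its own limit. */
    static final class RaisedLimitFormat implements VectorsFormat {
        private final VectorsFormat delegate;
        private final int maxDimensions;

        RaisedLimitFormat(VectorsFormat delegate, int maxDimensions) {
            this.delegate = Objects.requireNonNull(delegate);
            this.maxDimensions = maxDimensions;
        }

        @Override public String name() { return delegate.name(); }
        @Override public int maxDimensions(String fieldName) { return maxDimensions; }
    }

    /** Writer-side check; the message mirrors the error quoted in this thread. */
    static void checkDimension(VectorsFormat format, String field, float[] vector) {
        int max = format.maxDimensions(field);
        if (vector.length > max) {
            throw new IllegalArgumentException(
                "Field[" + field + "]vector's dimensions must be <= ["
                    + max + "]; got " + vector.length);
        }
    }

    public static void main(String[] args) {
        float[] embedding = new float[1536];
        VectorsFormat stock = new StockFormat();
        try {
            checkDimension(stock, "vector", embedding);
        } catch (IllegalArgumentException e) {
            System.out.println("stock format: " + e.getMessage());
        }
        // The wrapper raises the limit, so the same vector is accepted.
        checkDimension(new RaisedLimitFormat(stock, 16384), "vector", embedding);
        System.out.println("wrapped format: accepted 1536 dimensions");
    }
}
```

In real code the wrapper would also forward the reader and writer factory
methods to the delegate, as the Solr class linked above does.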

Uwe

(In reply to Michael Wechner's message of 19 Oct 2023, 10:53.)

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


Re: Field[vector]vector's dimensions must be <= [1024]; got 1536
Hi Uwe

Thank you very much for your quick feedback, really appreciated!

Will change it as you described.

Thanks

Michael



(In reply to Uwe Schindler's message of 19 Oct 2023, 11:23.)

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536
Hi Uwe

Thanks again for your feedback, I got it working now :-)

I am using a simplified version, which I will post below so that it
might help others, at least as long as this implementation makes sense.

Btw, when a new version of Lucene gets released, how do I best find out
whether "Lucene95Codec" is still the most recent default codec or whether
there is a new default codec?

Thanks

Michael

---

@Autowired
private LuceneCodecFactory luceneCodecFactory;

IndexWriterConfig iwc = new IndexWriterConfig();
iwc.setCodec(luceneCodecFactory.getCodec());

----

package com.erkigsnek.webapp.services;

import org.apache.lucene.codecs.Codec;
import org.apache.lucene.codecs.KnnVectorsFormat;
import org.apache.lucene.codecs.KnnVectorsReader;
import org.apache.lucene.codecs.KnnVectorsWriter;
import org.apache.lucene.codecs.lucene95.Lucene95Codec;
import org.apache.lucene.codecs.lucene95.Lucene95HnswVectorsFormat;
import org.apache.lucene.index.SegmentReadState;
import org.apache.lucene.index.SegmentWriteState;
import org.springframework.stereotype.Component;
import lombok.extern.slf4j.Slf4j;

import java.io.IOException;

@Slf4j
@Component
public class LuceneCodecFactory {

    private final int maxDimensions = 16384;

    /**
     * Returns a Lucene95Codec whose KNN vectors format reports the raised
     * maximum dimension.
     */
    public Codec getCodec() {
        //return Lucene95Codec.getDefault();
        log.info("Get codec ...");
        Codec codec = new Lucene95Codec() {
            @Override
            public KnnVectorsFormat getKnnVectorsFormatForField(String field) {
                var delegate = new Lucene95HnswVectorsFormat();
                log.info("Maximum Vector Dimension: " + maxDimensions);
                return new DelegatingKnnVectorsFormat(delegate, maxDimensions);
            }
        };

        return codec;
    }
}

/**
 * This class exists because Lucene95HnswVectorsFormat's getMaxDimensions
 * method is final and we need to work around that constraint to allow more
 * than the default number of dimensions.
 */
@Slf4j
class DelegatingKnnVectorsFormat extends KnnVectorsFormat {
    private final KnnVectorsFormat delegate;
    private final int maxDimensions;

    public DelegatingKnnVectorsFormat(KnnVectorsFormat delegate, int maxDimensions) {
        // Reuse the delegate's name so the wrapper is recorded under it.
        super(delegate.getName());
        this.delegate = delegate;
        this.maxDimensions = maxDimensions;
    }

    @Override
    public KnnVectorsWriter fieldsWriter(SegmentWriteState state) throws IOException {
        return delegate.fieldsWriter(state);
    }

    @Override
    public KnnVectorsReader fieldsReader(SegmentReadState state) throws IOException {
        return delegate.fieldsReader(state);
    }

    @Override
    public int getMaxDimensions(String fieldName) {
        log.info("Maximum vector dimension: " + maxDimensions);
        return maxDimensions;
    }
}






(In reply to Uwe Schindler's message of 19 Oct 2023, 11:23.)

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536
Hi Michael,

The version you posted looks correct. Of course the Solr version is able
to do much more. The code you posted limits it to the bare minimum:

* subclass the default codec
* override getKnnVectorsFormatForField() and return the wrapper with
  a different max dimension

Reading indexes still works with the unmodified default codec; you only
need to set the custom codec on the IndexWriter. When reading, the actual
codec is looked up by name.
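
A toy model may help picture why reading needs no custom codec. It
assumes, based on my reading of the posted code, that the wrapper reuses
the stock name (via super(delegate.getName())), so only that name ends up
recorded and the reader resolves the stock class from it. Format, REGISTRY,
writeSegment and openSegment are hypothetical names for this sketch and
not Lucene API; in Lucene the lookup goes through Codec.forName(...).

```java
import java.util.Map;
import java.util.function.Supplier;

public class LookupByNameSketch {

    /** Hypothetical stand-in for a vectors format; NOT the Lucene API. */
    static final class Format {
        final String name;
        final int maxDimensions;

        Format(String name, int maxDimensions) {
            this.name = name;
            this.maxDimensions = maxDimensions;
        }
    }

    /** Toy registry of stock formats, keyed by name. */
    static final Map<String, Supplier<Format>> REGISTRY =
            Map.of("StockFormat", () -> new Format("StockFormat", 1024));

    /** Write side: validate against the writer's format, record only its name. */
    static String writeSegment(Format writerFormat, float[] vector) {
        if (vector.length > writerFormat.maxDimensions) {
            throw new IllegalArgumentException("too many dimensions: " + vector.length);
        }
        return writerFormat.name;
    }

    /** Read side: resolve the format from the recorded name alone. */
    static Format openSegment(String recordedName) {
        Supplier<Format> supplier = REGISTRY.get(recordedName);
        if (supplier == null) {
            throw new IllegalArgumentException("unknown format: " + recordedName);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        // The "wrapper" reuses the stock name but carries a raised limit.
        Format raised = new Format("StockFormat", 16384);
        String recorded = writeSegment(raised, new float[1536]);
        Format readerSide = openSegment(recorded);
        System.out.println("recorded name: " + recorded);
        System.out.println("reader resolved: " + readerSide.name);
    }
}
```

The limit in this model is consulted only in writeSegment, which is the
sense in which the custom setting matters for writing but not for reading.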

Uwe

(In reply to Michael Wechner's message of 7 Nov 2023, 17:03.)

Re: Field[vector]vector's dimensions must be <= [1024]; got 1536
Hi Uwe

Thank you very much for confirming the code!

Yes, I only set it for the IndexWriter, but what I meant to ask was:
what if the default Codec gets updated and I update my implementation
accordingly? I guess I will then have to reindex from scratch, right?

Or can I assume that the default Codec is also backwards compatible for
writing?

Thanks

Michael



(In reply to Uwe Schindler's message of 8 Nov 2023, 09:25.)