Hello,

I've tried to catch up on the vector API and I have the following
questions. I've tried to read through discussions on JIRA first in case it
had been covered, but it's possible I missed some relevant ones.

Should VectorValues#search be on VectorReader instead? It felt a bit odd to
me to have the search logic on the iterator.

Do we need SearchStrategy.NONE? Documentation suggests that it allows
storing vectors but that NN search won't be supported. This looks like a
use-case for binary doc values to me? It also slightly caught me by
surprise due to the inconsistency with IndexOptions.NONE, which means "do
not index this field" (and likewise for DocValuesType.NONE), so I first
assumed that SearchStrategy.NONE also meant "do not index this field as a
vector".

While postings and doc-value formats allow per-field configuration via
PerFieldPostingsFormat/PerFieldDocValuesFormat, vectors use a different
mechanism where VectorField#createHnswType sets attributes on the field
type that the vectors writer then reads. Should we have a
PerFieldVectorsFormat instead and configure these options via the vectors
format?
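
To make the per-field idea concrete, here is a rough sketch of the shape
I have in mind. Every type in it is a hypothetical stand-in rather than
an existing Lucene class, and the maxConn/beamWidth parameters are only
assumed examples of HNSW options. The point is that the options would
live on the format, and a per-field format would pick the format by
field name, the way PerFieldPostingsFormat#getPostingsFormatForField
does for postings.

```java
import java.util.Map;

// Hypothetical stand-in for a vectors format; NOT an actual Lucene class.
abstract class SketchVectorsFormat {
  private final String name;

  SketchVectorsFormat(String name) {
    this.name = name;
  }

  String getName() {
    return name;
  }
}

// Hypothetical HNSW-backed format. The graph options (assumed names) are
// constructor arguments of the format instead of field-type attributes.
final class SketchHnswVectorsFormat extends SketchVectorsFormat {
  final int maxConn;
  final int beamWidth;

  SketchHnswVectorsFormat(int maxConn, int beamWidth) {
    super("SketchHnsw");
    this.maxConn = maxConn;
    this.beamWidth = beamWidth;
  }
}

// Hypothetical per-field dispatcher, modeled on the PerFieldPostingsFormat
// pattern: subclasses decide which format handles which field.
abstract class SketchPerFieldVectorsFormat extends SketchVectorsFormat {
  SketchPerFieldVectorsFormat() {
    super("SketchPerField");
  }

  abstract SketchVectorsFormat getVectorsFormatForField(String field);
}

final class PerFieldVectorsDemo {
  // Builds a per-field format with one override and a default for all other fields.
  static SketchPerFieldVectorsFormat example() {
    SketchVectorsFormat defaults = new SketchHnswVectorsFormat(16, 100);
    Map<String, SketchVectorsFormat> overrides =
        Map.of("title_vector", new SketchHnswVectorsFormat(32, 200));
    return new SketchPerFieldVectorsFormat() {
      @Override
      SketchVectorsFormat getVectorsFormatForField(String field) {
        return overrides.getOrDefault(field, defaults);
      }
    };
  }

  public static void main(String[] args) {
    SketchHnswVectorsFormat title =
        (SketchHnswVectorsFormat) example().getVectorsFormatForField("title_vector");
    System.out.println("title_vector: maxConn=" + title.maxConn + " beamWidth=" + title.beamWidth);
  }
}
```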

Should SearchStrategy constants avoid explicit references to HNSW? The rest
of the API seems to try to be agnostic of the way that NN search is
implemented. Could we make SearchStrategy only about the similarity metric
that is used for vectors? This particular point seems to have been
discussed on LUCENE-9322 <https://issues.apache.org/jira/browse/LUCENE-9322>,
but I couldn't find the conclusion.
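
As an illustration of a SearchStrategy that only describes the
similarity metric, something along these lines is what I am picturing.
The enum name, its constants and the compare method are all hypothetical
and not part of the current API. Nothing in the sketch refers to HNSW or
to any other search implementation.

```java
// Hypothetical enum: each constant names a similarity metric only, with no
// reference to how nearest-neighbor search is implemented.
enum SketchVectorSimilarity {
  // Squared Euclidean distance: smaller means more similar.
  EUCLIDEAN {
    @Override
    float compare(float[] a, float[] b) {
      checkDims(a, b);
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        float diff = a[i] - b[i];
        sum += diff * diff;
      }
      return sum;
    }
  },
  // Dot product: larger means more similar.
  DOT_PRODUCT {
    @Override
    float compare(float[] a, float[] b) {
      checkDims(a, b);
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }
  };

  abstract float compare(float[] a, float[] b);

  // Both vectors must have the same number of dimensions.
  static void checkDims(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException(
          "dimension mismatch: " + a.length + " != " + b.length);
    }
  }
}

final class SketchVectorSimilarityDemo {
  public static void main(String[] args) {
    float[] x = {1f, 2f};
    float[] y = {4f, 6f};
    System.out.println(SketchVectorSimilarity.EUCLIDEAN.compare(x, y));
    System.out.println(SketchVectorSimilarity.DOT_PRODUCT.compare(x, y));
  }
}
```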

Should we rename VectorFormat to VectorsFormat? This would be more
consistent with other file formats that use the plural, like
PostingsFormat, DocValuesFormat, TermVectorsFormat, etc.
--
Adrien