Hi all,
we have finalized all the options proposed by the community and we are
ready to vote for the preferred one and then proceed with the
implementation.
*Option 1*
Keep it as it is (dimension limit hardcoded to 1024)
*Motivation*:
We are close to improving on many fronts. Given the criticality of Lucene
in computing infrastructure and the concerns raised by one of the most
active stewards of the project, I think we should keep working toward
improving the feature as is and move to raise the limit once we can
demonstrate improvement unambiguously.
*Option 2*
Make the limit configurable, for example through a system property.
*Motivation*:
The system administrator can enforce a limit that users need to respect, in
line with whatever the admin has decided is acceptable for them.
The default can stay the current one.
This should open the doors for Apache Solr, Elasticsearch, OpenSearch, and
any sort of plugin development.
*Option 3*
Move the max dimension limit to a lower level, into an HNSW-specific
implementation.
Once there, this limit would not bind any other potential vector engine
alternative/evolution.
*Motivation:* There seem to be contradictory performance interpretations
about the current HNSW implementation. Some consider its performance ok,
some not, and it depends on the target data set and use case. Increasing
the max dimension limit where it currently lives (in the top-level
FloatVectorValues) would not allow potential alternatives (e.g. for other
use-cases) to be based on a lower limit.
*Option 4*
Make it configurable and move it to an appropriate place.
In particular, a simple Integer.getInteger("lucene.hnsw.maxDimensions",
1024) should be enough.
*Motivation*:
Making the limit configurable and moving it are both good, not mutually
exclusive, and could happen in any order.
Someone suggested refining what the _default_ limit should be, but I've
not seen an argument _against_ configurability. Especially in this way --
a toggle that doesn't bind Lucene's APIs in any way.
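To make Option 4 concrete, here is a minimal sketch of how such a toggle could look. Only the property name "lucene.hnsw.maxDimensions", the default of 1024 and the Integer.getInteger call come from the proposal above. The class name HnswMaxDimensions and the checkDimension helper are hypothetical, used here for illustration only, and are not existing Lucene APIs.

```java
// Hypothetical helper, not part of Lucene: illustrates the proposed toggle only.
final class HnswMaxDimensions {
    // Property name and default value as suggested in Option 4.
    static final String PROPERTY = "lucene.hnsw.maxDimensions";
    static final int DEFAULT_MAX_DIMENSIONS = 1024;

    private HnswMaxDimensions() {}

    // Reads the limit from the system property on every call.
    // Integer.getInteger falls back to the default when the property
    // is unset or cannot be parsed as an integer.
    static int maxDimensions() {
        return Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
    }

    // Rejects vector dimensions outside [1, maxDimensions()].
    static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension < 1 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }
}
```

With something like this, an administrator could raise the limit at JVM startup, e.g. -Dlucene.hnsw.maxDimensions=2048, while everyone else keeps the current default. Whether the value is read on every call (as above) or cached once at class load is a design choice left open here.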
I'll keep this [VOTE] open for a week and then proceed to the
implementation.
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*
e-mail: a.benedetti@sease.io
*Sease* - Information Retrieval Applied
Consulting | Training | Open Source
Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>