Mailing List Archive

Dimensions Limit for KNN vectors - Next Steps
We had a very long-running (and heated) thread about this (*[Proposal]
Remove max number of dimensions for KNN vectors*).
Without repeating any of it, I recommend we move this forward in this way:
*We stop any discussion* and everyone interested proposes an option with a
motivation, then we aggregate the options and create a Vote.

*Please, DO NOT use this thread for anything else than your proposed
option.*
All e-mails in this thread should be structured:
*Proposed Option:*
*Motivation:*

Let's keep this open for 1 week and then I'll aggregate the options and set
up the VOTE thread.
If you have anything else to add, please use the old thread.

Cheers

--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benedetti@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>
Re: Dimensions Limit for KNN vectors - Next Steps
*Proposed option*: make the limit configurable
*Motivation*:
The system administrator can enforce a limit that users must respect, in
line with whatever the admin has decided is acceptable for them.
The default can stay the current one.
This should open the doors for Apache Solr, Elasticsearch, OpenSearch, and
any sort of plugin development.
--------------------------
*Alessandro Benedetti*


On Tue, 9 May 2023 at 13:07, Alessandro Benedetti <a.benedetti@sease.io>
wrote:

> We had a very long-running (and heated) thread about this (*[Proposal]
> Remove max number of dimensions for KNN vectors*).
> Without repeating any of it, I recommend we move this forward in this way:
> *We stop any discussion* and everyone interested proposes an option with
> a motivation, then we aggregate the options and create a Vote.
>
> *Please, DO NOT use this thread for anything else than your proposed
> option.*
> All e-mails in this thread should be structured:
> *Proposed Option:*
> *Motivation:*
>
> Let's keep this open for 1 week and then I'll aggregate the options and
> set up the VOTE thread.
> If you have anything else to add, please use the old thread.
>
> Cheers
Re: Dimensions Limit for KNN vectors - Next Steps
+1

Michael Wechner

On 09.05.23 at 14:08, Alessandro Benedetti wrote:
>
> *Proposed option*: make the limit configurable
> *Motivation*:
> The system administrator can enforce a limit that users must respect,
> in line with whatever the admin has decided is acceptable for them.
> The default can stay the current one.
> This should open the doors for Apache Solr, Elasticsearch, OpenSearch,
> and any sort of plugin development.
Re: Dimensions Limit for KNN vectors - Next Steps
Over the past month, after a lot of work with Lucene, I've moved to Robert
Muir's camp.

*Proposed option*: We focus our efforts on improving the testing
infrastructure, stability, and performance of the feature as is before
introducing more complexity. Someone could benefit the community by taking
the lead in cataloging all of these efforts in a common place where they
can be easily referenced and analyzed. If we go with this option, I or
someone more talented than me could lead that effort. Once we have
sufficient evidence, we could reconsider raising the limit, given strong
consensus.

*Motivation*: We are close to improving on many fronts. Given the
criticality of Lucene in computing infrastructure and the concerns raised
by one of the most active stewards of the project, I think we should keep
working toward improving the feature as is and raise the limit only after
we can demonstrate improvement unambiguously.

-1 (non-binding)

Marcus Eagan

On Tue, May 9, 2023 at 2:49 PM Michael Wechner <michael.wechner@wyona.com>
wrote:

> +1
>
> Michael Wechner

--
Marcus Eagan
Re: Dimensions Limit for KNN vectors - Next Steps
*Proposed option:* Move the max dimension limit to a lower level, into an
HNSW-specific implementation. Once there, this limit would not bind any
other potential vector engine alternative or evolution.

*Motivation:* There seem to be contradictory interpretations of the
performance of the current HNSW implementation. Some consider its
performance OK, some do not, and it depends on the target data set and use
case. Increasing the max dimension limit where it currently lives (in the
top-level FloatVectorValues) would not allow potential alternatives (e.g.
for other use cases) to be based on a lower limit.
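
For illustration only, the idea of a per-implementation limit can be
sketched as below. This is a minimal, self-contained sketch and not Lucene
code: the VectorFormat interface, the two format classes, and the 512
value for the alternative engine are all hypothetical names and numbers
chosen for the example; 1024 is assumed to be the current limit.

```java
import java.util.List;

public final class PerFormatLimitSketch {

    /** Hypothetical stand-in for a vector index format; NOT a Lucene API. */
    public interface VectorFormat {
        String name();

        int maxDimensions();
    }

    /** The HNSW implementation owns its limit (1024 assumed as the current one). */
    public static final class HnswFormat implements VectorFormat {
        public String name() { return "hnsw"; }

        public int maxDimensions() { return 1024; }
    }

    /** A made-up alternative engine that is free to pick a lower limit. */
    public static final class OtherFormat implements VectorFormat {
        public String name() { return "other"; }

        public int maxDimensions() { return 512; }
    }

    /** Validation asks the format for its limit instead of a global constant. */
    public static void checkDimension(VectorFormat format, int dimension) {
        if (dimension <= 0 || dimension > format.maxDimensions()) {
            throw new IllegalArgumentException(
                format.name() + " supports 1.." + format.maxDimensions()
                    + " dimensions; got " + dimension);
        }
    }

    public static void main(String[] args) {
        for (VectorFormat f : List.of(new HnswFormat(), new OtherFormat())) {
            System.out.println(f.name() + " accepts up to "
                + f.maxDimensions() + " dimensions");
        }
    }
}
```

With the limit owned by each format, raising it for HNSW would not affect
what another implementation chooses to accept.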

Bruno
Re: Dimensions Limit for KNN vectors - Next Steps
That's actually a good idea.

+1

On 10.05.2023 at 09:22, Bruno Roustant wrote:
> *Proposed option:* Move the max dimension limit to a lower level, into
> an HNSW-specific implementation. Once there, this limit would not bind
> any other potential vector engine alternative or evolution.
>
> *Motivation:* There seem to be contradictory interpretations of the
> performance of the current HNSW implementation. Some consider its
> performance OK, some do not, and it depends on the target data set and
> use case. Increasing the max dimension limit where it currently lives
> (in the top-level FloatVectorValues) would not allow potential
> alternatives (e.g. for other use cases) to be based on a lower limit.
>
> Bruno

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail:uwe@thetaphi.de
Re: Dimensions Limit for KNN vectors - Next Steps
Both what Alessandro said and what Bruno said: make it configurable and
move it. Both are good and not mutually exclusive and could happen in any
order.

Marcus, are you against configurability? In particular, I propose a
simple Integer.getInteger("lucene.hnsw.maxDimensions", 1024). Your response
suggests trying to somehow perfect what the _default_ limit should be, but
I've not seen an argument _against_ configurability. Especially in this
way -- a toggle that doesn't bind Lucene's APIs in any way.
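
For illustration, the system-property lookup mentioned above could be
wrapped as in the sketch below. The property name and the 1024 default
come from this message; the HnswDimensionLimit class and its
checkDimension helper are hypothetical and only show where such a toggle
might be read, not how Lucene does or would implement it.

```java
public final class HnswDimensionLimit {

    // Property name and default value as proposed in this thread.
    public static final String PROPERTY = "lucene.hnsw.maxDimensions";
    public static final int DEFAULT_MAX_DIMENSIONS = 1024;

    private HnswDimensionLimit() {}

    /**
     * Reads the limit from the JVM system property. Integer.getInteger
     * falls back to the default when the property is unset or not a
     * valid integer.
     */
    public static int maxDimensions() {
        return Integer.getInteger(PROPERTY, DEFAULT_MAX_DIMENSIONS);
    }

    /** Hypothetical helper: rejects dimensions outside the configured limit. */
    public static void checkDimension(int dimension) {
        int max = maxDimensions();
        if (dimension <= 0 || dimension > max) {
            throw new IllegalArgumentException(
                "vector dimension must be in [1, " + max + "]; got " + dimension);
        }
    }

    public static void main(String[] args) {
        // Run with e.g. -Dlucene.hnsw.maxDimensions=2048 to override the default.
        System.out.println("max dimensions = " + maxDimensions());
    }
}
```

Because the value is read from a system property, an administrator can
change it at JVM startup without any change to public APIs.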

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Thu, May 11, 2023 at 7:59 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> That's actually a good idea.
>
> +1