Mailing List Archive

ANN search current state
Hi,
 
I want to incorporate semantic search in my project, which uses Lucene. I want to use sentence embeddings and ANN (approximate nearest neighbor) search. I found the related Lucene issues: https://issues.apache.org/jira/browse/LUCENE-9004 , https://issues.apache.org/jira/browse/LUCENE-9136 , https://issues.apache.org/jira/browse/LUCENE-9322 . I see that there is some related work and there are some related PRs. What is the current state of this functionality?
 
--
Thanks,
Mikhail
 
 
Re: ANN search current state
Hi Mikhail,

I'm not sure about the state of ANN in Lucene proper. I'm very interested to
see the responses from others.
I've been doing some work on ANN for an Elasticsearch plugin:
http://elastiknn.klibisz.com/
I think it's possible to extract my custom queries and modeling code so
that it's Elasticsearch-agnostic and can be used directly in Lucene apps.
However, I'm much more familiar with Elasticsearch's APIs and usage/testing
patterns than I am with raw Lucene, so I'd likely need some help
from the Lucene community.
Please let me know if that sounds interesting to anyone.

- Alex



On Wed, Jul 15, 2020 at 11:11 AM Mikhail <wmaster@mail.ru.invalid> wrote:

>
> Hi,
>
> I want to incorporate semantic search in my project, which uses
> Lucene. I want to use sentence embeddings and ANN (approximate nearest
> neighbor) search. I found the related Lucene issues:
> https://issues.apache.org/jira/browse/LUCENE-9004 ,
> https://issues.apache.org/jira/browse/LUCENE-9136 ,
> https://issues.apache.org/jira/browse/LUCENE-9322 . I see that there
> are some related work and related PRs. What is the current state of this
> functionality?
>
> --
> Thanks,
> Mikhail
>
>
Re: ANN search current state
We have some prototype implementations in the issues you found. If
you want to try out the approaches in those issues, you could build
Lucene from source and patch it, but there is no release containing
KNN/vector support. We're still working to establish consensus on what
the best way forward is. I think the most fruitful thing we can do at
the moment is establish a format for storing and accessing vectors
that will support different approaches since there is such a rich
variety of algorithms and approaches in this area. The last issue you
pointed to is focused on the format.
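
To make the idea of "a format for storing and accessing vectors that will support different approaches" a little more concrete, here is a minimal sketch of how storage could be separated from the search strategy. Everything in it is an assumption made for illustration: the names DenseVectorStore, InMemoryVectorStore and ExactKnn are hypothetical and are not taken from Lucene or from any of the patches in the issues above. The search shown is exact (brute-force) cosine search, which is the baseline that an ANN algorithm would approximate.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical read-only view over stored vectors; NOT a Lucene API. */
interface DenseVectorStore {
    int size();
    float[] vector(int docId);
}

/** Simplest possible store for the sketch: vectors held in a heap array. */
final class InMemoryVectorStore implements DenseVectorStore {
    private final float[][] vectors;

    InMemoryVectorStore(float[][] vectors) {
        this.vectors = vectors;
    }

    @Override
    public int size() {
        return vectors.length;
    }

    @Override
    public float[] vector(int docId) {
        return vectors[docId];
    }
}

/** Exact nearest-neighbor search over any DenseVectorStore. */
final class ExactKnn {
    /** Cosine similarity; returns 0 if either vector has zero length. */
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    /** Returns the ids of the k most similar vectors, most similar first. */
    static List<Integer> search(DenseVectorStore store, float[] query, int k) {
        double[] scores = new double[store.size()];
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id < store.size(); id++) {
            scores[id] = cosine(store.vector(id), query);
            ids.add(id);
        }
        // Sort by descending similarity, then keep the first k ids.
        ids.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

Because the search code only sees the interface, a different algorithm (clustering-based or graph-based, for example) could in principle be swapped in and compared on the same stored vectors, which is the kind of experimentation this thread is about.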

On Wed, Jul 15, 2020 at 11:20 AM Alex K <aklibisz@gmail.com> wrote:
>
> Hi Mikhail,
>
> I'm not sure about the state of ANN in Lucene proper. I'm very interested to
> see the responses from others.
> I've been doing some work on ANN for an Elasticsearch plugin:
> http://elastiknn.klibisz.com/
> I think it's possible to extract my custom queries and modeling code so
> that it's Elasticsearch-agnostic and can be used directly in Lucene apps.
> However, I'm much more familiar with Elasticsearch's APIs and usage/testing
> patterns than I am with raw Lucene, so I'd likely need some help
> from the Lucene community.
> Please let me know if that sounds interesting to anyone.
>
> - Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: ANN search current state
I’m a bit of a layman in this area, but if we are talking about formats for
vectors, I vote for the one used by FastAI word vectors. It’s pretty easy
to work with.

That's if we are talking about the same / similar things; if not, just ignore me.

On Thu, 16 Jul 2020 at 7:06 PM, Michael Sokolov <msokolov@gmail.com> wrote:

> We have some prototype implementations in the issues you found. If
> you want to try out the approaches in those issues, you could build
> Lucene from source and patch it, but there is no release containing
> KNN/vector support. We're still working to establish consensus on what
> the best way forward is. I think the most fruitful thing we can do at
> the moment is establish a format for storing and accessing vectors
> that will support different approaches since there is such a rich
> variety of algorithms and approaches in this area. The last issue you
> pointed to is focused on the format.
Re: ANN search current state
Would it make sense to create a separate Lucene module for ANN search?
We could then experiment with the different approaches and compare them
across the same benchmarks.

On Thu, 16 Jul 2020 at 23:14, Ali Akhtar <ali@ali.actor> wrote:

> I’m a bit of a layman in this area, but if we are talking about formats for
> vectors, I vote for the one used by FastAI word vectors. It’s pretty easy
> to work with.
>
> That's if we are talking about the same / similar things; if not, just ignore me.
Re: ANN search current state
> Would it make sense to create a separate Lucene module for ANN search?

From my limited experience with LUCENE-9004, it is currently impossible to
plug in or opt in to custom codecs and a custom indexing chain for aknn search
without touching the lucene-core module (please correct me if that's wrong).
I think LUCENE-9322 (unified low-level Codec/Format for dense vectors)
would open up the possibility of experimenting with different aknn algorithms
in sandbox modules or even in JARs separate from Lucene itself.

Tomoko


On Fri, 17 Jul 2020 at 17:00, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:

> Would it make sense to create a separate Lucene module for ANN search?
> We could then experiment with the different approaches and compare them
> across the same benchmarks.
Re: ANN search current state
Tomoko -- this fits with my experience as well. I really like the idea of treating LUCENE-9322 as setting up a framework for experimentation + benchmarking (but not requiring us to commit to a particular ANN implementation quite yet).

Julie

On 2020/07/17 12:16:18, Tomoko Uchida <t...@gmail.com> wrote:
> > Would it make sense to create a separate Lucene module for ANN search?
>
> From my limited experience with LUCENE-9004, it is currently impossible to
> plug in or opt in to custom codecs and a custom indexing chain for aknn search
> without touching the lucene-core module (please correct me if that's wrong).
> I think LUCENE-9322 (unified low-level Codec/Format for dense vectors)
> would open up the possibility of experimenting with different aknn algorithms
> in sandbox modules or even in JARs separate from Lucene itself.
>
> Tomoko
Re: ANN search current state
Hi Julie,
thank you for working on LUCENE-9322 (I also love the issue).
I think it would be great if we could try some preliminary aknn
implementations (both clustering-based and graph-based approaches) on top of
LUCENE-9322, to explore a workable unified API and Codec/Format for vectors.
For now, I still have no clear picture of the abstraction we should
have. Sorry for my inactivity on the issues. I wish I had
more time and expertise for it.

Tomoko
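
For readers unfamiliar with the clustering-based family mentioned above, the following is a toy sketch of the general idea (an inverted-file, or IVF-style, index): each vector is assigned to its nearest centroid, and a query only scans the vectors in the few clusters closest to it. This is an illustration under stated assumptions, not code from LUCENE-9004, LUCENE-9136 or LUCENE-9322. The class name ToyIvfIndex and the nProbe parameter are hypothetical, at least one centroid is assumed, and the centroids are assumed to have been computed elsewhere (for example by k-means).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy clustering-based (IVF-style) index; hypothetical, NOT a Lucene API. */
final class ToyIvfIndex {
    private final float[][] centroids;
    private final float[][] vectors;
    // postings.get(c) holds the ids of the vectors assigned to centroid c.
    private final List<List<Integer>> postings = new ArrayList<>();

    /** Assumes centroids is non-empty and was trained elsewhere. */
    ToyIvfIndex(float[][] centroids, float[][] vectors) {
        this.centroids = centroids;
        this.vectors = vectors;
        for (int c = 0; c < centroids.length; c++) {
            postings.add(new ArrayList<>());
        }
        for (int id = 0; id < vectors.length; id++) {
            postings.get(nearestCentroids(vectors[id], 1)[0]).add(id);
        }
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Indices of the n centroids closest to v, nearest first. */
    private int[] nearestCentroids(float[] v, int n) {
        Integer[] order = new Integer[centroids.length];
        for (int c = 0; c < order.length; c++) {
            order[c] = c;
        }
        Arrays.sort(order,
            Comparator.comparingDouble((Integer c) -> squaredDistance(centroids[c], v)));
        int[] result = new int[Math.min(n, order.length)];
        for (int i = 0; i < result.length; i++) {
            result[i] = order[i];
        }
        return result;
    }

    /**
     * Returns the id of the closest vector found in the nProbe clusters
     * nearest to the query, or -1 if those clusters are empty.
     */
    int searchNearest(float[] query, int nProbe) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c : nearestCentroids(query, nProbe)) {
            for (int id : postings.get(c)) {
                double d = squaredDistance(vectors[id], query);
                if (d < bestDistance) {
                    bestDistance = d;
                    best = id;
                }
            }
        }
        return best;
    }
}
```

The approximation comes from nProbe: probing fewer clusters scans fewer vectors but can miss the true nearest neighbor when it sits just across a cluster boundary, while probing every cluster degenerates to an exact scan. Graph-based approaches make a different trade-off and are not shown here.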


On Thu, 23 Jul 2020 at 3:10, Julie Tibshirani <julietibs@gmail.com> wrote:

> Tomoko -- this fits with my experience as well. I really like the idea of
> treating LUCENE-9322 as setting up a framework for experimentation +
> benchmarking (but not requiring us to commit to a particular ANN
> implementation quite yet).
>
> Julie