Mailing List Archive

Does Lucene have anything like a covering index as an alternative to DocValues?
Hi all,

I am curious if there is anything in Lucene that resembles a covering index
(from the relational database world) as an alternative to DocValues for
commonly-accessed values?

Consider the following use-case: I'm indexing docs in a Lucene index. Each
doc has some terms, which are not stored. Each doc also has a UUID
corresponding to some other system, which is stored using DocValues. When I
run a query, I get back the TopDocs and use the doc ID to fetch the UUID
from DocValues. I know that I will *always* need to go fetch this UUID. Is
there any way to have the UUID stored in the actual index, rather than
using DocValues?

Thanks in advance for any tips

Alex Klibisz
RE: Does Lucene have anything like a covering index as an alternative to DocValues?
You need to index the UUID as a standard indexed StringField. Then you can do a lookup using TermQuery. That's how all systems like Solr or Elasticsearch handle document identifiers.

DocValues are for faceting and sorting, but looking up by ID is a typical use case for an inverted index. If you still need to store it as a DocValues field, just add it with both types.
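
To illustrate what "both types" means, here is a minimal sketch in plain Java. It is a toy model and does not use Lucene's API or file formats: the class ToyIdIndex and its methods are hypothetical names invented for this example. It keeps the same UUID in two structures, one answering "which doc has this ID?" (the direction a TermQuery on an indexed field serves) and one answering "which ID does this doc have?" (the direction a per-document field such as DocValues serves).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: NOT Lucene's API or on-disk format. */
final class ToyIdIndex {
    // "Inverted" side: term (the UUID string) -> internal doc IDs that contain it.
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // "Per-document" side: internal doc ID -> UUID.
    private final List<String> uuidByDoc = new ArrayList<>();

    /** Adds a document and returns its internal doc ID (assigned sequentially). */
    int add(String uuid) {
        int docId = uuidByDoc.size();
        uuidByDoc.add(uuid);
        postings.computeIfAbsent(uuid, k -> new ArrayList<>()).add(docId);
        return docId;
    }

    /** UUID -> doc IDs: lookup by identifier through the inverted side. */
    List<Integer> docsFor(String uuid) {
        return postings.getOrDefault(uuid, Collections.emptyList());
    }

    /** Doc ID -> UUID: lookup through the per-document side. */
    String uuidOf(int docId) {
        return uuidByDoc.get(docId);
    }

    public static void main(String[] args) {
        ToyIdIndex index = new ToyIdIndex();
        index.add("uuid-a");
        index.add("uuid-b");
        System.out.println(index.docsFor("uuid-b")); // prints [1]
        System.out.println(index.uuidOf(0));         // prints uuid-a
    }
}
```

The two maps are independent, which is why a field can be added with either type or with both, depending on which direction of lookup is needed.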

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Alex K <aklibisz@gmail.com>
> Sent: Monday, July 5, 2021 2:30 AM
> To: java-user@lucene.apache.org
> Subject: Does Lucene have anything like a covering index as an alternative to
> DocValues?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: Does Lucene have anything like a covering index as an alternative to DocValues?
Hi,

Sorry, I misunderstood your question: you want to look up the UUID in another system!
Then your current approach is correct. Store the UUID either as a stored field or as DocValues. An inverted index cannot store additional data because it *is* inverted: it is organized around *terms*, not documents. The posting list of each term can only store internal, numeric Lucene doc IDs. Those then have to be used to look up the actual contents from, e.g., stored fields (possibility A) or DocValues (possibility B). We can't store UUIDs in the highly compressed posting list.
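
The two-step flow described above can be sketched in plain Java. This is a toy model under stated assumptions, not Lucene's real postings format: the class ToyPostings, its methods, and the uuidByDoc array are hypothetical, and simple gap (delta) encoding stands in for whatever compression the postings actually use. The point is only that a posting list is a compact sequence of small integers, so the UUID has to come from a separate per-document structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: a simplification, NOT Lucene's real postings format. */
final class ToyPostings {
    /** Stores sorted doc IDs as gaps to the previous ID: [3, 7, 8] -> [3, 4, 1]. */
    static int[] encode(int[] sortedDocIds) {
        int[] gaps = new int[sortedDocIds.length];
        int previous = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            gaps[i] = sortedDocIds[i] - previous;
            previous = sortedDocIds[i];
        }
        return gaps;
    }

    /** Rebuilds the doc IDs by summing the gaps. */
    static int[] decode(int[] gaps) {
        int[] docIds = new int[gaps.length];
        int current = 0;
        for (int i = 0; i < gaps.length; i++) {
            current += gaps[i];
            docIds[i] = current;
        }
        return docIds;
    }

    /** Step 2: use each doc ID to fetch the UUID from a per-document column. */
    static List<String> resolve(int[] gaps, String[] uuidByDoc) {
        List<String> uuids = new ArrayList<>();
        for (int docId : decode(gaps)) {
            uuids.add(uuidByDoc[docId]);
        }
        return uuids;
    }

    public static void main(String[] args) {
        // Hypothetical per-document column, indexed by internal doc ID.
        String[] uuidByDoc = {"u0", "u1", "u2", "u3"};
        // Step 1: the posting list for some term matches docs 1 and 3.
        int[] gaps = encode(new int[] {1, 3});
        System.out.println(Arrays.toString(gaps));    // prints [1, 2]
        System.out.println(resolve(gaps, uuidByDoc)); // prints [u1, u3]
    }
}
```

Sorted integers with small gaps pack into few bits, whereas an arbitrary UUID attached to every posting would not, which is one way to read the remark about the compressed posting list.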

Uwe


> -----Original Message-----
> From: Uwe Schindler <uwe@thetaphi.de>
> Sent: Monday, July 5, 2021 3:10 PM
> To: java-user@lucene.apache.org
> Subject: RE: Does Lucene have anything like a covering index as an alternative
> to DocValues?
Re: Does Lucene have anything like a covering index as an alternative to DocValues?
Hi Uwe,
Thanks for clarifying. That makes sense.
Thanks,
Alex Klibisz

On Mon, Jul 5, 2021 at 9:22 AM Uwe Schindler <uwe@thetaphi.de> wrote:
