Mailing List Archive: Need help on aggregation of nested documents

Nov 14, 2021, 9:07 PM

Post #1 of 5 (476 views)

Hi Team,

My documents model a customer, which has a few attributes such as gender
and location.

Each customer has a list of facts such as transactions and product views.

I want to aggregate over the facts. For example, find all customers from a
specific location who made transactions worth more than $500 within a given
date range.

The queries can go deeper than this.

Thanks in advance.

Gopal Sharma

Re: Need help on aggregation of nested documents

jpountz at gmail

Nov 15, 2021, 9:05 AM

Post #2 of 5 (476 views)

It's not straightforward as we don't provide high-level tooling to do this.
You need to use the BitSetProducer that you pass to the
ToParentBlockJoinQuery in order to resolve the range of child doc IDs for a
given parent doc ID (see e.g. how ToChildBlockJoinQuery does it), and then
aggregate over these child doc IDs.
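
To make the doc ID arithmetic concrete, here is a minimal sketch of the idea, not Lucene code. It assumes the usual block-join layout, where the children of a parent are indexed immediately before it, so the children of parent P are the doc IDs between the previous parent and P. java.util.BitSet stands in for the org.apache.lucene.util.BitSet that a BitSetProducer returns per segment (the Lucene counterpart of previousSetBit is prevSetBit), and the amounts array, class name and method names are hypothetical.

```java
import java.util.BitSet;

class ChildRange {
    /** Returns {firstChild, parentDoc}; the children occupy [firstChild, parentDoc). */
    static int[] childRange(BitSet parents, int parentDoc) {
        if (!parents.get(parentDoc)) {
            throw new IllegalArgumentException(parentDoc + " is not a parent doc");
        }
        // previousSetBit returns -1 when there is no earlier parent,
        // so the first block starts at doc 0.
        int prevParent = parents.previousSetBit(parentDoc - 1);
        return new int[] {prevParent + 1, parentDoc};
    }

    /** Sums a per-document value over all children of the given parent. */
    static long sumChildren(BitSet parents, long[] amounts, int parentDoc) {
        int[] range = childRange(parents, parentDoc);
        long sum = 0;
        for (int doc = range[0]; doc < range[1]; doc++) {
            sum += amounts[doc];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Docs 0-2 are children of parent 3; docs 4-5 are children of parent 6.
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        // Hypothetical transaction amounts per doc ID (parents carry 0).
        long[] amounts = {100, 250, 300, 0, 50, 75, 0};
        System.out.println(sumChildren(parents, amounts, 3)); // prints 650
        System.out.println(sumChildren(parents, amounts, 6)); // prints 125
    }
}
```

With these made-up amounts, only the customer at doc 3 would pass a "more than 500" filter.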

--
Adrien

Re: Need help on aggregation of nested documents

gopal.sharma at algonomy

Nov 15, 2021, 10:30 PM

Post #3 of 5 (476 views)

Hi Adrien,

Thanks for the reply.

I am able to retrieve the child doc IDs using ToChildBlockJoinQuery. To
compute the aggregates, do I now need to load each document with
reader.document(int docID)? If so, wouldn't fetching all the documents and
then aggregating over them be a costly operation?

Is there any other way around this?

Thanks
Gopal Sharma

Re: Need help on aggregation of nested documents

jpountz at gmail

Nov 16, 2021, 12:11 AM

Post #4 of 5 (476 views)

Indeed, you shouldn't load all hits. Instead, register an
org.apache.lucene.search.Collector that aggregates data while matches
are being collected.

Since you are already using a ToChildBlockJoinQuery, you should be able to
use it in conjunction with utility classes from lucene/facets. Have you
looked into it already?
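
As a rough illustration of "aggregate while collecting", the sketch below models the shape of such a collector with plain Java rather than the Lucene API. The collect(int doc) method mirrors the per-hit callback, and the long[] column is a stand-in for a per-document numeric column. In Lucene the analogous source would be doc values (for example a NumericDocValues instance obtained per segment and read with advanceExact and longValue), assuming the amount field was indexed with doc values, so that no stored document needs to be loaded. The class name and data are hypothetical.

```java
class SumCollectorSketch {
    // Stand-in for a numeric doc-values column: one value per doc ID.
    private final long[] amountColumn;
    private long sum;
    private int count;

    SumCollectorSketch(long[] amountColumn) {
        this.amountColumn = amountColumn;
    }

    /** Called once per matching doc ID; reads the column value directly. */
    void collect(int doc) {
        sum += amountColumn[doc];
        count++;
    }

    long sum() {
        return sum;
    }

    int count() {
        return count;
    }

    public static void main(String[] args) {
        long[] amounts = {100, 250, 300, 0, 50, 75, 0};
        SumCollectorSketch collector = new SumCollectorSketch(amounts);
        // Hypothetical child doc IDs matched by the query.
        int[] matchingChildren = {0, 2, 5};
        for (int doc : matchingChildren) {
            collector.collect(doc);
        }
        System.out.println(collector.sum() + " over " + collector.count() + " hits");
        // prints "475 over 3 hits"
    }
}
```

The point of the design is that the running total is updated inside the callback, so memory use stays constant no matter how many hits match.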

--
Adrien

Re: Need help on aggregation of nested documents

gopal.sharma at algonomy

Nov 16, 2021, 8:43 AM

Post #5 of 5 (476 views)

I have created a custom Collector extending SimpleCollector. I can see the
methods scoreMode() and collect(int doc).

I see that Lucene invokes the collect method with the child doc ID. Am I
moving in the right direction?

But to collect the values I would need to load the Document with
reader.document(int docID) and then parse it, which is the same issue I
pointed out earlier.

Thanks
Gopal Sharma
