Mailing List Archive

Index-time join ToParentBlockJoinQuery query produces incorrect result
Hello, I am trying to understand the requirements for properly using the index-time join. In my use case, I am trying to model a 1-N relationship where a parent document can have 0-N child documents. For now I am keeping my data very simple: each child has a single field. So my data right now looks like this:


Parent Doc    Children
--------------------------------------
id=id00000    none
id=id00001    program=P1
id=id00002    program=P1
              program=P2
id=id00003    none
id=id00004    program=P1
id=id00005    program=P1
              program=P2

So essentially I have 6 parent docs: doc 0 has no children, doc 1 has 1 child, doc 2 has 2 children, etc.


Certain queries are giving me incorrect results. For example:


BitSetProducer parentSet = new QueryBitSetProducer(new TermQuery(new Term("id", "id00003")));
Query q = new ToParentBlockJoinQuery(new WildcardQuery(new Term("program", "*")), parentSet, ScoreMode.None);


This returns "id00003", which is unexpected.


I opened a bug (https://issues.apache.org/jira/browse/LUCENE-8902) in my haste earlier (sorry) and it was mentioned in there that "chid free is not supported". So I take it to mean that each parent should have at least one child. So let's say I add a "default" child to each parent:


Parent Doc    Children
--------------------------------------
id=id00000    field1=val1
id=id00001    field1=val1
              program=P1
id=id00002    field1=val1
              program=P1
              program=P2
id=id00003    field1=val1
id=id00004    field1=val1
              program=P1
id=id00005    field1=val1
              program=P1
              program=P2


So now every parent has at least one child. That made no difference; I still get the same result. What am I doing wrong here?


Thanks
Re: Index-time join ToParentBlockJoinQuery query produces incorrect result
On Wed, Jul 3, 2019 at 6:11 PM ANDREI SOLODIN <asolodin@comcast.net> wrote:

> This returns "id00003", which is unexpected.

Please check ToPBJQ javadoc. It's absolutely expected.

--
Sincerely yours
Mikhail Khludnev
Re: Index-time join ToParentBlockJoinQuery query produces incorrect result
Thanks Mikhail.


I read through the javadoc and thought I was satisfying all the preconditions. Obviously not :-) Is it this part that I am getting wrong: "At search time you provide a Filter identifying the parents, however this Filter must provide an BitSet (https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true) per sub-reader."? If so, given the data above, how do I properly create a parent query?


> On July 3, 2019 at 10:30 AM Mikhail Khludnev <mkhl@apache.org> wrote:
>
> Please check ToPBJQ javadoc. It's absolutely expected.
Re: Index-time join ToParentBlockJoinQuery query produces incorrect result
After looking through the unit tests, I got it working. The problem was that I thought the parent filter in the ToParentBlockJoinQuery could be used to select a subset of parents. It appears that the parent filter must select ALL parents, not a subset. This is not explained in the javadoc. If you want to select a subset of parents (independently of the child query), ToParentBlockJoinQuery cannot be used on its own; it has to be used as a clause in another query.
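
To make the subset-versus-all-parents behaviour concrete, here is a minimal plain-Java sketch of the lookup as I understand it. It is a toy model built on java.util.BitSet, not Lucene code. The doc-id layout (children indexed immediately before their parent, using the first data set from the original post) and the names BlockJoinSketch, joinToParents and bits are assumptions made only for illustration.

```java
import java.util.BitSet;

// Toy model of a block-join index, NOT Lucene code.
// Assumed doc-id layout for the first data set (children precede their parent):
//   0=id00000   1=P1 2=id00001   3=P1 4=P2 5=id00002
//   6=id00003   7=P1 8=id00004   9=P1 10=P2 11=id00005
class BlockJoinSketch {

    // Each matching child is attributed to the first parent bit at or after it.
    static BitSet joinToParents(BitSet childMatches, BitSet parentBits) {
        BitSet result = new BitSet();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                child = childMatches.nextSetBit(child + 1)) {
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                result.set(parent);
            }
        }
        return result;
    }

    // Hypothetical helper: build a BitSet from a list of doc ids.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet children = bits(1, 3, 4, 7, 9, 10);   // docs matching program:*
        BitSet allParents = bits(0, 2, 5, 6, 8, 11);
        BitSet onlyId00003 = bits(6);

        // Subset used as the parent filter: docs 1, 3 and 4 all look like
        // children of doc 6, so id00003 comes back.
        System.out.println(joinToParents(children, onlyId00003)); // prints {6}

        // All parents as the filter, subset applied as a separate clause.
        BitSet joined = joinToParents(children, allParents);
        System.out.println(joined);                               // prints {2, 5, 8, 11}
        joined.and(onlyId00003);
        System.out.println(joined);                               // prints {}
    }
}
```

In Lucene terms, the second variant corresponds to what is described above: a parent filter that matches every parent, with the restriction to id00003 added as another clause of the enclosing query.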

It would be a nice enhancement to just select all parents automatically. The parent is already required to be the last document in the block, so why do we need to provide a query for them?

> On July 3, 2019 at 10:52 AM ANDREI SOLODIN <asolodin@comcast.net> wrote:
>
>
> Thanks Mikhail.
>
>
> I read through the javadoc and thought I was satisfying all the preconditions. Obviously not :-) Is it this part that I am getting wrong: "At search time you provide a Filter identifying the parents, however this Filter must provide an BitSet (https://lucene.apache.org/core/8_1_1/core/org/apache/lucene/util/BitSet.html?is-external=true) per sub-reader."? If so, given the data above, how do I properly create a parent query?
Re: Index-time join ToParentBlockJoinQuery query produces incorrect result
Well for one thing, you might have other documents in the index that
are neither parents nor children (in this particular relation). Also,
consider a nested hierarchy - how can we automatically figure out
which "generation" or "level" of parent to select?
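
The nested-hierarchy point can be illustrated with a small plain-Java sketch. This is a toy model on java.util.BitSet, not Lucene code. The three-level layout (comments under posts under a blog) and the names NestedLevelsSketch, joinToParents and bits are hypothetical, chosen only to show that the same child matches land on different documents depending on which parent bits are supplied.

```java
import java.util.BitSet;

// Toy model, NOT Lucene code. Assumed layout of one nested block:
//   0=comment 1=post   2=comment 3=comment 4=post   5=blog
// (comments precede their post, posts precede their blog)
class NestedLevelsSketch {

    // Each matching child is attributed to the first parent bit at or after it.
    static BitSet joinToParents(BitSet childMatches, BitSet parentBits) {
        BitSet result = new BitSet();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                child = childMatches.nextSetBit(child + 1)) {
            int parent = parentBits.nextSetBit(child);
            if (parent >= 0) {
                result.set(parent);
            }
        }
        return result;
    }

    // Hypothetical helper: build a BitSet from a list of doc ids.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet matchingComments = bits(0, 3);

        // Parent bits mark the "post" level: the join yields posts.
        System.out.println(joinToParents(matchingComments, bits(1, 4))); // prints {1, 4}

        // Parent bits mark the "blog" level: the join yields the blog.
        System.out.println(joinToParents(matchingComments, bits(5)));    // prints {5}
    }
}
```

Nothing in the doc-id order alone says which of the two levels is wanted, which is one reason the caller supplies the parent bits explicitly.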

On Wed, Jul 3, 2019 at 2:50 PM ANDREI SOLODIN <asolodin@comcast.net> wrote:
>
> After looking through the unit tests, I got it working. The problem was that I thought the parent filter in the ToParentBlockJoinQuery could be used to select a subset of parents. It appears that the parent filter must select ALL parents, not a subset. This is not explained in the javadoc. If you want to select a subset of parents (independently of the child query), ToParentBlockJoinQuery cannot be used on its own; it has to be used as a clause in another query.
>
> It would be a nice enhancement to just select all parents automatically. The parent is already required to be the last document in the block, so why do we need to provide a query for them?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Index-time join ToParentBlockJoinQuery query produces incorrect result
So you are implying that the parent filter allows subsets. The code at https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/CheckJoinIndex.java#L46 implies that a subset is not allowed. If I select a subset and invoke the checker, I get this IllegalStateException.
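
As a rough illustration of why a structural check would reject a subset, here is a toy sanity check in plain Java. It is not the Lucene CheckJoinIndex implementation, and the real checker may test more or different conditions. The two rules shown (a segment needs at least one parent, and its last document must be a parent), the 12-document layout, and the names ParentFilterCheckSketch, check and bits are assumptions for illustration.

```java
import java.util.BitSet;

// Toy sanity check, NOT the Lucene implementation.
class ParentFilterCheckSketch {

    // Rejects parent bits that cannot describe complete blocks in a segment
    // holding doc ids 0 .. maxDoc-1.
    static void check(BitSet parentBits, int maxDoc) {
        if (maxDoc == 0) {
            return; // empty segment, nothing to verify
        }
        if (parentBits.isEmpty()) {
            throw new IllegalStateException("segment has no parent");
        }
        if (!parentBits.get(maxDoc - 1)) {
            throw new IllegalStateException("last document of the segment is not a parent");
        }
    }

    // Hypothetical helper: build a BitSet from a list of doc ids.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    public static void main(String[] args) {
        int maxDoc = 12; // assumed: 6 parents and 6 children in one segment

        check(bits(0, 2, 5, 6, 8, 11), maxDoc); // all parents: passes silently

        try {
            check(bits(6), maxDoc);             // subset: doc 11 is not marked
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```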


> On July 3, 2019 at 2:33 PM Michael Sokolov <msokolov@gmail.com> wrote:
>
>
> Well for one thing, you might have other documents in the index that
> are neither parents nor children (in this particular relation). Also,
> consider a nested hierarchy - how can we automatically figure out
> which "generation" or "level" of parent to select?
Re: Index-time join ToParentBlockJoinQuery query produces incorrect result
Andrei, it's not clear what the problem is, but if you need to join children
to parents and then select only a subset of parents, you need to combine the
join with a parent filter. Some cases are explained at
https://lucene.apache.org/solr/guide/8_0/other-parsers.html#OtherParsers-BlockJoinParentQueryParser

On Sat, Jul 6, 2019 at 1:41 AM ANDREI SOLODIN <asolodin@comcast.net> wrote:

> So you are implying that the parent filter allows subsets. The code at
> https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/CheckJoinIndex.java#L46
> implies that a subset is not allowed. If I select a subset and invoke the
> checker, I get this IllegalStateException.

--
Sincerely yours
Mikhail Khludnev