Mailing List Archive

Anyone familiar (or use) MultiRangeQuery?
Hi folks-

Is anyone familiar with MultiRangeQuery (found in
o.a.l.sandbox.search)? I was playing around with it recently since it
might be a good fit for a use-case I'm working on for Amazon's Product
Search engine, but it looks like it has a pretty fundamental bug in
how it works. Either that, or I'm completely misunderstanding what the
query is meant to do.

My understanding is that this query should consider documents to be a
match if they contain a point that is found in _any_ of the ranges
represented by this query (i.e., it's a disjunction over a set of
query ranges). But... it appears that the query incorrectly considers
a document to be a match if its point matches on any single dimension
of any range (where it should be requiring all dimensions in a
particular range to match).
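
To make the two readings concrete, here is a minimal, self-contained
Java sketch. It does not use Lucene at all: the class and method names
(MultiRangeSemantics, Box, matchesAnyBox, matchesAnySingleDimension)
are made up for illustration, and inclusive bounds are an assumption.
It only contrasts the intended semantics with the behavior described
above.

```java
import java.util.List;

final class MultiRangeSemantics {

    /** Hypothetical N-dimensional range with inclusive bounds per dimension. */
    static final class Box {
        final long[] lower;
        final long[] upper;

        Box(long[] lower, long[] upper) {
            this.lower = lower;
            this.upper = upper;
        }
    }

    /** Intended reading: some box contains the point on ALL dimensions. */
    static boolean matchesAnyBox(long[] point, List<Box> boxes) {
        for (Box box : boxes) {
            boolean allDimensionsMatch = true;
            for (int d = 0; d < point.length; d++) {
                if (point[d] < box.lower[d] || point[d] > box.upper[d]) {
                    allDimensionsMatch = false;
                    break;
                }
            }
            if (allDimensionsMatch) {
                return true;
            }
        }
        return false;
    }

    /** Reported behavior: a single matching dimension of any box is enough. */
    static boolean matchesAnySingleDimension(long[] point, List<Box> boxes) {
        for (Box box : boxes) {
            for (int d = 0; d < point.length; d++) {
                if (point[d] >= box.lower[d] && point[d] <= box.upper[d]) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For the 2D point (5, 100) and the boxes [0,10]x[0,10] and
[50,60]x[50,60], matchesAnyBox returns false, while
matchesAnySingleDimension returns true because 5 falls within [0,10]
on the first dimension.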

I added a unit test to demonstrate this bug along with a proposed fix
over here: https://github.com/apache/lucene/pull/437

If anyone is familiar with this query (or better yet, uses it), I'd be
really interested in your input.

Cheers,
-Greg

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Anyone familiar (or use) MultiRangeQuery?
I did a little git spelunking and found this PR
https://github.com/apache/lucene-solr/pull/794 where it was
introduced. It does sound to me as if the intent was to match on
multiple multi-dimensional ranges (i.e., hypercubes), not on any single
dimension among multiple ranges. Why would anyone ever want to do
that? On the other hand, a lot of people looked at it ... so maybe
we're missing something here?

On Sun, Nov 21, 2021 at 11:14 AM Greg Miller <gsmiller@gmail.com> wrote:
Re: Anyone familiar (or use) MultiRangeQuery?
I think Greg is right and this query is supposed to be a
specialization for a disjunction of multiple range queries. It helps
because you only need to visit the index of the BKD tree and build a bit
set once for the entire disjunction instead of once per range.

I suspect that the fact that it doesn't work with multi-dimensional
points is a bug that hasn't been found yet because it's been mostly
discussed in the context of 1D fields?

On Mon, Nov 22, 2021 at 5:13 PM Michael Sokolov <msokolov@gmail.com> wrote:

--
Adrien

Re: Anyone familiar (or use) MultiRangeQuery?
Thanks everyone!

> I suspect that the fact that it doesn't work with multi-dimensional
> points is a bug that hasn't been found yet because it's been mostly
> discussed in the context of 1D fields?

This seems plausible. It also made me think that writing a "duel" test
that compares randomized scenarios against a disjunction of "standard"
PointRangeQueries would be a good idea, so I went ahead and added that
to my PR (https://github.com/apache/lucene/pull/437).
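
The general shape of such a duel can be sketched without Lucene. The
following is only an illustration of the idea, not the test in the PR:
all names (RangeDuelSketch, Matcher, reference, countingCandidate,
anyDimensionCandidate, duel) are hypothetical, bounds are assumed to be
inclusive, and the "reference" method merely stands in for a
disjunction of single-range queries.

```java
import java.util.Random;

final class RangeDuelSketch {

    /** Hypothetical matcher: lower[r][d] and upper[r][d] bound range r on dimension d. */
    interface Matcher {
        boolean matches(int[] point, int[][] lower, int[][] upper);
    }

    /** Single-range check: the point must fall within the bounds on every dimension. */
    static boolean inBox(int[] point, int[] lower, int[] upper) {
        for (int d = 0; d < point.length; d++) {
            if (point[d] < lower[d] || point[d] > upper[d]) {
                return false;
            }
        }
        return true;
    }

    /** Reference: a plain OR over independent single-range checks. */
    static boolean reference(int[] point, int[][] lower, int[][] upper) {
        for (int r = 0; r < lower.length; r++) {
            if (inBox(point, lower[r], upper[r])) {
                return true;
            }
        }
        return false;
    }

    /** Candidate written differently from the reference: counts matching dimensions per range. */
    static boolean countingCandidate(int[] point, int[][] lower, int[][] upper) {
        for (int r = 0; r < lower.length; r++) {
            int matchedDims = 0;
            for (int d = 0; d < point.length; d++) {
                if (point[d] >= lower[r][d] && point[d] <= upper[r][d]) {
                    matchedDims++;
                }
            }
            if (matchedDims == point.length) {
                return true;
            }
        }
        return false;
    }

    /** Candidate with the reported behavior: one matching dimension of any range suffices. */
    static boolean anyDimensionCandidate(int[] point, int[][] lower, int[][] upper) {
        for (int r = 0; r < lower.length; r++) {
            for (int d = 0; d < point.length; d++) {
                if (point[d] >= lower[r][d] && point[d] <= upper[r][d]) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Runs randomized trials and returns how often the candidate disagrees with the reference. */
    static int duel(long seed, int trials, int maxDims, Matcher candidate) {
        Random random = new Random(seed);
        int disagreements = 0;
        for (int t = 0; t < trials; t++) {
            int dims = 1 + random.nextInt(maxDims);
            int numRanges = 1 + random.nextInt(4);
            int[][] lower = new int[numRanges][dims];
            int[][] upper = new int[numRanges][dims];
            for (int r = 0; r < numRanges; r++) {
                for (int d = 0; d < dims; d++) {
                    int a = random.nextInt(100);
                    int b = random.nextInt(100);
                    lower[r][d] = Math.min(a, b);
                    upper[r][d] = Math.max(a, b);
                }
            }
            int[] point = new int[dims];
            for (int d = 0; d < dims; d++) {
                point[d] = random.nextInt(100);
            }
            if (candidate.matches(point, lower, upper) != reference(point, lower, upper)) {
                disagreements++;
            }
        }
        return disagreements;
    }
}
```

With maxDims set to 1, anyDimensionCandidate can never disagree with
the reference, since "any dimension" and "all dimensions" coincide for
1D points. That is consistent with the suggestion that the problem
stayed hidden while the query was mostly exercised on 1D fields; the
disagreements only show up once multi-dimensional points are generated.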

It seems to me that this is in fact a bug, so I'd suggest we move
forward with fixing it. But if anyone disagrees, let's discuss :)

Cheers,
-Greg

On Thu, Nov 25, 2021 at 9:47 AM Adrien Grand <jpountz@gmail.com> wrote:
Re: Anyone familiar (or use) MultiRangeQuery?
+1 to fixing

On Sat, Nov 27, 2021 at 5:32 PM Greg Miller <gsmiller@gmail.com> wrote:
