Mailing List Archive

DisjunctionMinQuery
Hi all,

I noticed we have a DisjunctionMaxQuery
<https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxQuery.java>
but not a corresponding DisjunctionMinQuery. I was just wondering if there was
a specific reason for that? Or is it just that it is not a common query to
use?

Thanks!
Marc
Re: DisjunctionMinQuery
Hi Marc,

Can you clarify what the semantics of a DisjunctionMinQuery would be? Would
you keep the score for the *lowest* scoring disjunct (plus some tiebreaker
applied to the other matching disjuncts)?

I'm trying to imagine how that would work compared to the classic DisMax
use-case. Say I'm searching for "dalmatian" using a DisMax query over term
queries against title and body. A match on title is probably going to score
higher than a match against the body, just because the title has a shorter
length (and the doc frequency of individual terms in the title is likely to
be lower, since there are fewer terms overall). With DisMax, a match on
title alone will score higher than a match on body, and the tie-break will
tend to score a match on title and body higher than a match on title alone.

With a DisMin (assuming you keep the lowest score), a match on title
and body would probably score lower than a match on title alone. That feels
weird to me, but I might be missing the use-case.
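To make the comparison above concrete, here is a minimal sketch in plain Java (no Lucene dependency). The `disMax` method follows the usual DisMax combination: the best clause score plus a tie-breaker times the sum of the other clause scores. The `disMin` method is a hypothetical mirror image that does not exist in Lucene; the class name, the sample scores (2.0 for title, 0.5 for body), and the tie-breaker of 0.25 are all assumptions chosen for illustration.

```java
import java.util.List;

public class DisjunctionScoreSketch {

    // DisMax-style combination: max clause score + tieBreaker * (sum of the others).
    // Assumes at least one clause matched (non-empty list).
    static double disMax(List<Double> scores, double tieBreaker) {
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    // Hypothetical DisMin: min clause score + tieBreaker * (sum of the others).
    // This is an illustration only, not an existing Lucene query.
    static double disMin(List<Double> scores, double tieBreaker) {
        double min = Double.POSITIVE_INFINITY;
        double sum = 0.0;
        for (double s : scores) {
            min = Math.min(min, s);
            sum += s;
        }
        return min + tieBreaker * (sum - min);
    }

    public static void main(String[] args) {
        double tie = 0.25;                              // assumed tie-breaker
        List<Double> titleOnly = List.of(2.0);          // assumed title score
        List<Double> titleAndBody = List.of(2.0, 0.5);  // assumed title and body scores

        System.out.println("DisMax title only:     " + disMax(titleOnly, tie));
        System.out.println("DisMax title and body: " + disMax(titleAndBody, tie));
        System.out.println("DisMin title only:     " + disMin(titleOnly, tie));
        System.out.println("DisMin title and body: " + disMin(titleAndBody, tie));
    }
}
```

With these made-up numbers, DisMax ranks the title-and-body match above the title-only match, while the hypothetical DisMin ranks it below, which is the inversion described above.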

How would you use a DisMinQuery?

Thanks,
Froh



On Wed, Nov 8, 2023 at 10:50 AM Marc D'Mello <marcd2000@gmail.com> wrote:

> Hi all,
>
> I noticed we have a DisjunctionMaxQuery
> <https://github.com/apache/lucene/blob/branch_9_7/lucene/core/src/java/org/apache/lucene/search/DisjunctionMaxQuery.java>
> but not a corresponding DisjunctionMinQuery. I was just wondering if there was
> a specific reason for that? Or is it just that it is not a common query to
> use?
>
> Thanks!
> Marc
>
Re: DisjunctionMinQuery
Hi Michael,

Thanks for the response! So to answer your first question, yes this would
keep the lowest score from the matching sub-scorers. Our use case is that
we have a custom term-level score overriding term frequency and we want to
take the min of that as part of our scoring function. Maybe it's a niche
use case?

Thanks,
Marc

On Wed, Nov 8, 2023 at 3:19 PM Michael Froh <msfroh@gmail.com> wrote:

> Hi Marc,
>
> Can you clarify what the semantics of a DisjunctionMinQuery would be? Would
> you keep the score for the *lowest* scoring disjunct (plus some tiebreaker
> applied to the other matching disjuncts)?
>
> I'm trying to imagine how that would work compared to the classic DisMax
> use-case. Say I'm searching for "dalmatian" using a DisMax query over term
> queries against title and body. A match on title is probably going to score
> higher than a match against the body, just because the title has a shorter
> length (and the doc frequency of individual terms in the title is likely to
> be lower, since there are fewer terms overall). With DisMax, a match on
> title alone will score higher than a match on body, and the tie-break will
> tend to score a match on title and body higher than a match on title alone.
>
> With a DisMin (assuming you keep the lowest score), then a match on title
> and body would probably score lower than a match on title alone. That feels
> weird to me, but I might be missing the use-case.
>
> How would you use a DisMinQuery?
>
> Thanks,
> Froh
Re: DisjunctionMinQuery
Hi,

In that case you should use something like 1/x as your scoring function
in the sub-clauses. In Lucene, scores should go up with relevance.
This must also apply to function scoring.
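One possible reading of the 1/x suggestion, sketched in plain Java under the assumption that a lower raw value x means a better match and that all values are strictly positive: taking the maximum of the reciprocals selects the same clause as taking the minimum of the raw values, because max(1/x_i) = 1/min(x_i). The class and method names are hypothetical, and whether this matches the intended use case is not confirmed anywhere in the thread.

```java
import java.util.List;

public class ReciprocalScoreSketch {

    // Largest reciprocal among strictly positive raw values.
    static double maxOfReciprocals(List<Double> xs) {
        double best = 0.0;
        for (double x : xs) {
            best = Math.max(best, 1.0 / x);
        }
        return best;
    }

    // Smallest raw value.
    static double min(List<Double> xs) {
        double lowest = Double.POSITIVE_INFINITY;
        for (double x : xs) {
            lowest = Math.min(lowest, x);
        }
        return lowest;
    }

    public static void main(String[] args) {
        List<Double> raw = List.of(4.0, 2.0, 8.0);  // assumed raw values, lower is better
        // Both expressions pick out the clause with raw value 2.0.
        System.out.println(maxOfReciprocals(raw));
        System.out.println(1.0 / min(raw));
    }
}
```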

Uwe

On 09.11.2023 at 19:14, Marc D'Mello wrote:
> Hi Michael,
>
> Thanks for the response! So to answer your first question, yes this would
> keep the lowest score from the matching sub-scorers. Our use case is that
> we have a custom term-level score overriding term frequency and we want to
> take the min of that as part of our scoring function. Maybe it's a niche
> use case?
>
> Thanks,
> Marc
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: DisjunctionMinQuery
Hi all,

Once again, thanks for the responses! After thinking about this a bit more,
I think Michael's response makes sense now. I do agree that partial matches
shouldn't be ranked higher than conjunctive matches, so I think it doesn't
make sense in my use case to use a DisjunctionMinQuery (I think I would
need an AndMinQuery or something like that). This also answers my initial
question.
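The difference between a disjunctive and a conjunctive min can be sketched in plain Java as follows. Everything here is hypothetical: "AndMinQuery" is only the name floated above, not an existing Lucene class, and the clause names and per-clause scores are made up. A document is modeled as a map from matched clause to that clause's score.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;

public class AndMinSketch {

    // Disjunctive min: lowest score among the clauses that matched;
    // empty if no clause matched.
    static OptionalDouble disjunctiveMin(List<String> clauses, Map<String, Double> docScores) {
        return clauses.stream()
                .filter(docScores::containsKey)
                .mapToDouble(docScores::get)
                .min();
    }

    // Conjunctive min: the document only matches if every clause matched;
    // its score is then the lowest clause score.
    static OptionalDouble conjunctiveMin(List<String> clauses, Map<String, Double> docScores) {
        if (!docScores.keySet().containsAll(clauses)) {
            return OptionalDouble.empty();
        }
        return clauses.stream().mapToDouble(docScores::get).min();
    }

    public static void main(String[] args) {
        List<String> clauses = List.of("red", "dog");            // assumed query terms
        Map<String, Double> partial = Map.of("red", 3.0);        // matches one term
        Map<String, Double> full = Map.of("red", 3.0, "dog", 1.0); // matches both terms

        // Disjunctive min lets the partial match outrank the full match.
        System.out.println(disjunctiveMin(clauses, partial) + " vs " + disjunctiveMin(clauses, full));
        // Conjunctive min drops the partial match entirely.
        System.out.println(conjunctiveMin(clauses, partial) + " vs " + conjunctiveMin(clauses, full));
    }
}
```

In this toy setup the partial match scores 3.0 and the full match 1.0 under the disjunctive min, whereas the conjunctive variant excludes the partial match altogether.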

I did have a question about this though:

> in that case you should use something like 1/x as your scoring function
> in the sub-clauses

Doesn't using 1/x as a scoring function, even in the subclauses, still
cause an issue where the output score will be inversely correlated to the
indexed term score? I think that would break BMW (block-max WAND), right? Or maybe I am
misunderstanding the suggestion.

Thanks,
Marc

On Thu, Nov 9, 2023 at 10:18 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> in that case you should use something like 1/x as your scoring function
> in the sub-clauses. In Lucene scores should go up for more relevancy.
> This must also apply for function scoring.
>
> Uwe
Re: DisjunctionMinQuery
> In Lucene scores should go up for more relevancy.

That is the case for combining child scores with min. min() is monotonic --
if its arguments increase, the result does not decrease, it only stays the
same or increases, so I think it is a valid scoring operation for Lucene.
And it makes some logical sense if you think of the terms as an ensemble:
you want all of them to match, and the score scales according to the number
of times they all occur ... something like that.
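The monotonicity claim can be spot-checked with a small plain-Java sketch. The helper below is hypothetical and only tests a finite grid of sample values, so it is a sanity check rather than a proof: it verifies that raising either argument never lowers the combined result. For contrast it also runs a combiner that is not monotonic (subtraction).

```java
import java.util.function.DoubleBinaryOperator;

public class MinMonotonicSketch {

    // Returns true if, for all grid points a <= a2 and b <= b2,
    // combine(a, b) <= combine(a2, b2). Checks only the given grid.
    static boolean isMonotonicOnGrid(DoubleBinaryOperator combine, double[] grid) {
        for (double a : grid) {
            for (double b : grid) {
                for (double a2 : grid) {
                    for (double b2 : grid) {
                        if (a2 >= a && b2 >= b
                                && combine.applyAsDouble(a2, b2) < combine.applyAsDouble(a, b)) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] grid = {0.0, 0.5, 1.0, 2.0};  // assumed sample child scores

        // min never decreases when its arguments increase.
        System.out.println(isMonotonicOnGrid(Math::min, grid));
        // Subtraction does: raising the second argument lowers the result.
        System.out.println(isMonotonicOnGrid((a, b) -> a - b, grid));
    }
}
```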

On Thu, Nov 9, 2023 at 3:09 PM Marc D'Mello <marcd2000@gmail.com> wrote:

> Hi all,
>
> Once again, thanks for the responses! After thinking about this a bit more,
> I think Michael's response makes sense now. I do agree that partial matches
> shouldn't be ranked higher than conjunctive matches, so I think it doesn't
> make sense in my use case to use a DisjunctionMinQuery (I think I would
> need an AndMinQuery or something like that). This also answers my initial
> question.
>
> I did have a question about this though:
>
> > in that case you should use something like 1/x as your scoring function
> > in the sub-clauses
>
> Doesn't using 1/x as a scoring function, even in the subclauses, still
> cause an issue where the output score will be inversely correlated to the
> indexed term score? I think that would break BMW right? Or maybe I am
> misunderstanding the suggestion.
>
> Thanks,
> Marc