Mailing List Archive: Adding vs multiplicating scores when implementing "recency"

Sep 16, 2021, 8:40 AM

Post #1 of 5 (551 views)

In March I asked a question here that got no answers at all. As it is
still something that I'd very much like to know, I'll ask again.

To implement "recency" in a search you would add a boolean clause with
a LongPoint.newDistanceFeatureQuery(), right? But that's additive,
meaning that recency will have a different impact on searches with
different numbers of terms, right? With more terms, the recency
component's contribution to the score will be more and more "diluted".
However, I only see examples that do it this way, and I would need to do
something weird to implement a multiplicative change of the score. Am I
missing something?

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Adding vs multiplicating scores when implementing "recency"

jpountz at gmail

Sep 16, 2021, 9:41 AM

Post #2 of 5 (551 views)

Hello,

You are correct that the contribution would be additive in that case. We
don't provide an easy way to make the contribution multiplicative.

There is some debate about what is the best way to combine BM25 scores with
query-independent features, though in the discussions I've seen
contributions were summed up and the debate was more about whether they
should be normalized or not.

How much recency impacts ranking indeed depends on the number of terms and
how frequent these terms are. One way that I'm interpreting the fact that
not everyone recommends normalizing scores is that this way the query score
dominates when the query is looking for something very specific, because it
includes many terms or because it uses very specific terms - which may be a
feature. This approach also works well for Lucene since dynamic pruning via
Block-Max WAND keeps working when query-independent features are
incorporated into the final score, which helps figure out the top hits
without having to collect all matches.
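
To make the dilution concrete, here is a minimal sketch in plain Java with
no Lucene dependency. The per-term score of 2.0 and the recency score of
1.0 are made-up numbers, and the class and method names are hypothetical.
It only illustrates the arithmetic of adding versus multiplying a recency
component, not how Lucene computes scores.

```java
class RecencyDilution {
    /** Fraction of an additive final score that comes from the recency clause. */
    static double additiveRecencyShare(double textScore, double recencyScore) {
        return recencyScore / (textScore + recencyScore);
    }

    /** Score ratio between a recent and an old document when recency is a multiplier. */
    static double multiplicativeRatio(double textScore, double recentBoost, double oldBoost) {
        return (textScore * recentBoost) / (textScore * oldBoost);
    }

    public static void main(String[] args) {
        double perTermScore = 2.0; // hypothetical contribution of each matching term
        double recencyScore = 1.0; // hypothetical score of the recency clause
        for (int terms = 1; terms <= 4; terms++) {
            double textScore = terms * perTermScore;
            System.out.printf("terms=%d additive share=%.3f multiplicative ratio=%.3f%n",
                terms,
                additiveRecencyShare(textScore, recencyScore),
                multiplicativeRatio(textScore, 1.5, 1.0));
        }
    }
}
```

With addition, the share of the score owed to recency shrinks as more
terms match, while with multiplication the ratio between a recent and an
old document does not depend on the text score.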

On Thu, Sep 16, 2021 at 5:40 PM Nicolás Lichtmaier
<nicolasl@wolfram.com.invalid> wrote:

> [...]

--
Adrien

Re: Adding vs multiplicating scores when implementing "recency"

msokolov at gmail

Sep 17, 2021, 5:45 AM

Post #3 of 5 (551 views)

Not advocating any particular approach here, just curious: could BMW
also function in the presence of a doc-score (like recency) that is
multiplied? My vague understanding is that as long as the scoring
formula is monotonic in all of its inputs, and we have block-encoded
the inputs, then we could compute a max score for a block?

On Thu, Sep 16, 2021 at 12:41 PM Adrien Grand <jpountz@gmail.com> wrote:
> [...]

Re: Adding vs multiplicating scores when implementing "recency"

jpountz at gmail

Sep 17, 2021, 7:10 AM

Post #4 of 5 (551 views)

This is one requirement indeed. Since WAND reasons about partially
evaluated documents, it also requires that matching one more clause makes
the overall score higher, which is why we introduced in 8.0 the
requirement that scores must be positive. For multiplication, this would
require scores that are greater than 1.
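
The requirement can be illustrated with plain arithmetic. The sketch below
is not Lucene's WAND implementation; the class name, method names, and
scores are all hypothetical. It only checks whether matching one more
clause raises the score accumulated so far.

```java
class ClauseMonotonicity {
    /** Combines the partial score with one more clause by addition. */
    static double add(double partialScore, double clauseScore) {
        return partialScore + clauseScore;
    }

    /** Combines the partial score with one more clause by multiplication. */
    static double multiply(double partialScore, double clauseScore) {
        return partialScore * clauseScore;
    }

    /** True when matching the extra clause made the overall score higher. */
    static boolean raisesScore(double partialScore, double combinedScore) {
        return combinedScore > partialScore;
    }

    public static void main(String[] args) {
        double partial = 3.0; // hypothetical score from the clauses evaluated so far
        // Any positive clause score raises a sum.
        System.out.println(raisesScore(partial, add(partial, 0.5)));
        // A multiplier below 1 lowers a product, which breaks the assumption.
        System.out.println(raisesScore(partial, multiply(partial, 0.5)));
        // A multiplier above 1 raises it again.
        System.out.println(raisesScore(partial, multiply(partial, 1.2)));
    }
}
```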

If someone really wanted to multiply scores, the easiest way might be to
create a query wrapper that takes the log of the scores of the wrapped
query, and rely on log(a)+log(b) = log(a * b).
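
A minimal illustration of that identity in plain Java follows. It is not
the query wrapper itself: the class name, method names, and scores are
hypothetical, and it assumes every score is at least 1 so that each
logarithm stays non-negative. Because log is increasing, ranking by the
sum of logs gives the same order as ranking by the product.

```java
class LogScoreCombination {
    /** Sums the natural logs of the given scores; each score is assumed to be >= 1. */
    static double sumOfLogs(double... scores) {
        double sum = 0.0;
        for (double score : scores) {
            if (score < 1.0) {
                throw new IllegalArgumentException("score below 1 gives a negative log: " + score);
            }
            sum += Math.log(score);
        }
        return sum;
    }

    /** Plain product of the given scores, for comparison. */
    static double product(double... scores) {
        double result = 1.0;
        for (double score : scores) {
            result *= score;
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical (text score, recency score) pairs for two documents.
        double[] docA = {2.0, 5.0};
        double[] docB = {3.0, 3.0};
        System.out.println("A: product=" + product(docA) + " sumOfLogs=" + sumOfLogs(docA));
        System.out.println("B: product=" + product(docB) + " sumOfLogs=" + sumOfLogs(docB));
        // Both measures order the two documents the same way.
        System.out.println((product(docA) > product(docB)) == (sumOfLogs(docA) > sumOfLogs(docB)));
    }
}
```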

On Fri, Sep 17, 2021 at 2:47 PM Michael Sokolov <msokolov@gmail.com> wrote:

> [...]

Re: Adding vs multiplicating scores when implementing "recency"

msokolov at gmail

Sep 17, 2021, 12:40 PM

Post #5 of 5 (551 views)

Ah, thanks for the explanation.

On Fri, Sep 17, 2021 at 10:11 AM Adrien Grand <jpountz@gmail.com> wrote:
> [...]