Disjunctively scoring non-matching conjunctive clauses
Hi all,

I'm an engineer on Amazon Product Search and I've recently come upon a
situation where I've required conjunctive matching but disjunctive scoring.
As a concrete example, let's say I have a query like this:

(+title:"a" +title:"b" +title:"c") (product_id:1)

This is saying I want to conjunctively match on the title OR I want to
match a specific product document where the product_id is 1.

Let's say the document where product_id = 1 has a title of "a b", so it
doesn't match the title query. In this case, the score for the title clause
will be 0 since, to my understanding, Lucene doesn't count scores for
non-matching clauses. However, for my use case, I would like to take into
account that several keywords did in fact match. In other words, I want
disjunctive scoring even though I still want to match conjunctively.

My way of working around this right now is to reconstruct the query as the
following (forgive my made-up Lucene query syntax, hopefully it's still
readable):

+(ConstantScoreQuery: 0 ((+title:"a" +title:"b" +title:"c")
(product_id:1))) (title:"a" title:"b" title:"c")

Pretty much, I separate this into a matching query that is wrapped by a
ConstantScore query so it has no score and a scoring query that will
provide a disjunctive score.
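
To make the split between matching and scoring concrete, here is a minimal
plain-Java sketch that uses no Lucene classes. The Doc class, the matches and
score helpers, and the "one point per matching term" score are all
hypothetical stand-ins for illustration; real Lucene scores come from the
configured similarity, not from a term count.

```java
import java.util.List;
import java.util.Set;

public class MatchThenScore {
    // Hypothetical stand-in for an indexed product document.
    public static final class Doc {
        final int productId;
        final Set<String> titleTerms;

        public Doc(int productId, Set<String> titleTerms) {
            this.productId = productId;
            this.titleTerms = titleTerms;
        }
    }

    // Matching: all title terms must be present (conjunction),
    // OR the document is the pinned product.
    public static boolean matches(Doc doc, List<String> terms, int pinnedId) {
        return doc.titleTerms.containsAll(terms) || doc.productId == pinnedId;
    }

    // Scoring: one point per query term found in the title (disjunction).
    // This is a toy score, not Lucene's actual similarity.
    public static int score(Doc doc, List<String> terms) {
        int total = 0;
        for (String term : terms) {
            if (doc.titleTerms.contains(term)) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("a", "b", "c");
        List<Doc> docs = List.of(
            new Doc(1, Set.of("a", "b")),       // pinned product, partial title
            new Doc(2, Set.of("a", "b", "c")),  // full title match
            new Doc(3, Set.of("a", "b")));      // partial title, not pinned
        for (Doc doc : docs) {
            boolean hit = matches(doc, terms, 1);
            // Only documents that pass the matching step are scored.
            System.out.println("product_id=" + doc.productId
                + " matches=" + hit
                + (hit ? " score=" + score(doc, terms) : ""));
        }
    }
}
```

In this toy model the document with product_id = 1 is kept by the matching
step and still gets credit for the two title terms it contains, while the
document with the same title but a different product_id is dropped.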

My approach feels a bit convoluted, so I was wondering if there were any
cleaner ways to do this? And if not, are there any drawbacks to my
workaround performance-wise?

Thanks!
Marc D'Mello
Re: Disjunctively scoring non-matching conjunctive clauses
Hi,

This is the normal way to do this: use a filter or constant score query
to do the matching, and do the disjunctive scoring with a long chain of
"should" clauses.

Uwe
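
For reference, the filter-plus-should-clauses shape described in this thread
could be built roughly as follows. This is only a sketch under assumptions:
it needs lucene-core on the classpath, it assumes product_id is indexed as a
string term (the thread does not say how it is indexed), and the class and
method names are made up. It uses Occur.FILTER for the matching part, since
a FILTER clause must match but does not contribute to the score, whereas a
plain ConstantScoreQuery assigns every match a constant score rather than
none.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FilterThenScoreQuery {
    // Hypothetical helper: builds
    //   #((+title:a +title:b +title:c) product_id:1) title:a title:b title:c
    public static Query build(String[] titleTerms, String productId) {
        // Conjunctive title match: every term is required.
        BooleanQuery.Builder allTitleTerms = new BooleanQuery.Builder();
        for (String term : titleTerms) {
            allTitleTerms.add(new TermQuery(new Term("title", term)), Occur.MUST);
        }

        // Matching part: full title match OR the specific product.
        // Assumption: product_id is indexed as a string term.
        BooleanQuery.Builder matching = new BooleanQuery.Builder()
            .add(allTitleTerms.build(), Occur.SHOULD)
            .add(new TermQuery(new Term("product_id", productId)), Occur.SHOULD);

        // Outer query: the matching part is a non-scoring FILTER clause,
        // and each title term is an optional, scoring SHOULD clause.
        BooleanQuery.Builder outer = new BooleanQuery.Builder()
            .add(matching.build(), Occur.FILTER);
        for (String term : titleTerms) {
            outer.add(new TermQuery(new Term("title", term)), Occur.SHOULD);
        }
        return outer.build();
    }

    public static void main(String[] args) {
        System.out.println(build(new String[] {"a", "b", "c"}, "1"));
    }
}
```

Because the outer query has a required FILTER clause, the SHOULD clauses are
optional: they only add to the score of documents that already passed the
filter.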

On 21.07.2023 at 02:35, Marc D'Mello wrote:
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de