Hi all,
I'm an engineer on Amazon Product Search and I've recently run into a
situation where I need conjunctive matching but disjunctive scoring.
As a concrete example, let's say I have a query like this:
(+title:"a" +title:"b" +title:"c") (product_id:1)
This is saying I want to conjunctively match on the title OR I want to
match a specific product document where the product_id is 1.
Let's say the document where product_id = 1 has a title of "a b", so it
doesn't match the title query. In this case, the score for the title clause
will be 0 since, to my understanding, Lucene doesn't count scores for
non-matching clauses. However, for my use case, I would like to take into
account that several keywords did in fact match. In other words, I want
disjunctive scoring even though I still want to match conjunctively.
My way of working around this right now is to reconstruct the query as the
following (forgive my made-up Lucene query syntax, hopefully it's still
readable):
+(ConstantScoreQuery: 0 ((+title:"a" +title:"b" +title:"c")
(product_id:1))) (title:"a" title:"b" title:"c")
In short, I split this into a matching query, wrapped in a
ConstantScoreQuery so that it contributes no score, and a separate scoring
query that provides the disjunctive score.
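
To make the intended semantics concrete, here is a toy Python model of the
two query shapes. It is only a sketch, not Lucene code: the Doc class, the
term_score function (1.0 per matching term) and the score of 1.0 for the
product_id clause are all made-up assumptions, and real Lucene scores would
come from the configured Similarity.

```python
# Toy model only: this is NOT Lucene's API or its real scoring formula.
from dataclasses import dataclass
from typing import Optional

TERMS = ["a", "b", "c"]


@dataclass
class Doc:
    product_id: int
    title: str


def term_score(doc: Doc, term: str) -> float:
    """Hypothetical per-term score: 1.0 if the title contains the term."""
    return 1.0 if term in doc.title.split() else 0.0


def matches(doc: Doc) -> bool:
    """Matching rule shared by both queries: all title terms OR product_id 1."""
    all_terms = all(term_score(doc, t) > 0 for t in TERMS)
    return all_terms or doc.product_id == 1


def original_query(doc: Doc) -> Optional[float]:
    """(+title:a +title:b +title:c) (product_id:1); None means no match."""
    if not matches(doc):
        return None
    score = 0.0
    # The conjunctive title clause only scores when every term matches.
    if all(term_score(doc, t) > 0 for t in TERMS):
        score += sum(term_score(doc, t) for t in TERMS)
    if doc.product_id == 1:
        score += 1.0  # assumed score for the product_id clause
    return score


def workaround_query(doc: Doc) -> Optional[float]:
    """+(constant score 0: matching query) (title:a title:b title:c)."""
    if not matches(doc):
        return None
    # The required clause contributes 0; optional clauses score per term.
    return sum(term_score(doc, t) for t in TERMS)


pinned = Doc(product_id=1, title="a b")
print(original_query(pinned), workaround_query(pinned))
```

In this model the pinned document gets credit only for the product_id
clause under the original query, but gets credit for both matching title
terms under the workaround, while the set of matching documents stays the
same.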
My approach feels a bit convoluted, so I was wondering whether there is a
cleaner way to do this. If not, are there any performance drawbacks to my
workaround?
Thanks!
Marc D'Mello