Scoring Across Multiple Fields
Hi,

I have a question regarding how Lucene computes document similarities from
field similarities.

Lucene's scoring documentation mentions that scoring works on fields and
combines the results to return documents. I'm assuming fields are given
scores, and those scores are simply averaged to return the document score?

If this is the case, then in order to incorporate multiple fields in my
scoring, I would use multiple term queries that contain the same term, but
target different fields, then I would simply put them in a boolean query,
and search my index using this boolean query.

Am I going about this in the correct way? Any clarification would be
greatly appreciated.

Thank you,
John B
Re: Scoring Across Multiple Fields
Hi John,

A TermQuery produces a scorer that can compute similarity for a given term
value against a given field, in the context of the index, so as you say, it
produces a score for one field.

If you want to match a given term value across multiple fields, you could
indeed use a BooleanQuery with the TermQueries in SHOULD clauses. The
vanilla BooleanQuery produces a score which is the sum of all matching
clauses' scores, not their average (or at least that's the interpretation
I get from reading the source code of the explain() method in
BooleanWeight).
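
To make the summing concrete, here is a minimal sketch in plain Java. It is
not Lucene code: the class name, the method name and the per-field scores are
all made up for illustration, and it only mimics the arithmetic described
above, where each matching SHOULD clause contributes its own score and a
clause that does not match contributes nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SumOfClauses {

    // Hypothetical helper: adds up the scores of the clauses that matched.
    // A field whose clause did not match is simply absent from the map.
    static float sumScore(Map<String, Float> matchingClauseScores) {
        float total = 0f;
        for (float score : matchingClauseScores.values()) {
            total += score;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up scores for one document that matched in both fields.
        Map<String, Float> clauseScores = new LinkedHashMap<>();
        clauseScores.put("title", 2.0f);
        clauseScores.put("body", 0.5f);

        // The document score is the sum, not the average.
        System.out.println(sumScore(clauseScores)); // prints 2.5
    }
}
```

Under this model a document matching in only one field keeps that single
clause's score, so it is not penalized the way an average over all fields
would penalize it.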

You can also look into DisjunctionMaxQuery, which works like a disjunctive
BooleanQuery, but it returns the maximum score across matching clauses. The
idea here is that if, say, you're matching across title and body fields, a
title match may score higher (perhaps because it's been boosted). If you
sum the scores across fields, you're likely just inflating those title
matches even more (since a title match is probably highly correlated with a
body match). (DisjunctionMaxQuery also has an optional
"tieBreakerMultiplier" property that you can use to weight the scoring
somewhere between pure max and pure sum -- like "Use the maximum score,
plus 0.001 times the sum of the rest".)
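
The max-plus-tie-breaker arithmetic can be sketched the same way. Again this
is plain Java rather than Lucene code, the class and method names are
hypothetical, the scores are invented, and it assumes clause scores are
non-negative. In Lucene itself the tie-breaker is passed as the second
argument of the DisjunctionMaxQuery constructor.

```java
public class DisMaxScore {

    // Hypothetical helper: the best clause score, plus tieBreaker times
    // the sum of the remaining clause scores.
    // Assumes all scores are non-negative.
    static float disMaxScore(float[] clauseScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float score : clauseScores) {
            sum += score;
            max = Math.max(max, score);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Made-up scores: a strong title match and a weaker body match.
        float[] scores = {2.0f, 0.5f};

        System.out.println(disMaxScore(scores, 0.0f)); // pure max: prints 2.0
        System.out.println(disMaxScore(scores, 1.0f)); // pure sum: prints 2.5
        System.out.println(disMaxScore(scores, 0.1f)); // somewhere in between
    }
}
```

A tie-breaker of 0 ignores everything except the best field, while 1 gives
the same result as summing the clauses, so small values mostly serve to rank
documents that match in several fields slightly above those that match in
one.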

Hope that helps,
Michael

On Mon, 27 Jan 2020 at 13:37, John Brown <brown.john@temple.edu> wrote:

> [...]