TFIDFSimilarity score
Hi,

I have a question about the score produced by TFIDFSimilarity.
The Javadoc at
https://lucene.apache.org/core/8_5_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
says that the IDF factor should be squared in the final score. However, looking
at the code, I see this:

    public TFIDFScorer(float boost, Explanation idf, float[] normTable) {
      // TODO: Validate?
      this.idf = idf;
      this.boost = boost;
      this.queryWeight = boost * idf.getValue().floatValue();
      this.normTable = normTable;
    }

    @Override
    public float score(float freq, long norm) {
      final float raw = tf(freq) * queryWeight; // compute tf(f)*weight
      float normValue = normTable[(int) (norm & 0xFF)];
      return raw * normValue; // normalize for field
    }


Where does the second idf.getValue() factor come from? In Lucene 6.6.6,
before the patch for https://issues.apache.org/jira/browse/LUCENE-7368 was
applied, the code looked like this:

    TFIDFSimScorer(IDFStats stats, NumericDocValues norms) throws IOException {
      this.stats = stats;
      this.weightValue = stats.value;
      this.norms = norms;
    }

    @Override
    public float score(int doc, float freq) {
      final float raw = tf(freq) * weightValue; // compute tf(f)*weight

      return norms == null ? raw : raw * decodeNormValue(norms.get(doc)); // normalize for field
    }


...


    public IDFStats(String field, Explanation idf) {
      // TODO: Validate?
      this.field = field;
      this.idf = idf;
      normalize(1f, 1f);
    }

    @Override
    public void normalize(float queryNorm, float boost) {
      this.boost = boost;
      this.queryNorm = queryNorm;
      queryWeight = queryNorm * boost * idf.getValue();
      value = queryWeight * idf.getValue(); // idf for document
    }

Did we lose an idf.getValue() factor in this patch? Or was it moved
somewhere else? Could you please point me to the code location where the
score is multiplied by the IDF value a second time?
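
To make the difference concrete, here is a minimal sketch that reproduces only
the arithmetic of the two snippets quoted above. It does not call Lucene at all:
the class name IdfFactorSketch and the methods score666 and scoreNew are
hypothetical names made up for this illustration, and tf(freq) and the decoded
norm are passed in as plain floats. It may well miss a factor applied elsewhere
in the real code path, which is exactly what I am asking about.

```java
// Sketch only: mirrors the arithmetic of the quoted snippets, not the real Lucene classes.
// All names here are hypothetical; tf and norm stand in for tf(freq) and the decoded norm.
class IdfFactorSketch {

    // Lucene 6.6.6 snippet: value = (queryNorm * boost * idf) * idf
    static float score666(float tf, float idf, float boost, float queryNorm, float norm) {
        float queryWeight = queryNorm * boost * idf;
        float value = queryWeight * idf; // second idf factor ("idf for document")
        return tf * value * norm;
    }

    // Newer snippet: queryWeight = boost * idf
    static float scoreNew(float tf, float idf, float boost, float norm) {
        float queryWeight = boost * idf; // single idf factor
        return tf * queryWeight * norm;
    }

    public static void main(String[] args) {
        float tf = 1f, idf = 2f, boost = 1f, queryNorm = 1f, norm = 1f;
        System.out.println("6.6.6 snippet: " + score666(tf, idf, boost, queryNorm, norm)); // 4.0
        System.out.println("newer snippet: " + scoreNew(tf, idf, boost, norm));            // 2.0
    }
}
```

With queryNorm = 1 and everything else equal, the two results differ by exactly
one factor of idf, which is the factor I cannot locate in the newer code.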

Thanks!
Dumitru