I'm looking through the TermQuery code (and generally trying to
understand exactly how the searching works) and I found this code that
looks suspicious to me. It is very likely that I just don't understand
what's going on, but there is a chance that this is a bug, so I wanted
to ask for clarification / review from Doug and others.

In TermQuery.normalize(float norm), weight is multiplied first
by the normalization factor (the argument) and then by the idf that was
stored in the TermQuery earlier. Although I can't say for sure that this
is wrong, it does look suspect. First, idf is already factored into
weight in the sumOfSquaredWeights() method, and second, if normalize is
called multiple times, idf will be multiplied into weight over and
over. Also, the comment in normalize doesn't really make sense, and the
way the code is written makes me think that this is a problem caused by
a CVS merge conflict, and that only the line "weight *= norm" should be
in that method. Am I right?
======================================================
  final float sumOfSquaredWeights(Searcher searcher) throws IOException {
    idf = Similarity.idf(term, searcher);
    weight = idf * boost;
    return weight * weight; // square term weights
  }

  final void normalize(float norm) {
    weight *= norm; // normalize for query
    weight *= idf; // factor from document
  }
======================================================
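
To make the two concerns concrete, here is a minimal standalone sketch of the same arithmetic. It is not the actual Lucene class: the name TermWeightSketch, the constructor, and getWeight() are made up for illustration, and idf is passed in as a fixed number instead of being computed by Similarity.idf(term, searcher). It only shows what the two methods above do to weight when normalize is called once and then a second time.

```java
public class TermWeightSketch {
    // Assumption: a fixed idf stands in for Similarity.idf(term, searcher).
    private final float idf;
    private final float boost;
    private float weight;

    public TermWeightSketch(float idf, float boost) {
        this.idf = idf;
        this.boost = boost;
    }

    // Same arithmetic as the quoted sumOfSquaredWeights().
    public float sumOfSquaredWeights() {
        weight = idf * boost;
        return weight * weight; // square term weights
    }

    // Same arithmetic as the quoted normalize(float norm).
    public void normalize(float norm) {
        weight *= norm; // normalize for query
        weight *= idf;  // idf is multiplied in again on every call
    }

    public float getWeight() {
        return weight;
    }

    public static void main(String[] args) {
        TermWeightSketch q = new TermWeightSketch(2.0f, 1.0f);
        System.out.println(q.sumOfSquaredWeights()); // prints 4.0
        q.normalize(1.0f);
        System.out.println(q.getWeight()); // prints 4.0 (norm * idf^2 * boost)
        q.normalize(1.0f);
        System.out.println(q.getWeight()); // prints 8.0 (idf applied a third time)
    }
}
```

With idf = 2 and boost = 1, a single call to normalize leaves weight at norm * idf^2 * boost, and each further call multiplies in another factor of norm * idf. Whether the idf^2 after one call is intended is exactly what I am asking about.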