Using a zero boost to prevent a term from affecting the score
Hello,

I am new to Lucene so my apologies in advance if what I am trying to do
does not make sense or has been discussed before. I searched the list
archives but couldn't find an answer....

First a bit of background.... I have a collection of documents which are
indexed by SourceID and Content. In the UI, documents are displayed in
folders which map to SourceIDs and by default all documents in a given
source are displayed using a query like "+(source:1 source:2)". I also
want to let users search for text in the Content and display results
ranked by their Lucene score. Unfortunately, including the SourceID
terms in my query affects the score I get back, which in the context of
my app does not make sense. I have thought about turning the SourceID
terms into a QueryFilter but couldn't figure out how to get Lucene to
return all of the documents in the filtered collection since empty
queries are not allowed. As an alternative, I tried setting the boost on
the SourceID terms to zero which seems to work -- my queries look
something like "+((source:1 source:2)^0.0) +content:google".
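
To make clear what I expect the zero boost to do, here is a minimal
sketch in plain Java. It is a toy model, not Lucene code: the class
ZeroBoostSketch, its Clause type and the raw scores are all made up for
illustration, and real Lucene scoring involves more factors than a
boosted sum.

```java
// Toy model only: this is NOT Lucene's scoring code. Every name and
// number here is hypothetical and exists just to illustrate the idea.
class ZeroBoostSketch {

    /** One query clause: rawScore > 0 means the clause matched the document. */
    static final class Clause {
        final float rawScore;
        final float boost;
        final boolean required;

        Clause(float rawScore, float boost, boolean required) {
            this.rawScore = rawScore;
            this.boost = boost;
            this.required = required;
        }

        boolean matches() {
            return rawScore > 0f;
        }
    }

    /**
     * Sums the boosted scores of the matching clauses.
     * Returns -1 if a required clause does not match (document rejected).
     */
    static float score(Clause... clauses) {
        float sum = 0f;
        for (Clause c : clauses) {
            if (c.matches()) {
                sum += c.rawScore * c.boost;
            } else if (c.required) {
                return -1f;
            }
        }
        return sum;
    }

    /** Scores one document for "+(source...)^sourceBoost +content:google". */
    static float demoScore(float sourceRaw, float sourceBoost) {
        Clause source = new Clause(sourceRaw, sourceBoost, true);
        Clause content = new Clause(0.5f, 1.0f, true);
        return score(source, content);
    }

    public static void main(String[] args) {
        System.out.println(demoScore(0.75f, 1.0f)); // source clause contributes
        System.out.println(demoScore(0.75f, 0.0f)); // source clause only restricts
        System.out.println(demoScore(0.0f, 0.0f));  // wrong source: rejected
    }
}
```

In this model the zero-boosted source clause still decides whether a
document is accepted, but the final score comes from the content clause
alone, which is the behaviour I am after.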

So, my question is whether this approach is a supported method for
getting the scorer to ignore a field in its calculations? If it is, then
I may have found a bug in IndexSearcher.explain(), which returns "0.0 =
match required" when asked to explain why a result got the score it did
despite the fact that a non-zero score was passed to my hit collector
for that item. Tracing through the code, it looks like the
IndexSearcher.explain() method is unhappy with a required clause having
a zero score. Since the core search algorithms don't prevent this, I was
surprised to see this in IndexSearcher.explain(). The other problem that
I am having with the searcher.explain() method is that I can't pass it
the DateFilter that I use on some of my queries. Since that filter
affects the score for documents in the results, it would be nice if
IndexSearcher.explain() were able to take the filter into account. This
would also be a problem if I moved the SourceID terms into a filter as I
have been considering.
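
To illustrate the kind of mismatch I think I am seeing between scoring
and explain(), here is a toy model in plain Java. It is only my guess
at the shape of the problem, not a copy of Lucene's source: the class
ExplainMismatchSketch and both methods are hypothetical. The assumption
is that the search path asks "did the clause match?" while the explain
path asks "is the clause's boosted score greater than zero?".

```java
// Toy model only: NOT Lucene's implementation. It shows how a scorer and
// an explainer can disagree when a required clause has a zero boost.
class ExplainMismatchSketch {

    /** One query clause: rawScore > 0 means the clause matched the document. */
    static final class Clause {
        final float rawScore;
        final float boost;
        final boolean required;

        Clause(float rawScore, float boost, boolean required) {
            this.rawScore = rawScore;
            this.boost = boost;
            this.required = required;
        }
    }

    /** Search path: a clause counts as matching whenever its raw score is positive. */
    static float score(Clause... clauses) {
        float sum = 0f;
        for (Clause c : clauses) {
            if (c.rawScore > 0f) {
                sum += c.rawScore * c.boost;
            } else if (c.required) {
                return -1f; // document rejected
            }
        }
        return sum;
    }

    /** Explain path (assumed): judges a clause by its boosted value instead. */
    static String explain(Clause... clauses) {
        float sum = 0f;
        for (Clause c : clauses) {
            float value = c.rawScore * c.boost;
            if (value > 0f) {
                sum += value;
            } else if (c.required) {
                // A zero-boosted match is indistinguishable from a non-match here.
                return "0.0 = match required";
            }
        }
        return sum + " = sum of";
    }

    /** Builds "+(source...)^sourceBoost +content:google" for a matching document. */
    static Clause[] demoQuery(float sourceBoost) {
        return new Clause[] {
            new Clause(0.75f, sourceBoost, true), // source clause
            new Clause(0.5f, 1.0f, true)          // content clause
        };
    }

    public static void main(String[] args) {
        System.out.println(score(demoQuery(0.0f)));   // non-zero score
        System.out.println(explain(demoQuery(0.0f))); // yet "match required"
    }
}
```

With a boost of 1.0 the two paths agree; with a boost of 0.0 the scorer
in this model returns a non-zero score while the explainer reports
"0.0 = match required", which is the same symptom I get from
IndexSearcher.explain().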

Any help or insight on this issue will be greatly appreciated!

Thanks,
Tim