Mailing List Archive

A method for "de-boosting" a term...
We are trying to "de-value" documents that contain certain terms. Instead of
increasing the value of those documents and thus moving them to the top of
the list, we want to decrease their value and, we hope, move them to the
bottom of the list. To do this we are playing with various combinations of
setBoost(...). The thinking is that to de-value a term we would set a lower
boost value, to increase its value we would set a higher boost value, and a
neutral term would get a boost value in the middle:

final TermQuery retval = new TermQuery(new Term(field, value));

retval.setBoost(
    preference < 0
        ? 0.0f
        : preference > 0
            ? 100000.0f
            : 50000.0f
);

However, no matter what "negative" boost we set, the mere fact that the
terms of the document have matched with the query moves the document to the
top of the list. Here is the query that is being generated:

GLOBAL_MATCH:GLOBAL_VALUE^50000.0 +(424:6^50000.0 424:7^50000.0
424:8^50000.0 424:9^50000.0 424:10^50000.0 424:11^50000.0 424:12^50000.0
424:13^50000.0 424:14^50000.0 424:15^50000.0 424:16^50000.0 424:17^50000.0
424:18^50000.0) (424:14^0.0 423:Natalie Imbruglia - Torn Music Video^0.0
422:600^0.0 422:476^0.0 422:586^0.0 422:419^0.0) +(contentTypeId:1
contentTypeId:4)

Notice that some terms have a boost of 0.0 while others have a boost of
50000.0, yet the document with a boost of 0.0 still appears on top.

Could someone recommend a way of doing a "negative" boost?

Thanks.
-AP_
RE: A method for "de-boosting" a term...
Alex,

Can you please supply a simple reproducible example? When I set the boost
for a term to zero then documents containing it do not come to the top. Nor
do they go to the bottom. The boost is multiplied into the weight for the
term, but the weights are then added into the document score, so a zero
boost will have little effect on the final ranking.
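
To make the arithmetic above concrete, here is a minimal sketch of an additive score. It is a toy model only, not Lucene's actual scoring code: the class name AdditiveBoostSketch, the score method, and the per-term weights of 0.5 are all made up for illustration, and real scoring involves further factors.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model (hypothetical, not Lucene code): score = sum of boost * weight. */
class AdditiveBoostSketch {

    /** Adds boost * weight for every query term that the document contains. */
    static float score(Map<String, Float> boosts, Map<String, Float> docWeights) {
        float total = 0.0f;
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            Float weight = docWeights.get(e.getKey());
            if (weight != null) {               // the term matches this document
                total += e.getValue() * weight;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("contentTypeId:1", 1.0f);
        boosts.put("422:600", 0.0f);            // the "de-boosted" term

        // Assumed weights, chosen only to keep the arithmetic easy to follow.
        Map<String, Float> plainDoc = new LinkedHashMap<>();
        plainDoc.put("contentTypeId:1", 0.5f);

        Map<String, Float> unwantedDoc = new LinkedHashMap<>(plainDoc);
        unwantedDoc.put("422:600", 0.5f);

        System.out.println(score(boosts, plainDoc));     // 0.5
        System.out.println(score(boosts, unwantedDoc));  // 0.5: zero boost adds nothing

        boosts.put("422:600", -1.0f);                    // a negative boost subtracts
        System.out.println(score(boosts, unwantedDoc));  // 0.0
    }
}
```

In this model a zero boost leaves the matching document tied with the others rather than pushing it down; only a negative contribution lowers its total.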

Lucene does not currently support negative boosts, which might achieve what
you want. I just made the modifications required to support negative
boosts. Please find these attached and tell me how they work. If they work
then I will check them in after the 1.2 release is final.

Doug
RE: A method for "de-boosting" a term...
Doug, the changes you sent work GREAT in my case.

Thanks
Alex Paransky

PS:

1. I was not able to use patch 2.5 to apply the .diff file you sent me, so
I made the changes by hand. I double-checked the final diff against what
you sent me and they are the same. I am not too familiar with the patch
program, so if there are any options I need to use, please tell me.
Here is the patch session from my machine:

C:\work\individualnetwork\external\lucene-1.2-rc1-fix>"\Program
Files\GNU\WinCvs 1.2\patch" -i c:\diffs.dat
patching file `src/java/org/apache/lucene/queryParser/QueryParser.jj'
Assertion failed: hunk, file patch.c, line 321

abnormal program termination

2. Another minor addition would be to make Query and Term implement
java.io.Serializable. We are currently in the process of incorporating Lucene
into an EJB application, and we have a Stateful Session Bean called
SearchResults which should accept a Query and search the collection
(SearchResults basically wraps IndexSearcher). However, because Query and
Term are not Serializable, we are forced to call .toString() and then use a
parser to rebuild the query. The problem is that the parser does not offer
the flexibility we get from building Query/Term structures directly. For
example, some of the values have spaces in them, so the parser sees
field:value1 value2 value3 and does not correctly interpret this as
field:"term(value1) term(value2) term(value3)"; if we use quotes, we get a
PhraseQuery instead of the TermQuery we are looking for. Should I file this
as an enhancement request in Bugzilla, or will this e-mail suffice?
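
As a possible stopgap for the problem in item 2, one could pass a small serializable description of each term across the bean boundary and rebuild the TermQuery on the receiving side, so the value never goes through the parser. The sketch below is only an illustration under that assumption: TermSpec and roundTrip are hypothetical names, not part of Lucene, and the stream round trip merely stands in for what a container does when it passes the object.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical carrier for one term; not a Lucene class. */
class TermSpec implements Serializable {
    private static final long serialVersionUID = 1L;

    final String field;
    final String value;   // may contain spaces; it is never parsed
    final float boost;

    TermSpec(String field, String value, float boost) {
        this.field = field;
        this.value = value;
        this.boost = boost;
    }

    /** Serializes and deserializes the spec in memory, returning the copy. */
    static TermSpec roundTrip(TermSpec spec) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(spec);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (TermSpec) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TermSpec copy = roundTrip(
                new TermSpec("423", "Natalie Imbruglia - Torn Music Video", 0.0f));
        // On the receiving side the query would be built directly from the
        // copy, along the lines of new TermQuery(new Term(copy.field, copy.value))
        // followed by setBoost(copy.boost).
        System.out.println(copy.field + ":" + copy.value + "^" + copy.boost);
    }
}
```

The value with spaces survives intact as a single string, which is the property the toString()-and-reparse route loses.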

Again, thanks for your help.
-AP_

-----Original Message-----
From: Doug Cutting [mailto:DCutting@grandcentral.com]
Sent: Wednesday, October 24, 2001 12:47 PM
To: 'Alex Paransky'; lucene-user@jakarta.apache.org
Subject: RE: A method for "de-boosting" a term...


Alex,

Can you please supply a simple reproducible example? When I set the boost
for a term to zero then documents containing it do not come to the top. Nor
do they go to the bottom. The boost is multiplied into the weight for the
term, but the weights are then added into the document score, so a zero
boost will have little effect on the final ranking.

Lucene does not currently support negative boosts, which might achieve what
you want. I just made the modifications required to support negative
boosts. Please find these attached and tell me how they work. If they work
then I will check them in after the 1.2 release is final.

Doug