Mailing List Archive

document & field boosting
FYI, I just added document and field boosting to Lucene. It should be
in tonight's nightly build.

This lets one, e.g., implement Google-like ranking, where a factor in a
document's score is determined independently from the text of the document.

Longer term, I'd still like to open up document scoring, so that a user
can alter any part of the formula without altering Lucene's core code.

Enjoy!

Doug

-------- Original Message --------
Subject: Re: cvs commit:
jakarta-lucene/src/test/org/apache/lucene/search TestDocBoost.java
Date: Mon, 29 Jul 2002 12:14:22 -0700
From: Doug Cutting <cutting@lucene.com>
Reply-To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
To: Lucene Developers List <lucene-dev@jakarta.apache.org>
References: <20020729191115.42343.qmail@icarus.apache.org>

cutting@apache.org wrote:
> Log:
> msg.txt

Oops. That log entry was supposed to read:

Added support for boosting the score of documents and fields via the
new methods Document.setBoost(float) and Field.setBoost(float).

Note: This changes the encoding of an indexed value. Indexes should
be re-created from scratch in order for search scores to be correct.
With the new code and an old index, searches will yield very large
scores for shorter fields, and very small scores for longer fields.
Once the index is re-created, scores will be as before.

Doug
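
A minimal sketch of the idea, for readers who want to see the effect of a boost in isolation. Only the method names Document.setBoost(float) and Field.setBoost(float) come from the message above. The classes ToyDocument and ToyField below are hypothetical stand-ins, not Lucene classes, and the rule that boosts simply multiply into the score is an assumption made for illustration; the thread does not spell out how Lucene combines them.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: ToyField and ToyDocument are hypothetical, not Lucene's classes. */
public class BoostSketch {

    /** Hypothetical field carrying a pretend text-relevance score and a boost. */
    static final class ToyField {
        final String name;
        final float textScore;      // stand-in for whatever the text-based score is
        private float boost = 1.0f; // 1.0 means "no boost"

        ToyField(String name, float textScore) {
            this.name = name;
            this.textScore = textScore;
        }

        void setBoost(float boost) { this.boost = boost; }
        float getBoost() { return boost; }
    }

    /** Hypothetical document: a list of fields plus a document-level boost. */
    static final class ToyDocument {
        private final List<ToyField> fields = new ArrayList<>();
        private float boost = 1.0f;

        void add(ToyField field) { fields.add(field); }
        void setBoost(float boost) { this.boost = boost; }

        /** Assumed rule: (sum of textScore * fieldBoost over fields) * documentBoost. */
        float score() {
            float sum = 0.0f;
            for (ToyField f : fields) {
                sum += f.textScore * f.getBoost();
            }
            return sum * boost;
        }
    }

    public static void main(String[] args) {
        ToyField title = new ToyField("title", 0.5f);
        ToyField body = new ToyField("body", 0.25f);
        ToyDocument doc = new ToyDocument();
        doc.add(title);
        doc.add(body);

        System.out.println("unboosted:      " + doc.score());
        title.setBoost(2.0f); // matches in the title count double
        System.out.println("title boosted:  " + doc.score());
        doc.setBoost(2.0f);   // the whole document counts double
        System.out.println("doc boosted:    " + doc.score());
    }
}
```

The document-level boost is the part that is independent of the text: it can come from any external signal.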


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>



Re: document & field boosting [ In reply to ]
Doug, you are amazing!!
ciao.
--

On Mon, 29 Jul 2002 12:31:49
Doug Cutting wrote:
>FYI, I just added document and field boosting to Lucene. It should be
>in tonight's nightly build.


RE: document & field boosting [ In reply to ]
It's amazing!

In September I'll implement our news archive, where we'll try to score documents based on text relevance and on the relative frequency of each article's downloads/reads (it's an e-magazine).

peter
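
One way a download count could be turned into a boost value is sketched below. Everything in it is an assumption made for illustration: the class PopularityBoostSketch, the method boostFor, the weight parameter, and the linear formula are hypothetical and not taken from this thread. The resulting float is the kind of value one might pass to Document.setBoost(float) at indexing time.

```java
/** Hypothetical helper: maps an article's share of all downloads to a boost. */
public class PopularityBoostSketch {

    /**
     * Returns 1.0 (no boost) for articles with no downloads, and grows
     * linearly with the article's share of all downloads.
     *
     * @param downloads      downloads of this article
     * @param totalDownloads downloads of all articles in the archive
     * @param weight         assumed tuning knob: how strongly popularity counts
     */
    static float boostFor(long downloads, long totalDownloads, float weight) {
        if (downloads <= 0 || totalDownloads <= 0) {
            return 1.0f; // nothing known about popularity: stay neutral
        }
        double relativeFrequency = (double) downloads / (double) totalDownloads;
        return (float) (1.0 + weight * relativeFrequency);
    }

    public static void main(String[] args) {
        // An article with 50 of 200 total downloads, popularity weight 2.0.
        System.out.println(boostFor(50, 200, 2.0f));
        // An article nobody has downloaded yet keeps the neutral boost.
        System.out.println(boostFor(0, 200, 2.0f));
    }
}
```

Because the boost is fixed when a document is indexed, download counts that change over time would require re-indexing the affected articles for the new value to take effect.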

> -----Original Message-----
> From: Doug Cutting [mailto:cutting@lucene.com]
> Sent: Monday, July 29, 2002 9:32 PM
> To: tinnes@ecliptictech.com; soshima@business.com;
> schrader.news@evendi.de; foo@welho.com; melissamifsud@yahoo.com
> Cc: Lucene Developers List
> Subject: document & field boosting
>
>
> FYI, I just added document and field boosting to Lucene. It should be
> in tonight's nightly build.

Re: document & field boosting [ In reply to ]
Hi,

Doug, do you think the ranking function as stated in the FAQ
(http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31)
is still correct after the recent changes?


Clemens


----- Original Message -----
From: "Doug Cutting" <cutting@lucene.com>
To: <tinnes@ecliptictech.com>; <soshima@business.com>;
<schrader.news@evendi.de>; <foo@welho.com>; <melissamifsud@yahoo.com>
Cc: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Monday, July 29, 2002 9:31 PM
Subject: document & field boosting


> FYI, I just added document and field boosting to Lucene. It should be
> in tonight's nightly build.
>
> This lets one, e.g., implement Google-like ranking, where a factor in a
> document's score is determined independently from the text of the
> document.
>
> Longer term, I'd still like to open up document scoring, so that a user
> can alter any part of the formula without altering Lucene's core code.
>
> Enjoy!
>
> Doug



Re: document & field boosting [ In reply to ]
Clemens Marschner wrote:
> Doug, do you think the ranking function as stated in the FAQ
> (http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31)
> is still correct after the recent changes?

Yes, this equation is still correct, although it's now incomplete.
There is now another factor, the boost of the field containing the term,
specified when that field was indexed.

As I mentioned before, I would eventually like to make it possible for
folks to easily modify the scoring function. My idea is to generalize
the formula to something like:

sum_t( term_factor(df) * term_doc_factor(tf) * field_factor(length) *
query_boost * field_boost )

where term_factor(), term_doc_factor() and field_factor() correspond to
methods that folks can easily override.

Currently all of the scoring functions are static methods in a single
class, Similarity.java, so one can in fact currently modify scoring by
re-defining this class, but it is not well documented and only for the
brave.

Doug
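
The generalized formula above could be sketched as a class with overridable methods, roughly as follows. This is a toy under stated assumptions, not Lucene's Similarity class: the names ToySimilarity, TermMatch and FlatTf are hypothetical, and the default bodies (a log-based term factor, a square-root term frequency factor, an inverse-square-root length factor) are common tf-idf style choices picked for illustration, not formulas confirmed in this thread.

```java
import java.util.Arrays;
import java.util.List;

/** Toy version of sum_t(term_factor * term_doc_factor * field_factor * boosts). */
public class ScoringSketch {

    /** Statistics for one query term matched in one field (hypothetical type). */
    static final class TermMatch {
        final int df;           // number of documents containing the term
        final int tf;           // occurrences of the term in this field
        final int length;       // number of terms in this field
        final float queryBoost; // boost given to the term in the query
        final float fieldBoost; // boost given to the field at indexing time

        TermMatch(int df, int tf, int length, float queryBoost, float fieldBoost) {
            this.df = df;
            this.tf = tf;
            this.length = length;
            this.queryBoost = queryBoost;
            this.fieldBoost = fieldBoost;
        }
    }

    /** Each factor is its own method so a subclass can replace just one of them. */
    static class ToySimilarity {
        private final int numDocs;

        ToySimilarity(int numDocs) { this.numDocs = numDocs; }

        // Assumed default: rarer terms weigh more.
        float termFactor(int df) {
            return (float) (Math.log((double) numDocs / (df + 1)) + 1.0);
        }

        // Assumed default: repeated terms help, with diminishing returns.
        float termDocFactor(int tf) {
            return (float) Math.sqrt(tf);
        }

        // Assumed default: a match in a shorter field weighs more.
        float fieldFactor(int length) {
            return (float) (1.0 / Math.sqrt(length));
        }

        /** The sum over query terms from the formula in the message above. */
        final float score(List<TermMatch> matches) {
            float sum = 0.0f;
            for (TermMatch m : matches) {
                sum += termFactor(m.df) * termDocFactor(m.tf) * fieldFactor(m.length)
                        * m.queryBoost * m.fieldBoost;
            }
            return sum;
        }
    }

    /** Example override: only whether a term occurs matters, not how often. */
    static final class FlatTf extends ToySimilarity {
        FlatTf(int numDocs) { super(numDocs); }

        @Override
        float termDocFactor(int tf) {
            return tf > 0 ? 1.0f : 0.0f;
        }
    }

    public static void main(String[] args) {
        List<TermMatch> matches = Arrays.asList(new TermMatch(9, 4, 16, 1.0f, 2.0f));
        System.out.println(new ToySimilarity(10).score(matches));
        System.out.println(new FlatTf(10).score(matches));
    }
}
```

Using instance methods rather than static ones is what makes the override possible; with static methods, as described for Similarity.java above, the only option is to redefine the class itself.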

