Mailing List Archive

Re: setBoost Q.
Mike Tinnes wrote:
> I've been working on tying in a PageRank algo to
> my web crawler using lucene and have a few problems. If I don't know the
> boost factor until AFTER the crawl is it possible to still set the boost?

Why not: (1) crawl, saving pages to disk; (2) analyze links and compute
boosts; then, finally, (3) build the Lucene index?
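
As a rough illustration of step (2), here is a minimal sketch of turning a
saved link graph into per-page boosts. It is not part of Lucene: the class
LinkBoosts, its method names, and the int[][] link representation are all
hypothetical, and scaling the ranks so the average page gets a boost of 1.0
is just one possible choice. The resulting values would be applied to each
document while building the index in step (3).

```java
import java.util.Arrays;

/** Hypothetical helper (not a Lucene class): link graph in, boosts out. */
class LinkBoosts {
    /**
     * Power-iteration PageRank. links[i] lists the pages that page i
     * links to. Returns one score per page; the scores sum to 1.
     */
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no out-links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += rank[i];
                } else {
                    double share = rank[i] / links[i].length;
                    for (int target : links[i]) {
                        next[target] += share;
                    }
                }
            }
            for (int i = 0; i < n; i++) {
                // Dangling rank is spread evenly over all pages.
                next[i] = (1.0 - damping) / n
                        + damping * (next[i] + dangling / n);
            }
            rank = next;
        }
        return rank;
    }

    /** Assumed scaling: an average page ends up with boost 1.0. */
    static float[] toBoosts(double[] rank) {
        float[] boosts = new float[rank.length];
        for (int i = 0; i < rank.length; i++) {
            boosts[i] = (float) (rank[i] * rank.length);
        }
        return boosts;
    }
}
```

Because all boosts are known before any document is added, nothing in the
index has to be rewritten afterwards.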

The API does not currently let you change a field's boost after a
document is indexed. It is possible in theory, but would require
overwriting the .fXX files, which further complicates inter-process
synchronization of index access. Perhaps this could be added as a
caveat-emptor API, but in the meantime I suggest the above approach.

> Also what does setBoost() actually do to the rank?

The rank is the position of a document in a hit list: the first hit has
rank one, and so on. Hits are sorted by score. The boost is multiplied
into the score of hits. So a boost greater than 1.0 will tend to move
hits on that field up the hit list (toward rank one), while a boost less
than 1.0 will tend to move them down.
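
To make the effect on rank concrete, here is a small self-contained sketch.
It deliberately simplifies: the "raw score" array is a hypothetical stand-in
for whatever score a hit would get without a boost, the class
BoostRankSketch is invented for this example, and real Lucene scoring
involves other factors besides the boost.

```java
import java.util.Arrays;

/** Hypothetical illustration (not a Lucene class) of boost versus rank. */
class BoostRankSketch {
    /**
     * Returns document indices ordered by rawScore * boost, best first,
     * so element 0 is the document with rank one.
     */
    static Integer[] rankByBoostedScore(float[] rawScores, float[] boosts) {
        Integer[] order = new Integer[rawScores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Descending by boosted score; ties keep their original order.
        Arrays.sort(order, (a, b) -> Float.compare(
                rawScores[b] * boosts[b], rawScores[a] * boosts[a]));
        return order;
    }
}
```

For example, with raw scores {0.9, 0.6, 0.3} and all boosts at 1.0 the
order is documents 0, 1, 2. Raising the boost of document 1 to 2.0 gives it
a boosted score of 1.2, which moves it ahead of document 0.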

Doug


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>