Mailing List Archive: Prioritising certain documents in the search results

Hi

I'm currently using Lucene 8.6.3, and indexing a few thousand documents.
Some of these documents need to be prioritised in the search results, but
not by too much; e.g. an exact phrase match in a normal document still needs
to top the rankings ahead of a priority document that just matches the
individual words.

I'm tokenising and indexing a Title field and a Content field, and storing a
Category field and a Version field. Searches are executed with a standard
QueryParser on Title and Content.

Most documents do not have a category or a version, but if a document has
the category ReleaseNote then it needs to be boosted, and the value of the
boost factor should be correlated with the Version (so Version = 10 is
boosted more than Version = 6, etc).
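Turning Category and Version into a single per-document number could look like the sketch below. `priorityFor` is a hypothetical helper, not a Lucene API. It assumes Version is a positive integer and that the priority is simply the version number. Documents that are not ReleaseNotes get no value at all, so they can be indexed without any priority field instead of being "boosted by zero".

```java
import java.util.OptionalDouble;

class PriorityValue {

    // Hypothetical helper: maps a document's Category/Version to a priority value.
    // Only ReleaseNote documents with a positive Version get a value; everything
    // else gets none, so ordinary documents are left unboosted.
    static OptionalDouble priorityFor(String category, Integer version) {
        if (!"ReleaseNote".equals(category) || version == null || version <= 0) {
            return OptionalDouble.empty();
        }
        // Assumption: priority == version, so Version = 10 ranks above Version = 6.
        // Any monotonically increasing mapping would serve the same purpose.
        return OptionalDouble.of(version);
    }

    public static void main(String[] args) {
        System.out.println(priorityFor("ReleaseNote", 10)); // OptionalDouble[10.0]
        System.out.println(priorityFor("ReleaseNote", 6));  // OptionalDouble[6.0]
        System.out.println(priorityFor(null, null));        // OptionalDouble.empty
    }
}
```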

Looking back through the older versions of the documentation it appears that
document.setBoost might have provided something I could work with, but later
versions of Lucene appear to have dropped this feature. If it still existed
I expect I'd have to experiment with the actual coefficients until I got
reasonable results but in principle it should be possible to arrive at a
good result.

I can't see how to achieve this same thing in the later versions of Lucene.
The best I can come up with is to calculate a factor from the category and
version number and store it as a separate field, then somehow use this field
to scale up (or down) the actual scores reported for the query results. But
I'm sure there must be a better way to do it - can anyone show me what it
is?
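The "scale the scores by a stored factor" idea above amounts to multiplying each text score by a per-document factor. (Lucene's queries module has `FunctionScoreQuery.boostByValue` for this kind of multiplication; check its availability in the 8.6 javadocs before relying on it.) The stdlib-only sketch below uses made-up scores and an arbitrary coefficient (1 + version/16). It shows that with a multiplicative factor, whether a word-only release note overtakes an exact phrase match depends on both the version and the raw scores.

```java
class MultiplicativeBoost {

    // Hypothetical scaling factor derived from the stored Category/Version:
    // 1.0 for ordinary documents, 1 + version/16 for release notes.
    // The divisor 16 is an arbitrary illustrative coefficient.
    static double factor(boolean releaseNote, int version) {
        return releaseNote ? 1.0 + version / 16.0 : 1.0;
    }

    // Multiply the text score by the document's factor.
    static double scaledScore(double textScore, boolean releaseNote, int version) {
        return textScore * factor(releaseNote, version);
    }

    public static void main(String[] args) {
        double phraseMatch = 8.0; // made-up score: exact phrase in an ordinary document
        double wordMatch = 5.0;   // made-up score: individual words in a release note

        System.out.println(scaledScore(phraseMatch, false, 0)); // 8.0
        System.out.println(scaledScore(wordMatch, true, 6));    // 6.875
        System.out.println(scaledScore(wordMatch, true, 10));   // 8.125, now above the phrase match
    }
}
```

Because the boost grows with the raw score, its absolute effect is hard to cap, which is why the coefficients would need experimentation.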

cheers

T

Check out https://lucene.apache.org/core/9_5_0/core/org/apache/lucene/document/FeatureField.html

I think this is how you want to do it. The docs include some suggestions
on how to start without training the actual values; see the section
"if you don't know where to start".
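FeatureField's javadoc describes scoring a feature value S with weight * f(S), where the saturation function is f(S) = S / (S + pivot). In Lucene this would typically be indexed as `new FeatureField("features", "priority", value)` and queried by adding `FeatureField.newSaturationQuery("features", "priority", weight, pivot)` as a SHOULD clause next to the parsed query as a MUST clause. The field and feature names here are hypothetical. The stdlib-only sketch below just simulates that arithmetic. The scores, weight = 2 and pivot = 10 are illustrative assumptions, not recommended values.

```java
class SaturationBoost {

    // Saturation function from the FeatureField javadoc: S / (S + pivot).
    // It is 0 at S = 0, 0.5 at S = pivot, and approaches but never reaches 1.
    static double saturation(double featureValue, double pivot) {
        return featureValue / (featureValue + pivot);
    }

    // Additive combination: text score + weight * saturation(S).
    // A document without the feature contributes nothing extra (S = 0 here).
    static double combined(double textScore, double featureValue, double weight, double pivot) {
        return textScore + weight * saturation(featureValue, pivot);
    }

    public static void main(String[] args) {
        double weight = 2.0; // assumption: the boost can never add 2.0 or more
        double pivot = 10.0; // assumption: a "typical" release-note version

        double phraseMatch = 8.0; // made-up score: exact phrase in an ordinary document
        double wordMatch = 5.0;   // made-up score: word-only match in a release note

        System.out.println(combined(phraseMatch, 0, weight, pivot)); // 8.0
        System.out.println(combined(wordMatch, 6, weight, pivot));   // 5.75
        System.out.println(combined(wordMatch, 10, weight, pivot));  // 6.0
        // Even an enormous version stays below the phrase match, because the boost is < weight.
        System.out.println(combined(wordMatch, 1_000_000, weight, pivot) < phraseMatch); // true
    }
}
```

Since weight * S / (S + pivot) is always below weight, a text-score lead larger than the weight is never overturned by the boost, which matches "prioritised, but not by too much". Smaller gaps can still be overturned, so weight and pivot still need tuning.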

On Wed, Feb 1, 2023 at 12:03 PM Trevor Nicholls
<trevor@castingthevoid.com> wrote:

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org