Document metadata in ranking?
I am sorry if this has been covered elsewhere, but I wanted to know if it’s possible to inject and use document metadata in ranking search results.

As an example, if I have a pool of documents that fit into 3 broad categories: “Peer Reviewed”, “Professional Journalism”, “Blog Post”, then I would like the documents from the first category to rank higher (all else being equal) than those in the second category, and so on.

I can imagine wanting to add more metadata, e.g. for highly regarded authors, poorly regarded authors, highly rated journals, etc.

Is this even possible? Is this what payloads are for?

Any insights would be appreciated!
Re: Document metadata in ranking?
Hello Philip,

I’ll answer with a possibility that might be outdated and that predates
the existence of payloads (which I think are non-analysed parts, so not
appropriate here).

Lucene has fields, and you can include the metadata within fields in the
form of particular tokens. You can then enrich every query: let it be
parsed, then add (maybe only for plain queries?) TermQuery clauses on
those tokens as weighted ORs, AND-ed with the parsed query, so that they
boost the documents carrying the higher-ranked metadata.
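A rough, library-free model of the arithmetic behind this suggestion follows. As far as I know, in Lucene 7 and later a `BooleanQuery` scores a matching document by summing the scores of its matching clauses, and a `TermQuery` wrapped in `ConstantScoreQuery` inside a `BoostQuery` contributes exactly its boost. The sketch models only that sum in plain Java. The `Doc` class, the category tokens and the boost values are hypothetical, and the base scores stand in for whatever the parsed user query would score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MetadataBoostSketch {

    // Hypothetical document: a base relevance score (standing in for the
    // score of the user's parsed query) plus one category token.
    static final class Doc {
        final String id;
        final double baseScore;
        final String category;

        Doc(String id, double baseScore, String category) {
            this.id = id;
            this.baseScore = baseScore;
            this.category = category;
        }
    }

    // Hypothetical weights, standing in for boosted constant-score SHOULD
    // clauses on a "category" field. Unknown categories get no bonus.
    static final Map<String, Double> CATEGORY_BOOST = Map.of(
            "peer_reviewed", 0.6,
            "journalism", 0.3,
            "blog", 0.0);

    // Models the boolean query's score as the sum of its matching clauses:
    // the required clause (base score) plus the matching category clause.
    static double combinedScore(Doc d) {
        return d.baseScore + CATEGORY_BOOST.getOrDefault(d.category, 0.0);
    }

    // Returns document ids ordered by descending combined score.
    static List<String> rank(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble((Doc d) -> combinedScore(d)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("blog-1", 1.0, "blog"),
                new Doc("paper-1", 1.0, "peer_reviewed"),
                new Doc("news-1", 1.0, "journalism"),
                new Doc("blog-2", 2.0, "blog"));
        // Equal base scores are ordered by category; the much more
        // relevant blog-2 still outranks everything.
        System.out.println(rank(docs)); // [blog-2, paper-1, news-1, blog-1]
    }
}
```

The output illustrates the limitation of additive boosts: they decide ties and near-ties, but a sufficiently relevant blog post still comes first.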

As for an absolute ordering (where the highest category always comes
first), you will certainly need to put some limits on the scores so that
the influence of a positive category takes precedence over the ordering
produced by the base similarity (TF-IDF in older Lucene versions; BM25
has been the default since Lucene 6).
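One way to read "limits on the scores" is sketched below in plain Java, under assumptions of my own. It squashes the unbounded relevance score into [0, 1) with s / (s + k), then adds the category's integer tier. Because the squashed part stays below 1, a higher tier always outranks a lower one, and relevance only orders documents within a tier. The tier numbers, the `Hit` class and the pivot k are all hypothetical, not Lucene settings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TieredRanking {

    // Hypothetical tiers: a higher tier must always rank above a lower one.
    static int tier(String category) {
        switch (category) {
            case "peer_reviewed":
                return 2;
            case "journalism":
                return 1;
            default:
                return 0; // e.g. "blog" or unknown categories
        }
    }

    // Maps a non-negative relevance score into [0, 1). The pivot k must be
    // positive; it is a hypothetical tuning constant.
    static double squash(double score, double k) {
        return score / (score + k);
    }

    // The tier contributes whole points and the squashed relevance less than
    // one point (for realistic scores), so relevance only reorders within a tier.
    static double tieredScore(double relevance, String category) {
        return tier(category) + squash(relevance, 1.0);
    }

    static final class Hit {
        final String id;
        final double relevance;
        final String category;

        Hit(String id, double relevance, String category) {
            this.id = id;
            this.relevance = relevance;
            this.category = category;
        }
    }

    // Returns hit ids ordered by descending tiered score.
    static List<String> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> tieredScore(h.relevance, h.category)).reversed());
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("blog-strong", 50.0, "blog"),
                new Hit("paper-weak", 0.5, "peer_reviewed"),
                new Hit("news-mid", 3.0, "journalism"),
                new Hit("paper-strong", 4.0, "peer_reviewed"));
        System.out.println(rank(hits)); // [paper-strong, paper-weak, news-mid, blog-strong]
    }
}
```

If strict tiers are all that is needed, sorting by a numeric tier field first and by score second (Lucene's `Sort` with a `SortField` on a doc-values field, followed by `SortField.FIELD_SCORE`) should give the same ordering without any score arithmetic.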

Finally, you could write a custom scoring engine, but I can only imagine
that ruining performance...
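A custom scoring function need not be slow if the expensive work happens at index time. The plain-Java sketch below folds several metadata signals (author reputation, venue rating) into one precomputed multiplicative factor per document. Scoring a hit then costs one array read and one multiplication. All names and weights are hypothetical. In Lucene the array would presumably correspond to a per-document numeric value such as a doc-values field, but that wiring is an assumption and is not shown here.

```java
import java.util.Arrays;

public class PrecomputedBoost {

    // Hypothetical per-document metadata signals, each normalised to [0, 1].
    static final class Meta {
        final double authorReputation;
        final double venueRating;

        Meta(double authorReputation, double venueRating) {
            this.authorReputation = authorReputation;
            this.venueRating = venueRating;
        }
    }

    // Index time: fold every signal into one multiplicative factor per doc,
    // stored by doc id. The 0.5 weights are illustrative only.
    static float[] precomputeFactors(Meta[] metaByDocId) {
        float[] factors = new float[metaByDocId.length];
        for (int doc = 0; doc < metaByDocId.length; doc++) {
            Meta m = metaByDocId[doc];
            factors[doc] = (float) (1.0 + 0.5 * m.authorReputation + 0.5 * m.venueRating);
        }
        return factors;
    }

    // Query time: one array read and one multiplication per matching doc.
    static float score(int docId, float baseScore, float[] factors) {
        return baseScore * factors[docId];
    }

    public static void main(String[] args) {
        Meta[] meta = {
                new Meta(1.0, 1.0), // doc 0: well-regarded author, highly rated journal
                new Meta(0.0, 0.0)  // doc 1: no positive signals
        };
        float[] factors = precomputeFactors(meta);
        System.out.println(Arrays.toString(factors)); // [2.0, 1.0]
        System.out.println(score(0, 2.0f, factors) + " " + score(1, 2.0f, factors)); // 4.0 2.0
    }
}
```

A multiplicative factor scales relevance rather than overriding it. This suits soft signals like author reputation, while a hard ordering still needs tiers or a sort.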

paul

On 26 Feb 2021, at 3:40, Philip Warner wrote the original question above.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org