Lucene Explanation
Hello,
I am currently working on a project that needs a Document Explain
feature: a way to see how Lucene scored a document internally for a
given query.

I see that IndexSearcher has an explain
<https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int->
method that returns an Explanation
<https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/Explanation.html>
object. Apart from its value and its nested details, an Explanation only
carries a free-text description string, so there is no way to tell which
part of the score a given Explanation is for without parsing that
description. We want to implement Document Explain more safely: we would
like to know which part of the score each Explanation is associated with
without having to parse its description string. Here are a few options I
have considered:

1. I was thinking about extending the similarity class (BM25Similarity)
and overriding the methods that produce the different subcomponents of
the explanation, but saw that the explainTF
<https://github.com/apache/lucene/blob/e510ef11c2a4307dd6ecc8c8974eef2c04e3e4d6/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L268>
method is private. Is there a reason for this? It would be very useful if
it were public (or at least protected) so that I could override it and
record that the returned Explanation is for the TF component of the
document score.

2. I also thought about extending IndexSearcher and overriding its
createWeight method to capture the resulting Weight structure, and then
using that structure to interpret the Explanation returned by
IndexSearcher's explain method.
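Option 1 can be sketched without Lucene to show what an overridable TF hook would buy. Everything below is hypothetical: `Bm25Model` stands in for BM25Similarity's scorer, its `explainTf` is assumed to be `protected` (in Lucene the real `explainTF` is private, which is the whole problem), and `Node` stands in for Explanation. The TF value follows the BM25 formula shown in Lucene 8.x's explanation text, freq / (freq + k1 * (1 - b + b * dl / avgdl)), with Lucene's default k1 = 1.2 and b = 0.75. The subclass calls `super` and records the returned node in an `IdentityHashMap`, so the TF component is identified by object identity rather than by its description.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class OverridableTfDemo {

    // Hypothetical score components we want to recognise.
    enum Component { IDF, TF }

    // Hypothetical stand-in for Lucene's Explanation: value, description, details.
    record Node(double value, String description, List<Node> details) {}

    // Hypothetical BM25 scorer whose TF explanation is overridable (protected).
    static class Bm25Model {
        final double k1 = 1.2;  // Lucene's default k1
        final double b = 0.75;  // Lucene's default b

        protected Node explainTf(double freq, double dl, double avgdl) {
            double norm = k1 * (1 - b + b * dl / avgdl);
            return new Node(freq / (freq + norm),
                    "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))", List.of());
        }

        Node explain(double idf, double freq, double dl, double avgdl) {
            Node idfNode = new Node(idf, "idf", List.of());
            Node tfNode = explainTf(freq, dl, avgdl);
            return new Node(idf * tfNode.value(), "score, product of:", List.of(idfNode, tfNode));
        }
    }

    // Subclass that tags the TF node by identity; descriptions are never parsed.
    static class TaggingBm25 extends Bm25Model {
        final Map<Node, Component> tags = new IdentityHashMap<>();

        @Override
        protected Node explainTf(double freq, double dl, double avgdl) {
            Node node = super.explainTf(freq, dl, avgdl);
            tags.put(node, Component.TF);
            return node;
        }
    }

    public static void main(String[] args) {
        TaggingBm25 similarity = new TaggingBm25();
        Node root = similarity.explain(1.5, 2, 10, 10);  // made-up idf, freq, dl, avgdl
        Node tf = root.details().stream()
                .filter(n -> similarity.tags.get(n) == Component.TF)
                .findFirst()
                .orElseThrow();
        System.out.println("TF component: " + tf.value());
    }
}
```

Keying the map by identity rather than `equals` matters here: two sub-explanations with identical values and descriptions would still be told apart.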

Please let me know if any of this is unclear. If anyone has other ideas
on how to approach this problem, suggestions would be greatly
appreciated. Lastly, I would be happy to submit a PR that makes Lucene's
Explanation aware of where it was created.
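The PR idea could take several shapes; one hypothetical shape is sketched below. `TaggedExplanation` and `Component` are invented for illustration and are not part of Lucene, and the component list is an assumption that a real proposal would have to settle. Each node optionally records the score component that produced it, so a caller can collect every TF node in a multi-term explanation without looking at descriptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class TaggedExplanationDemo {

    // Hypothetical component names; a real proposal would need to agree on these.
    enum Component { BOOST, IDF, TF }

    // Hypothetical explanation node that remembers which component produced it.
    // Generic nodes such as "sum of:" carry no component.
    record TaggedExplanation(float value, String description,
                             Optional<Component> component, List<TaggedExplanation> details) {

        static TaggedExplanation of(Component c, float value, String description,
                                    TaggedExplanation... details) {
            return new TaggedExplanation(value, description, Optional.of(c), List.of(details));
        }

        static TaggedExplanation untagged(float value, String description,
                                          TaggedExplanation... details) {
            return new TaggedExplanation(value, description, Optional.empty(), List.of(details));
        }

        // Collects every node tagged with the given component, depth first.
        List<TaggedExplanation> find(Component c) {
            List<TaggedExplanation> out = new ArrayList<>();
            collect(c, out);
            return out;
        }

        private void collect(Component c, List<TaggedExplanation> out) {
            if (component.isPresent() && component.get() == c) {
                out.add(this);
            }
            for (TaggedExplanation d : details) {
                d.collect(c, out);
            }
        }
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        TaggedExplanation termA = TaggedExplanation.untagged(0.9375f, "weight(body:lucene), product of:",
                TaggedExplanation.of(Component.IDF, 1.5f, "idf"),
                TaggedExplanation.of(Component.TF, 0.625f, "tf"));
        TaggedExplanation termB = TaggedExplanation.untagged(0.4f, "weight(body:explain), product of:",
                TaggedExplanation.of(Component.IDF, 0.8f, "idf"),
                TaggedExplanation.of(Component.TF, 0.5f, "tf"));
        TaggedExplanation root = TaggedExplanation.untagged(1.3375f, "sum of:", termA, termB);

        // Both TF nodes are found by tag; rewording the descriptions would not change this.
        for (TaggedExplanation tf : root.find(Component.TF)) {
            System.out.println(tf.value());
        }
    }
}
```

Keeping the tag optional means existing call sites that build plain "sum of:" or "product of:" nodes would not need to change.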