Mailing List Archive

Feature Request: Smoothing scores for documents without search term
Hello,

My name is Cameron VandenBerg, and I am a Senior Software Engineer at Carnegie Mellon University, working on search engine research for Professor Jamie Callan. In my current project, I have implemented the scoring logic of our academic search engine, Indri (http://lemurproject.org/indri.php), on top of Lucene.

The major difference between Lucene and Indri is that Indri gives a "smoothing score" to a document that does not contain the search term, which has improved our search ranking accuracy.

We are interested in adding a smoothing score method to the Scorable abstract class like this:

/**
 * Allows access to the score of a Query
 */
public abstract class Scorable {

  /**
   * Returns the score of the current document matching the query.
   */
  public abstract float score() throws IOException;

  /**
   * Returns the smoothing score for the given document.
   */
  public abstract float smoothingScore(int docId) throws IOException;
}

This method can return 0 by default, or it can be overridden in any subscorer that needs it. In my case, I use this method in my custom implementation of the TermScorer to call the docScorer with the docId and a count of 0 like this:

@Override
public float smoothingScore(int docId) throws IOException {
  // A term count of 0 makes the docScorer return the smoothing score.
  return docScorer.score(docId, 0);
}
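
To illustrate why a count of 0 can still produce a meaningful score, here is a minimal, self-contained sketch. It assumes Dirichlet smoothing in a query-likelihood model, which the message above does not specify; the class name, the method name, and the values of mu and the collection probability are all hypothetical and chosen only for illustration.

```java
public class SmoothingScoreSketch {

    /**
     * Dirichlet-smoothed log-probability of a term in a document (assumed model).
     * With termFreq == 0 the result is still finite: that value plays the role
     * of the "smoothing score" for a document that lacks the term.
     */
    static double dirichletScore(double termFreq, long docLength,
                                 double collectionProb, double mu) {
        return Math.log((termFreq + mu * collectionProb) / (docLength + mu));
    }

    public static void main(String[] args) {
        double mu = 2500.0;             // hypothetical smoothing parameter
        double collectionProb = 0.001;  // hypothetical P(term | collection)
        long docLength = 500;           // hypothetical document length

        double matching = dirichletScore(3, docLength, collectionProb, mu);
        double smoothing = dirichletScore(0, docLength, collectionProb, mu);

        System.out.println("score with 3 occurrences: " + matching);
        System.out.println("score with 0 occurrences: " + smoothing);
    }
}
```

Under these assumptions, the document without the term scores lower than the one with it but is not excluded, which is the behavior the proposed smoothingScore method is meant to expose.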

We are also open to other implementations if anyone has a suggestion. Please let me know what steps I can take to implement this change and submit it for review.

Thank you,
Cameron VandenBerg

Re: Feature Request: Smoothing scores for documents without search term
Hi,

We normally don't accept a request of this nature, but let's first see if
Lucene's existing API is adequate for your needs. It seems you are trying
to score documents that don't match the query strictly. Thus it seems that
in some sense, all documents ultimately match your query -- at least those
that produce a non-zero smoothingScore? Can that be computed efficiently
without iterating over *all* documents that don't match the query? If I'm
on the right path here, I think you'll need to produce a Lucene Query
that's a combination of two queries -- one which is the original query, and
another that matches a subset of "similar" docs that don't necessarily
match the original query.
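
One possible reading of the efficiency concern above, sketched in plain Java rather than against Lucene's API: only documents that contain at least one query term become candidates, and each candidate receives a smoothing score for the terms it lacks. The in-memory postings maps, the Dirichlet formula, and every name below are assumptions made for illustration; this is not how Lucene or Indri is implemented.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class UnionScoringSketch {

    /**
     * Scores every document that contains at least one query term.
     * postings.get(t) maps docId -> term frequency for query term t.
     * Documents that match no term are never visited.
     */
    static Map<Integer, Double> score(List<Map<Integer, Integer>> postings,
                                      int[] docLengths,
                                      double[] collectionProbs,
                                      double mu) {
        // Candidate set: the union of all postings lists.
        Set<Integer> candidates = new TreeSet<>();
        for (Map<Integer, Integer> p : postings) {
            candidates.addAll(p.keySet());
        }

        Map<Integer, Double> scores = new TreeMap<>();
        for (int doc : candidates) {
            double sum = 0.0;
            for (int t = 0; t < postings.size(); t++) {
                // A missing term has frequency 0, which yields its smoothing score.
                int tf = postings.get(t).getOrDefault(doc, 0);
                sum += Math.log((tf + mu * collectionProbs[t]) / (docLengths[doc] + mu));
            }
            scores.put(doc, sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical index: term 0 occurs in docs 0 and 1, term 1 only in doc 1.
        List<Map<Integer, Integer>> postings = List.of(Map.of(0, 2, 1, 1), Map.of(1, 3));
        int[] docLengths = {100, 200, 50};
        double[] collectionProbs = {0.001, 0.002};

        // Doc 2 contains neither term, so it does not appear in the output.
        System.out.println(score(postings, docLengths, collectionProbs, 2500.0));
    }
}
```

The cost of this approach is bounded by the combined length of the postings lists rather than by the size of the collection, at the price of leaving documents that match no term unscored.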

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Fri, Jul 17, 2020 at 3:52 PM Cameron M VandenBerg <cmw2@cs.cmu.edu>
wrote:
