Mailing List Archive: lucene explanation

lucene explanation

Dec 22, 2008, 1:48 PM

Post #1 of 5 (881 views)

Hello,
I'm wondering what the best way to accomplish this is.
When a user enters search text, we customarily search 3 fields (resume_text, profile_text, and summary_text), so a standard query looks something like:
(resume_text:(query) OR profile_text:(query) OR summary_text:(query))
For each hit (up to 50), I'd like to find out which part of the query matched the document. Right now I use the Explanation object; here's the code:
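// parser is a QueryParser and userText is the text the user entered;
// parse the three per-field queries once, outside the loop
Query resumeQuery = parser.parse("resume_text:(" + userText + ")");
Query profileQuery = parser.parse("profile_text:(" + userText + ")");
Query summaryQuery = parser.parse("summary_text:(" + userText + ")");
int len = Math.min(hits.length(), 50);
for (int i = 0; i < len; i++) {
    int docId = hits.id(i);
    if (searcher.explain(resumeQuery, docId).isMatch()) ...
    if (searcher.explain(profileQuery, docId).isMatch()) ...
    if (searcher.explain(summaryQuery, docId).isMatch()) ...
}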
This works fine with regular queries, but when someone runs a wildcard query, search times rise to more than 30 seconds. Is there a better way to do this?
Thanks
Sincerely,
Chris Salem

Re: lucene explanation [ In reply to ]

erickerickson at gmail

Dec 22, 2008, 2:00 PM

Post #2 of 5 (873 views)


Warning! I'm really reaching on this....

But it seems you could use TermDocs/TermEnum to
good effect here. Basically, for a given term, you
should be able to use them to determine pretty
efficiently whether doc N had a hit in one of your fields.
There's even a WildcardTermEnum that will iterate
over the terms matching a wildcard.
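A toy model of the TermEnum/TermDocs idea may help. The sketch below does not use Lucene at all: the hypothetical class `ToyTermIndex` stores, per field, a sorted term dictionary mapping each term to sorted doc IDs, loosely standing in for TermEnum and TermDocs. As I understand WildcardTermEnum does, it seeks to the literal prefix before the first wildcard and stops once terms no longer share that prefix; a binary search over each term's postings plays the role of skipping to doc N. It is a sketch of the technique under those assumptions, not Lucene 2.x code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical toy index, NOT the Lucene API: per field, a sorted term
// dictionary (stand-in for TermEnum) mapping each term to sorted doc IDs
// (stand-in for TermDocs).
final class ToyTermIndex {
    private final Map<String, NavigableMap<String, int[]>> fields = new HashMap<>();

    // Register the sorted doc IDs of the documents containing `term` in `field`.
    void add(String field, String term, int... sortedDocIds) {
        fields.computeIfAbsent(field, f -> new TreeMap<>()).put(term, sortedDocIds);
    }

    // Does doc `docId` contain any term in `field` matching the wildcard
    // pattern ('*' = any run of characters, '?' = exactly one character)?
    boolean hasHit(String field, String wildcard, int docId) {
        NavigableMap<String, int[]> dict = fields.get(field);
        if (dict == null) return false;
        // Seek to the literal prefix before the first wildcard character...
        String prefix = wildcard.substring(0, firstWildcard(wildcard));
        Pattern pattern = Pattern.compile(toRegex(wildcard));
        for (Map.Entry<String, int[]> e : dict.tailMap(prefix, true).entrySet()) {
            // ...and stop as soon as terms no longer share that prefix.
            if (!e.getKey().startsWith(prefix)) break;
            if (!pattern.matcher(e.getKey()).matches()) continue;
            // Postings are sorted, so a binary search stands in for skipping to docId.
            if (Arrays.binarySearch(e.getValue(), docId) >= 0) return true;
        }
        return false;
    }

    private static int firstWildcard(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '*' || c == '?') return i;
        }
        return s.length();
    }

    private static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return regex.toString();
    }
}
```

For example, after `idx.add("resume_text", "java", 1, 4, 7)` and `idx.add("resume_text", "javascript", 2)`, the call `idx.hasHit("resume_text", "jav*", 2)` returns true via "javascript". Only terms under the prefix are visited, which is why a leading wildcard ("*ava") would be much more expensive than a trailing one.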

Filters are surprisingly fast to construct, so you could
use the above to construct a filter on each term for
each field. Then determining whether the doc is a hit
for a particular field is just a matter of seeing if
that bit is on in the relevant filter.
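The filter approach can be sketched the same way, again as a standalone toy rather than Lucene's Filter API: for each field, build one java.util.BitSet once per query, with a bit set for every document containing a term that matches the user's wildcard; then checking which fields matched a hit is a single `BitSet.get(doc)` per field. `FieldFilterSketch` and its methods are hypothetical names, and the term-to-doc-IDs maps stand in for what TermEnum/TermDocs would supply.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, NOT Lucene's Filter API: a BitSet per field plays the filter.
final class FieldFilterSketch {

    // One pass per field: set the bit of every doc containing a term that
    // matches the wildcard. Done once per query, not once per hit.
    static BitSet buildFilter(Map<String, int[]> fieldPostings, String wildcard, int maxDoc) {
        Pattern pattern = Pattern.compile(wildcardToRegex(wildcard));
        BitSet bits = new BitSet(maxDoc);
        for (Map.Entry<String, int[]> entry : fieldPostings.entrySet()) {
            if (pattern.matcher(entry.getKey()).matches()) {
                for (int doc : entry.getValue()) {
                    bits.set(doc);
                }
            }
        }
        return bits;
    }

    // For each hit, list the fields whose filter has that doc's bit set.
    static Map<Integer, List<String>> matchedFields(int[] hitDocIds, Map<String, BitSet> filtersByField) {
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (int doc : hitDocIds) {
            List<String> fields = new ArrayList<>();
            for (Map.Entry<String, BitSet> filter : filtersByField.entrySet()) {
                if (filter.getValue().get(doc)) {
                    fields.add(filter.getKey());
                }
            }
            result.put(doc, fields);
        }
        return result;
    }

    // '*' matches any run of characters, '?' exactly one; everything else is literal.
    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return regex.toString();
    }
}
```

Building the bitsets costs one pass over the matching terms per field; after that, each of the 50 hits needs only three bit lookups instead of three `explain` calls.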

Either one should be waaaay under 30 seconds,
although I don't know how big your index is
or how encompassing your wildcard searches
are...

FWIW
Erick


Re: lucene explanation [ In reply to ]

chris at mainsequence

Dec 23, 2008, 7:58 AM

Post #3 of 5 (847 views)


That worked perfectly.
Thanks a lot!
Sincerely,
Chris Salem


Re: Lucene Explanation [ In reply to ]

msokolov at gmail

Apr 12, 2021, 6:00 AM

Post #4 of 5 (269 views)


You might want to check out
https://issues.apache.org/jira/browse/LUCENE-8019 where I tried to
implement some debugging utilities on top of Explain. It never got
committed, but it does explore some of the challenges around
introducing a more structured explain response.

On Fri, Apr 9, 2021 at 6:40 PM Puneeth Bikkumanla
<puneeth.bikkumanla@mongodb.com.invalid> wrote:
>
> Hello,
> I am currently working on a project that aims to implement Document
> Explain, where we can see how Lucene scored a document internally
> for a given query.
>
> I see that the IndexSearcher has an explain
> <https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int->
> method available that returns an Explanation
> <https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/Explanation.html>
> object. Apart from its value and sub-details, an Explanation object only
> carries a description string, so there is no way to know which part of a
> score an Explanation object is for without parsing that description. We
> want to implement Document Explain in a safer way, where we know which part
> of the score an Explanation object is associated with without parsing the
> description string. Here are a few of the options I have thought of:
>
> 1. I was thinking about extending the similarity class (BM25Similarity) and
> then overriding the particular methods that deal with the different
> subcomponents of explain, but saw that the explainTF
> <https://github.com/apache/lucene/blob/e510ef11c2a4307dd6ecc8c8974eef2c04e3e4d6/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L268>
> method is private. Is there a reason for this? It would be very useful if it
> could be public so that I could override it and record that the returned
> Explanation is for the TF component of the document score.
>
> 2. I also thought about extending the IndexSearcher and overriding the
> createWeight method to store the weight structure and then use that to
> understand the resulting Explanation structure from the IndexSearcher's
> explain method.
>
> Please let me know if any of that didn't make sense. Also, if anyone has
> other ideas on how I could approach this problem, suggestions would be
> greatly appreciated. Lastly, I would be happy to submit a PR to modify
> Lucene's Explanation to be more aware of where it is created.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
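To illustrate the kind of structured explain response being discussed, here is a sketch of an explanation tree whose nodes carry a component tag, so callers can look up the TF or IDF part without parsing description strings. `TaggedExplanation` and `ScoreComponent` are hypothetical and not part of Lucene's Explanation API, and the scoring is the textbook BM25 formula for a single term in a single document. Lucene's BM25Similarity differs in details (for example, it stores document lengths in a lossy encoding), so the numbers are not expected to match Lucene's exactly.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical component tags: not part of Lucene's Explanation API.
enum ScoreComponent { SCORE, IDF, TF_NORM }

// Hypothetical explanation node that records which score component it describes.
final class TaggedExplanation {
    final ScoreComponent component;
    final double value;
    final String description;
    final List<TaggedExplanation> details;

    TaggedExplanation(ScoreComponent component, double value, String description,
                      TaggedExplanation... details) {
        this.component = component;
        this.value = value;
        this.description = description;
        this.details = Arrays.asList(details);
    }

    // Depth-first search for the first node with the given tag; null if absent.
    TaggedExplanation find(ScoreComponent wanted) {
        if (component == wanted) return this;
        for (TaggedExplanation d : details) {
            TaggedExplanation hit = d.find(wanted);
            if (hit != null) return hit;
        }
        return null;
    }

    // Textbook BM25 for one term in one document:
    // idf = ln(1 + (N - n + 0.5) / (n + 0.5))
    // tfNorm = f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
    static TaggedExplanation bm25(int freq, double docLen, double avgDocLen,
                                  long docCount, long docFreq, double k1, double b) {
        double idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
        double tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLen / avgDocLen));
        TaggedExplanation idfNode = new TaggedExplanation(ScoreComponent.IDF, idf,
                "idf, computed from docCount=" + docCount + ", docFreq=" + docFreq);
        TaggedExplanation tfNode = new TaggedExplanation(ScoreComponent.TF_NORM, tfNorm,
                "tfNorm, computed from freq=" + freq + ", docLen=" + docLen);
        return new TaggedExplanation(ScoreComponent.SCORE, idf * tfNorm,
                "score = idf * tfNorm", idfNode, tfNode);
    }
}
```

For example, `TaggedExplanation.bm25(1, 100, 100, 10, 1, 1.2, 0.75).find(ScoreComponent.TF_NORM).value` is 1.0, because the document has exactly the average length and a term frequency of 1. The design choice is that the tag travels with the node, so consumers never depend on the wording of `description`.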

Re: Lucene Explanation [ In reply to ]

puneeth.bikkumanla at mongodb

Apr 23, 2021, 8:24 AM

Post #5 of 5 (267 views)


Thank you this was very helpful!
