Mailing List Archive

Re: [jira] [Updated] (LUCENE-10454) UnifiedHighlighter can miss terms because of query rewrites
Isn't this kind of like "if a tree falls in the woods and nobody is
there, does it make a sound?" I mean -- if the index is empty, how can
UH fail? No documents will ever match, ergo no highlights will be
returned, so it seems fine that it is unable to extract terms from the
query.

On Thu, Mar 3, 2022 at 2:00 PM Julie Tibshirani (Jira) <jira@apache.org> wrote:
>
>
> [ https://issues.apache.org/jira/browse/LUCENE-10454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Julie Tibshirani updated LUCENE-10454:
> --------------------------------------
> Attachment: LUCENE-10454.patch
>
> > UnifiedHighlighter can miss terms because of query rewrites
> > -----------------------------------------------------------
> >
> > Key: LUCENE-10454
> > URL: https://issues.apache.org/jira/browse/LUCENE-10454
> > Project: Lucene - Core
> > Issue Type: Bug
> > Reporter: Julie Tibshirani
> > Priority: Minor
> > Attachments: LUCENE-10454.patch
> >
> >
> > Before extracting terms from a query, UnifiedHighlighter rewrites the query using an empty searcher. If the query rewrites to MatchNoDocsQuery when the reader is empty, then the highlighter will fail to extract terms. This is more of an issue now that we rewrite BooleanQuery to MatchNoDocsQuery when any of its required clauses is MatchNoDocsQuery (https://issues.apache.org/jira/browse/LUCENE-10412). I attached a patch showing the problem.
> > This feels like a pretty esoteric issue, but I figured it was worth raising for awareness. I think it only applies when weightMatches=false, which isn't the default. I couldn't find any existing queries in Lucene that would be affected.
> > We ran into it while upgrading Elasticsearch to the latest Lucene snapshot, since a couple of custom queries rewrite to MatchNoDocsQuery when the reader is empty.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.20.1#820001)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
> For additional commands, e-mail: issues-help@lucene.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: [jira] [Updated] (LUCENE-10454) UnifiedHighlighter can miss terms because of query rewrites
Hi Mike, the issue is that UnifiedHighlighter can rewrite the query using
an empty reader even when the index is not empty:
https://github.com/apache/lucene/blob/branch_9_0/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java#L149.
Otherwise I agree there would be no problem! It's certainly a bit of an
unusual case though.
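
To make the failure mode concrete, here is a rough toy model of it. This is
only a sketch, not Lucene code: ToyQuery, ToyTerm, ToyCustom, ToyAnd and
ToyMatchNoDocs are hypothetical stand-ins, and the boolean readerIsEmpty is
an assumed simplification of "rewrite against a searcher over an empty
reader". It shows a custom clause that rewrites to a match-nothing query on
an empty reader, a conjunction that collapses to match-nothing when any
required clause does (the behavior described in LUCENE-10412), and the
resulting loss of every term during extraction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types are Lucene classes.
final class RewriteSketch {

    /** Hypothetical stand-in for a query that is rewritten before term extraction. */
    interface ToyQuery {
        ToyQuery rewrite(boolean readerIsEmpty);
        void collectTerms(Set<String> out);
    }

    /** Matches nothing and carries no terms (stand-in for MatchNoDocsQuery). */
    static final class ToyMatchNoDocs implements ToyQuery {
        public ToyQuery rewrite(boolean readerIsEmpty) { return this; }
        public void collectTerms(Set<String> out) { /* nothing to report */ }
    }

    /** A plain term; its rewrite does not depend on the reader. */
    static final class ToyTerm implements ToyQuery {
        final String term;
        ToyTerm(String term) { this.term = term; }
        public ToyQuery rewrite(boolean readerIsEmpty) { return this; }
        public void collectTerms(Set<String> out) { out.add(term); }
    }

    /** Assumed custom query that short-circuits when the reader is empty. */
    static final class ToyCustom implements ToyQuery {
        final String term;
        ToyCustom(String term) { this.term = term; }
        public ToyQuery rewrite(boolean readerIsEmpty) {
            return readerIsEmpty ? new ToyMatchNoDocs() : new ToyTerm(term);
        }
        public void collectTerms(Set<String> out) { out.add(term); }
    }

    /** Conjunction of required clauses; collapses if any clause matches nothing. */
    static final class ToyAnd implements ToyQuery {
        final List<ToyQuery> clauses;
        ToyAnd(List<ToyQuery> clauses) { this.clauses = clauses; }
        public ToyQuery rewrite(boolean readerIsEmpty) {
            List<ToyQuery> rewritten = new ArrayList<>();
            for (ToyQuery clause : clauses) {
                ToyQuery r = clause.rewrite(readerIsEmpty);
                if (r instanceof ToyMatchNoDocs) {
                    return r; // the whole conjunction can no longer match
                }
                rewritten.add(r);
            }
            return new ToyAnd(rewritten);
        }
        public void collectTerms(Set<String> out) {
            for (ToyQuery clause : clauses) {
                clause.collectTerms(out);
            }
        }
    }

    /** Rewrite first, then extract terms from the rewritten query. */
    static Set<String> extractTerms(ToyQuery query, boolean readerIsEmpty) {
        Set<String> terms = new LinkedHashSet<>();
        query.rewrite(readerIsEmpty).collectTerms(terms);
        return terms;
    }

    /** One ordinary required term plus one custom required clause. */
    static ToyQuery sampleQuery() {
        return new ToyAnd(Arrays.asList(new ToyTerm("lucene"), new ToyCustom("highlight")));
    }

    public static void main(String[] args) {
        System.out.println("empty reader:     " + extractTerms(sampleQuery(), true));
        System.out.println("non-empty reader: " + extractTerms(sampleQuery(), false));
    }
}
```

In this model, rewriting against the empty reader loses both terms, including
the ordinary one, even though the real index could contain documents that
match and should be highlighted.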

Julie

On Thu, Mar 3, 2022 at 1:10 PM Michael Sokolov <msokolov@gmail.com> wrote:

> Isn't this kind of like if a tree falls in the woods and nobody is
> there does it make a sound? I mean -- if the index is empty, how can
> UH fail? No documents will ever match, ergo no highlights will be
> returned, so it seems fine that it is unable to extract terms from the
> query.