Mailing List Archive

Getting Matches in a document
Hi,
Is there any way to get the matches of a document in the *collect(int doc)* function of the collector, other than calling the *matches* function of the *Weight* class again?

Thanking you in advance,
Arihant.
Re: Getting Matches in a document
Hi Arihant,

Getting Matches is a fairly heavy operation and is designed to be used for the top-k hits only, a bit like the explain API. Collectors, by contrast, are supposed to be very lightweight: collect(doc) may be called millions of times during a search. So the two APIs are not really meant to be used together.
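To illustrate the top-k pattern, here is a toy model in plain Java. It is not Lucene code and every name in it is hypothetical: documents are token arrays, the query is a single term, and matches() is a stand-in for Weight.matches() that returns term positions. A cheap pass scores every matching document (score = term frequency), and the expensive matches step then runs only for the k best hits, much as explain is usually called only for the hits being shown. The model only counts how often matches runs; it does not try to reproduce real costs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (not Lucene code): score hits with a cheap pass, then compute the expensive
 * per-document "matches" only for the top-k hits. All names here are hypothetical.
 */
final class TopKThenMatches {

    /** How many times the expensive matches step has run. */
    static int matchesCalls = 0;

    /** Hypothetical stand-in for Weight.matches(): positions of the term in one document. */
    static List<Integer> matches(String[] tokens, String term) {
        matchesCalls++;
        List<Integer> positions = new ArrayList<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                positions.add(pos);
            }
        }
        return positions;
    }

    /** Cheap pass: score = term frequency; returns the ids of the k best-scoring hits. */
    static List<Integer> topK(String[][] docs, String term, int k) {
        List<int[]> hits = new ArrayList<>(); // each entry is {docId, score}
        for (int docId = 0; docId < docs.length; docId++) {
            int freq = 0;
            for (String token : docs[docId]) {
                if (token.equals(term)) {
                    freq++;
                }
            }
            if (freq > 0) {
                hits.add(new int[] {docId, freq});
            }
        }
        // highest score first; ties go to the lower doc id
        hits.sort(Comparator.<int[]>comparingInt(h -> -h[1]).thenComparingInt(h -> h[0]));
        List<Integer> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, hits.size()); i++) {
            top.add(hits.get(i)[0]);
        }
        return top;
    }

    /** Maps each top-k doc id to its match positions; matches runs at most k times. */
    static Map<Integer, List<Integer>> topKWithMatches(String[][] docs, String term, int k) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (int docId : topK(docs, term, k)) {
            result.put(docId, matches(docs[docId], term));
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] docs = {
            "lucene monitor module".split(" "),
            "lucene lucene search".split(" "),
            "unrelated text".split(" "),
            "search with lucene".split(" "),
        };
        System.out.println(topKWithMatches(docs, "lucene", 2));
        System.out.println("matches calls: " + matchesCalls);
    }
}
```

Running main prints {0=[0], 1=[0, 1]} and then matches calls: 2. Three documents contain "lucene", but matches runs only for the two best-scoring ones.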

Thanks, Alan

> On 15 Jul 2021, at 04:18, Arihant Samar <arisamjay@gmail.com> wrote:
Re: Getting Matches in a document
Hi,
Sorry for the slightly late response. This question actually came up in the context of the highlighting matcher in Lucene's monitor module.
Essentially, the matcher calls the *matches* function for every document in the index for a selected query, so I assume it goes over the index once per document.
That includes documents that do not match the query at all, which could easily be eliminated by running the query as a search first.

In other words, if we run the query through Lucene's normal search and call the *matches* function from inside the collector, we avoid going over the index repeatedly for all the documents that do not match the query.
In the benchmarks I ran, this gave a substantial improvement, so I would like your opinions on this approach.
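A toy model of the comparison, again plain Java rather than Lucene code, with hypothetical names throughout. matchEveryDocument calls a matches() stand-in for every document in the index, mirroring the behaviour described for the monitor's matcher; searchThenMatch walks a prebuilt postings list for the term and calls matches() only from a collector callback (an IntConsumer playing the role of collect(int doc)), so non-matching documents are never visited. As with Weight.matches, the stand-in returns null when a document does not match. The counter shows how many times the matches step runs in each case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntConsumer;

/**
 * Toy model (not Lucene code): calling a per-document "matches" step for every document
 * versus searching first and calling it only from the collector. Names are hypothetical.
 */
final class SearchThenMatches {

    /** How many times the expensive matches step has run. */
    static int matchesCalls = 0;

    /** Hypothetical stand-in for Weight.matches(): term positions, or null if no match. */
    static List<Integer> matches(String[] tokens, String term) {
        matchesCalls++;
        List<Integer> positions = new ArrayList<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            if (tokens[pos].equals(term)) {
                positions.add(pos);
            }
        }
        return positions.isEmpty() ? null : positions;
    }

    /** Inverted index built once up front: term -> ascending ids of docs containing it. */
    static Map<String, List<Integer>> buildPostings(String[][] docs) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            for (String token : docs[docId]) {
                List<Integer> ids = index.computeIfAbsent(token, t -> new ArrayList<>());
                // a term repeated inside one document must not add the doc id twice
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
        return index;
    }

    /** Approach 1: run matches on every document in the index. */
    static List<Integer> matchEveryDocument(String[][] docs, String term) {
        List<Integer> hits = new ArrayList<>();
        for (int docId = 0; docId < docs.length; docId++) {
            if (matches(docs[docId], term) != null) {
                hits.add(docId);
            }
        }
        return hits;
    }

    /** Cheap search: walks the postings list and hands each hit to the collector. */
    static void search(List<Integer> postings, IntConsumer collector) {
        for (int docId : postings) {
            collector.accept(docId);
        }
    }

    /** Approach 2: search first; the collector callback calls matches only for hits. */
    static List<Integer> searchThenMatch(String[][] docs, Map<String, List<Integer>> index, String term) {
        List<Integer> hits = new ArrayList<>();
        search(index.getOrDefault(term, List.of()), docId -> { // plays the role of collect(doc)
            if (matches(docs[docId], term) != null) {
                hits.add(docId);
            }
        });
        return hits;
    }

    public static void main(String[] args) {
        String[][] docs = {
            "lucene monitor module".split(" "),
            "unrelated text".split(" "),
            "more unrelated text".split(" "),
            "search with lucene".split(" "),
        };
        Map<String, List<Integer>> index = buildPostings(docs);

        List<Integer> everyDoc = matchEveryDocument(docs, "lucene");
        System.out.println(everyDoc + " matches calls: " + matchesCalls);

        matchesCalls = 0;
        List<Integer> viaSearch = searchThenMatch(docs, index, "lucene");
        System.out.println(viaSearch + " matches calls: " + matchesCalls);
    }
}
```

With the four sample documents, main prints [0, 3] matches calls: 4 for the first approach and [0, 3] matches calls: 2 for the second. In real Lucene code, a SimpleCollector receives each segment's LeafReaderContext in doSetNextReader, and that context plus the doc id is what Weight.matches takes; every collect call would then still pay the full cost of matches, which is the concern raised in the reply above.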
Thanking you,
Arihant.


On Thu, 15 Jul 2021 at 15:16, Alan Woodward <romseygeek@gmail.com> wrote:
