Hi Adrien,
maybe it has changed a bit, but the last time I looked into it, it was
wrapping all queries in a wrapper called "NamedQuery" or similar. When it
collected hits, a wrapper somewhere around weight/scorer/DISI could tell
that its query had matched and set a flag for it. It could be
that this flag is only set when the hit goes into the top docs, but in
general the work was done in the collection phase.
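
To illustrate the idea, here is a toy model in plain Java. It is only a
sketch of the mechanism as I remember it: it uses neither Lucene nor
Elasticsearch classes, and the names NamedClause and collect are made up
for the example. Each named clause is checked while a hit is collected,
and the names of the matching clauses are recorded per document:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class NamedQuerySketch {

    /** Hypothetical stand-in for the "NamedQuery" wrapper: a name plus the wrapped clause. */
    static class NamedClause {
        final String name;
        final Predicate<Map<String, String>> clause;

        NamedClause(String name, Predicate<Map<String, String>> clause) {
            this.name = name;
            this.clause = clause;
        }
    }

    /**
     * Toy "collection phase": for every document accepted by the main query,
     * record which named clauses also match it. Documents are modelled as
     * simple field-to-value maps; the list index serves as the doc id.
     */
    static Map<Integer, List<String>> collect(
            List<Map<String, String>> docs,
            Predicate<Map<String, String>> mainQuery,
            List<NamedClause> namedClauses) {
        Map<Integer, List<String>> matchedNamesByDoc = new LinkedHashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            Map<String, String> doc = docs.get(docId);
            if (!mainQuery.test(doc)) {
                continue; // not a hit, nothing to tag
            }
            List<String> names = new ArrayList<>();
            for (NamedClause named : namedClauses) {
                if (named.clause.test(doc)) {
                    names.add(named.name); // the "flag" set during collection
                }
            }
            matchedNamesByDoc.put(docId, names);
        }
        return matchedNamesByDoc;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("type", "dataset", "topic", "ocean"),
                Map.of("type", "dataset", "topic", "ice"),
                Map.of("type", "paper", "topic", "ocean"));
        List<NamedClause> named = List.of(
                new NamedClause("ocean", d -> "ocean".equals(d.get("topic"))),
                new NamedClause("ice", d -> "ice".equals(d.get("topic"))));
        System.out.println(collect(docs, d -> "dataset".equals(d.get("type")), named));
    }
}
```

The real thing hooks into the scorers instead of re-testing predicates,
but the per-hit bookkeeping is the part that matters here.
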
I use this feature quite often, also when scanning through whole result
sets, and it is about as fast as without named queries (at least for my
queries; maybe the result scanning and data transfer took longer than the
overhead).
Uwe
P.S.: We at PANGAEA use the feature to implement our "OAI-PMH sets"
(Open Archives Initiative Protocol for Metadata Harvesting, a standard API
used in the library world). This is for data centers harvesting our
metadata: all delivered results are dynamically tagged with the sets they
belong to (the sets are represented as queries). All those set queries are
added as named SHOULD clauses to the main query, and for each result it
returns which sets a PANGAEA dataset belongs to (as this is required by
the protocol).
On 27.06.2022 at 13:48, Adrien Grand wrote:
> Uwe,
>
> Elasticsearch's named queries are not using a collector actually. After
> top hits have been evaluated for the whole query, they are evaluated
> independently on each of the top hits. It's probably faster than the
> collector approach since it doesn't add per-document overhead to
> collection, but also less flexible since it cannot compute statistics
> across all matches.
>
> On Mon, Jun 27, 2022 at 12:01 PM Uwe Schindler <uwe@thetaphi.de> wrote:
>
> I think the collector approach is perfectly fine for
> mass-processing of queries.
>
> By the way: Elasticsearch/OpenSearch already have a built-in
> feature for this, and it works on top of the collector API in a way
> similar to what you mentioned (as far as I remember). It is a bit
> different in that you can tag any clause in a BQ (so every query)
> with a "name" (they call it a "named query",
> https://www.elastic.co/guide/en/elasticsearch/reference/8.2/query-dsl-bool-query.html#named-queries).
> When you get the search results, for each hit it tells you which
> named queries matched that hit. The actual implementation
> is a wrapper query around each of those clauses that contains the
> name. During hit collection it just collects all named query instances
> found in the query tree. I think that in their implementation the
> wrapper query's scorer somehow adds the name to some global state.
>
> Uwe
>
> On 27.06.2022 at 11:51, Shai Erera wrote:
>> Out of curiosity and for education purposes, is the Collector
>> approach I proposed wrong/inefficient? Or less efficient than the
>> matches() API?
>>
>> I'm thinking, if you want to both match/rank documents and as a
>> side effect know which fields matched, the Collector will perform
>> better than Weight.matches(), but I could be wrong.
>>
>> Shai
>>
>> On Mon, Jun 27, 2022 at 11:57 AM Dawid Weiss
>> <dawid.weiss@gmail.com> wrote:
>>
>> The matches API is awesome. Use it. You can also get a rough glimpse
>> into a superset of fields potentially matching the query via:
>>
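>>     Set<String> affectedFields = new HashSet<>();
>>     query.visit(
>>         new QueryVisitor() {
>>           @Override
>>           public boolean acceptField(String field) {
>>             affectedFields.add(field);
>>             // the field name is all we need; skip visiting its terms
>>             return false;
>>           }
>>         });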
>>
>> https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/search/Query.html#visit(org.apache.lucene.search.QueryVisitor)
>>
>> I'd go with the Matches API though.
>>
>> Dawid
>>
>> On Mon, Jun 27, 2022 at 10:48 AM Alan Woodward
>> <romseygeek@gmail.com> wrote:
>> >
>> > The Matches API will give you this information - it’s still likely
>> > to be fairly slow, but it’s a lot easier to use than trying to parse
>> > Explain output.
>> >
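>> > Query query = ….;
>> > Weight w = searcher.createWeight(searcher.rewrite(query),
>> >     ScoreMode.COMPLETE_NO_SCORES, 1.0f);
>> >
>> > // context is the LeafReaderContext of the segment; doc is relative to it
>> > Matches m = w.matches(context, doc);
>> > List<String> matchingFields = new ArrayList<>();
>> > if (m != null) { // matches() returns null when the document does not match
>> >   for (String field : m) {
>> >     matchingFields.add(field);
>> >   }
>> > }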
>> >
>> > Bear in mind that `matches` doesn’t maintain any state between
>> > calls, so calling it for every matching document is likely to be
>> > slow; for those cases Shai’s suggestion of using a Collector and
>> > examining low-level scorers will perform better, but it won’t work
>> > for every query type.
>> >
>> >
>> > > On 25 Jun 2022, at 04:14, Yichen Sun <yichen98@bu.edu> wrote:
>> > >
>> > > Hello!
>> > >
>> > > I’m an MSCS student at BU learning to use Lucene. Recently I have
>> > > been trying to output the fields matched by a query. For example, a
>> > > document has 10 fields and 2 of them match the query; I want to get
>> > > the names of those fields.
>> > >
>> > > I have tried calling explain() and running a regex over the
>> > > description, but that takes too much time.
>> > >
>> > > What is an efficient way to get the matched fields? Would you
>> > > please offer some help? Thank you so much!
>> > >
>> > > Best regards,
>> > > Yichen Sun
>> >
>> >
>> >
>> ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: dev-help@lucene.apache.org
>> >
>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail:uwe@thetaphi.de
>
>
>
> --
> Adrien
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de eMail:uwe@thetaphi.de