Mailing List Archive

retrieving search matches with their frequency and positions
Good Morning everyone!

I'm new to Lucene and I'm currently using version 8.11.2.
I'm doing a simple boolean query. After I've executed the search() method and got results, I'd like to get information about how often a term from the query has been matched. In other words, I'd like to get the matches in the form of terms with properties like frequency and positions.
How can I achieve this?

Thanks in advance!
Ned
Re: retrieving search matches with their frequency and positions
Hello Ned.
This information is available in explain().
At a lower level it's also available via freq() and docFreq():
https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/package-summary.html#terms
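
To make the difference between the two statistics concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene API: the class TermStatsSketch and its methods are hypothetical names, and documents are assumed to be already tokenized lists of strings. docFreq counts the documents that contain a term at least once, freq counts the occurrences of a term inside one document.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model (plain Java, not Lucene API) of the two statistics named above. */
class TermStatsSketch {

    /** Number of documents that contain the term at least once (compare docFreq()). */
    static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    /** Number of occurrences of the term inside one document (compare freq()). */
    static int freq(List<String> doc, String term) {
        int count = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical, already tokenized documents
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("lucene", "is", "a", "search", "library"),
            Arrays.asList("search", "and", "search", "again"),
            Arrays.asList("nothing", "relevant"));
        System.out.println(docFreq(docs, "search"));     // prints 2
        System.out.println(freq(docs.get(1), "search")); // prints 2
    }
}
```

In a real index these numbers are read from the postings rather than computed by scanning tokens, so the sketch only illustrates what the values mean.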


On Sun, Jul 9, 2023 at 10:35 AM nedyalko.zhekov@freelance.de.INVALID
<nedyalko.zhekov@freelance.de.invalid> wrote:


--
Sincerely yours
Mikhail Khludnev
AW: retrieving search matches with their frequency and positions
Hello Mikhail,

Great, thanks for the very fast response! The link that you provided is very useful and informative.

However, I have trouble understanding one thing. After searching for a term, I always get TopDocs that represent the found documents. As I understand it, they have no relation to the matched terms. How can I fetch the matched terms that were passed in by the query object? Then I could fetch the term statistics that are provided by the analyzer or indexer anyway.

I've found the MatchesIterator interface and the FilterMatchesIterator class but was not able to use them.

Thank you!
Ned
Re: retrieving search matches with their frequency and positions
Hi Ned.
It goes roughly like this:
TopDocs topDocs = searcher.search(query, 10);

for (int i = 0; i < topDocs.scoreDocs.length; i++) {
    MatchesIterator matches = searcher.matches(topDocs.scoreDocs[i].doc, "fieldName", query);
    while (matches.next()) { ...

This is (almost) how highlighters (like
https://lucene.apache.org/core/9_0_0/highlighter/org/apache/lucene/search/uhighlight/UnifiedHighlighter.html)
work.
To some extent you can also get this from explain():
https://lucene.apache.org/core/7_3_1/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int-


On Mon, Jul 10, 2023 at 12:19 PM nedyalko.zhekov@freelance.de.INVALID
<nedyalko.zhekov@freelance.de.invalid> wrote:



--
Sincerely yours
Mikhail Khludnev
AW: retrieving search matches with their frequency and positions
Hi Mikhail,

I don't see the `searcher.matches(topDocs.scoreDocs[i].doc, "fieldName", query);` method exposed. I'm using Lucene core 8.11.2 and currently cannot upgrade to 9.0.0 or later.

Any ideas? Which API version are you referring to?

Thanks.
Ned
________________________________
From: Mikhail Khludnev <mkhl@apache.org>
Sent: Monday, July 10, 2023 11:53
To: java-user@lucene.apache.org <java-user@lucene.apache.org>
Subject: Re: retrieving search matches with their frequency and positions

Re: retrieving search matches with their frequency and positions
OK
https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/search/Weight.html#matches-org.apache.lucene.index.LeafReaderContext-int-
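
One detail worth keeping in mind with Weight.matches(LeafReaderContext, int): the doc argument is relative to the given segment, while ScoreDoc.doc from a search is an index-wide ID. In Lucene the mapping is done with ReaderUtil.subIndex and LeafReaderContext.docBase. The arithmetic behind it can be sketched in plain Java as below. This is only an illustration: LeafLookupSketch, leafIndex and localDoc are hypothetical names, and the docBases array is assumed to be strictly ascending and to start at 0.

```java
/** Toy model (plain Java, not Lucene API) of mapping an index-wide doc ID to a segment. */
class LeafLookupSketch {

    /**
     * Returns the index of the segment whose doc ID range contains globalDoc.
     * docBases[i] is the first index-wide doc ID of segment i.
     */
    static int leafIndex(int[] docBases, int globalDoc) {
        int lo = 0;
        int hi = docBases.length - 1;
        // Binary search for the last segment whose docBase is <= globalDoc
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= globalDoc) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /** Returns the doc ID relative to the start of its segment. */
    static int localDoc(int[] docBases, int globalDoc) {
        return globalDoc - docBases[leafIndex(docBases, globalDoc)];
    }
}
```

For example, with three segments starting at 0, 5 and 12, the index-wide doc 7 belongs to the second segment (index 1) and has the segment-local ID 2.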


On Mon, Jul 10, 2023 at 2:08 PM nedyalko.zhekov@freelance.de.INVALID
<nedyalko.zhekov@freelance.de.invalid> wrote:



--
Sincerely yours
Mikhail Khludnev
AW: retrieving search matches with their frequency and positions
Hi Mikhail,

I've finally implemented it this way. Sorry for the delayed answer.



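import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

TopDocs topDocs = this.searcher.search(query, maxResults);
// searcher.rewrite() rewrites until the query no longer changes, which createWeight() requires
Weight weight = this.searcher.rewrite(query).createWeight(this.searcher, ScoreMode.TOP_DOCS, 1.0f);
List<LeafReaderContext> leaves = this.searcher.getIndexReader().leaves();

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // scoreDoc.doc is an index-wide doc ID, but Weight.matches() expects a segment-local one,
    // so look up the owning segment instead of always using leaves().get(0)
    LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(scoreDoc.doc, leaves));
    Matches matches = weight.matches(leaf, scoreDoc.doc - leaf.docBase);
    if (matches == null) {
        continue; // the document does not match
    }
    MatchesIterator matchesIterator = matches.getMatches(FIELD_CONTENT_NAME);
    if (matchesIterator == null) {
        continue; // no matches in this field
    }
    while (matchesIterator.next()) {
        Query matchedQuery = matchesIterator.getQuery();
        Set<Term> matchedTerms = this.extractMatchingTerms(matchedQuery);

        // do whatever is needed with the matching terms
    }
}

// Collects the terms of the rewritten query that produced the current match
protected Set<Term> extractMatchingTerms(Query query) throws IOException {
    Set<Term> queryTerms = new HashSet<>();
    this.searcher.rewrite(query).visit(QueryVisitor.termCollector(queryTerms));
    return queryTerms;
}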

All matched terms are stored in the Set<Term> matchedTerms variable.
Based on that, one can use the statistics coming from the analyzer and get the frequencies of the terms in the field as well as their positions. Note that with a WildcardQuery you won't be able to match terms this way.
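
As a rough illustration of that last step, the following plain-Java sketch derives positions and frequencies for a set of matched terms. It does not use Lucene API: MatchPositionsSketch and positionsOf are hypothetical names, and the field content is assumed to be available as an already analyzed list of tokens. In a real index the same information comes from the postings, provided the field was indexed with positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model (plain Java, not Lucene API): positions of matched terms in one field. */
class MatchPositionsSketch {

    /**
     * Maps every matched term that occurs in the token list to its token positions.
     * The frequency of a term in the field is the size of its position list.
     */
    static Map<String, List<Integer>> positionsOf(List<String> tokens, Set<String> matchedTerms) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            String token = tokens.get(pos);
            if (matchedTerms.contains(token)) {
                result.computeIfAbsent(token, k -> new ArrayList<>()).add(pos);
            }
        }
        return result;
    }
}
```

Terms that were matched but do not occur in the given tokens are simply absent from the returned map.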

Thanks for your hints.
Ned