Mailing List Archive: retrieving search matches with their frequency and positions

Jul 9, 2023, 12:34 AM

Post #1 of 7 (257 views)

Good Morning everyone!

I'm new to Lucene and currently use version 8.11.2.
I'm running a simple boolean query. After I've executed the search() method and got results, I'd like to get information about how often each term from the query has been matched. In other words, I'd like to get the matches as terms with properties such as frequency and positions.
How can I achieve this?

Thanks in advance!
Ned

Re: retrieving search matches with their frequency and positions

mkhl at apache

Jul 9, 2023, 1:47 AM

Post #2 of 7 (257 views)

Hello Ned.
This information is available from explain().
At a lower level, it is also available via freq() and docFreq():
https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/package-summary.html#terms

--
Sincerely yours
Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

nedyalko.zhekov at freelance

Jul 10, 2023, 2:18 AM

Post #3 of 7 (257 views)

Hello Mikhail,

Great, thanks for the very fast response! The link that you provided is very useful and informative.

However, there is something I don't understand. After running a search, I always get a TopDocs that represents the documents found. As far as I can tell, it has no link to the terms that matched. How can I fetch the matched terms for the query object I passed in? Then I could fetch the term statistics that the analyzer or indexer provides anyway.

I've found the MatchesIterator interface and the FilterMatchesIterator class but was not able to use them.

Thank you!
Ned

Re: retrieving search matches with their frequency and positions

mkhl at apache

Jul 10, 2023, 2:53 AM

Post #4 of 7 (257 views)

Hi Ned.
It's roughly:

TopDocs topDocs = searcher.search(query, 10);

for (int i = 0; i < topDocs.scoreDocs.length; i++) {
    MatchesIterator matches = searcher.matches(topDocs.scoreDocs[i].doc, "fieldName", query);
    while (matches.next()) { ...

This is (almost) how highlighters such as UnifiedHighlighter work:
https://lucene.apache.org/core/9_0_0/highlighter/org/apache/lucene/search/uhighlight/UnifiedHighlighter.html
To some extent you can also get this from explain():
https://lucene.apache.org/core/7_3_1/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int-

--
Sincerely yours
Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

nedyalko.zhekov at freelance

Jul 10, 2023, 4:08 AM

Post #5 of 7 (257 views)

Hi Mikhail,

I don't see the `searcher.matches(topDocs.scoreDocs[i].doc, "fieldName", query);` method exposed. I'm using Lucene core 8.11.2 and currently cannot upgrade to 9.0.0 or later.

Any ideas? Which API version are you referring to?

Thanks.
Ned

Re: retrieving search matches with their frequency and positions

mkhl at apache

Jul 10, 2023, 5:14 AM

Post #6 of 7 (257 views)

OK, in 8.11.2 see Weight.matches():
https://lucene.apache.org/core/8_11_2/core/org/apache/lucene/search/Weight.html#matches-org.apache.lucene.index.LeafReaderContext-int-

--
Sincerely yours
Mikhail Khludnev

Re: retrieving search matches with their frequency and positions

nedyalko.zhekov at freelance

Jul 17, 2023, 10:34 AM

Post #7 of 7 (245 views)

Hi Mikhail,

I've finally implemented it this way. Sorry for the delayed answer.

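import java.io.IOException;
import java.util.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;

TopDocs topDocs = this.searcher.search(query, maxResults);
Weight weight = this.searcher.rewrite(query).createWeight(this.searcher, ScoreMode.TOP_DOCS, 1.0f);
List<LeafReaderContext> leaves = this.searcher.getIndexReader().leaves();

for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
    // scoreDoc.doc is index-wide; Weight.matches() expects a doc ID local to its segment,
    // so look up the owning segment instead of always using leaves().get(0).
    LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(scoreDoc.doc, leaves));
    Matches matches = weight.matches(leaf, scoreDoc.doc - leaf.docBase);
    if (matches == null) {
        continue; // the document does not match in this segment
    }
    MatchesIterator matchesIterator = matches.getMatches(FIELD_CONTENT_NAME);
    if (matchesIterator == null) {
        continue; // no matches in this field
    }
    while (matchesIterator.next()) {
        Query matchedQuery = matchesIterator.getQuery();
        Set<Term> matchedTerms = this.extractMatchingTerms(matchedQuery);

        // do whatever is needed with the matching terms;
        // matchesIterator.startPosition() and endPosition() give the position of this match
    }
}

// Collects the terms of the (rewritten) query that produced a match.
protected Set<Term> extractMatchingTerms(Query query) throws IOException {
    Set<Term> queryTerms = new HashSet<>();
    this.searcher.rewrite(query).visit(QueryVisitor.termCollector(queryTerms));
    return queryTerms;
}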

All matched terms are stored in the Set<Term> matchedTerms variable.
Based on that, one can use the statistics coming from the analyser to get the frequencies of the terms in the field as well as their positions. Note that if you have a WildcardQuery, you won't be able to extract the matching terms this way.
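
For the bookkeeping step, here is a minimal, library-independent sketch of how the per-term frequency and positions could be accumulated while looping over the matches. It assumes each hit is reduced to a term text plus a token position (for example the value of matchesIterator.startPosition()); the MatchStats and Hit names are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Lucene: groups match hits by term. */
class MatchStats {

    /** One hit: the matched term text and the token position where it matched. */
    static final class Hit {
        final String term;
        final int position;

        Hit(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Groups hit positions by term, keeping the order in which hits arrive.
     * The frequency of a term is simply the size of its position list.
     */
    static Map<String, List<Integer>> positionsByTerm(List<Hit> hits) {
        Map<String, List<Integer>> result = new TreeMap<>();
        for (Hit hit : hits) {
            result.computeIfAbsent(hit.term, t -> new ArrayList<>()).add(hit.position);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample hits standing in for what the match loop would deliver.
        List<Hit> hits = Arrays.asList(
                new Hit("lucene", 0), new Hit("search", 3), new Hit("lucene", 7));
        for (Map.Entry<String, List<Integer>> e : positionsByTerm(hits).entrySet()) {
            System.out.println(e.getKey() + " freq=" + e.getValue().size()
                    + " positions=" + e.getValue());
        }
    }
}
```

In the real loop, one Hit would be created per matched term for each call to matchesIterator.next(). A sorted map is used only so that the output order is stable.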

Thanks for your hints.
Ned