
find the collection of tokens matching a query in Lucene 8.6.3
Problem: I have indexed the filepath and the content of thousands of
documents and can successfully query the index on the text to return a
collection of filepaths. Now I need to create a collection of the tokens in
the index which matched the query.

I can see that there are solutions to a related problem, which is how I
could highlight the matching terms if I displayed relevant fragments of the
document contents. But I don't want to do this; I just want a list of the
tokens. The tokens are in the index, and the tokens are matched by the query.
It seems a lot of extra work to take the selected document, retokenize it,
re-execute the query and replace the matching tokens, when surely the tokens
which match the query are accessible somewhere. (Besides, I can't use
Lucene's highlighting to display the document with highlights, because the
index is not built from the displayed document but from a pre-processed
extract of it, and I don't want to just display fragments of it.)

I thought the Explanation class might be what I need to use, but when I
display the content of the explanation for each matching document I see only
something like this:

score=5.9498425
0.0 = No matching clauses

which is no help at all.

Is this a wild goose chase or is it achievable somehow?

cheers

T

Re: find the collection of tokens matching a query in Lucene 8.6.3
Hi,

I've never done it myself, but going by the docs: from the Query
(https://lucene.apache.org/core/8_6_3/core/org/apache/lucene/search/Query.html)
you can retrieve the Weight
(https://lucene.apache.org/core/8_6_3/core/org/apache/lucene/search/Weight.html),
from which you can access the Matches
(https://lucene.apache.org/core/8_6_3/core/org/apache/lucene/search/Matches.html).
That should give you access to the token positions, and thus to the tokens
that matched.
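A minimal sketch of the Query -> Weight -> Matches route, assuming lucene-core 8.6.x on the classpath. Positions on their own are only token ordinals, so to turn a match back into text this sketch assumes the field was stored and indexed with offsets (IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS), then slices each match out of the stored value. Since the index here is built from a pre-processed extract, the offsets point into that extract. The class `MatchedTokens`, the helper `matchedTokens`, the field names `path`/`content` and the sample text are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Matches;
import org.apache.lucene.search.MatchesIterator;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class MatchedTokens {

    // Hypothetical helper: text of each match of `weight` in `field` of global doc `docId`.
    static List<String> matchedTokens(IndexSearcher searcher, Weight weight, String field, int docId)
            throws IOException {
        List<String> tokens = new ArrayList<>();
        // Weight.matches works per segment, so map the global doc id to its leaf.
        List<LeafReaderContext> leaves = searcher.getIndexReader().leaves();
        LeafReaderContext leaf = leaves.get(ReaderUtil.subIndex(docId, leaves));
        Matches matches = weight.matches(leaf, docId - leaf.docBase);
        if (matches == null) {
            return tokens; // the document does not match the query
        }
        MatchesIterator it = matches.getMatches(field);
        if (it == null) {
            return tokens; // no matches in this field
        }
        String text = searcher.doc(docId).get(field); // only non-null for a stored field
        while (it.next()) {
            int start = it.startOffset(); // -1 unless offsets were indexed
            int end = it.endOffset();
            if (text != null && start >= 0) {
                tokens.add(text.substring(start, end));
            }
        }
        return tokens;
    }

    public static List<String> demo() throws IOException {
        // Assumption: "content" is stored AND indexed with offsets.
        FieldType contentType = new FieldType(TextField.TYPE_STORED);
        contentType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        contentType.freeze();

        try (Directory dir = new ByteBuffersDirectory()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("path", "/docs/a.txt", Field.Store.YES));
                doc.add(new Field("content", "The Quick brown fox jumps over the lazy dog", contentType));
                writer.addDocument(doc);
            }
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                Query query = new BooleanQuery.Builder()
                        .add(new TermQuery(new Term("content", "quick")), BooleanClause.Occur.SHOULD)
                        .add(new TermQuery(new Term("content", "dog")), BooleanClause.Occur.SHOULD)
                        .build();
                // Build the Weight once from the rewritten query; no scores are needed for matches.
                Weight weight = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
                TopDocs hits = searcher.search(query, 10);
                List<String> found = new ArrayList<>();
                for (ScoreDoc hit : hits.scoreDocs) {
                    found.addAll(matchedTokens(searcher, weight, "content", hit.doc));
                }
                return found;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

The slices come from the stored text, so they keep their original case even though the analyzer lowercased the indexed terms. For a phrase query a single match spans several positions, so the slice covers the whole phrase rather than one token. If offsets were not indexed, `startOffset()` returns -1, and you would have to work from `startPosition()`/`endPosition()` instead.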

On Fri, 9 Jul 2021 at 14:05, Trevor Nicholls <trevor@castingthevoid.com>
wrote:
