Problem: I have indexed the filepath and the content of thousands of
documents and can successfully query the index on the text to return a
collection of filepaths. Now I need to create a collection of the tokens in
the index which matched the query.
I can see that there are solutions to a related problem, which is how I
could highlight the matching terms if I displayed relevant fragments of the
document contents. But I don't want to do this; I just want a list of the
tokens. The tokens are in the index, and the query matches them. It
seems like a lot of extra work to take the selected document, retokenize it,
re-execute the query and replace the matching tokens when surely the tokens
which match the query are accessible somewhere. (Besides, I can't use
Lucene's highlighting to display the document with highlights, because the
index is not built from the displayed document but from a pre-processed
extract of it, and I don't want to display only fragments of it.)
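To make concrete the kind of result I am after, here is a toy sketch in
plain Python. It is not Lucene and does not use any Lucene API: the
dictionary-based index, the whitespace tokenizer and the names build_index
and matched_tokens are all made up for illustration. It only shows the
shape of the output I want, which is the list of matched tokens per
filepath.

```python
from collections import defaultdict


def build_index(docs):
    """Map each token to the set of file paths whose text contains it.

    Hypothetical stand-in for the real index: lowercases the text and
    splits on whitespace, which is far simpler than a real analyzer.
    """
    index = defaultdict(set)
    for path, text in docs.items():
        for token in text.lower().split():
            index[token].add(path)
    return index


def matched_tokens(index, query_terms):
    """Return {path: sorted list of query terms found in that document}."""
    hits = defaultdict(set)
    for term in query_terms:
        term = term.lower()
        # Look the term up directly in the index; no document is re-read.
        for path in index.get(term, ()):
            hits[path].add(term)
    return {path: sorted(terms) for path, terms in hits.items()}


# Made-up sample data, just to show the shape of the result.
docs = {
    "a.txt": "the quick brown fox",
    "b.txt": "the lazy dog",
}
index = build_index(docs)
result = matched_tokens(index, ["fox", "dog", "quick"])
```

In this toy version the matched tokens come straight from the index lookup,
without retokenizing the documents. That is what I am hoping is possible in
Lucene as well.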
I thought the Explanation class might be what I need to use but when I
display the content of the explanation for each matching document I see only
something like this:
score=5.9498425
0.0 = No matching clauses
which is no help at all.
Is this a wild goose chase or is it achievable somehow?
cheers
T