Hello Lucene Developers,
We're working on a search service that uses Lucene indexes. I'm looking for the places in the search process where we can plug in our own custom classes.
Our first use case is highlighting. The legacy search engine we use collects all term positions for highlighting during the search itself, so everything happens in a single pass rather than in the search-first-then-highlight model. For the way we use highlighting, this is more efficient than reprocessing the query afterwards.
One thought I had was to create a custom scorer that would be called during search and would gather highlights in addition to scoring. I think this would be especially useful for proximity queries, or for any other scoring based on the positions of words in the document. Is there a way to access the data the search process already uses, so that we don't advance the term vectors and find phrases in a document at search time, and then do it AGAIN at highlight time?
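To make the idea concrete, here is a rough sketch of the single-pass behaviour I have in mind. It deliberately uses no Lucene classes: `SinglePassPhraseMatcher`, `Match`, and `matchPhrase` are hypothetical names, and the in-memory map of term positions is only a stand-in for whatever positional data the search process reads. The point is that the phrase frequency used for scoring and the positions needed for highlighting come out of the same loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Lucene code.
public class SinglePassPhraseMatcher {

    /** Result for one document: the phrase frequency plus the positions that produced it. */
    public static final class Match {
        public final int freq;                      // would feed the score
        public final List<Integer> startPositions;  // would feed the highlighter

        Match(int freq, List<Integer> startPositions) {
            this.freq = freq;
            this.startPositions = startPositions;
        }
    }

    /**
     * Finds exact occurrences of the phrase in one document.
     * positionsByTerm maps each term to its sorted positions in that document.
     */
    public static Match matchPhrase(Map<String, int[]> positionsByTerm, String... phrase) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.length == 0) {
            return new Match(0, starts);
        }
        int[] first = positionsByTerm.get(phrase[0]);
        if (first == null) {
            return new Match(0, starts);
        }
        for (int start : first) {
            boolean matched = true;
            // Term i of the phrase must appear exactly i positions after the first term.
            for (int i = 1; i < phrase.length && matched; i++) {
                int[] positions = positionsByTerm.get(phrase[i]);
                matched = positions != null
                        && Arrays.binarySearch(positions, start + i) >= 0;
            }
            if (matched) {
                starts.add(start); // recorded while matching, so no second pass is needed
            }
        }
        return new Match(starts.size(), starts);
    }

    public static void main(String[] args) {
        Map<String, int[]> doc = new HashMap<>();
        doc.put("quick", new int[] {1, 7});
        doc.put("brown", new int[] {2, 9});
        Match m = matchPhrase(doc, "quick", "brown");
        System.out.println("freq=" + m.freq + " starts=" + m.startPositions);
    }
}
```

What I'm hoping to find is the equivalent hook in Lucene, where the positions visited while scoring a hit can be kept rather than recomputed.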
Any suggestions, comments, or references that would enlighten me would be appreciated. I've had great difficulty finding helpful documentation as I get to know Lucene.
Thanks,
Chris Hahn
This e-mail is for the sole use of the intended recipient and contains information that may be privileged and/or confidential. If you are not an intended recipient, please notify the sender by return e-mail and delete this e-mail and any attachments. Certain required legal entity disclosures can be accessed on our website: https://www.thomsonreuters.com/en/resources/disclosures.html