Hi,
I'm thinking of implementing a search for similar documents (as some commercial search engines do) and am wondering whether anyone
has already done something like that with Lucene. My idea is to collect all the terms of the selected document (or maybe some subset
of them) and then create a MultiTermQuery containing those terms. Does that make sense? Is there a better way to achieve this?
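To make the "subset of terms" part concrete, here is a rough sketch in plain Java with no Lucene calls at all. The class and method
names (TermSubsetSketch, topTerms, weight) are made up for illustration, and ranking terms by a tf-idf style weight is only my
assumption about what a sensible subset might be; the selected terms would then be fed into the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TermSubsetSketch {

    // Hypothetical weight: term frequency in the document times log(numDocs / docFreq).
    // This is an assumed scoring rule, not something Lucene prescribes for this purpose.
    static double weight(int tf, int df, int numDocs) {
        return tf * Math.log((double) numDocs / df);
    }

    // Keep the n highest-weighted terms of one document.
    // termFreq: term -> occurrences in the selected document
    // docFreq:  term -> number of documents in the collection containing the term
    static List<String> topTerms(Map<String, Integer> termFreq,
                                 Map<String, Integer> docFreq,
                                 int numDocs, int n) {
        List<String> terms = new ArrayList<>(termFreq.keySet());
        terms.sort((a, b) -> {
            double wa = weight(termFreq.get(a), docFreq.getOrDefault(a, 1), numDocs);
            double wb = weight(termFreq.get(b), docFreq.getOrDefault(b, 1), numDocs);
            int c = Double.compare(wb, wa);      // highest weight first
            return c != 0 ? c : a.compareTo(b);  // ties broken alphabetically
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new HashMap<>();
        tf.put("lucene", 3);
        tf.put("the", 10);
        tf.put("index", 2);

        Map<String, Integer> df = new HashMap<>();
        df.put("lucene", 5);
        df.put("the", 1000);
        df.put("index", 50);

        // A term that occurs in every document gets weight 0 and drops out.
        System.out.println(topTerms(tf, df, 1000, 2));
    }
}
```

The point of weighting is that very common words would otherwise dominate the query even though they say little about the document.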
To do this, I would have to get all the terms of a given document, and so far I haven't found an easy way of doing that (I hope
there's one ;-). My current idea is to extend FilteredTermEnum but, instead of selecting terms by similarity, select them by
docid (for each term, get its termdocs and check for the desired docid). That doesn't look very efficient, so if anyone could
contribute other ideas or related experience, I'd appreciate it very much.
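The scan by docid I have in mind would look roughly like the sketch below. It is plain Java over a toy in-memory map that only
stands in for the index (sorted terms, each with a sorted list of docids); TermsByDocIdSketch and termsOfDoc are hypothetical
names, and none of this is actual Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TermsByDocIdSketch {

    // Walk every term in the (toy) index and keep those whose postings contain docId.
    // postings: term -> ascending array of ids of the documents containing that term
    static List<String> termsOfDoc(Map<String, int[]> postings, int docId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int d : entry.getValue()) {
                if (d == docId) {
                    result.add(entry.getKey());
                    break;
                }
                if (d > docId) {
                    break; // postings are sorted, so docId cannot appear later
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // TreeMap keeps terms in sorted order, as a term enumeration would.
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("apache", new int[] {0, 2});
        postings.put("index", new int[] {1, 2});
        postings.put("lucene", new int[] {0, 1, 2});
        postings.put("search", new int[] {1});

        System.out.println(termsOfDoc(postings, 1));
    }
}
```

This is where my efficiency worry comes from: every term in the index is visited, and for each one part of its docid list is read,
just to recover the terms of a single document.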
TIA
Best regards,
--Daniel
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>