Mailing List Archive

Re: Getting the terms that matched the HitDoc? & Relevance Feedback
>
>
>Subject:
>
>RE: Relevance Feedback
>From:
>
>Doug Cutting <DCutting@grandcentral.com>
>Date:
>
>Sat, 30 Mar 2002 08:51:39 -0800
>To:
>
>Lucene Users List <lucene-user@jakarta.apache.org>
>
>
>Dmitry Serebrennikov [dmitrys@earthlink.net] has implemented a substantial
>extension to Lucene which should help folks doing this sort of research. It
>provides an explicit vector representation for documents. This way you can,
>e.g., retrieve a number of documents, efficiently sum their vectors, then
>derive a new query from the sum. This code was posted to the list a long
>while back, but is now out of date. As soon as the 1.2 release is final,
>and Dmitry has time, he intends to merge it into Lucene.
>
>Doug
>
Thanks, Doug.

This is true. The code was actually intended, and is being used, for
something more like what is being discussed on the "Relevance Feedback"
thread - retrieving lists of terms based on matching documents. Terms as
well as Term Positions are available from the API given a document.
There is also a cost for using this API. It adds three or four files per
index segment, and one of them is as large as the .prx file (provided
that you choose to "vectorize" every indexed field in the document).
Another issue with the code is that it does not (yet) support use with
unoptimized indexes (those with more than one segment).

Dmitry.





--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>