I am trying to construct a term-term correlation matrix from the data
stored in the index, for an extension to the vector model that I am
researching. In case my terminology is unfamiliar: what I need, for each
term t, is a list of the documents that contain t (a record of the number
of times that t occurs in each document would be a nice bonus).
From this I can calculate the rest of what I need (number of times that
terms t1 and t2 occur in the same document, etc.).
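To make the computation concrete, here is a rough sketch of what I would do once I have the per-term document lists. It does not use any Lucene API; the `postings` dictionary, its contents, and the function names are hypothetical stand-ins for whatever the index can give me. The matrix entry used here (sum over documents of the product of the two term frequencies) is one common definition of term-term correlation, not the only one.

```python
from itertools import combinations

# Hypothetical postings data: term -> {doc_id: occurrences of the term in that doc}.
postings = {
    "apple": {0: 2, 1: 1, 3: 4},
    "banana": {1: 3, 2: 1, 3: 1},
    "cherry": {2: 5},
}


def cooccurrence_counts(postings):
    """Return {(t1, t2): number of documents containing both terms}, with t1 < t2.

    Pairs that never share a document are omitted.
    """
    counts = {}
    for t1, t2 in combinations(sorted(postings), 2):
        shared = postings[t1].keys() & postings[t2].keys()
        if shared:
            counts[(t1, t2)] = len(shared)
    return counts


def correlation_matrix(postings):
    """Return {(t1, t2): sum over shared documents d of tf(t1, d) * tf(t2, d)}."""
    terms = sorted(postings)
    matrix = {}
    for t1 in terms:
        for t2 in terms:
            shared = postings[t1].keys() & postings[t2].keys()
            matrix[(t1, t2)] = sum(
                postings[t1][d] * postings[t2][d] for d in shared
            )
    return matrix


counts = cooccurrence_counts(postings)
matrix = correlation_matrix(postings)
```

The all-pairs loop is quadratic in the number of terms, so for a real vocabulary it would make more sense to iterate per document and only touch the pairs of terms that actually occur together.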
If necessary I could squeeze by with just knowing the number of documents
in which t1, t2, and the combination (t1 AND t2) appear, but having the
above information from which to work would give me more flexibility.
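As an illustration of the fallback case, the three document counts alone are enough for a set-based measure such as the Jaccard coefficient. This is only a sketch under that assumption; the function name and the example counts are hypothetical, and other normalizations would work just as well.

```python
def jaccard_from_counts(df_t1, df_t2, df_both):
    """Jaccard coefficient of two terms from document counts alone.

    df_t1, df_t2: number of documents containing t1 and t2 respectively.
    df_both: number of documents containing both (t1 AND t2).
    """
    union = df_t1 + df_t2 - df_both  # documents containing t1 OR t2
    return df_both / union if union else 0.0


# Hypothetical counts: t1 in 3 documents, t2 in 3 documents, both in 2.
score = jaccard_from_counts(3, 3, 2)
```

What this cannot recover is anything that depends on within-document frequencies, which is why the full per-term document lists would give me more flexibility.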
Anyway, if there is a straightforward way of doing this that I have not
yet spotted, I'd like to know what it is; if not, pointing me at the
appropriate chunks of the source to start hacking on would also be
appreciated.
Thanks in advance for any help that may be offered.
Regards,
Joshua O'Madadhain (Madden)
jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>