I noticed a post discussing term vectors on the lucene-dev list. It is great
that people are working on adding term vectors to Lucene. Stored term
vectors per field (or document) are a great way to get lucene into
classification of documents. There are great many text classification
algorithms that have primarily two inputs: term vector for the whole
collection of documents and term vector per each document. A nice feature to
add using the vectors would be 'like-documents': given an indexed document,
what other documents are there in the index that are like it. Another
feature can be clustering of documents into categories based on the vectors.
I am curious when the term vector support that you worked on will be added
to the main build branch. I couldn't find any information on that.
-Emile
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
that people are working on adding term vectors to Lucene. Stored term
vectors per field (or document) are a great way to get lucene into
classification of documents. There are great many text classification
algorithms that have primarily two inputs: term vector for the whole
collection of documents and term vector per each document. A nice feature to
add using the vectors would be 'like-documents': given an indexed document,
what other documents are there in the index that are like it. Another
feature can be clustering of documents into categories based on the vectors.
I am curious when the term vector support that you worked on will be added
to the main build branch. I couldn't find any information on that.
-Emile
--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>