Mailing List Archive

the list of terms contained in a document
Dear all,

We have a linguistics project running here and we want to use Lucene for the information retrieval. Rather than just searching for specific terms, we want to build frequency lists and detect co-occurrences of terms.
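
A minimal sketch of the two target computations, assuming each document's terms are already available as a term-to-frequency map (how to get that map out of Lucene is exactly the open question of this message). The class and method names are hypothetical, and "co-occurrence" is taken here to mean "both terms appear in the same document"; a window-based definition would need term positions as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper working on per-document term-frequency maps,
// however those maps were obtained.
public class Cooccurrence {

    // Frequency list: sums each term's frequency over all documents.
    public static Map<String, Integer> frequencyList(List<Map<String, Integer>> docs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> doc : docs) {
            doc.forEach((term, freq) -> totals.merge(term, freq, Integer::sum));
        }
        return totals;
    }

    // Document-level co-occurrence: for each unordered term pair, counts how
    // many documents contain both. Keys look like "a|b" (assumes no term contains '|').
    public static Map<String, Integer> cooccurrences(List<Map<String, Integer>> docs) {
        Map<String, Integer> pairs = new TreeMap<>();
        for (Map<String, Integer> doc : docs) {
            // Sorted, so a pair is always keyed as "a|b" and never also as "b|a".
            List<String> terms = new ArrayList<>(new TreeSet<>(doc.keySet()));
            for (int i = 0; i < terms.size(); i++) {
                for (int j = i + 1; j < terms.size(); j++) {
                    pairs.merge(terms.get(i) + "|" + terms.get(j), 1, Integer::sum);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
                Map.of("lucene", 2, "index", 1),
                Map.of("lucene", 1, "term", 3),
                Map.of("index", 1, "term", 1, "lucene", 1));
        System.out.println(frequencyList(docs)); // {index=2, lucene=4, term=4}
        System.out.println(cooccurrences(docs)); // {index|lucene=2, index|term=1, lucene|term=2}
    }
}
```

Counting pairs per document is quadratic in the number of distinct terms of that document, which is usually acceptable for short texts but worth keeping in mind for long ones.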

What we need is something like the following functionality (I will sketch what I think the resulting API could look like):

1. IndexSearcher.search(query) (already implemented)
2. Hits.getLength() (already implemented)
3. for (...) Hits.doc(i).getTerms() or Hits.doc(i).getTerms(Field) (required)
(4. and, for each returned document, the frequency of each of its terms; but that is the same as above, or could it be retrieved together with the term list?)
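
The wished-for calls in items 3 and 4 could be pictured as below. This is a hypothetical sketch in plain Java: `DocumentTerms`, `SearchResult` and `getTerms(String field)` are illustrative names, not Lucene classes. Returning a term-to-frequency map settles item 4 by design, since every term arrives together with its in-document frequency.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical shapes for the wished-for API; these are NOT Lucene classes.
public class HitTermsSketch {

    // Items 3 and 4: the terms of one field, each with its in-document frequency.
    public interface DocumentTerms {
        Map<String, Integer> getTerms(String field);
    }

    // Stand-in for a search result, playing the role of Hits in items 1 and 2.
    public record SearchResult(List<DocumentTerms> docs) {
        public int length() { return docs.size(); }
        public DocumentTerms doc(int i) { return docs.get(i); }
    }

    public static void main(String[] args) {
        // Toy documents that ignore the field name, for illustration only.
        DocumentTerms d0 = field -> Map.of("corpus", 2, "term", 1);
        DocumentTerms d1 = field -> Map.of("term", 4);
        SearchResult hits = new SearchResult(List.of(d0, d1));

        for (int i = 0; i < hits.length(); i++) {
            // TreeMap only so the terms print in a stable order.
            Map<String, Integer> terms = new TreeMap<>(hits.doc(i).getTerms("contents"));
            System.out.println("doc " + i + ": " + terms);
        }
        // prints:
        // doc 0: {corpus=2, term=1}
        // doc 1: {term=4}
    }
}
```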

This means that, given a Hits object, I want to get the terms and their frequencies for all of its documents. Sure, I could look each document up and parse it again. But if the first query produces, say, 20,000 hits, I would have to reparse those 20,000 documents, even though this parsing was already done when the index was created. Instead, I wanted to ask whether there is a way, within the existing classes (or at least using some of them plus some new ones), to retrieve this information: which terms a single document is assigned to.
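
The reparsing cost goes away if each document's term frequencies are recorded once, while the document is being indexed, and kept next to the inverted index (a "forward index"). The sketch below is a hypothetical in-memory version in plain Java; the class and method names and the naive letters-only tokenizer are assumptions standing in for Lucene's analyzer and storage, not part of its API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical forward index: stores each document's term frequencies once,
// at indexing time, so they can be read back later without re-parsing.
public class ForwardIndex {
    // docId (list position) -> field name -> term -> frequency
    private final List<Map<String, Map<String, Integer>>> docs = new ArrayList<>();

    // Tokenizes each field once (lowercase, split on non-letters: an assumed
    // stand-in for a real analyzer) and stores the counts. Returns the doc id.
    public int addDocument(Map<String, String> fields) {
        Map<String, Map<String, Integer>> perField = new HashMap<>();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String tok : f.getValue().toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                if (!tok.isEmpty()) counts.merge(tok, 1, Integer::sum);
            }
            perField.put(f.getKey(), counts);
        }
        docs.add(perField);
        return docs.size() - 1;
    }

    // Returns the stored term frequencies; no text is parsed here.
    public Map<String, Integer> getTerms(int docId, String field) {
        return Collections.unmodifiableMap(docs.get(docId).getOrDefault(field, Map.of()));
    }

    public static void main(String[] args) {
        ForwardIndex index = new ForwardIndex();
        int id = index.addDocument(Map.of("contents", "The term, the list, the document."));
        System.out.println(index.getTerms(id, "contents")); // {document=1, list=1, term=1, the=3}
    }
}
```

The trade-off is extra storage roughly proportional to the number of distinct terms per document. Later Lucene releases added an optional per-field "term vector" that keeps this kind of per-document data in the index itself; whether it is available depends on the version in use.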

Thanks a lot for any help or hints.
Sincerely,
Chantal

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>