Mailing List Archive

multiple-term queries and term numbers
A couple of questions:

(1) I'm confused by the API for queries which have multiple terms. How do
I construct a query where each term in the query has a different boost
factor?


(2) Does the Lucene API provide a mechanism for efficiently looking up the
number of a term in an index (that is, the position of that term in the
TermEnum associated with the index)? I need to be able to get this
information for the term expansion that I'm doing on the query (in other
words, given each term in the query, I need to get its number so that I
can access the pre-computed sets of candidate terms in the general lexicon
to which the query may be expanded).

If there is no such mechanism in the API, there are only two reasonably
efficient methods that come to my mind for getting this information. If
I've missed something obvious and efficient, or if my ideas won't work, by
all means tell me. :)

a. At startup, since the TermEnum for an index appears to be sorted
lexicographically, load the TermEnum into an array and then do binary
search on the array looking for the term (making sure to run the term
through the Analyzer first).

b. Instead of representing my term lookup tables by arrays of arrays,
represent them as hash tables (keyed by some munging of the string
which represents the term) of arrays.
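
To make the two options concrete, here is a rough sketch in plain Java
with no Lucene calls. Everything in it is an assumption for illustration:
the class name TermNumbers, its method names, and the sample data are all
hypothetical, the term texts are assumed to have been copied out of the
TermEnum already (for a single field, after analysis), and the array is
assumed to be sorted in String.compareTo order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the Lucene API.
class TermNumbers {

    // Option (a): binary search over the sorted term texts.
    // Returns the term's position in the array, or -1 if it is absent.
    // Assumes sortedTerms is ordered by String.compareTo.
    static int termNumber(String[] sortedTerms, String analyzedTerm) {
        int pos = Arrays.binarySearch(sortedTerms, analyzedTerm);
        return pos >= 0 ? pos : -1;
    }

    // Option (a), continued: use the term number to index the
    // array-of-arrays expansion table. Returns an empty array for
    // terms that are not in the index.
    static String[] candidatesByNumber(String[] sortedTerms,
                                       String[][] table,
                                       String analyzedTerm) {
        int n = termNumber(sortedTerms, analyzedTerm);
        return n >= 0 ? table[n] : new String[0];
    }

    // Option (b): convert the array-of-arrays table into a hash table
    // keyed by the term text itself, so no term number is needed later.
    static Map<String, String[]> toHashTable(String[] sortedTerms,
                                             String[][] table) {
        Map<String, String[]> byTerm = new HashMap<>();
        for (int i = 0; i < sortedTerms.length; i++) {
            byTerm.put(sortedTerms[i], table[i]);
        }
        return byTerm;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the index terms and the
        // pre-computed expansion sets.
        String[] terms = {"apple", "banana", "cherry"};
        String[][] table = {{"fruit"}, {"fruit", "plantain"}, {"fruit"}};

        System.out.println(termNumber(terms, "banana"));
        System.out.println(Arrays.toString(
                candidatesByNumber(terms, table, "banana")));
        System.out.println(Arrays.toString(
                toHashTable(terms, table).get("banana")));
    }
}
```

Option (a) costs O(log n) string comparisons per lookup and keeps the
existing table layout; option (b) costs one hash lookup per query term
but stores each key string a second time alongside the table.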

Thanks in advance for any assistance.

Regards,

Joshua O'Madadhain

jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.

