Hello all,
Attached are two classes that handle phrase queries of the form "microsoft
app*", where app* is supposed to match all words starting with "app".
It's also possible to handle queries where the prefix term(s) are in the
middle of the phrase, like so: "Microsoft* app*"
---
PhrasePrefixQuery
PhrasePrefixQuery.java is a generalized version of PhraseQuery.java, with an
added method add(Term[]).
To use this class to search for the phrase "Microsoft app*", first call
add(Term) with the term "Microsoft", then find all terms that have "app" as
a prefix using IndexReader.terms(Term), and pass them to
PhrasePrefixQuery.add(Term[] terms) to add them to the query.
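
The prefix-expansion step above can be sketched without Lucene on the
classpath. This is only an illustration, not the attached code: the
class PrefixExpansionSketch and its expand() method are hypothetical, and
a sorted set of strings stands in for the term dictionary. The idea is the
same as with IndexReader.terms(Term), assuming the enumeration starts at
the first term at or after the prefix: walk forward in sorted order and
stop at the first term that no longer starts with the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: a sorted set of term texts stands in for the index's
// term dictionary. None of these names come from Lucene or the attached classes.
class PrefixExpansionSketch {

    // Collects every term that starts with the given prefix.
    // tailSet(prefix) plays the role of seeking to the first term >= prefix.
    static List<String> expand(SortedSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: once past the prefix, no later term matches
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        SortedSet<String> dictionary = new TreeSet<>(Arrays.asList(
                "apache", "app", "apple", "application", "apt", "microsoft"));
        // These expanded terms are what would be wrapped in Term objects
        // and handed to PhrasePrefixQuery.add(Term[]).
        System.out.println(expand(dictionary, "app"));
    }
}
```

With the real classes, each matching text would become a Term in the same
field, and the resulting array would go to add(Term[]).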
Known Issues: the method toString() assumes that the first term in an array
of terms is the prefix for the whole array. That is not necessarily so.
MultipleTermPositions
MultipleTermPositions.java is a class that implements the TermPositions
interface and behaves like a single TermPositions would, iterating through
<doc, freq, <pos1, pos2, .. , posn>> tuples, but it handles multiple
TermPositions at once by keeping them in a queue.
Using this class, it was easy to write PhrasePrefixQuery reusing the
ExactPhraseScorer and SloppyPhraseScorer directly.
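
The queue-based merging described above can be sketched roughly as
follows. This is a simplified, self-contained illustration and not the
attached implementation: the names MergedPositionsSketch, Cursor and
Posting are hypothetical, postings are held in plain arrays rather than
read from an index, and the whole merge is computed eagerly instead of
being exposed through next()/nextPosition(). It assumes each term's doc
ids arrive in ascending order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of merging several per-term position streams into one.
class MergedPositionsSketch {

    // Cursor over one term's postings: ascending doc ids and, per doc, its positions.
    static final class Cursor {
        final int[] docs;
        final int[][] positions;
        int index = 0;

        Cursor(int[] docs, int[][] positions) {
            this.docs = docs;
            this.positions = positions;
        }

        int doc() { return docs[index]; }

        boolean advance() { return ++index < docs.length; }
    }

    // One merged <doc, freq, <pos1, .., posn>> tuple.
    static final class Posting {
        final int doc;
        final List<Integer> positions;

        Posting(int doc, List<Integer> positions) {
            this.doc = doc;
            this.positions = positions;
        }

        int freq() { return positions.size(); }
    }

    static List<Posting> merge(List<Cursor> cursors) {
        // The queue is ordered by each cursor's current doc id.
        PriorityQueue<Cursor> queue =
                new PriorityQueue<>(Comparator.comparingInt(Cursor::doc));
        for (Cursor c : cursors) {
            if (c.docs.length > 0) {
                queue.add(c);
            }
        }
        List<Posting> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            int doc = queue.peek().doc();
            List<Integer> positions = new ArrayList<>();
            // Drain every cursor sitting on this doc, then re-queue it on its next doc.
            while (!queue.isEmpty() && queue.peek().doc() == doc) {
                Cursor c = queue.poll();
                for (int p : c.positions[c.index]) {
                    positions.add(p);
                }
                if (c.advance()) {
                    queue.add(c);
                }
            }
            Collections.sort(positions);
            merged.add(new Posting(doc, positions));
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up postings for two expanded terms, e.g. "apple" and "application".
        List<Cursor> cursors = Arrays.asList(
                new Cursor(new int[] {1, 3}, new int[][] {{4}, {2, 9}}),
                new Cursor(new int[] {1, 2}, new int[][] {{7}, {5}}));
        for (Posting p : merge(cursors)) {
            System.out.println(p.doc + " " + p.freq() + " " + p.positions);
        }
    }
}
```

Because the merged stream looks like the postings of one term, a scorer
written for a single TermPositions per phrase slot can consume it
unchanged.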
Known Issues: It doesn't fully implement the TermDocs interface, leaving
read(int[], int[]) and seek(Term) unsupported.
Other Comments
It would possibly be a good idea for IndexReader to have a method
IndexReader.termPositions(Term[]) that returns a MultipleTermPositions
object. For now, though, the MultipleTermPositions constructor takes an
IndexReader and an array of Terms.
Also, to fully integrate this into Lucene, code would have to be added to
QueryParser that handles queries of this type.
regards,
Anders Nielsen