Mailing List Archive

Getting the terms for a particular field.
Hi!

I'm trying to get out all of the terms in a field. More specifically,
I'm trying to get a complete list of the UIDs I have indexed.

The best I have come up with so far is to go through all the terms
gotten from the IndexReader.terms() and filter on field(). That works
but feels kinda silly, and I don't know enough of Lucene's internals
to know if it is silly or not. ;-)

I'm using NX/Lucene/PyLucene, if that makes a difference.

--
Lennart Regebro, Nuxeo http://www.nuxeo.com/
CPS Content Management http://www.nuxeo.org/
Re: Getting the terms for a particular field.
: The best I have come up with so far is to go through all the terms
: gotten from the IndexReader.terms() and filter on field(). That works

you're basically on it, but look at the IndexReader.terms(Term) method
which allows you to start with a specific term, and then bear in mind that
the TermEnum goes in order, which means all of the terms for a single
field will come sequentially, so as soon as you see a field name other
than the one you are interested in, you know you can stop.

if you look at the code for RangeFilter you'll see a good example of
iterating over a TermEnum for a single field ... what you want is
effectively the same work RangeFilter would do when the bounds are
both null.
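
For illustration, the seek-then-stop loop described above can be sketched in
plain Python. This is not PyLucene code: the sorted list of (field, text)
tuples is only a stand-in for the index's term dictionary, and
terms_for_field is a hypothetical helper name. Against a real index the seek
would be IndexReader.terms(Term) and the loop would step the returned
TermEnum, checking field() on each term.

```python
from bisect import bisect_left

# Stand-in for the index's term dictionary: (field, text) pairs kept in
# sorted order, so all terms of one field sit next to each other.
TERMS = sorted([
    ("body", "lucene"),
    ("body", "search"),
    ("title", "hello"),
    ("uid", "a1"),
    ("uid", "b2"),
    ("uid", "c3"),
])


def terms_for_field(sorted_terms, field):
    """Collect every term text for `field`, stopping at the first other field."""
    # Seek to the first term >= (field, ""), which plays the role of
    # starting the enumeration at a specific term instead of at the beginning.
    i = bisect_left(sorted_terms, (field, ""))
    texts = []
    while i < len(sorted_terms):
        term_field, text = sorted_terms[i]
        if term_field != field:
            break  # sorted by field, so no later term can belong to `field`
        texts.append(text)
        i += 1
    return texts


print(terms_for_field(TERMS, "uid"))  # prints ['a1', 'b2', 'c3']
```

The point of the seek is that the loop never touches the terms of the fields
that sort before the one asked for, and the break means it never touches the
ones that sort after it.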



-Hoss
Re: Getting the terms for a particular field.
On 10/25/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
>
> : The best I have come up with so far is to go through all the terms
> : gotten from the IndexReader.terms() and filter on field(). That works
>
> you're basically on it, but look at the IndexReader.terms(Term) method
> which allows you to start with a specific term, and then bear in mind that
> the TermEnum goes in order, which means all of the terms for a single
> field will come sequentially, so as soon as you see a field name other
> than the one you are interested in, you know you can stop.
>
> if you look at the code for RangeFilter you'll see a good example of
> iterating over a TermEnum for a single field ... what you want is
> effectively the same work RangeFilter would do when the bounds are
> both null.

That works fine, thanks for the help!

--
Lennart Regebro, Nuxeo http://www.nuxeo.com/
CPS Content Management http://www.nuxeo.org/