Mailing List Archive

Filtering results
I would like to exclude certain documents from my search and naturally chose
to extend Filter.

I'm not too clear on the proper usage of Terms, TermEnum and TermDocs, and
though DateFilter has an example of how to implement a Filter, I'm none the
wiser on the proper way of filtering.

My document objects contain a field (called IDENTIFIER) which I would like
to retrieve the value of, in order to compare it a list of allowed values to
determine if the document should be excluded. Initially, I tried to
implement it by iterating through all documents (reader.doc(i)), then
retrieving the value of IDENTIFIER (doc.get(IDENTIFIER)). It works, the
trouble is that I've no way of accessing the document number from within
IndexReader (which is needed to set the BitSet within the bits() method). It
seems to be available from only within TermEnum and TermDocs...

Of course, I probably haven't performed the filtering correctly, so would
appreciate any help on it...

Regards,
Kelvin Tan

Relevanz Pte Ltd
http://www.relevanz.com

180B Bencoolen St.
The Bencoolen, #04-01
S(189648)

Tel: 238 6229
Fax: 337 4417


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Filtering results [ In reply to ]
Kelvin,

>I would like to exclude certain documents from my search and naturally chose
>to extend Filter.
>
>I'm not too clear on the proper usage of Terms, TermEnum and TermDocs, and
>though DateFilter has an example of how to implement a Filter, I'm none the
>wiser on the proper way of filtering.
>
>My document objects contain a field (called IDENTIFIER) which I would like
>to retrieve the value of, in order to compare it a list of allowed values to
>determine if the document should be excluded. Initially, I tried to
>implement it by iterating through all documents (reader.doc(i)), then
>retrieving the value of IDENTIFIER (doc.get(IDENTIFIER)). It works, the
>trouble is that I've no way of accessing the document number from within
>IndexReader (which is needed to set the BitSet within the bits() method). It
>seems to be available from only within TermEnum and TermDocs...
>
>Of course, I probably haven't performed the filtering correctly, so would
>appreciate any help on it...

It works so you did it correctly.
You can also use IndexReader.termDocs( Term('IDENTIFIER', someAllowedValue))
and enumerate the resulting TermDocs. It will get you to the doc nr you need
to set in the BitSet.

Good luck,
Ype

--

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Filtering results [ In reply to ]
Ype,

Thanks. :)

Got it using IndexReader.termDocs( Term('IDENTIFIER', someAllowedValue)) as you said...

I kinda got misled by the use of TermEnum in DateFilter...

Kelvin
----- Original Message -----
From: Ype Kingma
To: Lucene Users List
Sent: Wednesday, February 06, 2002 4:19 AM
Subject: Re: Filtering results


Kelvin,

>I would like to exclude certain documents from my search and naturally chose
>to extend Filter.
>
>I'm not too clear on the proper usage of Terms, TermEnum and TermDocs, and
>though DateFilter has an example of how to implement a Filter, I'm none the
>wiser on the proper way of filtering.
>
>My document objects contain a field (called IDENTIFIER) which I would like
>to retrieve the value of, in order to compare it a list of allowed values to
>determine if the document should be excluded. Initially, I tried to
>implement it by iterating through all documents (reader.doc(i)), then
>retrieving the value of IDENTIFIER (doc.get(IDENTIFIER)). It works, the
>trouble is that I've no way of accessing the document number from within
>IndexReader (which is needed to set the BitSet within the bits() method). It
>seems to be available from only within TermEnum and TermDocs...
>
>Of course, I probably haven't performed the filtering correctly, so would
>appreciate any help on it...

It works so you did it correctly.
You can also use IndexReader.termDocs( Term('IDENTIFIER', someAllowedValue))
and enumerate the resulting TermDocs. It will get you to the doc nr you need
to set in the BitSet.

Good luck,
Ype

--

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>