Mailing List Archive

Term ordering for IndexReader.termDocs()
Hello,

I'm creating a filter from a set of terms that are read from
a file, and I find that IndexReader.termDocs(Term(fieldName, valueFromFile))
does this quite well (around 0.1 secs elapsed time in jython code.)

Would it be advantageous to sort the values from the file before
using them in this way? This could help to reduce the number of disk seeks,
but I have no idea how the segments are organized on disk.

I have not profiled this yet, because I have only tried it with fewer than
100 terms on a relatively small index. I wonder whether performance
is still as good at, say, 20000 terms.

Thanks in advance,
Ype Kingma

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Term ordering for IndexReader.termDocs()
> From: Ype Kingma [mailto:ykingma@xs4all.nl]
>
> I'm creating a filter from a set of terms that are read from
> a file, and I find that IndexReader.termDocs(Term(fieldName,
> valueFromFile))
> does this quite well (around 0.1 secs elapsed time in jython code.)
>
> Would it be advantageous to sort the values from the file before
> using them in this way?

Yes, that would be faster. The term dictionary is sorted and this would
reduce both i/o and computation.

Doug
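
As an illustration of the suggestion above, here is a minimal Java sketch of sorting and de-duplicating the values before the lookups. The class and method names (SortedTermValues, sortedUnique) are hypothetical, and it assumes that the natural String order matches the order of the term dictionary, which the thread does not confirm. The Lucene call appears only in a comment, so the sketch runs without an index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SortedTermValues {
    // Returns the values de-duplicated and in natural String order
    // (String.compareTo). Assumption: this matches the dictionary order.
    static List<String> sortedUnique(List<String> values) {
        return new ArrayList<>(new TreeSet<>(values));
    }

    public static void main(String[] args) {
        // Stand-in for the values read from the file.
        List<String> fromFile = Arrays.asList("pear", "apple", "fig", "apple");
        for (String value : sortedUnique(fromFile)) {
            // In the real filter, the lookup would go here, e.g.
            // reader.termDocs(new Term(fieldName, value)),
            // marking each returned document in the filter's bit set.
            System.out.println(value);
        }
    }
}
```

Removing duplicates also avoids looking up the same term twice, which may matter more as the list grows toward 20000 values.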

RE: Term ordering for IndexReader.termDocs()
Doug,

>> From: Ype Kingma [mailto:ykingma@xs4all.nl]
>>
>> I'm creating a filter from a set of terms that are read from
>> a file, and I find that IndexReader.termDocs(Term(fieldName,
>> valueFromFile))
>> does this quite well (around 0.1 secs elapsed time in jython code.)
>>
>> Would it be advantageous to sort the values from the file before
>> using them in this way?
>
>Yes, that would be faster. The term dictionary is sorted and this would
>reduce both i/o and computation.

Thanks. I suppose it would be correct to assume that the sort order
is that of java.lang.String.compareTo()?

Regards,
Ype
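
For reference, a small Java sketch of what String.compareTo() ordering looks like in practice. It only demonstrates the Java side. Whether the term dictionary uses the same order is the open question above, and the class name CompareToOrder is hypothetical.

```java
import java.util.Arrays;

class CompareToOrder {
    // Returns a sorted copy using the natural order of String,
    // which is String.compareTo(): a comparison by UTF-16 code unit.
    static String[] sorted(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Digits sort before uppercase letters, which sort before lowercase,
        // and "10" sorts before "9" because '1' is less than '9'.
        System.out.println(Arrays.toString(sorted("apple", "Zebra", "9", "10")));
        // prints [10, 9, Zebra, apple]
    }
}
```

So values that look numeric, or that mix upper and lower case, will not come out in numeric or case-insensitive order under this comparison.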