Mailing List Archive

Datefiltering performance issues
Hi. I am experiencing some performance issues with
DateFilter. Basically, I'm searching an index with
around 200,000 documents. I've got several threads
sharing the same IndexReader object. A single thread
searching and date filtering the index returns in
about 300 ms. If 5 threads are performing searches
simultaneously, date filtering (mainly the creation of
the bitset of documents matching the date criteria I'm
passing) takes around 8 s! With 10 threads,
query time rises to 30 s per query!
My investigations led me to the get(Term term) method
of TermInfosReader. If I'm right (which I'm not
sure of at all...), this method is synchronized and
each thread has to call it for each date term within
the date bounds. So it looks like there is some
contention here... If I understand correctly, this method
is synchronized because there is only a single
instance of TermInfosReader per SegmentReader, so all
threads share the same TermEnum for date filtering.
Does anybody have any idea how I could make
date filtering faster?

Any help is welcome! Thanks,

Sylvain

___________________________________________________________
Do You Yahoo!? -- A free @yahoo.fr address, in French!
Yahoo! Mail : http://fr.mail.yahoo.com

--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
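
The contention pattern described in the message above can be illustrated
with a minimal sketch in plain Java. Everything in it is an assumption made
for illustration: TermLookup and ContentionSketch are hypothetical names,
not Lucene classes, and the sketch does not reproduce Lucene's internals.
It only contrasts one shared instance guarded by a synchronized method with
one cheap copy per thread, which is the kind of change the question asks
about. It does not measure timings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a per-segment term dictionary; NOT Lucene's TermInfosReader.
class TermLookup {
    private final TreeMap<String, Integer> docFreqByTerm; // read-only after construction
    private String lastTerm; // mutable cursor state: why a shared instance needs a lock

    TermLookup(TreeMap<String, Integer> docFreqByTerm) {
        this.docFreqByTerm = docFreqByTerm;
    }

    // For an instance shared by all threads: every call serializes on one monitor.
    synchronized int getShared(String term) {
        return lookup(term);
    }

    // For an instance owned by a single thread: no lock is needed.
    int getOwned(String term) {
        return lookup(term);
    }

    private int lookup(String term) {
        lastTerm = term;
        Integer freq = docFreqByTerm.get(term);
        return freq == null ? 0 : freq;
    }

    // Cheap copy: shares the read-only data but has its own cursor state.
    TermLookup copy() {
        return new TermLookup(docFreqByTerm);
    }
}

class ContentionSketch {
    // One term per day of June 2002; day N is given a document frequency of N.
    static TreeMap<String, Integer> sampleTerms() {
        TreeMap<String, Integer> terms = new TreeMap<>();
        for (int day = 1; day <= 28; day++) {
            terms.put(String.format("200206%02d", day), day);
        }
        return terms;
    }

    // Each thread sums the frequencies of all date terms, as a date filter would visit them.
    static List<Integer> run(int threads, boolean perThreadCopy) {
        final TreeMap<String, Integer> terms = sampleTerms();
        final TermLookup shared = new TermLookup(terms);
        final ThreadLocal<TermLookup> owned = ThreadLocal.withInitial(shared::copy);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    int total = 0;
                    for (String term : terms.keySet()) {
                        total += perThreadCopy
                                ? owned.get().getOwned(term)   // no shared monitor
                                : shared.getShared(term);      // all threads queue here
                    }
                    return total;
                }));
            }
            List<Integer> totals = new ArrayList<>();
            for (Future<Integer> future : futures) {
                totals.add(future.get());
            }
            return totals;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared:     " + run(5, false));
        System.out.println("per thread: " + run(5, true));
    }
}
```

Both modes return the same totals; the difference is only in whether the
threads queue on a single lock. Whether a per-thread copy is cheap enough
in a real index reader depends on what state the copy has to duplicate.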
RE: Datefiltering performance issues
If you're basically looking for a query, try using a RangeQuery instead of a
Filter. I think a filter is really best used when you are running multiple
queries against a subset of your data that you can create a filter for.

Scott

P.S. This question really should have been asked on the *users* list, not
the developers list.
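
The point about filters paying off across multiple queries can be sketched
in plain Java as follows. This is only an illustration under stated
assumptions: DateBitsCache is a hypothetical class, dates are stored as
yyyyMMdd integers, and none of this is Lucene's Filter API. The idea is
that the bit set for a date range is built once and then reused by every
query that needs the same range.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: document i has date docDates[i], encoded as a yyyyMMdd int.
class DateBitsCache {
    private final int[] docDates;
    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    DateBitsCache(int[] docDates) {
        this.docDates = docDates.clone();
    }

    // Returns the bit set for [from, to], building it on first use only.
    // Callers must treat the returned bit set as read-only.
    BitSet bitsFor(int from, int to) {
        return cache.computeIfAbsent(from + "-" + to, key -> {
            builds.incrementAndGet();
            BitSet bits = new BitSet(docDates.length);
            for (int doc = 0; doc < docDates.length; doc++) {
                if (docDates[doc] >= from && docDates[doc] <= to) {
                    bits.set(doc);
                }
            }
            return bits;
        });
    }

    // Restricts a query's hits to the date range without mutating either input.
    BitSet filter(BitSet queryHits, int from, int to) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(bitsFor(from, to));
        return result;
    }

    // How many times a bit set was actually built.
    int buildCount() {
        return builds.get();
    }

    public static void main(String[] args) {
        DateBitsCache cache = new DateBitsCache(
                new int[] {20020101, 20020615, 20020620, 20030101});
        BitSet firstQuery = new BitSet();
        firstQuery.set(0);
        firstQuery.set(1);
        BitSet secondQuery = new BitSet();
        secondQuery.set(2);
        secondQuery.set(3);
        System.out.println(cache.filter(firstQuery, 20020601, 20020630));
        System.out.println(cache.filter(secondQuery, 20020601, 20020630));
        System.out.println("bit sets built: " + cache.buildCount());
    }
}
```

If every query uses a different date range, nothing is reused and the cache
only costs memory, which matches the advice that a filter suits repeated
queries over the same subset.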

> -----Original Message-----
> From: Sylvain Puccianti [mailto:spuccianti@yahoo.fr]
> Sent: Thursday, June 20, 2002 4:28 PM
> To: lucene-dev@jakarta.apache.org
> Subject: Datefiltering performance issues
Re: Datefiltering performance issues
What version of Lucene are you using? There was a patch made in January
to address multi-threaded performance of DateFilter.

Doug


Re: Datefiltering performance issues
Thanks for the quick answer!
I've just downloaded the 1.2 release jar, and my test
gives me the same results. The more threads I've got,
the slower date filtering gets (performance degradation
is far worse than linear).
I tried to use RangeQuery, as advised by Scott
Ganyo, but it does not work very well. RangeQuery
creates a TermQuery for each term between lowerTerm and
upperTerm. If my range is too wide, as I've got
thousands of documents, it just blows up memory...
Is there any way to avoid sharing the TermInfosReader
between all threads when creating the BitSet, or
somehow avoid synchronizing the get method (if it is
actually the bottleneck here)?

Thanks,

Sylvain

Re: Datefiltering performance issues
In a former life (not with Lucene), I handled this range problem by
indexing the dates at multiple granularities (YYYY, YYYYMM, YYYYMMDD) and
then, at query time, constructing multiple ranges to cover what the user wanted:

So,
[19990323 20020612]
becomes:
[19990323 19990331] OR
[199904 199912] OR
[2000 2001] OR
[200201 200205] OR
[20020601 20020612]

(I may have my Lucene query syntax mussed up here, but hopefully my
intention is clear)

This dramatically limits the number of terms that need to be evaluated
(at the expense of a larger index). The three term types also need
to be in separate "fields" (or prefixed) so that each range only includes
one type.
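
The query-time decomposition described above can be sketched in plain Java
with java.time. This is a rough illustration, not code from the thread: the
class name DateRangeDecomposer and the field names "day", "month" and
"year" are hypothetical, and the output uses the bracket notation from this
message rather than verified Lucene query syntax.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an inclusive date range into day, month and year ranges.
class DateRangeDecomposer {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("uuuuMMdd");
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("uuuuMM");

    static List<String> decompose(LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        List<String> out = new ArrayList<>();
        boolean startsMonth = start.getDayOfMonth() == 1;
        boolean endsMonth = end.getDayOfMonth() == end.lengthOfMonth();
        YearMonth first = YearMonth.from(start);
        YearMonth last = YearMonth.from(end);
        if (first.equals(last) && !(startsMonth && endsMonth)) {
            // Part of a single month: one day-level range is enough.
            out.add(range("day", DAY.format(start), DAY.format(end)));
            return out;
        }
        String tailDays = null;
        if (!startsMonth) { // leading partial month
            out.add(range("day", DAY.format(start), DAY.format(first.atEndOfMonth())));
            first = first.plusMonths(1);
        }
        if (!endsMonth) { // trailing partial month, emitted last to keep date order
            tailDays = range("day", DAY.format(last.atDay(1)), DAY.format(end));
            last = last.minusMonths(1);
        }
        addWholeMonths(out, first, last);
        if (tailDays != null) {
            out.add(tailDays);
        }
        return out;
    }

    // Covers the whole months first..last with month ranges and, where possible, year ranges.
    private static void addWholeMonths(List<String> out, YearMonth first, YearMonth last) {
        if (first.isAfter(last)) {
            return; // no whole month lies between the two partial months
        }
        int firstYear = first.getYear();
        int lastYear = last.getYear();
        boolean startsYear = first.getMonthValue() == 1;
        boolean endsYear = last.getMonthValue() == 12;
        if (firstYear == lastYear && !(startsYear && endsYear)) {
            out.add(range("month", MONTH.format(first), MONTH.format(last)));
            return;
        }
        String tailMonths = null;
        if (!startsYear) { // leading partial year
            out.add(range("month", MONTH.format(first), MONTH.format(YearMonth.of(firstYear, 12))));
            firstYear++;
        }
        if (!endsYear) { // trailing partial year
            tailMonths = range("month", MONTH.format(YearMonth.of(lastYear, 1)), MONTH.format(last));
            lastYear--;
        }
        if (firstYear <= lastYear) {
            out.add(range("year", String.valueOf(firstYear), String.valueOf(lastYear)));
        }
        if (tailMonths != null) {
            out.add(tailMonths);
        }
    }

    private static String range(String field, String low, String high) {
        return field + ":[" + low + " " + high + "]";
    }

    public static void main(String[] args) {
        for (String r : decompose(LocalDate.of(1999, 3, 23), LocalDate.of(2002, 6, 12))) {
            System.out.println(r);
        }
    }
}
```

For the example range in this message, the sketch yields the same five
ranges: two day ranges, two month ranges and one year range, instead of one
term per day across more than three years.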

The same trick can be played with non-dates by using a two-letter
prefix ("dog" gets indexed as "dog" and "do"). Obviously, care should
be taken as to which fields get this extra indexing (probably
just Keyword).

It's an idea anyway...

- matt

Re: Datefiltering performance issues
I have seen the same DateFilter performance issues with a 1.3 million
document index using JMeter benchmarking on the 1.2 final release.
RangeQuery seems to take 5-10 times as long to return results.

Aside from that, Lucene performance stomps our Verity install.

--jon

Jonathan Pace
Sr Programmer/Analyst
FedEx Services
60 FedEx Pkwy
1st Floor Horiz
Collierville, AR 38017


----- Original Message -----
From: Sylvain Puccianti <spuccianti@yahoo.fr>
Date: Friday, June 21, 2002 12:35 pm
Subject: Re: Datefiltering performance issues
