Mailing List Archive

Question about max clause counts
Hello,

I have a question about Lucene's max clause counts limit, exposed in
BooleanQuery::setMaxClauseCount (and now IndexSearcher).

The recommendation seems to be that these limits shouldn't be modified, but
instead more efficient queries should be constructed. Let's say the limits
are bumped to int max or some very high number -- I'm wondering what the
effects of this would be. Would the execution of smaller queries be
affected? Would larger queries execute as efficiently as possible? Or would
some things start to break somewhere?

--Petko
Re: Question about max clause counts
Hi Petko,

We have been designing queries and the whole framework for query
execution with the assumption in mind that queries would be
reasonable, so it's hard to tell exactly what would break, but I think
it's expected that queries wouldn't execute in the most efficient way,
CPU-wise, memory-wise and disk-wise. So you would expose your
application to slow queries that might hammer your disk and/or cause
memory pressure if not out-of-memory errors.

On Wed, Jan 12, 2022 at 8:49 PM Petko Minkov <pminkov@gmail.com> wrote:
>
> Hello,
>
> I have a question about Lucene's max clause counts limit, exposed in BooleanQuery::setMaxClauseCount (and now IndexSearcher).
>
> The recommendation seems to be that these limits shouldn't be modified, but instead more efficient queries should be constructed. Let's say the limits are bumped to int max or some very high number -- I'm wondering what the effects of this would be. Would the execution of smaller queries be affected? Would larger queries execute as efficiently as possible? Or would some things start to break somewhere?
>
> --Petko



--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Question about max clause counts
Thanks for explaining - that makes sense. I see that one of the recommended
approaches for large queries is to use TermInSetQuery. I don't find this in
its docs, but what are its benefits - is it faster, or does it take less
memory?

--Petko

On Thu, Jan 13, 2022 at 1:10 AM Adrien Grand <jpountz@gmail.com> wrote:

> Hi Petko,
>
> We have been designing queries and the whole framework for query
> execution with the assumption in mind that queries would be
> reasonable, so it's hard to tell exactly what would break, but I think
> it's expected that queries wouldn't execute in the most efficient way,
> CPU-wise, memory-wise and disk-wise. So you would expose your
> application to slow queries that might hammer your disk and/or cause
> memory pressure if not out-of-memory errors.
Re: Question about max clause counts
TermInSetQuery has a completely different execution model: it consumes
the postings of its terms into a bitset instead of merging
them on the fly using a heap. It may be faster or slower depending on
the case. The main benefit is that it has bounded memory usage (though
the bound is high because of the bitset) and performs
sequential I/O, so running the query will not hammer your disk.
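
To make the difference between the two execution models concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: posting lists are plain sorted int arrays, and PostingsUnionSketch, Cursor, unionWithHeap and unionWithBitSet are hypothetical names made up for this illustration. The heap version keeps one open cursor per clause and pays a heap operation per posting, while the bitset version reads each list from start to finish and uses memory proportional to maxDoc regardless of how many lists there are.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of two ways to union sorted posting lists; not Lucene code. */
final class PostingsUnionSketch {

  /** Cursor over one sorted posting list (hypothetical helper). */
  private static final class Cursor {
    final int[] docs;
    int pos = 0;

    Cursor(int[] docs) {
      this.docs = docs;
    }

    int current() {
      return docs[pos];
    }

    boolean advance() {
      return ++pos < docs.length;
    }
  }

  /** Heap-style merge: one live cursor per clause, all held at the same time. */
  static int[] unionWithHeap(int[][] postings) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.current(), b.current()));
    for (int[] list : postings) {
      if (list.length > 0) {
        heap.add(new Cursor(list));
      }
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Cursor top = heap.poll();
      int doc = top.current();
      // Skip doc IDs that were already emitted by another list.
      if (out.isEmpty() || out.get(out.size() - 1) != doc) {
        out.add(doc);
      }
      if (top.advance()) {
        heap.add(top); // re-insert: O(log n) per posting, n = number of clauses
      }
    }
    return out.stream().mapToInt(Integer::intValue).toArray();
  }

  /** Bitset-style union: read each list start to finish, one after another. */
  static int[] unionWithBitSet(int[][] postings, int maxDoc) {
    // Memory depends on maxDoc (assumed larger than every doc ID), not on the list count.
    BitSet bits = new BitSet(maxDoc);
    for (int[] list : postings) {
      for (int doc : list) {
        bits.set(doc);
      }
    }
    return bits.stream().toArray();
  }

  public static void main(String[] args) {
    int[][] postings = { {1, 4, 9}, {2, 4, 7}, {4, 9, 12} };
    System.out.println(Arrays.toString(unionWithHeap(postings)));
    System.out.println(Arrays.toString(unionWithBitSet(postings, 16)));
  }
}
```

Both methods return the same sorted set of doc IDs; what differs is the access pattern. With many thousands of lists, the heap version jumps between all of them on every step, whereas the bitset version touches one list at a time.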

On Fri, Jan 14, 2022 at 12:23 AM Petko Minkov <pminkov@gmail.com> wrote:
>
> Thanks for explaining - that makes sense. I see that one of the recommended approaches for large queries is to use TermInSetQuery. I don't find this in its docs, but what are its benefits - is it faster, or does it take less memory?
>
> --Petko


--
Adrien
