Mailing List Archive

[Lucene] Selection of threshold
Hi,

While reading the Lucene source code, I have a tiny question about the choice of threshold: threshold = value >>> 3.

e.g. in NumericComparator#updateCompetitiveIterator(), 'threshold = iteratorCost >>> 3' is used as the condition for whether to update the iterator

e.g. in IndexOrDocValuesQuery, 'threshold = cost() >>> 3' is used as the condition for choosing between indexScorerSupplier and dvScorerSupplier

So is the choice of this threshold based on some theory, a tradeoff, or some other reason?

Could I get some suggestions?
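For readers less familiar with the operator: `>>>` is Java's unsigned right shift, so `value >>> 3` drops the three lowest bits, which for a non-negative cost is the same as `value / 8` rounded down (it only differs from `>> 3` for negative inputs, and costs are never negative). The sketch below checks that equivalence and shows, in simplified form, how such a threshold could drive an index-vs-doc-values choice. The names `ThresholdShift`, `threshold` and `useIndex` are hypothetical, and comparing the threshold against the cost of the leading clause is an assumption about the shape of the condition, not a copy of Lucene's code.

```java
class ThresholdShift {
    // Hypothetical helper: one eighth of a non-negative cost, rounded down.
    static long threshold(long cost) {
        return cost >>> 3;
    }

    // Hypothetical, simplified rule (assumption): keep the index-based scorer
    // unless the leading clause is more than ~8x cheaper than this query, in which
    // case checking doc values only on the lead's candidates is assumed cheaper.
    static boolean useIndex(long queryCost, long leadCost) {
        return threshold(queryCost) <= leadCost;
    }

    public static void main(String[] args) {
        long[] costs = {0, 7, 8, 1_000, 1_000_003};
        for (long cost : costs) {
            // For non-negative values the unsigned shift equals integer division by 8.
            System.out.println(cost + " >>> 3 = " + threshold(cost)
                    + " (cost / 8 = " + cost / 8 + ")");
        }
        System.out.println(useIndex(1_000_000, 500_000)); // true: lead only 2x cheaper
        System.out.println(useIndex(1_000_000, 10_000));  // false: lead 100x cheaper
    }
}
```

A shift is used rather than a division mostly as a cheap, conventional way to express "one eighth"; the interesting question is why eight, not how it is computed.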


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: [Lucene] Selection of threshold
Hi,

This is just a number that proved to work well in practice.

The general idea is that we want to narrow down the set of candidates
periodically in order to speed up query execution. If we do it too often,
then we might spend more time narrowing down the set of candidates than
actually evaluating candidates, and if we don't do it often enough, then
we're still evaluating lots of candidates that have no chance of being
competitive and the query is slow too. What the code samples you shared
mean is that Lucene would only re-evaluate the set of candidates whenever
it seems that we could reduce the number of candidates by 8x.
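To make the "8x" rule concrete, here is a minimal sketch of the check described above, assuming the comparator can estimate how many documents would still match a tightened set of competitive values. The class, method and parameter names (`CompetitiveIteratorSketch`, `shouldUpdate`, `countUpdates`, `estimatedMatches`) are hypothetical, and the estimates in `main` are made-up illustrative numbers, not measurements.

```java
class CompetitiveIteratorSketch {
    // Hypothetical rule: rebuild the candidate iterator only when the new
    // estimate is below one eighth of the current iterator's cost (an 8x cut).
    static boolean shouldUpdate(long iteratorCost, long estimatedMatches) {
        long threshold = iteratorCost >>> 3;
        // Otherwise the new range is not selective enough to be worth the rebuild.
        return estimatedMatches < threshold;
    }

    // Replays a sequence of estimates, as they might arrive while the set of
    // competitive values tightens during collection, and counts the rebuilds.
    static int countUpdates(long initialCost, long[] estimates) {
        long cost = initialCost;
        int updates = 0;
        for (long estimate : estimates) {
            if (shouldUpdate(cost, estimate)) {
                cost = estimate; // the narrower iterator becomes the new baseline
                updates++;
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        long[] estimates = {900_000, 400_000, 120_000, 100_000, 20_000, 14_000, 1_000};
        System.out.println(countUpdates(1_000_000, estimates)); // prints 3
    }
}
```

Only three of the seven estimates trigger a rebuild (120,000, then 14,000, then 1,000), because each rebuild resets the baseline and the next one again needs an 8x reduction. A larger shift such as `>>> 4` would rebuild even less often but keep more non-competitive candidates around, while `>>> 1` would rebuild far more eagerly; Adrien describes 3 as a value that proved to work well in practice rather than one derived from theory.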

On Thu, Jul 1, 2021 at 11:57 AM LuXugang <xuganglu@icloud.com.invalid>
wrote:

--
Adrien
Re: [Lucene] Selection of threshold
Thanks for sharing your ideas, Adrien~~

> On Jul 2, 2021, at 1:26, Adrien Grand <jpountz@gmail.com> wrote: