Do we have a way to understand how BooleanQuery (and other composite
queries) are advancing their child queries? For example, a simple
conjunction of two queries advances the more restrictive (lower
cost()) query first, enabling the more costly query to skip over more
documents. But we may not be making the best choice in every case, and
I would like to know, for a given query, how well we are doing. For example,
we could execute in a debugging mode, interposing something that wraps
or observes the Scorers in some way, gathering statistics about how
many documents are visited by each Scorer, which can be aggregated for
later analysis.
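A minimal sketch of the kind of debugging wrapper described above. It assumes a simplified, hypothetical `DocIterator` class that mirrors the `docID()`/`nextDoc()`/`advance()`/`cost()` contract of Lucene's `DocIdSetIterator`. It is not the real Lucene class, so the example runs without a Lucene dependency. `CountingIterator`, `ArrayIterator`, `intersect`, and `run` are illustrative names. `intersect` is a plain two-clause leapfrog conjunction, not Lucene's own conjunction code.

```java
import java.util.stream.IntStream;

public class ScorerStatsDemo {

    // Hypothetical, simplified stand-in for the DocIdSetIterator contract that
    // Lucene Scorers expose. Not the real Lucene class.
    abstract static class DocIterator {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        abstract int docID();
        abstract int nextDoc();
        abstract int advance(int target); // first doc >= target; caller ensures target > docID()
        abstract long cost();
    }

    // Iterates over a sorted array of doc ids; cost() is the number of docs.
    static class ArrayIterator extends DocIterator {
        private final int[] docs;
        private int idx = -1;
        ArrayIterator(int... docs) { this.docs = docs; }
        int docID() {
            if (idx < 0) return -1;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }
        int nextDoc() { idx++; return docID(); }
        int advance(int target) {
            // Linear scan for simplicity; real postings can skip ahead.
            do { idx++; } while (idx < docs.length && docs[idx] < target);
            return docID();
        }
        long cost() { return docs.length; }
    }

    // Delegating wrapper that records how the conjunction drives its child.
    static class CountingIterator extends DocIterator {
        final String name;
        private final DocIterator in;
        int nextDocCalls = 0;
        int advanceCalls = 0;
        CountingIterator(String name, DocIterator in) { this.name = name; this.in = in; }
        int docID() { return in.docID(); }
        int nextDoc() { nextDocCalls++; return in.nextDoc(); }
        int advance(int target) { advanceCalls++; return in.advance(target); }
        long cost() { return in.cost(); }
        @Override public String toString() {
            return name + ": nextDoc=" + nextDocCalls + ", advance=" + advanceCalls;
        }
    }

    // Two-clause leapfrog: `lead` proposes candidates, `other` is advanced to them.
    static int intersect(DocIterator lead, DocIterator other) {
        int matches = 0;
        int doc = lead.nextDoc();
        while (doc != DocIterator.NO_MORE_DOCS) {
            int d = other.docID();
            if (d < doc) d = other.advance(doc);
            if (d == doc) {
                matches++;
                doc = lead.nextDoc();
            } else {
                doc = lead.advance(d); // `other` jumped ahead; let the lead catch up
            }
        }
        return matches;
    }

    // Runs the conjunction with the chosen lead and prints per-iterator stats.
    static CountingIterator[] run(boolean cheapLeads) {
        CountingIterator cheap = new CountingIterator("cheap", new ArrayIterator(3, 50, 97));
        CountingIterator costly = new CountingIterator("costly",
                new ArrayIterator(IntStream.range(0, 100).toArray()));
        int matches = cheapLeads ? intersect(cheap, costly) : intersect(costly, cheap);
        System.out.println((cheapLeads ? "cheap" : "costly") + " leads, matches=" + matches
                + " | " + cheap + " | " + costly);
        return new CountingIterator[] {cheap, costly};
    }

    public static void main(String[] args) {
        run(true);
        run(false);
    }
}
```

Both orderings find the same three matches. With the cheap clause leading, the costly clause receives only three `advance()` calls. With the costly clause leading, it is driven through four `nextDoc()` and four `advance()` calls, and the cheap clause receives four `advance()` calls. A real debugging mode could apply the same delegating-wrapper idea to the iterator each child Scorer returns and aggregate the counters per clause. Richer statistics, such as the docs each call returned or the gap skipped per `advance()`, would fit in the same wrapper.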
This is motivated by a use case we have in which we currently
post-filter our query results in a custom collector using some filters
that we know to be expensive (they must be evaluated on every
document), but we would rather express these post-filters as Queries
and have them advanced during the main Query execution. However, when
we tried to do that, we saw some slowdowns (in spite of marking these
Queries as high-cost) and I suspect it is due to the iteration order,
but I'm not sure how to debug.
Suggestions welcome!
-Mike
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org