Hi folks-
I'm reaching out to understand if there's been any past exploration into
alternative concurrent DrillSideways execution approaches. My understanding
of the current approach is that we're achieving some concurrency by using a
CollectorManager with IndexSearcher (allowing parallel execution across the
shards) but also collecting the different facet results by executing N
separate drill down queries, where N is the number of drill downs applied,
each with one of the drill down restrictions removed. This approach seems
like it would do a large amount of duplicate computational work when
executing these queries (e.g., the base query component of each drill
down query is executed N times).
In <https://issues.apache.org/jira/browse/LUCENE-7588>, Michael McCandless
brought up an alternate approach of sticking with the existing "doc at a
time" methodology (rather than
implementing this "query at a time" approach), but it's not clear to me if
it was explored further. It seems to me like the latency regression of "doc
at a time" would likely be fairly small but the overall computation for
these searches may drop significantly. Is there any more history on this
approach that folks are aware of, or any thoughts on whether or not it
would be valuable to explore a "doc at a time" approach (essentially create
a single DrillSidewaysQuery and hand that off to IndexSearcher with the
CollectorManager instead of scheduling N IndexSearcher searches as is done
today)?
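To make the comparison concrete, here is a minimal toy sketch in plain Java. It does not use the Lucene API: predicates stand in for the base query and the drill-down restrictions, and every name in it (DrillSidewaysSketch, queryAtATime, docAtATime, baseEvaluations) is hypothetical. It also assumes one pass for the drill-down hits plus one pass per dimension, which may not match exactly how the real searches are scheduled. The point is only that both strategies produce the same counts while the base predicate is evaluated far fewer times in the single-pass version.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: predicates stand in for Lucene queries (an assumption).
class DrillSidewaysSketch {
    // Counts how many times the base "query" is evaluated against a doc.
    static int baseEvaluations = 0;

    // "Query at a time": one full pass for the drill-down hits, plus one
    // full pass per dimension with that dimension's restriction removed.
    // Result layout: [0] = hits, [d + 1] = sideways count for dimension d.
    static int[] queryAtATime(int maxDoc, IntPredicate base, List<IntPredicate> dims) {
        int n = dims.size();
        int[] counts = new int[n + 1];
        for (int skip = -1; skip < n; skip++) {      // skip == -1 is the hits pass
            for (int doc = 0; doc < maxDoc; doc++) {
                baseEvaluations++;
                if (!base.test(doc)) continue;
                boolean matches = true;
                for (int d = 0; d < n; d++) {
                    if (d != skip && !dims.get(d).test(doc)) { matches = false; break; }
                }
                if (matches) counts[skip + 1]++;
            }
        }
        return counts;
    }

    // "Doc at a time": a single pass. A doc that passes every dimension is a
    // hit and counts sideways for all dimensions; a doc that fails exactly
    // one dimension counts sideways for that dimension only.
    static int[] docAtATime(int maxDoc, IntPredicate base, List<IntPredicate> dims) {
        int n = dims.size();
        int[] counts = new int[n + 1];
        for (int doc = 0; doc < maxDoc; doc++) {
            baseEvaluations++;
            if (!base.test(doc)) continue;
            int failures = 0;
            int failedDim = -1;
            for (int d = 0; d < n && failures < 2; d++) {
                if (!dims.get(d).test(doc)) { failures++; failedDim = d; }
            }
            if (failures == 0) {
                counts[0]++;
                for (int d = 0; d < n; d++) counts[d + 1]++;
            } else if (failures == 1) {
                counts[failedDim + 1]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        IntPredicate base = doc -> doc < 10;
        IntPredicate even = doc -> doc % 2 == 0;
        IntPredicate byThree = doc -> doc % 3 == 0;
        List<IntPredicate> dims = List.of(even, byThree);

        baseEvaluations = 0;
        int[] a = queryAtATime(12, base, dims);
        System.out.println(Arrays.toString(a) + " base evaluations: " + baseEvaluations);

        baseEvaluations = 0;
        int[] b = docAtATime(12, base, dims);
        System.out.println(Arrays.toString(b) + " base evaluations: " + baseEvaluations);
    }
}
```

In this toy, the multi-pass version evaluates the base predicate once per doc per pass, while the single-pass version evaluates it once per doc. Whether that saving carries over to real queries, and what it costs in latency, is exactly the open question above.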
Thanks in advance for any thoughts/info/discussion!
Cheers,
-Greg