Mailing List Archive: Alternate concurrent DrillSideways approach?

Feb 15, 2021, 9:04 AM

Post #1 of 3

Hi folks-

I'm reaching out to understand if there's been any past exploration into
alternative concurrent DrillSideways execution approaches. My understanding
of the current approach is that we're achieving some concurrency by using a
CollectorManager with IndexSearcher (allowing parallel execution across the
index segments) but also collecting the different facet results by executing N
separate drill down queries, where N is the number of drill downs applied,
each with one of the drill down restrictions removed. This approach seems
like it would do a large amount of duplicate computational work when
executing these queries (e.g., just think of the base query component of
each drill down query being executed N times).
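
To make the duplicated work concrete, here is a minimal toy sketch of the
"query at a time" idea in plain Python. It is not Lucene code: the corpus,
the field names, and the query_at_a_time helper are all made up for
illustration, and a simple counter stands in for the cost of evaluating the
base query.

```python
# Toy model only: DOCS, the field names, and query_at_a_time are hypothetical.
DOCS = [
    {"text": "red shirt", "color": "red", "size": "M"},
    {"text": "red shirt", "color": "red", "size": "L"},
    {"text": "blue shirt", "color": "blue", "size": "M"},
    {"text": "blue hat", "color": "blue", "size": "M"},
]


def query_at_a_time(docs, base, drill_downs):
    """Run one full pass per drill-down dimension, with that dimension's
    restriction removed, and count facet values for the dropped dimension.

    Returns (sideways_counts, base_evaluations).
    """
    base_evals = 0
    sideways = {}
    for dropped in drill_downs:
        counts = {}
        for doc in docs:
            base_evals += 1  # the base query is re-evaluated in every pass
            if not base(doc):
                continue
            others_match = all(
                doc[dim] == val
                for dim, val in drill_downs.items()
                if dim != dropped
            )
            if others_match:
                counts[doc[dropped]] = counts.get(doc[dropped], 0) + 1
        sideways[dropped] = counts
    return sideways, base_evals


sideways, base_evals = query_at_a_time(
    DOCS, lambda d: "shirt" in d["text"], {"color": "red", "size": "M"}
)
print(sideways)    # {'color': {'red': 1, 'blue': 1}, 'size': {'M': 1, 'L': 1}}
print(base_evals)  # 8, i.e. 2 drill downs x 4 docs
```

With N drill downs the base predicate runs N times per document in this
model, which is the redundancy described above.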

In LUCENE-7588 <https://issues.apache.org/jira/browse/LUCENE-7588>, Michael
McCandless brought up an alternate approach
of sticking with the existing "doc at a time" methodology (rather than
implementing this "query at a time" approach), but it's not clear to me if
it was explored further. It seems to me like the latency regression of "doc
at a time" would likely be fairly small but the overall computation for
these searches may drop significantly. Is there any more history on this
approach that folks are aware of, or any thoughts on whether or not it
would be valuable to explore a "doc at a time" approach (essentially create
a single DrillSidewaysQuery and hand that off to IndexSearcher with the
CollectorManager instead of scheduling N IndexSearcher searches as is done
today)?
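
For comparison, a rough sketch of the "doc at a time" idea: visit each
document once, evaluate the base query once, and count a document toward a
dimension's sideways facets when it is either a full hit or misses only that
one dimension. Again this is a toy model in plain Python, not the
DrillSidewaysQuery implementation; the corpus and the doc_at_a_time helper
are hypothetical.

```python
# Toy model only: DOCS, the field names, and doc_at_a_time are hypothetical.
DOCS = [
    {"text": "red shirt", "color": "red", "size": "M"},
    {"text": "red shirt", "color": "red", "size": "L"},
    {"text": "blue shirt", "color": "blue", "size": "M"},
    {"text": "blue hat", "color": "blue", "size": "M"},
]


def doc_at_a_time(docs, base, drill_downs):
    """Single pass over the docs, evaluating the base query once per doc.

    Returns (hit_doc_ids, sideways_counts, base_evaluations).
    """
    base_evals = 0
    hits = []
    sideways = {dim: {} for dim in drill_downs}
    for doc_id, doc in enumerate(docs):
        base_evals += 1
        if not base(doc):
            continue
        failed = [dim for dim, val in drill_downs.items() if doc[dim] != val]
        if not failed:
            # Full hit: counts toward the results and every dimension.
            hits.append(doc_id)
            targets = list(drill_downs)
        elif len(failed) == 1:
            # Near miss: counts only toward the one dimension it failed.
            targets = failed
        else:
            continue
        for dim in targets:
            sideways[dim][doc[dim]] = sideways[dim].get(doc[dim], 0) + 1
    return hits, sideways, base_evals


hits, sideways, base_evals = doc_at_a_time(
    DOCS, lambda d: "shirt" in d["text"], {"color": "red", "size": "M"}
)
print(hits)        # [0]
print(sideways)    # {'color': {'red': 1, 'blue': 1}, 'size': {'M': 1, 'L': 1}}
print(base_evals)  # 4, one evaluation per doc
```

In this model the base predicate runs once per document regardless of how
many drill downs are applied.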

Thanks in advance for any thoughts/info/discussion!

Cheers,
-Greg

Re: Alternate concurrent DrillSideways approach?

lucene at mikemccandless

Feb 23, 2021, 1:32 PM

Post #2 of 3

Hi Greg,

As far as I know nobody has experimented any further with a concurrent
implementation for drill sideways. Patches welcome!

I would be curious to know how those two concurrent solutions we support
today compare with the serial performance of DrillSidewaysQuery. The
redundant work is indeed frustrating and was the original motivation for
creating DrillSidewaysQuery in the first place.

Mike McCandless

http://blog.mikemccandless.com

Re: Alternate concurrent DrillSideways approach?

gsmiller at gmail

Feb 23, 2021, 7:51 PM

Post #3 of 3

Thanks Mike! I'll follow up with results once I have an opportunity to test
out an alternate approach.

DrillSidewaysQuery certainly looks set up to avoid lots of duplicate work,
and it was a little surprising to find such a different approach in the
concurrent version. That said, the code is certainly much simpler to run a
bunch of DrillSidewaysQueries in parallel!
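
A small sketch of why the parallel version is simple to write: each
per-dimension pass is independent, so each one can be submitted to an
executor as its own task. This is a toy model in plain Python with made-up
names (DOCS, drill_down_without, concurrent_sideways), not Lucene code, and
the threads here only illustrate the scheduling structure.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: the corpus and both helpers below are hypothetical.
DOCS = [
    {"text": "red shirt", "color": "red", "size": "M"},
    {"text": "red shirt", "color": "red", "size": "L"},
    {"text": "blue shirt", "color": "blue", "size": "M"},
    {"text": "blue hat", "color": "blue", "size": "M"},
]


def drill_down_without(docs, base, drill_downs, dropped):
    """One independent pass: all restrictions except `dropped` are applied."""
    counts = {}
    for doc in docs:
        if base(doc) and all(
            doc[dim] == val for dim, val in drill_downs.items() if dim != dropped
        ):
            counts[doc[dropped]] = counts.get(doc[dropped], 0) + 1
    return counts


def concurrent_sideways(docs, base, drill_downs, max_workers=4):
    """Submit one task per drill-down dimension and gather the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            dim: pool.submit(drill_down_without, docs, base, drill_downs, dim)
            for dim in drill_downs
        }
        return {dim: future.result() for dim, future in futures.items()}


result = concurrent_sideways(
    DOCS, lambda d: "shirt" in d["text"], {"color": "red", "size": "M"}
)
print(result)  # {'color': {'red': 1, 'blue': 1}, 'size': {'M': 1, 'L': 1}}
```

The tasks share no state, which keeps the code short, but every task
re-evaluates the base predicate over the whole corpus.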

Cheers,
-Greg
