Mailing List Archive

[DrillSidewaysScorer] performance degradation
Hello, community,

*Question*
Is it OK if I create a Jira issue and a pull request with the following diff?

*Diff*
@@ -195,11 +195,8 @@ class DrillSidewaysScorer extends BulkScorer {

collectDocID = docID;

- // TODO: we could score on demand instead since we are
- // daat here:
- collectScore = baseScorer.score();
-
if (failedCollector == null) {
+ collectScore = baseScorer.score();
// Hit passed all filters, so it's "real":
collectHit(collector, dims);
} else {
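A rough way to see what this diff changes is to count score() calls. The following is a hypothetical, self-contained model, not Lucene code: docs are plain ints, each drill-down dimension is an IntPredicate, and CountingScorer stands in for the expensive custom score(). It assumes the query-first flow of doQueryFirstScoring as described in this thread: docs failing two or more dimensions are skipped before scoring, and near misses fail exactly one dimension.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/**
 * Simplified, hypothetical model of query-first drill-sideways scoring. This
 * is NOT Lucene code: the class, method and field names are made up and only
 * mimic the control flow of DrillSidewaysScorer.doQueryFirstScoring.
 */
public class QueryFirstScoringSketch {

  /** Stand-in for an expensive custom score(); counts how often it runs. */
  static final class CountingScorer {
    int calls = 0;

    float score(int doc) {
      calls++;
      return doc * 0.5f; // placeholder for a "heavy" scoring function
    }
  }

  /** What one pass over the base query's docs produced. */
  static final class Result {
    final List<Integer> hits = new ArrayList<>();
    final List<Integer> nearMisses = new ArrayList<>();
    float hitScoreSum = 0f;
    int scoreCalls = 0;
  }

  /**
   * Walks the base query's matching docs in order. A doc matching every
   * dimension is a true hit; a doc failing exactly one dimension is a near
   * miss; a doc failing two or more is skipped without scoring.
   *
   * @param lazy false = score hits and near misses (current behaviour);
   *     true = score only true hits (the proposed diff)
   */
  static Result run(int[] baseDocs, IntPredicate[] dims, boolean lazy) {
    CountingScorer scorer = new CountingScorer();
    Result result = new Result();
    for (int doc : baseDocs) {
      int failed = 0;
      for (IntPredicate dim : dims) {
        if (!dim.test(doc) && ++failed > 1) {
          break; // second miss: neither a hit nor a near miss
        }
      }
      if (failed > 1) {
        continue;
      }
      // Eager mode (current behaviour): score before branching on hit vs. near miss.
      float score = lazy ? Float.NaN : scorer.score(doc);
      if (failed == 0) {
        if (lazy) {
          score = scorer.score(doc); // proposed: score true hits only
        }
        result.hits.add(doc);
        result.hitScoreSum += score;
      } else {
        // Near miss: any score computed above is never read on this path.
        result.nearMisses.add(doc);
      }
    }
    result.scoreCalls = scorer.calls;
    return result;
  }

  public static void main(String[] args) {
    int[] baseDocs = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    IntPredicate[] dims = {
      doc -> doc % 2 == 0, // dimension 1 matches even docs
      doc -> doc % 3 == 0, // dimension 2 matches multiples of 3
      doc -> doc < 8 // dimension 3 matches docs below 8
    };
    Result eager = run(baseDocs, dims, false);
    Result lazy = run(baseDocs, dims, true);
    System.out.println("eager score() calls: " + eager.scoreCalls); // 5
    System.out.println("lazy score() calls: " + lazy.scoreCalls); // 2
  }
}
```

With these three example dimensions, two of the ten base docs (0 and 6) are true hits and three (2, 3, 4) are near misses, so eager scoring calls score() five times while the lazy variant calls it twice; the scores delivered for true hits are identical in both modes.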

*Motivation*
1. Performance degradation: we have a fairly heavy custom implementation of
score(), so when we started using DrillSideways, this call became the top
hotspot in our profiler snapshots (third from the top with default
scoring). We tried doUnionScoring and doDrillDownAdvanceScoring, with no
luck:
- doUnionScoring scores all baseQuery docIds;
- doDrillDownAdvanceScoring avoids some redundant scoring by considering
  the symmetric difference of the top two iterators' docIds, but it still
  scores some docIds that are later filtered out by the 3rd, 4th, ...
  dimension iterators;
- doQueryFirstScoring scores near-miss docIds.
The best option is to score only true hits (docs where baseQuery and all N
drill-down iterators match), so we suggest a small modification to
doQueryFirstScoring.

2. Within doQueryFirstScoring, there seems to be no need to compute a score
for a near-miss hit, because that score is never used:
- FacetsCollectorManager creates its FacetsCollector with the default
  constructor:
  https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollectorManager.java#L35
- so the FacetsCollector has keepScores set to false:
  https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L119
- and therefore collectScore is not used:
  https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java#L200
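Point 2 comes down to a simple rule: a collector that does not keep scores never asks for them, so a score computed for it up front is wasted work. The sketch below models this with made-up types (ScoreSource, SketchFacetsCollector); it is not Lucene's API, only a hypothetical illustration that loosely mirrors FacetsCollector's keepScores flag. It hands the score over as a lazy callback, so the cost is paid only if the collector actually reads it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not Lucene's API) of why an eagerly computed score is
 * wasted when the receiving collector does not keep scores. It loosely mirrors
 * FacetsCollector's keepScores flag; all names here are illustrative.
 */
public class KeepScoresSketch {

  /** Hypothetical stand-in for a scorer; score() is assumed to be expensive. */
  interface ScoreSource {
    float score();
  }

  /** Collector that, like FacetsCollector, reads scores only if keepScores is true. */
  static final class SketchFacetsCollector {
    private final boolean keepScores;
    final List<Integer> docs = new ArrayList<>();
    final List<Float> scores = new ArrayList<>();

    SketchFacetsCollector(boolean keepScores) {
      this.keepScores = keepScores;
    }

    SketchFacetsCollector() {
      this(false); // default constructor: scores are not kept
    }

    void collect(int doc, ScoreSource source) {
      docs.add(doc);
      if (keepScores) {
        scores.add(source.score()); // the only place score() is consulted
      }
    }
  }

  /** Feeds docs through a lazy score source; returns how many times score() ran. */
  static int collectAll(int[] docs, SketchFacetsCollector collector) {
    int[] calls = {0};
    for (int doc : docs) {
      collector.collect(doc, () -> {
        calls[0]++;
        return doc * 0.5f; // placeholder for an expensive score
      });
    }
    return calls[0];
  }

  public static void main(String[] args) {
    int[] nearMisses = {2, 3, 4};
    System.out.println(collectAll(nearMisses, new SketchFacetsCollector())); // 0
    System.out.println(collectAll(nearMisses, new SketchFacetsCollector(true))); // 3
  }
}
```

The default collector records all three docs without a single score() call, while the keepScores=true variant pays for three.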

Thank you in advance,
https://hh.ru search team,
Grigoriy Troitskiy.
Re: [DrillSidewaysScorer] performance degradation
Interesting. Thanks for raising this, Grigoriy! Yes, please open an issue to
track this. It would be nice if we could optimize DrillSidewaysScorer to
not compute the score for "near misses"!

Cheers,
-Greg

On Sun, Jul 18, 2021 at 3:52 PM Grigorii Troitckii <keksi4ek@gmail.com>
wrote:
