Hi folks-
I've been spending a little time getting familiar with the BKD-tree-based
range query support currently implemented in Lucene, and wonder if there's
ever been a discussion around supporting two-phase iteration in this space.
If I'm understanding the current implementation properly (specifically
looking at PointRangeQuery), it appears that all matches are determined
up-front by 1) identifying segments of the tree that contain candidate
matches (i.e., containing part of the query range), and then 2) confirming
whether-or-not the contained points actually fall within the range. I'm
also a little low on coffee this morning so it's entirely possible I'm
misunderstanding the current implementation (please correct me if so).
With this approach, it seems like we could potentially be doing quite a bit
of wasted effort in some situations. I have no thoughts on how to actually
implement this yet, but I wonder if we could support two-phase iteration by
1) returning all docs with points contained in candidate BKD-tree segments
as an approximation, and then 2) only checking the points against the query
range when confirming matches in the second phase? I think the idea would
extend to LatLonPointDistanceQuery as well (and maybe others?).
I did a Jira search for a related issue but came up empty. Anyone know if
this idea has been discussed previously, or if there's some inherent flaw
with the approach that would make it a non-starter? I don't really have any
cycles to work on this at the moment, but can at least open a Jira issue to
track if it seems like a reasonable thing to explore.
Cheers,
-Greg
I've been spending a little time getting familiar with the BKD-tree-based
range query support currently implemented in Lucene, and wonder if there's
ever been a discussion around supporting two-phase iteration in this space.
If I'm understanding the current implementation properly (specifically
looking at PointRangeQuery), it appears that all matches are determined
up-front by 1) identifying segments of the tree that contain candidate
matches (i.e., containing part of the query range), and then 2) confirming
whether-or-not the contained points actually fall within the range. I'm
also a little low on coffee this morning so it's entirely possible I'm
misunderstanding the current implementation (please correct me if so).
With this approach, it seems like we could potentially be doing quite a bit
of wasted effort in some situations. I have no thoughts on how to actually
implement this yet, but I wonder if we could support two-phase iteration by
1) returning all docs with points contained in candidate BKD-tree segments
as an approximation, and then 2) only checking the points against the query
range when confirming matches in the second phase? I think the idea would
extend to LatLonPointDistanceQuery as well (and maybe others?).
I did a Jira search for a related issue but came up empty. Anyone know if
this idea has been discussed previously, or if there's some inherent flaw
with the approach that would make it a non-starter? I don't really have any
cycles to work on this at the moment, but can at least open a Jira issue to
track if it seems like a reasonable thing to explore.
Cheers,
-Greg