
IndexOrDocValuesQuery vs. "Index or Nothing Query"?
Hi folks-

I've got what I suspect is a fairly uncommon use-case, but I wanted to
reach out to this group to see if it resonates with anyone else. I'll avoid
going into all the details for now to keep this email terse and to the
point, but I'm happy to elaborate further on the use-case if helpful.

Does anyone have a use-case for an "index or nothing" query that behaves
similarly to IndexOrDocValuesQuery but has a no-op query as the doc-values
query (i.e., MatchAllDocsQuery)? We have a fairly common use-case in
Amazon's Product Search engine where we want to optionally—based on query
heuristics—add a numeric range query, using the points index, to the
first-phase execution. We don't ever want to add a doc values-based
approximation though. Essentially, we have this separate way of doing more
costly post-filtering that we don't directly model as a single doc
values-based query. In cases where our numeric range approximation is very
selective, we'd like to add a points-based index query to the approximation
phase to help narrow down candidates, but in cases where our numeric range
is not very selective (relative to the rest of the query), we'd like to
skip using an approximation clause altogether.
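
To make the decision above concrete, here is a minimal sketch in plain Java
with no Lucene dependency. Everything in it is hypothetical and for
illustration only: the class and method names, the string stand-ins for
query clauses, and the maxCostRatio knob are assumptions, not our production
code or any Lucene API. It only models "add the points clause when it is
selective relative to the rest of the query, otherwise add nothing."

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models the "index or nothing" choice with plain values.
final class IndexOrNothingSketch {

    // Assumed heuristic: keep the points clause only when
    // rangeCost / maxCostRatio <= leadCost (integer division), i.e. when the
    // range is not much more expensive than the rest of the query.
    static boolean usePointsClause(long rangeCost, long leadCost, long maxCostRatio) {
        if (rangeCost < 0 || leadCost < 0 || maxCostRatio <= 0) {
            throw new IllegalArgumentException("costs must be >= 0 and ratio > 0");
        }
        return rangeCost / maxCostRatio <= leadCost;
    }

    // Returns the first-phase clauses: the required ones, plus the points
    // clause when the heuristic accepts it. There is no doc-values fallback;
    // a non-selective range simply contributes no clause at all.
    static List<String> firstPhaseClauses(List<String> required,
                                          String pointsClause,
                                          long rangeCost,
                                          long leadCost,
                                          long maxCostRatio) {
        List<String> clauses = new ArrayList<>(required);
        if (usePointsClause(rangeCost, leadCost, maxCostRatio)) {
            clauses.add(pointsClause);
        }
        return clauses;
    }
}
```

In a real query the two costs would presumably come from the scorer
suppliers at execution time rather than being passed in up front, which is
why a plain conditional at query-construction time doesn't quite cover it.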

We used to use IndexOrDocValuesQuery for this with a MatchAllDocsQuery
provided as the doc values-based query, but the behavior changed in Lucene
9.1 (GH#715 <https://github.com/apache/lucene/pull/715>) to rewrite the
query into a MatchAllDocsQuery if either provided query is a
MatchAllDocsQuery, which kills our ability to do this. I understand why we
would assume that both queries provided to IndexOrDocValuesQuery are
functionally equivalent (and rewrite if either is MatchAllDocs), but it
leaves us needing to implement something similar with a well-understood
MatchAllDocs fallback behavior.

I'm guessing the answer is "no," but does anyone have a similar use-case to
this? Would there be any interest in an IndexOrNothingQuery (maybe in
sandbox)? Does anyone have other use-cases for IndexOrDocValuesQuery where
the two provided queries are _not_ functionally equivalent by design?

Cheers,
-Greg