
Filter and FilteredQuery replacements
Hi there,
Hopefully this is the right audience for my question. I'm a developer
working on an effort to upgrade our Java app from Lucene 5 to Lucene 8 (or
later). While investigating the changes in these versions, the main thing
I'm struggling with is how to replace our current usage of
org.apache.lucene.search.Filter. We use this class heavily, and it was
deprecated during the 5.x releases and removed in 6.0.

I've looked at the migration guide for Lucene 6
<https://lucene.apache.org/core/6_5_1/MIGRATE.html> and the javadocs, and I'm
just not understanding the intended path for migrating away from Filter
and FilteredQuery. In the migration guide I see:
Removal of Filter and FilteredQuery (LUCENE-6301
<https://issues.apache.org/jira/browse/LUCENE-6301>, LUCENE-6583
<https://issues.apache.org/jira/browse/LUCENE-6583>)

Filter and FilteredQuery have been removed. Regular queries can be used
instead of filters as they have been optimized for the filtering case. And
you can construct a BooleanQuery with one MUST clause for the query, and
one FILTER clause for the filter in order to have similar behaviour to
FilteredQuery.

It is my understanding that in older versions of Lucene, filters were
similar to queries except that they didn't participate in scoring. I can
now see how to build a query that doesn't contribute to scoring by using a
BooleanQuery with the BooleanClause.Occur.FILTER option, which effectively
gives the same behavior. So that makes sense.
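The MUST + FILTER combination described in the migration guide can be sketched as follows, assuming Lucene 8.x core on the classpath. The helper name `filtered` and the field names `body` and `status` are hypothetical placeholders; the FILTER clause restricts which documents can match without contributing to their scores.

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class FilteredQueryReplacement {
    // Rough equivalent of the old FilteredQuery(query, filter): the MUST clause
    // is scored, while the FILTER clause only restricts which documents match.
    static Query filtered(Query query, Query filter) {
        return new BooleanQuery.Builder()
                .add(query, BooleanClause.Occur.MUST)
                .add(filter, BooleanClause.Occur.FILTER)
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical field names, for illustration only.
        Query userQuery = new TermQuery(new Term("body", "lucene"));
        Query statusFilter = new TermQuery(new Term("status", "published"));
        System.out.println(filtered(userQuery, statusFilter));
    }
}
```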

But the Filter class could also provide a DocIdSet indicating which
documents should be permitted in search results, via
Filter.getDocIdSet():
<https://lucene.apache.org/core/5_5_0/core/org/apache/lucene/search/Filter.html#getDocIdSet(org.apache.lucene.index.LeafReaderContext,%20org.apache.lucene.util.Bits)>

public abstract DocIdSet getDocIdSet(LeafReaderContext context, Bits acceptDocs) throws IOException;

We use this low-level capability to filter documents based on BitSets
that we populate in application code and then convert to DocIdSets when
running Lucene queries in certain contexts. We have an extension of the
Filter class that does exactly this, and it's pretty straightforward. Now
that Filter has been removed, is there a suggested Query implementation
that would provide similar behavior? I've looked at the implementations
of the Query class listed here:
https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/Query.html
but I'm not seeing any that accept a DocIdSet or BitSet, or that would be
relevant to the use case described above. I've also looked at Stack
Overflow and other online forums for insight into this problem, but to no
avail.
If there's no existing Query implementation for this use case, would you
suggest I write my own Query implementation similar to the old
FilteredQuery? Or is there a better approach that scales well and
performs well? We basically want to apply a BitSet filter to every Lucene
query that a user performs in certain contexts. We can quickly and easily
populate BitSet instances covering all of the Lucene doc IDs in the index,
with the bits set for the documents we want included in search results.
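One possible shape for the "write my own Query" option, sketched under assumptions: a small custom query (hypothetically named `BitSetQuery`) that wraps a top-level `FixedBitSet` and exposes each segment's slice of it through a `DocIdSetIterator`. It assumes a recent Lucene 8.x release (one that includes `QueryVisitor`) and that the bit set was built against the same `IndexReader` the searcher uses, since doc IDs can change after merges or a reopen. Deleted documents need no special handling here, because `IndexSearcher` skips non-live documents itself.

```java
import java.io.IOException;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.ConstantScoreScorer;
import org.apache.lucene.search.ConstantScoreWeight;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
import org.apache.lucene.util.FixedBitSet;

/** Hypothetical query matching documents whose top-level doc ID is set in a FixedBitSet. */
public final class BitSetQuery extends Query {
    private final FixedBitSet topLevelBits; // indexed by top-level (reader-wide) doc ID

    public BitSetQuery(FixedBitSet topLevelBits) {
        this.topLevelBits = topLevelBits;
    }

    @Override
    public Weight createWeight(IndexSearcher searcher, ScoreMode scoreMode, float boost) {
        return new ConstantScoreWeight(this, boost) {
            @Override
            public Scorer scorer(LeafReaderContext context) throws IOException {
                // Lucene scores one segment at a time, using segment-local doc IDs.
                DocIdSetIterator it = new SegmentSlice(
                        topLevelBits, context.docBase, context.reader().maxDoc());
                return new ConstantScoreScorer(this, score(), scoreMode, it);
            }

            @Override
            public boolean isCacheable(LeafReaderContext ctx) {
                // The bits come from application state, not from the index.
                return false;
            }
        };
    }

    @Override
    public String toString(String field) {
        return "BitSetQuery";
    }

    @Override
    public void visit(QueryVisitor visitor) {
        visitor.visitLeaf(this);
    }

    @Override
    public boolean equals(Object other) {
        // Identity of the bit set: rebuilding the bits yields a distinct query.
        return sameClassAs(other) && topLevelBits == ((BitSetQuery) other).topLevelBits;
    }

    @Override
    public int hashCode() {
        return 31 * classHash() + System.identityHashCode(topLevelBits);
    }

    /** Walks one segment's slice of the top-level bits, returning segment-local doc IDs. */
    private static final class SegmentSlice extends DocIdSetIterator {
        private final FixedBitSet bits;
        private final int docBase;
        private final int maxDoc;
        private int doc = -1;

        SegmentSlice(FixedBitSet bits, int docBase, int maxDoc) {
            this.bits = bits;
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }

        @Override
        public int docID() {
            return doc;
        }

        @Override
        public int nextDoc() {
            return advance(doc + 1);
        }

        @Override
        public int advance(int target) {
            if (target >= maxDoc || docBase + target >= bits.length()) {
                return doc = NO_MORE_DOCS;
            }
            int next = bits.nextSetBit(docBase + target); // top-level ID or NO_MORE_DOCS
            if (next == NO_MORE_DOCS || next >= docBase + maxDoc) {
                return doc = NO_MORE_DOCS;
            }
            return doc = next - docBase; // convert back to a segment-local ID
        }

        @Override
        public long cost() {
            return maxDoc; // upper bound on matches in this segment
        }
    }
}
```

To use it in place of the old FilteredQuery, it would go in a FILTER clause, e.g. `new BooleanQuery.Builder().add(userQuery, BooleanClause.Occur.MUST).add(new BitSetQuery(bits), BooleanClause.Occur.FILTER).build()`. Reporting `maxDoc` as the cost is a simplification; a precomputed cardinality would give the query planner a tighter estimate.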

If this has already been answered in a forum post, I apologize. And if
there's a Lucene-specific forum somewhere I should look at, I would
appreciate it if you could kindly point me there.

Any help/insight is greatly appreciated.

Thanks,
Scott Robey
Re: Filter and FilteredQuery replacements
Hello, Scott.

I've found this straightforward implementation
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L512
and a more space-efficient one
https://github.com/apache/lucene/blob/d6dbe4374a5229b827613b85066f3a4da91d5f27/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L531
If you use those snippets, you'll need to handle segmentation yourself.
And here is a utility
https://github.com/apache/lucene/blob/main/lucene/join/src/java/org/apache/lucene/search/join/QueryBitSetProducer.java
that produces bitsets covering all segments from a query, one per segment.
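To illustrate what handling segmentation yourself involves: Lucene searches each segment separately, and a segment-local doc ID plus that segment's `LeafReaderContext.docBase` gives the top-level doc ID. The sketch below uses plain `java.util.BitSet` rather than Lucene types so that it runs on its own; the two-segment layout is hypothetical, and the `docBases`/`maxDocs` arrays stand in for `context.docBase` and `context.reader().maxDoc()` of each leaf.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class SegmentSlicing {
    /**
     * Splits a top-level bit set (indexed by top-level doc ID) into one bit set
     * per segment, indexed by segment-local doc ID. Segment i covers top-level
     * IDs [docBases[i], docBases[i] + maxDocs[i]).
     */
    static List<BitSet> sliceBySegment(BitSet topLevel, int[] docBases, int[] maxDocs) {
        List<BitSet> slices = new ArrayList<>();
        for (int i = 0; i < docBases.length; i++) {
            int start = docBases[i];
            int end = start + maxDocs[i]; // exclusive
            // BitSet.get(from, to) returns a new BitSet whose bit 0 is bit 'from' here,
            // i.e. it shifts top-level IDs down to segment-local IDs.
            slices.add(topLevel.get(start, end));
        }
        return slices;
    }

    public static void main(String[] args) {
        BitSet topLevel = new BitSet();
        topLevel.set(1);
        topLevel.set(5);
        topLevel.set(9);
        // Hypothetical layout: segment 0 holds docs 0-4, segment 1 holds docs 5-9.
        List<BitSet> slices = sliceBySegment(topLevel, new int[] {0, 5}, new int[] {5, 5});
        System.out.println(slices); // prints [{1}, {0, 4}]
    }
}
```

With Lucene's own `FixedBitSet`, each per-segment slice could then back a `DocIdSetIterator`, for example by wrapping it in `org.apache.lucene.util.BitSetIterator`.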

On Tue, Jul 12, 2022 at 12:44 AM Scotter <scottrobey@gmail.com> wrote:



--
Sincerely yours
Mikhail Khludnev