Mailing List Archive

Specializing FilterReaders with long-lived behaviour
Continuing down the path of modularizing Elasticsearch we are down to the
last hurdle to eliminate Lucene split packages from the codebase
- LazySoftDeletesDirectoryReaderWrapper [1]. This class is a version of
Lucene's SoftDeletesDirectoryReaderWrapper [2], but very specific to
Elasticsearch's use case.

Similarly to SoftDeletesDirectoryReaderWrapper, we have a long-lived
FilterDirectoryReader and we need a similar DelegatingCacheHelper
behaviour.

To support building long-lived readers without IndexReader to delegate
caching to, would it be acceptable for us to refactor
SoftDeletesDirectoryReaderWrapper, such that its caching behaviour is moved
into specialized versions of the FilterReaders?

For example, one approach would be to extend FilterDirectoryReader [3] into
a LastingFilterDirectoryReader with a baked-in delegating cache behaviour.
Similarly, for the purpose of both SoftDeleteDirectoryReaders
implementations, we'd add a specialization of FilterLeafReader [4] and
FilterCodecReader [5] with the same long-lived cache behaviour.

If this approach is not suitable, is there another better alternative?

Thanks,
Nikola

[1]
https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/apache/lucene/index/LazySoftDeletesDirectoryReaderWrapper.java
[2]
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SoftDeletesDirectoryReaderWrapper.java
[3]
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterDirectoryReader.java
[4]
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterLeafReader.java
[5]
https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterCodecReader.java
Re: Specializing FilterReaders with long-lived behaviour [ In reply to ]
Hi Nikola,

I've been striving to keep the CacheKey constructor private in order
to avoid performance bugs with reader wrappers that only exist for the
lifetime of a single request, so I appreciate that you are thinking of
options that guide users towards only creating custom cache keys on
long-lived readers instead of just making the CacheKey constructor
public.

This proposal sounds interesting, my main concern is that it would add
3 new public classes. Or maybe we could find a way to not have to
actually expose these Filter(Leaf|Codec)Reader specializations?

I know that Yannick wondered if we could make the
DelegatingCacheHelper class public instead. I liked that idea a bit
less at first because it would be a bit easier to mis-use so that we
might get back similar performance bugs. But maybe this concern could
be alleviated by making it a protected class nested under this
LastingFilterDirectoryReader so that it could only be used from
sub-classes. Or maybe under FilterDirectoryReader directly since
"DirectoryReader" already somewhat implies that the reader is
long-lived since its lifetime is supposed to be tied to the Directory.



On Thu, Jan 6, 2022 at 1:56 AM Nikola Grcevski
<nikola.grcevski@elastic.co.invalid> wrote:
>
> Continuing down the path of modularizing Elasticsearch we are down to the last hurdle to eliminate Lucene split packages from the codebase - LazySoftDeletesDirectoryReaderWrapper [1]. This class is a version of Lucene's SoftDeletesDirectoryReaderWrapper [2], but very specific to Elasticsearch's use case.
>
> Similarly to SoftDeletesDirectoryReaderWrapper, we have a long-lived FilterDirectoryReader and we need a similar DelegatingCacheHelper behaviour.
>
> To support building long-lived readers without IndexReader to delegate caching to, would it be acceptable for us to refactor SoftDeletesDirectoryReaderWrapper, such that its caching behaviour is moved into specialized versions of the FilterReaders?
>
> For example, one approach would be to extend FilterDirectoryReader [3] into a LastingFilterDirectoryReader with a baked-in delegating cache behaviour. Similarly, for the purpose of both SoftDeleteDirectoryReaders implementations, we'd add a specialization of FilterLeafReader [4] and FilterCodecReader [5] with the same long-lived cache behaviour.
>
> If this approach is not suitable, is there another better alternative?
>
> Thanks,
> Nikola
>
> [1] https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/apache/lucene/index/LazySoftDeletesDirectoryReaderWrapper.java
> [2] https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SoftDeletesDirectoryReaderWrapper.java
> [3] https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterDirectoryReader.java
> [4] https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterLeafReader.java
> [5] https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterCodecReader.java



--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Specializing FilterReaders with long-lived behaviour [ In reply to ]
Thanks for reviewing this Adrien.

I think your proposal works very well for what I wanted to do. I chose to
move the DelegatingCacheHelper under FilterDirectoryReader, it
seemed simpler and one less public class, but I'm happy to introduce
the new LastingFilterDirectoryReader subclass if that's more appropriate.

I made a PR with the proposed changes here:

https://github.com/apache/lucene/pull/596

Thanks again,
Nikola

On Mon, Jan 10, 2022 at 11:52 AM Adrien Grand <jpountz@gmail.com> wrote:

> Hi Nikola,
>
> I've been striving to keep the CacheKey constructor private in order
> to avoid performance bugs with reader wrappers that only exist for the
> lifetime of a single request, so I appreciate that you are thinking of
> options that guide users towards only creating custom cache keys on
> long-lived readers instead of just making the CacheKey constructor
> public.
>
> This proposal sounds interesting, my main concern is that it would add
> 3 new public classes. Or maybe we could find a way to not have to
> actually expose these Filter(Leaf|Codec)Reader specializations?
>
> I know that Yannick wondered if we could make the
> DelegatingCacheHelper class public instead. I liked that idea a bit
> less at first because it would be a bit easier to mis-use so that we
> might get back similar performance bugs. But maybe this concern could
> be alleviated by making it a protected class nested under this
> LastingFilterDirectoryReader so that it could only be used from
> sub-classes. Or maybe under FilterDirectoryReader directly since
> "DirectoryReader" already somewhat implies that the reader is
> long-lived since its lifetime is supposed to be tied to the Directory.
>
>
>
> On Thu, Jan 6, 2022 at 1:56 AM Nikola Grcevski
> <nikola.grcevski@elastic.co.invalid> wrote:
> >
> > Continuing down the path of modularizing Elasticsearch we are down to
> the last hurdle to eliminate Lucene split packages from the codebase -
> LazySoftDeletesDirectoryReaderWrapper [1]. This class is a version of
> Lucene's SoftDeletesDirectoryReaderWrapper [2], but very specific to
> Elasticsearch's use case.
> >
> > Similarly to SoftDeletesDirectoryReaderWrapper, we have a long-lived
> FilterDirectoryReader and we need a similar DelegatingCacheHelper behaviour.
> >
> > To support building long-lived readers without IndexReader to delegate
> caching to, would it be acceptable for us to refactor
> SoftDeletesDirectoryReaderWrapper, such that its caching behaviour is moved
> into specialized versions of the FilterReaders?
> >
> > For example, one approach would be to extend FilterDirectoryReader [3]
> into a LastingFilterDirectoryReader with a baked-in delegating cache
> behaviour. Similarly, for the purpose of both SoftDeleteDirectoryReaders
> implementations, we'd add a specialization of FilterLeafReader [4] and
> FilterCodecReader [5] with the same long-lived cache behaviour.
> >
> > If this approach is not suitable, is there another better alternative?
> >
> > Thanks,
> > Nikola
> >
> > [1]
> https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/apache/lucene/index/LazySoftDeletesDirectoryReaderWrapper.java
> > [2]
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SoftDeletesDirectoryReaderWrapper.java
> > [3]
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterDirectoryReader.java
> > [4]
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterLeafReader.java
> > [5]
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/FilterCodecReader.java
>
>
>
> --
> Adrien
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>