Mailing List Archive

Boolean query regression after migrating from Lucene 8.5 to 9.2
Hello everyone,
We have a use case that shows about 10 times higher latency for boolean
queries after migrating to Lucene 9.2. Each query contains 3 filter clauses
and up to a thousand single-term SHOULD clauses. The queries usually return
fewer than 5 documents with a single stored field and used to execute in
about 500 ms, but now take more than 5 seconds. Throughput remains roughly
the same, about 60 qps.
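
To make the shape concrete, here is a sketch of such a query (the field
names here are made up; only the clause structure, 3 FILTER clauses plus
many single-term SHOULD clauses, matches what is described above):

```java
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryShape {
  // Hypothetical field names; only the clause structure matters here.
  static Query build(List<String> candidateIds) {
    BooleanQuery.Builder b = new BooleanQuery.Builder();
    // 3 filter clauses: non-scoring, must match
    b.add(new TermQuery(new Term("tenant", "t1")), BooleanClause.Occur.FILTER);
    b.add(new TermQuery(new Term("status", "active")), BooleanClause.Occur.FILTER);
    b.add(new TermQuery(new Term("region", "us")), BooleanClause.Occur.FILTER);
    // Up to ~1000 single-term SHOULD clauses. Each one needs a term
    // dictionary lookup, which is where the FST reads come from.
    for (String id : candidateIds) {
      b.add(new TermQuery(new Term("id", id)), BooleanClause.Occur.SHOULD);
    }
    return b.build();
  }
}
```

Note that a thousand SHOULD clauses is close to the default clause limit
(1024, configurable via IndexSearcher.setMaxClauseCount in 9.x).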

A compounding factor is that our disk utilization is quite high, usually
in the range of 70% to 100%. That seems like an obvious culprit, but it did
not affect these queries much before. Since the upgrade we also see about
twice as much kernel CPU utilization. We are currently using
NIOFSDirectory.

Please let me know if these symptoms mean something to you, it would be
great to know if we are doing something wrong and if there is a way to fix
the queries without upgrading hardware.

Thank you,
Alex
Re: Boolean query regression after migrating from Lucene 8.5 to 9.2
Hello everyone, I did not get a response on this, but wanted to give an
update and ask a few more questions. Our profiling shows that Lucene spends
80% of the time reading FSTs for these queries, so the regression seems to
be related to the change in Lucene 8.6 (LUCENE-9257) where the FSTs were
moved off-heap. Our understanding is that with this change, reading FSTs
became less efficient with NIOFSDirectory because Lucene spends more time
bringing data onto the heap, which also significantly increases the number
of system calls and kernel CPU.

Currently we are trying to avoid switching to mmap because another process
running on the same host makes extensive use of the FS cache. We did try
FileSwitchDirectory and mmapped only a minimal set of files, the term
dictionary and the term index. That helped, but only in some use cases.
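
This is roughly what we tried, as a sketch (the import paths assume the
8.x location of FileSwitchDirectory; in recent 9.x releases it may live
in the lucene-misc module instead). The .tim extension is the term
dictionary and .tip is the term index:

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FileSwitchDirectory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class HybridDir {
  static Directory open(Path indexPath) throws IOException {
    // .tim = term dictionary, .tip = term index; everything else stays on NIO
    Set<String> mmapExtensions = Set.of("tim", "tip");
    Directory mmap = new MMapDirectory(indexPath);
    Directory nio = new NIOFSDirectory(indexPath);
    // Files whose extension is in the primary set go to the primary
    // (mmap) directory; all other files go to the secondary (nio) one.
    return new FileSwitchDirectory(mmapExtensions, mmap, nio, true);
  }
}
```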

Is there anything else we are missing? Are some other Lucene data
structures critical to mmap with FSTs off-heap? Is there anything else we
could try? There is probably an option to bring FSTs back on heap, but we
are trying to avoid that: there is no configuration for it, so it would
require a lot of code changes.

Thank you,
Alex


On Tue, Aug 9, 2022 at 6:34 AM Alexander Lukyanchikov <
alexanderlukyanchikov@gmail.com> wrote:

> Hello everyone,
> We have a use-case which shows about 10 times higher latency for boolean
> queries after migrating to Lucene 9.2. Each query contains 3 filter clauses
> and up to a thousand single-term should clauses. They usually return less
> than 5 documents with a single stored field and used to execute in about
> 500 ms, but now take more than 5 seconds. Throughput remains roughly the
> same, about 60 qps.
>
> A compounding factor is that our disk utilization is quite high and
> usually in the range of 70% to 100%. That seems like an obvious issue, but
> it did not affect these queries too much before. After the upgrade, we can
> also see about twice as high kernel CPU utilization. We are currently using
> NIOFSDirectory.
>
> Please let me know if these symptoms mean something to you, it would be
> great to know if we are doing something wrong and if there is a way to fix
> the queries without upgrading hardware.
>
> Thank you,
> Alex
>
Re: Boolean query regression after migrating from Lucene 8.5 to 9.2
Hi Alex,

If you're using NIOFSDirectory then indeed there will be a lot of kernel
calls (seeks over the off-heap FST). I'm not sure anything can be done
about it. mmap seems to be just about the only option that comes to my
mind, as the cost is then shifted to the kernel (and data is cached and
released more efficiently). It would be interesting to provide some kind of
benchmark that would create a large-ish index and then use both directories
for the same lookups - then we'd know:

a) whether it's indeed a problem of nio vs. mmap (very likely),
b) what the actual hotspots in nio are (profiling),
c) whether the problem is OS-specific; I bet the behavior here differs
between OSs. mmap will likely be faster on most of them, but I wonder if
it's consistent everywhere.
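
A minimal harness along those lines might look like this (the index path
comes from the command line, and the "id" field name is a placeholder; a
real benchmark should use JMH and warm-up runs):

```java
import java.nio.file.Path;
import java.util.Arrays;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NIOFSDirectory;

public class DirBench {
  // Runs the same term lookups against an already-built index.
  static long timeLookups(Directory dir, String[] terms) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      long start = System.nanoTime();
      for (String t : terms) {
        searcher.search(new TermQuery(new Term("id", t)), 10);
      }
      return System.nanoTime() - start;
    }
  }

  public static void main(String[] args) throws Exception {
    Path p = Path.of(args[0]);
    // Same lookup terms for both directories, passed as remaining args.
    String[] terms = Arrays.copyOfRange(args, 1, args.length);
    System.out.println("nio  ns: " + timeLookups(new NIOFSDirectory(p), terms));
    System.out.println("mmap ns: " + timeLookups(new MMapDirectory(p), terms));
  }
}
```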

Dawid

On Thu, Aug 18, 2022 at 7:47 PM Alexander Lukyanchikov <
alexanderlukyanchikov@gmail.com> wrote:

> Hello everyone, I did not get a response on this, but wanted to give an
> update and ask a few more questions. Our profiling shows that 80% of the
> time Lucene spends on reading FSTs for these queries, so the regression
> seems to be related to the change in Lucene 8.6 (LUCENE-9257) where the
> FSTs were moved off-heap. Our understanding is that with this change,
> reading FSTs became less efficient with NIOFSDirectory because Lucene
> spends more time on bringing data on heap which also significantly
> increases the number of system calls / kernel CPU.
>
> Currently we are trying to avoid switching to MMAP because there is
> another process running on the same host and extensively utilizes the FS
> cache. We did try to use FileSwitchDirectory and MMAP only a minimal amount
> of files - Term Dictionary and Term Index. That helped, but only in some
> use cases.
>
> Is there anything else we are missing, maybe some other Lucene data
> structures are critical to MMAP with FSTs off-heap? Is there anything else
> we could try? There is probably an option to bring FSTs back on heap, but
> we are trying to avoid it since there is no configuration for this so it
> requires a lot of code changes.
>
> Thank you,
> Alex
>
>
> On Tue, Aug 9, 2022 at 6:34 AM Alexander Lukyanchikov <
> alexanderlukyanchikov@gmail.com> wrote:
>
>> Hello everyone,
>> We have a use-case which shows about 10 times higher latency for boolean
>> queries after migrating to Lucene 9.2. Each query contains 3 filter clauses
>> and up to a thousand single-term should clauses. They usually return less
>> than 5 documents with a single stored field and used to execute in about
>> 500 ms, but now take more than 5 seconds. Throughput remains roughly the
>> same, about 60 qps.
>>
>> A compounding factor is that our disk utilization is quite high and
>> usually in the range of 70% to 100%. That seems like an obvious issue, but
>> it did not affect these queries too much before. After the upgrade, we can
>> also see about twice as high kernel CPU utilization. We are currently using
>> NIOFSDirectory.
>>
>> Please let me know if these symptoms mean something to you, it would be
>> great to know if we are doing something wrong and if there is a way to fix
>> the queries without upgrading hardware.
>>
>> Thank you,
>> Alex
>>
>
Re: Boolean query regression after migrating from Lucene 8.5 to 9.2
On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov
<alexanderlukyanchikov@gmail.com> wrote:

>
> Currently we are trying to avoid switching to MMAP because there is another process running on the same host and extensively utilizes the FS cache.
>

This makes no sense; NIOFSDirectory uses the FS cache the exact same
way as mmap does, it just uses the read() interface instead.

A self-created problem!
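
The two access paths can be compared in plain java.nio; a minimal
self-contained sketch of the point that both go through the same page
cache:

```java
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheDemo {
  public static void main(String[] args) throws Exception {
    Path p = Files.createTempFile("demo", ".bin");
    byte[] data = new byte[4096];
    data[0] = 42;
    Files.write(p, data);
    try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
      // NIOFSDirectory-style access: an explicit read() syscall,
      // copying from the kernel page cache into a user-space buffer.
      ByteBuffer buf = ByteBuffer.allocate(4096);
      ch.read(buf, 0);
      // MMapDirectory-style access: no per-access syscall; a page fault
      // pulls the very same page-cache page into the address space.
      MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, 4096);
      System.out.println(buf.get(0) == mapped.get(0)); // prints "true"
    } finally {
      Files.deleteIfExists(p);
    }
  }
}
```

Either way the data comes from the same cached pages; what differs is the
number of syscalls and the user-space copy.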

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: Boolean query regression after migrating from Lucene 8.5 to 9.2
Hi Robert, thank you for the response.

I understand that NIOFSDirectory also uses the FS cache, but doesn't
MMapDirectory tend to fill the cache with unnecessary data under a random
access pattern, due to sequential read-ahead? Our concern is that this
could evict hot pages used by another process on the same host, affecting
its performance. As far as I can tell, Elasticsearch also avoids using
mmap for everything by default; e.g. stored fields and term vectors are
not mmapped.

Does this make sense, or am I missing something? Is my understanding
correct that it still makes sense to avoid mmapping files with a random
access pattern on the most recent Lucene and JVM versions?

Thank you,
Alex


On Fri, Aug 19, 2022 at 2:42 AM Robert Muir <rcmuir@gmail.com> wrote:

> On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov
> <alexanderlukyanchikov@gmail.com> wrote:
>
> >
> > Currently we are trying to avoid switching to MMAP because there is
> another process running on the same host and extensively utilizes the FS
> cache.
> >
>
> This makes no sense, NIOFSDirectory uses the FS cache the exact same
> way as mmap. it just uses read() interface instead.
>
> A self-created problem!
>
>
>
Re: Boolean query regression after migrating from Lucene 8.5 to 9.2
Hi Alexander,

> I understand that NIOFSDirectory also uses the FS cache, but doesn't
> MMapDirectory tend to fill up the cache with unnecessary data for
> random access pattern due to sequential read-ahead? Our concern is
> that it can potentially lead to evicting hot pages used by another
> process on the same host, affecting its performance.
No, this is not the case (at least not on Linux or Solaris). There is no
difference between read() and a page fault triggered by mmap: both read
the same pages and put them into the page cache. mmap will not read more
pages. What gets read ahead depends on fadvise/madvise, which Lucene does
not change (the OS decides).
> As far as I can tell, Elasticsearch also avoids using mmap for everything
> by default, e.g. stored fields and term vectors are not mmapped.

The Elasticsearch reason is different: it does this because of the limited
number of mappings allowed by current kernels. Elasticsearch clusters tend
to have many indexes, and they do this to avoid too many mappings. It has
nothing to do with caching.

Stored fields and term vectors are valid candidates for not mmapping if
you have pressure on the number of mappings; their access pattern is
completely different. So what Elasticsearch does is a valid thing to do.
If you really want to spare mappings, use the stored fields / term vectors
approach. But then you also need to disable CFS files, which is
counter-productive, as it raises the number of mappings and file handles.
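
For reference, disabling compound files is an IndexWriterConfig setting; a
sketch (the analyzer choice here is arbitrary):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class NoCfsConfig {
  static IndexWriterConfig create() {
    IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
    // Keep .tim/.tip etc. as separate files instead of packing them into
    // a .cfs compound file; this means more file handles and mappings.
    iwc.setUseCompoundFile(false);
    TieredMergePolicy mp = new TieredMergePolicy();
    mp.setNoCFSRatio(0.0); // merged segments also stay non-compound
    iwc.setMergePolicy(mp);
    return iwc;
  }
}
```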

> Does it make sense or am I missing something? Is my understanding
> correct that it still makes sense to avoid MMAPing files with the
> random access pattern on the most recent Lucene and JVM versions?

Who said this? This is simply not true! Myths....

One last word: with the first Lucene release after Java 19 comes out, you
will be able to work around the "too many mappings" problem for huge
clouds of Elasticsearch clusters, thanks to a new mmap implementation
chosen via the multi-release lucene-core.jar file. This will allow
mmapping everything when Java 19+ is used (and Java's preview features are
enabled). It works by mapping larger blocks of virtual memory (currently
limited to 1 GiB per mapping) =>
https://github.com/apache/lucene/pull/912
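
The current per-mapping limit shows up in MMapDirectory's constructor; a
sketch (1 GiB is the documented default maximum chunk size on 64-bit
JVMs):

```java
import java.io.IOException;
import java.nio.file.Path;

import org.apache.lucene.store.MMapDirectory;

public class ChunkedMmap {
  static MMapDirectory open(Path indexPath) throws IOException {
    // Each file is mapped in chunks of at most maxChunkSize bytes, so a
    // 10 GB file costs ten mappings at the current 1 GiB upper bound.
    int maxChunkSize = 1 << 30; // 1 GiB
    return new MMapDirectory(indexPath, maxChunkSize);
  }
}
```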

Uwe

>
> Thank you,
> Alex
>
>
> On Fri, Aug 19, 2022 at 2:42 AM Robert Muir <rcmuir@gmail.com> wrote:
>
> On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov
> <alexanderlukyanchikov@gmail.com> wrote:
>
> >
> > Currently we are trying to avoid switching to MMAP because there
> is another process running on the same host and extensively
> utilizes the FS cache.
> >
>
> This makes no sense, NIOFSDirectory uses the FS cache the exact same
> way as mmap. it just uses read() interface instead.
>
> A self-created problem!
>
>
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de
Re: Boolean query regression after migrating from Lucene 8.5 to 9.2
Hi Uwe,
Thank you for the detailed explanation, that helps a lot. I am still trying
to understand and confirm a few details though.

> Is my understanding correct that it still makes sense to avoid MMAPing
> files with the random access pattern on the most recent Lucene and JVM
> versions?
> Who said this? This is simply not true! Myths....


Originally I saw this concern mentioned here:
https://github.com/elastic/elasticsearch/issues/27748. There is an answer
at the bottom, saying this:

> We analyzed the same problem with mmapfs around six months ago and came to
> the same conclusions. This is why we have introduced hybridfs (#36668) to
> use NIO when we expect the access pattern to be random such that sequential
> read-ahead would be painful and otherwise use mmap


I did not do any research to confirm that, but that thread suggests that
the cache-thrashing issue happens only with mmap:

> You can see mmapfs is consuming more cache and IO than niofs.


I also noticed this comment from Adrien in the PR which introduced
hybridfs
(https://github.com/elastic/elasticsearch/pull/36668#discussion_r242147643):

> let's mention FS cache usage rather than the number of mmaps as a reason
> for this hybrid store type?


So if I read this correctly, it seems that the main reason why
Elasticsearch started to use the hybrid approach was the FS cache.

Am I missing something, or maybe things have changed and it's not relevant
anymore?

--
Regards,
Alex


On Mon, Aug 22, 2022 at 8:18 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi Alexander,
>
> I understand that NIOFSDirectory also uses the FS cache, but doesn't
> MMapDirectory tend to fill up the cache with unnecessary data for random
> access pattern due to sequential read-ahead? Our concern is that it can
> potentially lead to evicting hot pages used by another process on the same
> host, affecting its performance.
>
> no this is not the case (at least not on Linux or Solaris). It is no
> difference between read() and a pagefault by mmap. It will read the same
> pages and put them into cache. It won't read more pages for mmap. What gets
> read depends on fadvice or madvise, which Lucene does not change (the OS
> decides).
>
> As far as I can Elasticsearch also avoids using MMap for everything by
> default, e.g stored fields and term vectors are not MMAPed.
>
> The Elasticsearch reason is different: It does it because of the limited
> number of mappings available by current kernels. Elasticsearch clusters
> tend to have many indexes and to avoid too many mappings they do this. It
> has nothing to do with caching.
>
> stored fields and term vectors are valid candidates to not mmapping them
> if you have pressure on number of mappings. The access pattern is
> completely different. So what Elasticserach does is a valid thing to do. If
> you really want to spare mappings, use the stored fields / term vectors
> approach. But then you also need to disable CFS files which is
> contra-productive, as it raises the number of mappings and file handles.
>
> Does it make sense or am I missing something? Is my understanding correct
> that it still makes sense to avoid MMAPing files with the random access
> pattern on the most recent Lucene and JVM versions?
>
> Who said this? This is simply not true! Myths....
>
> One last word: With the next Lucene version after Java 19 came out you
> will be able to work around the "too many mappings" problem for huge clouds
> of Elasticsearch clusters due to a new MMAP implementation choosen using
> MultiRelease lucene-core.jar file. This will allow them to mmap everything
> when Java 19+ is used (and the preview features of Java are enabled). This
> works by having huger blocks of virtual memory (currently limited to 1
> Gigabyte per mapping) => https://github.com/apache/lucene/pull/912
>
> Uwe
>
>
> Thank you,
> Alex
>
>
> On Fri, Aug 19, 2022 at 2:42 AM Robert Muir <rcmuir@gmail.com> wrote:
>
>> On Thu, Aug 18, 2022 at 1:47 PM Alexander Lukyanchikov
>> <alexanderlukyanchikov@gmail.com> wrote:
>>
>> >
>> > Currently we are trying to avoid switching to MMAP because there is
>> another process running on the same host and extensively utilizes the FS
>> cache.
>> >
>>
>> This makes no sense, NIOFSDirectory uses the FS cache the exact same
>> way as mmap. it just uses read() interface instead.
>>
>> A self-created problem!
>>
>>
>> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>