Mailing List Archive

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
So, just cat <Lucene_index_file> will do this.
From: Robert Muir <>
Sent: Tuesday, February 23, 2021 4:45 PM
To: Baris Kazar <>
Cc: java-user <>
Subject: Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)

The preload isn't magical.
It only "reads in the whole file" to get it cached, same as if you did that yourself with 'cat' or 'dd'.
It "warms" the file.

It just does this at a low level so that the warming itself is efficient. It madvise()s the kernel to announce some read-ahead and then reads the first byte of every mmap'd page (which is enough to fault it in).
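As an illustration of the "read the first byte of every page" step, here is a minimal sketch in plain Java. It is not Lucene's preload code: the class name `PageToucher`, the 4 KiB page size, and the 1 GiB mapping chunk are assumptions made for the example, and the madvise() step is left out because the standard library does not expose it. The JDK's own `MappedByteBuffer.load()` serves a similar purpose.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class PageToucher {
    // Assumed page size; 4 KiB is common but not universal.
    static final int PAGE_SIZE = 4096;
    // A single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes, so map in chunks.
    static final long CHUNK_SIZE = 1L << 30;
    // Written once per call so the per-page reads have an observable effect.
    static volatile int sink;

    /** Reads one byte from every page of the file; returns the number of pages touched. */
    static long touchPages(Path file) throws IOException {
        long pages = 0;
        int checksum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long offset = 0; offset < size; offset += CHUNK_SIZE) {
                long length = Math.min(CHUNK_SIZE, size - offset);
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
                for (int pos = 0; pos < length; pos += PAGE_SIZE) {
                    checksum ^= buf.get(pos); // one read per page faults that page in
                    pages++;
                }
            }
        }
        sink = checksum;
        return pages;
    }
}
```

Calling `PageToucher.touchPages(path)` on each index file leaves the file's pages in the operating system's cache, which is the same end state a plain sequential read would produce.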

At the end of the day it doesn't matter if you wrote a shitty shell script that uses 'dd' to read in each index file and send it to /dev/null, or whether you spent lots of time writing fancy java code to call this preload thing: you get the same result, same end state.

Maybe the preload takes 18 seconds to "warm" the index, vs. your crappy shell script which takes 22 seconds. The built-in preload matters mainly for servers and for portability (e.g. it will work fine on Windows, but obviously will not call madvise).
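For comparison, the "read each index file and throw the bytes away" approach could look roughly like this in Java. This is only a sketch: the class name `IndexWarmer` and the 1 MiB buffer are assumptions, and it simply reads every regular file directly under a directory, much like sending each file to /dev/null.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class IndexWarmer {
    /** Reads every regular file directly under dir and discards the bytes; returns total bytes read. */
    static long warm(Path dir) throws IOException {
        byte[] buffer = new byte[1 << 20]; // 1 MiB scratch buffer, contents are ignored
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories and other non-file entries
                }
                try (InputStream in = Files.newInputStream(file)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        total += n; // reading is the point; the data itself is dropped
                    }
                }
            }
        }
        return total;
    }
}
```

Something like `IndexWarmer.warm(Paths.get("/path/to/index"))` (a placeholder path) before opening the index would pull the files into the kernel's cache, provided there is enough free memory to hold them.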

On Tue, Feb 23, 2021 at 4:18 PM <<>> wrote:

Thanks again, Robert. Could you please explain "preload"? Which functionality is that? We discussed a preload earlier in this thread.

Is there a Lucene URL or site that I can look at for preload?

Thanks for the explanations. This thread will be useful for many folks, I believe.

Best regards

On 2/23/21 4:15 PM, Robert Muir wrote:

On Tue, Feb 23, 2021 at 4:07 PM <<>> wrote:

What I want to achieve (problem statement):

The base case is a disk-based Lucene index with FSDirectory.

The speedup case was supposed to be an in-memory Lucene index with MMapDirectory.

On 64-bit systems, FSDirectory.open() already gives you an MMapDirectory. So you don't need to do anything.

Either way MMapDirectory or NIOFSDirectory are doing the same thing: reading your index as a normal file and letting the operating system cache it.
MMapDirectory is just better because it avoids some overheads, such as the read() system call and the copying and buffering into Java memory space.
Some of these overheads are only getting worse, e.g. Spectre/Meltdown-type fixes make syscalls 8x slower on my computer. So it is good that MMapDirectory avoids them.
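To make the difference concrete, here is a small sketch in plain Java (not Lucene code; the class `ReadVsMmap` and its methods are made up for the example). Both methods sum the bytes of a file. The first goes through read() calls and copies data into a heap buffer. The second maps the file and reads the pages directly. The mmap variant assumes the file is smaller than 2 GiB so that one mapping is enough.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ReadVsMmap {
    /** Sums the file's bytes via read(): each call copies data into a Java heap buffer. */
    static long sumWithRead(Path file) throws IOException {
        long sum = 0;
        ByteBuffer heap = ByteBuffer.allocate(8192);
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            while (ch.read(heap) != -1) {
                heap.flip();
                while (heap.hasRemaining()) {
                    sum += heap.get() & 0xFF;
                }
                heap.clear();
            }
        }
        return sum;
    }

    /** Sums the file's bytes through a mapping: no read() calls, no intermediate copy. */
    static long sumWithMmap(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumption: file size fits in a single mapping (below 2 GiB).
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF;
            }
        }
        return sum;
    }
}
```

Both paths return the same result and both rely on the operating system's cache for the actual file data; the mapped version just skips the system call and the extra copy on each access.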

So I suggest you just stop fighting the operating system: don't give your J2EE container huge amounts of RAM, and let the kernel do its job.
If you want to "warm" a cold system because nothing is in the kernel's cache, then look into preload and so on. It is just "reading files" to get them cached.
