Mailing List Archive

MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Hi,

It would be nice to create a Lucene index on disk and then load it into memory once (I use it in read-only mode). I am looking into whether this is doable in Lucene.

I wish there were an option to load the whole Lucene index into memory.

Both of the URLs below link to the blog post from which I quote a very nice section:

https://lucene.apache.org/core/8_5_0/core/org/apache/lucene/store/MMapDirectory.html
https://lucene.apache.org/core/8_5_2/core/org/apache/lucene/store/MMapDirectory.html

The following blog post mentions such an option for running in memory (see the passage quoted below):

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html?m=1

MMapDirectory will not load the whole index into physical memory. Why should it do this? We just ask the operating system to map the file into address space for easy access, by no means we are requesting more. Java and the O/S optionally provide the option to try loading the whole file into RAM (if enough is available), but Lucene does not use that option (we may add this possibility in a later version).

My question is: is there such an option?
Is the setPreload method meant for this purpose, i.e. to load the whole Lucene index into memory?

I would like to use MMapDirectory and set my JVM heap to 16 GB or a bit less (since my index is about that size).

The Lucene 8.5.2 (8.5.0 as well) javadocs say:
public void setPreload(boolean preload)
Set to true to ask mapped pages to be loaded into physical memory on init. The behavior is best-effort and operating system dependent.

Lucene 4.0.0, for example, does not have the setPreload method.

https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/store/MMapDirectory.html

Happy Holidays
Best regards


P.S. I know there is also the ByteBuffersDirectory class for in-memory Lucene indexes, but it requires creating the index on the fly.

That only works well for indexes that can be built quickly on the fly.

Ekaterina has a nice article on the ByteBuffersDirectory class:

https://medium.com/@ekaterinamihailova/in-memory-search-and-autocomplete-with-lucene-8-5-f2df1bc71c36
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Hello,

Yes, that is exactly what MMapDirectory.setPreload is trying to do, but no
promises (it is best effort). I think it asks the OS to touch all pages in
the mapped region so they are cached in RAM, if you have enough RAM.

Make your JVM heap as low as possible to let the OS have more RAM to use to
load your index.
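
To make the page-touching idea concrete, here is a minimal JDK-only sketch of a best-effort preload of a memory-mapped file. It uses MappedByteBuffer.load(), which is the standard JVM call for asking the OS to bring a mapped region into physical memory; that Lucene's preload boils down to this same call is an assumption here, not something stated in this thread. The class name MappedPreloadSketch and the methods preloadAndSum and demo are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
final class MappedPreloadSketch {

    /** Maps the file read-only, asks the OS to page it in, and returns the sum of its bytes. */
    static long preloadAndSum(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // Best effort only: the OS may or may not keep the pages resident afterwards.
            buffer.load();
            long sum = 0;
            while (buffer.hasRemaining()) {
                sum += buffer.get();
            }
            return sum;
        }
    }

    /** Self-contained demo on a small temporary file. */
    static long demo() {
        try {
            Path file = Files.createTempFile("preload-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[] {1, 2, 3, 4});
            return preloadAndSum(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that load() walks the whole mapping up front, which is why a preload on a 16 GB index can take a noticeable amount of time, and that nothing here pins the pages: the kernel is still free to evict them later.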

Mike McCandless

http://blog.mikemccandless.com


On Sun, Dec 13, 2020 at 4:18 PM <baris.kazar@oracle.com> wrote:
> [...]
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Thanks Mike, I appreciate the reply and the suggestions very much.

And your article on concurrent search is amazing.

Together, an in-memory index and concurrent search (especially in read-only mode)

should speed up Lucene queries a great deal.

Happy Holidays

Best regards


On 12/14/20 10:12 AM, Michael McCandless wrote:
> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
I used a Linux feature (ramfs, which basically mounts RAM as a
partition) to guarantee that the index is always in RAM (no accidental
paging cost either ;) ).

https://www.jamescoyle.net/how-to/943-create-a-ram-disk-in-linux

WARNING: Only use this if the index is read-only, fits in RAM, and you
keep a backup copy of it on persistent disk somewhere. You may use any
Directory implementation in Lucene, e.g.
https://lucene.apache.org/core/7_3_1/core/org/apache/lucene/store/SimpleFSDirectory.html

Search was amazingly quick, as the full index was on a RAM-mounted
directory.

On Mon, Dec 14, 2020 at 11:27 AM <baris.kazar@oracle.com> wrote:
> [...]
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Thanks Jigar, these are great notes, observations and experiments to know
about, and they are very valuable.

I also plan to write a blog post on this topic to help Lucene advance.

Best regards


On 12/14/20 12:44 PM, Jigar Shah wrote:
> [...]
RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Hi,

as the writer of the original blog post, here are my comments:

Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post; it
is meant to load everything into memory - but that does not guarantee
anything! Still, I would not recommend using that function, because all it
does is touch every page of the file so that the Linux kernel puts it into
the OS cache - nothing more. IMHO it is very ineffective, as it slows down
opening the index with a stupid for-each-page-touch loop. It does this with
EVERY page, whether it is used later or not! So it may take some time until
it is done. Afterwards, Lucene still needs to open the index files,
initialize its own data structures, ...

In general it is much better to open the index with MMapDirectory and execute
some "sample" queries. This does exactly the same as the preload function,
but it is more "selective": parts of the index which are not used won't be
touched, and on top of that it will also load ALL the required index
structures to heap.

As always, and as mentioned in my blog post: there is nothing that can ensure
your index will stay in memory. Please trust the kernel to do the right
thing. Why do you care at all?

If you are curious and want to have everything in memory all the time:
- use tmpfs as your filesystem (of course you will lose data when the OS
shuts down)
- disable swap and/or disable swappiness
- use only as much heap as needed, and keep all the free memory for your
index outside the heap.
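
As a rough illustration of the tmpfs route in the list above, the sketch below copies an on-disk index directory into a RAM-backed mount using only the JDK. The class name RamDiskCopySketch, its methods, the file names in the demo and the /dev/shm/lucene-index target are all assumptions made for this example, not anything taken from the thread; /dev/shm is a tmpfs mount on many Linux systems, but check your own setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Iterator;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of Lucene.
final class RamDiskCopySketch {

    /** Copies every regular file under source into target and returns the number of files copied. */
    static int copyTree(Path source, Path target) throws IOException {
        int copied = 0;
        try (Stream<Path> paths = Files.walk(source)) {
            // Files.walk yields a directory before its entries, so parents are created first.
            for (Iterator<Path> it = paths.iterator(); it.hasNext();) {
                Path entry = it.next();
                Path destination = target.resolve(source.relativize(entry).toString());
                if (Files.isDirectory(entry)) {
                    Files.createDirectories(destination);
                } else {
                    Files.copy(entry, destination, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    /** Self-contained demo on temporary directories; a real target would be a tmpfs mount. */
    static int demo() {
        try {
            Path source = Files.createTempDirectory("index-on-disk");
            Path target = Files.createTempDirectory("index-in-ram");
            // Placeholder files standing in for index files.
            Files.write(source.resolve("segments_1"), new byte[] {1, 2, 3});
            Files.write(source.resolve("_0.cfs"), new byte[] {4, 5, 6});
            return copyTree(source, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java RamDiskCopySketch /data/index /dev/shm/lucene-index
        if (args.length == 2) {
            System.out.println(copyTree(Paths.get(args[0]), Paths.get(args[1])));
        } else {
            System.out.println(demo());
        }
    }
}
```

The copy on persistent disk stays the source of truth: anything under a tmpfs mount is gone after a reboot, so a step like this would have to run again before the index is opened from the RAM-backed path.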

Fake feelings of "everything in RAM" are misconceptions like:
- use RAMDirectory (deprecated): this may be a disaster, as described in
the blog post
- use ByteBuffersDirectory: a little bit better, but this brings nothing, as
the operating system kernel may still page out your index pages. They still
live in/off heap and are part of usual paging. They are just no longer
backed by a file.

Lucene does most of the stuff outside heap, live with it!

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Thanks, Uwe.

Yes, recommended: tmpfs/ramfs worked like a charm in our use case with a
read-only index, giving us very high throughput and consistent response
times on queries.

We had to build some redundancy around that service to make it highly
available, so that we can do a rolling update of the read-only index and
reduce the risk of downtime.



On Mon, Dec 14, 2020 at 1:51 PM Uwe Schindler <uwe@thetaphi.de> wrote:
> [...]
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory)
Thanks Uwe. I am not insisting on loading everything into memory,

but loading it into memory might speed things up, and I would like to see
by how much.


But I have one more question, about something that is still not clear to me:

"it is much better to open index, with MMAP directory"


Does this mean I should not use the constructor but use the open API
instead?


In other words: which way should be preferred?

The example below applies to both indexing and searching:


/* First way: using the constructor (without setPreload) */

// Uses FSLockFactory.getDefault() and DEFAULT_MAX_CHUNK_SIZE, which is 1 GB
MMapDirectory dir = new MMapDirectory(Paths.get(indexDir));
// In-memory Lucene index enabled -> commented out
//// if (dir.getPreload() == false)
////   dir.setPreload(Constants.PRELOAD_YES);
IndexReader reader = DirectoryReader.open(dir);

...


/* Second way: using open (without setPreload) */

// open is inherited from FSDirectory
Directory dir = MMapDirectory.open(Paths.get(indexDir));
// In-memory Lucene index enabled -> here setPreload cannot be used,
// because dir is declared as the general Directory type
//// if (dir.getPreload() == false)
////   dir.setPreload(Constants.PRELOAD_YES);
IndexReader reader = DirectoryReader.open(dir);
IndexSearcher is = new IndexSearcher(reader);

...


Best regards


On 12/14/20 1:51 PM, Uwe Schindler wrote:
> Hi,
>
> as writer of the original blog post, here are my comments:
>
> Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post; it
> is meant to load everything into memory - but that does not guarantee anything!
> Still, I would not recommend using that function, because all it does is
> touch every page of the file, so the Linux kernel puts it into the OS cache
> - nothing more; IMHO very ineffective, as it slows down opening the index for a
> stupid for-each-page-touch-loop. It will do this with EVERY page, whether it is
> later used or not! So this may take some time until it is done. Later on,
> Lucene still needs to open index files, initialize its own data
> structures,...
>
> In general it is much better to open index, with MMAP directory and execute
> some "sample" queries. This will do exactly the same as the preload
> function, but it is more "selective". Parts of the index which are not used
> won't be touched, and on top, it will also load ALL the required index
> structures to heap.
>
> As always and as mentioned in my blog post: there's nothing that can ensure
> your index will stay in memory. Please trust the kernel to do the right
> thing. Why do you care at all?
>
> If you are curious and want to have everything in memory all the time:
> - use tmpfs as your filesystem (of course you will lose data when the OS shuts
> down)
> - disable swap and/or disable swappiness
> - use only as much heap as needed, keep all remaining free memory for your
> index outside heap.
>
> Fake feelings of "everything in RAM" are misconceptions like:
> - use RAMDirectory (deprecated): this may be a disaster, as described in
> the blog post
> - use ByteBuffersDirectory: a little bit better, but this brings nothing, as
> the operating system kernel may still page out your index pages. They still
> live in/off heap and are part of usual paging. They are just no longer
> backed by a file.
>
> Lucene does most of the stuff outside heap, live with it!
>
> Uwe
>
> -----
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
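To make the cost argument in Uwe's quoted reply concrete, here is a minimal JDK-only sketch (no Lucene involved) that contrasts touching every page of a mapped file with touching only the pages that a few reads actually need. The class name PageTouchSketch, the 4096-byte page size, the temp file, and the offsets are assumptions made for illustration; this is not how Lucene implements setPreload.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

final class PageTouchSketch {
    // Assumed page size; real systems may use a different value.
    static final int PAGE_SIZE = 4096;

    // Creates a zero-filled temp file of the given number of pages and maps it read-only.
    static MappedByteBuffer mapTempFile(int pages) {
        try {
            Path file = Files.createTempFile("page-touch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, new byte[pages * PAGE_SIZE]);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // The mapping stays valid after the channel is closed.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one byte from every page, whether or not it is needed later.
    static int touchEveryPage(MappedByteBuffer buf) {
        int touched = 0;
        for (int pos = 0; pos < buf.limit(); pos += PAGE_SIZE) {
            buf.get(pos);
            touched++;
        }
        return touched;
    }

    // Reads only the given offsets and reports how many distinct pages they hit.
    static int touchSelected(MappedByteBuffer buf, int[] offsets) {
        Set<Integer> pages = new HashSet<>();
        for (int off : offsets) {
            buf.get(off);
            pages.add(off / PAGE_SIZE);
        }
        return pages.size();
    }

    public static void main(String[] args) {
        MappedByteBuffer buf = mapTempFile(64);
        int selective = touchSelected(buf, new int[] {10, 20, 5000, 200000});
        int full = touchEveryPage(buf);
        System.out.println("pages touched selectively: " + selective);
        System.out.println("pages touched by full sweep: " + full);
    }
}
```

With a 64-page file, the selective reads hit 3 distinct pages while the full sweep touches all 64. On a multi-gigabyte index the same ratio is what makes a full sweep at open time expensive.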
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
This also brings me to another question:

Does using MMapDirectory over FSDirectory bring any advantage, with or without tmpfs?

Best regards


On 12/14/20 2:17 PM, Jigar Shah wrote:
> Thanks, Uwe
>
> Yes, as recommended, tmpfs/ramfs worked like a charm in our use case with a
> read-only index, giving us very high throughput and consistent response
> times on queries.
>
> We had to build some redundancy around that service to make it
> highly available, so that we can do a rolling update of the read-only index
> and reduce the risk of downtime.
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Hi,


> Thanks Uwe, i am not insisting on to load everything into memory
>
> but loading into memory might speed up and i would like to see how much
> speedup.
>
>
> but i have one more question and that is still not clear to me:
>
> "it is much better to open index, with MMAP directory"
>
>
> does this mean i should not use the constructor but instead use the open
> api?

No, that means: use MMapDirectory, it should fit your needs. If your operating system has enough memory outside of the heap for Lucene to keep all pages of the mmapped files in memory, then that is the best you can have.

FSDirectory.open() is fine, as it will always use MMapDirectory on 64-bit platforms.

> in other words: which way should be preferred?

It does not matter. If you want to use setPreload() [beware of slowdowns when opening index files for the first time!!!], use the constructor of MMapDirectory, because the FSDirectory factory method cannot guarantee which implementation you get.

Calling a static method on a class that does not implement it is generally considered bad practice (Eclipse should warn you). The static FSDirectory.open() is a factory method and should be used (on FSDirectory, not its subclass) if you don't know which implementation you want and need to be operating system independent. If you want MMapDirectory and its features specifically, use the constructor.
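
The point about static factory methods can be shown without Lucene on the classpath. The following is a minimal sketch using stand-in classes: BaseDir, MappedDir and PlainDir are hypothetical names playing the roles of FSDirectory, MMapDirectory and some other implementation, and the boolean parameter of open() is an assumption used here to simulate the platform check (the real FSDirectory.open() takes a path).

```java
final class StaticFactorySketch {
    // Stand-in for a base class that offers a static factory method.
    static class BaseDir {
        // Picks an implementation; the caller cannot rely on which one it gets.
        static BaseDir open(boolean mappingSupported) {
            return mappingSupported ? new MappedDir() : new PlainDir();
        }
    }

    // Stand-in for the implementation that has an extra, specific feature.
    static class MappedDir extends BaseDir {
        private boolean preload;

        void setPreload(boolean preload) {
            this.preload = preload;
        }

        boolean getPreload() {
            return preload;
        }
    }

    // Stand-in for any other implementation the factory may choose.
    static class PlainDir extends BaseDir {}

    public static void main(String[] args) {
        // This compiles, but it is BaseDir.open() that runs: static methods
        // are inherited by name only and are not overridden by the subclass.
        BaseDir viaSubclass = MappedDir.open(false);
        System.out.println(viaSubclass.getClass().getSimpleName()); // prints "PlainDir"

        // The constructor guarantees the concrete type, so its specific
        // methods are available without a cast.
        MappedDir direct = new MappedDir();
        direct.setPreload(true);
        System.out.println(direct.getPreload()); // prints "true"
    }
}
```

This is why the second snippet in the question cannot call setPreload: the declared type is the base class, and the factory is free to return a different implementation.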

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
I see. I think I will use the first way, the MMapDirectory constructor, and I
will not use the setPreload API, to avoid slowdowns.

Yes, I was expecting a warning from Eclipse for the second usage, but
nothing came up.

Thanks for the clarifications.

Best regards


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
On Mon, Dec 14, 2020 at 1:59 PM Uwe Schindler <uwe@thetaphi.de> wrote:
>
> Hi,
>
> as writer of the original blog post, here are my comments:
>
> Yes, MMapDirectory.setPreload() is the feature mentioned in my blog post; it
> is meant to load everything into memory - but that does not guarantee anything!
> Still, I would not recommend using that function, because all it does is
> touch every page of the file, so the Linux kernel puts it into the OS cache
> - nothing more; IMHO very ineffective, as it slows down opening the index for a
> stupid for-each-page-touch-loop. It will do this with EVERY page, whether it is
> later used or not! So this may take some time until it is done. Later on,
> Lucene still needs to open index files, initialize its own data
> structures,...
>
> In general it is much better to open index, with MMAP directory and execute
> some "sample" queries. This will do exactly the same as the preload
> function, but it is more "selective". Parts of the index which are not used
> won't be touched, and on top, it will also load ALL the required index
> structures to heap.
>

The main purpose of setPreload is to be a fast warming option for random
access files, such as "i want to warm all my norms in RAM" or "i want
to warm all my docvalues in RAM"... Really, it should only be used with
FileSwitchDirectory for a targeted purpose such as that: it is
definitely a waste to set it for your entire index. It just exposes
MappedByteBuffer.load()
(https://docs.oracle.com/javase/7/docs/api/java/nio/MappedByteBuffer.html#load()),
which first calls madvise(MADV_WILLNEED) and then touches every page.
If you want to "warm" an ENTIRE very specific file for a reason like
this (e.g. per-doc scoring value, ensuring it will be hot for all
docs), it is hard to be more efficient than that.
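
For readers who have not used the underlying JDK call, here is a minimal sketch of MappedByteBuffer.load() applied to a single file. It uses only java.nio; the class name PreloadOneFileSketch, the helper names, and the 1 MiB temp file are assumptions for illustration, and no Lucene classes are involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class PreloadOneFileSketch {
    // Maps the whole file read-only and asks the OS to bring it into physical memory.
    static MappedByteBuffer mapAndLoad(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            buf.load(); // best effort only; the OS may still evict pages later
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the given bytes to a temp file that is deleted on JVM exit where possible.
    static Path writeTempFile(byte[] data) {
        try {
            Path file = Files.createTempFile("preload-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, data);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        data[data.length - 1] = 42;
        MappedByteBuffer buf = mapAndLoad(writeTempFile(data));
        System.out.println("mapped bytes: " + buf.capacity());
        System.out.println("last byte: " + buf.get(buf.capacity() - 1));
        // isLoaded() is only a hint and may report false even right after load().
        System.out.println("isLoaded hint: " + buf.isLoaded());
    }
}
```

The sketch warms exactly one file, which matches the targeted use described above; routing only selected index files through a preloading directory via FileSwitchDirectory is the Lucene-side wiring and is not shown here.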

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Thanks Robert.

I think these valuable comments should be added to the javadocs for future
reference.

I think I now have enough information to make a decision:

I will use MMapDirectory without setPreload, and I hope my index will fit
into RAM.

I plan to post a blog about my findings.

Best regards


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Hi,-

I tried MMapDirectory and allocated as much memory as the index size to my
J2EE container, but it gives me at most a 25% speedup, and sometimes even a
small slowdown.

How can I effectively use Lucene indexes in memory?

Best regards


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
On Tue, Feb 23, 2021 at 2:30 AM <baris.kazar@oracle.com> wrote:

> Hi,-
>
> I tried MMapDirectory and i allocated as big as index size on my J2EE
> Container but
>
>
Don't allocate Java heap memory for the index: MMapDirectory does not use
Java heap memory!
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Ok, but how is this MMapDirectory used then?

Best regards


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
As Uwe suggested some time ago, using a tmpfs file system with
MMapDirectory is the only way to get a high speedup over an on-disk
Lucene index, right?

Best regards


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Don't give gobs of memory to your java process, you will just make things
slower. The kernel will cache your index files.

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Thanks, but then how will MMapDirectory help gain a speedup?

I will try tmpfs and see what happens. I was expecting an order of
magnitude of speedup over the already very fast on-disk Lucene indexes.

So I was expecting a really, really fast response with MMapDirectory.

Thanks


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Speedup over what? You are probably already using MMapDirectory (it is the
default). So I don't know what you are trying to achieve, but giving lots
of memory to your Java process is not going to help.

If you just want to prevent the first few queries to a fresh cold machine
instance from being slow, you can use the preload for that before you make
it available. You could also use 'cat' or 'dd'.

Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Thanks, but for each different query I see some slowdown (not much)
with both MMapDirectory and FSDirectory.

It is a little more with FSDirectory. So MMapDirectory is slightly
better in that respect too, i.e., cold start.


What I want to achieve (problem statement):

The base case is a disk-based Lucene index with FSDirectory.

The speedup case was supposed to be an in-memory Lucene index with
MMapDirectory.


Uwe mentioned tmpfs will help. I will try that next.

Thanks


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
(edited previous response)


Thanks, but for each different query I see some slowdown (not much) on the
first run compared with the second and third runs, with both MMapDirectory
and FSDirectory (due to cold start).

The cold start slowdown is a little more with FSDirectory. So
MMapDirectory is slightly better in that respect too, i.e., cold start.


What I want to achieve (problem statement):

The base case is a disk-based Lucene index with FSDirectory.

The speedup case was supposed to be an in-memory Lucene index with
MMapDirectory.


Uwe mentioned tmpfs will help. I will try that next.


I thought preload was not helping much, as we discussed here.

Thanks


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
On Tue, Feb 23, 2021 at 4:07 PM <baris.kazar@oracle.com> wrote:

> What i want to achieve: Problem statement:
>
> base case is disk based Lucene index with FSDirectory
>
> speedup case was supposed to be in memory Lucene index with MMapDirectory
>
On 64-bit systems, FSDirectory.open() already gives you an MMapDirectory. So
you don't need to do anything.

Either way, MMapDirectory and NIOFSDirectory are doing the same thing:
reading your index as a normal file and letting the operating system cache
it.
MMapDirectory is just better because it avoids some overheads, such as the
read() system call, copying and buffering into Java memory space, etc.
Some of these overheads are only getting worse, e.g. spectre/meltdown-type
fixes make syscalls 8x slower on my computer. So it is good that
MMapDirectory avoids them.

So I suggest you just stop fighting the operating system: don't give your
J2EE container huge amounts of RAM, and let the kernel do its job.
If you want to "warm" a cold system because nothing is in the kernel's cache,
then look into preload and so on. It is just "reading files" to get them
cached.
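
To make the difference concrete, here is a rough JDK-only sketch (not Lucene code) of the two access styles. sumViaRead() goes through read() calls that copy bytes into a heap buffer, roughly the NIOFSDirectory style; sumViaMmap() reads straight from the mapping, roughly the MMapDirectory style. The class and method names are hypothetical, and the mmap variant assumes a file smaller than 2 GiB. In both cases the file contents live in the OS page cache, not in the Java heap.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadVsMmap {

    // read() style: every refill is a system call that copies bytes from
    // the OS page cache into a buffer on the Java heap.
    static long sumViaRead(Path file) throws IOException {
        long sum = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer heap = ByteBuffer.allocate(8192);
            while (ch.read(heap) != -1) {
                heap.flip();
                while (heap.hasRemaining()) {
                    sum += heap.get() & 0xFF;
                }
                heap.clear();
            }
        }
        return sum;
    }

    // mmap style: the file is mapped into the address space once, then
    // bytes are read directly from the mapping without read() calls.
    // Assumption: the file is smaller than 2 GiB.
    static long sumViaMmap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-vs-mmap", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, 4, 5});
        System.out.println("read(): " + sumViaRead(file));
        System.out.println("mmap:   " + sumViaMmap(file));
    }
}
```

Both methods see the same bytes; the only difference is how the bytes reach the Java code, which is why a bigger heap does not help either of them.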
Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
Thanks again, Robert. Could you please explain "preload"? Which
functionality is that? We discussed a preload earlier in this thread.

Is there a Lucene URL / site that I can look at for preload?

Thanks for the explanations. I believe this thread will be useful for many
folks.

Best regards


Re: MMapDirectory vs In Memory Lucene Index (i.e., ByteBuffersDirectory) [ In reply to ]
The preload isn't magical.
It only "reads in the whole file" to get it cached, same as if you did that
yourself with 'cat' or 'dd'.
It "warms" the file.

It just does this in an efficient way at the low level to make the warming
itself efficient. It madvise()s the kernel to announce some read-ahead and
then reads the first byte of every mmap'd page (which is enough to fault it
in).

At the end of the day it doesn't matter if you wrote a shitty shell script
that uses 'dd' to read in each index file and send it to /dev/null, or
whether you spent lots of time writing fancy java code to call this preload
thing: you get the same result, same end state.

Maybe the preload takes 18 seconds to "warm" the index, vs. your crappy
shell script which takes 22 seconds. It is mainly more important for
servers and portability (e.g. it will work fine on windows, but obviously
will not call madvise).
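
As a hand-rolled sketch of both warming styles in plain Java: touchPages() reads the first byte of every page of a mapping, and readThrough() reads the file from start to end and throws the bytes away, like 'dd' to /dev/null. Everything named here (WarmFile, touchPages, readThrough, PAGE_SIZE) is hypothetical. The 4096-byte page size is an assumption, since the real value is OS dependent, and there is no madvise() step because plain Java cannot call it directly. The mapping variant also assumes a file smaller than 2 GiB.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WarmFile {

    // Assumed page size; the real value depends on the operating system.
    static final int PAGE_SIZE = 4096;

    // Keeps the touched bytes "used" so the reads are not optimized away.
    static byte sink;

    // Reads the first byte of every page of the mapping, which is enough
    // to fault each page in. Returns the number of pages touched.
    static int touchPages(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int pages = 0;
            for (long pos = 0; pos < buf.capacity(); pos += PAGE_SIZE) {
                sink ^= buf.get((int) pos);
                pages++;
            }
            return pages;
        }
    }

    // The 'cat' / 'dd' approach: read the whole file sequentially and
    // discard the bytes. Returns the number of bytes read.
    static long readThrough(Path file) throws IOException {
        long total = 0;
        byte[] chunk = new byte[1 << 16];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("warm-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[10_000]);
        System.out.println("pages touched: " + touchPages(file));
        System.out.println("bytes read:    " + readThrough(file));
    }
}
```

Either way the end state is the same: the file's pages are in the kernel's cache, as long as there is enough free RAM to keep them there.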

