Preventing field data from being loaded into page cache
Is there any way to keep field data files out of the operating system's
page cache? We only use the .fdt files for highlighting and don't need to keep them warm
in memory. From what I understand, the operating system is in control of
what files get loaded into the page cache. Does Lucene have any mechanisms
to explicitly prevent them from being cached? Is it even possible with
Java?

Thanks,
Justin Borromeo
Re: Preventing field data from being loaded into page cache
Hi,

There is a workaround: DirectIODirectory from the lucene-misc module. You can
subclass it and override the useDirectIO() method so that it returns true only
for .fdt files. It wraps another FSDirectory (e.g. MMapDirectory) and delegates
all operations to it, except for files where useDirectIO() returns true; for
those it uses its own IndexInput implementation:

https://github.com/apache/lucene/blob/90f8bac9f75df88fed387d5b9f2b0ee387604387/lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java#L160-L164

By default, it uses direct I/O only for merges, so that merging index
segments does not pollute the page cache.
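
A minimal sketch of the subclass described above, assuming Lucene 9.x, where
useDirectIO(String name, IOContext context, OptionalLong fileLength) is a
protected method of DirectIODirectory (please check the Javadoc of the version
you run). The class name StoredFieldsDirectIODirectory and the helper
isStoredFieldsData are made up for illustration. One caveat: segments written
as compound files store their stored fields inside the .cfs file, which is
opened under the .cfs name, so a check on the .fdt extension does not catch
them.

```java
import java.io.IOException;
import java.util.OptionalLong;
import org.apache.lucene.misc.store.DirectIODirectory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IOContext;

/**
 * Sketch (hypothetical class name): bypass the OS page cache for stored-fields
 * data files (.fdt) and keep DirectIODirectory's default policy for all other
 * files. Assumes Lucene 9.x with the lucene-misc module on the classpath.
 */
public class StoredFieldsDirectIODirectory extends DirectIODirectory {

  public StoredFieldsDirectIODirectory(FSDirectory delegate) throws IOException {
    super(delegate);
  }

  /** Hypothetical helper: true for stored-fields data files such as "_0.fdt". */
  static boolean isStoredFieldsData(String name) {
    return name.endsWith(".fdt");
  }

  @Override
  protected boolean useDirectIO(String name, IOContext context, OptionalLong fileLength) {
    // Always use direct I/O for .fdt files; for every other file defer to the
    // default policy, which uses direct I/O only when merging.
    return isStoredFieldsData(name) || super.useDirectIO(name, context, fileLength);
  }
}
```

It can then replace the plain MMapDirectory wherever the index is opened, e.g.
DirectoryReader.open(new StoredFieldsDirectIODirectory(new MMapDirectory(indexPath))).
Direct I/O may not be available on every platform or filesystem, so it is
worth trying on the target machines first.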

Uwe

On 21.10.2023 at 01:54, Justin Borromeo wrote:
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Preventing field data from being loaded into page cache
Justin,

Be aware that relying too heavily on DirectIODirectory can lead to other
problems, which vary with the amount of data, access patterns, resource
utilization, etc. I have seen issues with it in a few places. If you are
running a fork, the problems can be even more pronounced and less well
understood. You will know best whether the workaround suits your setup, and it
probably will, because Uwe is usually right. If you need to free up page
cache, I completely understand.

My advice is to keep an eye on the CPU utilization delta as you work
through this change.
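
One rough way to measure that delta is to run the same workload (for example a
batch of highlighting requests) once against the old Directory and once
against the direct-I/O one, and compare the CPU time spent. The sketch below
uses the standard ThreadMXBean API; CpuDelta and cpuNanos are hypothetical
names, and it assumes the workload runs on the calling thread, so CPU used by
other threads (such as a search executor) is not counted.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Sketch (hypothetical names): CPU time the current thread spends in a workload. */
public class CpuDelta {

  /** Returns the CPU nanoseconds (user + system) consumed by workload on this thread. */
  static long cpuNanos(Runnable workload) {
    ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    if (!threads.isCurrentThreadCpuTimeSupported()) {
      throw new UnsupportedOperationException("thread CPU time not supported on this JVM");
    }
    long before = threads.getCurrentThreadCpuTime();
    workload.run();
    return threads.getCurrentThreadCpuTime() - before;
  }

  public static void main(String[] args) {
    // Stand-in workload; replace it with the real highlighting requests and
    // run it once per Directory implementation.
    long nanos = cpuNanos(() -> {
      long sum = 0;
      for (int i = 0; i < 50_000_000; i++) {
        sum += i;
      }
      if (sum == 42) {
        System.out.println(sum); // keeps the loop from being optimized away
      }
    });
    System.out.printf("CPU time: %.1f ms%n", nanos / 1e6);
  }
}
```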

Marcus

On Sat, Oct 21, 2023 at 4:36 AM Uwe Schindler <uwe@thetaphi.de> wrote:

--
Marcus Eagan