Mailing List Archive

Loading WFST to Memory Mapped File in Lucene
Hey all!

I'm loading multiple WFSTs (~1.1 GB) and the JVM memory usage increases
proportionally. It looks like the file is held in memory, meaning memory
mapped files are not used at all.

Example code:

In the following code we set up Lucene to use /tmp/deleteme2 for the
memory mapped file, and we load the file from /tmp/deleteme/file.wfst via an
InputStream.

After loading the file, I list the files in /tmp/deleteme2 and nothing is
found, yet I'm still able to query the WFST.

@Test
void WFSTLoad() throws IOException {
    // Directory handed to WFSTCompletionLookup (expected to be memory mapped)
    Path wfstPath = Paths.get("/tmp/deleteme2");
    // Previously stored WFST file
    Path wfstFilePath = Paths.get("/tmp/deleteme/file.wfst");

    var directory = new MMapDirectory(wfstPath);

    WFSTCompletionLookup wfst =
        new WFSTCompletionLookup(directory, "temp");

    try (var is = new FileInputStream(wfstFilePath.toFile())) {
        wfst.load(is);
        System.out.println("FILE LOADED");
    }

    // Files.list returns a stream that holds a directory handle; close it
    try (var files = Files.list(wfstPath)) {
        files.forEach(System.out::println);
    }
    System.out.println("FILES LISTED");

    assertThat(wfst.get("qwert123qwert")).isEqualTo(123);
}

What am I doing wrong?

Thanks for the support

Best Regards
Marcos Rebelo

--

*Marcos Bruno Gomes Rebelo Engineering Manager / Data Scientist / Software
Engineer*
Linkedin: https://www.linkedin.com/in/oleber/
*Adding value to your data. Specialized in Search and Recommendation
Systems*
Technologies: Elastic, Spark, Scala, Jupyter Notebook, Python, ...
Re: Loading WFST to Memory Mapped File in Lucene
Looking at the code briefly, I think WFSTCompletionLookup uses an on-heap
store for the FST. You'd have to load it with an off-heap FST store instead:

https://github.com/apache/lucene/blob/1b9d98d6ec079e950bdd37137082f81400d3bc2e/lucene/core/src/java/org/apache/lucene/util/fst/OffHeapFSTStore.java

but I don't think there is an API in WFSTCompletionLookup that would allow
you to do that.
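
For readers following the thread, the difference between the two loading
paths can be illustrated with the JDK alone. This is only a sketch, not
Lucene's code: the class name HeapVsMapped and both helper methods are
made up for illustration. Reading through an InputStream copies every byte
into a byte[] on the Java heap, while FileChannel.map returns a direct
buffer backed by the OS page cache, which is the kind of access an off-heap
store would rely on.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; the names are illustrative, not part of Lucene.
public class HeapVsMapped {

    // Reads the whole file through an InputStream.
    // The returned byte[] lives on the Java heap, so heap usage grows with file size.
    public static byte[] loadOnHeap(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readAllBytes();
        }
    }

    // Maps the file read-only. The contents are paged in by the OS on demand
    // and are not copied into the Java heap.
    // A single mapping is limited to Integer.MAX_VALUE bytes.
    public static MappedByteBuffer mapOffHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("heap-vs-mapped", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {1, 2, 3, 4});

        byte[] heapCopy = loadOnHeap(file);
        MappedByteBuffer mapped = mapOffHeap(file);

        System.out.println("heap copy length: " + heapCopy.length);
        System.out.println("mapped buffer is direct: " + mapped.isDirect());
        System.out.println("first bytes match: " + (heapCopy[0] == mapped.get(0)));
    }
}
```

The test in the original message goes through the first path
(wfst.load(is) with a FileInputStream), which is consistent with the heap
growth reported there.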

D.

On Fri, Dec 23, 2022 at 5:00 PM marcos rebelo <oleber@gmail.com> wrote:

> Hey all!
>
> I'm loading multiple WFST with ~1.1 Gb and the JVM memory increases
> proportionally. Looks like the file is stored in memory, meaning not using
> Memory Mapped Files at all.
Re: Loading WFST to Memory Mapped File in Lucene
I have the same impression: even though I'm using MMapDirectory, the data
is on the heap.

For my use case, it's a huge waste of memory :( 90% of my data could be
properly organised and kept on disk.

Thanks for the support

Best regards
Marcos Rebelo

On Tue, 27 Dec 2022, 09:11 Dawid Weiss, <dawid.weiss@gmail.com> wrote:

> Looking at the code briefly, I think WFSTCompletionLookup uses on heap
> store for the fst. You'd have to load it with off heap fst store instead:
>
>
> https://github.com/apache/lucene/blob/1b9d98d6ec079e950bdd37137082f81400d3bc2e/lucene/core/src/java/org/apache/lucene/util/fst/OffHeapFSTStore.java
>
> but I don't think there is an API in WFSTCompletionLookup that would allow
> you to do that.
>
> D.
Re: Loading WFST to Memory Mapped File in Lucene
Please feel free to provide a pull request that adds the ability to
load the FST off heap to WFSTCompletionLookup. I think it's an
oversight and it'd be a good addition.

Dawid

On Tue, Dec 27, 2022 at 10:35 AM marcos rebelo <oleber@gmail.com> wrote:
>
> I have the same impression, even if I'm using the MMapDirectory. The data
> is on heap.
>
> For my use case, it's a huge waste of memory :( 90% of my data could be
> correctly organised and kept in disk.
