Mailing List Archive

Lucene 8 causing app server threads to hang due to high rate of network usage
Hello,

My name is Kathleen Hilston, and I am a Software Engineer Sr working for Snap-on Business Solutions (SBS).

We hope you can help us with a problem that we are facing.

Issue: Lucene 8 causing app server threads to hang due to high rate of network usage.

Further details: Recently we migrated from Lucene 7.5.0 to Lucene 8.6.3, and we have encountered severe performance issues since the upgrade. Our Lucene index contains multilingual terms, is large, and is hosted on network file storage (EFS at AWS). Our Lucene queries construct many Boolean term queries, and we suspect the off-heap FST introduced with Lucene 8 could be the root cause. The specific issue is that, when a user searches for any given term, the Tomcat server thread hangs while reading bytes from an unexpectedly large inbound flow of data from the Lucene index on network storage. We have seen inbound data flows ranging from 5% to 45% of the total index size for a single search, primarily when searching for a term in a different language. This issue does not occur with Lucene 7.

Here is a typical call stack highlighting the point of contention in the Tomcat threads when we encounter this performance issue:

org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:432)
org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:421)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:574)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:445)
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:658)
org.apache.lucene.search.BooleanWeight.bulkScorer(BooleanWeight.java:330)
org.apache.lucene.search.Weight.bulkScorer(Weight.java:181)
org.apache.lucene.search.BooleanWeight.scorer(BooleanWeight.java:344)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
org.apache.lucene.search.BooleanWeight.scorerSupplier(BooleanWeight.java:379)
org.apache.lucene.search.Weight.scorerSupplier(Weight.java:147)
org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:115)
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.impacts(SegmentTermsEnum.java:1017)
org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.impacts(Lucene84PostingsReader.java:272)
org.apache.lucene.codecs.lucene84.Lucene84PostingsReader$BlockImpactsDocsEnum.<init>(Lucene84PostingsReader.java:1061)
org.apache.lucene.codecs.lucene84.Lucene84SkipReader.init(Lucene84SkipReader.java:103)
org.apache.lucene.codecs.MultiLevelSkipListReader.init(MultiLevelSkipListReader.java:208)
org.apache.lucene.codecs.MultiLevelSkipListReader.loadSkipLevels(MultiLevelSkipListReader.java:229)
org.apache.lucene.store.DataInput.readVLong(DataInput.java:190)
org.apache.lucene.store.DataInput.readVLong(DataInput.java:205)
org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:80)
org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:99)
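
For readers following the last frames of this stack: readVLong decodes a variable-length long by pulling one byte at a time from the index input, and ByteBufferIndexInput is the input used for memory-mapped files. The sketch below is a standalone illustration of that byte-at-a-time decoding (7 data bits per byte, high bit as a continuation flag). The class and method names are hypothetical and this is not Lucene's code. It is only meant to suggest why a thread can appear parked in getByte: if the mapped page is not resident, that single read has to wait for the kernel to fetch it from the backing file system.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch only: hypothetical class, not part of Lucene. */
final class VLongSketch {

    /** Writes a non-negative value as 7-bit groups, low-order group first. */
    static void writeVLong(ByteBuffer out, long value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        while ((value & ~0x7FL) != 0) {
            out.put((byte) ((value & 0x7F) | 0x80)); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.put((byte) value); // last byte has the high bit clear
    }

    /** Reads the value back one byte at a time, as the stack frames above do. */
    static long readVLong(ByteBuffer in) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = in.get(); // on a mapped buffer, this read may page-fault
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("malformed variable-length long");
    }
}
```

With a heap buffer from ByteBuffer.allocate the reads never block; the same loop over a buffer returned by FileChannel.map can stall on any byte that lands on a page not yet in memory.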

While researching, we found the Lucene JIRA issue LUCENE-8635 <https://issues.apache.org/jira/browse/LUCENE-8635> (referenced in the section 'Moving the terms dictionary off-heap' of https://www.elastic.co/blog/whats-new-in-lucene-8). Would this help with the issue?

Please advise.

Thank you

Kathleen Hilston | Software Engineer Sr

Snap-on Business Solutions


4025 Kinross Lakes Parkway | Richfield, OH 44286

Office: 330-659-1818

kathleen.hilston@snapon.com
Re: Lucene 8 causing app server threads to hang due to high rate of network usage
Don't use filesystems such as NFS (that is what EFS is) with lucene! This
is really bad design, and it is the root cause of your issue.

On Tue, Apr 27, 2021 at 1:21 PM Hilston, Kathleen <
Kathleen.Hilston@snapon.com> wrote:

> *Issue*: Lucene 8 causing app server threads to hang due to high rate of
> network usage.
Re: Lucene 8 causing app server threads to hang due to high rate of network usage
Hello,

I am also keenly interested in Lucene performance on NFS/EFS. I have extensive experience with another (proprietary) search engine that successfully uses NFS for indexing and search. In our case, the key has always been making sure that a large portion of the index is in the host page cache, and limiting the impact of indexing/merging on cache invalidation, warmup, etc. Yes, there will always be scenarios where a specific query suddenly causes many hard page faults, in which case careful attention to NFS client performance tuning (attribute caching, rsize, etc.) as well as kernel tuning can help. Whether or not the file is mmapped also makes a difference: in our experience, an mmapped file triggers larger read-ahead in the kernel and so can put a higher burden on the NFS server. Generally the spikes can be tolerated, with few queries affected, as long as many servers don't hit NFS all at once, essentially causing a DDoS scenario.

So I am very interested in your response that having Lucene on EFS is bad design. There are many advantages to this architecture, since it completely decouples the index data storage from the servers hosting the search service. Is there some specific aspect of Lucene that makes this approach impractical? Are there specific features in Lucene 8 that are more likely to affect this? I don't have a ton of experience with Lucene, but I have seen behavior where the query cache can cause spikes because it caches subquery results. If the subquery is very broad, it could potentially overload the NFS server (this was in Lucene 6, though, so take it with a grain of salt).
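
As a rough illustration of the page-cache warm-up mentioned above, here is a minimal sketch that reads every file in an index directory sequentially before the server starts taking queries. It uses only the JDK; the class name PageCacheWarmer and the 1 MiB buffer size are my own choices, not anything from Lucene. Whether the kernel actually keeps those pages resident depends on available memory and on NFS client settings, so treat it as a starting point rather than a fix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: touches index files so the OS can cache their pages. */
final class PageCacheWarmer {

    /** Sequentially reads each regular file directly under dir; returns bytes read. */
    static long warm(Path dir) {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB chunks (arbitrary)
        long total = 0;
        for (Path file : files) {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                buf.clear();
                int n;
                while ((n = ch.read(buf)) != -1) {
                    total += n;  // the data is discarded; only the read matters
                    buf.clear();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return total;
    }
}
```

Comparing the returned byte count against free memory on the host gives a quick sense of whether the whole index can stay cached at all. Running this on many servers at the same moment would recreate the "everyone hits NFS at once" problem, so staggering it seems wise.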

To the OP: I don't have 8.6 code, but based on 8.8 source, I believe you may be looking at the wrong thread. The thread that is actually doing the reads would be running in the executor pool.
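
One way to check which threads are really inside the store-level read code is to scan all live stacks from within the JVM. The sketch below uses only Thread.getAllStackTraces() from the JDK; the class name StoreReadThreads is hypothetical, and the package prefix you pass in (for example "org.apache.lucene.store.", taken from the frames in the original post) is up to you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical diagnostic helper, not part of Lucene. */
final class StoreReadThreads {

    /** True if any frame's class name starts with classPrefix. */
    static boolean hasFrameIn(StackTraceElement[] stack, String classPrefix) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().startsWith(classPrefix)) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads whose current stack has a frame under classPrefix. */
    static List<String> threadsIn(String classPrefix) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (hasFrameIn(e.getValue(), classPrefix)) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }
}
```

Calling StoreReadThreads.threadsIn("org.apache.lucene.store.") a few times while a slow search is in flight should show whether the Tomcat request threads, executor threads, or both are doing the reads. A jstack dump filtered the same way would give equivalent information.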

Thanks
Andrei

> On 04/28/2021 6:43 AM Robert Muir <rcmuir@gmail.com> wrote:
>
>
> Don't use filesystems such as NFS (that is what EFS is) with lucene! This
> is really bad design, and it is the root cause of your issue.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org