Mailing List Archive

GC cost of creating String resource description on IndexInput clone
Hello everyone,

We recently added Java Flight Recorder (JFR) to our internal benchmarking.
While looking through top heap allocations from JFR output, we noticed the
following to be in top 5 contributors to garbage creation.

org.apache.lucene.store.IndexInput#getFullSliceDescription()
at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
at org.apache.lucene.store.ByteBufferIndexInput#slice()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
at org.apache.lucene.store.IndexInput#randomAccessSlice()

It seems that we construct a new String resource description for each
clone/slice of IndexInput instance. Resource description for a clone/slice
instance is a String concatenation of resource description from original
instance and current slice description. Also, clone/slice can happen
multiple times per query per segment.

Can we avoid upfront String allocation by late-evaluating
IndexInput.toString() on the cloned instance? One approach could be to hold
a reference of the base IndexInput instance in the cloned instance. Then
while evaluating IndexInput.toString() on a cloned instance, we can
construct a resource description by concatenating base instance's
toString() with
current resource description. My understanding is that IndexInput.toString()
is primarily used for debugging purposes hence we can benefit from the
late-evaluation.

With this approach, we are seeing sustainable GC time reduction of ~6% and
gain of ~1% to red-line QPS (throughput). My intention to start this thread
is to collect feedback on this approach as well as to discuss any other
ideas.

Thanks,
Viral Gandhi
Re: GC cost of creating String resource description on IndexInput clone [ In reply to ]
Late evaluation sounds like it would definitely be nice, but I worry about
holding on to object instances longer than necessary might lead to memory
leaks. Sounds like a good issue to open on JIRA.

On Fri, Feb 19, 2021 at 3:58 PM Viral Gandhi <viral.dev111@gmail.com> wrote:

> Hello everyone,
>
> We recently added Java Flight Recorder (JFR) to our internal benchmarking.
> While looking through top heap allocations from JFR output, we noticed the
> following to be in top 5 contributors to garbage creation.
>
> org.apache.lucene.store.IndexInput#getFullSliceDescription()
> at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
> at org.apache.lucene.store.ByteBufferIndexInput#slice()
> at
> org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
> at org.apache.lucene.store.IndexInput#randomAccessSlice()
>
> It seems that we construct a new String resource description for each
> clone/slice of IndexInput instance. Resource description for a clone/slice
> instance is a String concatenation of resource description from original
> instance and current slice description. Also, clone/slice can happen
> multiple times per query per segment.
>
> Can we avoid upfront String allocation by late-evaluating
> IndexInput.toString() on the cloned instance? One approach could be to
> hold a reference of the base IndexInput instance in the cloned instance.
> Then while evaluating IndexInput.toString() on a cloned instance, we can
> construct a resource description by concatenating base instance's
> toString() with current resource description. My understanding is that
> IndexInput.toString() is primarily used for debugging purposes hence we
> can benefit from the late-evaluation.
>
> With this approach, we are seeing sustainable GC time reduction of ~6% and
> gain of ~1% to red-line QPS (throughput). My intention to start this thread
> is to collect feedback on this approach as well as to discuss any other
> ideas.
>
> Thanks,
> Viral Gandhi
>
Re: GC cost of creating String resource description on IndexInput clone [ In reply to ]
The issue is that clone or not, they are both IndexInput.java. So if we go
with your proposal, then *sometimes* the code will have this reference and
*other times* it won't and the reference will be null. In that non-clone
case, where would its resource description (filename) come from? I predict
too many bugs.

Making this part of the code complex would be the wrong tradeoff for 1%
performance.


On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev111@gmail.com> wrote:

> Hello everyone,
>
> We recently added Java Flight Recorder (JFR) to our internal benchmarking.
> While looking through top heap allocations from JFR output, we noticed the
> following to be in top 5 contributors to garbage creation.
>
> org.apache.lucene.store.IndexInput#getFullSliceDescription()
> at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
> at org.apache.lucene.store.ByteBufferIndexInput#slice()
> at
> org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
> at org.apache.lucene.store.IndexInput#randomAccessSlice()
>
> It seems that we construct a new String resource description for each
> clone/slice of IndexInput instance. Resource description for a clone/slice
> instance is a String concatenation of resource description from original
> instance and current slice description. Also, clone/slice can happen
> multiple times per query per segment.
>
> Can we avoid upfront String allocation by late-evaluating
> IndexInput.toString() on the cloned instance? One approach could be to
> hold a reference of the base IndexInput instance in the cloned instance.
> Then while evaluating IndexInput.toString() on a cloned instance, we can
> construct a resource description by concatenating base instance's
> toString() with current resource description. My understanding is that
> IndexInput.toString() is primarily used for debugging purposes hence we
> can benefit from the late-evaluation.
>
> With this approach, we are seeing sustainable GC time reduction of ~6% and
> gain of ~1% to red-line QPS (throughput). My intention to start this thread
> is to collect feedback on this approach as well as to discuss any other
> ideas.
>
> Thanks,
> Viral Gandhi
>
RE: GC cost of creating String resource description on IndexInput clone [ In reply to ]
Hi,



I was thinking about something similar when working on the Java 17 replacement for MMapDirectory. My idea would still be late evaluation, but using a lambda. Same way how you can do lazy evaluation in Log4j2 when logging with debug/trace/…



Instead of having “String resourceDescription“ as a field in each Indexinput, we can use “Supplier<String> resourceDescription” and initialize it like: () -> foo + bar + parent.toString +…



Just because this might be an option to do, I fully agree with Mike Drob and Robert Muir: The risk of memory leaks for such a small improvement is too high. It also depends on the GC you use. If you use old ConcMarkSweepGC, I agree this is an overhead, but nowadays, Lucene should use G1GC.



In addition, the strings are not for debugging, they are really useful when something goes wrong, e.g. when I/O errors occur.



Uwe



-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: uwe@thetaphi.de



From: Robert Muir <rcmuir@gmail.com>
Sent: Saturday, February 20, 2021 11:56 AM
To: dev@lucene.apache.org
Subject: Re: GC cost of creating String resource description on IndexInput clone



The issue is that clone or not, they are both IndexInput.java. So if we go with your proposal, then *sometimes* the code will have this reference and *other times* it won't and the reference will be null. In that non-clone case, where would its resource description (filename) come from? I predict too many bugs.



Making this part of the code complex would be the wrong tradeoff for 1% performance.





On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev111@gmail.com <mailto:viral.dev111@gmail.com> > wrote:

Hello everyone,

We recently added Java Flight Recorder (JFR) to our internal benchmarking. While looking through top heap allocations from JFR output, we noticed the following to be in top 5 contributors to garbage creation.

org.apache.lucene.store.IndexInput#getFullSliceDescription()
at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
at org.apache.lucene.store.ByteBufferIndexInput#slice()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
at org.apache.lucene.store.IndexInput#randomAccessSlice()

It seems that we construct a new String resource description for each clone/slice of IndexInput instance. Resource description for a clone/slice instance is a String concatenation of resource description from original instance and current slice description. Also, clone/slice can happen multiple times per query per segment.

Can we avoid upfront String allocation by late-evaluating IndexInput.toString() on the cloned instance? One approach could be to hold a reference of the base IndexInput instance in the cloned instance. Then while evaluating IndexInput.toString() on a cloned instance, we can construct a resource description by concatenating base instance's toString() with current resource description. My understanding is that IndexInput.toString() is primarily used for debugging purposes hence we can benefit from the late-evaluation.

With this approach, we are seeing sustainable GC time reduction of ~6% and gain of ~1% to red-line QPS (throughput). My intention to start this thread is to collect feedback on this approach as well as to discuss any other ideas.



Thanks,

Viral Gandhi
RE: GC cost of creating String resource description on IndexInput clone [ In reply to ]
Hi,



On top of that: You did not prove that the resourceDescription String in IndexInput is the main GC problem. I would suspect that actually the duplicated ByteBuffers are the main contributor to GC; those are unavoidable! The String is produced together with the ByteBuffer duplicates, so I would say: you can’t figure out from JFR what’s the main contributor (the small strings in between should be optimized away anyways).



Uwe



-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: uwe@thetaphi.de



From: Uwe Schindler <uwe@thetaphi.de>
Sent: Saturday, February 20, 2021 12:17 PM
To: 'dev@lucene.apache.org' <dev@lucene.apache.org>
Subject: RE: GC cost of creating String resource description on IndexInput clone



Hi,



I was thinking about something similar when working on the Java 17 replacement for MMapDirectory. My idea would still be late evaluation, but using a lambda. Same way how you can do lazy evaluation in Log4j2 when logging with debug/trace/…



Instead of having “String resourceDescription“ as a field in each Indexinput, we can use “Supplier<String> resourceDescription” and initialize it like: () -> foo + bar + parent.toString +…



Just because this might be an option to do, I fully agree with Mike Drob and Robert Muir: The risk of memory leaks for such a small improvement is too high. It also depends on the GC you use. If you use old ConcMarkSweepGC, I agree this is an overhead, but nowadays, Lucene should use G1GC.



In addition, the strings are not for debugging, they are really useful when something goes wrong, e.g. when I/O errors occur.



Uwe



-----

Uwe Schindler

Achterdiek 19, D-28357 Bremen

https://www.thetaphi.de

eMail: uwe@thetaphi.de <mailto:uwe@thetaphi.de>



From: Robert Muir <rcmuir@gmail.com <mailto:rcmuir@gmail.com> >
Sent: Saturday, February 20, 2021 11:56 AM
To: dev@lucene.apache.org <mailto:dev@lucene.apache.org>
Subject: Re: GC cost of creating String resource description on IndexInput clone



The issue is that clone or not, they are both IndexInput.java. So if we go with your proposal, then *sometimes* the code will have this reference and *other times* it won't and the reference will be null. In that non-clone case, where would its resource description (filename) come from? I predict too many bugs.



Making this part of the code complex would be the wrong tradeoff for 1% performance.





On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev111@gmail.com <mailto:viral.dev111@gmail.com> > wrote:

Hello everyone,

We recently added Java Flight Recorder (JFR) to our internal benchmarking. While looking through top heap allocations from JFR output, we noticed the following to be in top 5 contributors to garbage creation.

org.apache.lucene.store.IndexInput#getFullSliceDescription()
at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
at org.apache.lucene.store.ByteBufferIndexInput#slice()
at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
at org.apache.lucene.store.IndexInput#randomAccessSlice()

It seems that we construct a new String resource description for each clone/slice of IndexInput instance. Resource description for a clone/slice instance is a String concatenation of resource description from original instance and current slice description. Also, clone/slice can happen multiple times per query per segment.

Can we avoid upfront String allocation by late-evaluating IndexInput.toString() on the cloned instance? One approach could be to hold a reference of the base IndexInput instance in the cloned instance. Then while evaluating IndexInput.toString() on a cloned instance, we can construct a resource description by concatenating base instance's toString() with current resource description. My understanding is that IndexInput.toString() is primarily used for debugging purposes hence we can benefit from the late-evaluation.

With this approach, we are seeing sustainable GC time reduction of ~6% and gain of ~1% to red-line QPS (throughput). My intention to start this thread is to collect feedback on this approach as well as to discuss any other ideas.



Thanks,

Viral Gandhi
Re: GC cost of creating String resource description on IndexInput clone [ In reply to ]
+1 to explore Uwe's lambda idea. I really hate lambdas (you should too, if
you know what happens behind the scenes), but this sounds like maybe an
actual justified use of lambdas, taking advantage of the capture facility
and not using them just to be "cute"

On Sat, Feb 20, 2021 at 6:17 AM Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
>
>
> I was thinking about something similar when working on the Java 17
> replacement for MMapDirectory. My idea would still be late evaluation, but
> using a lambda. Same way how you can do lazy evaluation in Log4j2 when
> logging with debug/trace/…
>
>
>
> Instead of having “String resourceDescription“ as a field in each
> Indexinput, we can use “Supplier<String> resourceDescription” and
> initialize it like: () -> foo + bar + parent.toString +…
>
>
>
> Just because this might be an option to do, I fully agree with Mike Drob
> and Robert Muir: The risk of memory leaks for such a small improvement is
> too high. It also depends on the GC you use. If you use old
> ConcMarkSweepGC, I agree this is an overhead, but nowadays, Lucene should
> use G1GC.
>
>
>
> In addition, the strings are not for debugging, they are really useful
> when something goes wrong, e.g. when I/O errors occur.
>
>
>
> Uwe
>
>
>
> -----
>
> Uwe Schindler
>
> Achterdiek 19, D-28357 Bremen
>
> https://www.thetaphi.de
>
> eMail: uwe@thetaphi.de
>
>
>
> *From:* Robert Muir <rcmuir@gmail.com>
> *Sent:* Saturday, February 20, 2021 11:56 AM
> *To:* dev@lucene.apache.org
> *Subject:* Re: GC cost of creating String resource description on
> IndexInput clone
>
>
>
> The issue is that clone or not, they are both IndexInput.java. So if we go
> with your proposal, then *sometimes* the code will have this reference and
> *other times* it won't and the reference will be null. In that non-clone
> case, where would its resource description (filename) come from? I predict
> too many bugs.
>
>
>
> Making this part of the code complex would be the wrong tradeoff for 1%
> performance.
>
>
>
>
>
> On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev111@gmail.com>
> wrote:
>
> Hello everyone,
>
> We recently added Java Flight Recorder (JFR) to our internal benchmarking.
> While looking through top heap allocations from JFR output, we noticed the
> following to be in top 5 contributors to garbage creation.
>
> org.apache.lucene.store.IndexInput#getFullSliceDescription()
> at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
> at org.apache.lucene.store.ByteBufferIndexInput#slice()
> at
> org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
> at org.apache.lucene.store.IndexInput#randomAccessSlice()
>
> It seems that we construct a new String resource description for each
> clone/slice of IndexInput instance. Resource description for a clone/slice
> instance is a String concatenation of resource description from original
> instance and current slice description. Also, clone/slice can happen
> multiple times per query per segment.
>
> Can we avoid upfront String allocation by late-evaluating
> IndexInput.toString() on the cloned instance? One approach could be to
> hold a reference of the base IndexInput instance in the cloned instance.
> Then while evaluating IndexInput.toString() on a cloned instance, we can
> construct a resource description by concatenating base instance's
> toString() with current resource description. My understanding is that
> IndexInput.toString() is primarily used for debugging purposes hence we
> can benefit from the late-evaluation.
>
> With this approach, we are seeing sustainable GC time reduction of ~6% and
> gain of ~1% to red-line QPS (throughput). My intention to start this thread
> is to collect feedback on this approach as well as to discuss any other
> ideas.
>
>
>
> Thanks,
>
> Viral Gandhi
>
>
Re: GC cost of creating String resource description on IndexInput clone [ In reply to ]
Thank you for your detailed feedback! I will explore the lambda approach
and benchmark its benefits using our internal testing framework.

Viral Gandhi

On Sat, 20 Feb 2021 at 03:23, Robert Muir <rcmuir@gmail.com> wrote:

> +1 to explore Uwe's lambda idea. I really hate lambdas (you should too, if
> you know what happens behind the scenes), but this sounds like maybe an
> actual justified use of lambdas, taking advantage of the capture facility
> and not using them just to be "cute"
>
> On Sat, Feb 20, 2021 at 6:17 AM Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi,
>>
>>
>>
>> I was thinking about something similar when working on the Java 17
>> replacement for MMapDirectory. My idea would still be late evaluation, but
>> using a lambda. Same way how you can do lazy evaluation in Log4j2 when
>> logging with debug/trace/…
>>
>>
>>
>> Instead of having “String resourceDescription“ as a field in each
>> Indexinput, we can use “Supplier<String> resourceDescription” and
>> initialize it like: () -> foo + bar + parent.toString +…
>>
>>
>>
>> Just because this might be an option to do, I fully agree with Mike Drob
>> and Robert Muir: The risk of memory leaks for such a small improvement is
>> too high. It also depends on the GC you use. If you use old
>> ConcMarkSweepGC, I agree this is an overhead, but nowadays, Lucene should
>> use G1GC.
>>
>>
>>
>> In addition, the strings are not for debugging, they are really useful
>> when something goes wrong, e.g. when I/O errors occur.
>>
>>
>>
>> Uwe
>>
>>
>>
>> -----
>>
>> Uwe Schindler
>>
>> Achterdiek 19, D-28357 Bremen
>>
>> https://www.thetaphi.de
>>
>> eMail: uwe@thetaphi.de
>>
>>
>>
>> *From:* Robert Muir <rcmuir@gmail.com>
>> *Sent:* Saturday, February 20, 2021 11:56 AM
>> *To:* dev@lucene.apache.org
>> *Subject:* Re: GC cost of creating String resource description on
>> IndexInput clone
>>
>>
>>
>> The issue is that clone or not, they are both IndexInput.java. So if we
>> go with your proposal, then *sometimes* the code will have this reference
>> and *other times* it won't and the reference will be null. In that
>> non-clone case, where would its resource description (filename) come from?
>> I predict too many bugs.
>>
>>
>>
>> Making this part of the code complex would be the wrong tradeoff for 1%
>> performance.
>>
>>
>>
>>
>>
>> On Fri, Feb 19, 2021 at 4:58 PM Viral Gandhi <viral.dev111@gmail.com>
>> wrote:
>>
>> Hello everyone,
>>
>> We recently added Java Flight Recorder (JFR) to our internal
>> benchmarking. While looking through top heap allocations from JFR output,
>> we noticed the following to be in top 5 contributors to garbage creation.
>>
>> org.apache.lucene.store.IndexInput#getFullSliceDescription()
>> at org.apache.lucene.store.ByteBufferIndexInput#buildSlice()
>> at org.apache.lucene.store.ByteBufferIndexInput#slice()
>> at
>> org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#slice()
>> at org.apache.lucene.store.IndexInput#randomAccessSlice()
>>
>> It seems that we construct a new String resource description for each
>> clone/slice of IndexInput instance. Resource description for a clone/slice
>> instance is a String concatenation of resource description from original
>> instance and current slice description. Also, clone/slice can happen
>> multiple times per query per segment.
>>
>> Can we avoid upfront String allocation by late-evaluating
>> IndexInput.toString() on the cloned instance? One approach could be to
>> hold a reference of the base IndexInput instance in the cloned instance.
>> Then while evaluating IndexInput.toString() on a cloned instance, we can
>> construct a resource description by concatenating base instance's
>> toString() with current resource description. My understanding is that
>> IndexInput.toString() is primarily used for debugging purposes hence we
>> can benefit from the late-evaluation.
>>
>> With this approach, we are seeing sustainable GC time reduction of ~6%
>> and gain of ~1% to red-line QPS (throughput). My intention to start this
>> thread is to collect feedback on this approach as well as to discuss any
>> other ideas.
>>
>>
>>
>> Thanks,
>>
>> Viral Gandhi
>>
>>