Mailing List Archive

Benchmarking Lucene
Hi all,

I work for a JVM vendor, and we're interested in obtaining / creating
a set of Lucene benchmarks for internal use. We plan to use these for
performance regression testing and general performance analysis
(i.e. to make sure Lucene performs well on our JVM). I'm especially
interested in benchmarks that demonstrate opportunities for
improvements in our JIT compiler.

While I imagine that the lucene/benchmark/ directory is the right
place to start, I have a few high-level questions that are best
answered by people on this mailing list:

- Are there realistic Lucene workloads that are bottlenecked on the
JVM's performance (JIT, GC, etc.) and *not* e.g. disk / network IO?
If so, what are some examples?

- How relevant are the DaCapo "luindex" and "lusearch" benchmarks
today? Will porting them to the latest version of Lucene give me a
benchmark representative of modern Lucene usage, or have Lucene's
performance characteristics evolved in fundamental ways since DaCapo
was published?

- What is the distribution of Lucene versions in production
deployments? Do users tend to aggressively upgrade to the "latest
and greatest" Lucene version, or is there usually a non-trivial lag?

Any other information that you think is useful or relevant is
welcome.

Thanks!
-- Sanjoy
Re: Benchmarking Lucene
Which JVM vendor :) There are not so many, unfortunately...

I run nightly benchmarks for Lucene, which are visible at
https://people.apache.org/~mikemccand/lucenebench/

We use this to catch accidental performance regressions... the sources
for all of this are at https://github.com/mikemccand/luceneutil but
running them yourself can be tricky. They index and search
Wikipedia's English export.
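
To give a rough idea of the shape of that workload, here's a minimal
sketch against the stock Lucene API -- not luceneutil itself; the
document count, field name, and query are placeholders, and it assumes
a 5.x-era API where TopDocs.totalHits is a plain number:

    import java.nio.file.Paths;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class TinyLuceneBench {
      public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("/tmp/bench-index"));

        // Index phase: for small synthetic docs this is dominated by
        // analysis and postings construction, i.e. CPU and allocation.
        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        try (IndexWriter w = new IndexWriter(dir, iwc)) {
          for (int i = 0; i < 100_000; i++) {
            Document doc = new Document();
            doc.add(new TextField("body", "synthetic document text " + i,
                                  Field.Store.NO));
            w.addDocument(doc);
          }
          w.forceMerge(1); // a single segment keeps search timings stable
        }

        // Search phase: a fixed query repeated many times.
        try (DirectoryReader r = DirectoryReader.open(dir)) {
          IndexSearcher searcher = new IndexSearcher(r);
          long t0 = System.nanoTime();
          long hits = 0;
          for (int iter = 0; iter < 10_000; iter++) {
            hits += searcher.search(
                new TermQuery(new Term("body", "text")), 10).totalHits;
          }
          System.out.println(hits + " hits in "
              + (System.nanoTime() - t0) / 1e6 + " ms");
        }
      }
    }

luceneutil runs the same two phases at a much larger scale, against
the real Wikipedia export and many query types.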

Lucene is definitely JVM/GC bound in many cases, e.g. when the index
is "hot" (fully cached by the OS in free RAM).

I'm not familiar with DaCapo...

I'm not sure how aggressively users upgrade ... but I believe most
users use Lucene via Elasticsearch or Solr.

Mike McCandless

http://blog.mikemccandless.com


Re: Benchmarking Lucene
Michael McCandless wrote:
> Which JVM vendor :) There are not so many, unfortunately...

I work for Azul Systems (https://www.azul.com).

> I run nightly benchmarks for Lucene, which are visible at
> https://people.apache.org/~mikemccand/lucenebench/
>
> We use this to catch accidental performance regressions... the sources
> for all of this are at https://github.com/mikemccand/luceneutil but
> running them yourself can be tricky. They index and search
> Wikipedia's English export.

I was hoping to get hold of benchmarks that are a little more
"lightweight" -- something that I can run from beginning to end in <
30 minutes. Is there an interesting subset of the nightly tests that
I can run within that sort of timeframe?

Re: Benchmarking Lucene
> I work for Azul Systems (https://www.azul.com).

Ahem. A bit off topic.

Lucene's tests are known to crash bleeding-edge HotSpot releases
quite frequently. Since Zing is not available to us, it would be great
to have Azul run the Lucene test suite on its own JVM, so that we can
make sure everything works (for all parties involved).

Dawid