Hi all,
I work for a JVM vendor, and we're interested in obtaining / creating
a set of Lucene benchmarks for internal use. We plan to use these for
performance regression testing and general performance analysis
(i.e. to make sure Lucene performs well on our JVM). I'm especially
interested in benchmarks that demonstrate opportunities for
improvements in our JIT compiler.
While I imagine that the lucene/benchmark/ directory is probably the
right place to start, I have a few high-level questions that are best
answered by people on this mailing list:
- Are there realistic Lucene workloads that are bottlenecked on the
JVM's performance (JIT, GC, etc.) and *not* on e.g. disk / network IO?
If so, what are some examples?
- How relevant are the DaCapo "luindex" and "lusearch" benchmarks
today? Will porting them to the latest version of Lucene give me a
benchmark representative of modern Lucene usage, or have Lucene's
performance characteristics evolved in fundamental ways since DaCapo
was published?
- What is the distribution of Lucene versions in production
deployments? Do users tend to aggressively upgrade to the "latest
and greatest" Lucene version, or is there usually a non-trivial lag?
Any other information that you think is useful or relevant is
welcome.
Thanks!
-- Sanjoy