Mailing List Archive

quick jruby + solr benchmarks
I'm starting to experiment with benchmarking JRuby + Solr and just wanted to
get this out there before I leave for a week of vacation :)

While 'refactoring' solr-ruby, I'm finding some interesting results, and I'll
try to post them in the next few weeks.

This is JRuby 1.1.4 and Solr 1.3 (empty index), timed with Ruby's standard
Benchmark library.

The script:

require 'java'
require 'benchmark'

solr_dist_root = File.expand_path(File.join(File.dirname(__FILE__), '..', 'apache-solr-1.3.0'))
solr_home = File.join(solr_dist_root, 'example', 'solr')

# Put every jar found under dir onto the JRuby classpath
def require_jars(dir)
  jar_pattern = File.join(dir, "**", "*.jar")
  jar_files = Dir.glob(jar_pattern)
  jar_files.each { |jar_file| require jar_file }
end

# Turn a Ruby hash into SolrJ ModifiableSolrParams. add is overridden on this
# one instance so keys are stringified and values are passed as a Java String[]
def hash_to_params(hash_params)
  import org.apache.solr.common.params.ModifiableSolrParams
  query = ModifiableSolrParams.new
  query.instance_eval do
    alias _add add
    def add(field, values)
      _add(field.to_s, (values.is_a?(Array) ? values : [values]).to_java(:string))
    end
  end
  hash_params.each_pair do |k, v|
    query.add k, v
  end
  query
end

require_jars(File.join(solr_dist_root, "lib"))
require_jars(File.join(solr_dist_root, "dist"))

# HttpCommons: queries a Solr server running at localhost:8983 over HTTP
def http_commons
  @http_commons ||= (
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer
    import org.apache.solr.common.params.MapSolrParams
    solr = CommonsHttpSolrServer.new("http://localhost:8983/solr")
  )
end

# EmbeddedSolrServer: runs a Solr core inside this JVM (no HTTP)
def embedded(solr_home)
  @embedded ||= (
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer
    import org.apache.solr.core.CoreContainer
    import org.apache.solr.core.CoreDescriptor
    import org.apache.solr.client.solrj.SolrQuery
    core_name = 'main-core'
    container = CoreContainer.new
    descriptor = CoreDescriptor.new(container, core_name, solr_home)
    core = container.create(descriptor)
    container.register(core_name, core, false)
    solr = EmbeddedSolrServer.new(container, core_name)
  )
end

query = {'qt' => 'standard', 'q' => 'ipod', 'facet.field' => 'cat'}
params = hash_to_params(query)

max = 1000

# Both clients are created lazily, so each one's first call (including
# building the embedded core) falls inside its timed block
Benchmark.bm do |x|
  x.report 'http commons' do
    max.times do
      http_commons.query(params)
    end
  end
  x.report 'embedded' do
    max.times do
      embedded(solr_home).query(params)
    end
  end
end

# THE RESULTS (1,000 queries per run)

# http commons
# user system total real
# 4.634000 0.000000 4.634000 ( 4.633849)
# 4.454000 0.000000 4.454000 ( 4.453764)
# 3.908000 0.000000 3.908000 ( 3.907367)

# embedded
# user system total real
# 2.152000 0.000000 2.152000 ( 2.152226)
# 2.191000 0.000000 2.191000 ( 2.191359)
# 2.083000 0.000000 2.083000 ( 2.082696)
Re: quick jruby + solr benchmarks
So about 2x? Not bad. I wonder what running httperf against a simple
app would show.
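The "about 2x" can be checked against the wall-clock ("real") column of the three runs above. A quick plain-Ruby sketch; the variable names are illustrative, not taken from the script:

```ruby
# Wall-clock ("real") seconds for 1,000 queries per run, copied from the results above
http_commons_real = [4.633849, 4.453764, 3.907367]
embedded_real     = [2.152226, 2.191359, 2.082696]
iterations        = 1000

def mean(values)
  values.sum / values.size
end

# Average milliseconds per query, and how much faster embedded is
http_ms     = mean(http_commons_real) / iterations * 1000
embedded_ms = mean(embedded_real) / iterations * 1000
speedup     = http_ms / embedded_ms

printf("http commons: %.2f ms/query\n", http_ms)
printf("embedded:     %.2f ms/query\n", embedded_ms)
printf("speedup:      %.2fx\n", speedup)
```

That works out to roughly 4.33 ms vs. 2.14 ms per query, a ratio of about 2.02.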


On Nov 25, 2008, at 6:04 PM, Matt Mitchell wrote:

Re: quick jruby + solr benchmarks
Looks like JRuby + DirectSolrConnection comes out on top. I'll try to get some
update requests benchmarked next.

Changes since the first run:
* 1,000 vs. 10,000 iterations
* Added Ruby MRI 1.8.6, using open-uri / http
* Added JRuby, using open-uri / http

Setup: Ruby's standard Benchmark library, Solr 1.3, empty index, query = ipod

# jruby + CommonsHttpSolrServer
# user system total real
# 1000 iterations
# 4.335000 0.000000 4.335000 ( 4.334744)
# 4.335000 0.000000 4.335000 ( 4.334730)
# 10000 iterations
# 32.355000 0.000000 32.355000 ( 32.354999)
# 32.303000 0.000000 32.303000 ( 32.302859)
# 32.323000 0.000000 32.323000 ( 32.323368)

# jruby + EmbeddedSolrServer
# user system total real
# 1000 iterations
# 2.268000 0.000000 2.268000 ( 2.267976)
# 2.357000 0.000000 2.357000 ( 2.356588)
# 10000 iterations
# 10.650000 0.000000 10.650000 ( 10.649839)
# 8.099000 0.000000 8.099000 ( 8.099088)
# 8.119000 0.000000 8.119000 ( 8.118807)

# jruby + DirectSolrConnection
# user system total real
# 1000 iterations
# 1.593000 0.000000 1.593000 ( 1.592349)
# 1.595000 0.000000 1.595000 ( 1.594842)
# 10000 iterations
# 10.708000 0.000000 10.708000 ( 10.707790)
# 6.952000 0.000000 6.952000 ( 6.951736)
# 7.939000 0.000000 7.939000 ( 7.939191)

# ruby mri + http / open-uri
# user system total real
# 1000 iterations
# 0.760000 0.310000 1.070000 ( 1.607703)
# 0.730000 0.300000 1.030000 ( 1.619739)
# 0.760000 0.330000 1.090000 ( 1.907517)
# 0.740000 0.300000 1.040000 ( 1.543832)
# 10000 iterations
# 7.300000 2.970000 10.270000 ( 15.452759)
# 7.290000 2.960000 10.250000 ( 15.585011)
# 7.330000 2.980000 10.310000 ( 15.781377)

# jruby + http / open-uri
# user system total real
# 10000 iterations
# 27.583000 0.000000 27.583000 ( 27.582765)
# 25.620000 0.000000 25.620000 ( 25.620403)
# 25.474000 0.000000 25.474000 ( 25.473653)
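To compare these configurations on one scale, the 10,000-iteration "real" times can be turned into milliseconds per request. The sketch below drops the first 10,000-iteration run of every configuration, since for Embedded and Direct that run is clearly slower than the next two; attributing that to JVM warm-up is an assumption, not something these numbers prove.

```ruby
# Wall-clock ("real") seconds for the 10,000-iteration runs posted above
runs = {
  'jruby + CommonsHttpSolrServer' => [32.354999, 32.302859, 32.323368],
  'jruby + EmbeddedSolrServer'    => [10.649839, 8.099088, 8.118807],
  'jruby + DirectSolrConnection'  => [10.707790, 6.951736, 7.939191],
  'ruby mri + http / open-uri'    => [15.452759, 15.585011, 15.781377],
  'jruby + http / open-uri'       => [27.582765, 25.620403, 25.473653]
}
iterations = 10_000

per_request_ms = {}
runs.each do |name, seconds|
  steady = seconds.drop(1) # treat the first run as warm-up, for every config
  per_request_ms[name] = steady.sum / steady.size / iterations * 1000
end

# Print fastest first
per_request_ms.sort_by { |_, ms| ms }.each do |name, ms|
  printf("%-30s %.2f ms/request\n", name, ms)
end
```

With the first run dropped this gives about 0.74 ms (Direct), 0.81 ms (Embedded), 1.57 ms (MRI + open-uri), 2.55 ms (JRuby + open-uri) and 3.23 ms (CommonsHttpSolrServer) per request.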
Re: quick jruby + solr benchmarks
Yeah that type of benchmark would probably be a lot more useful. I'll see if
I can get something like that going. I've never really done benchmarking
before. Any general tips?
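One general tip that applies to the first script: both clients are created lazily, so the first call to embedded(solr_home) builds the Solr core inside the timed block, and nothing warms up the JVM before timing starts. Below is a minimal sketch of a fairer harness using Ruby's Benchmark.bmbm, which runs a rehearsal pass before the measured one. The compare helper and the placeholder lambdas are hypothetical; in the real script the lambdas would wrap already-constructed clients, e.g. http_commons.query(params).

```ruby
require 'benchmark'

# Hypothetical helper: time each already-built client. Benchmark.bmbm first
# runs a rehearsal pass over every report, then the pass whose results it returns.
def compare(clients, iterations)
  Benchmark.bmbm do |x|
    clients.each do |label, client|
      x.report(label) { iterations.times { client.call } }
    end
  end
end

# Build (and call once) every client *before* any timing starts.
# These lambdas are placeholders standing in for Solr queries.
clients = {
  'placeholder sum'     => -> { (1..200).sum },
  'placeholder product' => -> { (1..200).reduce(:*) }
}
clients.each_value(&:call)

results = compare(clients, 1_000)
results.each { |tms| printf("%-20s %.4f s real\n", tms.label, tms.real) }
```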

matt

On Tue, Nov 25, 2008 at 7:13 PM, Jamie Orchard-Hays <jamie@dangosaur.us> wrote:

> So about 2x? Not bad. I wonder what running httperf against a simple app
> would show.
Re: quick jruby + solr benchmarks
On Nov 25, 2008, at 7:13 PM, Jamie Orchard-Hays wrote:
> So about 2x? Not bad. I wonder what running httperf against a simple
> app would show.

Keep in mind these points:

* Solr's query cache. Repeating a query 1000 times really only executes
the query once; the rest of the time the results are pulled from the
cache, except...

* Solr supports HTTP cache headers. Thus a "smart", HTTP-cache-savvy
client will get 304 (Not Modified) responses for 999 of those queries,
without Solr doing anything but checking the HTTP request headers and
the current state of the index. Note that Matt's benchmark code is
not HTTP cache savvy at the moment (not a flaw per se, just worth
noting).
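To make the second point concrete: an HTTP-cache-savvy client remembers the validators (ETag / Last-Modified) a response came with and sends them back as If-None-Match / If-Modified-Since; a 304 reply means the cached body can be reused. Here is a minimal sketch with Ruby's net/http, assuming Solr's HTTP caching is configured to emit those headers. The ConditionalGetCache class is hypothetical, not part of any Solr client.

```ruby
require 'net/http'
require 'uri'

# Hypothetical sketch of an HTTP-cache-savvy client: remember the validators
# a response carried and replay them on the next request for the same URL.
class ConditionalGetCache
  Entry = Struct.new(:etag, :last_modified, :body)

  def initialize
    @entries = {}
  end

  # Build a GET request, adding If-None-Match / If-Modified-Since when we
  # already hold a cached copy of this URL
  def request_for(uri)
    request = Net::HTTP::Get.new(uri)
    entry = @entries[uri.to_s]
    if entry
      request['If-None-Match'] = entry.etag if entry.etag
      request['If-Modified-Since'] = entry.last_modified if entry.last_modified
    end
    request
  end

  # On 304 Not Modified reuse the cached body; otherwise cache the new one
  def body_from(uri, response)
    key = uri.to_s
    return @entries[key].body if response.code == '304' && @entries.key?(key)

    @entries[key] = Entry.new(response['ETag'], response['Last-Modified'], response.body)
    response.body
  end

  # One conditional GET against a live server, e.g.
  # URI('http://localhost:8983/solr/select?q=ipod')
  def fetch(uri)
    response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request_for(uri)) }
    body_from(uri, response)
  end
end
```

If Solr does answer with 304s, only the first of 1,000 identical queries transfers a full response, which would make an HTTP benchmark look much cheaper than the work Solr actually does per query.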

Erik
Re: quick jruby + solr benchmarks
Just a couple of quick code comments...

On Nov 25, 2008, at 6:04 PM, Matt Mitchell wrote:
> # EmbeddedSolrServer
> def embedded(solr_home)
> @embedded ||= (
> import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer
> import org.apache.solr.core.CoreContainer
> import org.apache.solr.core.CoreDescriptor
> import org.apache.solr.client.solrj.SolrQuery
> core_name = 'main-core'
> container = CoreContainer.new
> descriptor = CoreDescriptor.new(container, core_name, solr_home)
> core = container.create(descriptor)

You'll want to close that core, otherwise the JVM doesn't exit. I
changed this to:

@core = ....

> container.register(core_name, core, false)

and used @core there.

> query = {'qt' => 'standard', 'q'=>'ipod', 'facet.field' => 'cat'}

Note that faceting is not enabled unless the request also includes &facet=on.
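A small guard for this can sit in front of hash_to_params. This is a hypothetical sketch, assuming string keys like the script's query hash: it adds facet=on whenever any facet.* key is present and leaves an explicit facet value alone.

```ruby
# Hypothetical helper: Solr ignores facet.* parameters unless faceting is
# switched on, so add facet=on when any facet.* key is present.
# Keys in params win over the default, so an explicit 'facet' => 'false' is kept.
def with_faceting(params)
  return params unless params.keys.any? { |key| key.to_s.start_with?('facet.') }
  { 'facet' => 'on' }.merge(params)
end

query = { 'qt' => 'standard', 'q' => 'ipod', 'facet.field' => 'cat' }
faceted = with_faceting(query)
puts faceted.inspect
```

The resulting hash can then go through hash_to_params unchanged.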

> params = hash_to_params(query)
>
> max = 1000
>
> Benchmark.bm do |x|
> x.report 'http commons' do
> max.times do
> http_commons.query(params)
> end
> end
> x.report 'embedded' do
> max.times do
> embedded(solr_home).query(params)
> end
> end
> end

And I added an:

@core.close

at the end.
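Erik's fix can also be made robust against a query raising partway through the benchmark by closing the core in an ensure clause. A minimal sketch; with_closing is a hypothetical helper that works with anything responding to close (a SolrCore in Erik's version of the script), and FakeCore is only a stand-in for the demo.

```ruby
# Hypothetical helper: yield the resource, then close it no matter how the
# block exits (normally or with an exception), so the JVM can shut down.
def with_closing(resource)
  yield resource
ensure
  resource.close if resource.respond_to?(:close)
end

# In the JRuby script this would look roughly like (not runnable here):
#   embedded(solr_home)          # force creation so @core is set
#   with_closing(@core) do
#     Benchmark.bm { |x| ... }   # the benchmark from the original script
#   end

# Stand-in for a SolrCore, just to show close always runs
class FakeCore
  attr_reader :closed

  def close
    @closed = true
  end
end

core = FakeCore.new
begin
  with_closing(core) { raise 'query failed' }
rescue RuntimeError => e
  puts "#{e.message}; core closed? #{core.closed}"
end
```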

Erik
Re: quick jruby + solr benchmarks
Yeah, I overlooked all of that. Thanks, Erik. So would a better query test be
an incremental one based on id, like:

100.times do |id|
  q = "id:#{id}"
  # query request here...
end

?

Would you happen to know why the Solr home and data dir never seem to change?
Any time I use commons http or embedded, a "solr" directory is created in the
same directory as my script, even though I'm setting the home and data dir
in my code.

Matt

On Wed, Nov 26, 2008 at 3:28 AM, Erik Hatcher <erik@ehatchersolutions.com> wrote:

Re: quick jruby + solr benchmarks
On Nov 26, 2008, at 9:54 AM, Matt Mitchell wrote:
> Yeah I overlooked all of that. Thanks Erik. So could a better query
> test be
> an incremental one based on id like:
>
> 100.times do |id|
> q = "id:#{id}"
> # query request here...
> end
>
> ?

Testing is an art form. It depends on what you are testing. Issuing
entirely unique queries is not very real-world either, but at least it
bypasses the query and HTTP caching shortcuts.

Many organizations mine their query logs to get a set of
representative queries to test with, for example.
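Both ideas can be sketched in a few lines. The first is a fixed-seed sample drawn from a list of representative queries (say, mined from query logs), so every run replays the same mix and the caches behave roughly as they would under real traffic. The second is Matt's unique id queries, which deliberately miss every cache. The helper names and the example query list are hypothetical.

```ruby
# Hypothetical helper: draw `size` queries from a representative list using a
# fixed seed, so every benchmark run replays exactly the same sequence.
def sampled_workload(queries, size, seed: 42)
  rng = Random.new(seed)
  Array.new(size) { queries[rng.rand(queries.size)] }
end

# Matt's idea: every query is distinct, so neither Solr's query cache nor
# HTTP caching can short-circuit the request.
def unique_id_queries(count)
  (0...count).map { |id| "id:#{id}" }
end

representative = ['ipod', 'cat:electronics', 'name:ipod AND inStock:true'] # example only
p sampled_workload(representative, 5)
p unique_id_queries(3)
```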

I think your point is proven: EmbeddedSolrServer itself is faster
than CommonsHttpSolrServer. But would you deploy that way? Is your
front-end going to be merged with Solr itself? That may or may not be
viable, depending on the resources the front-end and Solr need
and how much in the way of system resources you have. What about load
balancing? You're then stuck with load balancing your front-end in
tandem with Solr itself.

Again, it all boils down to what you're after with the benchmarks.
And I'm not especially savvy about performance benchmarking myself, so
I'm not sure where to take it from here. It's an interesting test, for
sure, and I'd like to have it reviewed by others who really know their
stuff in this realm, and with Solr itself, who can explain why there is
such a huge difference in speed. Is it just HTTP and
serialization/deserialization overhead? (I tend to doubt that, but don't know.)

> Would you happen to know why the solr home and data dir never really
> change?
> Anytime I use commons http or embedded, a "solr" directory is
> created in the
> same directory as my script. Even though I'm setting the home and
> data dir
> in my code?

I don't know offhand; I'd have to dig deeper.

Erik
Re: quick jruby + solr benchmarks
I just had a brief conversation with Yonik about this to get his far more
expert opinion, and in this particular test it really boils down to this:
the query itself is incredibly fast (Solr reports a QTime of 1 millisecond
or less) since there are no documents. So what these numbers show is
merely the difference between an HTTP request and a method call, with
nothing else (of note) going on.

In a more realistic scenario, the HTTP overhead makes less difference,
because the work done in the query/faceting overshadows the
communication overhead.
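A back-of-the-envelope version of this, using the per-query averages from the first benchmark (about 4.33 ms over HTTP vs. 2.14 ms embedded). It assumes, as a simplification, that HTTP adds a fixed cost per request on top of however long the query itself takes; the query costs below are hypothetical.

```ruby
# Per-query averages from the first benchmark (1,000 queries, empty index)
http_ms_per_query     = 4.33
embedded_ms_per_query = 2.14
overhead_ms = http_ms_per_query - embedded_ms_per_query # ~2.19 ms per request

# Share of the total request time spent on the (assumed fixed) HTTP overhead
def overhead_share(query_ms, overhead_ms)
  overhead_ms / (query_ms + overhead_ms)
end

[1, 10, 100].each do |query_ms| # hypothetical query/faceting costs
  printf("query %3d ms -> HTTP overhead is %4.1f%% of the request\n",
         query_ms, 100 * overhead_share(query_ms, overhead_ms))
end
```

Under that model the overhead falls from roughly two thirds of the request at 1 ms of query work to about 18% at 10 ms and about 2% at 100 ms.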

There are lies, damned lies, and benchmarks :)

Erik



On Nov 26, 2008, at 9:54 AM, Matt Mitchell wrote:

Re: quick jruby + solr benchmarks
Interesting. My main goal was to get a feel for how JRuby and the
direct/embedded stuff compared to MRI Ruby and straight-up HTTP. But
obviously, the data and these tests are not realistic at all. Thanks for
your feedback, guys.

Matt

On Wed, Nov 26, 2008 at 10:34 AM, Erik Hatcher
<erik@ehatchersolutions.com> wrote:

Re: quick jruby + solr benchmarks
Here's something to note when using net/http in Ruby (which open-uri
wraps): even though it's about as fast as the other options, it puts a
much heavier CPU load on the machine than the others do (on Ruby 1.8.6):

http://apocryph.org/more_indepth_analysis_ruby_http_client_performance
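Ruby's Benchmark already reports CPU time (user + system) separately from wall-clock time, so that CPU cost can be read straight off a run. The sketch below uses the first MRI + open-uri 10,000-iteration run from earlier in this thread. The JRuby runs all show 0.000000 system time, so this kind of comparison is probably only meaningful for the MRI numbers. The cpu_share helper is hypothetical.

```ruby
require 'benchmark'

# Fraction of wall-clock time the process spent on the CPU (user + system)
def cpu_share(tms)
  tms.total / tms.real
end

# MRI + open-uri, 10,000 iterations: user 7.30, system 2.97, real 15.452759
mri = Benchmark::Tms.new(7.30, 2.97, 0.0, 0.0, 15.452759)
printf("MRI + open-uri: %.0f%% of wall time on CPU\n", 100 * cpu_share(mri))

# The same measurement for any block of work (placeholder workload here)
t = Benchmark.measure { 10_000.times { |i| "id:#{i}".hash } }
printf("user %.3f  system %.3f  real %.3f\n", t.utime, t.stime, t.real)
```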


On Nov 26, 2008, at 12:06 PM, Matt Mitchell wrote:

> Interesting. My main goal was to get a feel for how jruby and the
> direct/embedded stuff compared to mri ruby and straight up http. But
> obviously, the data and these tests are not realistic at all. Thanks
> for
> your feedback guys.
>
> Matt
Re: quick jruby + solr benchmarks
Thanks, Jamie. That's kind of shocking, actually. What client library do you
use?

On Sun, Nov 30, 2008 at 1:38 PM, Jamie Orchard-Hays <jamie@dangosaur.us> wrote:

> Here's something to note when using net/http in Ruby (which open-uri
> wraps). Even though it's about as fast as other options, it uses a huge cpu
> load when compared to others (on ruby 1.8.6):
>
> http://apocryph.org/more_indepth_analysis_ruby_http_client_performance
Re: quick jruby + solr benchmarks
The other night I spent a few hours messing with EventMachine, Curb
(Ruby bindings for libcurl) and RFuzz. EventMachine's HTTP2 is just missing
some of the POST features I need, and I didn't want to figure out how
to build what I needed from EventMachine's low-level features. RFuzz
works, but would then either crap out completely or go from well under a
second to 20+ seconds to complete a request. I suspect it's not
designed for the large POSTs I need. Curb (which is loaded with "require
'curl'"; why do some gem authors not give the gem and the library the
same dang name???) works great. It's not any faster than net/http,
but judging from those tests, I should be saving a lot of CPU.

Jamie

On Dec 3, 2008, at 10:05 AM, Matt Mitchell wrote:

> Thanks Jamie. That's kind of shocking actually. What client library
> do you
> use?