Mailing List Archive

Ideas for simplifying solr-ruby and making it better
I'm trying to gather some ideas for how solr-ruby's code can be simplified and
made better. For example, a lot of the classes are just extending a base class as
a placeholder, and not really doing anything. Some of them extend a base
class and set one option; the request and response modules have a lot of
this going on. Another thing I'm thinking could be cleaned up, simplified or
even made dynamic is the field mapping; and it'd be nice to permit
arbitrary/un-mapped params to be passed in too. Some of the code doesn't
seem all that rubyish, and my feeling is that there are lots of places where
things could be made simpler.

Do any of you have ideas or things that you've disliked about solr-ruby? If
so, please say so! I've got all kinds of ideas I'd like to implement and
crank out, but for now I want to see what other people are thinking.

Matt
Re: Ideas for simplifying solr-ruby and making it better
Matt,

> Do any of you have ideas or things that you've disliked about solr-ruby? If
> so, please say so! I've got all kinds of ideas I'd like to implement and
> crank out, but for now I want to see what other people are thinking.

I don't have a concrete idea for making it better, but I agree with you.
Do you have ideas? Let's discuss how to make things more rubyish.

see also:
http://wiki.apache.org/solr/solr-ruby/BrainStorming

Koji
Re: Ideas for simplifying solr-ruby and making it better
Hey Koji,

Yeah I've seen that page before. I'd love to see solr-ruby get to that
point!

I wonder if starting from the top down would be a good way to approach this
discussion: talk about the public API first, then the code
underneath to support it, then refactoring etc. So even before discussing
something like the request/response "placeholder" classes problem, I'll just express
some of the API things that I've always wanted and/or disliked ;)

* more rich and customizable document model:

documents.each { |doc| puts doc.class == MyCustomDocClass }

* more rich and customizable facet model:

response.facets.each do |facet|
facet.field
facet.hits
end

* set document class type dynamically within result set before iterating:

documents.assign_doc_class do |raw_doc|
  # `next` (not `return`) hands a value back from the block
  next Models::CD if raw_doc[:format_type] == 'CD'
  Models::StandardDoc
end

# this would be nice because sometimes we're iterating through a result
# set with entirely different "types" of docs.

* document field method accessors
doc.id (or at least doc[:id])
# instead of
doc['id']

* pagination aware result sets (documents and facets)

documents.total_pages # etc.
response.facets.has_next?

* ability to pass in arbitrary query fields directly to solr without
worrying about solr-ruby raising an error

* ability to bypass query field mapping completely while querying

* flatten :facets mapping so that:

:facets=>{:fields=>[]}
# becomes
:facet_fields=>[]

* ability to query a custom :query_type and NOT having to create a custom
request/response class pair
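
A minimal sketch of two of the wishes above (field method accessors and per-document class assignment), written as plain Ruby with no dependency on solr-ruby. The names Doc, CD and DocumentSet are hypothetical and exist only for illustration; assign_doc_class is borrowed from the wish list, not from any existing API.

```ruby
# Hypothetical wrapper around one raw document hash; not part of solr-ruby.
class Doc
  def initialize(fields)
    # Normalize keys to strings so doc[:id] and doc['id'] both work.
    @fields = fields.each_with_object({}) { |(k, v), h| h[k.to_s] = v }
  end

  def [](key)
    @fields[key.to_s]
  end

  # Expose each stored field as a reader method, e.g. doc.title.
  def method_missing(name, *args)
    key = name.to_s
    return @fields[key] if args.empty? && @fields.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @fields.key?(name.to_s) || super
  end
end

class CD < Doc; end

# Wraps raw hashes and picks the document class per hit via a block.
class DocumentSet
  include Enumerable

  def initialize(raw_docs)
    @raw_docs = raw_docs
    @chooser = lambda { |_raw| Doc } # default when no block is assigned
  end

  def assign_doc_class(&block)
    @chooser = block
    self
  end

  def each
    return enum_for(:each) unless block_given?
    @raw_docs.each { |raw| yield @chooser.call(raw).new(raw) }
  end
end

docs = DocumentSet.new([
  { 'id' => '1', 'format_type' => 'CD',   'title' => 'Kind of Blue' },
  { 'id' => '2', 'format_type' => 'Book', 'title' => 'Huckleberry Finn' }
])
docs.assign_doc_class { |raw| raw['format_type'] == 'CD' ? CD : Doc }
docs.each { |doc| puts "#{doc.class}: #{doc[:id]} #{doc.title}" }
```

Field names that collide with methods every object already has (class, hash and so on) never reach method_missing, so those fields would still need the doc[:name] form.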

Those things are pretty easy to implement. I'd imagine that if solr-ruby had
a solid API and a simpler code base, it'd also be pretty easy to implement
some of the ORM-like features included on the wiki page and even a more
DSL-like approach to regular querying:

response = connection.search do |q|
  q.per_page 10
  q.query 'twain'
  q.filter_query :title, 'finn'
  q.facet_fields :title, :author
  q.query_field :title, 0.5
end
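
One way a block-style search like this could work is for the connection to yield a small builder object that collects raw Solr params. The sketch below is only an assumption about how that might look: QueryBuilder and FakeConnection are hypothetical names, and the fake connection returns the collected params instead of talking to Solr.

```ruby
# Hypothetical builder; the method names follow the DSL example above.
class QueryBuilder
  attr_reader :params

  def initialize
    @params = {}
  end

  def per_page(n)
    @params[:rows] = n
  end

  def query(q)
    @params[:q] = q
  end

  def filter_query(field, value)
    (@params[:fq] ||= []) << "#{field}:#{value}"
  end

  def facet_fields(*fields)
    @params[:facet] = true
    @params[:'facet.field'] = fields.map(&:to_s)
  end

  def query_field(field, boost)
    (@params[:qf] ||= []) << "#{field}^#{boost}"
  end
end

# Stand-in for a connection: returns the params it would send.
class FakeConnection
  def search
    builder = QueryBuilder.new
    yield builder
    builder.params
  end
end

params = FakeConnection.new.search do |q|
  q.per_page 10
  q.query 'twain'
  q.filter_query :title, 'finn'
  q.facet_fields :title, :author
  q.query_field :title, 0.5
end
p params
```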

What do you think?

Matt

On Sun, Sep 28, 2008 at 6:22 PM, Koji Sekiguchi <koji@r.email.ne.jp> wrote:

> Matt,
>
> > Do any of you have ideas or things that you've disliked about solr-ruby? If
> > so, please say so! I've got all kinds of ideas I'd like to implement and
> > crank out, but for now I want to see what other people are thinking.
>
> I don't have a concrete idea for making it better, but I agree with you.
> Do you have ideas? Let's discuss how to make things more rubyish.
>
> see also:
> http://wiki.apache.org/solr/solr-ruby/BrainStorming
>
> Koji
>
>
Re: Ideas for simplifying solr-ruby and making it better
Matt, you are the nexus of XTF and Solr. :-)


On Sep 29, 2008, at 9:37 AM, Matt Mitchell wrote:

> Hey Koji,
>
> Yeah I've seen that page before. I'd love to see solr-ruby get to that
> point!
>
> I wonder if starting from the top down would be a good way to approach this
> discussion: talk about the public API first, then the code
> underneath to support it, then refactoring etc. So even before discussing
> something like the request/response "placeholder" classes problem, I'll just
> express some of the API things that I've always wanted and/or disliked ;)
Re: Ideas for simplifying solr-ruby and making it better
> I wonder if starting from the top down would be a good way to approach this
> discussion: talk about the public API first, then the code
> underneath to support it, then refactoring etc. So even before discussing
> something like the request/response "placeholder" classes problem, I'll just
> express some of the API things that I've always wanted and/or disliked ;)

+1. Let's start from request/response "placeholder" classes.

> * document field method accessors
> doc.id (or at least doc[:id])
> # instead of
> doc['id']

+1.

> * ability to pass in arbitrary query fields directly to solr without
> worrying about solr-ruby raising an error

Why do you need this ability?

Other than the points above, I think you've raised good things
to start our discussion, and they are interesting.
I'd like to get comments/feedback from my associate (a rubyist).

Cheers,

Koji
Re: Ideas for simplifying solr-ruby and making it better
Hi Koji. Thanks for the feedback!

In regard to the arbitrary query param issue, there are a few reasons why I
brought that up. The first is that there have been times when I wanted to
pass something in to Solr and solr-ruby didn't yet support it. That means
there needs to be a continuous process of field mapping integration, at
least enough to keep up with the latest Solr param spec. It probably won't
be a real problem, but it did happen to me once. Another issue is that
sometimes I feel like the mapping gets in the way. Remembering all of the
Solr params is one thing, but then when you use solr-ruby you have to
remember a whole new set. Oh, and the Solr params are shorter :)

c.query(:q=>'battery operated', :fq=>'location:Chicago', :qf=>'title^0.5',
        :fl=>'title, man, price')
# vs.
c.query(:query=>'battery operated', :filter_queries=>['location:Chicago'],
        :query_fields=>'title^0.5', :field_list=>['title', 'man', 'price'])

It'd be really nice to have the field mapping be optional, and even
better... pluggable field mapping!
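
As a rough sketch of what optional, pluggable mapping could mean: treat a mapper as anything that responds to #call and returns raw Solr params, and default to a pass-through. Everything named here (FRIENDLY_TO_SOLR, FRIENDLY_MAPPER, RAW_MAPPER, build_params) is hypothetical; only the four long/short param pairs come from the example above.

```ruby
# Hypothetical mapping table; the long names mirror the example above.
FRIENDLY_TO_SOLR = {
  :query          => :q,
  :filter_queries => :fq,
  :query_fields   => :qf,
  :field_list     => :fl
}

# A mapper is anything that responds to #call(params) and returns raw params.
FRIENDLY_MAPPER = lambda do |params|
  params.each_with_object({}) do |(key, value), raw|
    # Unknown keys pass through untouched instead of raising.
    raw[FRIENDLY_TO_SOLR.fetch(key, key)] = value
  end
end

# The default mapper does nothing: raw Solr params in, raw Solr params out.
RAW_MAPPER = lambda { |params| params }

def build_params(params, mapper = RAW_MAPPER)
  mapper.call(params)
end

raw = build_params(
  { :query => 'battery operated', :field_list => %w[title price], :'hl.fl' => 'title' },
  FRIENDLY_MAPPER
)
p raw
```

Because unmapped keys such as :'hl.fl' fall through unchanged, a param the mapper has never heard of still reaches Solr.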

For the class placeholder issue... if we first start with the request
classes, we see there is a :response_format, a :content_type, and a :handler.
The rest of the data is essentially query param stuff (field mapping). To
make it really simple, the :handler could disappear; it'd just be set by the
method ('select' for a :query or :search, 'update' for a :delete etc.). The
:response_format could be set based on the :wt value. And the :content_type
could be a preset attribute in the connection instance. So, with that, you
just provide a method that accepts a hash of params.

The current request classes in solr-ruby (Solr::Request::Dismax etc.) really
look like query field mappers to me; that's what the bulk of the code seems
to be doing. So imagine, for querying: the connection class, a simple
query method, and then something like the current Solr::Request::Standard
being thrown in as a pluggable mapper?

Not to promote inheritance :) but if Solr::Connection provided raw query ->
HTTP, you could do something like:

class MyConnection < Solr::Connection

  def query(params)
    # the real query method accepts only raw solr params...
    super(map(params))
  end

  protected

  def map(params)
    # convert my param structure to a raw solr query string...
  end

end

But there are better ways to do this in Ruby!

Matt

-- as an example, here is something I threw together a few weeks ago just to
get a feel for the minimal code needed for talking to solr. This is nothing
more than an experiment, hasn't been tested, and of course isn't a "real"
project!

lib:
http://github.com/mwmitchell/slite/tree/master/lib/slite.rb

example:
http://github.com/mwmitchell/slite/tree/master/README


Re: Ideas for simplifying solr-ruby and making it better
Matt, I think that that is a great idea. Really useful.


Re: Ideas for simplifying solr-ruby and making it better
On Sep 29, 2008, at 9:40 PM, Matt Mitchell wrote:
> In regard to the arbitrary query param issue, there are a few reasons why I
> brought that up. The first is that there have been times when I wanted to
> pass something in to Solr and solr-ruby didn't yet support it. That means
> there needs to be a continuous process of field mapping integration, at
> least enough to keep up with the latest Solr param spec. It probably won't
> be a real problem, but it did happen to me once. Another issue is that
> sometimes I feel like the mapping gets in the way. Remembering all of the
> Solr params is one thing, but then when you use solr-ruby you have to
> remember a whole new set. Oh, and the Solr params are shorter :)

Yeah, it was a bit over-designed to have aliases and be too clever with
parameter mappings from Ruby to HTTP. I'd like to strip away all the
mappings and have solr-ruby in its most elementary API be able to
simply pass through parameters and get the raw Ruby response data
structure back. Very quickly folks will want to build on top of that
to make things cleaner, but that is fine.

> c.query(:q=>'battery operated', :fq=>'location:Chicago', :qf=>'title^0.5',
>         :fl=>'title, man, price')
> # vs.
> c.query(:query=>'battery operated', :filter_queries=>['location:Chicago'],
>         :query_fields=>'title^0.5', :field_list=>['title', 'man', 'price'])
>
> It'd be really nice to have the field mapping be optional, and even
> better... pluggable field mapping!

+1

Note that the current Solr::Request::Select is pretty close to what
you're asking for here.

One thing I really want to handle cleanly is custom request handler
mappings, making it trivial to send a request to any handler. It's not too
bad now, but the paired Request/Response class structure needs to go.

> The current request classes in solr-ruby (Solr::Request::Dismax etc.) really
> look like query field mappers to me; that's what the bulk of the code seems
> to be doing. So imagine, for querying: the connection class, a simple
> query method, and then something like the current Solr::Request::Standard
> being thrown in as a pluggable mapper?

+1

> Not to promote inheritance :) but if Solr::Connection provided raw
> query -> HTTP, you could do something like:
>
> class MyConnection < Solr::Connection
>
>   def query(params)
>     # the real query method accepts only raw solr params...
>     super(map(params))
>   end
>
>   protected
>
>   def map(params)
>     # convert my param structure to a raw solr query string...
>   end
>
> end
>
> But there are better ways to do this in Ruby!

Solr::Connection does provide pretty raw operations to Solr. Look at
Solr::Connection#post. Pass in an object that has #handler, #to_s,
and #content_type methods and you're off and running. The #to_s
is the key to parameter mapping.
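
A sketch of the kind of object that description implies: a plain class exposing #handler, #to_s and #content_type. RawSelect is a hypothetical name, the content type shown is an assumption, and the sketch deliberately stops short of calling Solr::Connection#post, since its exact behaviour is not shown in this thread.

```ruby
require 'uri'

# Hypothetical request object exposing the three methods listed above.
class RawSelect
  def initialize(params)
    @params = params
  end

  # Name of the Solr request handler to hit.
  def handler
    'select'
  end

  # Assumed content type for a form-encoded body.
  def content_type
    'application/x-www-form-urlencoded; charset=utf-8'
  end

  # The parameter mapping lives here: a hash becomes the request body.
  def to_s
    URI.encode_www_form(@params)
  end
end

request = RawSelect.new(:q => 'battery operated', :fq => 'location:Chicago')
puts request.handler
puts request.to_s
```

Swapping in a different #to_s is then all it takes to change how params are mapped.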

> -- as an example, here is something I threw together a few weeks ago just to
> get a feel for the minimal code needed for talking to solr. This is nothing
> more than an experiment, hasn't been tested, and of course isn't a "real"
> project!
>
> lib:
> http://github.com/mwmitchell/slite/tree/master/lib/slite.rb
>
> example:
> http://github.com/mwmitchell/slite/tree/master/README

Cute stuff, Matt!

I think there are goodies to be mined from there for sure.

How about using #method_missing on Connection such that
connection.whatever(:key => 'value') would call to the "whatever"
request handler? That'd be cool.
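
A minimal sketch of that #method_missing idea, assuming the method name maps straight onto the handler path. HandlerConnection is a hypothetical class that only builds the path it would request; it performs no HTTP and is not the current Solr::Connection.

```ruby
require 'uri'

# Hypothetical connection; it only builds the path it would request.
class HandlerConnection
  def initialize(base = '/solr')
    @base = base
  end

  # connection.whatever(:key => 'value') targets the "whatever" handler.
  # Anything not called with exactly one params hash is a normal NoMethodError.
  def method_missing(name, *args)
    params = args.first
    return super unless args.length == 1 && params.is_a?(Hash)
    "#{@base}/#{name}?#{URI.encode_www_form(params)}"
  end
end

conn = HandlerConnection.new
puts conn.whatever(:key => 'value')
puts conn.select(:q => 'twain', :wt => 'ruby')
```

A fuller version would probably also define respond_to_missing? and restrict which names count as handlers; both are left out here to keep the sketch short.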

I'm not sure I agree with creating objects beyond the eval of the Ruby
response though. At least not in the core of solr-ruby. Let's let
the response from Solr itself be the only object a client really
needs. Conversion to other objects can occur a layer above the inner
core of solr-ruby, such as acts_as_solr.

Keep in mind we have access to change Solr's response format to
suit solr-ruby's needs also. And I can see some custom solr-ruby
classes coming into play that Solr's Ruby response would emit.

Erik
Re: Ideas for simplifying solr-ruby and making it better
One other big wish list item I have for solr-ruby beyond gutting it
and simplifying it to the bare essentials, is to make it JRuby-aware.
When running with JRuby, the SolrJ library will be used and will allow
the use of SolrServer such that an EmbeddedSolrServer can be used. I
suspect we can make this all seamless somehow such that MRI works fine
over HTTP, and JRuby gets the advantage of being able to work really
nicely with embedded Solr.
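
The runtime switch could be as small as checking RUBY_ENGINE. In the sketch below HttpBackend and EmbeddedBackend are hypothetical placeholders: neither talks to Solr, and the embedded one merely stands in for whatever would wrap SolrJ's EmbeddedSolrServer under JRuby.

```ruby
# Hypothetical backends; neither talks to Solr in this sketch.
class HttpBackend
  def name
    'http'
  end
end

class EmbeddedBackend
  def name
    'embedded' # would wrap SolrJ under JRuby
  end
end

# RUBY_ENGINE is 'jruby' under JRuby and 'ruby' under MRI.
CURRENT_ENGINE = defined?(RUBY_ENGINE) ? RUBY_ENGINE : 'ruby'

# The engine is a parameter so both branches can be exercised anywhere.
def pick_backend(engine = CURRENT_ENGINE)
  engine == 'jruby' ? EmbeddedBackend.new : HttpBackend.new
end

puts pick_backend.name
```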

Erik


On Sep 30, 2008, at 5:49 AM, Erik Hatcher wrote:

> Yeah, it was a bit over-designed to have aliases and be too clever
> with parameter mappings from Ruby to HTTP. I'd like to strip away
> all the mappings and have solr-ruby in its most elementary API be
> able to simply pass through parameters and get the raw Ruby response
> data structure back. Very quickly folks will want to build on top
> of that to make things cleaner, but that is fine.