Greets,
(I'm cc'ing this to lucy-dev@lucene.apache.org, because I think Lucy
should follow the same design principles described in this post.)
KinoSearch is spinning off a few modules, to cut down on the core size
and complexity. For the present time, they will continue to be
distributed with the KinoSearch tarball, but eventually they will
become separate distributions.
KinoSearch::Search::SearchServer and KinoSearch::Search::SearchClient
have moved to KSx::Remote::SearchServer and
KSx::Remote::SearchClient. Eventually, they will be distributed under
KSx::Remote.
The rationale for breaking out SearchServer/SearchClient is that there
are many ways to have machines interconnect: the Socket/faked-up-rpc
approach taken by SearchClient/SearchServer, the XML approach used by
Solr, etc. For core, it is only crucial that the messages that have
to be sent over the network be serializable using *some* technique --
it's not important which technique is chosen.
The other spinoff is Filter. KinoSearch::Search::Filter,
KinoSearch::Search::QueryFilter, and KinoSearch::Search::PolyFilter
have all been removed; their functionality is now encapsulated in
KSx::Search::Filter, which has been refactored as a subclass of
Query. The last filter subclass, KinoSearch::Search::RangeFilter, has
been replaced by a new core class, KinoSearch::Search::RangeQuery
(which behaves similarly to Lucene's ConstantScoringRangeQuery with a
fixed score of 0).
The standard KS search methods no longer take a 'filter' argument.
Here's the new Filter API in action:
    my %category_filters;
    for my $category (qw( sweet sour salty bitter )) {
        my $cat_query = KinoSearch::Search::TermQuery->new(
            field => 'category',
            term => $category,
        );
        $category_filters{$category} = KSx::Search::Filter->new(
            query => $cat_query,
        );
    }

    while ( my $cgi = CGI::Fast->new ) {
        my $user_query = $cgi->param('q');
        my $filter = $category_filters{$cgi->param('category')};
        my $and_query = KinoSearch::Search::ANDQuery->new;
        $and_query->add_child($user_query);
        $and_query->add_child($filter);
        my $hits = $searcher->search( query => $and_query );
        ...
Filter is moving outside of core because it is essentially nothing
more than a caching optimization. Logically, the following code would
produce exactly the same results as the code above:
    while ( my $cgi = CGI::Fast->new ) {
        my $user_query = $cgi->param('q');
        my $category_query = KinoSearch::Search::TermQuery->new(
            field => 'category',
            term => $cgi->param('category'),
        );
        $category_query->set_boost(0);
        my $and_query = KinoSearch::Search::ANDQuery->new;
        $and_query->add_child($user_query);
        $and_query->add_child($category_query);
        my $hits = $searcher->search( query => $and_query );
        ...
The only significant differences are that the Filter runs its query
only once, and that it can't be serialized and sent over the network
in a search cluster (because the search results are cached in a
BitVector, which is too big to send).
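To make the "caching optimization" point concrete, here is a toy model in Python. It is not KinoSearch code: the class names, the `matches` method, the `runs` counter, and the dict-based "index" are all made up for illustration, and a frozenset stands in for the BitVector. It only sketches the idea that a filter is a zero-scoring query whose match set is computed once and then reused.

```python
# Toy model only; none of these names come from the KinoSearch API.

class TermQuery:
    def __init__(self, field, term, boost=1.0):
        self.field, self.term, self.boost = field, term, boost
        self.runs = 0  # how many times this query has scanned the index

    def matches(self, docs):
        """Return {doc_id: score} for every document containing the term."""
        self.runs += 1
        return {i: self.boost for i, d in enumerate(docs)
                if self.term in d.get(self.field, "").split()}


class CachingFilter:
    """Wraps a query, runs it once, and contributes a score of 0."""

    def __init__(self, query):
        self.query = query
        self._hits = None  # cached match set (stand-in for a bit vector)

    def matches(self, docs):
        # Assumes the index never changes after the first call.
        if self._hits is None:
            self._hits = frozenset(self.query.matches(docs))
        return {i: 0.0 for i in self._hits}


def and_search(docs, children):
    """Intersect the children's matches and sum their scores."""
    results = [c.matches(docs) for c in children]
    common = set(results[0]).intersection(*results[1:])
    return {i: sum(r[i] for r in results) for i in sorted(common)}


docs = [
    {"content": "lemon tart", "category": "sour"},
    {"content": "lemon candy", "category": "sweet"},
    {"content": "salted caramel", "category": "sweet"},
]
user_q = TermQuery("content", "lemon")
sweet_filter = CachingFilter(TermQuery("category", "sweet"))
zero_boost = TermQuery("category", "sweet", boost=0.0)

for _ in range(3):  # three simulated requests
    filtered = and_search(docs, [user_q, sweet_filter])
    unfiltered = and_search(docs, [user_q, zero_boost])
```

In this sketch both searches return the same hits with the same scores on every request; the only observable difference is that the query wrapped by the filter scans the documents once, while the zero-boost query scans them on every request.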
Lucene provides classes called RemoteCachingWrapperFilter and
FilterManager that address the problem of filter caching in search
clusters, and whose functionality might eventually end up in either
KSx::Remote or KSx::Search::Filter. Again, though, they are caching
optimizations with serialization limitations and as such belong
outside of core.
I thought about keeping Filter as an abstract base class, and putting
the actual functionality into KSx::Search::QueryFilter or something
like that. However, after reviewing the various Filter subclasses in
both Lucene's core and contrib, it looked to me as though nearly all
of them (all except for the SpanFilter subclasses, which would need to
be different anyway) could be realized using either ordinary Queries
or Queries in conjunction with this new implementation of Filter.
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch