Mailing List Archive

Best way to set default_boolop to 'AND'
Dear List,

For the overwhelming majority of my searches, it makes sense to combine
the terms using 'AND', not 'OR'. To do that, I need to override
'default_boolop' as described in KinoSearch::QueryParser::QueryParser.
However, that class requires that I give it the list of applicable
fields.

What I have come up with is this:

    my $r = KinoSearch::Index::IndexReader->new(invindex => INDEX);
    # Copied from dump_index:
    my @readers = ref $r->{sub_readers} eq 'ARRAY'
        ? @{ $r->{sub_readers} }
        : $r;
    my @fields =
        map { $_->get_name }
        map { $_->get_infos }
        map { $_->get_finfos } @readers;
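
One caveat with the snippet above: if the index has several sub-readers, each of them may report the same field names, so @fields may end up with duplicates. Whether QueryParser minds is not something this thread establishes. A minimal sketch for removing them, with made-up field names standing in for the real list:

```perl
use strict;
use warnings;

# Hypothetical field names, as they might come back from two sub-readers.
my @fields = qw( title bodytext url title bodytext );

# Keep the first occurrence of each name, preserving the original order.
my %seen;
my @unique_fields = grep { !$seen{$_}++ } @fields;

print join( ', ', @unique_fields ), "\n";
```

The de-duplicated list can then be passed as fields => \@unique_fields.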

And then:

    my $hits = $searcher->search(query =>
        KinoSearch::QueryParser::QueryParser->new(
            analyzer       => $analyzer,
            fields         => \@fields,
            default_boolop => 'AND',
        )->parse($query)
    );

This is a large departure from the original code:

my $hits = $searcher->search(query => $query);

which does not care if it knows the names of all the fields [1].

Question: is this the best way to do this or am I missing something
obvious?

Thank you,

- Dmitri.

1. By the way, why not?
Best way to set default_boolop to 'AND'
On Mar 14, 2007, at 10:04 AM, Dmitri Tikhonov wrote:

> This is a large departure from the original code:
>
> my $hits = $searcher->search(query => $query);
>
> which does not care if it knows the names of all the fields [1].

Searcher->search has a simple interface because it makes many, many
assumptions about what the default behavior should be. It is an
"easy things easy, hard things possible" API design.

It would not be possible to concentrate all conceivable options in a
single interface; adding even a couple (e.g. default_boolop, fields)
would clutter things up while still covering only a fraction of the
potential configurations.

Therefore, any time you need to override the defaults of
Searcher->search, it is necessarily going to involve a significant
departure. Nevertheless, I agree that your particular use case could
be improved.

> Question: is this the best way to do this or am I missing something
> obvious?

There are slightly less cumbersome ways of generating the list of
fields in 0.15, but they all rely on internal APIs. IndexReader and
Searcher actually have get_fields() methods, which Searcher uses for
precisely this purpose. Take a look at the code in
Searcher->_prepare_simple_search().

In the development branch of KS (which I don't know if you are
familiar with; the most recent CPAN release is 0.20_02), the
interface has changed.

http://www.rectangular.com/kinosearch/docs/devel/KinoSearch/QueryParser/QueryParser.html

If you are not adding fields dynamically, then your code could be
rewritten as

    my $hits = $searcher->search(query =>
        KinoSearch::QueryParser::QueryParser->new(
            schema         => MySchema->new,
            default_boolop => 'AND',
        )->parse($query)
    );

If you *are* adding fields dynamically, you need to make sure that
the Schema instance you pass to QueryParser knows about them.
Calling MySchema->open() has the side effect of loading up the Schema
object with all fields which were added dynamically on all prior
indexing passes:

    my $schema = MySchema->new;
    # open() adds the dynamic fields to $schema
    my $invindex = $schema->open('/path/to/invindex');
    my $searcher = KinoSearch::Searcher->new( invindex => $invindex );

    my $hits = $searcher->search(query =>
        KinoSearch::QueryParser::QueryParser->new(
            schema         => $schema,
            default_boolop => 'AND',
        )->parse($query)
    );

> 1. By the way, why not?

[ This quote refers to footnote 1, not to a list item. ]

The default is to search all indexed fields. If that's not
appropriate, you have the option of overriding.
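
With the 0.15 API used earlier in this thread, overriding means handing QueryParser a narrower 'fields' list. A minimal sketch, assuming an invindex at a placeholder path, an English-language analyzer, and documents with 'title' and 'bodytext' fields (both field names are made up for illustration):

```perl
use strict;
use warnings;
use KinoSearch::Searcher;
use KinoSearch::Analysis::PolyAnalyzer;
use KinoSearch::QueryParser::QueryParser;

# Assumptions: placeholder invindex path; the analyzer should match
# whatever was used at index time.
my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
my $searcher = KinoSearch::Searcher->new(
    invindex => '/path/to/invindex',
    analyzer => $analyzer,
);

# Search only the listed fields instead of every indexed field.
my $query_parser = KinoSearch::QueryParser::QueryParser->new(
    analyzer       => $analyzer,
    fields         => [ 'title', 'bodytext' ],    # hypothetical field names
    default_boolop => 'AND',
);

my $query_string = 'foo bar';    # stand-in for the user's query
my $hits = $searcher->search( query => $query_parser->parse($query_string) );
```

This needs KinoSearch 0.15 and an existing invindex to run; against the development branch, 'schema' replaces 'analyzer' and 'fields' as described above.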

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/
Best way to set default_boolop to 'AND'
Thanks for the detailed response, Marvin -- I will try the other
approaches as well.

I haven't played with 0.20 yet; I am using the stable 0.15 for now. I
hope to have some time to try it soon, though.

- Dmitri.

> -----Original Message-----
> From: kinosearch-bounces+dtikhonov=vonage.com@rectangular.com
> [mailto:kinosearch-bounces+dtikhonov=vonage.com@rectangular.com] On Behalf Of Marvin Humphrey
> Sent: Thursday, March 15, 2007 12:51 PM
> To: Dmitri Tikhonov
> Subject: Re: [KinoSearch] Best way to set default_boolop to 'AND'