Mailing List Archive

using kinosearch without stemming
I like using KS. It's fast, and though I sometimes get lost in the twisty maze
of classes, the documentation is generally pretty good.

I especially like using it for things that I might have previously used a
database for -- log files and the like, where I want quick and flexible
searching.

Something I keep running into is this kind of scenario:

Assume a field called "action" that can have (among other values) "rejected".
I don't want this to be stemmed, because rather than being ordinary speech it's
effectively like an enum. So my first instinct is to make a FieldSpec subclass
with

    sub analyzed { 0 }

However, this only seems to take effect while building the invindex. When I
search for "action:rejected", I get no results; I am guessing this is because
the searcher is still stemming it to 'reject', because if I do this instead it
works:

    sub analyzer { return KinoSearch::Analysis::LCNormalizer->new }

This is counterintuitive enough that I feel like I must be missing something.
Is there a more excellent way?

hdp.
using kinosearch without stemming
On Jun 7, 2007, at 11:31 AM, Hans Dieter Pearcey wrote:

> I like using KS. It's fast, and though I sometimes get lost in the twisty
> maze of classes, the documentation is generally pretty good.

Thanks!

I'll have more to say about navigating the twisty maze later... maybe
over the weekend...

> I especially like using it for things that I might have previously used a
> database for -- log files and the like, where I want quick and flexible
> searching.

Thanks, it's good to know how people are using KS beyond the
archetypal setup of CGI search for a website.

> Assume a field called "action" that can have (among other values)
> "rejected". I don't want this to be stemmed, because rather than being
> ordinary speech it's effectively like an enum. So my first instinct is to
> make a FieldSpec subclass with
>
>     sub analyzed { 0 }
>
> However, this only seems to take effect while building the invindex.

You're right. It's a bug in QueryParser. Here's the code that's
been misbehaving:

    for my $field (@$fields) {
        # custom analyze for each field unless override
        my $analyzer = $supplied_analyzer;
        $analyzer = $schema->fetch_analyzer($field)
            unless defined $analyzer;

        my @token_texts = grep {length} $analyzer->analyze_raw($text);
        my $query
            = $self->_gen_single_field_query( $field, \@token_texts );
        push @queries, $query if defined $query;
    }

QueryParser was finding the "correct" analyzer for the field -- since
none was specified, fetch_analyzer() returns the main analyzer for
$schema. However, QueryParser wasn't obeying the field's analyzed()
property, as you discovered.
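
To illustrate the behaviour described above, here is a minimal,
self-contained sketch in plain Perl. It does not use KinoSearch and is not
the actual change made in the repository: toy_analyze(), %field_spec and
query_terms_for() are hypothetical names invented for this example, and the
"stemmer" is a crude stand-in. The idea is only that the query side should
consult the field's analyzed flag before running any analyzer, so that an
unanalyzed field is searched with the literal term.

```perl
use strict;
use warnings;

# Toy stand-in for a stemming analyzer (NOT KinoSearch's real one):
# lowercases, splits on whitespace, and strips a trailing "ed".
sub toy_analyze {
    my ($text) = @_;
    return map { my $t = lc $_; $t =~ s/ed\z//; $t } split ' ', $text;
}

# Hypothetical per-field settings; "analyzed" mirrors the FieldSpec
# property discussed in this thread.
my %field_spec = (
    action  => { analyzed => 0 },
    content => { analyzed => 1 },
);

# Return the terms to search for in a given field. For an unanalyzed
# field, the query text is passed through untouched as a single term.
sub query_terms_for {
    my ( $field, $text ) = @_;
    return ($text) unless $field_spec{$field}{analyzed};
    return grep {length} toy_analyze($text);
}

print join( ',', query_terms_for( 'action',  'rejected' ) ), "\n";  # rejected
print join( ',', query_terms_for( 'content', 'Rejected' ) ), "\n";  # reject
```

With a check like this in place, the term looked up at search time matches
the term stored at index time for the "action" field, which is why the
unstemmed "rejected" is found again.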

The problem is fixed as of subversion repository revision 2465.

    svn co -r 2465 http://www.rectangular.com/svn/kinosearch/trunk ks

> When I search for "action:rejected",

You may have seen this in a recent post of mine, but just FYI the
'field_name:term_text' syntax is now off by default in QueryParser.
You can get it back via $query_parser->set_heed_colons(1).

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/