
Re: newbie: Indexing and searching text not working
Hi,

There is a utility that comes with the KinoSearch distribution called
dump_index. Running that shows these terms associated with the body field:

Terms:
body:a
Doc 0 (1 occurrences)
body:bodi
Doc 0 (1 occurrences)
body:here
Doc 0 (1 occurrences)
body:is
Doc 0 (1 occurrences)
body:short
Doc 0 (1 occurrences)
body:this
Doc 0 (1 occurrences)
body:veri
Doc 0 (1 occurrences)

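So you can see that the PolyAnalyzer stemmed "very" to "veri" (and "body" to
"bodi"). A TermQuery looks up its term exactly as given, without running it
through the analyzer, so a search for "very" finds nothing. To get your
example to work, either search for "veri" or run the word "very" through the
PolyAnalyzer first.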
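To make the mismatch concrete, here is a toy Perl model of what happens. It is a sketch, not KinoSearch code. As far as I know, PolyAnalyzer's English default lowercases, tokenizes, and then stems with the Snowball algorithm. The hypothetical `toy_stem` below implements only the one Snowball rule that matters here: a final "y" after a consonant becomes "i". `analyze`, `term_lookup`, and the hash-based index are likewise made-up helpers for illustration. A raw lookup of "very", like your TermQuery, misses. Looking up the analyzed form "veri" hits.

```perl
use strict;
use warnings;

# HYPOTHETICAL toy stemmer: implements only the Snowball (Porter2) English
# rule "replace a final y with i when it follows a non-vowel that is not
# the first letter" (very -> veri, body -> bodi, but by -> by).
# The real Snowball stemmer has many more rules.
sub toy_stem {
    my ($word) = @_;
    $word =~ s/(?<=.[^aeiouy])y\z/i/;
    return $word;
}

# HYPOTHETICAL analysis chain, loosely modeled on PolyAnalyzer's English
# default: lowercase, split into word tokens, stem each token.
sub analyze {
    my ($text) = @_;
    return map { toy_stem($_) } ( lc($text) =~ /\w+/g );
}

# Tiny inverted index: term => { doc_id => occurrence count }.
my @docs = ('This is a very short body here ');
my %index;
for my $doc_id ( 0 .. $#docs ) {
    $index{$_}{$doc_id}++ for analyze( $docs[$doc_id] );
}

# Exact lookup, like a TermQuery: the term itself is NOT analyzed.
sub term_lookup {
    my ( $idx, $term ) = @_;
    return sort { $a <=> $b } keys %{ $idx->{$term} || {} };
}

print 'Terms: ', join( ' ', sort keys %index ), "\n";
# prints "Terms: a bodi here is short this veri"

my @raw_hits      = term_lookup( \%index, 'very' );    # 'very' was never indexed
my ($analyzed)    = analyze('very');                   # 'veri'
my @analyzed_hits = term_lookup( \%index, $analyzed );

printf "raw 'very': %d hit(s); analyzed '%s': %d hit(s)\n",
    scalar @raw_hits, $analyzed, scalar @analyzed_hits;
# prints "raw 'very': 0 hit(s); analyzed 'veri': 1 hit(s)"
```

The general point is that whatever analysis runs at index time also has to run on the query. That is presumably why the QueryParser suggested earlier in the thread is the easier route.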

Best,
Mike



On Mon, Aug 25, 2008 at 2:58 PM, <kinosearch-request@rectangular.com> wrote:

> Date: Mon, 25 Aug 2008 11:40:10 +0530
> From: ram <ram@netcore.co.in>
> Subject: Re: [KinoSearch] newbie: Indexing and searching text not working
> To: KinoSearch discussion forum <kinosearch@rectangular.com>
> Message-ID: <1219644610.22357.61.camel@darkstar.netcore.co.in>
> Content-Type: text/plain
>
>
> On Sat, 2008-08-23 at 15:22 -0400, Mike Barborak wrote:
> > Hi,
> >
> > After creating your index with PolyAnalyzer, your body field will have
> > the terms "short" and "body" but not "short body." Take a look at
> > KinoSearch::QueryParser::QueryParser as it will likely do what you
> > want.
>
> I think my installation has an issue. I can't search on a single
> word either.
>
>
>
> ---------------------------------------
> use strict;
> use warnings;
> use KinoSearch::InvIndexer;
> use KinoSearch::Analysis::PolyAnalyzer;
> use KinoSearch::Searcher;
> use KinoSearch::Index::Term;
> use KinoSearch::Search::TermQuery;
> #
> # Start on a clean slate
> #
> system("rm -rf /tmp/invindex/*");
> my $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
> @gl::headers = qw(from to cc subject body date reply-to message-id
>                   in-reply-to filename);
> my $invindexer = KinoSearch::InvIndexer->new(
>     invindex => '/tmp/invindex',
>     create   => 1,
>     analyzer => $analyzer,
> );
> # Declare each header as an indexed field
> foreach (@gl::headers) {
>     $invindexer->spec_field( name => $_, indexed => 1 );
> }
> my $doc = $invindexer->new_doc;
> my %mail = (
>     'date'       => 'Mon, 07 Jan 2008 14:04:35 +0530',
>     'to'         => 'myteam@example.com',
>     'subject'    => 'subject test here ',
>     'body'       => 'This is a very short body here ',
>     'cc'         => 'ram@example.com',
>     'from'       => 'sagar@example.com',
>     'message-id' => '<1199694875.14998.392.camel@sagar.example.com>',
>     'filename'   => '/abc/def',
> );
> # Copy every non-empty header value into the document
> foreach (keys %mail) {
>     next unless ($mail{$_});
>     $doc->set_value( $_ => $mail{$_} );
> }
> $invindexer->add_doc($doc);
> $invindexer->finish;
>
> $analyzer = KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
> my $searcher = KinoSearch::Searcher->new(
>     invindex => '/tmp/invindex',
>     analyzer => $analyzer,
> );
> #
> # Search on body
> #
> my $term = KinoSearch::Index::Term->new( "body", "very" );
> my $term_query = KinoSearch::Search::TermQuery->new( term => $term );
> my $hits = $searcher->search( query => $term_query );
> while ( my $hit = $hits->fetch_hit_hashref ) {
>     print "Found HIT in body: " . $hit->{body} . "\n";
> }
>
> -----------------------------------------------------------------
>
> I am using Fedora-8 and perl-5.10 and latest kinosearch installed via
> CPAN
>
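The earlier point about "short body" can be modeled the same way. The analyzer splits the phrase into separate terms, so no single indexed term equals the whole string. This is a toy Perl sketch, not KinoSearch code. The hypothetical `toy_stem` implements only the Snowball rule that turns a final "y" after a consonant into "i", and `analyze` only roughly mimics PolyAnalyzer's lowercase/tokenize/stem chain. Requiring every analyzed query term to be present is just one policy chosen for illustration. It is not a description of how KinoSearch::QueryParser::QueryParser works.

```perl
use strict;
use warnings;

# HYPOTHETICAL toy stemmer: only the Porter2 "final y -> i after a
# non-initial consonant" rule; the real Snowball stemmer does far more.
sub toy_stem { my ($w) = @_; $w =~ s/(?<=.[^aeiouy])y\z/i/; return $w }

# HYPOTHETICAL analysis chain: lowercase, split into word tokens, stem.
sub analyze { return map { toy_stem($_) } ( lc( $_[0] ) =~ /\w+/g ) }

# Set of terms indexed for the body field.
my %terms = map { $_ => 1 } analyze('This is a very short body here ');

# Treating the whole phrase as one term fails: "short body" was never indexed.
my $phrase_found = exists $terms{'short body'} ? 1 : 0;

# Analyzing the query first yields separate terms, each of which exists.
my @query_terms = analyze('short body');    # ('short', 'bodi')
my $all_found   = !grep { !exists $terms{$_} } @query_terms;

print "phrase as one term: $phrase_found; query terms: @query_terms; all present: ",
    ( $all_found ? 1 : 0 ), "\n";
# prints "phrase as one term: 0; query terms: short bodi; all present: 1"
```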