Mailing List Archive

Devel package - cannot access indexed fields
Hi,

I recently installed and tested KinoSearch-0.162. Some features that I
would like to use, such as RangeFilter, are implemented in the devel
version, so I upgraded to KinoSearch-0.20_051 on RH 2.6.9-34 i386 with
all build tests passing. I don't mind using the dev package; however, I am
having trouble getting basic functionality to work, such as reading
fields from the InvIndex. My configuration is very close to that of
KinoSearch::Docs::Tutorial::BeyondSimple, except that I am
defining %conf directly in both the indexer and search scripts (instead
of reading it from a single file). After building the InvIndex, I have
to change the file permissions on _1.cf and segments_2.yaml (they default
to 600). In search.pl I have something like:

use lib '/path/to';    # the directory that contains KIndexer.pm
use KIndexer;
use KinoSearch::Searcher;
use KinoSearch::QueryParser;
use KinoSearch::Highlight::Highlighter;

# %conf is defined directly in this script, as in the indexer

my $searcher = KinoSearch::Searcher->new(
    invindex => KIndexer->read( $conf{'invindexpath'} ),
);

my $reader = $searcher->get_reader;
print 'docs: '. $reader->max_doc; #about 5k
print ' accessible: '.$reader->num_docs; #about 5k

my $lex = $reader->lexicon( 'summary' ); #silently croaks
$lex->next;
print "\nterm: " .$lex->get_term->get_text;

'summary' and five other fields are defined as text in package KIndexer
within our %fields. I think the default mode is index and vectorize.
However, searching (even with a simple term, no query parser) yields no
hits, the highlighter chokes (when implemented according to the devel
docs), and the fields seem to be undefined. Any ideas?

Thanks,
Tamer





_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch
Re: Devel package - cannot access indexed fields
[OK, folks, I'm back in action.]

On Feb 9, 2008, at 6:39 AM, Tamer Rizk wrote:

> After building the InvIndex, I need
> to change the file permissions on _1.cf and segments_2.yaml
> (defaults to 600).

This should no longer be necessary as of r2998. Thanks for pointing
it out.
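For anyone still running a release older than r2998, the manual workaround described in the original message (loosening the 0600 mode on files such as _1.cf and segments_2.yaml after indexing) can be scripted with Perl's built-in chmod. This is a minimal sketch, assuming the index lives in a single directory and that every regular file in it should become mode 0644; the helper name make_index_readable is hypothetical and not part of KinoSearch, and the mode should be adjusted to your own security requirements.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of KinoSearch): give every regular file in
# the index directory mode 0644, so that a different user (for example, the
# web server running search.pl) can read files the indexer wrote as 0600.
sub make_index_readable {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Can't open directory $dir: $!";
    # Skip '.', '..' and any subdirectories; only plain files are changed.
    my @files = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    for my $file (@files) {
        chmod( 0644, "$dir/$file" ) or die "Can't chmod $dir/$file: $!";
    }
    return scalar @files;    # number of files whose mode was set
}
```

It would be called once after the indexer finishes, e.g. make_index_readable( $conf{'invindexpath'} ).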

> my $lex = $reader->lexicon( 'summary' ); #silently croaks

That's the interface for the current svn trunk. The current CPAN dev
release has a different interface, and Lexicon isn't even a public
class. You may have become confused by the fact that the docs at
<http://www.rectangular.com/kinosearch/docs/devel/> reflect svn rather
than the dev release. I should probably change that. For now, you
might consult search.cpan.org:
<http://search.cpan.org/~creamyg/KinoSearch-0.20_051/>.

Regardless, I'm surprised that the code above would silently croak.
There is no lexicon() method in 0.20_051, so you should see something
like this, even without strict and warnings:

Can't locate object method "lexicon" via package
"KinoSearch::Index::SegReader" at testy.pl line 12.

Try this and tell me what happens:

my $lexicon = $reader->look_up_field('summary');
die "no lexicon" unless $lexicon;
while ($lexicon->next) {
    print $lexicon->get_term->get_text;
}

> 'summary', and five other fields are defined as text in package KIndexer
> within our %fields. I think default mode is index and vectorize. However,
> searching (even with a simple term, no query parser) yields no hits,
> the highlighter chokes (when implemented according to devel docs), and
> fields seem to be undefined. Any ideas?


One possibility is that you are searching for, e.g., an unstemmed term
when only the stemmed version exists in the index. A search for
"horse" will turn up empty if the index only contains the stemmed
version "hors".

Also, I'm confused why error messages are not serving to guide you
towards the correct path. When you say that the highlighter "chokes",
what do you see?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


Re: Devel package - cannot access indexed fields
I was able to get search fully functional with a query parser and range
filter using KinoSearch devel 0.20_051 (2008-01-20) with perl v5.8.5 on
i386-linux the other day. I was, in fact, applying the docs at
http://www.rectangular.com/kinosearch/docs/devel/ to the dev release,
thanks. Using KinoSearch with InvIndexer->finish(optimize => 1) on about
80 MB of data (dual Xeon 3.0 GHz, 2 GB RAM), I am getting an average
search time of 2.5 sec, including network latency, with requests and
responses sent via Ajax.

I tried to reproduce the problem (the absence of error messages) by
restoring both the script and the package to their former (non-working)
states from memory, with no luck. I was using strict, but not warnings.
I then tried both:

my $reader = $searcher->get_reader;
my $lexicon = $reader->lexicon('summary');
die "no lexicon" unless $lexicon;
while ($lexicon->next) {
    print $lexicon->get_term->get_text;
    last;
}

and

my $reader = $searcher->get_reader;
my $lexicon = $reader->look_up_field('summary');
die "no lexicon" unless $lexicon;
while ($lexicon->next) {
    print $lexicon->get_term->get_text;
    last;
}

The former yields:
Can't locate object method "lexicon" via package
"KinoSearch::Index::SegReader"

and the latter: a term.

Since I want to search for a set of exact phrases, I am using the
PolyAnalyzer without the stemmer. As for the highlighter, it has not
done anything since I upgraded to the devel version. Using:

my $highlighter = KinoSearch::Highlight::Highlighter->new;
$highlighter->add_spec( field => 'summary' );
$hits->create_excerpts( highlighter => $highlighter );
...
print $hit->{excerpts}{summary};

I get the summary without highlights, which, incidentally, is perfectly
okay, because KinoSearch is an excellent piece of software. You are the man.

Thanks,
Tamer

