Mailing List Archive: get_schema in KinoSearch::Index::MultiReader

OK, I can't get my head around this one. Can't get it to a test case either :(
I think it needs to be in a multi-segment index to work, judging by the error
message.

$ perl -MGlob -le 'print Glob->searcher->search(query => q/tag:foobar/)'
KinoSearch::Search::Hits=HASH(0x9084540)

$ perl -MGlob -le 'print Glob->searcher->search(query => q/tag:foo -bar/)'
KinoSearch::Search::Hits=HASH(0x9084540)

$ perl -MGlob -le 'print Glob->searcher->search(query => q/tag:foo-bar/)'
Can't locate object method "get_schema" via package "KinoSearch::Index::MultiReader"
at /usr/local/lib/perl/5.8.8/KinoSearch/Search/PhraseQuery.pm line 132.

NONE of those tags appear in the index. It's only dying on terms with
hyphens in them, but consistently there. Whatever's triggering it, anyway,
I think you want to implement that get_schema method. I'm guessing:

sub get_schema { shift->{invindex}->get_schema } # ???

Simon

On Apr 16, 2007, at 10:59 AM, Simon Cozens wrote:

> OK, I can't get my head around this one. Can't get it to a test
> case either :(

I appreciate your attempting to create one -- they make life so much
easier. Fortunately, this one's obvious.

> I think it needs to be in a multi-segment index to work, judging by
> the error message.

Yes. The problem snuck in during recent refactoring of PhraseQuery.
Apparently the test suite never tries a phrase query on a
multi-segment index.

> Can't locate object method "get_schema" via package
> "KinoSearch::Index::MultiReader"
> at /usr/local/lib/perl/5.8.8/KinoSearch/Search/PhraseQuery.pm line 132.

The code in PhraseQuery.pm should have been something like this instead:

my $schema = $reader->get_invindex->get_schema;

This isn't the first time I've made that error, just the first time
it's made it into Subversion, AFAIK. Internally, it's become common
to need access to the Schema instance that an IndexReader subclass is
using... so it's time to add that method.

> NONE of those tags appear in the index. It's only dying on terms with
> hyphens in them, but consistently there.

Terms with hyphens in them get broken at the hyphen by the default
Tokenizer implementation (which is used by PolyAnalyzer). QueryParser
breaks on whitespace first (I'm simplifying), then feeds the
resulting string to the field's analyzer. If multiple tokens come
back, it treats the result as a phrase, which is why 'foo-bar' ends
up in PhraseQuery and hits the error. Thus, searching for 'hold-up'
returns different results than searching for 'hold up'.
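
To make the parsing behaviour described above concrete, here is a minimal sketch in plain Perl. It is not KinoSearch's QueryParser or Tokenizer code: tokenize() and classify() are hypothetical helpers, and the \w+ pattern is only an assumed stand-in for whatever the field's analyzer actually uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the field's analyzer: keep runs of word
# characters, so a hyphen acts as a token boundary (assumed pattern).
sub tokenize {
    my ($text) = @_;
    return $text =~ /(\w+)/g;
}

# Hypothetical simplification of the parser: split on whitespace first,
# then analyze each chunk. More than one token from a single chunk is
# treated as a phrase; exactly one token is a plain term.
sub classify {
    my ($query_string) = @_;
    my @clauses;
    for my $chunk ( split ' ', $query_string ) {
        my @tokens = tokenize($chunk);
        next unless @tokens;
        my $type = @tokens > 1 ? 'phrase' : 'term';
        push @clauses, { type => $type, terms => [@tokens] };
    }
    return \@clauses;
}

# Compare the two query strings mentioned in the message.
for my $query ( 'hold-up', 'hold up' ) {
    my @described =
        map { $_->{type} . '(' . join( ' ', @{ $_->{terms} } ) . ')' }
        @{ classify($query) };
    print $query, ' => ', join( ', ', @described ), "\n";
}
```

Under these assumptions, 'hold-up' comes back as one phrase clause containing two tokens, while 'hold up' comes back as two independent term clauses, so only the hyphenated form would reach the phrase-query code path.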

(: BTW, one of these days, I'd like to see how the regex in
Plucene::Analysis::Standard::StandardTokenizer performs. You know,
the one that inspired the comment "# Don't blame me, blame the
Plucene people!". :)

> sub get_schema { shift->{invindex}->get_schema } # ???

Looks good. It should go in MultiReader's parent class, IndexReader,
though. Then, for the sake of code simplicity, we should delete
SegReader's implementation by removing 'schema' from the list of
getters initialized by __PACKAGE__->ready_get.
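
A rough sketch of that arrangement, using mock classes rather than the real KinoSearch ones: every My::* package below is hypothetical, and the only thing borrowed from the thread is the idea that the parent class delegates get_schema to its invindex, so that both subclasses inherit one implementation.

```perl
use strict;
use warnings;

# Hypothetical mock of a schema object.
package My::Schema;
sub new { my ( $class, $name ) = @_; return bless { name => $name }, $class }

# Hypothetical mock of an invindex that knows its schema.
package My::InvIndex;
sub new        { my ( $class, %args ) = @_; return bless {%args}, $class }
sub get_schema { my ($self) = @_; return $self->{schema} }

# Hypothetical parent class: the one shared get_schema lives here and
# simply delegates to the invindex.
package My::IndexReader;
sub new          { my ( $class, %args ) = @_; return bless {%args}, $class }
sub get_invindex { my ($self) = @_; return $self->{invindex} }
sub get_schema   { my ($self) = @_; return $self->get_invindex->get_schema }

# Subclasses define no get_schema of their own; they inherit it.
package My::MultiReader;
our @ISA = ('My::IndexReader');

package My::SegReader;
our @ISA = ('My::IndexReader');

package main;

my $schema   = My::Schema->new('demo');
my $invindex = My::InvIndex->new( schema => $schema );

for my $class (qw( My::MultiReader My::SegReader )) {
    my $reader = $class->new( invindex => $invindex );
    printf "%s returns a %s\n", $class, ref( $reader->get_schema );
}
```

With a single accessor in the parent, a caller such as PhraseQuery would not need to know which kind of reader it has been handed.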

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/