Hi there,
I'm running v0.15. I want to make use of multiple (2) analyzers, so
that I can benefit from stemming and stop-words for some fields (as
default behaviour), whilst benefiting from 'exact matches' in other
fields (by not stemming). I have the following code in my 'build
index' script:
#---
# The default analyzer.
my $stemmed_analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [
        KinoSearch::Analysis::LCNormalizer->new( language => 'en' ),
        KinoSearch::Analysis::Tokenizer->new( language => 'en' ),
        KinoSearch::Analysis::Stopalizer->new( language => 'en' ),
        KinoSearch::Analysis::Stemmer->new( language => 'en' ),
    ],
);
# The unstemmed analyzer: no stop-word removal, no stemming.
my $unstemmed_analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [
        KinoSearch::Analysis::LCNormalizer->new( language => 'en' ),
        KinoSearch::Analysis::Tokenizer->new( language => 'en' ),
    ],
);
my $inv_indexer = KinoSearch::InvIndexer->new(
    invindex => $index_dir,
    analyzer => $stemmed_analyzer,
    create   => 1,
);
$inv_indexer->spec_field(
    name => 'title',
);
$inv_indexer->spec_field(
    name     => 'title_unstemmed',
    analyzer => $unstemmed_analyzer,
    boost    => 2,
);
$inv_indexer->spec_field(
    name => 'content',
);
#---
In my 'search' script, I am using the following:
#---
my $stemmed_analyzer = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [
        KinoSearch::Analysis::LCNormalizer->new( language => 'en' ),
        KinoSearch::Analysis::Tokenizer->new( language => 'en' ),
        KinoSearch::Analysis::Stopalizer->new( language => 'en' ),
        KinoSearch::Analysis::Stemmer->new( language => 'en' ),
    ],
);
my $query_parser = KinoSearch::QueryParser::QueryParser->new(
    analyzer       => $stemmed_analyzer,
    fields         => [ qw/ title title_unstemmed content / ],
    default_boolop => 'AND',
);
#---
How do I tell QueryParser to use the unstemmed analyzer for the
title_unstemmed field? The docs emphasize the importance of using the
same analyzer in both stages (build and search), but I cannot seem to do
that, as QueryParser can only take one analyzer.
Thanks,
Adam