Mailing List Archive

Case sensitive and case insensitive on same index
Hi All

Thank you for your help to date; it is much appreciated. I'm currently looking into supporting both case-sensitive and case-insensitive search on the same index. When setting up my analyzer I do the following:

# Analyzer chain: lowercase the text, tokenize it, then stem (English).
my $tokenizer     = KinoSearch::Analysis::Tokenizer->new;
my $lc_normalizer = KinoSearch::Analysis::LCNormalizer->new;
my $stemmer       = KinoSearch::Analysis::Stemmer->new( language => 'en' );
my $analyzer      = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [ $lc_normalizer, $tokenizer, $stemmer ],
);

The above works nicely for case-insensitive indexing. However, removing

my $lc_normalizer = KinoSearch::Analysis::LCNormalizer->new;

(and dropping $lc_normalizer from the analyzers list) will create an index that is case sensitive!

I'm looking to build a search option that allows the user to 'Match case'. I'm also trying to avoid building a separate index that preserves case sensitivity. Is there a workaround for this, or am I forced to have duplicate indexes (one case sensitive and the other case insensitive)?

Assistance appreciated.

Regards,
Riyaad


Re: Case sensitive and case insensitive on same index
On Wed, Aug 6, 2008 at 2:33 AM, Riyaad Miller <riyaad.miller@predix.com> wrote:
> I'm looking to build a search option that allows the user to 'Match case'.
> I'm also trying to avoid building a separate index that preserves case
> sensitivity. Is there a work around for this or am I forced to have
> duplicate indexing (one case sensitive while the others case insensitive)?

I don't think there is any easy way to do this. The way the
inverted index works is that each 'word' has a distinct entry in the
index that lists the occurrences of that 'word'. Since you want to
have case-sensitive search, this means the entries for the words have
to be case sensitive.
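
To make that concrete, here is a minimal pure-Perl sketch of a toy
inverted index with case-preserved entries. It does not use KinoSearch
at all; build_index and the sample documents are hypothetical names
made up for illustration, and a real index stores far more than a list
of document ids.

```perl
use strict;
use warnings;

# Toy inverted index: each distinct term maps to the sorted list of
# document ids that contain it. Case is preserved, so 'Perl' and 'perl'
# get separate entries.
sub build_index {
    my ($docs) = @_;
    my %postings;
    for my $doc_id ( keys %$docs ) {
        for my $token ( split /\W+/, $docs->{$doc_id} ) {
            next unless length $token;
            $postings{$token}{$doc_id} = 1;
        }
    }
    my %index;
    $index{$_} = [ sort keys %{ $postings{$_} } ] for keys %postings;
    return \%index;
}

my %docs = (
    doc1 => 'Perl is a language',
    doc2 => 'a perl of wisdom',
);
my $exact = build_index( \%docs );

print "Perl: @{ $exact->{Perl} }\n";    # Perl: doc1
print "perl: @{ $exact->{perl} }\n";    # perl: doc2
print "PERL: ", ( exists $exact->{PERL} ? 'found' : 'no entry' ), "\n";
```

A lookup only ever touches the entry whose key matches the query term
exactly, which is why 'PERL' finds nothing here.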

With case-sensitive entries, a case-insensitive search has to look
through lots of different entries in the index, one for each case
variant of the word. There is no way to do this automatically. You
could, if you chose, create a custom query parser that would treat all
the case combinations as synonyms, expanding each query term into a
lengthy Boolean OR.
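
As a rough sketch of what such an expansion could look like, the
following pure-Perl snippet generates every case combination of a word
and joins them with OR. The names case_variants and expand_query are
hypothetical, and the ' OR ' syntax is an assumption about the query
parser rather than anything KinoSearch-specific.

```perl
use strict;
use warnings;

# Return every upper/lower-case combination of $word. A word with n
# letters yields 2**n variants; characters without case yield one form.
sub case_variants {
    my ($word) = @_;
    my @variants = ('');
    for my $char ( split //, $word ) {
        my @forms = lc($char) eq uc($char) ? ($char) : ( lc($char), uc($char) );
        my @next;
        for my $prefix (@variants) {
            push @next, $prefix . $_ for @forms;
        }
        @variants = @next;
    }
    return @variants;
}

# Join the variants into a single Boolean OR query string (assumed syntax).
sub expand_query {
    my ($word) = @_;
    return join ' OR ', case_variants($word);
}

print expand_query('cat'), "\n";
# cat OR caT OR cAt OR cAT OR Cat OR CaT OR CAt OR CAT
```

The number of clauses doubles with every letter, so a ten-letter word
already expands to 1024 terms. That growth is what makes the query
"lengthy" in practice.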

It might be simplest just to have parallel indexes.
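
A minimal sketch of the parallel-index idea, again in pure Perl with toy
hashes standing in for real indexes: one index keeps case, the other
folds it, and a 'Match case' flag picks which one to query. The names
index_docs and search are hypothetical; with KinoSearch you would
presumably build the two indexes with and without the LCNormalizer in
the analyzer chain.

```perl
use strict;
use warnings;

# Toy stand-in for an index: term => { doc_id => 1 }.
# When $fold_case is true, terms are lowercased before being stored.
sub index_docs {
    my ( $docs, $fold_case ) = @_;
    my %index;
    for my $doc_id ( keys %$docs ) {
        for my $token ( split /\W+/, $docs->{$doc_id} ) {
            next unless length $token;
            my $term = $fold_case ? lc $token : $token;
            $index{$term}{$doc_id} = 1;
        }
    }
    return \%index;
}

# Route the query to the exact or folded index based on $match_case.
sub search {
    my ( $indexes, $term, $match_case ) = @_;
    my $index = $match_case ? $indexes->{exact} : $indexes->{folded};
    $term = lc $term unless $match_case;
    return sort keys %{ $index->{$term} || {} };
}

my %docs = (
    doc1 => 'Perl is a language',
    doc2 => 'a perl of wisdom',
);
my %indexes = (
    exact  => index_docs( \%docs, 0 ),
    folded => index_docs( \%docs, 1 ),
);

print join( ',', search( \%indexes, 'Perl', 1 ) ), "\n";    # doc1
print join( ',', search( \%indexes, 'Perl', 0 ) ), "\n";    # doc1,doc2
```

The cost is roughly double the index size and indexing time, in
exchange for keeping every query a single-term lookup.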

Nathan Kurz
nate@verse.com

_______________________________________________
KinoSearch mailing list
KinoSearch@rectangular.com
http://www.rectangular.com/mailman/listinfo/kinosearch