Mailing List Archive

Case Sensitive and Insensitive Searches BOTH needed
Hello,

An application I am building requires both case-sensitive and case-insensitive searches on the fly. When searching for a particular word or phrase, the user can choose whether the results are sensitive or insensitive to case, depending on the state of a checkbox. How can I support both?

If I use StandardAnalyzer, it applies the LowerCaseFilter, so the results are all case-insensitive. If I modify the StandardAnalyzer code to remove the LowerCaseFilter, the results are all case-sensitive! What should I do?

Thanks,
Ajit
RE: Case Sensitive and Insensitive Searches BOTH needed
1) Modify StandardAnalyzer to remove the LowerCaseFilter

2) Make 2 fields in your index, one with the text in its original case and
one with the same text lower-cased.

3) a) If the checkbox calls for a case-insensitive search, lower-case the
search string and search the field containing the lower-cased text.

b) If the checkbox calls for a case-sensitive search, search the field with
the original text without modifying the search string.
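
The steps above can be sketched roughly as follows. This is a toy in-memory stand-in rather than Lucene code: the class name DualFieldIndex, the whitespace tokenizer, and the two maps playing the role of the two fields are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index with two "fields" per document; not the Lucene API.
final class DualFieldIndex {
    // term -> ids of the documents containing it, one map per field
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> lower = new HashMap<>();

    // Index every whitespace-separated word twice: as written and lower-cased.
    public void add(int docId, String text) {
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            lower.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(docId);
        }
    }

    // caseSensitive mirrors the state of the checkbox.
    public Set<Integer> search(String word, boolean caseSensitive) {
        Set<Integer> hits = caseSensitive
                ? exact.get(word)                            // step 3b: query left untouched
                : lower.get(word.toLowerCase(Locale.ROOT));  // step 3a: query lower-cased
        return hits == null ? new TreeSet<>() : new TreeSet<>(hits);
    }

    public static void main(String[] args) {
        DualFieldIndex index = new DualFieldIndex();
        index.add(1, "The apple is green");
        index.add(2, "the Apple logo");
        System.out.println(index.search("Apple", true));
        System.out.println(index.search("Apple", false));
    }
}
```

In a real index the same choice would come down to which field name the query is built against.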


regards,
Anders Nielsen

-----Original Message-----
From: AJIT RAJWADE [mailto:ajit.rajwade@veritas.com]
Sent: 20. juni 2002 14:46
To: lucene-user@jakarta.apache.org
Subject: Case Sensitive and Insensitive Searches BOTH needed



--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Case Sensitive and Insensitive Searches BOTH needed
Perhaps you could use 2 indices and use MultiSearcher to search them.
When you get the results from both indices merge them, using a unique
field to identify duplicates.
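
The merge step might look something like the sketch below. The Hit class, its id and score members, and the rule of keeping the higher-scoring duplicate are assumptions made for illustration; they are not part of MultiSearcher or any other Lucene class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HitMerger {
    // Hypothetical hit record: id stands for the unique field, score for relevance.
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Merge two hit lists, keeping one hit per unique id (the higher-scoring one),
    // in the order in which each id was first seen.
    public static List<Hit> merge(List<Hit> first, List<Hit> second) {
        List<Hit> all = new ArrayList<>(first);
        all.addAll(second);

        Map<String, Hit> byId = new LinkedHashMap<>();
        for (Hit hit : all) {
            Hit seen = byId.get(hit.id);
            if (seen == null || hit.score > seen.score) {
                byId.put(hit.id, hit);
            }
        }
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Hit> a = new ArrayList<>();
        a.add(new Hit("doc1", 0.5));
        a.add(new Hit("doc2", 0.9));
        List<Hit> b = new ArrayList<>();
        b.add(new Hit("doc2", 0.4));
        b.add(new Hit("doc3", 0.7));
        for (Hit hit : merge(a, b)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```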

Otis

Re: Case Sensitive and Insensitive Searches BOTH needed
One option that hasn't been mentioned is to code a custom filter that
spits out a case-preserved token and a lower-cased token for each word.
The common case is to return just one token, since most words are all
lower case anyway. This option is better than the other ideas I've
heard here so far because the index would be smaller than with
the dual-index and dual-field strategies suggested.
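
A minimal sketch of what such a filter would emit, written as plain Java over a list of strings rather than against Lucene's TokenFilter class. The class name CasePreservingFilter, the method names, and the whitespace tokenizer are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class CasePreservingFilter {
    // For each word emit the token as written, followed by a lower-cased copy
    // only when the two differ. Words that are already lower case yield one token.
    public static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            String lowered = token.toLowerCase(Locale.ROOT);
            if (!lowered.equals(token)) {
                out.add(lowered);
            }
        }
        return out;
    }

    // Split on whitespace (a stand-in for a real tokenizer), then expand.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return expand(tokens);
    }

    public static void main(String[] args) {
        System.out.println(analyze("The apple is green"));
    }
}
```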

~ Dave Smiley


RE: Case Sensitive and Insensitive Searches BOTH needed
Wouldn't that make it hard to search for phrases?

-----Original Message-----
From: David Smiley [mailto:dsmiley@mac.com]
Sent: 21. juni 2002 02:44
To: Lucene Users List
Subject: Re: Case Sensitive and Insensitive Searches BOTH needed


Re: Case Sensitive and Insensitive Searches BOTH needed
On Friday, June 21, 2002, at 05:06 AM, Anders Nielsen wrote:

> Wouldn't that make it hard to search for phrases?


If you use the same Analyzer for the query parser, then you should be
able to search for phrases since the query itself will also go
through the same process.

~ Dave


Re: Case Sensitive and Insensitive Searches BOTH needed
On Saturday, June 22, 2002, at 05:47 AM, Anders Nielsen wrote:

> Let's say the text is "The apple is green" and when we run that through
> the analyzer we get the tokens [The the apple apple is is green green].
> (Correct me if I'm wrong)

Actually it would be "The the apple is green" since only one token is
spit out if it's already lower-cased.

> Now if we want a case-sensitive search for "The apple", you're right
> that if we run it through the same analyzer we search for the tokens
> [The the apple apple].
>
> But what if we want a case-insensitive phrase search?

Ahh, right. *Case-insensitive phrase searches* won't work, but
single-word searches (case-insensitive or not) and case-sensitive
phrase searches should work. It would be nice if case sensitivity
were handled by Lucene itself, to get around this limitation
of the phrase search. The other ideas (separate fields for
case-sensitive versions) are a bit heavyweight since there's so much
redundant info.
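
The limitation can be illustrated with a toy model. It assumes that every emitted token takes its own consecutive position and that a phrase matches only a contiguous run of tokens; both are assumptions of this sketch, not statements about Lucene's internals. The class PhraseDemo and the text "The Apple is green" (two capitalized words in a row) are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class PhraseDemo {
    // Whitespace tokenizer plus the filter idea: each word as written, followed
    // by a lower-cased copy when the word is not already lower case.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            out.add(word);
            String lowered = word.toLowerCase(Locale.ROOT);
            if (!lowered.equals(word)) {
                out.add(lowered);
            }
        }
        return out;
    }

    // True when the analyzed phrase occurs as a contiguous run of tokens
    // inside the analyzed document (same analyzer on both sides).
    public static boolean phraseMatches(String document, String phrase) {
        return Collections.indexOfSubList(analyze(document), analyze(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String doc = "The Apple is green"; // tokens: The the Apple apple is green

        // Case-sensitive phrase: the query expands the same way as the document.
        System.out.println(phraseMatches(doc, "The Apple"));

        // Case-insensitive phrase: the query is lower-cased first, so it becomes
        // [the, apple], and those two tokens are not adjacent in the document.
        System.out.println(phraseMatches(doc, "The Apple".toLowerCase(Locale.ROOT)));

        // Single lower-cased word: found via the extra lower-cased token.
        System.out.println(phraseMatches(doc, "apple"));
    }
}
```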

~ Dave Smiley

