Mailing List Archive

Prefix query case sensitive?
I am using the StandardAnalyzer for both indexing and QueryParser. If I
index the following text: "Massachusetts"

The following query finds the document: "mass*"

But if I use this query then the document is not found: "Mass*"

Is this the expected behavior or am I doing something wrong?

Also, what is the difference between a prefix query and wildcard query?
What is the query syntax for a wildcard query? Does the StandardAnalyzer
support WildcardQuery?

Note: I'm using Lucene 1.2 RC2

Thanks.
Paul Friedman


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Prefix query case sensitive?
Yes, prefix queries are indeed case sensitive. The problem is that prefixes
do not go through an analyzer, which is where lower-casing is done. The
reason is that if you were searching for "dogs*" you would not want "dogs"
first stemmed to "dog", since that would then match "dog*", which is not the
intended query. A workaround for this is simply to lowercase the entire
query before passing it into the query parser.

Doug
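
A minimal sketch of the behaviour described above, in plain Java with no
Lucene dependency. The class and method names (PrefixCaseDemo,
prefixMatches, lowercaseQuery) are hypothetical stand-ins, not Lucene API:
the set plays the role of an index whose terms were lower-cased by the
analyzer, and the prefix is compared as typed, without analysis.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

public class PrefixCaseDemo {
    // Stand-in for the index: terms as a lower-casing analyzer would store them.
    static final TreeSet<String> INDEXED_TERMS =
            new TreeSet<>(Arrays.asList("massachusetts"));

    // Stand-in for a prefix query: the raw prefix is compared against the
    // stored terms exactly as typed, with no analysis step.
    public static boolean prefixMatches(String prefix) {
        for (String term : INDEXED_TERMS) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // The suggested workaround: lowercase the whole query string up front.
    public static String lowercaseQuery(String query) {
        return query.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(prefixMatches("mass")); // true
        System.out.println(prefixMatches("Mass")); // false

        String lowered = lowercaseQuery("Mass*");  // "mass*"
        String prefix = lowered.substring(0, lowered.length() - 1);
        System.out.println(prefixMatches(prefix)); // true
    }
}
```

In a real application the lowered string would then be handed to the
query parser in place of the user's original input.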

> -----Original Message-----
> From: Paul Friedman [mailto:pfriedman@macromedia.com]
> Sent: Monday, October 29, 2001 1:18 PM
> To: 'lucene-user@jakarta.apache.org'
> Subject: Prefix query case sensitive?
>
>
> I am using the StandardAnalyzer for both indexing and
> QueryParser. If I
> index the following text: "Massachusetts"
>
> The following query finds the document: "mass*"
>
> But if I use this query then the document is not found: "Mass*"
>
> Is this the expected behavior or am I doing something wrong?
>
> Also, what is the difference between a prefix query and
> wildcard query?
> What is the query syntax for a wildcard query? Does the
> StandardAnalyzer
> support WildcardQuery?
>
> Note: I'm using Lucene 1.2 RC2
>
> Thanks.
> Paul Friedman
>

Re: Prefix query case sensitive?
Doug Cutting wrote:
>
> Yes, prefix queries are indeed case sensitive. The problem is that prefixes
> do not go through an analyzer, which is where lower-casing is done. The
> reason is that if you were searching for "dogs*" you would not want "dogs"
> first stemmed to "dog", since that would then match "dog*", which is not the
> intended query. A workaround for this is simply to lowercase the entire
> query before passing it into the query parser.

(Sorry for the late comment, I have a job to do.)

That's no good, at least with my German stemmer. Lowercase only the
prefix/wildcard terms.
The problem is that nouns are stemmed differently, and if the whole
query is lowercased, the non-prefix/wildcard terms will be
stemmed differently from the way they were stemmed at indexing time.
I don't want to switch to medium stemming, because soft stemming
with separate noun stemming and lowercase indexing produces the best
overall results for German (uniqueness of discriminators,
rate of noise and processing speed).
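
A rough sketch of lowercasing only the prefix/wildcard terms, in plain
Java. The names (SelectiveLowercase, lowercasePatternTerms) are
hypothetical, and it assumes that terms are separated by whitespace and
that a '*' or '?' in a term marks it as a prefix/wildcard term. It does
not handle quoted phrases, field prefixes or escaped characters.

```java
import java.util.Locale;

public class SelectiveLowercase {
    // Assumption: a term containing '*' or '?' is a prefix/wildcard term.
    public static boolean isPrefixOrWildcard(String token) {
        return token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
    }

    // Lowercase only the prefix/wildcard terms; every other term keeps its
    // case so that a case-aware stemmer still sees nouns as written.
    public static String lowercasePatternTerms(String query) {
        String[] tokens = query.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(isPrefixOrWildcard(token)
                    ? token.toLowerCase(Locale.ROOT)
                    : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the second term is lowercased.
        System.out.println(lowercasePatternTerms("Haus Mass*")); // Haus mass*
    }
}
```

The rewritten string is what would be passed on to the query parser, so
the ordinary terms still reach the analyzer with their original case.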


Btw., Doug, could I get access to the CVS to commit changes/
improvements to the german stemming classes?


Greets,
Gerhard
