Mailing List Archive

Wildcard Searching
We're really struggling with trying to understand why the WildcardQuery
seems to strip out the question mark by replacing it with a space. We're
using the daily build, and a StandardAnalyzer. We've got the text "The Round
Window" in our index. If we search on "roun*" the Lucene QueryParser returns
a hit. When we search on "roun?", we don't get any hits. We don't even know
how to make heads or tails of the WildcardQuery or WildcardTermEnum classes.

Also, Lucene returns the parsed version of each of our searches. When we
search by rou*d, Lucene parses it as rou*d (which is what we would expect).
But when we search by rou?d, Lucene parses it as "rou d". It seems to wrap
the term in quotes and replace the question mark with a space. Any ideas? Or
can someone give us an idea of how to understand WildcardQuery or
WildcardTermEnum?
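
For readers trying to picture what a wildcard pattern is supposed to do, the sketch below shows the matching rule in isolation: `*` stands for any run of characters (including none) and `?` stands for exactly one character. The names `WildcardSketch` and `matches` are hypothetical, and this is not Lucene's `wildcardEquals()` code, only a minimal illustration of the same idea.

```java
class WildcardSketch {
    /**
     * Returns true if text matches pattern, where '*' matches any run of
     * characters (possibly empty) and '?' matches exactly one character.
     */
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int starP = -1, starT = -1; // last '*' seen, and the text position it was tried at
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                starP = p++;          // remember the star and first try matching nothing
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;                  // single-character match
                t++;
            } else if (starP >= 0) {
                p = starP + 1;        // backtrack: let the last '*' absorb one more character
                t = ++starT;
            } else {
                return false;
            }
        }
        // Only trailing '*' may remain unmatched in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') p++;
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("roun*", "round"));
        System.out.println(matches("roun?", "round"));
        System.out.println(matches("rou?d", "round"));
        System.out.println(matches("roun?", "roun"));
    }
}
```

Under this rule both `roun?` and `rou?d` should match the indexed term `round`, which is why the missing hits point at something upstream of the matching step.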

Michael

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Wildcard Searching
> From: Howk, Michael [mailto:MHowk@FSC.Follett.com]
>
> Also, Lucene returns the parsed version of each of our
> searches. When we
> search by rou*d, Lucene parses it as rou*d (which is what we
> would expect).
> But when we search by rou?d, Lucene parses it as "rou d". It
> seems to wrap
> the term in quotes and replace the question mark with a
> space. Any ideas? Or
> can someone give us an idea of how to understand WildcardQuery or
> WildcardTermEnum?

It sounds like the problem is in the query parser. Brian?

Doug

RE: Wildcard Searching
From my experience with wildcards:
1. They are case sensitive, while regular queries aren't.
2. Only one wildcard is allowed in a word. If you are using this with a
boolean query, something like the following is acceptable:
(asas*) AND (fhg*fd)
3. At least one character is required before the wildcard in a query
(*fhhd is not valid).
4. Special characters are not supported (? may be a special character).
Hope this helps!

-----Original Message-----
From: Howk, Michael [mailto:MHowk@FSC.Follett.com]
Sent: Wednesday, February 27, 2002 10:56 AM
To: Lucene Mailing List (E-mail)
Subject: Wildcard Searching


RE: Wildcard Searching
The StandardAnalyzer uses a lowercase filter, but we tried indexing "the
round hat", just to make sure. The * still worked, but the ? still failed.

We noticed that the ? character is listed in the QueryParser as a WILDTERM.
But after that, the code heads into the WildcardQuery class, and we get lost
amidst "setEnum()" and "wildcardEquals()" stuff. :-)

Seriously though, we're using the StandardAnalyzer directly from Lucene. I
suppose it's possible that the ? is a special character that's getting
stripped out. But we need help to find out exactly where the special
characters are defined or filtered.
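
One way to picture the "rou d" result: if the query text is handed to an analyzer as an ordinary term, a tokenizer that keeps only letters and digits treats `?` as a separator and emits two tokens, and a parser that gets more than one token back for a single term may join them into a quoted phrase. The sketch below simulates that with plain Java. `simpleTokenize` and `asParsedQuery` are hypothetical helpers, not Lucene classes, and the real StandardAnalyzer rules are more involved.

```java
import java.util.ArrayList;
import java.util.List;

class AnalyzerSketch {
    /** Splits on anything that is not a letter or digit and lowercases the rest. */
    static List<String> simpleTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // punctuation such as '?' ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    /** Mimics a parser that turns a multi-token term into a quoted phrase. */
    static String asParsedQuery(String term) {
        List<String> tokens = simpleTokenize(term);
        if (tokens.isEmpty()) return "";
        if (tokens.size() == 1) return tokens.get(0);
        return "\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(asParsedQuery("Round"));
        System.out.println(asParsedQuery("rou?d"));
    }
}
```

If that is what is happening, the question mark is lost before any wildcard class ever sees the term.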

Michael

-----Original Message-----
From: Aruna Raghavan [mailto:ArunaR@opin.com]
Sent: Wednesday, February 27, 2002 11:00 AM
To: 'Lucene Users List'
Subject: RE: Wildcard Searching


RE: Wildcard Searching
We just tried adding the "?" character to QueryParser.jj under
<#_TERM_START_CHAR>. We noticed that the "*" was in that list, so we figured
we'd just give it a try. It seems to have worked. Now when we search on
rou?d, we get hits on the word "round". We're going to try searching for
some other variations to make sure that we've done the right thing.

We'd still be interested to know exactly why this worked (assuming it
continues to solve our problem). What is a TERM_START_CHAR and how is it
used? Obviously it does something important. :-)
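
A possible explanation, offered as an assumption rather than a reading of the actual grammar: JavaCC picks the longest matching token and, on a tie, the token declared first. If both a plain TERM rule and a WILDTERM rule can match `rou?d` in full, the earlier plain rule wins and the text is treated as an ordinary term. Excluding `?` from the characters a plain term may contain leaves only the wildcard rule able to match the whole input. The toy lexer below imitates that tie-break with java.util.regex. The rule names and patterns are illustrative and are not copied from QueryParser.jj.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerSketch {
    /**
     * Returns the name of the winning rule: longest match first,
     * earliest-declared rule on a tie.
     */
    static String classify(String input, Map<String, String> rules) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strict '>' means a later rule must match MORE text to win.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    /** Builds a two-rule toy grammar from a character class for plain term characters. */
    static Map<String, String> rules(String termChars) {
        Map<String, String> r = new LinkedHashMap<>(); // preserves declaration order
        r.put("TERM", termChars + "+");
        r.put("WILDTERM", "(?:" + termChars + "|[*?])+");
        return r;
    }

    public static void main(String[] args) {
        Map<String, String> before = rules("[^\\s*]");  // '?' allowed in plain terms
        Map<String, String> after = rules("[^\\s*?]");  // '?' excluded, like '*'
        System.out.println(classify("rou*d", before));
        System.out.println(classify("rou?d", before));
        System.out.println(classify("rou?d", after));
    }
}
```

In this toy setup `rou*d` is always classified as a wildcard term, while `rou?d` only becomes one after `?` is excluded from the plain term characters, which lines up with the behaviour described above.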

-----Original Message-----
From: Howk, Michael [mailto:MHowk@FSC.Follett.com]
Sent: Wednesday, February 27, 2002 11:14 AM
To: 'Lucene Users List'
Subject: RE: Wildcard Searching


Re: Wildcard Searching
Doug Cutting <DCutting@grandcentral.com> writes:

Just noticed this problem in my program.

It seems that the analyzer passed to QueryParser.parse() is never
passed on to PrefixQuery (which is what my test case is parsed to).

A quick look in QueryParser.jj confirms this:

q = new PrefixQuery(new Term(field, term.image.substring
(0, term.image.length()-1)));
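
To make the consequence concrete: the snippet above only drops the trailing `*` from the raw token text, so nothing lowercases or otherwise normalizes the prefix before it is compared with indexed terms. The sketch below mimics that with plain strings. `prefixFromToken` and `anyTermStartsWith` are hypothetical helpers, and the term list stands in for what a lowercasing, stop-word-removing analyzer would plausibly index for "The Round Window".

```java
import java.util.Arrays;
import java.util.List;

class PrefixSketch {
    /** Same string operation as the parser snippet: strip the trailing '*'. */
    static String prefixFromToken(String tokenImage) {
        return tokenImage.substring(0, tokenImage.length() - 1);
    }

    /** Stand-in for a prefix lookup against the indexed terms. */
    static boolean anyTermStartsWith(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed index contents for "The Round Window" after analysis.
        List<String> indexed = Arrays.asList("round", "window");

        String raw = prefixFromToken("Roun*");          // prefix keeps the capital R
        System.out.println(anyTermStartsWith(indexed, raw));

        String lowered = raw.toLowerCase();             // caller normalizes by hand
        System.out.println(anyTermStartsWith(indexed, lowered));
    }
}
```

Until the parser applies the analyzer to prefix and wildcard terms, lowercasing the query text before parsing is one workaround when the index was built with a lowercasing analyzer.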


/Stefan Bergstrand


> > From: Howk, Michael [mailto:MHowk@FSC.Follett.com]
> >
> > Also, Lucene returns the parsed version of each of our
> > searches. When we
> > search by rou*d, Lucene parses it as rou*d (which is what we
> > would expect).
> > But when we search by rou?d, Lucene parses it as "rou d". It
> > seems to wrap
> > the term in quotes and replace the question mark with a
> > space. Any ideas? Or
> > can someone give us an idea of how to understand WildcardQuery or
> > WildcardTermEnum?
>
> It sounds like the problem is in the query parser. Brian?
>
> Doug

--
---------------------------
Stefan Bergstrand
Polopoly - Cultivating the information garden
Ph: +46 8 506 782 67
Cell: +46 704 47 82 67
Fax: +46 8 506 782 51
stefan.bergstrand@polopoly.com, http://www.polopoly.com

RE: Wildcard Searching
Did the change that you mentioned below really work for you?
I wrote this class:
http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=1638

and it looks like the bug is not in QueryParser, but in some Java class
(could it be WildcardTermEnum?), since the class does not make use of
QueryParser and still demonstrates that WildcardQuery doesn't work
properly.

Thanks,
Otis


--- "Howk, Michael" <MHowk@FSC.Follett.com> wrote:
> We just tried adding the "?" character to QueryParser.jj under
> <#_TERM_START_CHAR>. We noticed that the "*" was in that list, so we
> figured
> we'd just give it a try. It seems to have worked. Now when we search
> on
> rou?d, we get hits on the word "round". We're going to try searching
> for
> some other variations to make sure that we've done the right thing.
>
> We'd still be interested to know exactly why this worked (assuming it
> continues to solve our problem). What is a TERM_START_CHAR and how is
> it
> used? Obviously it does something important. :-)

