Mailing List Archive

Limit on number of characters before wildcard?
Hi,
From some testing that I have done it appears that there is a limit of 3
characters before the wild card for wildcard queries. In other words, if the
word is dogleash and I looking by using do* it returns wrong results
(usually only a asubset) where as if I use dog*, I get correct results.

Also, wildcard at the begining of the keyword does not seem to be supported.
(*ogleash)
Can some one confirm this? Is this documented anywhere?

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Limit on number of characters before wildcard? [ In reply to ]
Hello,

I haven't tested this like you did, but from looking at the query
parser (QueryParser.jj file in the Lucene distribution)
it seems that only a single character is required before '*' or '?':

...
| <WILDTERM: <_TERM_START_CHAR>
(<_TERM_CHAR> | ( [ "*", "?" ] ))* >
...

_TERM_START_CHAR is defined as:
[ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]

and as you can see from the first definition above this character can
be followed by either zero or more _TERM_CHAR or "*" or "?".

This also answers your question about using an asterisk as the very
first character in the query.

It would be great if Doug or Brian Goetz could confirm or dispute this,
so that I can add it to the Lucene FAQ at jGuru.com.

Otis





--- Aruna Raghavan <ArunaR@opin.com> wrote:
> Hi,
> From some testing that I have done it appears that there is a limit
> of 3
> characters before the wild card for wildcard queries. In other
words,
> if the
> word is dogleash and I looking by using do* it returns wrong
results
> (usually only a asubset) where as if I use dog*, I get correct
> results.
>
> Also, wildcard at the begining of the keyword does not seem to be
> supported.
> (*ogleash)
> Can some one confirm this? Is this documented anywhere?
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Limit on number of characters before wildcard? [ In reply to ]
First character asterisk (eg, *ogleash) is performed by PrefixQuery, which
executes much faster than WildcardQuery.


Dave Kor Kian Wei
Consultant
Product Engineering
NexusEdge Technologies Pte. Ltd.
6 Aljunied Ave 3, #01-02 (Level 4)
Singapore 389932
Tel : (+65)848-2552
Fax : (+65)747-4536
Web : www.nexusedge.com

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Friday, January 11, 2002 11:40 AM
> To: Lucene Users List
> Subject: Re: Limit on number of characters before wildcard?
>
>
> Hello,
>
> I haven't tested this like you did, but from looking at the query
> parser (QueryParser.jj file in the Lucene distribution)
> it seems that only a single character is required before '*' or '?':
>
> ...
> | <WILDTERM: <_TERM_START_CHAR>
> (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
> ...
>
> _TERM_START_CHAR is defined as:
> [ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]
>
> and as you can see from the first definition above this character can
> be followed by either zero or more _TERM_CHAR or "*" or "?".
>
> This also answers your question about using an asterisk as the very
> first character in the query.
>
> It would be great if Doug or Brian Goetz could confirm or dispute this,
> so that I can add it to the Lucene FAQ at jGuru.com.
>
> Otis
>
>
>
>
>
> --- Aruna Raghavan <ArunaR@opin.com> wrote:
> > Hi,
> > From some testing that I have done it appears that there is a limit
> > of 3
> > characters before the wild card for wildcard queries. In other
> words,
> > if the
> > word is dogleash and I looking by using do* it returns wrong
> results
> > (usually only a asubset) where as if I use dog*, I get correct
> > results.
> >
> > Also, wildcard at the begining of the keyword does not seem to be
> > supported.
> > (*ogleash)
> > Can some one confirm this? Is this documented anywhere?
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>
>
>


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Limit on number of characters before wildcard? [ In reply to ]
Dave and Otis,
Thanks for your responses.
I don't quite follow what Dave is saying. Do I need to construct these
WildCard or PrefixQueries myself?
Currently I am just using something like
Query query= QueryParser.parse("dogleash");
I also use Boolean Queries by using the following-
BooleanQuery bq = new BooleanQuery();
bq.add(query);

Is dave suggesting that we detect the wildcard ourselves and create the
PrefixQuery??
-----Original Message-----
From: Dave Kor [mailto:dave.kor@nexusedge.com]
Sent: Thursday, January 10, 2002 10:21 PM
To: Lucene Users List
Subject: RE: Limit on number of characters before wildcard?


First character asterisk (eg, *ogleash) is performed by PrefixQuery, which
executes much faster than WildcardQuery.


Dave Kor Kian Wei
Consultant
Product Engineering
NexusEdge Technologies Pte. Ltd.
6 Aljunied Ave 3, #01-02 (Level 4)
Singapore 389932
Tel : (+65)848-2552
Fax : (+65)747-4536
Web : www.nexusedge.com

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Friday, January 11, 2002 11:40 AM
> To: Lucene Users List
> Subject: Re: Limit on number of characters before wildcard?
>
>
> Hello,
>
> I haven't tested this like you did, but from looking at the query
> parser (QueryParser.jj file in the Lucene distribution)
> it seems that only a single character is required before '*' or '?':
>
> ...
> | <WILDTERM: <_TERM_START_CHAR>
> (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
> ...
>
> _TERM_START_CHAR is defined as:
> [ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]
>
> and as you can see from the first definition above this character can
> be followed by either zero or more _TERM_CHAR or "*" or "?".
>
> This also answers your question about using an asterisk as the very
> first character in the query.
>
> It would be great if Doug or Brian Goetz could confirm or dispute this,
> so that I can add it to the Lucene FAQ at jGuru.com.
>
> Otis
>
>
>
>
>
> --- Aruna Raghavan <ArunaR@opin.com> wrote:
> > Hi,
> > From some testing that I have done it appears that there is a limit
> > of 3
> > characters before the wild card for wildcard queries. In other
> words,
> > if the
> > word is dogleash and I looking by using do* it returns wrong
> results
> > (usually only a asubset) where as if I use dog*, I get correct
> > results.
> >
> > Also, wildcard at the begining of the keyword does not seem to be
> > supported.
> > (*ogleash)
> > Can some one confirm this? Is this documented anywhere?
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>
>
>


--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Limit on number of characters before wildcard? [ In reply to ]
Just so that nobody is confused in the future, PrefixQuery that Dave is
mentioning is actually a query that lets you make searches such as
'Consult*'.

See http://jguru.com/faq/view.jsp?EID=480194

Otis

--- Dave Kor <dave.kor@nexusedge.com> wrote:
> First character asterisk (eg, *ogleash) is performed by PrefixQuery,
> which
> executes much faster than WildcardQuery.
>
>
> Dave Kor Kian Wei
> Consultant
> Product Engineering
> NexusEdge Technologies Pte. Ltd.
> 6 Aljunied Ave 3, #01-02 (Level 4)
> Singapore 389932
> Tel : (+65)848-2552
> Fax : (+65)747-4536
> Web : www.nexusedge.com
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Friday, January 11, 2002 11:40 AM
> > To: Lucene Users List
> > Subject: Re: Limit on number of characters before wildcard?
> >
> >
> > Hello,
> >
> > I haven't tested this like you did, but from looking at the query
> > parser (QueryParser.jj file in the Lucene distribution)
> > it seems that only a single character is required before '*' or
> '?':
> >
> > ...
> > | <WILDTERM: <_TERM_START_CHAR>
> > (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
> > ...
> >
> > _TERM_START_CHAR is defined as:
> > [ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]
> >
> > and as you can see from the first definition above this character
> can
> > be followed by either zero or more _TERM_CHAR or "*" or "?".
> >
> > This also answers your question about using an asterisk as the very
> > first character in the query.
> >
> > It would be great if Doug or Brian Goetz could confirm or dispute
> this,
> > so that I can add it to the Lucene FAQ at jGuru.com.
> >
> > Otis
> >
> >
> >
> >
> >
> > --- Aruna Raghavan <ArunaR@opin.com> wrote:
> > > Hi,
> > > From some testing that I have done it appears that there is a
> limit
> > > of 3
> > > characters before the wild card for wildcard queries. In other
> > words,
> > > if the
> > > word is dogleash and I looking by using do* it returns wrong
> > results
> > > (usually only a asubset) where as if I use dog*, I get correct
> > > results.
> > >
> > > Also, wildcard at the begining of the keyword does not seem to be
> > > supported.
> > > (*ogleash)
> > > Can some one confirm this? Is this documented anywhere?
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <mailto:lucene-user-help@jakarta.apache.org>
> > >
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Send FREE video emails in Yahoo! Mail!
> > http://promo.yahoo.com/videomail/
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Limit on number of characters before wildcard? [ In reply to ]
Yes, I am clear on how a prefix query is defined. But Dave says somehow a
search *ogleash would work with a PrefixQuery. dog*eash or dog* would work,
not *ogleash. That's where the confusion came from. Just to clarify...
Thanks again.

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Friday, January 11, 2002 12:49 PM
To: Lucene Users List
Subject: RE: Limit on number of characters before wildcard?


Just so that nobody is confused in the future, PrefixQuery that Dave is
mentioning is actually a query that lets you make searches such as
'Consult*'.

See http://jguru.com/faq/view.jsp?EID=480194

Otis

--- Dave Kor <dave.kor@nexusedge.com> wrote:
> First character asterisk (eg, *ogleash) is performed by PrefixQuery,
> which
> executes much faster than WildcardQuery.
>
>
> Dave Kor Kian Wei
> Consultant
> Product Engineering
> NexusEdge Technologies Pte. Ltd.
> 6 Aljunied Ave 3, #01-02 (Level 4)
> Singapore 389932
> Tel : (+65)848-2552
> Fax : (+65)747-4536
> Web : www.nexusedge.com
>
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Friday, January 11, 2002 11:40 AM
> > To: Lucene Users List
> > Subject: Re: Limit on number of characters before wildcard?
> >
> >
> > Hello,
> >
> > I haven't tested this like you did, but from looking at the query
> > parser (QueryParser.jj file in the Lucene distribution)
> > it seems that only a single character is required before '*' or
> '?':
> >
> > ...
> > | <WILDTERM: <_TERM_START_CHAR>
> > (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
> > ...
> >
> > _TERM_START_CHAR is defined as:
> > [ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]
> >
> > and as you can see from the first definition above this character
> can
> > be followed by either zero or more _TERM_CHAR or "*" or "?".
> >
> > This also answers your question about using an asterisk as the very
> > first character in the query.
> >
> > It would be great if Doug or Brian Goetz could confirm or dispute
> this,
> > so that I can add it to the Lucene FAQ at jGuru.com.
> >
> > Otis
> >
> >
> >
> >
> >
> > --- Aruna Raghavan <ArunaR@opin.com> wrote:
> > > Hi,
> > > From some testing that I have done it appears that there is a
> limit
> > > of 3
> > > characters before the wild card for wildcard queries. In other
> > words,
> > > if the
> > > word is dogleash and I looking by using do* it returns wrong
> > results
> > > (usually only a asubset) where as if I use dog*, I get correct
> > > results.
> > >
> > > Also, wildcard at the begining of the keyword does not seem to be
> > > supported.
> > > (*ogleash)
> > > Can some one confirm this? Is this documented anywhere?
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <mailto:lucene-user-help@jakarta.apache.org>
> > >
> >
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Send FREE video emails in Yahoo! Mail!
> > http://promo.yahoo.com/videomail/
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
> >
> >
>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

--
To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Limit on number of characters before wildcard? [ In reply to ]
You can't make queries like *ogleash.
See that FAQ entry...

Otis

--- Aruna Raghavan <ArunaR@opin.com> wrote:
> Yes, I am clear on how a prefix query is defined. But Dave says
> somehow a
> search *ogleash would work with a PrefixQuery. dog*eash or dog* would
> work,
> not *ogleash. That's where the confusion came from. Just to
> clarify...
> Thanks again.
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Friday, January 11, 2002 12:49 PM
> To: Lucene Users List
> Subject: RE: Limit on number of characters before wildcard?
>
>
> Just so that nobody is confused in the future, PrefixQuery that Dave
> is
> mentioning is actually a query that lets you make searches such as
> 'Consult*'.
>
> See http://jguru.com/faq/view.jsp?EID=480194
>
> Otis
>
> --- Dave Kor <dave.kor@nexusedge.com> wrote:
> > First character asterisk (eg, *ogleash) is performed by
> PrefixQuery,
> > which
> > executes much faster than WildcardQuery.
> >
> >
> > Dave Kor Kian Wei
> > Consultant
> > Product Engineering
> > NexusEdge Technologies Pte. Ltd.
> > 6 Aljunied Ave 3, #01-02 (Level 4)
> > Singapore 389932
> > Tel : (+65)848-2552
> > Fax : (+65)747-4536
> > Web : www.nexusedge.com
> >
> > > -----Original Message-----
> > > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > > Sent: Friday, January 11, 2002 11:40 AM
> > > To: Lucene Users List
> > > Subject: Re: Limit on number of characters before wildcard?
> > >
> > >
> > > Hello,
> > >
> > > I haven't tested this like you did, but from looking at the query
> > > parser (QueryParser.jj file in the Lucene distribution)
> > > it seems that only a single character is required before '*' or
> > '?':
> > >
> > > ...
> > > | <WILDTERM: <_TERM_START_CHAR>
> > > (<_TERM_CHAR> | ( [ "*", "?" ] ))* >
> > > ...
> > >
> > > _TERM_START_CHAR is defined as:
> > > [ "a"-"z", "A"-"Z", "_", "\u0080"-"\uFFFE" ]
> > >
> > > and as you can see from the first definition above this character
> > can
> > > be followed by either zero or more _TERM_CHAR or "*" or "?".
> > >
> > > This also answers your question about using an asterisk as the
> very
> > > first character in the query.
> > >
> > > It would be great if Doug or Brian Goetz could confirm or dispute
> > this,
> > > so that I can add it to the Lucene FAQ at jGuru.com.
> > >
> > > Otis
> > >
> > >
> > >
> > >
> > >
> > > --- Aruna Raghavan <ArunaR@opin.com> wrote:
> > > > Hi,
> > > > From some testing that I have done it appears that there is a
> > limit
> > > > of 3
> > > > characters before the wild card for wildcard queries. In other
> > > words,
> > > > if the
> > > > word is dogleash and I looking by using do* it returns wrong
> > > results
> > > > (usually only a asubset) where as if I use dog*, I get correct
> > > > results.
> > > >
> > > > Also, wildcard at the begining of the keyword does not seem to
> be
> > > > supported.
> > > > (*ogleash)
> > > > Can some one confirm this? Is this documented anywhere?
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > > For additional commands, e-mail:
> > > > <mailto:lucene-user-help@jakarta.apache.org>
> > > >
> > >
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Send FREE video emails in Yahoo! Mail!
> > > http://promo.yahoo.com/videomail/
> > >
> > > --
> > > To unsubscribe, e-mail:
> > > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail:
> > > <mailto:lucene-user-help@jakarta.apache.org>
> > >
> > >
> > >
> >
> >
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> > <mailto:lucene-user-help@jakarta.apache.org>
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>