Mailing List Archive

Question about RangeQuery and strings...
Hi all,

I've been having some problems using RangeQuery. I have a simple Query which
is essentially "document.field < AB". Field values are:

"" // Empty string
"A SPACE"
"A123456"
"ABC"

Now I expected to find the first three of the four values (and I do with
another commercial search engine product I've worked with). With Lucene I
get nothing. Part of the problem I think is that there are some issues with
case here. Changing my query to "document.field < ab" returns:

"A123456"

Now I would have expected "A SPACE" to get returned, and I was really
surprised that "" wasn't returned. I'm guessing that "" wasn't returned
because no term in the field passed the query criteria, and empty string is
not considered a term.

How should I go about getting what I expect? What is going on here?

Thanks much,

James





--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Question about RangeQuery and strings... [ In reply to ]
I'm replying to my own message because I think I now understand the problem,
and part of it is, in my opinion, a bad implementation of RangedQuery.

When you create a ranged query and omit the lower term, my expectation would
be that I would find everything less than the upper term. Now if I pass
false for the inclusive term, then I would expect that I would find all
terms less than the upper term excluding the upper term itself.

What is happening in the case of lower_term=null, upper_term=x,
inclusive=false is that empty strings are being excluded because inclusive
is set false, and the implementation of RangedQuery creates a default lower
term of Term(fieldName, ""). Since it's not inclusive, it excludes "". This
isn't what I intended, and I don't think it's what most people would imagine
RangedQuery would do in the case I've mentioned.

I equate lower=null, upper=x, inclusive=false to Field < x. lower=null,
upper=x, inclusive=true would be Field <= x. In both cases, the only
difference should be whether or not Field = x is true for the query.

I'm still quite new to Lucene, so maybe I'm wrong about all this because I
just don't understand it well enough. If so, could someone tell me where
I've gone astray?

Thanks much,

James

PS: The rest of the problems I had below I was able to fix by changing how
the fields were tokenized and indexed.

> -----Original Message-----
> From: James Ricci
> Sent: Thursday, June 06, 2002 11:16 AM
> To: 'lucene-user@jakarta.apache.org'
> Subject: Question about RangeQuery and strings...
>
> Hi all,
>
> I've been having some problems using RangeQuery. I have a simple Query
> which is essentially "document.field < AB". Field values are:
>
> "" // Empty string
> "A SPACE"
> "A123456"
> "ABC"
>
> Now I expected to find the first three of the four values (and I do with
> another commercial search engine product I've worked with). With Lucene I
> get nothing. Part of the problem I think is that there are some issues
> with case here. Changing my query to "document.field < ab" returns:
>
> "A123456"
>
> Now I would have expected "A SPACE" to get returned, and I was really
> surprised that "" wasn't returned. I'm guessing that "" wasn't returned
> because no term in the field passed the query criteria, and empty string
> is not considered a term.
>
> How should I go about getting what I expect? What is going on here?
>
> Thanks much,
>
> James
>
>
>
>

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Question about RangeQuery and strings... [ In reply to ]
James,

I haven't used RangeQueries, but what you describe does sound confusing
to me. I'll enter it as a bug, just so this information doesn't get
lost, because I am not certain that this is really a bug, even though
it sounds like one to me.

Thanks,
Otis


--- James Ricci <james@riccinursery.com> wrote:
> I'm replying to my own message because I think I now understand the
> problem,
> and part of it is, in my opinion, a bad implementation of
> RangedQuery.
>
> When you create a ranged query and omit the lower term, my
> expectation would
> be that I would find everything less than the upper term. Now if I
> pass
> false for the inclusive term, then I would expect that I would find
> all
> terms less than the upper term excluding the upper term itself.
>
> What is happening in the case of lower_term=null, upper_term=x,
> inclusive=false is that empty strings are being excluded because
> inclusive
> is set false, and the implementation of RangedQuery creates a default
> lower
> term of Term(fieldName, ""). Since it's not inclusive, it excludes
> "". This
> isn't what I intended, and I don't think it's what most people would
> imagine
> RangedQuery would do in the case I've mentioned.
>
> I equate lower=null, upper=x, inclusive=false to Field < x.
> lower=null,
> upper=x, inclusive=true would be Field <= x. In both cases, the only
> difference should be whether or not Field = x is true for the query.
>
> I'm still quite new to Lucene, so maybe I'm wrong about all this
> because I
> just don't understand it well enough. If so, could someone tell me
> where
> I've gone astray?
>
> Thanks much,
>
> James
>
> PS: The rest of the problems I had below I was able to fix by
> changing how
> the fields were tokenized and indexed.
>
> > -----Original Message-----
> > From: James Ricci
> > Sent: Thursday, June 06, 2002 11:16 AM
> > To: 'lucene-user@jakarta.apache.org'
> > Subject: Question about RangeQuery and strings...
> >
> > Hi all,
> >
> > I've been having some problems using RangeQuery. I have a simple
> Query
> > which is essentially "document.field < AB". Field values are:
> >
> > "" // Empty string
> > "A SPACE"
> > "A123456"
> > "ABC"
> >
> > Now I expected to find the first three of the four values (and I do
> with
> > another commercial search engine product I've worked with). With
> Lucene I
> > get nothing. Part of the problem I think is that there are some
> issues
> > with case here. Changing my query to "document.field < ab" returns:
> >
> > "A123456"
> >
> > Now I would have expected "A SPACE" to get returned, and I was
> really
> > surprised that "" wasn't returned. I'm guessing that "" wasn't
> returned
> > because no term in the field passed the query criteria, and empty
> string
> > is not considered a term.
> >
> > How should I go about getting what I expect? What is going on here?
> >
> > Thanks much,
> >
> > James
> >
> >
> >
> >
>
> --
> To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
>


__________________________________________________
Do You Yahoo!?
Yahoo! - Official partner of 2002 FIFA World Cup
http://fifaworldcup.yahoo.com

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>