Mailing List Archive

PLEASE REVIEW: Updated Query Parser Syntax
Hi,
I have updated the query parser syntax document and put it up on the website
(without any links).

http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Please review it and send feedback.

Current questions:

The current queryParser supports range searches, but can you put a date into
the query parser? Are range searches only used for searching dates?

--Peter


--
To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
Re: PLEASE REVIEW: Updated Query Parser Syntax
"Microsoft Word" NOT "Microsoft Excel"


My understanding of the query parser is that this query would be the same as:
"Microsoft Word" OR NOT "Microsoft Excel"

The same applies to the - operator.



----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 9:57 AM
Subject: PLEASE REVIEW: Updated Query Parser Syntax


Re: PLEASE REVIEW: Updated Query Parser Syntax
Do you think this should be stated more directly as an option?

It seems like the "OR NOT" is more confusing.

Or are you making the point that this is not "AND NOT", meaning that
"Microsoft Word" is not required?

--Peter


On 5/15/02 7:30 AM, "Eugene Gluzberg" <drag0n2@apache.org> wrote:

>
> "Microsoft Word" NOT "Microsoft Excel"
>
>
> My understanding of the query parser is that this query would be the same as:
> "Microsoft Word" OR NOT "Microsoft Excel"
>
> The same applies to the - operator.
>


Re: PLEASE REVIEW: Updated Query Parser Syntax
Sorry, I should have been more clear.

As far as I understand, NOT is a unary operator and applies to the term
that follows it.
So:
NOT "bye bye"
is a valid query and will find all documents that do not have the phrase
"bye bye".

If you want to find the set difference between the documents matching
hello and those matching "bye bye", you will have to use the query:

hello AND NOT "bye bye"

Also it follows that,
hello NOT "bye bye" is equivalent to:
hello OR NOT "bye bye"

So the query hello NOT "bye bye" will find all documents that either have
hello OR do not have "bye bye"

In your description you said:
The NOT operator excludes documents that contain the term after NOT. This
is equivalent to a difference using sets. For example to search for
documents that contain "Microsoft Word" but not "Microsoft Excel":

"Microsoft Word" NOT "Microsoft Excel"

The correct way of doing that would be:
"Microsoft Word" AND NOT "Microsoft Excel"
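
To make the set reading above concrete, here is a minimal Python sketch. It treats each query as a set of document ids and NOT as the complement against the whole index, which is the interpretation described in this message and not necessarily what the query parser actually does. The sample documents and the helper names (matching, complement) are made up for illustration.

```python
# Toy index: document id -> text. The documents are invented for illustration.
DOCS = {
    1: "hello world",
    2: "hello and bye bye",
    3: "bye bye for now",
    4: "nothing relevant here",
}

ALL_IDS = set(DOCS)


def matching(phrase):
    """Ids of documents whose text contains the phrase (plain substring test)."""
    return {doc_id for doc_id, text in DOCS.items() if phrase in text}


def complement(ids):
    """Unary NOT under a pure Boolean reading: every document not in ids."""
    return ALL_IDS - ids


hello = matching("hello")
bye = matching("bye bye")

not_bye = complement(bye)                      # NOT "bye bye"
hello_and_not_bye = hello & complement(bye)    # hello AND NOT "bye bye"
hello_or_not_bye = hello | complement(bye)     # hello OR NOT "bye bye"

print(sorted(not_bye), sorted(hello_and_not_bye), sorted(hello_or_not_bye))
```

Under this reading the OR NOT form also returns document 2 (which has both) and document 4 (which has neither), so it is a larger set than the set difference given by AND NOT.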

----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 10:43 AM
Subject: Re: PLEASE REVIEW: Updated Query Parser Syntax


Re: PLEASE REVIEW: Updated Query Parser Syntax
I believe you could simplify things by suggesting the alternative
syntax:

+"Microsoft Word" -"Microsoft Excel"

I think this will get you all documents that have the first phrase, but
not the second one.

Otis


RE: PLEASE REVIEW: Updated Query Parser Syntax
Looks great!

A couple of missing features:

1. phrase slop: phrase slop can be set with "t1 t2"~10, which matches t1
and t2 within 10 terms of one another (actually t1 up to 10 before or 8
after t2).

2. field grouping: a bunch of clauses can be scoped to a particular field
with:
f1(+t1 +"t2 t3" ...)
This sets the field for t1, t2, t3, etc. to be f1. This is a
side-effect of the way that fields and grouping work, but a useful one that
I think deserves an example.
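
As a rough illustration of what the field grouping in item 2 amounts to, the following Python sketch writes a grouped clause list out clause by clause, using the field:term form for fielded terms. The function name scope_to_field and the string-based handling are hypothetical; this only shows the intended equivalence and is not how the query parser is implemented.

```python
def scope_to_field(field, clauses):
    """Prefix each clause with the field name, keeping a leading + or - modifier."""
    scoped = []
    for clause in clauses:
        modifier = ""
        if clause[:1] in ("+", "-"):
            # Split off the required/prohibited marker so it stays in front.
            modifier, clause = clause[0], clause[1:]
        scoped.append(f"{modifier}{field}:{clause}")
    return " ".join(scoped)


# The grouped clauses from item 2, scoped to field f1 one clause at a time.
expanded = scope_to_field("f1", ['+t1', '+"t2 t3"'])
print(expanded)
```

The point of the grouped form is that the field name is written once instead of being repeated on every clause.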

Doug

> -----Original Message-----
> From: Peter Carlson
> [mailto:carlson.at.bookandhammer.com@cutting.at.lucene.com]
> Sent: Wednesday, May 15, 2002 6:58 AM
> To: dcutting@grandcentral.com
> Subject: PLEASE REVIEW: Updated Query Parser Syntax
Re: PLEASE REVIEW: Updated Query Parser Syntax
Good point. I do think there needs to be more clarification that a query
whose only term is a NOT term will not find any results. That is, you can say

NOT "bye bye" hello

But you cannot say

NOT "bye bye"

However, I think you might be wrong in describing the way NOT works with
AND and OR.

hello OR NOT "bye bye"

Will find all documents that have hello and do not have "bye bye". That is,
NOT always works by subtracting its matches from the other search results.
It never subtracts from the complete set of documents in the index.

So in practice
hello AND NOT "bye bye"    (+hello -"bye bye")
hello OR NOT "bye bye"     (hello -"bye bye")
hello NOT "bye bye"        (hello -"bye bye")
NOT "bye bye" hello        (-"bye bye" hello)

Are all equivalent because there is just one term being found and one phrase
being removed. These are not equivalent in the general case.
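
The subtraction behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual query parser or scorer: each clause is marked required (+), prohibited (-), or optional (no marker), and prohibited clauses only remove documents from what the other clauses found. The sample documents and the search helper are invented for illustration.

```python
# Toy index: document id -> text. The documents are invented for illustration.
DOCS = {
    1: "hello world",
    2: "hello and bye bye",
    3: "bye bye for now",
    4: "nothing relevant here",
}


def matching(phrase):
    """Ids of documents whose text contains the phrase (plain substring test)."""
    return {doc_id for doc_id, text in DOCS.items() if phrase in text}


def search(clauses):
    """clauses: list of (modifier, phrase); modifier is '+', '-' or ''."""
    required = [matching(p) for m, p in clauses if m == "+"]
    optional = [matching(p) for m, p in clauses if m == ""]
    prohibited = [matching(p) for m, p in clauses if m == "-"]

    if required:
        # Assumption: with required clauses present, optional ones do not
        # change which documents match.
        found = set.intersection(*required)
    elif optional:
        found = set.union(*optional)
    else:
        found = set()  # nothing positive to subtract from
    for ids in prohibited:
        found -= ids
    return found


and_not = search([("+", "hello"), ("-", "bye bye")])   # +hello -"bye bye"
plain_not = search([("", "hello"), ("-", "bye bye")])  # hello -"bye bye"
not_first = search([("-", "bye bye"), ("", "hello")])  # -"bye bye" hello
lone_not = search([("-", "bye bye")])                  # -"bye bye" on its own

# With two positive terms the + and unmarked forms stop being equivalent.
both_required = search([("+", "hello"), ("+", "nothing"), ("-", "bye bye")])
both_optional = search([("", "hello"), ("", "nothing"), ("-", "bye bye")])

print(and_not, plain_not, not_first, lone_not, both_required, both_optional)
```

In this model the single-term forms all return the same documents, a lone prohibited clause returns nothing, and the difference between required and optional clauses only shows up once there is more than one positive term.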

Does that make sense?

--Peter


On 5/15/02 8:26 AM, "Eugene Gluzberg" <drag0n2@apache.org> wrote:

> Sorry, I should have been more clear.
>
> As far as I understand, NOT is a unary operator and applies to the
> term that follows it.
> So:
> NOT "bye bye"
> is a valid query and will find all documents that do not have the phrase
> "bye bye"
>
> If you want to find the set difference between the documents matching
> hello and those matching "bye bye", you will have to use the query:
>
> hello AND NOT "bye bye"
>
> Also it follows that,
> hello NOT "bye bye" is equivalent to:
> hello OR NOT "bye bye"
>
> So the query hello NOT "bye bye" will find all documents that either have
> hello OR do not have "bye bye"
>


Re: PLEASE REVIEW: Updated Query Parser Syntax
Wait a sec,

NOT can be used with only one term.

NOT "bye bye"
is a legal query

so is:
-"bye bye"


Either will find all documents which do not contain the phrase "bye bye"

I will write a test.

----- Original Message -----
From: "Peter Carlson" <carlson@bookandhammer.com>
To: "Lucene Developers List" <lucene-dev@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 4:14 PM
Subject: Re: PLEASE REVIEW: Updated Query Parser Syntax

