Mailing List Archive

Lucene Query Structure
Lets say I have two queries which I want to combine into one:

(a and b) OR (c and d)

I would use QueryParser.parse to form the subqueries, but how do I/can I
combine them with the OR logic?

The BooleanQuery can be used to piece queries together but it does not
support OR's correct? (It only supports include, exclude, and "preferential".)

Thanks for any help,

sjb




--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Lucene Query Structure [ In reply to ]
BQ(Term, Term, Include, Exclude)

BQ(
BQ(a,b,true,false)
BQ(a,b,true,false)
false,
false)

Should work...
Winton
>Lets say I have two queries which I want to combine into one:
>
>(a and b) OR (c and d)
>
>I would use QueryParser.parse to form the subqueries, but how do I/can I
>combine them with the OR logic?
>
>The BooleanQuery can be used to piece queries together but it does not
>support OR's correct? (It only supports include, exclude, and
>"preferential".)
>
>Thanks for any help,
>
>sjb
>
>
>
>
>--
>To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


--

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Lucene Query Structure [ In reply to ]
Actually, Winton's suggestion doesn't work because it's inconsistent with
the syntax of BooleanQuery() (the constructor doesn't take arguments, and
add() takes one Query argument, not two).

After considerable study of the documentation, I am still confused about
the semantics of BooleanQuery. I think I can answer the original
syntactic question posed by sjb, but the overall motivation escapes me.

I believe the correct syntax is (given TermQueries a, b, c, and d):

// create a BooleanQuery for '(a AND b)'
BooleanQuery bq_ab = new BooleanQuery();
bq_ab.add(a, true, false);
bq_ab.add(b, true, false);

// same as above with c and d
BooleanQuery bq_cd = new BooleanQuery();
bq_cd.add(c, true, false);
bq_cd.add(d, true, false);

// join two BooleanQueries together
BooleanQuery bq_abcd = new BooleanQuery();
bq_abcd.add(bq_ab, false, false);
bq_abcd.add(bq_cd, false, false);


Now, as sjb pointed out, "(query, false, false)" doesn't really seem to
have the semantics of a boolean OR. In particular:

(1) It's a unary operator: add() adds a Query (or a BooleanClause) to a
BooleanQuery. OR is a binary operator.

(2) "add.(query, false, false)" adds a query Q whose satisfaction is
*irrelevant* to that of the resultant BooleanQuery BQ: documents are
neither required to satisfy Q, nor required to *not* satisfy Q, in order
for BQ to be satisfied. The semantics of a boolean OR should be that at
least *one* of the components (queries) must be satisfied in order for the
entire expression (composite query) to be satisfied.


I conclude that either
(a) I simply don't understand the proper use of BooleanQuery, or
(b) BooleanQuery cannot be used to express a boolean OR.

If (b) is true, either the semantics of BooleanQuery need revising, or
it needs to get called something else so as not to be confusing.

If the semantics of BooleanQuery are revised, I would suggest changing the
syntax as well to reflect the binary nature of Boolean operators:

BooleanQuery.add(Query q1, Query q2, boolean and)

which would equate to (q1 AND q2) if 'and' is true, otherwise (q1 OR q2).


If there are any papers or references which would explain this better,
that would be great. Otherwise, I would really appreciate it if one of
the developers would take a few moments to clarify this issue. I'm trying
to use Lucene as a platform for research in IR; to do this I need a clear
understanding of the exact definition of the scoring system, especially as
it relates to the semantics of queries involving multiple terms.

Once I get a clear understanding of this issue, I would be happy to write
it up and submit it as an addition to the FAQ/docs.

Thanks in advance for any assistance rendered in getting this sorted out.

Regards,

Joshua

On Mon, 18 Feb 2002, Winton Davies wrote:

> BQ(Term, Term, Include, Exclude)
>
> BQ(
> BQ(a,b,true,false)
> BQ(a,b,true,false)
> false,
> false)
>
> Should work...
> Winton

> >Lets say I have two queries which I want to combine into one:
> >
> >(a and b) OR (c and d)
> >
> >I would use QueryParser.parse to form the subqueries, but how do I/can I
> >combine them with the OR logic?
> >
> >The BooleanQuery can be used to piece queries together but it does not
> >support OR's correct? (It only supports include, exclude, and
> >"preferential".)
> >
> >Thanks for any help,
> >
> >sjb



jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.



--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Lucene Query Structure [ In reply to ]
> From: Joshua O'Madadhain [mailto:jmadden@ics.uci.edu]
>
> After considerable study of the documentation, I am still
> confused about the semantics of BooleanQuery.
>
> Now, as sjb pointed out, "(query, false, false)" doesn't
> really seem to have the semantics of a boolean OR.

In fact, it does.

> In particular:
>
> (1) It's a unary operator: add() adds a Query (or a
> BooleanClause) to a BooleanQuery. OR is a binary operator.

OR is not inherently binary.

> The semantics of a boolean OR should
> be that at
> least *one* of the components (queries) must be satisfied in
> order for the
> entire expression (composite query) to be satisfied.

That is the case here.

> I conclude that either
> (a) I simply don't understand the proper use of BooleanQuery, or
> (b) BooleanQuery cannot be used to express a boolean OR.

(a)

Good analogies for the semantics of BooleanQuery are most internet search
engines (except Google) which permit you to put '+' or '-' in front of a
word to require or prohibit it. (Google requires terms by default.) A term
with no plus or minus is not required for a match, but all of the documents
containing it are included.

This form also facilitates efficient search. One can keep track of which
query terms a document contains as a bitmask, then check the boolean
expression against this mask. This is fast if the boolean expression can
itself be expressed as a few simple bitmasks. Converting an arbitrary
boolean expression to disjunctive normal form is possible, but not trivial.
But with this form mask construction is trivial.

Doug

--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Lucene Query Structure [ In reply to ]
> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Tuesday, February 19, 2002 5:05 PM
> To: 'Lucene Developers List'; Lucene Users List
> Subject: RE: Lucene Query Structure
>
>

>
> Good analogies for the semantics of BooleanQuery are most
> internet search
> engines (except Google) which permit you to put '+' or '-' in
> front of a
> word to require or prohibit it. (Google requires terms by
> default.) A term
> with no plus or minus is not required for a match, but all of
> the documents
> containing it are included.
>

"Keyword searching refers to a search type in which you enter terms representing the concepts you wish to retrieve. Boolean operators are not used.

Implied Boolean logic refers to a search in which symbols are used to represent Boolean logical operators. In this type of search on the Internet, the absence of a symbol is also significant, as the space between keywords defaults to either OR logic or AND logic. Many well-known search engines traditionally defaulted to OR logic, but as a rule are moving away from the practice and defaulting to AND. " http://library.albany.edu/internet/boolean.html

The queryParser of Lucene implies OR logic if no operator found in the query, doesn't it? I think the users (on our site) prefer keyword searching and implied AND logic. Unfortunatly to decide whether a query contains any operator or not, needs to read and parse the whole query (not LL(k) language). How could I modify the queryParser to implement default AND logic?

peter


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
Re: Lucene Query Structure [ In reply to ]
Apologies for the misleading quote, but you got the right syntax in
the end it would seem.
I believe that the add(x,false,false) is just what you need for OR...

The Unary version is kind of ok -- OR is at least one member of a set
present... the set of size one is still a set, it just happens to
co-incide with the case of AND which is all of them members of a set.
2 and more members of a set are obvious. As Doug pointed out OR isn't
a binary operator (or teriary and so on) (although though it is a
Boolean operator :) )

Cheers and sorry again for confusing the issue :)

Winton

>Actually, Winton's suggestion doesn't work because it's inconsistent with
>the syntax of BooleanQuery() (the constructor doesn't take arguments, and
>add() takes one Query argument, not two).
>
>After considerable study of the documentation, I am still confused about
>the semantics of BooleanQuery. I think I can answer the original
>syntactic question posed by sjb, but the overall motivation escapes me.
>
>I believe the correct syntax is (given TermQueries a, b, c, and d):
>
>// create a BooleanQuery for '(a AND b)'
>BooleanQuery bq_ab = new BooleanQuery();
>bq_ab.add(a, true, false);
>bq_ab.add(b, true, false);
>
>// same as above with c and d
>BooleanQuery bq_cd = new BooleanQuery();
>bq_cd.add(c, true, false);
>bq_cd.add(d, true, false);
>
>// join two BooleanQueries together
>BooleanQuery bq_abcd = new BooleanQuery();
>bq_abcd.add(bq_ab, false, false);
>bq_abcd.add(bq_cd, false, false);
>
>
>Now, as sjb pointed out, "(query, false, false)" doesn't really seem to
>have the semantics of a boolean OR. In particular:
>
>(1) It's a unary operator: add() adds a Query (or a BooleanClause) to a
>BooleanQuery. OR is a binary operator.
>
>(2) "add.(query, false, false)" adds a query Q whose satisfaction is
>*irrelevant* to that of the resultant BooleanQuery BQ: documents are
>neither required to satisfy Q, nor required to *not* satisfy Q, in order
>for BQ to be satisfied. The semantics of a boolean OR should be that at
>least *one* of the components (queries) must be satisfied in order for the
>entire expression (composite query) to be satisfied.
>
>
>I conclude that either
>(a) I simply don't understand the proper use of BooleanQuery, or
>(b) BooleanQuery cannot be used to express a boolean OR.
>
>If (b) is true, either the semantics of BooleanQuery need revising, or
>it needs to get called something else so as not to be confusing.
>
>If the semantics of BooleanQuery are revised, I would suggest changing the
>syntax as well to reflect the binary nature of Boolean operators:
>
>BooleanQuery.add(Query q1, Query q2, boolean and)
>
>which would equate to (q1 AND q2) if 'and' is true, otherwise (q1 OR q2).
>
>
>If there are any papers or references which would explain this better,
>that would be great. Otherwise, I would really appreciate it if one of
>the developers would take a few moments to clarify this issue. I'm trying
>to use Lucene as a platform for research in IR; to do this I need a clear
>understanding of the exact definition of the scoring system, especially as
>it relates to the semantics of queries involving multiple terms.
>
>Once I get a clear understanding of this issue, I would be happy to write
>it up and submit it as an addition to the FAQ/docs.
>
>Thanks in advance for any assistance rendered in getting this sorted out.
>
>Regards,
>
>Joshua
>
>On Mon, 18 Feb 2002, Winton Davies wrote:
>
>> BQ(Term, Term, Include, Exclude)
>>
>> BQ(
>> BQ(a,b,true,false)
>> BQ(a,b,true,false)
>> false,
>> false)
>>
>> Should work...
>> Winton
>
>> >Lets say I have two queries which I want to combine into one:
>> >
>> >(a and b) OR (c and d)
>> >
>> >I would use QueryParser.parse to form the subqueries, but how do I/can I
>> >combine them with the OR logic?
>> >
>> >The BooleanQuery can be used to piece queries together but it does not
>> >support OR's correct? (It only supports include, exclude, and
>> >"preferential".)
>> >
>> >Thanks for any help,
>> >
>> >sjb
>
>
>
> jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
> Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
> It's that moment of dawning comprehension that I live for--Bill Watterson
>My opinions are too rational and insightful to be those of any organization.
>
>
>
>--
>To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
>For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


--

Winton Davies
Lead Engineer, Overture (NSDQ: OVER)
1820 Gateway Drive, Suite 360
San Mateo, CA 94404
work: (650) 403-2259
cell: (650) 867-1598
http://www.overture.com/


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>