Mailing List Archive

Problems with prohibited BooleanQueries
Hello,

I have some trouble understanding the semantics of the BooleanQuery with
regard to prohibited clauses.


The method I am using is this:

public final void add(Query query, boolean required, boolean prohibited)

in the class BooleanQuery.


I am not using the QueryParser; instead I build my own queries from a
tree of search strings with the boolean operators NOT, AND and OR.

So a tree might look like this:

      AND
     /   \
   "A"   NOT
          |
         "B"

and that would mean "documents containing A, but not B". (If the AND were an
OR, the NOT branch would be culled.)


Now my problem is this: when I make a BooleanQuery containing only a
prohibited TermQuery and add that to another BooleanQuery as a required
query, I always get 0 results.


here's an example of my code:
---------------------------
Term term1 = new Term("name", "jensen");
Term term2 = new Term("name", "barfod");

// A BooleanQuery whose only clause is prohibited
BooleanQuery minusQuery = new BooleanQuery();
minusQuery.add(new TermQuery(term2), false, true);

BooleanQuery q1 = new BooleanQuery();
q1.add(new TermQuery(term1), true, false);
q1.add(minusQuery, false, false); // not required that the minusQuery is "matched"

BooleanQuery q2 = new BooleanQuery();
q2.add(new TermQuery(term1), true, false);
q2.add(minusQuery, true, false); // required that the minusQuery is "matched"

BooleanQuery q3 = new BooleanQuery();
q3.add(new TermQuery(term1), true, false);
q3.add(new TermQuery(term2), false, true); // add the term directly as a prohibited query

System.out.println("non-required: " + _index.search(q1, null).length());
System.out.println("required : " + _index.search(q2, null).length());
System.out.println("directly : " + _index.search(q3, null).length());
---------------------------

this outputs:

non-required: 55
required : 0
directly : 48



Regards,
Anders Nielsen


--
To unsubscribe, e-mail: <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
RE: Problems with prohibited BooleanQueries [ In reply to ]
Lucene does not implement a standalone "NOT" query. (Probably BooleanQuery
should throw an exception if all clauses are prohibited clauses.) Negation
is only implemented with respect to other non-negated clauses.
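
To make the three result counts easier to follow, here is a small toy model of the clause semantics described above. It is not Lucene code: the class ProhibitedClauseModel, its Clause type, and the document ids are all made up for illustration, and it only assumes that prohibited clauses subtract from whatever the positive clauses matched.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of clause matching as described in this thread; not Lucene code. */
public class ProhibitedClauseModel {

    /** One clause: the ids its sub-query matches, plus the two flags passed to add(). */
    static final class Clause {
        final Set<Integer> docs;
        final boolean required;
        final boolean prohibited;

        Clause(Set<Integer> docs, boolean required, boolean prohibited) {
            this.docs = docs;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    /** Returns the ids matched by a boolean query made of the given clauses. */
    static Set<Integer> evaluate(List<Clause> clauses) {
        Set<Integer> required = null;            // intersection of required clauses
        Set<Integer> optional = new TreeSet<>(); // union of optional clauses
        Set<Integer> excluded = new TreeSet<>(); // union of prohibited clauses
        for (Clause c : clauses) {
            if (c.prohibited) {
                excluded.addAll(c.docs);
            } else if (c.required) {
                if (required == null) {
                    required = new TreeSet<>(c.docs);
                } else {
                    required.retainAll(c.docs);
                }
            } else {
                optional.addAll(c.docs);
            }
        }
        // Candidates come only from positive clauses; with none, nothing can match.
        Set<Integer> result = (required != null) ? required : optional;
        result.removeAll(excluded);
        return result;
    }

    static Set<Integer> ids(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    static List<Clause> clauses(Clause... cs) {
        return Arrays.asList(cs);
    }

    public static void main(String[] args) {
        Set<Integer> jensen = ids(1, 2, 3, 4, 5); // made-up postings for "jensen"
        Set<Integer> barfod = ids(4, 5, 6);       // made-up postings for "barfod"

        // A query with only a prohibited clause has no candidates, so it matches nothing.
        Set<Integer> minus = evaluate(clauses(new Clause(barfod, false, true)));

        Set<Integer> q1 = evaluate(clauses(new Clause(jensen, true, false), new Clause(minus, false, false)));
        Set<Integer> q2 = evaluate(clauses(new Clause(jensen, true, false), new Clause(minus, true, false)));
        Set<Integer> q3 = evaluate(clauses(new Clause(jensen, true, false), new Clause(barfod, false, true)));

        System.out.println("non-required: " + q1.size());
        System.out.println("required : " + q2.size());
        System.out.println("directly : " + q3.size());
    }
}
```

With these made-up sets the model prints 5, 0 and 3, which is the same pattern as the 55, 0 and 48 reported in the original message: the nested all-prohibited query is empty, so requiring it empties the result, while leaving it optional changes nothing.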

So you cannot directly model your query tree as a Lucene query tree. NOT
nodes must be incorporated into their parent, a BooleanQuery node, as is
done in your "directly" case.
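
One way to fold NOT nodes into their parent could look like the sketch below. The node classes (Term, Not, And, Or) and the class name NotFolding are hypothetical stand-ins for the poster's own tree, and the output is just a string in the +required / -prohibited notation rather than real Lucene query objects. It assumes, as in the original message, that a NOT branch under an OR is culled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: fold NOT nodes of a boolean tree into prohibited clauses of their parent. */
public class NotFolding {

    interface Node {}

    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Not implements Node {
        final Node child;
        Not(Node child) { this.child = child; }
    }

    static final class And implements Node {
        final List<Node> children;
        And(Node... children) { this.children = Arrays.asList(children); }
    }

    static final class Or implements Node {
        final List<Node> children;
        Or(Node... children) { this.children = Arrays.asList(children); }
    }

    /** Renders a tree as clauses: "+" marks required, "-" prohibited, no prefix optional. */
    static String toClauses(Node n) {
        if (n instanceof Term) {
            return ((Term) n).text;
        }
        if (n instanceof Not) {
            // A NOT with no AND parent cannot be expressed as a clause list.
            throw new IllegalArgumentException("NOT needs an AND parent with a positive clause");
        }
        boolean and = n instanceof And;
        List<Node> children = and ? ((And) n).children : ((Or) n).children;
        List<String> parts = new ArrayList<>();
        boolean hasPositive = false;
        for (Node c : children) {
            if (c instanceof Not) {
                if (and) {
                    parts.add("-" + wrap(((Not) c).child)); // prohibited clause on the parent
                }
                // Under OR the NOT branch is culled.
            } else {
                parts.add((and ? "+" : "") + wrap(c));
                hasPositive = true;
            }
        }
        if (!hasPositive) {
            throw new IllegalArgumentException("a query needs at least one positive clause");
        }
        return String.join(" ", parts);
    }

    /** Parenthesises nested boolean nodes so the grouping stays visible. */
    static String wrap(Node n) {
        String s = toClauses(n);
        return (n instanceof Term) ? s : "(" + s + ")";
    }

    public static void main(String[] args) {
        Node tree = new And(new Term("A"), new Not(new Term("B")));
        System.out.println(toClauses(tree)); // the "directly" case from the original message
    }
}
```

For the tree in the original message this yields the flat form of the "directly" case, and a tree whose only children are NOT nodes is rejected up front instead of silently returning no hits.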

Doug

> -----Original Message-----
> From: Anders Nielsen [mailto:anders@visator.dk]
> Sent: Wednesday, October 31, 2001 10:36 AM
> To: Lucene Users List
> Subject: Problems with prohibited BooleanQueries
RE: Problems with prohibited BooleanQueries [ In reply to ]
How difficult would it be to get BooleanQuery to do a standalone NOT, do you
suppose? That would be very useful in my case.

Scott

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Wednesday, October 31, 2001 2:36 PM
> To: 'Lucene Users List'
> Subject: RE: Problems with prohibited BooleanQueries
RE: Problems with prohibited BooleanQueries [ In reply to ]
> From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
>
> How difficult would it be to get BooleanQuery to do a
> standalone NOT, do you
> suppose? That would be very useful in my case.

It would not be that difficult, but it would make queries slow. All documents
not containing a term would need to be enumerated. Since most terms occur
in only a small percentage of the documents, most NOT queries would return
most documents.

Scoring would also be strange. I guess you'd give them all a score of 1.0,
and hope that the query is nested in a more complex query that will
differentiate the scores. But if it's nested, then you could do it with
BooleanQuery as it stands...

So, my question to you is: do you actually want lists of all documents that
do not contain a term, or, rather, do you want to use negation in the
context of other query terms, and are having trouble getting your query
parser to build BooleanQueries?

Doug

RE: Problems with prohibited BooleanQueries [ In reply to ]
I don't use a query parser at all, so that's no issue. I just need a
BooleanQuery to realize that it only has negative clauses and do the right
thing. Right now I have to include a bogus static field in every single
document so that I can use a TermQuery on that bogus field as the left side
of a BooleanQuery subtract. Sure, it works, but it ain't pretty...
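
For anyone trying to picture the workaround, here is a rough toy version of it over an in-memory inverted index. It is not Lucene code; the marker term "__all__" and the class MatchAllWorkaround are invented names, and the only assumption is that every document is indexed with the same marker term so that it can serve as the positive side of the subtraction.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy version of the "bogus field on every document" workaround; not Lucene code. */
public class MatchAllWorkaround {

    /** Hypothetical marker term added to every document at indexing time. */
    static final String MARKER = "__all__";

    /** Builds a term -> document ids map, adding the marker term to each document. */
    static Map<String, Set<Integer>> buildIndex(String[][] docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.length; id++) {
            index.computeIfAbsent(MARKER, k -> new TreeSet<>()).add(id);
            for (String term : docs[id]) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    /** "NOT term": start from every document (via the marker) and subtract the matches. */
    static Set<Integer> not(Map<String, Set<Integer>> index, String term) {
        Set<Integer> result =
                new TreeSet<>(index.getOrDefault(MARKER, Collections.<Integer>emptySet()));
        result.removeAll(index.getOrDefault(term, Collections.<Integer>emptySet()));
        return result;
    }

    public static void main(String[] args) {
        String[][] docs = {
            {"jensen"},
            {"jensen", "barfod"},
            {"barfod"},
            {"nielsen"},
        };
        Map<String, Set<Integer>> index = buildIndex(docs);
        System.out.println(not(index, "barfod"));
    }
}
```

The marker's posting list holds every document id, which is also why a standalone NOT tends to be slow: the whole list is walked even when only a few documents are excluded.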

Scott

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Thursday, November 01, 2001 10:49 AM
> To: 'Lucene Users List'
> Subject: RE: Problems with prohibited BooleanQueries