Hi there,
I try to optimize my query, but I think I have come to a point where I have
to extend the functionality of lucene.
I have a set of pretty simple queries, let's call them <A>, <B>, <C>, ...
Now I try to create a query which matches, if AT LEAST 2 of this simple
queries match. I started with a query like the following one (build all the
possible pairs and combine them with 'OR'):
(<A> AND <B>) OR (<A> AND <C>) OR (<B> and <C>) OR ...
My first question:
Is it safe to create the simple Query objects (most of the time TermQuery
objects) and then use these objects multiple times in a BooleanQuery object?
Example:
Query a = new TermQuery("name", "John");
Query b = new TermQuery("name", "James");
Query c = new TermQuery("name", "Jack");
Query q1 = new BooleanQuery();
and.add(a,true,false);
and.add(b,true,false);
Query q2 = new BooleanQuery();
and.add(a,true,false);
and.add(c,true,false);
Query q3 = new BooleanQuery();
and.add(b,true,false);
and.add(c,true,false);
Query q = new BooleanQuery();
q.add(a,false,false);
q.add(b,false,false);
q.add(c,false,false);
Back to the optimization problem:
I was able to optimize the above like this:
(<A> AND (<B> OR <C> OR ...)) OR (<B> AND (<C> OR ...)) OR (<C> AND (...))
OR ...
But this is still to slow and there must be some potential to improve this:
If <A> matches, but the subquery (<B> OR <C> OR ...) doesn't match,
evaluation could stop. But the query above will continue. I could change the
query again to something like the following:
(<A> AND (<B> OR <C> OR ...)) OR NOT(<A>) AND ( .... )
But this would get pretty complex.
Now I would like to create a subclass of Query (or MultiTermQuery?),
something like a MatchCountQuery where I can add my simple Query objects:
MatchCountQuery query = new MatchCountQuery();
query.setMinCount(2); // at least 2 matches
query.setMaxCount(-1); // no limit (I don't realy need this)
query.add(<A>);
query.add(<B>);
query.add(<C>);
query.add(<D>);
Has anyone ever done something like this? Or is there a simpler solution?
Thank you for any kind of help in this case.
Stephan
I try to optimize my query, but I think I have come to a point where I have
to extend the functionality of lucene.
I have a set of pretty simple queries, let's call them <A>, <B>, <C>, ...
Now I try to create a query which matches, if AT LEAST 2 of this simple
queries match. I started with a query like the following one (build all the
possible pairs and combine them with 'OR'):
(<A> AND <B>) OR (<A> AND <C>) OR (<B> and <C>) OR ...
My first question:
Is it safe to create the simple Query objects (most of the time TermQuery
objects) and then use these objects multiple times in a BooleanQuery object?
Example:
Query a = new TermQuery("name", "John");
Query b = new TermQuery("name", "James");
Query c = new TermQuery("name", "Jack");
Query q1 = new BooleanQuery();
and.add(a,true,false);
and.add(b,true,false);
Query q2 = new BooleanQuery();
and.add(a,true,false);
and.add(c,true,false);
Query q3 = new BooleanQuery();
and.add(b,true,false);
and.add(c,true,false);
Query q = new BooleanQuery();
q.add(a,false,false);
q.add(b,false,false);
q.add(c,false,false);
Back to the optimization problem:
I was able to optimize the above like this:
(<A> AND (<B> OR <C> OR ...)) OR (<B> AND (<C> OR ...)) OR (<C> AND (...))
OR ...
But this is still to slow and there must be some potential to improve this:
If <A> matches, but the subquery (<B> OR <C> OR ...) doesn't match,
evaluation could stop. But the query above will continue. I could change the
query again to something like the following:
(<A> AND (<B> OR <C> OR ...)) OR NOT(<A>) AND ( .... )
But this would get pretty complex.
Now I would like to create a subclass of Query (or MultiTermQuery?),
something like a MatchCountQuery where I can add my simple Query objects:
MatchCountQuery query = new MatchCountQuery();
query.setMinCount(2); // at least 2 matches
query.setMaxCount(-1); // no limit (I don't realy need this)
query.add(<A>);
query.add(<B>);
query.add(<C>);
query.add(<D>);
Has anyone ever done something like this? Or is there a simpler solution?
Thank you for any kind of help in this case.
Stephan