Hello,
We are trying to search for phrases in which the last term is a prefix
match. For example, we want to find all documents that contain "foo bar.*",
with a configurable slop between "foo" and "bar". We were able to do this
with a `SpanNearQuery` whose last clause is a `SpanMultiTermQueryWrapper`
that wraps a `PrefixQuery`. However, this hits the limit of 1024 clauses
very quickly when many terms in the index share the prefix.
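To illustrate the concern without Lucene: as I understand it, the wrapped
`PrefixQuery` is expanded into one clause per indexed term that starts with
the prefix, so the clause count grows with the number of distinct matching
terms. The sketch below is a toy model in plain Java, not Lucene code; the
class, method, and constant names are made up for illustration, and the
limit of 1024 is simply the value from the exception message.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixExpansionSketch {
    // Hypothetical constant mirroring the limit quoted in the exception message.
    static final int MAX_CLAUSE_COUNT = 1024;

    /** Counts the distinct terms in the (sorted) term dictionary that start with the prefix. */
    static int countExpandedTerms(SortedSet<String> terms, String prefix) {
        int count = 0;
        // tailSet(prefix) begins at the first term >= prefix; all terms that
        // share the prefix are contiguous from there in sorted order.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            count++;
        }
        return count;
    }

    /** Returns true if expanding the prefix alone would exceed the assumed clause limit. */
    static boolean exceedsLimit(SortedSet<String> terms, String prefix) {
        return countExpandedTerms(terms, prefix) > MAX_CLAUSE_COUNT;
    }

    public static void main(String[] args) {
        // Toy term dictionary shaped like the repro: 1025 unique "bar_*" terms.
        SortedSet<String> terms = new TreeSet<>();
        terms.add("abc");
        terms.add("foo");
        for (int i = 0; i < 1025; i++) {
            terms.add("bar_" + i);
        }
        System.out.println(countExpandedTerms(terms, "bar"));
        System.out.println(exceedsLimit(terms, "bar"));
    }
}
```

In this model the limit depends only on how many distinct terms share the
prefix, not on how many documents actually match the full phrase.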
I have a branch that reproduces the query at
https://github.com/apache/lucene/compare/main...yixunx:yx/span-query-limit?expand=1,
and also pasted the code below.
It seems that when slop = 0 we can use `MultiPhraseQuery` instead, which
doesn't hit the clause limit. For the slop != 0 case, is it intended that
`SpanMultiTermQueryWrapper` can easily hit the clause limit, or am I using
the queries incorrectly? Is there a workaround other than increasing
`maxClauseCount`?
Thank you for the help!
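```java
// Imports assume the package layout of the current apache/lucene main branch
// (span queries in the queries module, LuceneTestCase in the test framework).
import java.util.UUID;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.spans.SpanMultiTermQueryWrapper;
import org.apache.lucene.queries.spans.SpanNearQuery;
import org.apache.lucene.queries.spans.SpanTermQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.tests.util.LuceneTestCase;

public class TestSpanNearQueryClauseLimit extends LuceneTestCase {

  private static final String FIELD_NAME = "field";
  private static final int NUM_DOCUMENTS = 1025;

  /**
   * Creates an index with NUM_DOCUMENTS documents. Each document has a text
   * field in the form of "abc foo bar_[UUID]", so every document contributes
   * a distinct term starting with "bar".
   */
  private Directory createIndex() throws Exception {
    Directory dir = newDirectory();
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
      for (int i = 0; i < NUM_DOCUMENTS; i++) {
        Document doc = new Document();
        doc.add(new TextField(FIELD_NAME, "abc foo bar_" + UUID.randomUUID(), Field.Store.YES));
        writer.addDocument(doc);
      }
      writer.commit();
    }
    return dir;
  }

  public void testSpanNearQueryClauseLimit() throws Exception {
    Directory dir = createIndex();
    // Find documents that match "abc <some term> bar.*", which should match
    // all documents.
    try (IndexReader reader = DirectoryReader.open(dir)) {
      Query query = new SpanNearQuery.Builder(FIELD_NAME, true)
          .setSlop(1)
          .addClause(new SpanTermQuery(new Term(FIELD_NAME, "abc")))
          .addClause(new SpanMultiTermQueryWrapper<>(new PrefixQuery(new Term(FIELD_NAME, "bar"))))
          .build();
      // This throws an exception if NUM_DOCUMENTS is > 1024:
      //   org.apache.lucene.search.IndexSearcher$TooManyNestedClauses:
      //   Query contains too many nested clauses; maxClauseCount is set to 1024
      TopDocs docs = new IndexSearcher(reader).search(query, 10);
      System.out.println(docs.totalHits);
    }
    dir.close();
  }
}
```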
Thank you,
Yixun Xu