Mailing List Archive

Why would a search using a ComplexPhraseQueryParser throw an exception for some content, but not all content?
I am using Lucene 8.2, but have also verified this on 8.9.

My query string is either "by~1 word~1" or "ky~1 word~1".

I am looking for a phrase made of these two words, allowing a one-character misspelling (fuzziness) in each word.

I realize that 'by' is usually a stop word; that is why I also tested with 'ky'.

My simplified test content is one of "AC-2.b word", "AC-2.k word", or "AC-2.y word".

The first part of the test content is pulled from actual data my customers are trying to search.

For the query with 'by~1', the exception occurs if the content has '.b' or '.y', but not '.k'.

For the query with 'ky~1', the exception occurs if the content has '.k' or '.y', but not '.b'.
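The pattern in the two observations above lines up with edit distance. The standalone sketch below (plain Java, not Lucene code) computes the ordinary Levenshtein distance between each fuzzy term and the single letters 'b', 'k', and 'y'. It assumes that StandardAnalyzer splits content such as "AC-2.k" so that the letter after the period is indexed as its own one-character term; the class name FuzzyDistanceCheck is hypothetical. Counting a transposition as a single edit would not change any of these particular results, since each comparison is against a one-character token.

```java
public class FuzzyDistanceCheck {

    // Plain Levenshtein distance (insertions, deletions, substitutions),
    // computed with a two-row dynamic-programming table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance from the first i chars of a to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the arrays: curr becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        int maxEdits = 1;
        String[] fuzzyTerms = {"by", "ky"};
        // Assumed single-letter tokens produced from "AC-2.b", "AC-2.k", "AC-2.y"
        String[] tokens = {"b", "k", "y"};
        for (String term : fuzzyTerms) {
            for (String token : tokens) {
                int d = levenshtein(term, token);
                System.out.println(term + "~" + maxEdits + " vs \"" + token + "\": distance " + d
                        + (d <= maxEdits ? " -> within reach" : " -> out of reach"));
            }
        }
    }
}
```

Running it prints distance 1 for the four term/content pairs that throw and distance 2 for the two pairs that do not.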

Here is the test code:
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class phraseTest {

    public static Analyzer analyzer = new StandardAnalyzer();
    public static IndexWriterConfig config = new IndexWriterConfig(analyzer);
    public static RAMDirectory ramDirectory = new RAMDirectory();
    public static IndexWriter indexWriter;
    public static Query queryToSearch = null;
    public static IndexReader idxReader;
    public static IndexSearcher idxSearcher;
    public static TopDocs hits;
    public static String query_field = "Content";

    // Pick only one content string
    // public static String content = "AC-2.b word";
    public static String content = "AC-2.k word";
    // public static String content = "AC-2.y word";

    // Pick only one query string
    // public static String queryString = "\"by~1 word~1\"";
    public static String queryString = "\"ky~1 word~1\"";

    public static void main(String[] args) throws IOException {

        System.out.println("Content is\n " + content);
        System.out.println("Query field is " + query_field);
        System.out.println("Query String is '" + queryString + "'");

        Document doc = new Document(); // create a new document

        // Create a tokenized, stored field with term vectors enabled
        FieldType type = new FieldType();
        type.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        type.setTokenized(true);
        type.setStored(true);
        // term vector enabled
        type.setStoreTermVectors(true);
        type.setStoreTermVectorPositions(true);
        type.setStoreTermVectorOffsets(true);
        type.freeze();
        Field cField = new Field(query_field, content, type);
        doc.add(cField);

        try {
            // Index the single test document, then close the writer so it is committed
            indexWriter = new IndexWriter(ramDirectory, config);
            indexWriter.addDocument(doc);
            indexWriter.close();

            idxReader = DirectoryReader.open(ramDirectory);
            idxSearcher = new IndexSearcher(idxReader);
            ComplexPhraseQueryParser qp =
                    new ComplexPhraseQueryParser(query_field, analyzer);
            queryToSearch = qp.parse(queryString);

            // Here is where the searching, etc. starts
            hits = idxSearcher.search(queryToSearch, idxReader.maxDoc());
            System.out.println("scoreDoc size: " + hits.scoreDocs.length);

            // highlight the hits ...

        } catch (IOException e) {
            e.printStackTrace();
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }
}

Here is the exception (using Lucene 8.2):

Exception in thread "main" java.lang.IllegalArgumentException: Unknown query type "" found in phrase query string "ky~1 word~1"
    at org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(...)
    ...
    at phraseTest.main(...)

Am I using ComplexPhraseQueryParser wrong?

Is this a bug in Lucene?

I have also tested this with a query string like "dog~2 word~1".
This causes the same exception if the content has ‘.d’, ‘.o’, or ‘.g’.

It looks like the trouble starts when a fuzzy term can be reduced to a single character within its allowed edits, and the content contains that same single character as a term.
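The hypothesis in the preceding paragraph can be stated precisely. For a fuzzy term of length L, a one-character token c is at edit distance L - 1 if c occurs in the term (delete every other character) and at distance L otherwise (one extra substitution). So 'dog~2' can reach 'd', 'o', and 'g', which are exactly the letters reported to trigger the exception, while 'word~1' cannot reach any single character. The sketch below is a standalone illustration in plain Java, not Lucene code; the class and method names are hypothetical, and it models only edit distance, not how Lucene actually rewrites the fuzzy clauses.

```java
public class SingleCharFuzzyMatch {

    // Edit distance from a fuzzy term to a one-character token, in closed form:
    // if c occurs in the term, delete every other character (length - 1 edits);
    // otherwise keep one character and substitute it (length edits).
    // An empty term needs one insertion to become c.
    static int distanceToSingleChar(String term, char c) {
        if (term.indexOf(c) >= 0) {
            return term.length() - 1;
        }
        return Math.max(term.length(), 1);
    }

    // Returns, in candidate order, the characters that term~maxEdits could match
    // if each one were indexed as its own one-character term.
    static String reachableSingleChars(String term, int maxEdits, String candidates) {
        StringBuilder reachable = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (distanceToSingleChar(term, c) <= maxEdits) {
                reachable.append(c);
            }
        }
        return reachable.toString();
    }

    public static void main(String[] args) {
        String letters = "abcdefghijklmnopqrstuvwxyz";
        System.out.println("dog~2  -> " + reachableSingleChars("dog", 2, letters));  // dgo
        System.out.println("ky~1   -> " + reachableSingleChars("ky", 1, letters));   // ky
        System.out.println("word~1 -> " + reachableSingleChars("word", 1, letters)); // (empty)
    }
}
```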

Thanks in advance for any suggestions or guidance,

David Shifflett