Mailing List Archive

Lucene Highlighting mergeContiguous
Hello All,

I am trying to highlight the phrase "John Doe" using the Lucene Highlighter, but
the output highlights each term separately. Contiguous terms are not merged
together.

For example: <hl>John</hl><hl>Doe</hl> is returned instead of <hl>John Doe</hl>

I have set the mergeContiguousFragments parameter on the getBestTextFragments
method to true. What could I be missing?

All help appreciated. Thanks!

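import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.highlight.*;

// `query` is the query string and `str` is the text to highlight.
StandardAnalyzer analyzer = new StandardAnalyzer();
QueryParser parser = new QueryParser("*", analyzer);
QueryScorer scorer = new QueryScorer(parser.parse(query), "*");
scorer.setExpandMultiTermQuery(true);
scorer.setMaxDocCharsToAnalyze(2000000);
Highlighter highlighter =
    new Highlighter(new SimpleHTMLFormatter("<hl>", "</hl>"), scorer);
highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer));
highlighter.setMaxDocCharsToAnalyze(2000000);
// Third argument is mergeContiguousFragments; fourth is the maximum fragment count.
TextFragment[] all = highlighter.getBestTextFragments(
    analyzer.tokenStream("*", new StringReader(str)), str, true, 100);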
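
As far as I can tell, the mergeContiguousFragments flag joins neighbouring TextFragment objects, not neighbouring highlighted terms inside one fragment, which would explain the output above. One possible workaround, sketched here only as an assumption, is to post-process the highlighted string and drop any closing tag that is immediately followed (apart from whitespace) by an opening tag. The class HighlightMerger and the method mergeAdjacent are hypothetical names, not part of Lucene, and the sketch uses only java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: merges adjacent highlight spans
// in an already-highlighted string.
final class HighlightMerger {

    // Removes every "postTag, optional whitespace, preTag" sequence and keeps
    // only the whitespace, so two neighbouring spans become one.
    static String mergeAdjacent(String highlighted, String preTag, String postTag) {
        Pattern gap = Pattern.compile(
            Pattern.quote(postTag) + "(\\s*)" + Pattern.quote(preTag));
        return gap.matcher(highlighted).replaceAll("$1");
    }

    public static void main(String[] args) {
        String in = "Call <hl>John</hl> <hl>Doe</hl> today";
        // prints: Call <hl>John Doe</hl> today
        System.out.println(mergeAdjacent(in, "<hl>", "</hl>"));
    }
}
```

This would be applied to the string form of each returned fragment. Note that it merges any two neighbouring highlighted terms, whether or not they belong to the same phrase in the query, so it may over-merge when several separate query terms happen to sit next to each other.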