TokenStream contract violation: close() call missing due to race condition in custom Analyzer

Morning all,

I have recently been developing a set of custom analyzers that transform
incoming queries according to some criteria (its country, its language, etc.).

After deploying the project to multiple clouds, I have seen the error
"TokenStream contract violation: close() call missing" appear in one of
them under high load. The only way I have been able to reproduce it is by
launching several tests concurrently on my machine, and I am struggling to
see where the race condition happens. The stack trace is as follows:

java.lang.IllegalStateException: TokenStream contract violation: close() call missing
    at org.apache.lucene.analysis.Tokenizer.setReader(Tokenizer.java:90)
    at org.apache.lucene.analysis.Analyzer$TokenStreamComponents.setReader(Analyzer.java:412)
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:202)
    at my.project.tokenizer.analyzer.CustomAnalyzerProcessor.process(CustomAnalyzerProcessor.java:77)

The code involved in the problem, in execution order, is the following.

*(CustomAnalyzerProcessor.class)*
public String process(String input) throws Exception {
    // Exception thrown in this line
    final TokenStream tokenStream =
            customAnalyzer.getAnalyzer().tokenStream("", input);
    final CharTermAttribute charTermAttribute =
            tokenStream.addAttribute(CharTermAttribute.class);
    tokenStream.reset(); // .reset() call

    final List<String> terms = new ArrayList<>();
    while (tokenStream.incrementToken()) {
        terms.add(charTermAttribute.toString());
    }

    tokenStream.close(); // .close() call

    return Joiner.on(" ").join(terms);
}
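
For comparison, Lucene's documented consumer workflow is reset(), incrementToken() until it returns false, end(), then close(). TokenStream implements Closeable, so try-with-resources can guarantee the close() call. In process() as posted, close() is skipped whenever reset() or incrementToken() throws. Because an Analyzer reuses its components per thread by default, the next tokenStream() call on that thread would then fail with this exact message. Whether that is what happens here is an assumption; the stack trace alone does not confirm it. The sketch below models the effect in plain Java only: ReusableStream, open(), drain(), processUnsafe() and processSafe() are hypothetical stand-ins, not Lucene classes, and the "BOOM" token merely simulates a filter that throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the "close() call missing" contract. Nothing here is Lucene code.
final class CloseContractSketch {

    // Hypothetical stand-in for a reused Tokenizer: refuses new input until closed.
    static final class ReusableStream implements AutoCloseable {
        private boolean open;
        private Iterator<String> tokens;

        void setInput(String input) {
            if (open) {
                throw new IllegalStateException("contract violation: close() call missing");
            }
            open = true;
            tokens = Arrays.asList(input.trim().split("\\s+")).iterator();
        }

        // Returns the next lower-cased token, or null when the input is exhausted.
        String next() {
            if (!tokens.hasNext()) {
                return null;
            }
            String token = tokens.next();
            if (token.equals("BOOM")) {
                throw new RuntimeException("simulated filter failure");
            }
            return token.toLowerCase(Locale.ROOT);
        }

        @Override
        public void close() {
            open = false;
        }
    }

    // One stream per thread, mimicking per-thread reuse of analysis components.
    private static final ThreadLocal<ReusableStream> STREAM =
            ThreadLocal.withInitial(ReusableStream::new);

    private static ReusableStream open(String input) {
        ReusableStream stream = STREAM.get();
        stream.setInput(input);
        return stream;
    }

    private static String drain(ReusableStream stream) {
        List<String> terms = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            terms.add(t);
        }
        return String.join(" ", terms);
    }

    // Same shape as process() in the post: close() is skipped if drain() throws.
    static String processUnsafe(String input) {
        ReusableStream stream = open(input);
        String result = drain(stream);
        stream.close();
        return result;
    }

    // try-with-resources closes the stream on both the normal and the exceptional path.
    static String processSafe(String input) {
        try (ReusableStream stream = open(input)) {
            return drain(stream);
        }
    }

    public static void main(String[] args) {
        System.out.println(processSafe("Hello World"));
        try {
            processUnsafe("a BOOM"); // fails mid-stream, leaving the stream open
        } catch (RuntimeException e) {
            System.out.println("first failure: " + e.getMessage());
        }
        try {
            processUnsafe("next request"); // same thread, stream was never closed
        } catch (IllegalStateException e) {
            System.out.println("follow-up failure: " + e.getMessage());
        }
    }
}
```

With the real Lucene classes the equivalent shape would be try (TokenStream ts = analyzer.tokenStream(field, input)) { ts.reset(); ... ts.end(); }. The point of the model is that one failed request can surface later as a contract violation on an unrelated request handled by the same thread, which can look like a race under load.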

The class that contains the process(input) method holds a map of
CustomAnalyzer implementations, each of which implements a couple of
methods that add TokenFilters to process the tokens. The base class for the
other implementations is the following.

public class StandardCustomAnalyzer implements CustomAnalyzer {

    private final Analyzer analyzer;

    public StandardCustomAnalyzer() {
        this.analyzer = buildAnalyzer();
    }

    protected Analyzer buildAnalyzer(String lang) {
        return new Analyzer() {
            private final SolrResourceLoader solrResourceLoader =
                    new SolrResourceLoader(null);

            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                final Tokenizer tokenizer = new WhitespaceTokenizer();
                TokenStream stream = tokenizer;

                for (TokenFilterFactory factory :
                        CustomAnalyzerProcessor.buildFactories(getTokenFilterList())) {
                    try {
                        if (factory instanceof ResourceLoaderAware) {
                            final Method method =
                                    factory.getClass().getMethod("inform", ResourceLoader.class);
                            method.invoke(factory, solrResourceLoader);
                        }
                    } catch (Exception ex) {
                        LOG.error("Exception executing inform method", ex);
                    } finally {
                        stream = factory.create(stream);
                    }
                }

                // Possible issue in here with the stream sent to the constructor?
                return new TokenStreamComponents(tokenizer, stream);
            }
        };
    }

    @Override
    public Multimap<Class<? extends TokenFilterFactory>, Map<String, String>>
            getTokenFilterList(String lang) {
        final Multimap<Class<? extends TokenFilterFactory>, Map<String, String>> filters =
                LinkedListMultimap.create(2);

        filters.put(LowerCaseFilterFactory.class, Collections.emptyMap());

        return filters;
    }

    @Override
    public Analyzer getAnalyzer() {
        return analyzer;
    }

}

Those are the main pieces related to the exception which, as mentioned,
occurs intermittently, apparently due to a race condition.

I hope you can shed some light on this. Thanks in advance.