
HEADS UP: TestRandomChains is more picky again! Please watch for Test failures!
Hi,

Robert and I worked a bit on our analyzers again. Starting now, on the main
and 9.x branches, the well-known integration test "TestRandomChains"
(complemented by "TestAllAnalyzersHaveFactories") has been lifted to the next
stage: It uses the Java Module System (yeaaah) to discover all TokenFilters,
Tokenizers, and CharFilters, and now builds random chains of components on
top of ALL analysis modules and lucene.core. It may now happen that it builds
an HTML stripping charfilter together with a KoreanTokenizer and then marks
Katakana numbers with Kuromoji. Finally, it may calculate the phonetic form
with DaitchMokotoffSoundexFilter on the remaining emojis (this all of course
makes no sense in reality, but it's tested).
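
To illustrate the discovery step: the sketch below lists all classes packaged
in a module by using only the JDK's module API. It is not the code of the test
itself; the class name ModuleClassLister and the method classNamesIn are made
up for this example, and it scans "java.base" only because that module is
always present. A test like ours would presumably scan the analysis modules
instead, load each class, and keep the concrete subclasses of Tokenizer,
TokenFilter, and CharFilter.

```java
import java.io.IOException;
import java.lang.module.ModuleReader;
import java.lang.module.ResolvedModule;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Lucene.
public class ModuleClassLister {

  /** Returns the binary names of all classes packaged in a boot-layer module. */
  static List<String> classNamesIn(String moduleName) throws IOException {
    ResolvedModule module = ModuleLayer.boot().configuration()
        .findModule(moduleName)
        .orElseThrow(() -> new IllegalArgumentException("No such module: " + moduleName));
    // Both the reader and the stream of resource names must be closed.
    try (ModuleReader reader = module.reference().open();
         Stream<String> entries = reader.list()) {
      return entries
          // Resource names look like "java/lang/String.class".
          .filter(name -> name.endsWith(".class") && !name.endsWith("module-info.class"))
          .map(name -> name.substring(0, name.length() - ".class".length()).replace('/', '.'))
          .collect(Collectors.toList());
    }
  }

  public static void main(String[] args) throws IOException {
    List<String> names = classNamesIn("java.base");
    System.out.println(names.size() + " classes found in java.base");
  }
}
```

Because the module reader lists what is really packaged in the module, no
hand-maintained list of components is needed, so a newly added component
cannot be forgotten.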

We have already discovered some bugs with offsets and missing argument checks
in constructors and opened many issues. There may be other bad components
that break when used in certain combinations. So keep an eye on Jenkins
failures, or beast the test locally:

$ gradlew :lucene:analysis.tests:beast -Dtests.dups=100 --tests TestRandomChains -Dtests.nightly=true

If you find a bug, please open an issue. The mechanism to "disable" an
analysis component from being randomly stoned, er, chained is now an
annotation:

@IgnoreRandomChains(reason = "LUCENE-10358: fails with incorrect offsets or causes IndexOutOfBounds")

You can add it to the class (if it is broken and corrupts offsets) or to any
public constructor: Sometimes only a few (advanced) constructors have crazy
parameters we have no argumentProducer for (like some crazy enum
constant from outside of Lucene). E.g., in Kuromoji all ctors replacing the
default dictionaries are annotated to not be used. See
https://issues.apache.org/jira/browse/LUCENE-10352 for more details.
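
A minimal sketch of how such an annotation can be honored through reflection
when picking constructors. Everything here is a stand-in for illustration:
the annotation is redeclared locally (the real one lives in Lucene's test
code), DemoFilter and BrokenFilter are made-up components, and
usableConstructors is a hypothetical helper, not the test's actual method.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

public class IgnoreAnnotationSketch {

  // Local stand-in for the annotation; it must be retained at runtime
  // so that reflection can see it.
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.CONSTRUCTOR})
  @interface IgnoreRandomChains {
    String reason();
  }

  // Made-up component: only the "advanced" constructor is excluded.
  public static class DemoFilter {
    public DemoFilter() {}

    @IgnoreRandomChains(reason = "no argument producer for Thread.State")
    public DemoFilter(Thread.State state) {}
  }

  // Made-up component: excluded as a whole.
  @IgnoreRandomChains(reason = "corrupts offsets")
  public static class BrokenFilter {
    public BrokenFilter() {}
  }

  /** Returns the public constructors that may be used for random chains. */
  static List<Constructor<?>> usableConstructors(Class<?> clazz) {
    List<Constructor<?>> result = new ArrayList<>();
    if (clazz.isAnnotationPresent(IgnoreRandomChains.class)) {
      return result; // the whole class is excluded
    }
    for (Constructor<?> ctor : clazz.getConstructors()) {
      if (!ctor.isAnnotationPresent(IgnoreRandomChains.class)) {
        result.add(ctor);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("DemoFilter: " + usableConstructors(DemoFilter.class));
    System.out.println("BrokenFilter: " + usableConstructors(BrokenFilter.class));
  }
}
```

Keeping the reason on the annotation means the exclusion sits right next to
the affected code, together with the issue number that explains it.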

We have an issue open to also make the argumentProducer sometimes pass
"null" to constructors to test their behaviour on invalid input. This may
produce many failures so we have not yet applied it:
https://issues.apache.org/jira/browse/LUCENE-10353

In addition we now also have the "TestAllAnalyzersHaveFactories" test using
the same mechanism to find missing factories or entries in
META-INF/services. We have already fixed two missing ones
(KoreanNumberFilterFactory, DaitchMokotoffSoundexFilterFactory). These were
long-standing bugs, because without a META-INF entry, it is not possible to
correctly use the factory in Solr or CustomAnalyzer.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de


