Mailing List Archive

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable!
This actually reproduces (if you download enwiki). I wonder if we
should tune LineFileDocs so that it avoids trying to add humongous
terms.

D.
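
One way to read the suggestion above, as a minimal sketch rather than the actual LineFileDocs change: drop any token whose UTF-8 encoding exceeds the 32766-byte limit quoted in the exception before the line is handed to the indexer. The class name ImmenseTermFilter, the method dropImmenseTokens, and the whitespace-only splitting are assumptions made for illustration; LineFileDocs and the analyzer used by the test may tokenize differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ImmenseTermFilter {
    // Limit quoted in the exception message; hard-coded here to keep the sketch dependency-free.
    public static final int MAX_TERM_BYTES = 32766;

    /** Returns the line with every whitespace-delimited token over the byte limit removed. */
    public static String dropImmenseTokens(String line) {
        List<String> kept = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // produced by leading whitespace
            }
            // Compare the UTF-8 byte length, not the char count, since the limit is in bytes.
            if (token.getBytes(StandardCharsets.UTF_8).length <= MAX_TERM_BYTES) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Hypothetical line: a timestamp, a 94384-byte token (the size from the log), and a normal word.
        String huge = "{".repeat(94384);
        System.out.println(dropImmenseTokens("13:24:08.000 " + huge + " tail")); // prints "13:24:08.000 tail"
    }
}
```

Lucene exposes the same limit as IndexWriter.MAX_TERM_LENGTH; whether oversized tokens should be dropped, truncated, or left in place for the test to hit is a design choice this sketch does not settle.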

On Wed, Apr 20, 2022 at 3:42 AM Apache Jenkins Server
<jenkins@builds.apache.org> wrote:
>
> Build: https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-9.1/42/
>
> 1 tests failed.
> FAILED: org.apache.lucene.index.TestAllFilesCheckIndexHeader.test
>
> Error Message:
> java.lang.IllegalArgumentException: Document contains at least one immense term in field="body" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[.125, 125, 123, 123, 123, 123, 123, 115, 117, 98, 115, 116, 99, 124, 125, 125, 125, 123, 123, 123, 49, 125, 125, 125, 124, 123, 123, 123, 112, 49]...', original message: bytes can be at most 32766 in length; got 94384
>
> Stack Trace:
> java.lang.IllegalArgumentException: Document contains at least one immense term in field="body" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[.125, 125, 123, 123, 123, 123, 123, 115, 117, 98, 115, 116, 99, 124, 125, 125, 125, 123, 123, 123, 49, 125, 125, 125, 124, 123, 123, 123, 112, 49]...', original message: bytes can be at most 32766 in length; got 94384
> at __randomizedtesting.SeedInfo.seed([34ECEDA648B62DC2:BCB8D27CE64A403A]:0)
> at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1242)
> at org.apache.lucene.index.IndexingChain.processField(IndexingChain.java:729)
> at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:620)
> at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:241)
> at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:432)
> at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1531)
> at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1816)
> at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1469)
> at org.apache.lucene.tests.index.RandomIndexWriter.addDocument(RandomIndexWriter.java:222)
> at org.apache.lucene.index.TestAllFilesCheckIndexHeader.test(TestAllFilesCheckIndexHeader.java:58)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1754)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:942)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:978)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
> at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
> at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
> at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:819)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:470)
> at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:951)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:836)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:887)
> at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:898)
> at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
> at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
> at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
> at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
> at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370)
> at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826)
> at java.base/java.lang.Thread.run(Thread.java:834)
> Suppressed: java.lang.IllegalStateException: close() called in wrong state: INCREMENT
> at org.apache.lucene.tests.analysis.MockTokenizer.fail(MockTokenizer.java:136)
> at org.apache.lucene.tests.analysis.MockTokenizer.close(MockTokenizer.java:327)
> at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:58)
> at org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1136)
> ... 48 more
> Caused by: org.apache.lucene.util.BytesRefHash$MaxBytesLengthExceededException: bytes can be at most 32766 in length; got 94384
> at app//org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:258)
> at app//org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:193)
> at app//org.apache.lucene.index.IndexingChain$PerField.invert(IndexingChain.java:1224)
> ... 48 more
>
>
>
>
> Build Log:
> [...truncated 573 lines...]
> ERROR: The following test(s) have failed:
> - org.apache.lucene.index.TestAllFilesCheckIndexHeader.test (:lucene:core)
> Test output: /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-9.1/checkout/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestAllFilesCheckIndexHeader.txt
> Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.index.TestAllFilesCheckIndexHeader.test" -Ptests.jvms=4 -Ptests.haltonfailure=false -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=34ECEDA648B62DC2 -Ptests.multiplier=2 -Ptests.nightly=true -Ptests.badapples=false -Ptests.file.encoding=ISO-8859-1 -Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-9.1/test-data/enwiki.random.lines.txt
>
>
> BUILD SUCCESSFUL in 1h 49m 55s
> 223 actionable tasks: 223 executed
> Build step 'Invoke Gradle script' changed build result to SUCCESS
> Archiving artifacts
> java.lang.InterruptedException: no matches found within 10000
> at hudson.FilePath$ValidateAntFileMask.hasMatch(FilePath.java:3079)
> at hudson.FilePath$ValidateAntFileMask.invoke(FilePath.java:2958)
> at hudson.FilePath$ValidateAntFileMask.invoke(FilePath.java:2939)
> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3329)
> Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to lucene-solr-2
> at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1797)
> at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:356)
> at hudson.remoting.Channel.call(Channel.java:1001)
> at hudson.FilePath.act(FilePath.java:1165)
> at hudson.FilePath.act(FilePath.java:1154)
> at hudson.FilePath.validateAntFileMask(FilePath.java:2937)
> at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:268)
> at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
> at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:806)
> at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:755)
> at hudson.model.Build$BuildExecution.post2(Build.java:178)
> at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:699)
> at hudson.model.Run.execute(Run.java:1913)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at hudson.model.ResourceController.execute(ResourceController.java:99)
> at hudson.model.Executor.run(Executor.java:432)
> Caused: hudson.FilePath$TunneledInterruptedException
> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3331)
> at hudson.remoting.UserRequest.perform(UserRequest.java:211)
> at hudson.remoting.UserRequest.perform(UserRequest.java:54)
> at hudson.remoting.Request$2.run(Request.java:376)
> at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused: java.lang.InterruptedException: java.lang.InterruptedException: no matches found within 10000
> at hudson.FilePath.act(FilePath.java:1167)
> at hudson.FilePath.act(FilePath.java:1154)
> at hudson.FilePath.validateAntFileMask(FilePath.java:2937)
> at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:268)
> at hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:78)
> at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:806)
> at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:755)
> at hudson.model.Build$BuildExecution.post2(Build.java:178)
> at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:699)
> at hudson.model.Run.execute(Run.java:1913)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at hudson.model.ResourceController.execute(ResourceController.java:99)
> at hudson.model.Executor.run(Executor.java:432)
> No artifacts found that match the file pattern "**/*.events,heapdumps/**,**/hs_err_pid*". Configuration error?
> Recording test results
> [Checks API] No suitable checks publisher found.
> Build step 'Publish JUnit test result report' changed build result to UNSTABLE
> Email was triggered for: Unstable (Test Failures)
> Sending email for trigger: Unstable (Test Failures)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: builds-unsubscribe@lucene.apache.org
> For additional commands, e-mail: builds-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable!
And, for the record - indeed enwiki contains an odd field with a
super-long term that looks like this:

13:24:08.000 {{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=1680}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=738}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=358}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=197}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=305}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=59}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=482}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}
}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=613}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=361}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=141}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=34}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=484}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=1723}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}=
[snip]
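
The byte prefix printed in the exception can be decoded to check that it lines up with the sample above. This is a small sketch, assuming the leading ".125" in the log stands for the byte value 125; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DecodeTermPrefix {
    // The 30 byte values listed in the exception message
    // (assumption: the first entry, printed as ".125", is read as 125).
    public static final int[] PREFIX = {
        125, 125, 123, 123, 123, 123, 123, 115, 117, 98,
        115, 116, 99, 124, 125, 125, 125, 123, 123, 123,
        49, 125, 125, 125, 124, 123, 123, 123, 112, 49
    };

    /** Converts the decimal byte values to a string, interpreting them as UTF-8. */
    public static String decode(int[] values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(PREFIX)); // prints }}{{{{{substc|}}}{{{1}}}|{{{p1
    }
}
```

The decoded text matches the point in the sample where one template expansion ends and the next begins ("=1680}}{{{{{substc|}}}{{{1}}}|{{{p1n").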



On Fri, Apr 22, 2022 at 11:57 PM Dawid Weiss <dawid.weiss@gmail.com> wrote:
>
> This actually reproduces (if you download enwiki). I wonder if we
> should tune LineFileDocs so that it avoids trying to add humongous
> terms.
>
> D.

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable!
LOL, thanks for getting to the root cause, Dawid!

The thing is, such screwed-up text is a fact of life for many Lucene
applications -- they accidentally try to ingest massive terms, or index
base64 as if it were text, etc. I think it's healthy for us to also test
Lucene on such content and make sure no other bug creeps in where Lucene
reacts badly, e.g. by corrupting the index because this
IllegalArgumentException was thrown.

This seems to be quite rare -- maybe our (large, nightly) enwiki sample has
only a few such too-massive terms, and this particular test + random seed
hit the jackpot.
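
A rough way to check how rare such lines are, sketched under assumptions: the class ImmenseTermCounter and its methods are hypothetical, tokens are split on whitespace only (the analyzer used by the test may split differently), and the 32766-byte limit is taken from the exception message.

```java
import java.nio.charset.StandardCharsets;
import java.util.stream.Stream;

public class ImmenseTermCounter {
    // Limit quoted in the exception message.
    public static final int MAX_TERM_BYTES = 32766;

    /** True if any whitespace-delimited token in the line exceeds the byte limit. */
    public static boolean hasImmenseToken(String line) {
        for (String token : line.split("\\s+")) {
            if (token.getBytes(StandardCharsets.UTF_8).length > MAX_TERM_BYTES) {
                return true;
            }
        }
        return false;
    }

    /** Counts the lines that contain at least one oversized token. */
    public static long countLinesWithImmenseTokens(Stream<String> lines) {
        return lines.filter(ImmenseTermCounter::hasImmenseToken).count();
    }

    public static void main(String[] args) {
        // Hypothetical three-line sample; only the second line carries an oversized token.
        Stream<String> sample = Stream.of(
            "plain text line",
            "x".repeat(40000) + " more",
            "another line");
        System.out.println(countLinesWithImmenseTokens(sample)); // prints 1
    }
}
```

For a real line-docs file, the stream could come from java.nio.file.Files.lines(path), ideally inside a try-with-resources block so the file handle is closed.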

Mike McCandless

http://blog.mikemccandless.com


On Fri, Apr 22, 2022 at 6:04 PM Dawid Weiss <dawid.weiss@gmail.com> wrote:

> And, for the record - indeed enwiki contains an odd field with a
> super-long term that looks like this:
>
> [snip]
Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable!
we can write unit tests for what happens on too-big terms. unit tests
that are simple and don't require downloading gigabyte files off the
internet.

i don't think we should intentionally allow our tests to be flaky, i
strongly disagree.

On Sat, Apr 23, 2022 at 6:12 AM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> LOL thanks for getting to the root cause Dawid!
>
> The thing is, such screwed up text is a fact of life for many Lucene applications -- they accidentally try to ingest massive terms, or index base64 as if it were text, etc. I think it's healthy for us to also test Lucene on such content and make sure we don't have some other bug creep in where Lucene reacts badly, e.g. say causing index corruption because this IllegalArgumentException was thrown?
>
> This seems to be quite rare -- maybe our (large, nightly) enwiki sample has only a few such too-massive terms, and this particular test + random seed hit the jackpot.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Apr 22, 2022 at 6:04 PM Dawid Weiss <dawid.weiss@gmail.com> wrote:
>>
>> And, for the record - indeed enwiki contains an odd field with a
>> super-long term that looks like this:
>>
>> 13:24:08.000 {{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=1680}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=738}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=358}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=197}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=305}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=59}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=482}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v
|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=613}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=361}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=141}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=34}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=484}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}={{{p1v|}}}|{{{p2n|}}}={{{p2v|}}}|{{{p3n|}}}={{{p3v|}}}|{{{p4n|}}}={{{p4v|}}}|{{{p5n|}}}={{{p5v|}}}|{{{p6n|}}}={{{p6v|}}}|{{{p7n|}}}={{{p7v|}}}|{{{p8n|}}}={{{p8v|}}}|{{{p9n|}}}={{{p9v|}}}|{{{p10n|}}}={{{p10v|}}}|{{{mun|1}}}=1723}}{{{{{substc|}}}{{{1}}}|{{{p1n|}}}=
>> [snip]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable!
> i don't think we should intentionally allow our tests to be flaky, i
> strongly disagree.

I agree with Robert. Tests should pass. Every time a build fails it
takes some time to investigate the root cause and it is disheartening
when you discover that it's a failure that's "allowed" to happen.

D.

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable!
Hi Dawid, Rob,

Sorry for the late reply!!

Yeah, I agree: unit tests should strive to fail only, and precisely, for
true bugs, not false alarms like this. I'm sorry this failure wasted your
time, Dawid!

So we should fix this failure somehow. Some options:

  * Remove all too-big terms from all test LineFileDocs sources (the small
    version in git and the large nightly version)?
  * Switch this unit test to not use LineFileDocs?
  * Make this exception a checked exception (not good to make all users pay
    the price of the rare exception)?
  * Change IndexWriter to silently drop these terms?

But then I don't think our testing of too-long terms, which happens to real
users, is great. We have a dedicated unit test case
(TestIndexWriter.testWickedLongTerm) which specifically confirms that the
inverted index will be OK (and throw the right exception) if you attempt to
index a massive term. But what about all our analyzers? Do they handle
too-long terms? Does TestRandomChains sometimes inject massive terms? Or
our random realistic Unicode string generation methods?

Or we can just fall back to nightly benchmarks (the closest thing we have
to an "integration test"?) to try to catch such rare real-world problems
that our users might hit?

Mike McCandless

http://blog.mikemccandless.com


Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
On Tue, Apr 26, 2022 at 8:40 AM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> But then I don't think our testing of too-long terms, which happens to real users, is great. We have a dedicated unit test case (TestIndexWriter.testWickedLongTerm) which specifically confirms that the inverted index will be OK (and throw the right exception) if you attempt to index a massive term. But what about all our analyzers? Do they handle too-long terms? Does TestRandomChains sometimes inject massive terms? Or our random realistic Unicode string generation methods?
>

Analyzers typically have a "testRandomHugeStrings()" in addition to
"testRandom()". It uses huge strings but fewer iterations of the test
(due to time). And yes, this is the same tester-method that
TestRandomChains uses.

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
On Tue, Apr 26, 2022 at 8:40 AM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> But then I don't think our testing of too-long terms, which happens to real users, is great. We have a dedicated unit test case (TestIndexWriter.testWickedLongTerm) which specifically confirms that the inverted index will be OK (and throw the right exception) if you attempt to index a massive term. But what about all our analyzers? Do they handle too-long terms? Does TestRandomChains sometimes inject massive terms? Or our random realistic Unicode string generation methods?
>

Hi Mike, I don't think this is the only unit test for IndexWriter for
this situation. There is also a whole dedicated class:
https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/index/TestExceedMaxTermLength.java

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
> Yeah I agree: unit tests should strive to only and precisely fail for true bugs, not false alarms like this. I'm sorry this failure wasted your time Dawid!

Time's not wasted if we decide how to proceed on this.

> So we should fix this failure somehow: Remove all too-big terms from all test LineFileDocs sources (the small version in git and the large nightly version)?; Switch this unit test to not use LineFileDocs?; Make this exception a checked exception (not good to make all users pay the price of the rare exception)? Change IndexWriter to silently drop these terms?

I've seen people indexing weird files, even binary files - this
exception does make sense to me - an insane input yields a sane
response... I don't know what the right fix should be though. If we
keep the current code as is then perhaps we can at least detect this
particular type of exception and not fail the test? Alternatively,
clean up the input data so that it doesn't have such enormous terms?

D.

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
On Tue, Apr 26, 2022 at 8:47 AM Robert Muir <rcmuir@gmail.com> wrote:

> Analyzers typically have a "testRandomHugeStrings()" in addition to
> "testRandom()". It uses huge strings but fewer iterations of the test
> (due to time). And yes, this is the same tester-method that
> TestRandomChains uses.


> Hi Mike, I don't think this is the only unit test for IndexWriter for
> this situation. There is also a whole dedicated class:
> https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/index/TestExceedMaxTermLength.java


Great points Rob! I didn't realize we had a dedicated test class for
too-long terms as well. Awesome!

I love the BaseTokenStreamTestCase.checkRandomData!! It has found so many
crazy issues over the years... it looks like it "typically" makes tokens up
to 8K (hmm sometimes 1K, depending on the specific test class) in length,
joined with a space character. Probably that is good enough, no need to
push the token length beyond IW's hard limit?

Mike McCandless

http://blog.mikemccandless.com
Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
On Tue, Apr 26, 2022 at 9:27 AM Dawid Weiss <dawid.weiss@gmail.com> wrote:

> > Yeah I agree: unit tests should strive to only and precisely fail for
> true bugs, not false alarms like this. I'm sorry this failure wasted your
> time Dawid!
>
> Time's not wasted if we decide how to proceed on this.
>

+1, thanks :)


> > So we should fix this failure somehow: Remove all too-big terms from all
> test LineFileDocs sources (the small version in git and the large nightly
> version)?; Switch this unit test to not use LineFileDocs?; Make this
> exception a checked exception (not good to make all users pay the price of
> the rare exception)? Change IndexWriter to silently drop these terms?
>
> I've seen people indexing weird files, even binary files - this
> exception does make sense to me - an insane input yields a sane
> response... I don't know what the right fix should be though. If we
> keep the current code as is then perhaps we can at least detect this
> particular type of exception and not fail the test? Alternatively,
> clean up the input data so that it doesn't have such enormous terms?
>

Maybe we should make a dedicated exception class (instead of the generic
IllegalArgumentException) for this situation and catch it in this test? Or
change this test to index synthetic (randomly generated) text instead? But
all other tests that pull from LineFileDocs will also face this same risk
...
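
To make the "dedicated exception class" idea concrete, here is a minimal
sketch. It is not Lucene code: TermTooLongException, indexTerm and tryIndex
are hypothetical names, and indexTerm is only a stand-in for the real
indexing path. The one number taken from the failure report is the 32766
byte limit.

```java
import java.nio.charset.StandardCharsets;

public class TermTooLongSketch {
  // Limit quoted in the failure message from the nightly build.
  static final int MAX_TERM_LENGTH = 32766;

  // Hypothetical dedicated exception. It extends IllegalArgumentException,
  // so existing callers that catch the generic type keep working.
  static class TermTooLongException extends IllegalArgumentException {
    final int termLength;

    TermTooLongException(String field, int termLength, int maxLength) {
      super("immense term in field=\"" + field + "\": "
          + termLength + " bytes, max is " + maxLength);
      this.termLength = termLength;
    }
  }

  // Stand-in for the indexing step (assumption, not IndexingChain itself).
  static void indexTerm(String field, String term) {
    int length = term.getBytes(StandardCharsets.UTF_8).length;
    if (length > MAX_TERM_LENGTH) {
      throw new TermTooLongException(field, length, MAX_TERM_LENGTH);
    }
  }

  // What a test could do: tolerate only this specific failure.
  // Returns true if the term was accepted, false if it was too long.
  static boolean tryIndex(String field, String term) {
    try {
      indexTerm(field, term);
      return true;
    } catch (TermTooLongException e) {
      return false; // any other exception still propagates and fails the test
    }
  }

  public static void main(String[] args) {
    System.out.println(tryIndex("body", "hello"));           // prints true
    System.out.println(tryIndex("body", "x".repeat(94384))); // prints false
  }
}
```

The point of the subclass is that a test can swallow exactly this case
without hiding unrelated IllegalArgumentExceptions.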

Or I'm also fine with purging all such insanely long terms from all of
our LineFileDocs. But I do think that's stepping away from a realistic
problem our users do sometimes encounter.

Another option is to fix the LineFileDocs.java test class to take an
optional boolean to filter out such insanely long terms, and some tests
could explicitly choose to still include them and catch the exception.
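
Roughly what that optional filter could look like, as a sketch only. This
is not the real LineFileDocs API: the class and method names are made up,
and it assumes whitespace-delimited tokens measured in UTF-8 bytes against
the 32766 limit from the failure message.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class LongTermFilterSketch {
  // Limit quoted in the failure message from the nightly build.
  static final int MAX_TERM_BYTES = 32766;

  // True if any whitespace-delimited token in the line is longer than
  // maxBytes once encoded as UTF-8.
  static boolean hasImmenseToken(String line, int maxBytes) {
    for (String token : line.split("\\s+")) {
      if (token.getBytes(StandardCharsets.UTF_8).length > maxBytes) {
        return true;
      }
    }
    return false;
  }

  // Hypothetical filter: when skipImmense is false the lines pass through
  // untouched, so a test can still opt in to the immense terms.
  static List<String> filterLines(List<String> lines, boolean skipImmense, int maxBytes) {
    if (!skipImmense) {
      return lines;
    }
    return lines.stream()
        .filter(line -> !hasImmenseToken(line, maxBytes))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> docs = List.of("a normal line", "junk " + "{".repeat(94384));
    System.out.println(filterLines(docs, true, MAX_TERM_BYTES).size());  // prints 1
    System.out.println(filterLines(docs, false, MAX_TERM_BYTES).size()); // prints 2
  }
}
```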

I do think it is incredible that it took soooooo many years to uncover this
lurking massive term in the nightly Wikipedia LineFileDocs!! I wonder how
many such massive terms are lurking in this file :) This is like the
search for Nessie.

Also, I wonder if we should lower this limit -- are there really users that
need to index such massive (up to ~32 KB) terms? Maybe it should be an
option on IWC, that defaults to something more sane, but apps could
increase it if they really must index incredibly long terms?

I'll open an issue and let's continue discussing there?

Mike McCandless

http://blog.mikemccandless.com
Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
On Wed, Apr 27, 2022 at 11:58 AM Michael McCandless
<lucene@mikemccandless.com> wrote:
>
> Maybe we should make a dedicated exception class (instead of the generic IllegalArgumentException) for this situation and catch it in this test? Or change this test to index synthetic (randomly generated) text instead? But all other tests that pull from LineFileDocs will also face this same risk ...
>
> Or I'm also fine with also purging all such insanely long terms from all of our LineFileDocs too. But I do think that's stepping away from a realistic problem our users do sometimes encounter.
>
> Another option is to fix the LineFileDocs.java test class to take an optional boolean to filter out such insanely long terms, and some tests could explicitly choose to still include them and catch the exception.
>

I don't think we need to add an option to LineFileDocs; we should just
fix our indexing. The text of this big term starts with something like
'}}{{{{{substc|}}}{{{1' (sorry for typos).

But the text doesn't get split up at all because of MockAnalyzer ("act
like whitespace with lowercasing"). Except, unlike our *REAL
ANALYZERS*, MockTokenizer has no term limits. If a user were indexing
this crap with WhitespaceTokenizer or StandardTokenizer then they
wouldn't experience this issue.

I think we should fix MockTokenizer.
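
A rough sketch of the idea, independent of Lucene: a whitespace +
lowercase tokenizer with a maximum token length. The class name and the
choice to chop an over-long run into chunks are assumptions for
illustration, not the actual MockTokenizer change. Note it counts Java
chars, not UTF-8 bytes.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedWhitespaceTokenizerSketch {

  // Splits on whitespace, lowercases, and never emits a token longer than
  // maxTokenLength chars: an over-long run is chopped into chunks.
  // Simplification: lengths are UTF-16 chars, so a surrogate pair could be
  // split at a chunk boundary; a real fix would have to handle that.
  static List<String> tokenize(String text, int maxTokenLength) {
    if (maxTokenLength < 1) {
      throw new IllegalArgumentException("maxTokenLength must be >= 1");
    }
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (Character.isWhitespace(c)) {
        flush(current, tokens); // whitespace ends the current token
      } else {
        current.append(Character.toLowerCase(c));
        if (current.length() == maxTokenLength) {
          flush(current, tokens); // hit the limit: emit a chunk
        }
      }
    }
    flush(current, tokens); // emit whatever is left at end of input
    return tokens;
  }

  // Emits the buffered token, if any, and resets the buffer.
  private static void flush(StringBuilder buffer, List<String> out) {
    if (buffer.length() > 0) {
      out.add(buffer.toString());
      buffer.setLength(0);
    }
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Foo BARBAZ q", 4)); // prints [foo, barb, az, q]
  }
}
```

With a limit like this in place, the 94384-byte run from the nightly
enwiki line would arrive at the indexer as many small tokens instead of
one immense term.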

Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
On Wed, Apr 27, 2022 at 12:34 PM Robert Muir <rcmuir@gmail.com> wrote:

> On Wed, Apr 27, 2022 at 11:58 AM Michael McCandless
> <lucene@mikemccandless.com> wrote:
> >
> > Maybe we should make a dedicated exception class (instead of the generic
> IllegalArgumentException) for this situation and catch it in this test? Or
> change this test to index synthetic (randomly generated) text instead? But
> all other tests that pull from LineFileDocs will also face this same risk
> ...
> >
> > Or I'm also fine with also purging all such insanely long terms from all
> of our LineFileDocs too. But I do think that's stepping away from a
> realistic problem our users do sometimes encounter.
> >
> > Another option is to fix the LineFileDocs.java test class to take an
> optional boolean to filter out such insanely long terms, and some tests
> could explicitly choose to still include them and catch the exception.
> >
>
> I don't think we need to add an option to LineFileDocs, we should just
> fix our indexing. The text to this big term starts with something like
> '}}{{{{{substc|}}}{{{1' (sorry for typos).
>
> But text doesn't get split up at all because of MockAnalyzer ("act
> like whitespace with lowercasing"). Except, unlike our *REAL
> ANALYZERS*, MockTokenizer has no term limits. If a user was indexing
> this crap with WhitespaceTokenizer or StandardTokenizer then they
> wouldn't experience this issue.
>
> I think we should fix MockTokenizer.
>

+1 to fix MockTokenizer!

Mike McCandless

http://blog.mikemccandless.com
Re: [JENKINS] Lucene » Lucene-NightlyTests-9.1 - Build # 42 - Unstable! [ In reply to ]
I opened https://issues.apache.org/jira/browse/LUCENE-10541 to figure out
WTF we can do about this tricky situation!!!

Thank you Dawid and Rob for trying to iterate here.

Let's continue our discussion on the issue?

Mike McCandless

http://blog.mikemccandless.com

