Mailing List Archive

Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 215 - Unstable!
This failure is due to a suite timeout.

java.lang.Exception: Test abandoned because suite timeout was reached.


On Fri, Mar 19, 2021 at 8:08 AM Apache Jenkins Server <
jenkins@builds.apache.org> wrote:

> Build:
> https://ci-builds.apache.org/job/Lucene/job/Lucene-NightlyTests-main/215/
>
> 2 tests failed.
> FAILED:
> org.apache.lucene.index.TestDocumentsWriterStallControl.testAcquireReleaseRace
>
> Error Message:
> java.lang.Exception: Test abandoned because suite timeout was reached.
>
> Stack Trace:
> java.lang.Exception: Test abandoned because suite timeout was reached.
> at __randomizedtesting.SeedInfo.seed([C412FA486B53A28D]:0)
>
>
> FAILED:
> org.apache.lucene.index.TestDocumentsWriterStallControl.classMethod
>
> Error Message:
> java.lang.Exception: Suite timeout exceeded (>= 7200000 msec).
>
> Stack Trace:
> java.lang.Exception: Suite timeout exceeded (>= 7200000 msec).
> at __randomizedtesting.SeedInfo.seed([C412FA486B53A28D]:0)
>
>
>
>
> Build Log:
> [...truncated 737 lines...]
> ERROR: The following test(s) have failed:
> -
> org.apache.lucene.index.TestDocumentsWriterStallControl.testAcquireReleaseRace
> (:lucene:core)
> Test output:
> /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/checkout/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestDocumentsWriterStallControl.txt
> Reproduce with: gradlew :lucene:core:test --tests
> "org.apache.lucene.index.TestDocumentsWriterStallControl.testAcquireReleaseRace"
> -Ptests.jvms=4 -Ptests.haltonfailure=false
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=C412FA486B53A28D
> -Ptests.multiplier=2 -Ptests.nightly=true -Ptests.badapples=false
> -Ptests.file.encoding=ISO-8859-1
> -Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/test-data/enwiki.random.lines.txt
>
> - org.apache.lucene.index.TestDocumentsWriterStallControl.classMethod
> (:lucene:core)
> Test output:
> /home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/checkout/lucene/core/build/test-results/test/outputs/OUTPUT-org.apache.lucene.index.TestDocumentsWriterStallControl.txt
> Reproduce with: gradlew :lucene:core:test --tests
> "org.apache.lucene.index.TestDocumentsWriterStallControl.classMethod"
> -Ptests.jvms=4 -Ptests.haltonfailure=false
> -Ptests.jvmargs=-XX:TieredStopAtLevel=1 -Ptests.seed=C412FA486B53A28D
> -Ptests.multiplier=2 -Ptests.nightly=true -Ptests.badapples=false
> -Ptests.file.encoding=ISO-8859-1
> -Ptests.linedocsfile=/home/jenkins/jenkins-slave/workspace/Lucene/Lucene-NightlyTests-main/test-data/enwiki.random.lines.txt
>
>
> Deprecated Gradle features were used in this build, making it incompatible
> with Gradle 7.0.
> Use '--warning-mode all' to show the individual deprecation warnings.
> See
> https://docs.gradle.org/6.6.1/userguide/command_line_interface.html#sec:command_line_warnings
>
> BUILD SUCCESSFUL in 4h 43m 18s
> 192 actionable tasks: 192 executed
> Build step 'Invoke Gradle script' changed build result to SUCCESS
> Archiving artifacts
> java.lang.InterruptedException: no matches found within 10000
> at hudson.FilePath$ValidateAntFileMask.hasMatch(FilePath.java:3066)
> at hudson.FilePath$ValidateAntFileMask.invoke(FilePath.java:2945)
> at hudson.FilePath$ValidateAntFileMask.invoke(FilePath.java:2926)
> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3313)
> Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to lucene2
> at
> hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1800)
> at
> hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357)
> at hudson.remoting.Channel.call(Channel.java:1001)
> at hudson.FilePath.act(FilePath.java:1157)
> at hudson.FilePath.act(FilePath.java:1146)
> at hudson.FilePath.validateAntFileMask(FilePath.java:2924)
> at
> hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:270)
> at
> hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:80)
> at
> hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:753)
> at hudson.model.Build$BuildExecution.post2(Build.java:177)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:698)
> at hudson.model.Run.execute(Run.java:1932)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at
> hudson.model.ResourceController.execute(ResourceController.java:97)
> at hudson.model.Executor.run(Executor.java:429)
> Caused: hudson.FilePath$TunneledInterruptedException
> at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3315)
> at hudson.remoting.UserRequest.perform(UserRequest.java:211)
> at hudson.remoting.UserRequest.perform(UserRequest.java:54)
> at hudson.remoting.Request$2.run(Request.java:375)
> at
> hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:73)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused: java.lang.InterruptedException: java.lang.InterruptedException: no
> matches found within 10000
> at hudson.FilePath.act(FilePath.java:1159)
> at hudson.FilePath.act(FilePath.java:1146)
> at hudson.FilePath.validateAntFileMask(FilePath.java:2924)
> at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:270)
> at
> hudson.tasks.BuildStepCompatibilityLayer.perform(BuildStepCompatibilityLayer.java:80)
> at
> hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:804)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:753)
> at hudson.model.Build$BuildExecution.post2(Build.java:177)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:698)
> at hudson.model.Run.execute(Run.java:1932)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at
> hudson.model.ResourceController.execute(ResourceController.java:97)
> at hudson.model.Executor.run(Executor.java:429)
> No artifacts found that match the file pattern
> "**/*.events,heapdumps/**,**/hs_err_pid*". Configuration error?
> Recording test results
> [Checks API] No suitable checks publisher found.
> Build step 'Publish JUnit test result report' changed build result to
> UNSTABLE
> Email was triggered for: Unstable (Test Failures)
> Sending email for trigger: Unstable (Test Failures)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: builds-unsubscribe@lucene.apache.org
> For additional commands, e-mail: builds-help@lucene.apache.org
Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 215 - Unstable! [ In reply to ]
On Fri, Mar 19, 2021 at 4:20 PM Dawid Weiss <dawid.weiss@gmail.com> wrote:

>
> This failure is due to a suite timeout.
>
> java.lang.Exception: Test abandoned because suite timeout was reached.
>
>
>
This test isn't normally in the top slow nightly ones:
https://issues.apache.org/jira/browse/LUCENE-9840

It is actually an actual unit test, it doesn't create actual indexwriters
with slow codecs and index tons of documents or anything.

I'd say good chance it's a deadlock? Need to remember how to get that
special linedocs file to try to reproduce it.
Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 215 - Unstable! [ In reply to ]
On Fri, Mar 19, 2021 at 5:12 PM Chris Hostetter <hossman_lucene@fucit.org>
wrote:

>
> : I did have to tweak the "reproduce with:" line to reproduce it though...
> : maybe because it was a timeout?
> :
> : It said: --tests
> : "org.apache.lucene.index.TestDocumentsWriterStallControl.classMethod"
> : But it should be: --tests
> :
> "org.apache.lucene.index.TestDocumentsWriterStallControl.testAcquireReleaseRace"
> : ?
> : Or maybe no test method at all since we don't know who is truly to blame?
>
> There are two failures reported, w/two reproduce lines in the output. One
> is testAcquireReleaseRace, the other is 'classMethod' -- IIUC classMethod
> is the generic way the gradle test runner reports that something failed at
> the suite (or static class init?) level ... like an AfterClass method, or
> as Dawid said: a suite timeout (which i think is inforced via a ClassRule?
> ... not sure ...)
>
> I think that in the case of a suite timeout you'll always see that test +
> 'classMethod' failure pattern -- unless the test finishes in a timely
> manner and it's some AfterClass cleanup that "hangs"
>
>
>
In this case, the test class has 3 methods and no
before/afterclass/setup/teardown. So I think it is either that
acquireReleaseRace provokes a deadlock, or it is a problem in
test-framework.

But you are correct, there are two "reproduce with"'s (class and test
level), and i just lazily grabbed one of them, it happened to be the class
level one. But IMO it should still work, instead of "classMethod", just
make the "reproduce with" print no method at all. it should just run all 3
test methods in this case?
Re: [JENKINS] Lucene » Lucene-NightlyTests-main - Build # 215 - Unstable! [ In reply to ]
> It said: --tests
> "org.apache.lucene.index.TestDocumentsWriterStallControl.classMethod"
> But it should be: --tests
> "org.apache.lucene.index.TestDocumentsWriterStallControl.testAcquireReleaseRace"
> ?
> Or maybe no test method at all since we don't know who is truly to blame?
>

The reasons are more cleanly stated in the exception that is printed on the
console (not in the summary):

org.apache.lucene.index.TestDocumentsWriterStallControl >
testAcquireReleaseRace FAILED
java.lang.Exception: Test abandoned because suite timeout was reached.
at __randomizedtesting.SeedInfo.seed([C412FA486B53A28D]:0)

org.apache.lucene.index.TestDocumentsWriterStallControl > classMethod FAILED
java.lang.Exception: Suite timeout exceeded (>= 7200000 msec).
at __randomizedtesting.SeedInfo.seed([C412FA486B53A28D]:0)

The first exception happened in the test (and it's a faux exception
substituted by the runner when the suite timed out); the second is thrown
from class-level (rule).

I agree the message could be improved but these corner cases (timeouts
outside of the normal test code control flow) are hard because they're not
anticipated by the regular junit architecture.

This aside, I think we should include gradle tasks that download those test
data sets. It'd make life easier.

D.