TL;DR:
Two questions:
1) Has anyone seen cpan/Test-Harness/t/compat/test-harness-compat.t fail
recently?
2) Does anyone know how (on Linux) a process can receive a kernel-sourced
SIGINT signal, when not caused by typing ^C on the terminal?
Long version:
I've been working on a branch (not yet pushed) which unwraps and
optimises pp_aassign() on PERL_RC_STACK builds. A couple of times I've
seen test-harness-compat.t fail. I suspect that it may be related to
changes or bugs in my branch, hence the first question: has anyone seen
this test file fail in blead recently, or is it just me? This is on a
threaded, debugging build.
I've had extreme difficulty reproducing and diagnosing this issue.
Eventually, on a threaded, debugging build on my pp_aassign() branch, I got
this command line to fail the test about once in every 100-200 runs on
average:
$ cd cpan/Test-Harness
$ time while PERL_HASH_SEED=0 PERL_HASH_SEED_DEBUG=1 ../../perl -I../../lib t/compat/test-harness-compat.t| grep -q '^ok 120'; do date; done
The symptoms are that the process does a 255 exit after test 24, with no
error indication. Reducing the test script has been impossible: the
slightest change to the script and the problem goes away. It also goes
away when run under valgrind or ASAN.
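A rough sketch of a wrapper that reruns a command until it fails and reports
whether it exited normally or was killed by a signal. This is not what I
actually ran: the function names are made up, and discarding stdout rather
than piping it to grep may well change the timing enough to hide the problem.

```python
import subprocess
import sys


def describe(returncode):
    """Turn a subprocess return code into a readable description."""
    # subprocess reports death-by-signal N as a return code of -N.
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


def run_until_failure(cmd, max_runs=1000):
    """Run cmd repeatedly; return (run_number, returncode) for the first
    non-zero result, or None if every run succeeded."""
    for n in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # assumption: the output is not needed
            stderr=subprocess.DEVNULL,
        )
        if proc.returncode != 0:
            return n, proc.returncode
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    result = run_until_failure(sys.argv[1:])
    if result is None:
        print("no failure seen")
    else:
        print("run %d: %s" % (result[0], describe(result[1])))
```

It would be invoked from cpan/Test-Harness with the perl command line above
as its arguments.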
I eventually managed to get it to fail while under strace -f. This shows
that the main process and a child are both getting killed by a kernel-generated
SIGINT. The last few lines in the trace file are:
1900436 newfstatat(AT_FDCWD, "../../lib/Scalar/Util.pm", <unfinished ...>
1900400 <... pselect6 resumed>) = ? ERESTARTNOHAND (To be restarted if no handler)
1900400 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
1900436 <... newfstatat resumed>{st_mode=S_IFREG|0444, st_size=10761, ...}, 0) = 0
1900436 --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
1900436 +++ killed by SIGINT +++
1900400 +++ killed by SIGINT +++
1900400 is the main process.
Nothing in the trace file shows any process doing a kill(...). The OS log
doesn't show anything like an OOM killer killing off processes.
About the only thing I can think of is that, in some mysterious fashion
(i.e. not via kill(pid, SIGINT)), the Test::Harness test is simulating a
^C to see how it gets handled, and due to some bug or race condition or
something it's not getting caught properly.
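One way a ^C can be simulated without any kill() call is via a
pseudo-terminal: if a 0x03 byte (the default VINTR character) is written to
the master side of a pty while ISIG is set, the tty layer sends SIGINT to the
foreground process group on the slave side. As far as I know such a signal is
reported as kernel-sourced. Whether anything like this is happening here is
pure assumption; the trace doesn't confirm it. A minimal Linux-only sketch of
the mechanism (the function name is made up, and it doesn't check si_code):

```python
import os
import pty
import signal
import time


def sigint_via_pty():
    """Fork a child on a fresh pty, write 0x03 to the master side and
    return the child's raw wait status."""
    pid, master_fd = pty.fork()
    if pid == 0:
        # Child: a session leader whose controlling terminal is the pty slave.
        try:
            # Python installs its own SIGINT handler; restore the default
            # action so the signal terminates the process.
            signal.signal(signal.SIGINT, signal.SIG_DFL)
            os.write(1, b"ready\n")  # tell the parent we are set up
            time.sleep(30)           # wait to be interrupted
        finally:
            os._exit(0)
    # Parent: wait until the child has reset its signal disposition.
    os.read(master_fd, 1024)
    # No kill() here: just the interrupt character written to the pty master.
    os.write(master_fd, b"\x03")
    _, status = os.waitpid(pid, 0)
    os.close(master_fd)
    return status


if __name__ == "__main__":
    status = sigint_via_pty()
    if os.WIFSIGNALED(status):
        print("child killed by signal", os.WTERMSIG(status))
    else:
        print("child exited with status", os.WEXITSTATUS(status))
```

Under strace -f, the parent shows only a write() to the pty master, with no
kill() anywhere.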
I'm bemused.
--
Monto Blanco... scorchio!