Mailing List Archive

smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t
For years we've had intermittent timeout smoke failures on two of Tie::File's
test files, 29a_upcopy.t, 29_downcopy.t.

Currently it would appear that they only fail (via test timeouts) on Arm
platform smokers. The failing test(s) vary.

The alarm timeout is per test, initially at 5 seconds; then increased by
me to 10s in Nov 2022, then to 20s for parallel builds by Yves in Feb 2023.

For comparison, on my fast Intel laptop on a debugging build,
29_downcopy.t takes 0.08s to run - that represents about 300 microseconds
per test - a test that is taking more than 20s on a smoker and timing out.

In terms of memory usage, (again on my debugging build), just loading the
test script but running no tests has a memory footprint of 12.5Mb;
running all the tests too increases that to 13.1Mb.

So it's really not using much memory or CPU. So why does the smoke fail?
I'm guessing at least one of the smokers is a Raspberry Pi based on the OS
("Raspbian GNU/Linux 10.13"). Now I'd expect an rPi to be slower, but that
much slower?

So the question really is:

is the fact that the tests running that slow reasonable, and so we should
just disable the test files on slow platforms, or is something else afoot
here?

Does anyone have an rPi they could run those test scripts on and see how
much memory and CPU those two files use?

Each failing test does something like: create a test file consisting of a
few 8k blocks, then do some line inserts, which behind the scenes triggers
a few OS calls along the lines of lseek(i); read(8k block); lseek(j);
write(8k block). But overall, not really a lot of IO activity, nor on
large files. Even with tests running in parallel I wouldn't expect the CPU
to over-burdened to that extent.


--
"Strange women lying in ponds distributing swords is no basis for a system
of government. Supreme executive power derives from a mandate from the
masses, not from some farcical aquatic ceremony."
-- Dennis, "Monty Python and the Holy Grail"
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
Hi there,

On Sun, 25 Feb 2024, Dave Mitchell wrote:

> For years we've had intermittent timeout smoke failures on two of Tie::File's
> test files, 29a_upcopy.t, 29_downcopy.t.
> ...
> ...
> is the fact that the tests running that slow reasonable, and so we should
> just disable the test files on slow platforms, or is something else afoot
> here?
>
> Does anyone have an rPi they could run those test scripts on and see how
> much memory and CPU those two files use?
> ...

You're probably looking at I/O issues - almost all Raspberry Pis will
have USB mass storage. Some (including my desktop machine) will have
e.g. networked home directories.

Here's 'make' test on my desktop using the remote home directory:

8<----------------------------------------------------------------------
raspberrypi:~/junk/Tie-File-1.07$ >>> li README
raspberrypi:~/junk/Tie-File-1.07$ >>> perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Tie::File
Writing MYMETA.yml and MYMETA.json
raspberrypi:~/junk/Tie-File-1.07$ >>> time make test
cp lib/Tie/File.pm blib/lib/Tie/File.pm
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01_gen.t ........... ok
t/02_fetchsize.t ..... ok
t/03_longfetch.t ..... ok
t/04_splice.t ........ ok
t/05_size.t .......... ok
t/06_fixrec.t ........ ok
t/07_rv_splice.t ..... ok
t/08_ro.t ............ ok
t/09_gen_rs.t ........ ok
t/10_splice_rs.t ..... ok
t/11_rv_splice_rs.t .. ok
t/12_longfetch_rs.t .. ok
t/13_size_rs.t ....... ok
t/14_lock.t .......... ok
t/15_pushpop.t ....... ok
t/16_handle.t ........ ok
t/17_misc_meth.t ..... ok
t/18_rs_fixrec.t ..... ok
t/19_cache.t ......... ok
t/20_cache_full.t .... ok
t/21_win32.t ......... skipped: (no reason given)
t/22_autochomp.t ..... ok
t/23_rv_ac_splice.t .. ok
t/24_cache_loop.t .... ok
t/25_gen_nocache.t ... ok
t/26_twrite.t ........ ok
t/27_iwrite.t ........ ok
t/28_mtwrite.t ....... ok
t/29_downcopy.t ...... ok
t/29a_upcopy.t ....... ok
t/30_defer.t ......... ok
t/31_autodefer.t ..... ok
t/32_defer_misc.t .... ok
t/33_defer_vs.t ...... ok
t/40_abs_cache.t ..... ok
t/41_heap.t .......... ok
t/42_offset.t ........ ok
t/43_synopsis.t ...... ok
All tests successful.
Files=38, Tests=4725, 500 wallclock secs ( 1.50 usr 0.37 sys + 5.25 cusr 4.75 csys = 11.87 CPU)
Result: PASS

real 8m21.944s
user 0m7.332s
sys 0m5.282s
raspberrypi:~/junk/Tie-File-1.07$ >>>
8<----------------------------------------------------------------------

Here it is again, running in a RAM fs on /tmp:

8<----------------------------------------------------------------------
raspberrypi:~/junk/ged/Tie-File-1.07$ >>> perl Makefile.PL
Checking if your kit is complete...
Looks good
Generating a Unix-style Makefile
Writing Makefile for Tie::File
Writing MYMETA.yml and MYMETA.json
raspberrypi:~/junk/ged/Tie-File-1.07$ >>> time make test
cp lib/Tie/File.pm blib/lib/Tie/File.pm
PERL_DL_NONLAZY=1 "/usr/bin/perl" "-MExtUtils::Command::MM" "-MTest::Harness" "-e" "undef *Test::Harness::Switches; test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01_gen.t ........... ok
t/02_fetchsize.t ..... ok
t/03_longfetch.t ..... ok
t/04_splice.t ........ ok
t/05_size.t .......... ok
t/06_fixrec.t ........ ok
t/07_rv_splice.t ..... ok
t/08_ro.t ............ ok
t/09_gen_rs.t ........ ok
t/10_splice_rs.t ..... ok
t/11_rv_splice_rs.t .. ok
t/12_longfetch_rs.t .. ok
t/13_size_rs.t ....... ok
t/14_lock.t .......... ok
t/15_pushpop.t ....... ok
t/16_handle.t ........ ok
t/17_misc_meth.t ..... ok
t/18_rs_fixrec.t ..... ok
t/19_cache.t ......... ok
t/20_cache_full.t .... ok
t/21_win32.t ......... skipped: (no reason given)
t/22_autochomp.t ..... ok
t/23_rv_ac_splice.t .. ok
t/24_cache_loop.t .... ok
t/25_gen_nocache.t ... ok
t/26_twrite.t ........ ok
t/27_iwrite.t ........ ok
t/28_mtwrite.t ....... ok
t/29_downcopy.t ...... ok
t/29a_upcopy.t ....... ok
t/30_defer.t ......... ok
t/31_autodefer.t ..... ok
t/32_defer_misc.t .... ok
t/33_defer_vs.t ...... ok
t/40_abs_cache.t ..... ok
t/41_heap.t .......... ok
t/42_offset.t ........ ok
t/43_synopsis.t ...... ok
All tests successful.
Files=38, Tests=4725, 5 wallclock secs ( 1.01 usr 0.20 sys + 3.65 cusr 1.25 csys = 6.11 CPU)
Result: PASS

real 0m5.967s
user 0m5.192s
sys 0m1.597s
raspberrypi:~/junk/ged/Tie-File-1.07$ >>>
8<----------------------------------------------------------------------

The really slow test in the first run was in fact 28, not 29. After
getting bored waiting for 28, I missed 29 altogether. Test 27 was
pretty slow too.

Please let me know if I can do anything else to help, I'd be very glad
to run more tests/benchmarks/logging/whatever if you need them.

--

73,
Ged.
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
On Sun, Feb 25, 2024 at 03:00:06PM +0000, G.W. Haywood wrote:
> You're probably looking at I/O issues - almost all Raspberry Pis will
> have USB mass storage. Some (including my desktop machine) will have
> e.g. networked home directories.

Ah, interesting.

> Please let me know if I can do anything else to help, I'd be very glad
> to run more tests/benchmarks/logging/whatever if you need them.

Would be interesting to the output from /usr/bin/time for various runs:

From the perl build directory:

$ /usr/bin/time ./perl -Ilib -e 1
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 7192maxresident)k
0inputs+0outputs (0major+235minor)pagefaults 0swaps


$ /usr/bin/time ./perl -Ilib -MTie::File -e 1
0.03user 0.00system 0:00.03elapsed 94%CPU (0avgtext+0avgdata 12236maxresident)k
0inputs+0outputs (0major+1062minor)pagefaults 0swaps


$ /usr/bin/time ./perl -Ilib dist/Tie-File/t/29_downcopy.t
....
0.10user 0.03system 0:00.53elapsed 26%CPU (0avgtext+0avgdata 13192maxresident)k
0inputs+47048outputs (0major+1327minor)pagefaults 0swaps


$ /usr/bin/time ./perl -Ilib dist/Tie-File/t/29a_upcopy.t
....
0.03user 0.00system 0:00.04elapsed 91%CPU (0avgtext+0avgdata 12572maxresident)k
0inputs+5408outputs (0major+1156minor)pagefaults 0swaps



--
Overhead, without any fuss, the stars were going out.
-- Arthur C Clarke
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
Hi there,

On Sun, 25 Feb 2024, Dave Mitchell wrote:

> Would be interesting to the output from /usr/bin/time for various runs:
>
>> From the perl build directory:
>
> $ /usr/bin/time ./perl -Ilib -e 1
> 0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 7192maxresident)k
> 0inputs+0outputs (0major+235minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib -e 1
0.00user 0.00system 0:00.00elapsed 85%CPU (0avgtext+0avgdata 4460maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps

> $ /usr/bin/time ./perl -Ilib -MTie::File -e 1
> 0.03user 0.00system 0:00.03elapsed 94%CPU (0avgtext+0avgdata 12236maxresident)k
> 0inputs+0outputs (0major+1062minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib -MTie::File -e 1
0.06user 0.00system 0:00.07elapsed 90%CPU (0avgtext+0avgdata 7972maxresident)k
0inputs+0outputs (0major+939minor)pagefaults 0swaps

> $ /usr/bin/time ./perl -Ilib dist/Tie-File/t/29_downcopy.t
> ....
> 0.10user 0.03system 0:00.53elapsed 26%CPU (0avgtext+0avgdata 13192maxresident)k
> 0inputs+47048outputs (0major+1327minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib dist/Tie-File/t/29_downcopy.t
1..718
ok 1
...
...
ok 718
0.42user 0.47system 0:55.79elapsed 1%CPU (0avgtext+0avgdata 8972maxresident)k
0inputs+47000outputs (0major+1212minor)pagefaults 0swaps

> $ /usr/bin/time ./perl -Ilib dist/Tie-File/t/29a_upcopy.t
> ....
> 0.03user 0.00system 0:00.04elapsed 91%CPU (0avgtext+0avgdata 12572maxresident)k
> 0inputs+5408outputs (0major+1156minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib dist/Tie-File/t/29a_upcopy.t
1..55
ok 1
...
...
ok 55 - try(20960, 200, undef)
0.07user 0.08system 0:05.43elapsed 2%CPU (0avgtext+0avgdata 8344maxresident)k
0inputs+5344outputs (0major+1032minor)pagefaults 0swaps

FWIW building perl on the Pi takes about twenty minutes, on the 2.7GHz
four core i7 laptop which supplies the Pi's home directory, about 8.

HTH

--

73,
Ged.
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
Oh, bother, copy to list:

Hi there,

On Sun, 25 Feb 2024, Dave Mitchell wrote:

> Would be interesting to the output from /usr/bin/time for various runs:
>
> From the perl build directory:
>
> $ /usr/bin/time ./perl -Ilib -e 1
> 0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 7192maxresident)k
> 0inputs+0outputs (0major+235minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib -e 1
0.00user 0.00system 0:00.00elapsed 85%CPU (0avgtext+0avgdata 4460maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps

> $ /usr/bin/time ./perl -Ilib -MTie::File -e 1
> 0.03user 0.00system 0:00.03elapsed 94%CPU (0avgtext+0avgdata 12236maxresident)k
> 0inputs+0outputs (0major+1062minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib -MTie::File -e 1
0.06user 0.00system 0:00.07elapsed 90%CPU (0avgtext+0avgdata
7972maxresident)k
0inputs+0outputs (0major+939minor)pagefaults 0swaps

> $ /usr/bin/time ./perl -Ilib dist/Tie-File/t/29_downcopy.t
> ....
> 0.10user 0.03system 0:00.53elapsed 26%CPU (0avgtext+0avgdata 13192maxresident)k
> 0inputs+47048outputs (0major+1327minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib dist/Tie-File/t/29_downcopy.t
1..718
ok 1
...
...
ok 718
0.42user 0.47system 0:55.79elapsed 1%CPU (0avgtext+0avgdata 8972maxresident)k
0inputs+47000outputs (0major+1212minor)pagefaults 0swaps

> $ /usr/bin/time ./perl -Ilib dist/Tie-File/t/29a_upcopy.t
> ....
> 0.03user 0.00system 0:00.04elapsed 91%CPU (0avgtext+0avgdata 12572maxresident)k
> 0inputs+5408outputs (0major+1156minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib dist/Tie-File/t/29a_upcopy.t
1..55
ok 1
...
...
ok 55 - try(20960, 200, undef)
0.07user 0.08system 0:05.43elapsed 2%CPU (0avgtext+0avgdata 8344maxresident)k
0inputs+5344outputs (0major+1032minor)pagefaults 0swaps

FWIW building perl on the Pi takes about twenty minutes, on the 2.7GHz
four core i7 laptop which supplies the Pi's home directory, about 8.

HTH

--
73,
Ged.
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
On Mon, Feb 26, 2024 at 09:26:07AM +0000, G.W. Haywood wrote:
> > Would be interesting to the output from /usr/bin/time for various runs:

Thanks for the results.

They show very long elapsed times with low CPU, so highly I/O bound as you
suggested. I've just created this PR:

https://github.com/Perl/perl5/pull/22031

which updates those two test files to use the OS's temp dir rather than
the current dir. I would be interested to see how their elapsed time
change on your system, and whether your /tmp is mounted as tmpfs or
similar.

--
Red sky at night - gerroff my land!
Red sky at morning - gerroff my land!
-- old farmers' sayings #14
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
Hi Dave,

On Mon, 26 Feb 2024, Dave Mitchell wrote:

> ... I would be interested to see how their elapsed time change on
> your system ...

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib dist/Tie-File/t/29_downcopy.t
1..718
ok 1
...
...
ok 718
0.60user 0.23system 0:00.91elapsed 92%CPU (0avgtext+0avgdata 11400maxresident)k
0inputs+0outputs (0major+1857minor)pagefaults 0swaps

raspberrypi:~/Downloads/src/perl/perl-5.39.8$ >>> /usr/bin/time ./perl -Ilib dist/Tie-File/t/29a_upcopy.t
1..55
...
...
ok 55 - try(20960, 200, undef)
0.13user 0.09system 0:00.28elapsed 79%CPU (0avgtext+0avgdata 10756maxresident)k
0inputs+0outputs (0major+1593minor)pagefaults 0swaps

> and whether your /tmp is mounted as tmpfs or similar.

raspberrypi:~$ >>> mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,relatime)
tmpfs on /var/tmp type tmpfs (rw,nosuid,nodev,relatime)

This Pi has 8GB of RAM.

Incidentally your patches didn't apply cleanly, I had to wing it.
Same issue for both 29 and 29a.

Patch for 29_downcopy.t:
8<----------------------------------------------------------------------
@@ -256,8 +264,9 @@ sub try0 {
map { defined $_ ? $_ : 'undef' }
$pos, $len, $newlen, $FLEN, $line;

- open F, '>', $file or die "Couldn't open file $file: $!";
- binmode F;
+ my ($fh, $file) = File::Temp::tempfile("29-XXXXX", DIR => $tempdir);
+
+ binmode $fh;
8<----------------------------------------------------------------------

Expected:
8<----------------------------------------------------------------------
@@ -241,8 +249,9 @@

sub try {
my ($pos, $len, $newlen) = @_;
- open F, '>', $file or die "Couldn't open file $file: $!";
- binmode F;
+ my ($fh, $file) = File::Temp::tempfile("29-XXXXX", DIR => $tempdir);
+
+ binmode $fh;
8<----------------------------------------------------------------------

--

73,
Ged.
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
On Mon, Feb 26, 2024 at 03:08:16PM +0000, G.W. Haywood wrote:
> > ... I would be interested to see how their elapsed time change on
> > your system ...

Thanks for that.

So if I'm reading right, the elapsed time for 29_downcopy.t has reduced
from 55.29s to 0.91s by the combination of: switching to File::Temp, and
/tmp being tmpfs !

So it looks like this might have fixed the smoking issue.

> Incidentally your patches didn't apply cleanly, I had to wing it.
> Same issue for both 29 and 29a.

Both files were modified in blead by me a few commits ago - perhaps your
bleed wasn't quite up to date?

--
Never do today what you can put off till tomorrow.
Re: smoke failures on Tie::File's 29a_upcopy.t, 29_downcopy.t [ In reply to ]
Hi there,

On Mon, 26 Feb 2024, Dave Mitchell wrote:

> So if I'm reading right, the elapsed time for 29_downcopy.t has reduced
> from 55.29s to 0.91s by the combination of: switching to File::Temp, and
> /tmp being tmpfs !

Yup.

> Both files were modified in blead by me a few commits ago - perhaps your
> bleed wasn't quite up to date?

Right, I thought it would be something like that.

--

73,
Ged.