Mailing List Archive: Benchmarking a 'no-snails' world (was: Re: PSC #049 2022-01-07)

Jan 17, 2022, 10:28 AM

TL;DR: benchmarks demonstrate that skipping the @_ setup for signatured
subs gives no performance gain.

On Mon, 17 Jan 2022 09:55:34 -0500
Felipe Gasper <felipe@felipegasper.com> wrote:

> > 1) leaving @_ untouched when calling a signatured-sub (i.e. it is
> > still the @_ of the caller).
> >
> > This will have a significant performance boost, especially when
> > calling small stub functions like accessors. At the moment perl has
> > to do the equivalent of
...
> The points heretofore raised in response to this seem to be:
>
> 1) There is no viable branch currently that implements leaving @_
> untouched.
>
> 2) The performance gain has yet to be shown.
>
> I’d love to help in either of these regards, but I lack the knowledge
> to assist with #1, and #2 can’t happen without the former.

Taking a slightly-edited version of rjbs's example code from elsewhere,
I get the following benchmarks on my machine. I ran each test several
times, discarded runs that showed weird timing skew (probably from
background noise of my laptop doing other things at the time), and
picked a "typical" result for each.

The three test functions are:

sub full { die "arity" unless @_ == 1; my ($x) = @_; return $x * $x }
sub bare { my ($x) = @_; return $x * $x }
sub sigs ($x) { return $x * $x }
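
A harness along the following lines would print timings in the same
shape as the output below. It is a minimal sketch only, not the
attached benchmark-entersub.pl: the iteration count, the BENCH_COUNT
environment variable and the time_calls helper are all assumptions
made for illustration.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';
use Time::HiRes qw( time );

# The three test functions, exactly as listed above
sub full { die "arity" unless @_ == 1; my ($x) = @_; return $x * $x }
sub bare { my ($x) = @_; return $x * $x }
sub sigs ($x) { return $x * $x }

# Assumption: the iteration count is arbitrary, and BENCH_COUNT is a
# made-up knob for this sketch
my $COUNT = $ENV{BENCH_COUNT} // 200_000;

# Hypothetical helper: wallclock seconds taken to call $code with a
# single argument, $count times over
sub time_calls ($code, $count) {
    my $start = time;
    $code->(5) for 1 .. $count;
    return time - $start;
}

my $baseline;
for my $name (qw( full bare sigs )) {
    my $elapsed = time_calls(__PACKAGE__->can($name), $COUNT);
    if (!defined $baseline) {
        # The first function timed ("full") is the reference point
        $baseline = $elapsed;
        printf "%s: %.4fs\n", $name, $elapsed;
    }
    else {
        # "speedup" is the ratio of the baseline time to this time
        printf "%s: %.4fs (speedup x%.2f)\n", $name, $elapsed,
            $elapsed > 0 ? $baseline / $elapsed : 0;
    }
}
```

A single pass like this is noisy, which is why each configuration was
run several times before picking a typical result.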

First, a 5.34.0 release:

$ perl5.34.0 benchmark-entersub.pl
full: 1.6000s
bare: 1.3186s (speedup x1.21)
sigs: 1.4231s (speedup x1.12)

The signatured version is about 12% faster than full while providing
the same behaviour, arity check included. The bare version is 21%
faster than full, though it lacks the arity check.
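
The "speedup" figures appear to be the ratio of the full time to each
of the other times, so x1.12 means full takes 1.12 times as long as
sigs. Expressed as time saved, the figure comes out slightly smaller.
A small sketch of that arithmetic using the 5.34.0 numbers above (the
helper names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of the benchmark script
sub speedup    { my ($base, $t) = @_; return $base / $t }
sub time_saved { my ($base, $t) = @_; return 1 - $t / $base }

# Timings copied from the 5.34.0 run above
my %seconds = ( full => 1.6000, bare => 1.3186, sigs => 1.4231 );

for my $name (qw( bare sigs )) {
    printf "%s: speedup x%.2f, %.1f%% less time than full\n",
        $name,
        speedup($seconds{full}, $seconds{$name}),
        100 * time_saved($seconds{full}, $seconds{$name});
}
```

This reproduces the x1.21 and x1.12 ratios reported above.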

Next up, a perl built from my discourage-defav-in-sigsub branch (this
is significantly slower than the release perl above in absolute terms,
because it's an unoptimised debug build; but ignore that):

$ ./perl -Ilib benchmark-entersub.pl
full: 9.0413s
bare: 7.0131s (speedup x1.29)
sigs: 6.4964s (speedup x1.39)

That's more in line with rjbs's original observations - bare is faster
than full (by about 29%) but signatures easily win out here, coming in
at 39% faster (and also being faster than the bare version).

Next up, a build from a point partway along my "no-snails" branch. At
this point, I've edited the various pp_arg* functions to look in the AV
found in PAD_SVl(0) instead of GvAV(PL_defgv), and I skip the
assignment to GvAV(PL_defgv) for signatured subs. The actual code being
skipped is tiny[1]: as far as I can tell, basically a single pointer
assignment, because in order to make pp_arg* work at all we still have
to copy the args to the AV found in PAD_SVl(0). As perhaps expected,
this change makes no observable difference to timing:

$ ./perl -Ilib benchmark-entersub.pl
full: 8.7698s
bare: 6.9522s (speedup x1.26)
sigs: 6.3569s (speedup x1.38)

Finally, noticing that the example code we're benchmarking doesn't
really depend on the values it returns, I decided to break perl by
doing *even less work* than would actually be required to make the args
give the right answers, just to get an upper bound on the speedup that
could possibly be achieved. In this broken version, I don't set up
GvAV(PL_defgv), nor do I set up the AV in PAD_SVl(0). I don't copy the
arguments anywhere at all. OP_ARGELEM now can't find them and will just
return undef. I even had to stub out the contents of pp_argcheck, so it
doesn't perform an arity check either. To be clear: this version of
perl is totally useless, but it should be faster than anything
achievable for real, because any real perl would have to do more work
than this version:

$ ./perl -Ilib benchmark-entersub.pl
full: 8.7818s
bare: 6.9137s (speedup x1.27)
sigs: 6.7048s (speedup x1.31)

I find this result the hardest to explain, as it is very surprising.
I've made pp_entersub slower for everyone (I suspect because it now has
to make an extra conditional jump on CvSIGNATURE(cv)), but what's worse
is that calling the signatured subs is now only 31% faster than calling
the full ones (it was 38% faster in the previous run; see above). And
all this for a broken implementation which doesn't even make the
arguments visible or do any arity checking. Adding those things back
would necessarily involve adding more code to what I currently have,
and thus slow it down further.

In conclusion:

As they stand in current bleadperl, signatured subs are already faster
to call than pureperl code that performs the same work by a
snail-unpack: by a measurable > 30% against the version with a
manually-coded arity check, and by a smaller margin (roughly 8-9% in
the runs above) against the version without one. This is true even
considering that perl is creating the snail (GvAV(PL_defgv)) and
pad-zero (PAD_SVl(0)) AV and copying the argument values into it. (The
same AV is shared by both places.)
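
The copying described here is observable from Perl code: on an
unmodified perl, a signatured sub still sees its arguments in @_ as
well as in its named parameters. A minimal sketch follows; the sub
name is made up, and since perls from 5.36 onward warn about @_ inside
a signatured sub, that warning category is silenced where it exists.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';
# This category only exists from 5.36 onward, hence the version guard
no if $] >= 5.036, warnings => 'experimental::args_array_with_signatures';

# Hypothetical sub: reports what arrived via the signature and via @_
sub both_views ($x) {
    return ( $x, scalar(@_), $_[0] );
}

my ($named, $count, $first) = both_views(7);
printf "named=%d, \@_ has %d element(s), \$_[0]=%d\n",
    $named, $count, $first;
```

Under the proposal quoted at the top of this mail, @_ inside such a sub
would instead be left as the caller's @_.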

An edited version of perl that conditionally does not attempt to set up
the snail or pad-zero array for signatured subs does not perform any
faster than this (and indeed runs slower), even before one attempts to
add in any code that might implement passing the actual argument values
into a signatured sub.

I do not believe that it is possible to gain any performance benefit by
skipping the snail-array setup that is performed by non-signatured subs
in "legacy" perl mode.

In case folks want to attempt to replicate or extend these tests for
themselves, I have attached

benchmark-entersub.pl
- the script used to print these numbers

0001-No-setup-snail-array-or-PADSVlzero.diff
- the full set of changes from current blead, to the (broken) perl
that I used for the final benchmark

-----

Footnotes:

[1]: The code to skip assigning to GvAV(PL_defgv):

diff --git a/pp_hot.c b/pp_hot.c
index 477cdd48b8..e596615743 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -5246,7 +5246,10 @@ PP(pp_entersub)

             defavp = &GvAV(PL_defgv);
             cx->blk_sub.savearray = *defavp;
-            *defavp = MUTABLE_AV(SvREFCNT_inc_simple_NN(av));
+            if(!CvSIGNATURE(cv))
+                *defavp = MUTABLE_AV(SvREFCNT_inc_simple_NN(av));
+            else
+                SvREFCNT_inc_simple_void_NN(*defavp);

             /* it's the responsibility of whoever leaves a sub to ensure
              * that a clean, empty AV is left in pad[0]. This is normally

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/