Mailing List Archive

Optimised lexical assignments / faster macroöps
A question in two parts:
1. A specific question,
2. ... which leads on to a broader generalisation

-----

I've been looking lately into the details of how the optree is formed
around regular assignments into lexical variables. Assignments into
scalar lexicals aren't too bad; they look like:

  my $lex = (some RHS scalar value);

  OP_SASSIGN=BINOP:
    (whatever ops for the RHS value)
    OP_PADSV [targ = $lex + OPf_REF|OPf_MOD flags]

Assignments into arrays or hashes look a little more complex:

  my @lex = (some RHS list value);

  OP_AASSIGN=BINOP:
    OP_LIST
      OP_PUSHMARK
      (whatever ops for the RHS value)
    OP_LIST
      OP_PUSHMARK
      OP_PADAV [targ = @lex + OPf_REF|OPf_MOD flags]

with a similar shape for hashes, except using OP_PADHV.
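
Both shapes are easy to inspect with B::Concise; for example (the
exact op numbering and flags vary between perl versions):

  $ perl -MO=Concise,-exec -e 'my $lex = $x + $y'
  $ perl -MO=Concise,-exec -e 'my @lex = (1, 2, 3)'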

I can't help thinking that this puts quite a bit of churn on the value
stack, and also the markstack in the list assignment case. I wonder
whether in these relatively common cases it might make sense to
peephole-optimise them into a set of three specialised ops that use
op_targ to store the pad index of the variable being assigned to;
thus turning these cases into:

  OP_PADSV_STORE=UNOP [targ = $lex]
    (whatever ops for the RHS value)

  OP_PADAV_STORE=LISTOP [targ = @lex]
    OP_PUSHMARK
    (whatever ops for RHS value)

(plus OP_PADHV_STORE, which would look similar)

To this end I might have a go at making a little CPAN module for doing
this.
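
If it works out, usage would presumably mirror Faster::Maths'
lexically-scoped style; something like this (the module name and the
RHS functions here are invented purely for illustration):

  use Faster::PadAssign;   # hypothetical module name

  my $count = compute();   # would compile to OP_PADSV_STORE
  my @items = gather();    # would compile to OP_PADAV_STORE
  my %index = by_id();     # would compile to OP_PADHV_STORE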

It would also be useful to measure whether this actually yields any
performance benefit. If so, it might become a useful core addition.
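
As a first rough measure, the core Benchmark module can at least put
numbers on the assignment overhead itself, though microbenchmarks like
this are very sensitive and real-workload measurements would still be
needed:

  use Benchmark qw(cmpthese);

  my @src = (1 .. 10);

  # run each sub for ~3 CPU seconds and print a comparison chart
  cmpthese(-3, {
      scalar => sub { my $x = $src[0]; return },
      list   => sub { my @dst = @src;  return },
  });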

-----

Except, now this leads me onto the larger question. There's nothing
*particularly* specific to lexical assignments about this. I'm sure
similar optimisations could be argued for in many other situations
in core perl.

Right now, we have a few bits of core already that do things like this;
e.g. OP_MULTICONCAT or OP_AELEMFAST_LEX, which are just high-speed
optimisations of common optree shapes. They exploit the observation
that running a few larger ops ends up being faster overall than
running lots of small ones.
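
For example, on any perl from 5.28 onwards a chain of string
concatenations compiles to a single OP_MULTICONCAT rather than a run
of OP_CONCATs, as B::Concise will show:

  $ perl -MO=Concise,-exec -e 'my $s = "a" . $x . "b" . $y'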

It's a nice pattern - I've already written a module that does
something similar to MULTICONCAT for maths operations:

https://metacpan.org/pod/Faster::Maths
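
For reference, that module is enabled lexically in the usual pragma
style:

  use Faster::Maths;
  # arithmetic-heavy expressions in this lexical scope are
  # compiled into fewer, fused ops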

This is also loosely inspired by Zefram's

https://metacpan.org/pod/Devel::GoFaster

I wonder if there is scope for creating a general way to do this sort
of thing, along with a way to measure the performance boost it gives
on realistic workloads, so as to judge whether such changes are worth
it.
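
For core perl itself there is at least Porting/bench.pl, which runs
the snippets in t/perf/benchmarks under cachegrind and compares
per-op instruction counts between two builds; the invocation is
roughly of this shape (see its documentation for the real options):

  $ ./Porting/bench.pl ./perl.before ./perl.after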

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Optimised lexical assignments / faster macroöps
On 2021-10-06 20:18, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:

> A question in two parts:
> 1. A specific question,
> 2. ... which leads on to a broader generalisation
>
> [...]


Sorry, this topic is too difficult for me.

It would be best discussed with the core team members who know the
details of the optree.
Re: Optimised lexical assignments / faster macroöps
On Wed, 6 Oct 2021 12:18:09 +0100
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> Except, now this leads me onto the larger question. There's nothing
> *particularly* specific to lexical assignments about this. I'm sure
> similar optimisations could be argued for in many other situations
> in core perl.

A few other situations that come to mind:

* Giving OP_CONST the OA_TARGMY flag, so that situations like

      $idx = 1;

  could optimise away the OP_PADSV and OP_SASSIGN into a single
  void-context OP_CONST with a TARG set in it. (The existing
  OA_TARGMY behaviour is sketched just after this list.)

* Permitting (somehow? - no idea how) OA_TARGMY to also apply in
  LVINTRO situations, meaning that every case where OA_TARGMY
  currently applies would also cover declarations like

      my $idx = 1;

* Extending the "op_targ" concept into the incoming argument. I
  notice that a few ops currently abuse that field, which is supposed
  to be about the "target" (i.e. output destination) of an op, to
  really encode where its source argument comes from. A few ops have
  the OPf_STACKED flag optionally set, and if absent it means they
  take their incoming argument from the pad in op_targ.

  This would add a new field to every UNOP to store the pad offset
  of its incoming argument, and then optimise every UNOP whose
  op_first is a PADSV into having no KID and putting the pad offset
  in that field instead. I suspect this would apply in many more
  cases in real code than the current OA_TARGMY optimisation does
  now, and would give benefits to optree size and execution time, at
  the small cost of extra memory for every UNOP. This one would need
  some more in-depth benchmarking to make a case justifying it, but
  if we had some good benchmarking test cases it could show a good
  win.
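
The existing OA_TARGMY behaviour is easy to observe; for example:

  $ perl -MO=Concise,-exec -e 'my ($i, $j, $k); $i = $j + $k'

Here the add op has its targ set to $i's pad slot and the trailing
OP_PADSV/OP_SASSIGN pair is gone, whereas the LVINTRO form
"my $i = $j + $k" still compiles to add + padsv + sassign.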

> I wonder if there is scope for creating a general way to do this sort
> of thing, along with a way to measure the performance boost it gives
> on realistic workloads, so as to judge whether such changes are worth
> it.

So yes I want to reïterate this part of the question here. I can easily
write code that "eh, might be better" but I'd want to have real-life
test cases to measure with benchmarks to demonstrate "this change makes
real programs X% faster", for some X, so we can evaluate the magnitude
of the improvement, versus the additional code complexity costs.

I would really like people to supply some interesting test cases. Right
now for my Faster::Maths module, I have a single, small script that
attempts to generate Julia fractals, to demonstrate that the module
makes it run about 30% faster:

https://metacpan.org/release/PEVANS/Faster-Maths-0.02/source/t/95benchmark.t
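
For flavour, its hot loop is essentially of this shape (paraphrased
here, not the actual file contents):

  # paraphrased sketch of the arithmetic-heavy inner loop
  my ($zr, $zi) = (0, 0);
  for (1 .. $maxiter) {
      ($zr, $zi) = ($zr*$zr - $zi*$zi + $cr, 2*$zr*$zi + $ci);
      last if $zr*$zr + $zi*$zi > 4;
  }

i.e. long runs of lexical reads, maths ops, and assignments back into
lexicals - exactly the shapes discussed above.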

In summary:

If you want perl to run faster, send me examples of your slow
programs so I can optimise them :)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
RE: Optimised lexical assignments / faster macroöps
macroöps:
Note that only a few human languages treat ö as an "o" that is not
part of a diphthong. German (and languages it has influenced, such as
Estonian) treats it as a distinct vowel, similar to o but pronounced
differently.

How would you read the word öö, which is a perfectly ordinary Estonian
word? It translates as "night".

Russian has ё, which is read quite differently from the ë in Moët.
Ukrainian has ї, which appears in the name of the country itself,
"Україна".

Also, search engines could misinterpret your ö.

I think no one is going to read macroops as "macru:ps".

My 2 roubles on that.



-----Original Message-----
From: Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
Sent: Thursday, December 2, 2021 6:56 PM
To: perl5-porters@perl.org
Subject: Re: Optimised lexical assignments / faster macroöps


[...]
Re: Optimised lexical assignments / faster macroöps
On Thu, Dec 02, 2021 at 03:55:56PM +0000, Paul "LeoNerd" Evans wrote:
> I would really like people to supply some interesting test cases. Right
> now for my Faster::Maths module, I have a single, small script that
> attempts to generate Julia fractals, to demonstrate that the module
> makes it run about 30% faster:
>
> https://metacpan.org/release/PEVANS/Faster-Maths-0.02/source/t/95benchmark.t

Simply adding float-only versions of the arithmetic OPs gives you most
of that performance improvement; see:

https://github.com/Perl/perl5/pull/17831

Tony