Fwd: Why is my `any` keyword so slow?
(Oops, I originally sent this only to LeoNerd; copying the list here.)

On Fri, Apr 23, 2021 at 5:11 PM Paul "LeoNerd" Evans
<leonerd@leonerd.org.uk> wrote:
> If anyone happens to find themselves with some spare time, I'd
> appreciate any assistance or suggestions of something to look into
> here, to see why my version is being so slow. Or is it the case that
> List::Util's version just really is *that* much more efficient, given
> as it uses dMULTICALL - at which point maybe we can adopt some of its
> performance tricks into perl core's way of doing grep/etc... and make
> them faster too?

One thing I noticed:

alh@emondsfield:~/List-Keywords-0.01$ /tmp/perlnew/bin/perl5.33.9 \
    -Mblib -MList::Keywords=any -MO=Concise -e 'any { $_ > 5 } (1..5)'
e  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:%,{ ->3
7     <|> anywhile(other->8) vK ->e
6        <@> anystart K ->7
3           <0> pushmark s ->4
-           <1> null K/1 ->4
d              <@> scope sK ->7
8                 <;> ex-nextstate(main 3 -e:1) v:% ->9
c                 <2> gt sK/2 ->d
a                    <1> rv2sv sK/1 ->b
9                       <$> gv(*_) s ->a
b                    <$> const(IV 5) s ->c
5           <1> rv2av lKP/1 ->6
4              <$> const(AV ARRAY) s ->5
-e syntax OK

There's an rv2sv (over a gv) that the peephole optimizer would normally
fold into a single gvsv op. After reading around a bit, it sounds like
since you constructed these ops yourself, you need to call the peephole
optimizer yourself.
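
For comparison, code compiled by the core parser already gets this
rewrite. The following is a minimal sketch, assuming a stock perl with
the core B::Concise module; concise_dump is a hypothetical helper name
introduced here, not something from List::Keywords. It dumps the optree
of an ordinary sub and checks that the gv/rv2sv pair has become a gvsv:

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical helper: render a code reference's optree as B::Concise text.
sub concise_dump {
    my ($code) = @_;
    B::Concise::walk_output(\my $buf);          # capture output in $buf, not STDOUT
    B::Concise::compile('-basic', $code)->();   # build the walker and run it
    return $buf;
}

# The same comparison as in the any { ... } block, compiled the ordinary way.
my $dump = concise_dump(sub { $_ > 5 });
print $dump;

# After the peephole pass the rv2sv is nulled out and the gv becomes a gvsv.
print "rv2sv nulled: ", ($dump =~ /ex-rv2sv/ ? "yes" : "no"), "\n";
print "gvsv present: ", ($dump =~ /\bgvsv\b/ ? "yes" : "no"), "\n";
```

If the dump for a custom keyword still shows a live rv2sv over a gv, that
is a hint the peephole pass never ran over those ops.
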
After some experimenting I found this worked:

+ OP **startp = &(anywhile->op_other);
+ PL_peepp(*startp);

*out = (OP *)anywhile;
return KEYWORD_PLUGIN_EXPR;

That gets:

alh@emondsfield:~/List-Keywords-0.01$ /tmp/perlnew/bin/perl5.33.9 \
    -Mblib -MList::Keywords=any -MO=Concise -e 'any { $_ > 5 } (1..5)'
c  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:%,{ ->3
7     <|> anywhile(other->8) vK ->c
6        <@> anystart K ->7
3           <0> pushmark s ->4
-           <1> null K/1 ->4
-              <@> scope sK ->7
8                 <;> ex-nextstate(main 3 -e:1) v:% ->9
b                 <2> gt sK/2 ->7
-                    <1> ex-rv2sv sK/1 ->a
9                       <$> gvsv(*_) s ->a
a                    <$> const(IV 5) s ->b
5           <1> rv2av lKP/1 ->6
4              <$> const(AV ARRAY) s ->5
-e syntax OK

And that goes from:

                      Rate CORE::grep List::Keywords/any List::Util::any
CORE::grep         17772/s         --               -45%            -78%
List::Keywords/any 32059/s        80%                 --            -60%
List::Util::any    79482/s       347%               148%              --

To:

                      Rate CORE::grep List::Keywords/any List::Util::any
CORE::grep         16205/s         --               -54%            -79%
List::Keywords/any 35611/s       120%                 --            -53%
List::Util::any    75990/s       369%               113%              --

Beyond that, your benchmark is doing this:

my $ret = any {$cmpcount{lka}++; $_ > 50 } @nums;

Which, because of the op_scope() call, creates this optree:

alh@emondsfield:~/List-Keywords-0.01$ /tmp/perlnew/bin/perl5.33.9 \
    -Mblib -MList::Keywords=any -MO=Concise -e 'any { $x{a}++; $_ > 5 } (1..5)'
l  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:%,{ ->3
7     <|> anywhile(other->8) vK ->l
6        <@> anystart K ->7
3           <0> pushmark s ->4
-           <1> null K/1 ->4
k              <@> leave sKP ->7
8                 <0> enter s ->9
9                 <;> nextstate(main 3 -e:1) v:% ->a
e                 <1> preinc[t1] vK/1 ->f
d                    <2> helem sKRM/2 ->e
b                       <1> rv2hv sKR ->c
a                          <$> gv(*x) s ->b
c                       <$> const(PV "a") s/BARE ->d
f                 <;> nextstate(main 3 -e:1) v:% ->g
j                 <2> gt sK/2 ->k
h                    <1> rv2sv sK/1 ->i
g                       <$> gv(*_) s ->h
i                    <$> const(IV 5) s ->j
5           <1> rv2av lKP/1 ->6
4              <$> const(AV ARRAY) s ->5
-e syntax OK

That is, op_scope() has wrapped the block in an enter/leave pair, which
adds a lot of overhead. If you drop the counter so the block is just
any { $_ > 50 }, op_scope() emits only a scope op (which gets elided),
and the benchmark then looks like:

                       Rate CORE::grep List::Util::any List::Keywords/any
CORE::grep          57537/s         --            -60%               -61%
List::Util::any    144858/s       152%              --                -1%
List::Keywords/any 147016/s       156%              1%                 --

So in the best case, I think your code is now marginally faster.
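
The scope versus enter/leave difference can also be checked without
List::Keywords installed, since core grep blocks appear to go through
the same op_scope() call. This is only a rough sketch: concise_dump is a
hypothetical helper, %cmpcount stands in for the benchmark's counter, and
the contents of @nums are an assumption because the thread does not show
them.

```perl
use strict;
use warnings;
use B::Concise ();
use Benchmark qw(cmpthese);

our %cmpcount;            # stand-in for the benchmark's %cmpcount counter
my @nums = (1 .. 100);    # assumed input; the thread does not show the real @nums

# Hypothetical helper: render a code reference's optree as B::Concise text.
sub concise_dump {
    my ($code) = @_;
    B::Concise::walk_output(\my $buf);          # capture output in $buf, not STDOUT
    B::Concise::compile('-basic', $code)->();   # build the walker and run it
    return $buf;
}

my %variant = (
    # One statement in the block: expected to compile to an elided scope op.
    bare    => sub { scalar(grep { $_ > 50 } @nums) },
    # Two statements in the block: expected to be wrapped in enter/leave.
    counted => sub { scalar(grep { $cmpcount{grep}++; $_ > 50 } @nums) },
);

for my $name (sort keys %variant) {
    my $wrapped = concise_dump($variant{$name}) =~ /<0> enter\b/
        ? 'enter/leave'
        : 'scope';
    print "$name block is wrapped in: $wrapped\n";
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, \%variant);
```

Note that the timing gap between the two variants also includes the cost
of the hash increment itself, so it is an upper bound on the enter/leave
overhead rather than a direct measurement of it.
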

There's one final thing your code is doing that I don't quite
understand, and that it maybe doesn't need to do:

You call op_scope() on the block, which DTRT to make sure the code gets
wrapped properly so things are cleaned up (ENTER/LEAVE if needed,
pp_scope otherwise, etc.).
But then your code is also doing:

SAVETMPS;
ENTER_with_name("any_item");

...
FREETMPS;
LEAVE_with_name("any_item");

I'm... not entirely sure these are necessary.

And removing them does give a bit of a speed boost.

If you do remove them, I think one risk is that temporaries will stack
up until the final exit from anywhile, but... that's kinda what
List::Util::any() does anyway?

I dunno!

-- Matthew Horsfall (alh)