(oops, only sent to LeoNerd originally, copy here):
On Fri, Apr 23, 2021 at 5:11 PM Paul "LeoNerd" Evans
<leonerd@leonerd.org.uk> wrote:
> If anyone happens to find themselves with some spare time, I'd
> appreciate any assistance or suggestions of something to look into
> here, to see why my version is being so slow. Or is it the case that
> List::Util's version just really is *that* much more efficient, given
> as it uses dMULTICALL - at which point maybe we can adopt some of its
> performance tricks into perl core's way of doing grep/etc... and make
> them faster too?
One thing I noticed:

alh@emondsfield:~/List-Keywords-0.01$ /tmp/perlnew/bin/perl5.33.9 -Mblib -MList::Keywords=any -MO=Concise -e 'any { $_ > 5 } (1..5)'
e  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:%,{ ->3
7     <|> anywhile(other->8) vK ->e
6        <@> anystart K ->7
3           <0> pushmark s ->4
-           <1> null K/1 ->4
d              <@> scope sK ->7
8                 <;> ex-nextstate(main 3 -e:1) v:% ->9
c                 <2> gt sK/2 ->d
a                    <1> rv2sv sK/1 ->b
9                       <$> gv(*_) s ->a
b                    <$> const(IV 5) s ->c
5           <1> rv2av lKP/1 ->6
4              <$> const(AV ARRAY) s ->5
-e syntax OK

There's an rv2sv which should be optimized by the peephole optimizer.
After reading around a bit it sounds like since you constructed these
ops yourself, you need to call that yourself.
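
To illustrate what that peephole step does, here is a toy model in C. It is not perl's real op.c: the toy_op struct, the TOY_* types and toy_peep() are all made up for this sketch. It only shows the idea of walking the execution chain and fusing a gv that is immediately followed by an rv2sv into a single gvsv.

```c
#include <assert.h>
#include <stddef.h>

/* Toy op types. These are hypothetical and are NOT perl's real opcodes. */
enum toy_type { TOY_NULL, TOY_GV, TOY_RV2SV, TOY_GVSV, TOY_CONST, TOY_GT };

struct toy_op {
    enum toy_type  type;
    struct toy_op *next;   /* execution order, playing the role of op_next */
};

/* Walk the execution chain from start. Wherever a GV is immediately
 * followed by an RV2SV, turn the GV into a fused GVSV and unlink the
 * RV2SV from the chain. Returns the number of fusions performed. */
int toy_peep(struct toy_op *start)
{
    int fused = 0;

    assert(start != NULL);   /* an empty chain would be a caller bug */

    for (struct toy_op *o = start; o != NULL; o = o->next) {
        if (o->type == TOY_GV && o->next != NULL
            && o->next->type == TOY_RV2SV) {
            struct toy_op *dead = o->next;
            o->type = TOY_GVSV;
            o->next = dead->next;   /* skip the now-redundant op */
            dead->type = TOY_NULL;  /* a nulled op, like an "ex-" op */
            fused++;
        }
    }
    return fused;
}

/* Number of ops that would actually run, starting from start. */
int toy_chain_length(const struct toy_op *start)
{
    int n = 0;
    for (; start != NULL; start = start->next)
        n++;
    return n;
}
```

Building the four-op chain gv, rv2sv, const, gt and running toy_peep() on it performs one fusion and leaves three ops to execute. If nobody ever calls the pass on a hand-built chain, it stays at four.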
After some experimenting I found this worked:

+    OP **startp = &(anywhile->op_other);
+    PL_peepp(*startp);
     *out = (OP *)anywhile;
     return KEYWORD_PLUGIN_EXPR;

That gets:

alh@emondsfield:~/List-Keywords-0.01$ /tmp/perlnew/bin/perl5.33.9 -Mblib -MList::Keywords=any -MO=Concise -e 'any { $_ > 5 } (1..5)'
c  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:%,{ ->3
7     <|> anywhile(other->8) vK ->c
6        <@> anystart K ->7
3           <0> pushmark s ->4
-           <1> null K/1 ->4
-              <@> scope sK ->7
8                 <;> ex-nextstate(main 3 -e:1) v:% ->9
b                 <2> gt sK/2 ->7
-                    <1> ex-rv2sv sK/1 ->a
9                       <$> gvsv(*_) s ->a
a                    <$> const(IV 5) s ->b
5           <1> rv2av lKP/1 ->6
4              <$> const(AV ARRAY) s ->5
-e syntax OK

And that goes from:

                      Rate CORE::grep List::Keywords/any List::Util::any
CORE::grep         17772/s         --               -45%            -78%
List::Keywords/any 32059/s        80%                 --            -60%
List::Util::any    79482/s       347%               148%              --

To:

                      Rate CORE::grep List::Keywords/any List::Util::any
CORE::grep         16205/s         --               -54%            -79%
List::Keywords/any 35611/s       120%                 --            -53%
List::Util::any    75990/s       369%               113%              --

Beyond that, your benchmark is doing this:

    my $ret = any { $cmpcount{lka}++; $_ > 50 } @nums;

Which, because of the op_scope() call, creates this optree:

alh@emondsfield:~/List-Keywords-0.01$ /tmp/perlnew/bin/perl5.33.9 -Mblib -MList::Keywords=any -MO=Concise -e 'any { $x{a}++; $_ > 5 } (1..5)'
l  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter v ->2
2     <;> nextstate(main 1 -e:1) v:%,{ ->3
7     <|> anywhile(other->8) vK ->l
6        <@> anystart K ->7
3           <0> pushmark s ->4
-           <1> null K/1 ->4
k              <@> leave sKP ->7
8                 <0> enter s ->9
9                 <;> nextstate(main 3 -e:1) v:% ->a
e                 <1> preinc[t1] vK/1 ->f
d                    <2> helem sKRM/2 ->e
b                       <1> rv2hv sKR ->c
a                          <$> gv(*x) s ->b
c                       <$> const(PV "a") s/BARE ->d
f                 <;> nextstate(main 3 -e:1) v:% ->g
j                 <2> gt sK/2 ->k
h                    <1> rv2sv sK/1 ->i
g                       <$> gv(*_) s ->h
i                    <$> const(IV 5) s ->j
5           <1> rv2av lKP/1 ->6
4              <$> const(AV ARRAY) s ->5
-e syntax OK

That is, it's wrapped the block in an enter/leave pair which is adding
a lot of overhead. If you remove that so it's just any { $_ > 50 }, we
get just an op_scope() (which gets elided) and then our benchmark
looks like:

                       Rate CORE::grep List::Util::any List::Keywords/any
CORE::grep          57537/s         --            -60%               -61%
List::Util::any    144858/s       152%              --                -1%
List::Keywords/any 147016/s       156%              1%                 --

So in the best case, your code is faster now I think.
There's one final thing I don't quite understand that your code is
doing and maybe doesn't need to:
You call op_scope() on the block, which DTRT to make sure the code gets
wrapped properly to clean things up (ENTER/LEAVE if needed, pp_scope
otherwise, etc.).
But then your code is also doing:

    SAVETMPS;
    ENTER_with_name("any_item");
    ...
    FREETMPS;
    LEAVE_with_name("any_item");

I'm... not entirely sure these are necessary.
And removing them does give a bit of a speed boost.
If you do remove them, I think one risk is that temporaries will stack
up until the final exit from anywhile, but .. that's kinda what
List::Util::any() does anyway?
I dunno!
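
To make the "temporaries will stack up" risk a bit more concrete, here is a toy counter in C, loosely modelled on the floor/index bookkeeping behind SAVETMPS and FREETMPS. All the toy_* names are hypothetical and none of this is perl's real API. It only compares the peak number of live temporaries when they are freed after every item against when they are freed once at the end.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for perl's temporaries stack: it only counts entries.
 * The names are hypothetical; live and floor loosely mirror the roles
 * of PL_tmps_ix and PL_tmps_floor. */
struct toy_tmps {
    size_t live;   /* temporaries currently waiting to be freed */
    size_t floor;  /* depth recorded by the last toy_savetmps() */
    size_t peak;   /* high-water mark of live */
};

/* Roughly like creating one mortal SV: push a temporary. */
void toy_new_mortal(struct toy_tmps *t)
{
    assert(t != NULL);
    t->live++;
    if (t->live > t->peak)
        t->peak = t->live;
}

/* Roughly like SAVETMPS: remember the current depth as the floor. */
void toy_savetmps(struct toy_tmps *t)
{
    assert(t != NULL);
    t->floor = t->live;
}

/* Roughly like FREETMPS: release everything above the floor. */
void toy_freetmps(struct toy_tmps *t)
{
    assert(t != NULL);
    t->live = t->floor;
}

/* Run items iterations of a block that creates per_item temporaries.
 * With free_each_item set, temporaries are released after every item;
 * otherwise they pile up until the loop is done. Returns the peak. */
size_t toy_run_loop(size_t items, size_t per_item, int free_each_item)
{
    struct toy_tmps t = { 0, 0, 0 };

    toy_savetmps(&t);
    for (size_t i = 0; i < items; i++) {
        for (size_t j = 0; j < per_item; j++)
            toy_new_mortal(&t);
        if (free_each_item)
            toy_freetmps(&t);
    }
    toy_freetmps(&t);   /* final cleanup on the way out */
    return t.peak;
}
```

With 100 items and 2 temporaries per item, toy_run_loop(100, 2, 1) peaks at 2 live temporaries, while toy_run_loop(100, 2, 0) peaks at 200. So skipping the per-item cleanup trades memory held during the loop for less work per iteration.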
-- Matthew Horsfall (alh)