On Thu, Sep 23, 2021 at 03:07:14PM +0200, demerphq wrote:
> On Fri, 10 Sept 2021 at 13:48, Nicholas Clark <nick@ccl4.org> wrote:
> > But, this *might* be worth doing for us, if there are lots of hashes with
> > 1 to 4 keys. (version objects seem to be 4 keys, MRO lookup tables are 1
> > key, at worst 2) - "official" size 4 buckets with a load factor of 1.0,
> > meaning actually allocating 7 buckets and storing up to 4 keys.
> >
>
> Doesn't sound like a good idea to me. But you know best. :-)
No, I don't know best. In that, for some things I have a good feel for how
far the code can be bent in a new direction (without it breaking), but I
don't have a great feel for which directions it is actually important to
bend in. Or how to measure success.
> > VC12.0 is capable of *targeting* XP and Windows Server 2003
> > (build host requirement is Windows 7/Windows Server 2012)
> >
>
> We only introduced the flag so we would stop breaking our windows builds.
> If that isn't a problem anymore then yay.
I hope the explanation in the commit makes sense in
https://github.com/Sereal/Sereal/pull/265
Although *that* commit was the first that I wrote, and maybe the text in
our perldelta.pod is clearer:
=item *
The Perl C source code now uses some C99 features, which we have verified are
supported by all compilers we target. This means that Perl's headers now
contain some code that is legal in C99 but not C89.
This may cause problems for some XS modules that unconditionally add
C<-Werror=declaration-after-statement> to their C compiler flags if compiling
with gcc or clang. Earlier versions of Perl support long-obsolete compilers
that are strict in rejecting certain C99 features, particularly mixed
declarations and code, and hence it makes sense for XS module authors to audit
that their code does not violate this. However, doing so is now only
possible on these earlier versions of Perl, hence these modules need to be
changed to add this flag only for C<< $] < 5.035005 >>.
=cut
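For XS authors, the version gate can look something like the following sketch of
a Makefile.PL fragment (the module name is a placeholder, and the same check
works equally in a Module::Build setup):

```perl
# Sketch: request the C89-era warning flag only on perls old enough
# for their headers to still compile cleanly with it.
use strict;
use warnings;
use Config;
use ExtUtils::MakeMaker;

my $ccflags = $Config{ccflags};
# Perl 5.35.5+ headers mix declarations and code, so the flag would
# break the build there; gate it on the interpreter version, and only
# add it for compilers that understand it (gcc/clang).
$ccflags .= ' -Werror=declaration-after-statement'
    if $] < 5.035005 && $Config{gccversion};

WriteMakefile(
    NAME    => 'My::XS::Module',   # placeholder name
    CCFLAGS => $ccflags,
);
```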
> Yes, that is on my todo list to play with. For certain types of hashes IMO
> the PL_strtab is a massive pessimization. We have to do a double store and
> a lot of complex crap and we end up having one hidden hash that scales to
> the size of the set of unique keys in all hashes, which can chew up a lot
> of memory in the background and can play havoc with the preloading of
> data.
>
> Eg, consider I want to build 10 hashes with 10 million keys each: I'm
> actually building 11 hashes, one with 100M keys and ten with 10M each. AND
> if I build all of them prefork, and then build a totally different hash with
> similar keys, I start COW'ing the hidden hash.
>
> I have dreamed that we would/could change how hashes work, and only use
> PL_strtab when they are blessed. So when a hash starts off it would be
> unshared. When it gets blessed it gets converted into a shared hash. Eg,
> maybe introduce a "no shared_keys;" pragma, or something like that.
>
> The idea of PL_strtab as far as I know was to dedupe keys used in objects
> where there is a high level of repetition of the keys. When you are building
> large hashes of arbitrary strings, or aggregations in memory with lots of
> breakouts that are unlikely to overlap, or are building hashes prefork, the
> value is really questionable.
Sort of like this?
https://github.com/nwc10/perl5/tree/larger-hashes-without-shared-keys
It turns off shared keys once hashes get (roughly) larger than 42 keys.
(Except for objects and symbol tables)
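Whether a given hash is still using shared keys is visible from Perl with the
core Devel::Peek module: the HV flags in the dump include SHAREKEYS while the
keys live in PL_strtab. A quick way to eyeball what the branch changes:

```perl
use strict;
use warnings;
use Devel::Peek;

my %h = (a => 1);
# Dump() writes the HV's flags to STDERR; on an unpatched perl they
# include SHAREKEYS, i.e. the key "a" is also stored (and refcounted)
# in PL_strtab. On the branch above, a hash grown past roughly 42 keys
# would drop that flag.
Dump(\%h);
```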
As I wrote above:
No, I don't know best. In that, for some things I have a good feel for how
far the code can be bent in a new direction (without it breaking), but I
don't have a great feel for which directions it is actually important to
bend in. Or how to measure success.
in that, it all ought to be faster for the use cases described, but I fail
to reliably benchmark it.
Taking my previous (not great) benchmark that is roughly a "seen" hash:
$ cat ~/test/big-hash-r3-my.pl
#!perl -w
use strict;
my $size = shift;
$size ||= 1e5;
my %hash;
if ($size < 0) {
    $size = -$size;
    keys %hash = $size;
}

# Avoid a lot of scope entries/exits so that we're mostly timing what we care
# about
++$hash{$_ * 3}
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;
__END__
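(The negative-size branch of the script uses the lvalue `keys` trick to
preallocate buckets up front, avoiding the repeated doubling-and-rehashing
while the hash grows. The effect is observable with the core Hash::Util; the
exact bucket counts are whatever a given perl chooses to allocate, not
guaranteed numbers:)

```perl
use strict;
use warnings;
use Hash::Util qw(num_buckets);

my %pre;
keys(%pre) = 100_000;              # preallocate: one allocation up front
printf "preallocated buckets: %d\n", num_buckets(%pre);

my %grow;
$grow{$_} = 1 for 1 .. 100_000;    # grows by repeated doubling instead
printf "grown buckets:        %d\n", num_buckets(%grow);
```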
For blead, one run:
$ valgrind --tool=cachegrind --branch-sim=yes /home/nick/Sandpit/snap-v5.35.4-150-geb058c0793/bin/perl5.35.5 ~/test/big-hash-r3-my.pl 1e7
==2083422== Cachegrind, a cache and branch-prediction profiler
==2083422== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==2083422== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2083422== Command: /home/nick/Sandpit/snap-v5.35.4-150-geb058c0793/bin/perl5.35.5 /home/nick/test/big-hash-r3-my.pl 1e7
==2083422==
--2083422-- warning: L3 cache found, using its data for the LL simulation.
==2083422== brk segment overflow in thread #1: can't grow to 0x4838000
==2083422== (see section Limitations in user manual)
==2083422== NOTE: further instances of this message will not be shown
==2083422==
==2083422== I refs: 37,648,207,135
==2083422== I1 misses: 31,488
==2083422== LLi misses: 6,735
==2083422== I1 miss rate: 0.00%
==2083422== LLi miss rate: 0.00%
==2083422==
==2083422== D refs: 17,888,205,650 (12,036,875,789 rd + 5,851,329,861 wr)
==2083422== D1 misses: 245,352,635 ( 220,149,381 rd + 25,203,254 wr)
==2083422== LLd misses: 232,768,464 ( 209,874,979 rd + 22,893,485 wr)
==2083422== D1 miss rate: 1.4% ( 1.8% + 0.4% )
==2083422== LLd miss rate: 1.3% ( 1.7% + 0.4% )
==2083422==
==2083422== LL refs: 245,384,123 ( 220,180,869 rd + 25,203,254 wr)
==2083422== LL misses: 232,775,199 ( 209,881,714 rd + 22,893,485 wr)
==2083422== LL miss rate: 0.4% ( 0.4% + 0.4% )
==2083422==
==2083422== Branches: 5,764,880,483 ( 5,252,453,290 cond + 512,427,193 ind)
==2083422== Mispredicts: 426,430,089 ( 106,405,461 cond + 320,024,628 ind)
==2083422== Mispred rate: 7.4% ( 2.0% + 62.5% )
for that branch (almost - before rebasing):
$ valgrind --tool=cachegrind --branch-sim=yes /home/nick/Sandpit/snap-v5.35.4-159-g82b0176e75/bin/perl5.35.5 ~/test/big-hash-r3-my.pl 1e7
==2082396== Cachegrind, a cache and branch-prediction profiler
==2082396== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==2082396== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2082396== Command: /home/nick/Sandpit/snap-v5.35.4-159-g82b0176e75/bin/perl5.35.5 /home/nick/test/big-hash-r3-my.pl 1e7
==2082396==
--2082396-- warning: L3 cache found, using its data for the LL simulation.
==2082396== brk segment overflow in thread #1: can't grow to 0x4832000
==2082396== (see section Limitations in user manual)
==2082396== NOTE: further instances of this message will not be shown
==2082396==
==2082396== I refs: 36,103,067,458
==2082396== I1 misses: 31,762
==2082396== LLi misses: 6,739
==2082396== I1 miss rate: 0.00%
==2082396== LLi miss rate: 0.00%
==2082396==
==2082396== D refs: 17,049,471,109 (11,534,685,391 rd + 5,514,785,718 wr)
==2082396== D1 misses: 205,496,802 ( 188,346,327 rd + 17,150,475 wr)
==2082396== LLd misses: 194,991,458 ( 179,112,354 rd + 15,879,104 wr)
==2082396== D1 miss rate: 1.2% ( 1.6% + 0.3% )
==2082396== LLd miss rate: 1.1% ( 1.6% + 0.3% )
==2082396==
==2082396== LL refs: 205,528,564 ( 188,378,089 rd + 17,150,475 wr)
==2082396== LL misses: 194,998,197 ( 179,119,093 rd + 15,879,104 wr)
==2082396== LL miss rate: 0.4% ( 0.4% + 0.3% )
==2082396==
==2082396== Branches: 5,594,219,347 ( 5,081,890,292 cond + 512,329,055 ind)
==2082396== Mispredicts: 409,023,997 ( 89,004,563 cond + 320,019,434 ind)
==2082396== Mispred rate: 7.3% ( 1.8% + 62.5% )
so it looks sort-of better, but
1) not fantastically
2) I know that something (hash randomisation) moves those numbers about
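One way to take that variable out of the picture is to pin the seed: perl
honours the PERL_HASH_SEED and PERL_PERTURB_KEYS environment variables
(documented in perlrun), and the seed actually in use can be read back with
the core Hash::Util:

```perl
use strict;
use warnings;
use Hash::Util qw(hash_seed);

# Run as: PERL_HASH_SEED=0 PERL_PERTURB_KEYS=0 perl this-script.pl
# With the seed pinned, bucket layout (and hence the cachegrind numbers)
# is repeatable from run to run.
printf "hash seed in use: 0x%s\n", unpack "H*", hash_seed();
```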
So, assuming that you have some test cases that more easily benchmark things
like "what memory doesn't stay shared when you fork?" it would be useful if
you could try benchmarking that branch (vs blead) to see if it helps the
use cases that you (or your $work) are interested in.
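For the "what stays shared after fork" question specifically, here is a crude
Linux-only sketch (the helper name is mine, not from the branch): build a hash
pre-fork, then in the child build a second hash with the same keys and watch
Private_Dirty grow as the shared-key refcount updates COW the PL_strtab pages:

```perl
use strict;
use warnings;

# Sum Private_Dirty over all mappings of the current process (Linux only).
sub private_dirty_kb {
    my $kb = 0;
    open my $fh, '<', '/proc/self/smaps' or die "smaps: $!";
    while (<$fh>) { $kb += $1 if /^Private_Dirty:\s+(\d+) kB/ }
    return $kb;
}

my %prefork;
$prefork{"key$_"} = 1 for 1 .. 100_000;   # populates PL_strtab too

my $pid = fork() // die "fork: $!";
if ($pid == 0) {
    my $before = private_dirty_kb();
    my %child;
    # Same key strings: with shared keys each store bumps the refcount
    # on the existing HEK in PL_strtab, dirtying pages the parent shares.
    $child{"key$_"} = 1 for 1 .. 100_000;
    printf "child dirtied %d kB\n", private_dirty_kb() - $before;
    exit 0;
}
waitpid $pid, 0;
```

Comparing the child's dirtied-kB figure between blead and the branch should
show directly how much of the hidden hash stops being COW'd.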
I'm really not doing very well at benchmarking whether changes are useful.
I can make changes, but I seem to be shooting in the dark, which isn't
really working out.
Nicholas Clark