Mailing List Archive

Robin Hood Hashing for the perl core
Sorry that this message is about 4 months late for April Fool's Day.

So, this doesn't really fit the "elevator pitch" shape because I don't think
that proposal would really fit the RFC shape either.


I'd like to suggest that we switch the core's hashtable implementation from
the current array of linked-lists (which is not friendly with CPU caches) to
"open addressing" hashing (which is). More specifically, to "Robin Hood
Hashing", and most specifically to the implementation that I wrote for
MoarVM (and Perl 5) - "A Better Hash".

Why...

For the simple explanation, you need to know about "load factor".

The top level of all hashtable implementations is an array of something
(pointers or structure) - an array that gets resized "as needed", but when
that happens is a trade off.

To make things work, you can't use all the entries in this array. Some have
to be empty. The proportion that are full is your load factor. It can't go
above 100%. If your load factor gets too high, you resize the hash.
(Typically you double its size. As there are the same number of keys, your
load factor halves)
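
(A worked example, with made-up numbers: 1,000 keys in a 2,048-slot array is
a load factor of about 49%. If the resize threshold is 66%, then growing to
1,352 keys triggers a doubling to 4,096 slots, and the load factor drops back
to about 33%.)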

Assume a typical load factor of 50%. This means that half your top level
array elements are empty. Imagine that each used element is next to an empty
element - it makes these diagrams easier:

The current hash implementation is an array of pointers to linked lists of
HE structs, which in turn point to the key and value:

|      |
+------+     +------+       +------+
| HE * | --> | Key  | ----> | HEK  |
+------+     +-    -+       +------+
| NULL |     | Val  | -+
+------+     +-    -+  |    +------+
|      |     | NULL |  +--> | SV   |
+------+     +------+       +------+

               ^^^^ points to next entry in the list if needed.

The proposed implementation "open addressing" eliminates the linked lists by
storing HE structs in a flat array:

|      |
+------+       +------+
| Key  | ----> | HEK  |
+-    -+       +------+
| Val  | -+
+------+  |    +------+
| NULL |  +--> | SV   |
+-    -+       +------+
| NULL |
+------+
|      |

This

1) reduces the overhead by (on average) 1 pointer per hash entry,
2) eliminates one pointer dereference (and potential CPU cache miss)
3) keeps all the hash close in memory (likely more CPU cache hits)

The downsides are

1) more CPU work in maintaining that flat array in order
2) slightly more memory allocation for hashes under about 50 entries
(but usually that memory is allocated but not needed)
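
In C terms the difference is roughly this. It's only a sketch: the first
struct mirrors the current HE in hv.h (minus the value/refcount union), the
second is illustrative and not the actual ABH definitions:

    typedef struct hek HEK;   /* perl's shared key: string, length, stored hash */
    typedef struct sv  SV;

    /* current: one heap-allocated HE per entry, chained per bucket */
    struct he_chained {
        struct he_chained *hent_next;  /* the extra pointer per entry */
        HEK *hent_hek;
        SV  *hent_val;
    };
    /* HvARRAY(hv) is an array of bucket heads: struct he_chained ** */

    /* proposed: entries stored directly in one flat, probed array */
    struct he_flat {
        HEK *hent_hek;                 /* NULL marks an empty slot */
        SV  *hent_val;
    };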



I propose that we use the hash that MoarVM switched to last year.
It's my independent C implementation of the design of Martin Ankerl's hash
at https://github.com/martinus/robin-hood-hashing

I named mine "A Better Hash"

There are a lot of comments explaining how it works at
https://github.com/MoarVM/MoarVM/blob/master/src/core/str_hash_table.h
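
To give a flavour without reading that file, the core "Robin Hood" insertion
rule fits in a few lines of C. This is a toy sketch of the general technique,
not the ABH code (ABH also keeps hash bits in a separate metadata array,
which this ignores, and a real insert would first check for an existing
equal key):

    #include <stdint.h>
    #include <stddef.h>

    #define SLOTS 16  /* a power of two, so (hash & (SLOTS-1)) is the home slot */

    typedef struct {
        uint64_t hash;           /* 0 means "empty slot" in this toy */
        const char *key;
        void *value;
        uint8_t probe_distance;  /* how far this entry sits from its home slot */
    } slot_t;

    /* Toy Robin Hood insert: assumes the table never reaches 100% load and
       that real hash values are never 0. */
    static void rh_insert(slot_t table[SLOTS], uint64_t hash,
                          const char *key, void *value)
    {
        slot_t incoming = { hash, key, value, 0 };
        size_t i = (size_t)(hash & (SLOTS - 1));   /* home slot */

        for (;;) {
            if (table[i].hash == 0) {              /* empty: claim it */
                table[i] = incoming;
                return;
            }
            if (table[i].probe_distance < incoming.probe_distance) {
                /* The resident is "richer" (closer to home) than the entry
                   being inserted: rob it - swap them, and carry on inserting
                   the displaced entry further along instead. */
                slot_t displaced = table[i];
                table[i] = incoming;
                incoming = displaced;
            }
            i = (i + 1) & (SLOTS - 1);             /* linear probe onwards */
            incoming.probe_distance++;
        }
    }

Lookups apply the same rule in reverse: the moment you reach a slot whose
resident is closer to home than the key you are looking for would be at that
point, the key cannot be present, so misses terminate early.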

The MoarVM version has 64 bit hash values, but 32 bit sizes for hashtables
(so at most 4G entries)

Perl 5 currently has 32 bit, 32 bit, and a 31 bit length limit for keys.

I propose that while we're disrupting everything else, we switch to U64
hashes, SSize_t entries and SSize_t keys.

Because this all sounds crazy and impossible and you'd never believe me, I
have a working proof of concept at

https://github.com/nwc10/perl5/tree/smoke-me/nicholas/ABH-POC

It builds DBI, Moose, Moo, JSON::XS, YAML::XS just fine.

As in, all dependencies, all tests pass, vanilla, no changes.

(it's definitely a Proof Of Concept. The code is a mess of 3 styles.
Some dump.c functionality and a few regression tests are disabled.
keys %hash = $value; is not implemented yet. ithreads cloning doesn't
clone the hash iterator. The commit messages could be better.
Apart from that it works. At least on Linux, AIX and HP/UX)

Internally the core is ABH with 64 bit hash values and 64 bit sizes in the C
APIs, but it presents a legacy 32 bit interface to XS, and XS works.

It seems that you only need 32/64 bit compatibility wrappers around 9 core
API functions.
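
The shape of such a wrapper is roughly this (the names below are invented
for illustration - they are not the functions in the branch):

    /* Sketch only. The internal API is 64-bit clean (SSize_t lengths, U64
       hash values); the legacy entry point keeps the historic 32-bit types
       that XS code expects and simply widens on the way in. */
    PERL_STATIC_INLINE SV **
    S_hv_fetch_legacy(pTHX_ HV *hv, const char *key, I32 klen, I32 lval)
    {
        return abh_hv_fetch(aTHX_ hv, key, (SSize_t)klen, lval);
    }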

I *did* need to patch Cpanel::JSON::XS and Sereal to remove
-Werror-no-declaration-after-statement from their C compiler flags.

Until "this week" that flag was an excellent idea, to avoid accidentally
writing non-C89 code. However, I want to use C99 in my inline wrappers, and
hence I've just discovered one of our future C99 pain points....


Sereal needed t/020_sort_keys.t temporarily disabled, and these changes:

+++ srl_encoder.c 2021-08-02 21:39:24.461838518 +0000
@@ -1203,6 +1203,23 @@
{
HE *he;
const int do_share_keys = HvSHAREKEYS((SV *)src);
+
+#ifdef HvABH
+ Perl_ABH_Table *table= HvABH(src);
+ Perl_ABH_Iterator iterator= Perl_ABH_first(table);
+ while (!Perl_ABH_at_end(table, iterator)) {
+ HE *he= (HE *) Perl_ABH_current(table, iterator);
+ SV *v= HeVAL(he);
+ if (v != &PL_sv_placeholder) {
+ srl_dump_hk(aTHX_ enc, he, do_share_keys);
+ CALL_SRL_DUMP_SV(enc, v);
+ if (--n == 0) {
+ break;
+ }
+ }
+ iterator= Perl_ABH_next(table, iterator);
+ }
+#else
HE **he_ptr= HvARRAY(src);
HE **he_end= he_ptr + HvMAX(src) + 1;

@@ -1219,6 +1236,7 @@
}
}
} while ( he_ptr < he_end );
+#endif
}

SRL_STATIC_INLINE void
@@ -1508,7 +1526,12 @@
if ( share_keys && SRL_ENC_HAVE_OPTION(enc, SRL_F_SHARED_HASHKEYS) /* only enter branch if shared hk's enabled */
#if PERL_VERSION >= 10
&& (!DO_SHARED_HASH_ENTRY_REFCOUNT_CHECK
- || src->he_valu.hent_refcount > 1)
+#ifdef HvABH
+ || src->hent_hek->hek_refcount > 1
+#else
+ || src->he_valu.hent_refcount > 1
+#endif
+ )
#endif
)
{



I think that we can fake everything in the existing (macro soup) API *except*

1) HvARRAY
2) The type of PL_strtab


HvARRAY can't be replaced or emulated. I think that 14 distros use it:

https://metacpan.org/release/Devel-Arena
https://metacpan.org/release/Devel-MAT-Dumper
https://metacpan.org/release/Devel-SizeMe
https://metacpan.org/release/Glib
https://metacpan.org/release/Hash-Spy
https://metacpan.org/release/Internals-DumpArenas
https://metacpan.org/release/JSPL
https://metacpan.org/release/JavaBin
https://metacpan.org/release/Sereal-Encoder
https://metacpan.org/release/Sub-Disable
https://metacpan.org/release/autovivification
https://metacpan.org/release/mod_perl
https://metacpan.org/release/namespace-clean-xs
https://metacpan.org/release/perlec

grep.metacpan.org can't filter out results from ppport.h (or embed.fnc)
so I might have missed some
(this is a feature of grep.cpan.me that I miss)

I think we can patch them all. Several are just truth tests - I propose
a new macro HvIS_EMPTY(hv) to replace them, which we also add to ppport.h

Others just iterate over all the hash buckets - I propose we add
hv_foreach() to core and ppport.h to handle these cleanly.
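
To be concrete, the sort of shapes I have in mind (illustrative only - the
exact names and signatures would be part of the review):

    /* HvIS_EMPTY: a truth test that can be implemented for both the old and
       the new layout. HvUSEDKEYS() already exists, so the fallback we add to
       ppport.h for older perls is trivial: */
    #ifndef HvIS_EMPTY
    #  define HvIS_EMPTY(hv)  (HvUSEDKEYS(hv) == 0)
    #endif

    /* hv_foreach: one conceivable callback shape (hypothetical signature).
       The callback sees each HE* in turn; a non-zero return stops early. */
    typedef int (*hv_foreach_cb)(pTHX_ HE *entry, void *arg);
    int hv_foreach(pTHX_ HV *hv, hv_foreach_cb cb, void *arg);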

PL_strtab is mentioned in

https://metacpan.org/release/Devel-Arena
https://metacpan.org/release/Devel-GC-Helper
https://metacpan.org/release/Devel-MAT-Dumper

that I can find, and as it turns out at least in xsh/threads.h, which a
routine search can't find.

The "XS exposure" to these two is much smaller than you might imagine.
As long as the iterator APIs keep working and returning HE*s and HEK*s,
most code is happy.


Why am I suggesting "A Better Hash" and not $other...

1) C implementation exists
2) It's portable
3) It's suitably licensed
4) It supports hash randomisation
5) You can delete during hash iteration without having "problems"


Before suggesting other hashes, please could you check that they meet all 5
of the above criteria. I'm suggesting here something that is known to work
and can be polished and delivered within a month or two part time. It builds
on approximately every CPU and OS known to humanity (including sparc64 and
ia64, so any endianness, and the alignment is 64 bit clean. *Not* tested on
VMS, zOS, the CPU that zOS runs on, hppa or RiscV. Yet.)

We can in turn replace this with something even better once code for that
exists. But until then, don't let the perfect be the enemy of the good.


Randomisation and Iteration

Randomisation is implemented by each hash having a private random 64 bit
"salt", and getting a new salt for each hashtable size doubling. Instead of
perturbing the iteration order, the insert position is perturbed.

For the current Linked List Hashes, the exploitable weakness was that on
hash "grow" the linked list chains are *split* (not completely re-ordered),
meaning that exactly 1 bit of the *key*'s hash value determines the split,
and hence *that* bit is exposed by the iteration order. Without all the
current mitigations to perturb iteration order it is demonstrably possible
to infer the value of that bit in enough keys to work out the interpreter's
global hash seed.

For ABH, all 64 bits of salt (and all 32 or 64 bits of the key's hash value)
are mixed to get the insert position. On hash resize a completely new salt
is used, and hence totally new order is generated. I believe this means that
anyone attempting to attack needs to figure out 96 (ideally 128) bits of
hidden state simultaneously, not just the 1 bit for the Linked List splits.
Hence I believe that it's secure. I'd appreciate someone competent proving
me wrong on this.
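
As a sketch of the idea (the constant and the exact mixing steps here are
invented - this is not the ABH code):

    #include <stdint.h>

    /* Valid for 1 <= log2_slots <= 63. */
    static inline uint64_t insert_bucket(uint64_t key_hash, uint64_t salt,
                                         unsigned log2_slots)
    {
        uint64_t x = key_hash ^ salt;        /* fold in the per-hash salt */
        x *= UINT64_C(0x9E3779B97F4A7C15);   /* scramble */
        x ^= x >> 29;
        return x >> (64 - log2_slots);       /* top bits choose the bucket */
    }

Because the salt is replaced on every doubling, the bucket order (and hence
the iteration order) after a resize shares nothing observable with the order
before it.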

This means that internally hash iterators are just an integer array index.
This avoids all sorts of complexity with dangling pointers, allocation, etc.

Iterators actually start at the highest index and count down to 0. This has
three benefits

0) The "end of hash" iterator value is the same for hashes of any size
1) It's cheap to test a CPU register against a constant.
2) It's the reverse direction from internal movements caused by insertion
and deletion

That last one is key - it means that it's possible to delete the hash entry
at the current iterator position without perturbing the iteration.

Perl documents this as supported behaviour. We need to support it.
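
In code a full iteration is no more complicated than this (a sketch reusing
the toy slot_t from earlier, with an invented callback - not the ABH API):

    /* The iterator is just an index, counting down. Deleting the entry at
       `i` shifts entries from higher slots down into the gap - slots the
       descending scan has already passed - so nothing is skipped or seen
       twice. */
    static void iterate_all(slot_t *table, size_t num_slots,
                            void (*visit)(slot_t *entry))
    {
        size_t i = num_slots;
        while (i-- > 0) {                 /* "end of hash" is simply index 0 */
            if (table[i].hash == 0)       /* empty slot (toy convention) */
                continue;
            visit(&table[i]);             /* may safely delete entry i */
        }
    }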


Performance

"The marvel is not that the bear dances well, but that the bear dances at
all."


It's actually a bit better than that. It's not noticeably slower on
anything. I'm not sure what makes a good benchmark. Code I've tried isn't
bottlenecked on hashing. I've tried using cachegrind - we're using more CPU
in the hash code, and more data references, but fewer last level misses.

But some of this could be down to tuning - open addressing hashes have to do
more work for insert and delete, moving entries around to maintain their
invariants. Robin Hood invariants are slightly more work on insert and
delete, but are intended to reduce work on lookups. And the approach here of
storing more hash bits in the metadata means more ordering constraints (so
more moving), but fewer lookups outside of the metadata bytes on reading.

As I don't know what to benchmark, I've not tried tuning anything.
(To be fair, I'm exhausted after the big push to get this working at all.
It's why I've been quiet for a few weeks)

The load factor might be too high. There's actually quite a lot that can
be tuned:

* load factor (even make it dynamic as a function of hash size)
* thresholds for resizing
* thresholds for how many hash bits in the metadata
* smaller sizes (official "4", "2" or even "1", and with a load factor of 1.0)

but without meaningful benchmarks (which I don't have) it's not going to
answer any questions.


So for now I need to take a few days break from this, but everyone else is
welcome to play with the code, and figure out how to benchmark it.
(And break the randomisation)

Nicholas Clark
Re: Robin Hood Hashing for the perl core
On Tue, Aug 03, 2021 at 02:27:42PM +0000, Nicholas Clark wrote:

> (it's definitely a Proof Of Concept. The code is a mess of 3 styles.
> Some dump.c functionality and a few regression tests are disabled.
> keys %hash = $value; is not implemented yet. ithreads cloning doesn't
> clone the hash iterator. The commit messages could be better.

and I disabled the non-recursive freeing of hashes in sv_clear, and didn't
write new code to replace it.

> Apart from that it works. At least on Linux, AIX and HP/UX)

And dromedary.p5h.org, which has a gcc/glibc combination that is old
enough to be annoying. Which is a useful test.

Nicholas Clark
Re: Robin Hood Hashing for the perl core
Thank you for posting this detailed introduction to the new hashing implementation. I just wanted to remark on one thing. Randomization of the iteration order isn't a goal in itself, but it serves two purposes: to stop attackers being able to guess the hash seed by observing the iteration order, and to 'teach people a lesson' and make sure they don't inadvertently rely on a hash key ordering that's subject to change.

If the hash implementation fixed an iteration order that didn't give away anything about the hash seed or how to engineer collisions, then it wouldn't need to be randomized. Some hash implementations allow you to return the keys in the order they were originally inserted, without extra cost. I think that would be a very desirable feature in any replacement hash implementation. If the insertion order could be preserved with only a small constant penalty, that would speed up and simplify code where you currently use the much slower Tie::IxHash, and would also speed up code which currently needs 'sort keys', not because alphabetical ordering is particularly important, but just to get deterministic output.

If your new hash implementation doesn't provide an obvious way to preserve the insertion order of keys, and so randomization will continue to be necessary, then I apologize for letting out one of the bees I have in my bonnet.

--
Ed Avis <ed.avis@qmaw.com>
Please ignore confidentiality stuff below this point.

Re: Robin Hood Hashing for the perl core
On Tue, 3 Aug 2021 14:27:42 +0000
Nicholas Clark <nick@ccl4.org> wrote:

> I think we can patch them all. Several are just truth tests - I
> propose a new macro HvIS_EMPTY(hv) to replace them, which we also add
> to ppport.h

Regardless of anything else proposed, that sounds like a good first
step - and can probably bring some useful benefits independently of any
changes to actual hashing structure if it helps decouple modules from
core decisions.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Robin Hood Hashing for the perl core
Nicholas Clark <nick@ccl4.org> wrote:
:I'd like to suggest that we switch the core's hashtable implementation from
:the current array of linked-lists (which is not friendly with CPU caches) to
:"open addressing" hashing (which is). More specifically, to "Robin Hood
:Hashing", and most specifically to the implementation that I wrote for
:MoarVM (and Perl 5) - "A Better Hash".

How separable is it, would it be possible to add the new hash algorithm
separately from the switch from linked lists? The latter sounds like it
necessarily needs a big-bang approach, but we have prior experience in
supporting alternate hash algorithms.

:The downsides are
:
:1) more CPU work in maintaining that flat array in order

Can you quantify that, or describe in more detail what is involved?
Some of my maths code involves hashes with order of 10^8 keys, typically
"seen" hashes (that let me do X if I haven't seen this key before).
Typically for those the only interaction with the hash is a single
line like:
do_X($obj) unless $seen{$obj->serialize}++;

In particular, it sounds like ABH is optimized for cases where searches
are much more common than inserts/deletes, so I think "seen" hashes
in workloads where duplication is rare would be close to worst-case.

Having read through the comments in MoarVM, I find some of it confusing,
but it sounds like the trade-off could be tweaked by controlling the
number of extra hash bits stored. Is that something we could consider
making controllable on a per-hash basis?

:I think we can patch them all. Several are just truth tests - I propose
:a new macro HvIS_EMPTY(hv) to replace them, which we also add to ppport.h

I agree with leonerd that we should do this one right away.

:Others just iterate over all the hash buckets - I propose we add
:hv_foreach() to core and ppport.h to handle these cleanly.

Does that cover all remaining uses of HvARRAY that you found? Ie could
we provide HvIS_EMPTY and hv_foreach and then deprecate HvARRAY?

I'm nervous that there are enough uses of the various affected structures
on CPAN that it seems likely there'll be yet more in the DarkPAN; I think
we should aim to mitigate as much as possible in advance with a combination
of new interfaces and deprecations.

Additionally there is a fair amount of discussion in the notes about
randomization to avoid algorithmic complexity attacks, but I see none
about the converse - turning off the randomization to allow reproducible
behaviour, eg for debugging. I think that's also important, and also
merits some commentary.

Hugo
Re: Robin Hood Hashing for the perl core
If the proposed new hash function is "better" than the existing function
then the existing hash function is not "the best" which raises the
possibility that there might be other "better" hash functions to choose
from. One such possibility might use multi way trees:
https://metacpan.org/dist/Tree-Multi to replace the collision chain with a
more efficient implementation encoded using "Single Instruction, Multiple
Data": https://metacpan.org/pod/Nasm::X86#Avx512-instructions which would
be impervious to hashing attacks regardless of the seed chosen as in the
worst case the hash would degenerate to a single constant height tree of
broad degree.

May I therefore propose that if hash functions are in fact in play: there
should be a competition to find the best hash functions currently available
today and that the best of the best should be made available to all Perl
programmers via a "use' clause that gives each Perl programmer the
opportunity to choose the best function for their specific needs? This
would provide a powerful argument for using Perl - the only language that
puts the needs of its application programmers first and unambiguously ahead
of the needs of the programmers supporting the language.


On Tue, Aug 3, 2021 at 7:32 PM Ed Avis <ed.avis@qmaw.com> wrote:

> Thank you for posting this detailed introduction to the new hashing
> implementation. I just wanted to remark on one thing. Randomization of
> the iteration order isn't a goal in itself, but it serves two purposes: to
> stop attackers being able to guess the hash seed by observing the iteration
> order, and to 'teach people a lesson' and make sure they don't
> inadvertently rely on a hash key ordering that's subject to change.
>
> If the hash implementation fixed an iteration order that didn't give away
> anything about the hash seed or how to engineer collisions, then it
> wouldn't need to be randomized. Some hash implementations allow you to
> return the keys in the order they were originally inserted, without extra
> cost. I think that would be a very desirable feature in any replacement
> hash implementation. If the insertion order could be preserved with only a
> small constant penalty, that would speed up and simplify code where you
> currently use the much slower Tie::IxHash, and would also speed up code
> which currently needs 'sort keys', not because alphabetical ordering is
> particularly important, but just to get deterministic output.
>
> If your new hash implementation doesn't provide an obvious way to preserve
> the insertion order of keys, and so randomization will continue to be
> necessary, then I apologize for letting out one of the bees I have in my
> bonnet.
>
> --
> Ed Avis <ed.avis@qmaw.com>
> Please ignore confidentiality stuff below this point.
Re: Robin Hood Hashing for the perl core
Philip R Brenan writes:

> If the proposed new hash function is "better" than the existing function
> then the existing hash function is not "the best" which raises the
> possibility that there might be other "better" hash functions to choose
> from. One such possibility might use multi way trees:

It might. But then again, Nicholas has a proof-of-concept of his
specific proposal, which is different to the possibility of other
algorithms.

> May I therefore propose that if hash functions are in fact in play:

I don't think it works that certain categories of changes are “in play”
or not: more that any proposal is considered on its merits.

> there should be a competition to find the best hash functions
> currently available today and that the best of the best should be made
> available to all Perl programmers via a "use' clause that gives each
> Perl programmer the opportunity to choose the best function for their
> specific needs?

You're suggesting that even if the proposed new hashing algorithm (with
a working proof-of-concept, a list of the fall-out from switching to it
and what needs patching, and an experienced Perl 5 developer seemingly
offering to do the work involved) is agreed to be an improvement on the
current one, that we should decline this proposal for the aspiration of
a competition and switchable algorithms?

A competition that would take effort to run and oversee (and to come up
with entries for), and then further effort implementing each of the
chosen algorithms, and their switching method?

I'm not saying there wouldn't be any advantages in that happening. But
even if volunteers were found to do all that (and those were additional
volunteers, not taking time away from doing other things on Perl) and it
ran very smoothly, it would clearly take longer to do than making the
proposed hash algorithm switch.

And there's the chance that it wouldn't go well, or efforts would fizzle
away without reaching completion.

At which point it'd be several years later and we'd still be on the
current hashing algorithm.

Let's debate a specific proposal on its own merits, and not discard it,
or derail the discussion, in the hope of something bigger and vaguer
that possibly could happen in the future.

> This would provide a powerful argument for using Perl - the only
> language that puts the needs of its application programmers first and
> unambiguously ahead of the needs of the programmers supporting the
> language.

I can think of dozens of ways in which languages' designs prioritize
their users over their own developers. I'm not sure that switchable
hashing algorithms is one that would occur to most users, nor that it's
such a key feature that it would bring hordes running to Perl. One could
even argue the opposite: that users are best served by being freed of
such low-level considerations, trusting the implementation to do
something sensible without their having to even think about it.

But that's largely irrelevant anyway: even if I'm wrong and user-
switchable hashing algorithms is just what Perl needs, that still isn't
a reason to drop the specific proposal being made in this thread.

Smylers
Re: Robin Hood Hashing for the perl core
On Tue, Aug 03, 2021 at 02:27:42PM +0000, Nicholas Clark wrote:
> I'd like to suggest that we switch the core's hashtable implementation from
> the current array of linked-lists (which is not friendly with CPU caches) to
> "open addressing" hashing (which is). More specifically, to "Robin Hood
> Hashing", and most specifically to the implementation that I wrote for
> MoarVM (and Perl 5) - "A Better Hash".

[snip long description]

Maybe my eyes glazed over at the important bit, but I didn't see anything
explaining what the RHH does when there's a collision (i.e. the equivalent
of having more than one entry in the HE linked list in the old way).

> It's actually a bit better than that. It's not noticeably slower on
> anything. I'm not sure what makes a good benchmark. Code I've tried isn't
> bottlenecked on hashing. I've tried using cachegrind - we're using more CPU
> in the hash code, and more data references, but fewer last level misses.

A good start would be:

$ ./perl -Ilib Porting/bench.pl --tests='/^expr::hash/' -j 8 -w /tmp/results -v perlbinary1 perlbinary2 ...

which runs cachegrind on various empty and full loops containing the
snippets of code matched in t/perf/benchmarks and writes the results (with
startup and loop overhead factored out) into the file /tmp/results.
The -j 8 says run 8 benchmarks in parallel. The -v says show what's going on.

Then display the results in various formats, e.g. as percentages:

$ ./perl -Ilib Porting/bench.pl -r /tmp/results

or as raw numbers:

$ ./perl -Ilib Porting/bench.pl -r /tmp/results --raw

--
Overhead, without any fuss, the stars were going out.
-- Arthur C Clarke
Re: Robin Hood Hashing for the perl core
Dave Mitchell <davem@iabyn.com> wrote:
:Maybe my eyes glazed over at the important bit, but I didn't see anything
:explaining what the RHH does when there's a collision (i.e. the equivalent
:of having more than one entry in the HE linked list in the old way).

The crux of it is described by an example in
https://github.com/MoarVM/MoarVM/blob/master/src/core/str_hash_table.h
starting around line 319 (using the top 4 bits of the ascii value as the
hash key, and optionally the bottom 4 as the "extra" bits).

Hugo
Re: Robin Hood Hashing for the perl core
On Tue, 3 Aug 2021 at 17:28, Nicholas Clark <nick@ccl4.org>:
>
>HvARRAY can't be replaced or emulated. I think that 14 distros use it:

You've missed XS::Framework and everything that depends on it. Maybe
you've searched only in .xs files? You should add at least .h, but on
a full grep I see more - for example threads::tbb in .cc - for a total of
24 distributions.

For our (mine/syber) modules, HvARRAY usage is to avoid
tiedness/boundary checks for internal hashes that we iterate over, so
if a replacement API does them - it's useless for this particular
purpose. We don't mind adapting our code for new internals, they just
shouldn't be "too closed" behind static functions.

Best regards,
Sergey Aleynikov
Re: Robin Hood Hashing for the perl core
On 2021-08-04 1:01 a.m., Smylers wrote:
> Let's debate a specific proposal on its own merits, and not discard it,
> or derail the discussion, in the hope of something bigger and vaguer
> that possibly could happen in the future.

I fully agree.

As the saying goes, the perfect is the enemy of the good.

The current discussion should just focus on whether the current proposed new
solution is significantly better than what Perl has now and thus whether Perl
switching to it would be an improvement.

Consideration of other algorithms can be their own separate discussions if or
when someone comes up with a specific example and makes a case it is better than
the winner of the current discussion, including that someone has already
committed to do the work etc.

-- Darren Duncan
Re: Robin Hood Hashing for the perl core
Op 06-08-2021 om 03:13 schreef Darren Duncan:
> On 2021-08-04 1:01 a.m., Smylers wrote:
>> Let's debate a specific proposal on its own merits, and not discard it,
>> or derail the discussion, in the hope of something bigger and vaguer
>> that possibly could happen in the future.
>
> I fully agree.
>
> As the saying goes, the perfect is the enemy of the good.
>
> The current discussion should just focus on whether the current
> proposed new solution is significantly better than what Perl has now
> and thus whether Perl switching to it would be an improvement.
>
> Consideration of other algorithms can be their own separate
> discussions if or when someone comes up with a specific example and
> makes a case it is better than the winner of the current discussion,
> including that someone has already committed to do the work etc.


Yes, but a question that may be asked is, can this be generalized to
support different algorithms easily, so programmers can choose the
hashing algorithm at runtime. And if it can, is it worth taking on such a
broadening of the scope at the same time as handling the current
proposal. If it is easy to implement, it should be looked at.


I suspect the answer to this is a firm no, but it never hurts to look at
this. Anyone who knows the internals who can say something about that?


M4
Re: Robin Hood Hashing for the perl core
On 2021-08-06 12:20 a.m., Martijn Lievaart wrote:
>
> Op 06-08-2021 om 03:13 schreef Darren Duncan:
>> On 2021-08-04 1:01 a.m., Smylers wrote:
>>> Let's debate a specific proposal on its own merits, and not discard it,
>>> or derail the discussion, in the hope of something bigger and vaguer
>>> that possibly could happen in the future.
>>
>> I fully agree.
>>
>> As the saying goes, the perfect is the enemy of the good.
>>
>> The current discussion should just focus on whether the current proposed new
>> solution is significantly better than what Perl has now and thus whether Perl
>> switching to it would be an improvement.
>>
>> Consideration of other algorithms can be their own separate discussions if or
>> when someone comes up with a specific example and makes a case it is better
>> than the winner of the current discussion, including that someone has already
>> committed to do the work etc.
>
> Yes, but a question that may be asked is, can this be generalized to support
> different algorithms easily, so programmers can choose the hashing algorithm at
> runtime. And if it can, is it worth to take such a broadening of the scope on at
> the same time as handling the current proposal. If it is easy to implement, it
> should be looked at.
>
> I suspect the answer to this is a firm no, but it never hurts to look at this.
> Anyone who knows the internals who can say something about that?

I would say that at this stage the best argument for having that feature of
hashing algorithm choice is:

1. If it turns out from measuring that some common use cases perform way better
with the current algorithm and others perform way better with the new one, and
users would then benefit from being able to choose to tune their scenarios.

2. Also implementing the ability to choose isn't too much effort or has too poor
trade-offs of its own.

3. If going forward the added maintenance cost of two algorithms rather than one
is worth it.

-- Darren Duncan
Re: Robin Hood Hashing for the perl core
On Fri, Aug 06, 2021 at 12:41:56AM -0700, Darren Duncan wrote:
> On 2021-08-06 12:20 a.m., Martijn Lievaart wrote:
> >
> > Op 06-08-2021 om 03:13 schreef Darren Duncan:
> > > On 2021-08-04 1:01 a.m., Smylers wrote:
> > > > Let's debate a specific proposal on its own merits, and not discard it,
> > > > or derail the discussion, in the hope of something bigger and vaguer
> > > > that possibly could happen in the future.
> > >
> > > I fully agree.
> > >
> > > As the saying goes, the perfect is the enemy of the good.

As I wrote in my original message:

We can in turn replace this with something even better once code for that
exists. But until then, don't let the perfect be the enemy of the good.

> > > The current discussion should just focus on whether the current
> > > proposed new solution is significantly better than what Perl has now
> > > and thus whether Perl switching to it would be an improvement.
> > >
> > > Consideration of other algorithms can be their own separate
> > > discussions if or when someone comes up with a specific example and
> > > makes a case it is better than the winner of the current discussion,
> > > including that someone has already committed to do the work etc.

Yes, exactly that.

> > Yes, but a question that may be asked is, can this be generalized to
> > support different algorithms easily, so programmers can choose the
> > hashing algorithm at runtime. And if it can, is it worth to take such a
> > broadening of the scope on at the same time as handling the current
> > proposal. If it is easy to implement, it should be looked at.
> >
> > I suspect the answer to this is a firm no, but it never hurts to look at
> > this. Anyone who knows the internals who can say something about that?

1) I am not confident that it's easy to implement
2) *I* am not in a position to attempt it
3) I'm not in a position to mentor someone else who might attempt it
4) I doubt I'd even be in a position to review the code if someone did
attempt it.
(I estimate that even if someone else started now, it would take them
a couple of months at least, and by then I won't have as much free time
as I currently have)

I'm not stopping anyone doing this, but based on previous experience of
how often large good quality surprise contributions arrive, I doubt that
it is going to happen.

> I would say that at this stage the best argument for having that feature of
> hashing algorithm choice is:
>
> 1. If it turns out from measuring that some common use cases perform way
> better with the current algorithm and others perform way better with the new
> one, and users would then benefit from being able to choose to tune their
> scenarios.
>
> 2. Also implementing the ability to choose isn't too much effort or has too
> poor trade-offs of is own.
>
> 3. If going forward the added maintenance cost of two algorithms rather than
> one is worth it.

That's a good summary.

We have finite programmer time.

(eg - I think all but two messages in this thread so far have been from
people who can consider designs and trade offs, but can't review the C code)

If we want to maintain two (or three) effectively we'd be stealing from
one to feed the other(s).

Much like the previous PSC opinion on sort, I'm going to extrapolate and
suggest that it's only better to have 2+, if it's clear that a significant
amount of the user base are actually going to actively *need* to use
different hashing algorithms.

But if all we really end up with is an "optional feature" that rarely
anyone uses (because it costs them more to learn about it and evaluate it
than they gain by choosing it), then we've wasted effort that could have
been spent better elsewhere.


So if someone else steps up and implements another hash, great.

If someone else steps up and figures out how to switch between 2 (or more),
great.

And most of what they need is the approach taken in this branch (cleaned up)
anyway, to get to the point where it was possible to switch at all.


But don't wait for this. Don't assume that it's going to happen.

But until then we should consider what we know is possible, rather than
distract ourselves with talk about what someone else might do.


Nicholas Clark

PS I doubt I'll reply to all the relevant messages in this thread (or others)
today.
Re: Robin Hood Hashing for the perl core
On Tue, Aug 03, 2021 at 02:36:33PM +0000, Ed Avis wrote:

> If the hash implementation fixed an iteration order that didn't give away anything about the hash seed or how to engineer collisions, then it wouldn't need to be randomized. Some hash implementations allow you to return the keys in the order they were originally inserted, without extra cost. I think that would be a very desirable feature in any replacement hash implementation. If the insertion order could be preserved with only a small constant penalty, that would speed up and simplify code where you currently use the much slower Tie::IxHash, and would also speed up code which currently needs 'sort keys', not because alphabetical ordering is particularly important, but just to get deterministic output.
>
> If your new hash implementation doesn't provide an obvious way to preserve the insertion order of keys, and so randomization will continue to be necessary, then I apologize for letting out one of the bees I have in my bonnet.

It's a good question.

No, it doesn't provide any way to preserve insertion order.

It's inherent in the Robin Hood algorithm that it controls the order within
the hash. It's actually *exploiting* the known order and ordering
constraints it adds to make searching terminate sooner.

So to maintain/remember insertion order would require storing more
information, which increases complexity, and likely size too.

(*likely* size, because by using a different data structure, you can
possibly make size savings elsewhere)

Nicholas Clark
Re: Robin Hood Hashing for the perl core
On Tue, 3 Aug 2021 at 16:28, Nicholas Clark <nick@ccl4.org> wrote:

> Sorry that this message is about 4 months late for April Fool's Day.
>
> So, this doesn't really fit the "elevator pitch" shape because I don't
> think
> that proposal would really fit the RFC shape either.
>
>
> I'd like to suggest that we switch the core's hashtable implementation from
> the current array of linked-lists (which is not friendly with CPU caches)
> to
> "open addressing" hashing (which is). More specifically, to "Robin Hood
> Hashing", and most specifically to the implementation that I wrote for
> MoarVM (and Perl 5) - "A Better Hash".
>

Wow. I had looked into this once, and got discouraged.

I am super happy to hear that you did this and I am in favour.

Nicholas, if there is anything I can do to help please reach out to me at my
work email.


> Why...
>
> For the simple explanation, you need to know about "load factor".
>

When you say this I assume you mean "to understand why open hashing is
preferable you need to know about load factor". That stray pesky comma...
:-)


>
> The top level of all hashtable implementations is an array of something
> (pointers or structure) - an array that gets resized "as needed", but when
> that happens is a trade off.
>
> To make things work, you can't use all the entries in this array. Some have
> to be empty. The proportion that are full is your load factor. It can't go
> above 100%. If you load factor gets too high, you resize the hash.
> (Typically you double its size. As there are the same number of keys, your
> load factor halves)
>
> Assume a typical load factor of 50%. This means that half your top level
> array elements are empty. Imagine that each used element is next to an
> empty
> element - it makes these diagrams easier:
>

Just for the record: in modern perls we hash split on collision only and
when the load factor is about 66%, prior to that change we used to split
when the load factor reached 100%. (100% is considered very high.)


>
> The current hash implementation is an array of pointers to linked lists of
> HE structs, which in turn point to the key and value:
>
> | |
> +------+ +------+ +------+
> | HE * | -> | Key | ---> | HEK |
> +------+ +- -+ +------+
> | NULL | | Val | -+
> +------+ +- -+ | +------+
> | | | NULL | +-> | SV |
> +------+ +------+
>
> ^^^^ points to next entry in the list if needed.
>
> The proposed implementation "open addressing" eliminates the linked lists
> by
> storing HE structs in a flat array:
>
> | |
> +------+ +------+
> | Key | ---> | HEK |
> +- -+ +------+
> | Val | -+
> +------+ | +------+
> | NULL | +-> | SV |
> +- -+ +------+
> | NULL |
> +------+
> | |
>
> This
>
> 1) reduces the overhead by (on average) 1 pointer per hash entry,
> 2) eliminates one pointer dereference (and potential CPU cache miss)
> 3) keeps all the hash close in memory (likely more CPU cache hits)
>
> The downsides are
>
> 1) more CPU work in maintaining that flat array in order
> 2) slightly more memory allocation for hashes under about 50 entries
> (but usually that memory is allocated but not needed)
>

One additional downside of open hashing is that it does not allow
constructing a hash that can have more than 100% load factor. i dont think
we care, but its an important point. With chained hashing you can have a
load factor higher than 100% if you want, and i believe that is what
happens when the existing bucket array gets oversized. I dont think it
matters, in practice such a hash would be insanely large in memory.

I also believe that the penalty of a poor hash function is higher, but I
dont know that I can prove it. (Eg, a poor hash function that produces too
many collisions causes more damage to open addressing than to
chain-addressed) but with SipHash as the standard hash function this isnt a
problem.

I propose that we use the hash that MoarVM switched to last year.
> It's my independent C implementation of the design of Martin Ankerl's hash
> at https://github.com/martinus/robin-hood-hashing
>
> I named mine "A Better Hash"
>
> There are a lot of comments explaining how it works at
> https://github.com/MoarVM/MoarVM/blob/master/src/core/str_hash_table.h
>
> The MoarVM version has 64 bit hash values, but 32 bit sizes for hashtables
> (so at most 4G entries)
>
> Perl 5 currently has 32 bit, 32 bit, and a 31 bit length limit for keys.
>
> I propose that while we're disrupting everything else, we switch to U64
> hashes, SSize_t entries and SSize_t keys.
>
> Because this all sounds crazy and impossible and you'd never believe me, I
> have a working proof of concept at
>
> https://github.com/nwc10/perl5/tree/smoke-me/nicholas/ABH-POC
>
> It builds DBI, Moose, Moo, JSON::XS, YAML::XS just fine.
>
> As in, all dependencies, all tests pass, vanilla, no changes.
>
> (it's definitely a Proof Of Concept. The code is a mess of 3 styles.
> Some dump.c functionality and a few regression tests are disabled.
> keys %hash = $value; is not implemented yet. ithreads cloning doesn't
> clone the hash iterator. The commit messages could be better.
> Apart from that it works. At least on Linux, AIX and HP/UX)
>
> Internally the core is ABH with 64 bit hash values and 64 bit sizes in the
> C
> APIs, but it presents a legacy 32 bit interface to XS, and XS works.
>
> It seems that you only need 32/64 bit compatibility wrappers around 9 core
> API functions.
>
> I *did* need to patch Cpanel::JSON::XS and Sereal to remove
> -Werror-no-declaration-after-statement from their C compiler flags.
>

Why? Doesn't this just make it unportable to AIX and old MS builds?


> Until "this week" that flag was an excellent idea, to avoid accidentally
> writing non-C89 code. However, I want to use C99 in my inline wrappers, and
> hence I've just discovered one of our future C99 pain points....
>
>
> Sereal needed t/020_sort_keys.t temporarily disabled, and these changes:
>

FWIW, I would be happy to take patches to Sereal to ensure that it compiles
properly with the old and new hash. And I would love to try it out and help
to the extent I can.


>
> +++ srl_encoder.c 2021-08-02 21:39:24.461838518 +0000
> @@ -1203,6 +1203,23 @@
> {
> HE *he;
> const int do_share_keys = HvSHAREKEYS((SV *)src);
> +
> +#ifdef HvABH
> + Perl_ABH_Table *table= HvABH(src);
> + Perl_ABH_Iterator iterator= Perl_ABH_first(table);
> + while (!Perl_ABH_at_end(table, iterator)) {
> + HE *he= (HE *) Perl_ABH_current(table, iterator);
> + SV *v= HeVAL(he);
> + if (v != &PL_sv_placeholder) {
> + srl_dump_hk(aTHX_ enc, he, do_share_keys);
> + CALL_SRL_DUMP_SV(enc, v);
> + if (--n == 0) {
> + break;
> + }
> + }
> + iterator= Perl_ABH_next(table, iterator);
> + }
> +#else
> HE **he_ptr= HvARRAY(src);
> HE **he_end= he_ptr + HvMAX(src) + 1;
>
> @@ -1219,6 +1236,7 @@
> }
> }
> } while ( he_ptr < he_end );
> +#endif
> }
>
> SRL_STATIC_INLINE void
> @@ -1508,7 +1526,12 @@
> if ( share_keys && SRL_ENC_HAVE_OPTION(enc,
> SRL_F_SHARED_HASHKEYS) /* only enter branch if shared hk's enabled */
> #if PERL_VERSION >= 10
> && (!DO_SHARED_HASH_ENTRY_REFCOUNT_CHECK
> - || src->he_valu.hent_refcount > 1)
> +#ifdef HvABH
> + || src->hent_hek->hek_refcount > 1
> +#else
> + || src->he_valu.hent_refcount > 1
> +#endif
> + )
> #endif
> )
> {
>
>
>
> I think that we can fake everything in the existing (macro soup) API
> *except*
>
> 1) HvARRAY



> 2) The type of PL_strtab
>

I am not sure I get this one, can you explain more.


>
>
> HvARRAY can't be replaced or emulated. I think that 14 distros use it:
>
> https://metacpan.org/release/Devel-Arena
> https://metacpan.org/release/Devel-MAT-Dumper
> https://metacpan.org/release/Devel-SizeMe
> https://metacpan.org/release/Glib
> https://metacpan.org/release/Hash-Spy
> https://metacpan.org/release/Internals-DumpArenas
> https://metacpan.org/release/JSPL
> https://metacpan.org/release/JavaBin
> https://metacpan.org/release/Sereal-Encoder
> https://metacpan.org/release/Sub-Disable
> https://metacpan.org/release/autovivification
> https://metacpan.org/release/mod_perl
> https://metacpan.org/release/namespace-clean-xs
> https://metacpan.org/release/perlec
>
> grep.metacpan.org can't filter out results from ppport.h (or embed.fnc)
> so I might have missed some
> (this is a feature of grep.cpan.me that I miss)
>
> I think we can patch them all. Several are just truth tests - I propose
> a new macro HvIS_EMPTY(hv) to replace them, which we also add to ppport.h
>

Make sense.


> Others just iterate over all the hash buckets - I propose we add
> hv_foreach() to core and ppport.h to handle these cleanly.
>

Ok, the above comment about HvARRAY is explained now I guess.


>
> PL_strtab is mentioned in
>
> https://metacpan.org/release/Devel-Arena
> https://metacpan.org/release/Devel-GC-Helper
> https://metacpan.org/release/Devel-MAT-Dumper
>
> that I can find, and as it turns out at least in xsh/threads.h, which a
> routine search can't find.
>

Its moral existence is pretty important to Sereal, even if we dont mention
it directly. I still dont get the issue with it.


>
> The "XS exposure" to these two is much smaller than you might imagine.
> As long as the iterator APIs keep working and returning HE*s and HEK*s,
> most code is happy.
>
>
> Why am I suggesting "A Better Hash" and not $other...
>
> 1) C implementation exists
> 2) It's portable
> 3) It's suitably licensed
> 4) It supports hash randomisation
> 5) You can delete during hash iteration without having "problems"
>

It can? Interesting. I need to read up on that, I didnt think that was true.


>
>
> Before suggesting other hashes, please could you check that they meet all 5
> of the above criteria. I'm suggesting here something that is known to work
> and can be polished and delivered within a month or two part time. It
> builds
> on approximately every CPU and OS known to humanity (including sparc64 and
> ia64, so any endianness, and the alignment is 64 bit clean. *Not* tested on
> VMS, zOS, the CPU that zOS runs on, hppa or RiscV. Yet.)
>
> We can in turn replace this with something even better once code for that
> exists. But until then, don't let the perfect be the enemy of the good.
>

Agreed. An open hashing algorithm that is plumbed into Perl already is
worth 5 that arent.


>
> Randomisation and Iteration
>
> Randomisation is implemented by each hash having a private random 64 bit
> "salt", and getting a new salt for each hashtable size doubling. Instead of
> perturbing the iteration order, the insert position is perturbed.
>

If you happened to have a link to this in github id be grateful.


>
> For the current Linked List Hashes, the exploitable weakness was that on
> hash "grow" linked lists chains are *split* (not completely re-ordered),
> meaning that exactly 1 bit of the *key*'s hash value determines the split,
> and hence *that* bit is exposed by the iteration order. Without all the
> current mitigations to perturb iteration order it is demonstrably possible
> to infer that value of the bit in enough keys to work out the interpreter's
> global hash seed.
>
> For ABH, all 64 bits of salt (and all 32 or 64 bits of the key's hash
> value)
> are mixed to get the insert position. On hash resize a completely new salt
> is used, and hence totally new order is generated. I believe this means
> that
> anyone attempting to attack needs to figure out 96 (ideally 128) bits of
> hidden state simultaneously, not just the 1 bit for the Linked List splits.
> Hence I believe that it's secure. I'd appreciate someone competent proving
> me wrong on this.
>

Sounds good. Also since the old one was only "a bit more secure than doing
nothing" I dont think you need to hit the target of "demostrably secure"
you just need to be more secure than what I did, and I think that is pretty
easy to do. If you can provide a code link to the relevant logic it would
be helpful.


>
> This means that internally hash iterators are just an integer array index.
> This avoids all sorts of complexity with dangling pointers, allocation,
> etc.
>
> Iterators actually start at the highest index and count down to 0. This has
> three benefits
>
> 0) The "end of hash" iterator value is the same for hashes of any size
> 1) It's cheap to test a CPU register against a constant.
> 2) It's the reverse direction from internal movements caused by insertion
> and deletion
>
> That last one is key - it means that it's possible to delete the hash entry
> at the current iterator position without perturbing the iteration.
>
> Perl documents this as supported behaviour. We need to support it.
>

Yep. If we can insert safely during iteration then that would be very
cool.


>
>
> Performance
>
> "The marvel is not that the bear dances well, but that the bear dances at
> all."
>
>
> It's actually a bit better than that. It's not noticeably slower on
> anything. I'm not sure what makes a good benchmark. Code I've tried isn't
> bottlenecked on hashing. I've tried using cachegrind - we're using more CPU
> in the hash code, and more data references, but fewer last level misses.
>
> But some of this could be down to tuning - open addressing hashes have do
> more work for insert and delete moving entries around to maintain their
> invariants. Robin Hood invariants are slightly more work on insert and
> delete, but intended to reduce work on lookups. And the approach here of
> storing more hash bits in the metadata means more ordering constraints (so
> move moving), but fewer lookups outside of the metadata bytes on reading.
>
> As I don't know what to benchmark, I've not tried tuning anything.
> (To be fair, I'm exhausted after the big push to get this working at all.
> It's why I've been quiet for a few weeks)
>

Test it with a broken hash function, or a precomputed set of keys that
collide in the right way. If it doesnt degrade terribly IMO you are good.


>
> The load factor might be too high. There's actually quite a lot that can
> be tuned:
>
> * load factor (even make it dynamic as a function of hash size)
>

If speed is the objective AFAUI the underlying math for the birthday
paradox says what this should be, IIRC basically something in the 60% to
~66% range. After that the collision probability starts to approach 1
relatively quickly. As far as I know there is no difference on this for
open hashing versus chain, the underlying costs are related to the
probability of collision which is purely determined by how many elements
are filled, if anything open addressing benefits more from a lower load
factor than chain addressing I would think.

> * thresholds for resizing
> * thresholds for how many hash bits in the metadata
> * smaller sizes (official "4", "2" or even "1", and with a load factor of
> 1.0)
>
> but without meaningful benchmarks (which I don't have) it's not going to
> answer any questions.
>

I need to read up on the implementation to say anything about that part.
Looking forward to it. :-)


> So for now I need to take a few days break from this, but everyone else is
> welcome to play with the code, and figure out how to benchmark it.
> (And break the randomisation)
>

This is super awesome, I will try to find some time to look into more
generally and I will definitely find some time to look at specific parts,
and if you ask me to do something to help ill put it on my priority list.
Thanks a lot, this is one of the more exciting changes for some time from
my POV.

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: Robin Hood Hashing for the perl core
On Fri, 6 Aug 2021 at 10:07, Nicholas Clark <nick@ccl4.org> wrote:

> On Fri, Aug 06, 2021 at 12:41:56AM -0700, Darren Duncan wrote:
> > On 2021-08-06 12:20 a.m., Martijn Lievaart wrote:
> > >
> > > Op 06-08-2021 om 03:13 schreef Darren Duncan:
> > > > On 2021-08-04 1:01 a.m., Smylers wrote:
> > > > > Let's debate a specific proposal on its own merits, and not
> discard it,
> > > > > or derail the discussion, in the hope of something bigger and
> vaguer
> > > > > that possibly could happen in the future.
> > > >
> > > > I fully agree.
> > > >
> > > > As the saying goes, the perfect is the enemy of the good.
>
> As I wrote in my orginal message:
>
> We can in turn replace this with something even better once code for
> that
> exists. But until then, don't let the perfect be the enemy of the good.
>
> +1


> > > > The current discussion should just focus on whether the current
> > > > proposed new solution is significantly better than what Perl has now
> > > > and thus whether Perl switching to it would be an improvement.
> > > >
> > > > Consideration of other algorithms can be their own separate
> > > > discussions if or when someone comes up with a specific example and
> > > > makes a case it is better than the winner of the current discussion,
> > > > including that someone has already committed to do the work etc.
>
> Yes, exactly that.
>
> > > Yes, but a question that may be asked is, can this be generalized to
> > > support different algorithms easily, so programmers can choose the
> > > hashing algorithm at runtime. And if it can, is it worth to take such a
> > > broadening of the scope on at the same time as handling the current
> > > proposal. If it is easy to implement, it should be looked at.
> > >
> > > I suspect the answer to this is a firm no, but it never hurts to look
> at
> > > this. Anyone who knows the internals who can say something about that?
>
> 1) I am not confident that it's easy to implement
> 2) *I* am not in a position to attempt it
> 3) I'm not in a position to mentor someone else who might attempt it
> 4) I doubt I'd even be in a position to review the code if someone did
> attempt it.
> (I estimate that even if someone else started now, it would take them
> a couple of months at least, and by then I won't have as much free time
> as I currently have)
>
> I'm not stopping anyone doing this, but based on previous experience of
> how often large good quality surprise contributions arrive, I doubt that
> it is going to happen.
>
>
I don't see that we need it either. I can understand why someone might want
to swap out the hash function, but not the hash table.


> > I would say that at this stage the best argument for having that feature
> of
> > hashing algorithm choice is:
> >
> > 1. If it turns out from measuring that some common use cases perform way
> > better with the current algorithm and others perform way better with the
> new
> > one, and users would then benefit from being able to choose to tune their
> > scenarios.
> >
> > 2. Also implementing the ability to choose isn't too much effort or has
> > too poor trade-offs of its own.
> >
> > 3. If going forward the added maintenance cost of two algorithms rather
> than
> > one is worth it.
>
> That's a good summary.
>
> We have finite programmer time.
>
> (eg - I think all but two messages in this thread so far have been from
> people who can consider designs and trade offs, but can't review the C
> code)
>
> If we want to maintain two (or three) effectively we'd be stealing from
> one to feed the other(s).
>
>
As the Sereal patch shows, it also cascades into CPAN.


> Much like the previous PSC opinion on sort, I'm going to extrapolate and
> suggest that it's only better to have 2+, if it's clear that a significant
> amount of the user base are actually going to actively *need* to use
> different hashing algorithms.
>
> But if all we really end up with is an "optional feature" that rarely
> anyone uses (because it costs them more to learn about it and evaluate it
> than they gain by choosing it), then we've wasted effort that could have
> been spent better elsewhere.
>
>
> So if someone else steps up and implements another hash, great.
>

From the POV of Sereal and CPAN I would not say "great" unless it truly
could be swapped out without Sereal breaking.

cheers,
Yves
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Tue, Aug 03, 2021 at 02:27:42PM +0000, Nicholas Clark wrote:
> The proposed implementation "open addressing" eliminates the linked lists by
> storing HE structs in a flat array:

So in effect HvARRAY() changes from being an array of SV pointers to an
array of HE structs?

How will this affect the hv_fetch_ent() and similar functions, which
return a pointer to a HE? In particular what happens to such a pointer if
the HvARRAY() gets reallocated in the meantime. E.g.

HE *he = hv_fetch_ent(hv, ...);
hv_store(hv,...); /* XXX triggers HvARRAY realloc */
do_something_with(he->val); /* using freed memory! */


--
"Emacs isn't a bad OS once you get used to it.
It just lacks a decent editor."
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Sat, Aug 07, 2021 at 11:24:56AM +0100, Dave Mitchell wrote:

Sorry for the delay in replying. I've been on holiday. There were loads of
pumpkins there, but I think that they will require a lot of training up
before they become useful to us:

https://gist.github.com/nwc10/37a85e5272bcfc460b77809ec648d518

> On Tue, Aug 03, 2021 at 02:27:42PM +0000, Nicholas Clark wrote:
> > The proposed implementation "open addressing" eliminates the linked lists by
> > storing HE structs in a flat array:
>
> So in effect HvARRAY() changes from being an array of SV pointers to an
> array of HE structs?

Effectively yes (except that you meant HE struct pointers. It's AvARRAY
that is SV pointers. Well, SV**)

Which isn't the same thing - in as much as I can't see any way to fake up
an interface that offers an array of pointers-to (so can be passed around
as values of type HE **) when all that exists in memory is an array of HE *,
short of

1) allocating memory for an entire array of HE **
2) initialising it correctly
3) "there you go"
4) freeing it ("later")

hence why I eliminated the macro HvARRAY and renamed the entry in the union
that it wrapped. "Break early, break often..." ish.

> How will this affect the hv_fetch_ent() and similar functions, which
> return a pointer to a HE? In particular what happens to such a pointer if
> the HvARRAY() gets reallocated in the meantime. E.g.
>
> HE *he = hv_fetch_ent(hv, ...);
> hv_store(hv,...); /* XXX triggers HvARRAY realloc */
> do_something_with(he->val); /* using freed memory! */

It will go wrong the way that you figured out.

We could work around *this* problem by deferring the memory free onto the
save stack - the entire hash structure is a single block of memory.

But array reallocation only happens on insert, and insert can move things
within the allocated block.
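
If we do end up auditing and patching call sites, the patch at each one should
be fairly mechanical - something like this sketch, where keysv, other_key and
do_something_with are placeholders rather than code from any real distribution:

    #include "EXTERN.h"
    #include "perl.h"

    void do_something_with(SV *sv);   /* placeholder from the example above */

    /* Sketch of a defensive call-site pattern: take a reference to the
       value immediately, before any further insert can move or reallocate
       the entries. */
    static void example(pTHX_ HV *hv, SV *keysv, SV *other_key)
    {
        HE *he  = hv_fetch_ent(hv, keysv, 0, 0);
        SV *val = he ? SvREFCNT_inc(HeVAL(he)) : NULL;  /* we own it now */

        hv_store_ent(hv, other_key, newSViv(42), 0);    /* may move entries */

        if (val) {
            do_something_with(val);                     /* safe: not freed */
            SvREFCNT_dec(val);
        }
    }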


Which got me thinking more generally...

https://grep.metacpan.org/search?q=hv_fetch_ent.%2A%5C%28

thinks that there are 128 distributions using hv_fetch_ent. That's small
enough to audit (but not small enough to do while writing an e-mail) to
see if any do this.

(and 85 using hv_store_ent, most of which use it in void context)


I'm going to guess (but haven't checked this yet) that few to none do this.
Any that do can be patched. (Like with HvARRAY)

At which point we can say all of

1) We have audited *all* of CPAN for expected problems
2) We are confident that the entire public ecosystem is good
3) Roughly 0.0something% of CPAN distributions were affected
4) If you have your own XS code, you should audit for these things [...]


where we have a good feel for how small 0.0something is.


But really this is the only general guarantee that we can make for any
stable release

1) We checked CPAN
2) We figured out the probability of breakage
3) If it's published, we have found it and fixed it already
4) If it's not published, we can't fix that for you


Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
Could I mention one more topic related to hashing, which might turn out to be related to the proposed new implementation. When a reference is used as a hash key, I believe that currently it gets stringified to something like ARRAY(0xabc) and this string is then hashed. (Correct me if I'm wrong.)

For the programmer this is a bit awkward, since you can't get back the original objects by iterating over the keys. You have to use a wrapper like Tie::RefHash, which is slower, and awkward for multidimensional data structures. (There is Tie::RefHash::Nestable for hashes of hashes, but even that doesn't cover a hash of arrays of hashes, and so on.)

But I think it must also be slower. Surely if you have the reference you can treat it as an address (a 64-bit or 32-bit integer) and have a fast way to transform that to a position in the hash data structure. Even if the "address hashing" function ends up being fairly complex, it still has to be faster than converting the address to ASCII text and then hashing the string. Storing the string will also use more memory than storing a pointer plus a tag to say what type of object.

So I am wondering whether, if the hash internals get reworked for greater speed, the treatment of references can also change. Then using references as a hash key would be faster. Moreover, it could be fast to get back the original references rather than a stringified version -- perhaps with a new opcode or new builtin that is like 'keys' but doesn't stringify. Tie::RefHash could change to call that, for compatibility with existing programs, while new programs could use the builtin directly and be even quicker (no tying overhead).

I have asked before whether references could always be preserved in hash keys -- so "Tie::RefHash all of the time". There are some corner cases where this would change the semantics, such as adding to a hash both a string and a reference which stringifies to the same thing, even if you ignore the question of code that explicitly checks ref() on each hash key returned. So for now I'm not advocating this as a global change. However, I think if Perl could provide a fast builtin way to use references as hash keys and get them back unharmed, it would be very useful for new code. Is that something that can be fitted into the new hashing implementation?


Re: Robin Hood Hashing for the perl core [ In reply to ]
On Wed, Aug 18, 2021 at 01:09:47PM +0000, Ed Avis wrote:
> Could I mention one more topic related to hashing, which might turn out to be related to the proposed new implementation. When a reference is used as a hash key, I believe that currently it gets stringified to something like ARRAY(0xabc) and this string is then hashed. (Correct me if I'm wrong.)

Yes, but if it's blessed then the class name is part of that:

$ perl -lwe 'my $obj = bless {}; print $obj'
main=HASH(0x9a0fb0)

and if it has overloaded stringification, it can do whatever it wants.

Which becomes relevant...

> For the programmer this is a bit awkward, since you can't get back the original objects by iterating over the keys. You have to use a wrapper like Tie::RefHash, which is slower, and awkward for multidimensional data structures. (There is Tie::RefHash::Nestable for hashes of hashes, but even that doesn't cover a hash of arrays of hashes, and so on.)
>
> But I think it must also be slower. Surely if you have the reference you can treat it as an address (a 64-bit or 32-bit integer) and have a fast way to transform that to a position in the hash data structure. Even if the "address hashing" function ends up being fairly complex, it still has to be faster than converting the address to ASCII text and then hashing the string. Storing the string will also use more memory than storing a pointer plus a tag to say what type of object.
>
> So I am wondering whether, if the hash internals get reworked for greater speed, the treatment of references can also change. Then using references as a hash key would be faster. Moreover, it could be fast to get back the original references rather than a stringified version -- perhaps with a new opcode or new builtin that is like 'keys' but doesn't stringify. Tie::RefHash could change to call that, for compatibility with existing programs, while new programs could use the builtin directly and be even quicker (no tying overhead).
>
> I have asked before whether references could always be preserved in hash keys -- so "Tie::RefHash all of the time". There are some corner cases where this would change the semantics, such as adding to a hash both a string and a reference which stringifies to the same thing, even if you ignore the question of code that explicitly checks ref() on each hash key returned. So for now I'm not advocating this as a global change. However, I think if Perl could provide a fast builtin way to use references as hash keys and get them back unharmed, it would be very useful for new code. Is that something that can be fitted into the new hashing implementation?

Even with the current hash implementation, I think that this would be simple
enough for the extremely constrained corner case of

* exclusively references
* "hashed"/equality determined by address


(and either with the current hash or ABH, it needs more C code, and more
if statements or function pointers or something else equivalent...)

because one would

1) just store the pointer to the SV
2) do the numeric hashing using the pointer address, and comparison using
pointer address


and that "hashing"/"comparison" step is *not* using the same approach as
Perl currently uses for (string) comparison between references.
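
i.e. something along the lines of this sketch (the mixing function and the
names are illustrative only, not what I'd actually commit):

    #include <stdint.h>

    /* Sketch: hash a reference purely by its address, compare purely by
       identity.  The mixing constant is just illustrative. */
    static uint64_t hash_by_address(const void *p) {
        uint64_t h = (uint64_t)(uintptr_t)p;
        h ^= h >> 33;
        h *= 0x9E3779B97F4A7C15ULL;        /* 2^64 / golden ratio */
        return h ^ (h >> 29);
    }

    #define SAME_KEY(a, b)  ((a) == (b))   /* identity, not string equality */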


You can't (then) also store regular strings as keys in the same hash.

Unless you then change to using the current stringification of references
as your "key" for hashing. (even if you calculate it as needed, or cache it)
as otherwise (as you observe) effectively you could store duplicate keys
(the true reference, and its stringification)


But then you hit another problem:

$ perl -lwe 'my $obj = bless {}; print $obj; bless $obj, "other"; print $obj'
main=HASH(0x1a56fb0)
other=HASH(0x1a56fb0)


"I" can change the string that a reference hashes to, after I insert it into
the hash.

So is that disallowed? If so how?


And this also can't support references that want to create objects that
act as "value" types. So these two are different:

$ perl -MMath::BigInt -MScalar::Util=refaddr -lwe 'my $x = Math::BigInt->new(42); my $y = Math::BigInt->new(42); print $x == $y ? "==" : "!="; print refaddr($x) == refaddr($y) ? "Same" : "different"'
==
different


but that means that

* either one has to use exactly the same object to look up in the hash
* or one has to then implement some form of overloaded comparison


the first makes it extremely inflexible and counter intuitive
the second is not going to be massively performant


I don't see a winning move here.

Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
Thanks for your reply. I imagined that if you use the pointer address as the hash key, you'd need a separate field storing the type of object or the package name it's blessed into. That's needed to always return the same string as when it was first inserted. Having the separate lookup adds overhead, but I think it would still be a win compared to making, hashing, and storing the string each time.

You make a good point that the stringification of an object can change while its address does not. I would say that using such objects as hash keys is asking for trouble to start with. Of course, it has to be handled if the new implementation is to be 100% compatible. Perhaps one day, the semantics could change a little so that if an object's stringification changes, the stringified hash key also changes but the object retains its position in the hash, since it's hashed by identity. But that would be out of scope for the current work.

The question of "value" types like Math::BigInt is a bigger reason why it's hard to change the current behaviour. I think it's unfortunate; Perl has painted itself into a corner with forcing all hash keys to be strings and losing the original identity, and there's no clear path to change it without a compatibility break, when in my opinion using an object as a hash key and getting back the same object falls under "easy things should be easy".


Re: Robin Hood Hashing for the perl core [ In reply to ]
On Wed, Aug 18, 2021 at 01:09:47PM +0000, Ed Avis wrote:
> But I think it must also be slower. Surely if you have the reference you can treat it as an address (a 64-bit or 32-bit integer) and have a fast way to transform that to a position in the hash data structure. Even if the "address hashing" function ends up being fairly complex, it still has to be faster than converting the address to ASCII text and then hashing the string. Storing the string will also use more memory than storing a pointer plus a tag to say what type of object.

In addition to Nicholas's response, when objects are duplicated into a
new thread the duplicated object has a new address which breaks
the hash, and finding the object given the address.

$ perl -Mthreads -le 'my $x = bless {}; print $x; threads->create(sub { print $x })->join; print $x'
main=HASH(0x55ddc8938278)
main=HASH(0x55ddc8a7b1c0)
main=HASH(0x55ddc8938278)

Tony
Re: Robin Hood Hashing for the perl core [ In reply to ]
Tony Cook wrote:

>In addition to Nicholas's response, when objects are duplicated into a
>new thread the duplicated object has a new address which breaks
>the hash, and finding the object given the address.

But that's equally true with the way hashing works currently. The address changes, so the string representation changes, so you can no longer look up with the same key.

% perl -Mthreads -le 'my $x = bless {}; $h{$x} = "hello"; print $h{$x}; threads->create(sub { print $h{$x} // "undef" })->join'
hello
undef

So while it's a reminder to be careful about object identity when you are using threads, it's not a point of difference between stringifying hash keys and hashing by address.


Re: Robin Hood Hashing for the perl core [ In reply to ]
On Sun, 22 Aug 2021 at 05:40, Ed Avis <ed.avis@qmaw.com> wrote:

> Tony Cook wrote:
>
> >In addition to Nicholas's response, when objects are duplicated into a
> >new thread the duplicated object has a new address which breaks
> >the hash, and finding the object given the address.
>
> But that's equally true with the way hashing works currently. The address
> changes, so the string representation changes, so you can no longer look up
> with the same key.
>
> % perl -Mthreads -le 'my $x = bless {}; $h{$x} = "hello"; print $h{$x};
> threads->create(sub { print $h{$x} // "undef" })->join'
> hello
> undef
>
> So while it's a reminder to be careful about object identity when you are
> using threads, it's not a point of difference between stringifying hash
> keys and hashing by address.
>

It is a demonstration that we don't have a well-defined object identity
function and that having one would be useful. :-)

yves
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Wed, Aug 18, 2021 at 01:09:47PM +0000, Ed Avis wrote:
> I have asked before whether references could always be preserved in hash
> keys

Just a general observation here about proposals to change or improve hash
semantics. I've been meaning for many years (but lacked tuits) to introduce
hash vtables, which would allow for differing hash implementations to be
called on a per-hash basis (so you could declare a hash as being of a
key-order-preserving or ref-as-key-preserving type for example).

Implementing this would then allow proposals such as yours to be implemented
as optional extras without risking breaking backwards compatibility or
internals efficiency.

I even reserved a flag bit for it:

#define SVf_FAKE 0x01000000 /* ...
3: HV: informally reserved by DAPM
for vtables

Having such a mechanism would also allow us to remove all the
special-cased code in hv.c for %ENV, tied etc.

--
"There's something wrong with our bloody ships today, Chatfield."
-- Admiral Beatty at the Battle of Jutland, 31st May 1916.
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Wed, Aug 18, 2021 at 10:15:48AM +0000, Nicholas Clark wrote:
> > How will this affect the hv_fetch_ent() and similar functions, which
> > return a pointer to a HE? In particular what happens to such a pointer if
> > the HvARRAY() gets reallocated in the meantime. E.g.
> >
> > HE *he = hv_fetch_ent(hv, ...);
> > hv_store(hv,...); /* XXX triggers HvARRAY realloc */
> > do_something_with(he->val); /* using freed memory! */
>
> It will go wrong the way that you figured out.

Perhaps we should update the docs for hv_*ent() now to state that the
returned value has a limited lifetime, and mustn't be used after any
further 'write' to the hash?

--
My Dad used to say 'always fight fire with fire', which is probably why
he got thrown out of the fire brigade.
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Mon, Aug 30, 2021 at 01:53:20PM +0100, Dave Mitchell wrote:
> On Wed, Aug 18, 2021 at 10:15:48AM +0000, Nicholas Clark wrote:
> > > How will this affect the hv_fetch_ent() and similar functions, which
> > > return a pointer to a HE? In particular what happens to such a pointer if
> > > the HvARRAY() gets reallocated in the meantime. E.g.
> > >
> > > HE *he = hv_fetch_ent(hv, ...);
> > > hv_store(hv,...); /* XXX triggers HvARRAY realloc */
> > > do_something_with(he->val); /* using freed memory! */
> >
> > It will go wrong the way that you figured out.
>
> Perhaps we should update the docs for hv_*ent() now to state that the
> returned value has a limited lifetime, and mustn't be used after any
> further 'write' to the hash?

Whatever else happens, yes I think that this is a good idea.

But more generally, if you're totally paranoid you can't assume anything
about it if you let the hash out of your sight, as anything you call might
cause the hash entry to be deleted, and hence the HE to be freed.

Your code doesn't own a reference to it.

I doubt that there's anything that evil out there though.

On Mon, Aug 30, 2021 at 01:51:50PM +0100, Dave Mitchell wrote:
> On Wed, Aug 18, 2021 at 01:09:47PM +0000, Ed Avis wrote:
> > I have asked before whether references could always be preserved in hash
> > keys
>
> Just a general observation here about proposals to change or improve hash
> semantics. I've been meaning for many years (but lacked tuits) to introduce
> hash vtables, which would allow for differing hash implementations to be
> called on a per-hash basis (so you could declare a hash as being of a
> key-order-preserving or ref-as-key-preserving type for example).

Yes. At least, "yes" is what I thought at first.

But I think for this case, it won't help most of the time.

"I don't want hash entries to move underneath me" is a constraint wanted by
the call site (the third party XS code)

"This hash can move about"/"This hash does not" would be a property of the
vtable, and you couldn't "turn it on/off" for any hash passed into the XS
code.

> Implementing this would then allow proposals such as yours to be implemented
> as optional extras without risking breaking backwards compatibility or
> internals efficiency.

So I don't think that it can implement this, at least not "risk free"

I'm also not convinced about the efficiency. At the lowest level, it's a call
through a function pointer, which I thought that CPUs didn't totally enjoy.

But one level up, I can see (and in that branch have made) code and structure
assumptions based on how the parts of the hash fit together.

Trying to be general, particularly with iterators, is more complex.


I'm also sort of starting to think that there are two "levels" of hash
we're dealing with (anyway)

1) the low(er) level "dumb" hash that stores things, and can be iterated
- a data structure
2) SVt_PVHV, which can trigger arbitrary code on actions (such as tie)
- an object

and the VTABLEs for the two somewhat differ.

For the lower level, I think the minimal efficient API we can get away with is

* build
* demolish
* fetch
* LVALUE fetch
* delete returning

and ideally

* pre-grow this hash

store is implemented as LVALUE fetch.
or LVALUE fetch is emulated with fetch and then maybe store
delete returning would be emulated with fetch and then delete


and then we also want iteration, which I think is:

* build
* demolish
* current key
* current value
* advance
* finished?


but really we'd like to have first class iterators (ie 2+ read-only iterators
can exist for the same hash at the same time)
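
As a very rough sketch of the shape I mean - every name below is made up for
illustration, this is not a proposed header:

    #include <stddef.h>

    /* Hypothetical low-level vtable; all names invented for this sketch. */
    typedef struct hash_vtable {
        void  *(*build)    (size_t hint);                  /* pre-size if known */
        void   (*demolish) (void *table);
        void  *(*fetch)    (void *table, const void *key);
        void **(*lv_fetch) (void *table, const void *key); /* store == LVALUE fetch */
        void  *(*del)      (void *table, const void *key); /* delete, returning */
        void   (*grow)     (void *table, size_t hint);
    } hash_vtable;

    /* ...and, separately, first class iterators: */
    typedef struct hash_iter_vtable {
        void  *(*iter_build)   (void *table);
        void   (*iter_demolish)(void *iter);
        void  *(*current_key)  (void *iter);
        void  *(*current_value)(void *iter);
        void   (*advance)      (void *iter);
        int    (*finished)     (void *iter);
    } hash_iter_vtable;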


and that API I've sketched out above

1) doesn't map 1 to 1 with the current tied hash API
2) doesn't consider whether we need to have "bless" at the API level


and the current hash API is nuts with tied hashes.

"you call store, and then if it returns 0 you actually need to call set magic
on your value", IIRC


so it's not obvious to me how "tied hashes" fit into what the core needs to
call down to. Or what shape a sane API should be for the core to expose,
such that you can actually pass tied hashes into unsuspecting XS code and,
well, it actually works.

Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Wed, Sep 01, 2021 at 01:50:29PM +0000, Nicholas Clark wrote:
> On Mon, Aug 30, 2021 at 01:53:20PM +0100, Dave Mitchell wrote:
> > Just a general observation here about proposals to change or improve hash
> > semantics. I've been meaning for many years (but lacked tuits) to introduce
> > hash vtables, which would allow for differing hash implementations to be
> > called on a per-hash basis (so you could declare a hash as being of a
> > key-order-preserving or ref-as-key-preserving type for example).
>
> Yes. At least, "yes" is what I thought at first.
>
> But I think for this case, it won't help most of the time.
>
> "I don't want hash entries to move underneath me" is a constraint wanted by
> the call site (the third party XS code)
>
> "This hash can move about"/"This hash does not" would be a property of the
> vtable, and you couldn't "turn it on/off" for any hash passed into the XS
> code.

Er, I was suggesting vtables would help with ref-preserving key semantics,
not that they would help with HE lifetimes!

> I'm also not convinced about the efficiency. At the lowest level, it's a call
> through a function pointer, which I thought that CPUs didn't totally enjoy.

My "vision" such as it was, is that vtable functions are only called for
the special cases. I.e. whereas hv_common() etc currently have a whole
bunch of code along the lines of:

    if (has %ENV magic)
        do something special;
    if (has tie magic)
        do something special;
    if (is stash)
        do something special;
    ...etc...

It would now look like:

    if (hv->sv_flags & SVf_HV_VTABLE) {
        ...call the relevant chain of vtable fn pointers ...
        return if vtable function indicated a return was appropriate;
    }
    ... handle a plain perl hash ...

So most special-case code is now hidden behind a single conditional.

> But one level up, I can see (and in that branch) have made code and structure
> assumptions based on how the parts of the hash fit together.

I confess I haven't given much thought to the API, especially in terms of
what should be the collection of vtable methods, and especially as to
whether they should be a small collection of low-level functions, mimicking
hv_common() etc, or lots of higher-level ones, mimicking all the
hv_fetch(), hv_fetch_ent() etc functions.

--
In England there is a special word which means the last sunshine
of the summer. That word is "spring".
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Wed, Aug 04, 2021 at 02:45:30PM +0100, Dave Mitchell wrote:

> A good start would be:
>
> $ ./perl -Ilib Porting/bench.pl --tests='/^expr::hash/' -j 8 -w /tmp/results -v perlbinary1 perlbinary2 ...
>
> which runs cachegrind on various empty and full loops containing the
> snippets of code matched in t/perf/benchmarks and writes the results (with
> startup and loop overhead factored out) into the file /tmp/results.
> The -j 8 says run 8 benchmarks in parallel. The -v says show what's going on.

It takes more CPU cycles for the hash tests, and about the same everywhere
else.

The tests are all for hashes with 1 or 2 keys? If so, this isn't at all
surprising as I know that it's trading instructions for memory locality,
but also that hashes with few keys are on average slightly larger.

> :The downsides are
> :
> :1) more CPU work in maintaining that flat array in order
>
> Can you quantify that, or describe in more detail what is involved?

Quantifying is hard, as every benchmark will produce something different.

But for the existing hash, insertion or deletion are operations on a linked
list, so are O(1), and are just pointer updates.

(A linked list with potentially no cache locality, so O(1) isn't surefire
cheap)


For any Robin Hood hash, insertion or deletion can involve moving other keys
around, so I guess that's internally O(m) where `m` is how many keys you
need to move. So that's memmove of a block rather than pointer updates, *but*
there's good cache locality, so it might be close to free.

("free" for the first order approximation/assumption that cache misses
dominate CPU throughput)
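
For anyone who hasn't stared at Robin Hood hashing before, the insert side is
roughly this - a heavily simplified sketch with made-up names, no growth, no
duplicate-key handling, and ignoring the metadata:

    #include <stddef.h>

    typedef struct slot { void *key; void *val; size_t hash; } slot;

    /* Simplified Robin Hood insert - the "moving other keys around"
       mentioned above.  'mask' is nbuckets - 1 (power-of-two sizes). */
    static void rh_insert(slot *buckets, size_t mask, slot incoming) {
        size_t idx  = incoming.hash & mask;
        size_t dist = 0;                     /* how far incoming has probed */

        for (;;) {
            if (!buckets[idx].key) {         /* empty slot: claim it, done */
                buckets[idx] = incoming;
                return;
            }
            /* probe distance of the current occupant */
            size_t theirs = (idx - (buckets[idx].hash & mask)) & mask;
            if (theirs < dist) {             /* occupant is "richer" than us: */
                slot tmp     = buckets[idx]; /* swap, then carry on inserting */
                buckets[idx] = incoming;     /* the displaced entry instead   */
                incoming     = tmp;
                dist         = theirs;
            }
            idx = (idx + 1) & mask;
            dist++;
        }
    }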

> Some of my maths code involves hashes with order of 10^8 keys, typically
> "seen" hashes (that let me do X if I haven't seen this key before).
> Typically for those the only interaction with the hash is a single
> line like:
> do_X($obj) unless $seen{$obj->serialize}++;
>
> In particular, it sounds like ABH is optimized for cases where searches
> are much more common than inserts/deletes, so I think "seen" hashes
> in workloads where duplication is rare would be close to worst-case.

Yes. I think your reasoning is spot on.

So, after some experimenting, here is a benchmark tuned to flatter it:

#!perl -w
use strict;

my $size = shift;
$size ||= 1e5;

my %hash;

if ($size < 0) {
    $size = -$size;
    keys %hash = $size;
}

# Avoid a lot of scope entries/exits so that we're mostly timing what we care
# about

++$hash{$_ * 3}
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

__END__


Running that with -1e7 (10 million keys, pre-grown hash)

blead at 75595dd289

==8755== I refs: 36,259,338,443
==8755== I1 misses: 34,395
==8755== LLi misses: 7,111
==8755== I1 miss rate: 0.00%
==8755== LLi miss rate: 0.00%
==8755==
==8755== D refs: 16,767,897,454 (11,109,054,413 rd + 5,658,843,041 wr)
==8755== D1 misses: 215,912,461 ( 190,745,837 rd + 25,166,624 wr)
==8755== LLd misses: 206,452,293 ( 183,566,167 rd + 22,886,126 wr)
==8755== D1 miss rate: 1.3% ( 1.7% + 0.4% )
==8755== LLd miss rate: 1.2% ( 1.7% + 0.4% )
==8755==
==8755== LL refs: 215,946,856 ( 190,780,232 rd + 25,166,624 wr)
==8755== LL misses: 206,459,404 ( 183,573,278 rd + 22,886,126 wr)
==8755== LL miss rate: 0.4% ( 0.4% + 0.4% )


ABH with a load factor of 0.675 (effectively what the branch now has as HEAD):

==10595== I refs: 41,940,009,601
==10595== I1 misses: 33,078
==10595== LLi misses: 6,927
==10595== I1 miss rate: 0.00%
==10595== LLi miss rate: 0.00%
==10595==
==10595== D refs: 18,636,752,812 (12,272,777,479 rd + 6,363,975,333 wr)
==10595== D1 misses: 222,023,137 ( 182,141,483 rd + 39,881,654 wr)
==10595== LLd misses: 181,489,351 ( 145,139,062 rd + 36,350,289 wr)
==10595== D1 miss rate: 1.2% ( 1.5% + 0.6% )
==10595== LLd miss rate: 1.0% ( 1.2% + 0.6% )
==10595==
==10595== LL refs: 222,056,215 ( 182,174,561 rd + 39,881,654 wr)
==10595== LL misses: 181,496,278 ( 145,145,989 rd + 36,350,289 wr)
==10595== LL miss rate: 0.3% ( 0.3% + 0.6% )


Lots more CPU instructions.
More CPU cache references, but far fewer last level read misses.

Deliberate choices that favour it here

1) pre-grow the hash
2) more reads than writes
3) lexical hash, so we're timing hash free as well


Note that there's quite a lot more data writing - that's all the memmove()


> Having read through the comments in MoarVM, I find some of it confusing,
> but it sounds like the trade-off could be tweaked by controlling the
> number of extra hash bits stored. Is that something we could consider
> making controllable on a per-hash basis?

Yes. Things that can be tuned

1) load factor
2) extra hash bits stored (if any)
3) strategy for "do i need to grow" - try to avoid it by reducing the extra
hash bits, or just bite the bullet and allocate more memory

and

1) load factor could change as a function of size
2) as could extra hash bits


in playing with that benchmark, I found that a fixed load factor of 0.675
seemed to be better than either 0.75 (what I started with) or 0.5

Also, 5 extra hash bits *seemed* to be optimal. 6, 4, and 0 all produce
more cache misses.

(0 would mean that the extra hash bits aren't used, hence the code that
supports them could be deleted. That would mean fewer CPU instructions
generally, but the testing suggested that doing this hurts the data cache
more)
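
(For anyone following along: the extra hash bits are spare bits of the hash
value kept in the per-slot metadata next to the probe distance, so most probes
can reject a slot without dereferencing the key pointer at all. Very roughly,
with a made-up layout rather than the actual ABH encoding:)

    #include <stddef.h>

    /* Made-up metadata layout - not the real ABH encoding.  One byte per
       slot: probe distance in the high bits, spare hash bits in the low. */
    #define EXTRA_BITS  5          /* the "5 extra hash bits" above */

    static unsigned char slot_metadata(size_t probe_distance, size_t hash) {
        unsigned char extra = (unsigned char)(hash & ((1u << EXTRA_BITS) - 1));
        return (unsigned char)((probe_distance << EXTRA_BITS) | extra);
    }

    /* Lookup compares the metadata byte first, and only dereferences the
       key pointer (a likely cache miss) when the byte matches. */
    static int quick_reject(unsigned char in_slot, unsigned char wanted) {
        return in_slot != wanted;
    }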


Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Tue, Aug 03, 2021 at 04:54:56PM +0100, hv@crypt.org wrote:
> Nicholas Clark <nick@ccl4.org> wrote:
> :I'd like to suggest that we switch the core's hashtable implementation from
> :the current array of linked-lists (which is not friendly with CPU caches) to
> :"open addressing" hashing (which is). More specifically, to "Robin Hood
> :Hashing", and most specifically to the implementation that I wrote for
> :MoarVM (and Perl 5) - "A Better Hash".
>
> How separable is it, would it be possible to add the new hash algorithm
> separately from the switch from linked lists? The latter sounds like it
> necessarily needs a big-bang approach, but we have prior experience in
> supporting alternate hash algorithms.

I think there's some confusion here. It doesn't help that "hash" can be
used to mean several things, and I guess "hash algorithm" too.

1) the function that maps arbitrary length keys to fixed size integers
(hash function, generating a hash value)
2) an associative array
(hash table - a data structure)

and I might have missed another one

We also have "seed", for the private hopefully-random integer which is used
to perturb the hash function so that external attackers can't pre-compute
the mapping from key to hash value.

Probably we should actually call that "salt", as that's the term for one
way hashes for passwords.


There isn't a new hash function. But currently, where we're using a 64 bit
hash function, we truncate the output to 32 bits.

I "need" (I think the need is genuine) 64 bit hash values to make the
security work, so I changed the code to stop truncating them.

weaker) hash bucket order is exposed on iteration (it can't be hidden)
stronger) mapping back from bucket order to get the salt requires figuring
out 128 bits of secret random state


but it's only 96 bits if we stick to 32 bit hash values.

(Also, with 32 bit hash values we can't go beyond 2**32 keys in a hash,
whereas I wanted to make this thing future proof.)


And given that 32 -> 64 bit hash values needs some API "fun and games" it
seemed a good time to also move key lengths to size_t (so 64 bit where it
matters)

The API "fun and games" seemed to work just fine - I could build and test
all of CPAN that I tried. So I'm no longer worried about this part.

We could actually just do the 64 bit "API" with the existing linked list
hash structure, changing the relevant 2 members to 64 bits and size_t.

Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Thu, Aug 05, 2021 at 10:56:24AM +0300, Sergey Aleynikov wrote:
> On Tue, 3 Aug 2021 at 17:28, Nicholas Clark <nick@ccl4.org> wrote:
> >
> >HvARRAY can't be replaced or emulated. I think that 14 distros use it:
>
> You've missed XS::Framework and everything that depends on it. Maybe
> you've searched only in .xs files? You should add at least .h, but on
> a full grep I see more - for example threads:tbb in .cc for a total of
> 24 distributions.

I did. The trick seems to be to skip the web interface and go direct to the
command line. This makes it easy to filter out ppport.h, and bulk extract
disto names without worrying about pagers.

Cloning https://github.com/metacpan/metacpan-cpan-extracted "only" takes 15G.

> For our (mine/syber) modules, HvARRAY usage is to avoid
> tiedness/boundary checks for internal hashes that we iterate over, so
> if a replacement API does them - it's useless for this particular
> purpose. We don't mind adopting our code for new internals, they just
> shouldn't be "too closed" behind static functions.

Yes, thanks for the heads up. I looked at XS::Framework, and I think the
existing inline functions exposed in the proof of concept are enough to
cover it, with possibly one more needed. It probably helps that my iterator
design started out with looking at C++ STL code :-)

Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Fri, Sep 10, 2021 at 08:50:42AM +0000, Nicholas Clark wrote:
> Lots more CPU instructions.
> More CPU cache references, but far fewer last level read misses.

I'd be interested in seeing the changes in number of branches and numbers
of failed branch predictions. My unquantified intuition is that that is
one of the most important performance factors in modern CPUs.

--
Monto Blanco... scorchio!
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Fri, Aug 06, 2021 at 11:26:49AM +0200, demerphq wrote:
> On Tue, 3 Aug 2021 at 16:28, Nicholas Clark <nick@ccl4.org> wrote:
>
> > Sorry that this message is about 4 months late for April Fool's Day.
> >
> > So, this doesn't really fit the "elevator pitch" shape because I don't
> > think
> > that proposal would really fit the RFC shape either.
> >
> >
> > I'd like to suggest that we switch the core's hashtable implementation from
> > the current array of linked-lists (which is not friendly with CPU caches)
> > to
> > "open addressing" hashing (which is). More specifically, to "Robin Hood
> > Hashing", and most specifically to the implementation that I wrote for
> > MoarVM (and Perl 5) - "A Better Hash".
> >
>
> Wow. I had looked into this once, and got discouraged.
>
> I am super happy to hear that you did this and I am in favour.

Well, "did" might be an exaggeration. I make a mess, that works. But that
is only the first 80%...

> NIcholas if there is anything I can do to help please reach out to me at my
> work email.

The two things that I know I'm not confident about, and that I think you
(maybe with $ork colleagues) know more about:

1) benchmarking (so load factors, and trade offs there)

In that I see that the current hash table is "split if we get an
insertion collision once the load factor is over 0.66", which isn't
exactly an *obvious* choice... :-)

2) security

the way we split the linked list hash table is a function of 1 bit of the
hash value. If we didn't put effort into perturbing the iteration order to
conceal the bucket order, an attacker who can control the hash table's
keys can infer that single bit, work backwards from there to expose the
internal random state, and then with that pre-compute keys that collide

I *think* that perturbing the iteration order (by traversing the buckets
in an order determined by XOR-ing a per-hash-table random integer) is
exactly equivalent to *storing* the hash table with the buckets in that
perturbed order. Hence (for a single iteration) there's no difference in
exposure.

(Clearly for repeated iterations, a new random integer each time doesn't
expose anything, whereas the same bucket order exposes the same integer)

(Yes, we now have to store that random integer salt for every hash table,
not just hash tables we iterate)

But *if we properly mix* the per-hash-table 64 bit salt to get the bucket
order, am I right in thinking that an attacker who can control the keys
and observe the iteration order needs to *simultaneously* solve for 64
bits of random state? And never less.

(Contrast with the linked list split, where it's 1 bit)

hence, is the approach of salting the insertion order no less secure
than what we have currently?
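
(i.e. the iteration sketch is something like this, where `salt` is the
per-hash-table random value - illustrative names only:)

    #include <stddef.h>

    /* assumed helpers from the surrounding (imaginary) hash implementation */
    int  bucket_is_used(void *table, size_t idx);
    void visit(void *table, size_t idx);

    /* Visit every bucket exactly once, in an order scrambled by the
       per-hash-table salt.  'mask' is nbuckets - 1 (power-of-two sizes). */
    void iterate_perturbed(void *table, size_t mask, size_t salt) {
        for (size_t i = 0; i <= mask; i++) {
            size_t idx = (i ^ salt) & mask;
            if (bucket_is_used(table, idx))
                visit(table, idx);
        }
    }
    /* Note: only the low log2(nbuckets) bits of salt affect the order here,
       which is exactly why the "properly mix" question matters. */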


> One additional downside of open hashing is that it does not allow
> constructing a hash that can have more than 100% load factor. i dont think
> we care, but its an important point. With chained hashing you can have a
> load factor higher than 100% if you want, and i believe that is what
> happens when the existing bucket array gets oversized. I dont think it
> matters, in practice such a hash would be insanely large in memory.

Technically you kind-of-sort-of can, in that one of the standard-ish
tricks for open addressing generally was to "unwrap" the hash table to avoid
a modulo instruction when walking along looking for an empty bucket. Given
you know the maximum probe distance, you allocate that many "extra" buckets
at the end, and just increment your pointer when searching, instead of
increment and then maybe "wrap" it back to the start.

Hence you could make a hash with "4" buckets, and a load factor of 1.25,
relying on the extra buckets to provide enough space. Yes, you'd mostly
be doing a linear search...

But, this *might* be worth doing for us, if there are lots of hashes with
1 to 4 keys. (version objects seem to be 4 keys, MRO lookup tables are 1
key, at worst 2) - "official" size 4 buckets with a load factor of 1.0,
meaning actually allocating 7 buckets and storing up to 4 keys.
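
Roughly this shape, as a sketch with made-up names (not the actual ABH code):

    #include <stdlib.h>

    typedef struct slot { void *key; void *val; } slot;

    /* Sketch: allocate max_probe spare slots past the "official" end, so
       the probe loop never wraps.  official_size is a power of two. */
    static slot *table_alloc(size_t official_size, size_t max_probe) {
        return calloc(official_size + max_probe, sizeof(slot));
    }

    static slot *table_find(slot *buckets, size_t official_size,
                            size_t hash, void *key) {
        slot *entry = buckets + (hash & (official_size - 1));
        while (entry->key) {             /* an empty slot ends the probe */
            if (entry->key == key)
                return entry;
            ++entry;                     /* no "& mask", no wrap to the start */
        }
        return NULL;
    }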

> I also believe that the penalty of a poor hash function is higher, but I
> dont know that I can prove it. (Eg, a poor hash function that produces too
> many collisiions causes more damage to open addressing than to
> chain-addressed) but with SipHash as the standard hash function this isnt a
> problem.

I get this impression too. The folks blogging about this stress that "you
need a good hash function" but never go into much more detail than that.


> > I *did* need to patch Cpanel::JSON::XS and Sereal to remove
> > -Werror-no-declaration-after-statement from their C compiler flags.
> >
>
> Why? Doesn't this just make it unportable to AIX and old MS builds?

No, not AIX :-)

https://www.nntp.perl.org/group/perl.perl5.porters/2021/06/msg260331.html

The PSC proposes that we no longer support VC11.0 and earlier.

...

VC12.0 is capable of *targeting* XP and Windows Server 2003
(build host requirement is Windows 7/Windows Server 2012)

VC12.0 supports C99 (well, the bits that C11 didn't make optional)

All the *nix vendor compilers are fine. Amusingly it's actually VMS that was
late to the party, but now HP have got enough of its act together to be
useful.

> 2) The type of PL_strtab
> >
>
> I am not sure I get this one, can you explain more.

Currently PL_strtab *is* (almost) a regular SV of type SVt_PVHV.
It's marked as having "unshared" hash keys, but you can (in theory) do that
to any other hash.

The only crazy thing about it is that the "value" is a reference count
rather than a pointer.

All the current hash code assumes a fixed HE * structure, hence a fixed size.

Whereas the ABH internals were written to be flexible and handle hashes of
"any size you like, as long as they start with a pointer to the key"

This means that the shared string table entries can be just a pointer to
a key - ie sizeof(void *)

With the reference count moved *into* the HEK.

1) This makes some of the other code simpler - sharing a HEK is cheaper
2) The open addressing array wastes less space for each empty "slot"


but this means that PL_strtab can no longer be passed into regular HV*
functions to get stats from it.
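
i.e. roughly the difference between these two slot shapes (a sketch, not the
real structure definitions):

    #include "EXTERN.h"
    #include "perl.h"

    /* Sketch: what a regular hash slot holds versus a shared string table
       slot under the flexible ABH layout.  Names invented for this sketch. */
    struct regular_slot {
        HEK *hek;     /* entries just have to *start* with the key pointer */
        SV  *val;
    };

    struct strtab_slot {
        HEK *hek;     /* sizeof(void *) per slot: the share count now lives
                         inside the HEK itself */
    };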

> > Why am I suggesting "A Better Hash" and not $other...
> >
> > 1) C implementation exists
> > 2) It's portable
> > 3) It's suitably licensed
> > 4) It supports hash randomisation
> > 5) You can delete during hash iteration without having "problems"
> >
>
> It can? Interesting. I need to read up on that, I didnt think that was true.

Well, I thought that the "rule" was that it was OK to delete at the position
of the current iterator...

But, actually the current code takes a lot of care to ensure that you can
delete *any* HE from a hash without corrupting the pointers. ie every
delete checks

1) are they deleting the current iterator?
2) are they deleting the target of the next pointer in the current iterator?

and fixes everything up accordingly

> > 0) The "end of hash" iterator value is the same for hashes of any size
> > 1) It's cheap to test a CPU register against a constant.
> > 2) It's the reverse direction from internal movements caused by insertion
> > and deletion
> >
> > That last one is key - it means that it's possible to delete the hash entry
> > at the current iterator position without perturbing the iteration.
> >
> > Perl documents this as supported behaviour. We need to support it.
> >
>
> Yep. If we can insert safely during iteration then that would be very
> cool.

Well, "safe" is possible as "it won't crash" but I think that it's not
viable to also try to make guarantees like "you'll see existing key exactly
once, even if you insert during iteration"

(which technically we can't guarantee currently: if a hash split happens while
you are iterating, keys you already saw can get moved into a bucket that has
just been created, which has to be at a future index)

in that, any Robin Hood hash

1) might need to move entries in the array to make space during insert
2) and might need to "split", which is a completely new order

deletion (at the iterator) can be made "safe" because

1) you never need to split
2) if you iterate the array in the reverse direction from the direction that
entries "move" to "close" up a hole, then deletion (at the iterator) can't
move entries across the iterator (no repeats, and no skips)


if someone else has figured this out, I've not found it, or seen it written
down anywhere (clearly).
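
For anyone who wants to convince themselves, here is the argument as a sketch
(made-up names, simplified to keys; here the probe direction is towards higher
slots, and the iterator walks the other way - the real ABH layout may differ,
but the argument is the same):

    #include <stddef.h>

    typedef struct slot { void *key; void *val; } slot;

    /* assumed helpers from the surrounding (imaginary) hash implementation */
    size_t probe_distance(const slot *buckets, size_t i);  /* 0 == ideal slot */
    void   visit(slot *entry);

    /* Backward-shift deletion: the hole propagates in the probe direction
       (upwards here), and every entry that shifts to fill it lands on a
       slot that the downward iterator has already passed. */
    static void delete_at(slot *buckets, size_t nslots, size_t i) {
        while (i + 1 < nslots
               && buckets[i + 1].key
               && probe_distance(buckets, i + 1) > 0) {
            buckets[i] = buckets[i + 1];
            i++;
        }
        buckets[i].key = NULL;
    }

    /* The iterator walks in the reverse direction to the probing, so a
       delete_at() at the iterator's position never moves an unvisited
       entry across it - no repeats, no skips. */
    static void iterate_all(slot *buckets, size_t nslots) {
        for (size_t i = nslots; i-- > 0; ) {
            if (buckets[i].key)
                visit(&buckets[i]);    /* may safely call delete_at(..., i) */
        }
    }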

> If speed is the objective AFAUI the underlying math for the birthday
> paradox says what this should be, IIRC basically something in the %60 to
> ~%66. After that the collision probability starts to approach 1
> relatively quickly. As far as I know there is no difference on this for
> open hashing versus chain, the underlying costs are related to the
> probability of collision which is purely determined by how many elements
> are filled, if anything open addressing benefits more from a lower load
> factor than chain addressing I would think.

Interesting, 60%-66%. I'd read this mail then forgotten the details when
exploring - of 0.5, 0.625 and 0.75, 0.625 is the fastest

(I typed 0.675 in my other messages. That was a typo. My fault. It was
always 0.625 - 5/8)


I'd read that *generally* open addressing benefits from lower load factors;
for most collision strategies, it's better to finish soon.

(Of those that I can remember, linear probing has to keep going until it
finds an empty slot, so higher load factors mean that slot will be further,
and both insert and lookup pay the penalty of that long probe distance.
I think that all the others are similar - you can't know that you have a
hash miss until you reach an empty slot, or hit your max probe distance)

Whereas Robin Hood hashing's probe distance invariants mean that a lookup
can stop *before* it finds an empty slot. Folks blogging about their
exploits in writing hash tables seemed to suggest that it could support
a higher load factor.

But 0.625 seemed good.

> I need to read up on the implementation to say anything about that part.
> Looking forward to it. :-)

It's a long messy branch at

https://github.com/nwc10/perl5/tree/smoke-me/nicholas/ABH-POC

(which was linked somewhere earlier in this thread, but has also been rebased
a couple of times, and I'm trying to extract runs of commits from it that
stand on their own.)


And I can paste this faster than linking to it:

There are a lot of comments explaining how it works at
https://github.com/MoarVM/MoarVM/blob/master/src/core/str_hash_table.h

which I didn't duplicate for this proof of concept/proof of insanity.

Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Fri, Sep 10, 2021 at 10:47:48AM +0100, Dave Mitchell wrote:
> On Fri, Sep 10, 2021 at 08:50:42AM +0000, Nicholas Clark wrote:
> > Lots more CPU instructions.
> > More CPU cache references, but far fewer last level read misses.
>
> I'd be interested in seeing the changes in number of branches and numbers
> of failed branch predictions. My unquantified intuition is that that is
> one of the most important performance factors in modern CPUs.

Aha, --branch-sim=yes

blead at 75595dd289

==9493== I refs: 36,259,777,136
==9493== I1 misses: 34,365
==9493== LLi misses: 7,108
==9493== I1 miss rate: 0.00%
==9493== LLi miss rate: 0.00%
==9493==
==9493== D refs: 16,768,164,534 (11,109,175,017 rd + 5,658,989,517 wr)
==9493== D1 misses: 215,977,059 ( 190,801,625 rd + 25,175,434 wr)
==9493== LLd misses: 206,542,187 ( 183,664,107 rd + 22,878,080 wr)
==9493== D1 miss rate: 1.3% ( 1.7% + 0.4% )
==9493== LLd miss rate: 1.2% ( 1.7% + 0.4% )
==9493==
==9493== LL refs: 216,011,424 ( 190,835,990 rd + 25,175,434 wr)
==9493== LL misses: 206,549,295 ( 183,671,215 rd + 22,878,080 wr)
==9493== LL miss rate: 0.4% ( 0.4% + 0.4% )
==9493==
==9493== Branches: 5,829,099,182 ( 5,326,755,821 cond + 502,343,361 ind)
==9493== Mispredicts: 429,396,239 ( 89,391,173 cond + 340,005,066 ind)
==9493== Mispred rate: 7.4% ( 1.7% + 67.7% )


ABH, load factor 0.625 (*not* 0.675 as I mis-reported earlier. Sorry)

==27112== I refs: 41,942,836,004
==27112== I1 misses: 33,102
==27112== LLi misses: 6,921
==27112== I1 miss rate: 0.00%
==27112== LLi miss rate: 0.00%
==27112==
==27112== D refs: 18,637,514,170 (12,273,187,422 rd + 6,364,326,748 wr)
==27112== D1 misses: 222,136,637 ( 182,252,838 rd + 39,883,799 wr)
==27112== LLd misses: 181,543,299 ( 145,196,987 rd + 36,346,312 wr)
==27112== D1 miss rate: 1.2% ( 1.5% + 0.6% )
==27112== LLd miss rate: 1.0% ( 1.2% + 0.6% )
==27112==
==27112== LL refs: 222,169,739 ( 182,285,940 rd + 39,883,799 wr)
==27112== LL misses: 181,550,220 ( 145,203,908 rd + 36,346,312 wr)
==27112== LL miss rate: 0.3% ( 0.3% + 0.6% )
==27112==
==27112== Branches: 6,629,846,443 ( 6,119,713,828 cond + 510,132,615 ind)
==27112== Mispredicts: 447,055,595 ( 127,049,280 cond + 320,006,315 ind)
==27112== Mispred rate: 6.7% ( 2.1% + 62.7% )


If I read that right

More branches.

In absolute numbers, more mispredicts for conditional branches, fewer for
indirect branches. (Where *are* the indirect branches we're hitting so much?),

In absolute numbers slightly more mispredicts over all, but proportionally
fewer.

(Burn more CPU work with the intent to reduce cache misses)

I'm not sure what can be tweaked to be better.
Or what actually matters, and what's fluff.
What is the benchmark?


Nicholas Clark
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Fri, Sep 10, 2021 at 12:21:06PM +0000, Nicholas Clark wrote:
> If I read that right
>
> More branches.
>
> In absolute numbers, more mispredicts for conditional branches, fewer for
> indirect branches. (Where *are* the indirect branches we're hitting so much?),

In perl, the biggest indirects are usually the per-op op->op_ppaddr(aTHX)
calls in runops() and any switch() statements.

> In absolute numbers slightly more mispredicts over all, but proportionally
> fewer.
>
> (Burn more CPU work with the intent to reduce cache misses)
>
> I'm not sure what can be tweaked to be better.
> Or what actually matters, and what's fluff.
> What is the benchmark?

Your guess is as good as mine!

--
"But Sidley Park is already a picture, and a most amiable picture too.
The slopes are green and gentle. The trees are companionably grouped at
intervals that show them to advantage. The rill is a serpentine ribbon
unwound from the lake peaceably contained by meadows on which the right
amount of sheep are tastefully arranged." -- Lady Croom, "Arcadia"
Re: Robin Hood Hashing for the perl core [ In reply to ]
On Fri, 10 Sep 2021 12:21:06 +0000
Nicholas Clark <nick@ccl4.org> wrote:

> Aha, --branch-sim=yes
>
> blead at 75595dd289
>
> ==9493== I refs: 36,259,777,136
> ==9493== I1 misses: 34,365
> ==9493== LLi misses: 7,108
> ==9493== I1 miss rate: 0.00%
> ==9493== LLi miss rate: 0.00%
> ==9493==
> ==9493== D refs: 16,768,164,534 (11,109,175,017 rd + 5,658,989,517 wr)
> ==9493== D1 misses: 215,977,059 ( 190,801,625 rd + 25,175,434 wr)
> ==9493== LLd misses: 206,542,187 ( 183,664,107 rd + 22,878,080 wr)
> ==9493== D1 miss rate: 1.3% ( 1.7% + 0.4% )
> ==9493== LLd miss rate: 1.2% ( 1.7% + 0.4% )
> ==9493==
> ==9493== LL refs: 216,011,424 ( 190,835,990 rd + 25,175,434 wr)
> ==9493== LL misses: 206,549,295 ( 183,671,215 rd + 22,878,080 wr)
> ==9493== LL miss rate: 0.4% ( 0.4% + 0.4% )
> ==9493==
> ==9493== Branches: 5,829,099,182 ( 5,326,755,821 cond + 502,343,361 ind)
> ==9493== Mispredicts: 429,396,239 ( 89,391,173 cond + 340,005,066 ind)
> ==9493== Mispred rate: 7.4% ( 1.7% + 67.7% )
>
>
> ABH, load factor 0.625 (*not* 0.675 as I mis-reported earlier. Sorry)
>
> ==27112== I refs: 41,942,836,004
> ==27112== I1 misses: 33,102
> ==27112== LLi misses: 6,921
> ==27112== I1 miss rate: 0.00%
> ==27112== LLi miss rate: 0.00%
> ==27112==
> ==27112== D refs: 18,637,514,170 (12,273,187,422 rd + 6,364,326,748 wr)
> ==27112== D1 misses: 222,136,637 ( 182,252,838 rd + 39,883,799 wr)
> ==27112== LLd misses: 181,543,299 ( 145,196,987 rd + 36,346,312 wr)
> ==27112== D1 miss rate: 1.2% ( 1.5% + 0.6% )
> ==27112== LLd miss rate: 1.0% ( 1.2% + 0.6% )
> ==27112==
> ==27112== LL refs: 222,169,739 ( 182,285,940 rd + 39,883,799 wr)
> ==27112== LL misses: 181,550,220 ( 145,203,908 rd + 36,346,312 wr)
> ==27112== LL miss rate: 0.3% ( 0.3% + 0.6% )
> ==27112==
> ==27112== Branches: 6,629,846,443 ( 6,119,713,828 cond + 510,132,615 ind)
> ==27112== Mispredicts: 447,055,595 ( 127,049,280 cond + 320,006,315 ind)
> ==27112== Mispred rate: 6.7% ( 2.1% + 62.7% )
>
>
> If I read that right
>
> More branches.
>
> In absolute numbers, more mispredicts for conditional branches, fewer for
> indirect branches. (Where *are* the indirect branches we're hitting so much?),
>
> In absolute numbers slightly more mispredicts over all, but proportionally
> fewer.
>
> (Burn more CPU work with the intent to reduce cache misses)
>
> I'm not sure what can be tweaked to be better.
> Or what actually matters, and what's fluff.
> What is the benchmark?
>
>
> Nicholas Clark

Valgrind's branch prediction numbers are almost completely made up and
shouldn't be trusted. According to their documentation, valgrind's
simulated branch predictor is based on early Pentium 4. Modern CPUs work
completely differently!

https://valgrind.org/docs/manual/cg-manual.html#branch-sim

I'm also sceptical about its cache simulation. The only metrics I trust
are instruction and branch counts.

perf and AMD uProf are better tools for measuring those things.
Unfortunately, they aren't nearly as pleasant to use as valgrind.

BTW, while branch prediction numbers are very useful for figuring out
what the CPU is doing, it's important to remember that, in the end,
wall-clock time is the only metric that actually matters.
Re: Robin Hood Hashing for the perl core
On Fri, Sep 10, 2021 at 04:31:41PM +0200, Tomasz Konojacki wrote:
> On Fri, 10 Sep 2021 12:21:06 +0000
> Nicholas Clark <nick@ccl4.org> wrote:
>
> > Aha, --branch-sim=yes

> Valgrind's branch prediction numbers are almost completely made up and
> shouldn't be trusted. According to their documentation, valgrind's

The cynic in me wants to say:
So that fits quite nicely with benchmarks generally, doesn't it?

> simulated branch predictor is based on early Pentium 4. Modern CPUs work
> completely differently!
>
> https://valgrind.org/docs/manual/cg-manual.html#branch-sim

Thanks for the reminder about how ancient this is, and hence how
questionable its usefulness really is.

> I'm also sceptical about its cache simulation. The only metrics I trust
> are instruction and branch counts.
>
> perf and AMD uProf are better tools for measuring those things.
> Unfortunately, they aren't nearly as pleasant to use as valgrind.

I've not met uProf, but if it's like perf, one needs root access to run it?

Right now I have user-level access to powerful hardware owned by (nice)
other people, but root access only to a couple of small things at home.

> BTW, while branch prediction numbers are very useful for figuring out
> what the CPU is doing, it's important to remember that, in the end,
> wall-clock time is the only metric that actually matters.

Yes, totally. In this case the "problem" is that tuning things (the code
or the benchmark) pushes work between one extreme of "more instruction
effort, but stress the data cache less" and the other of "less instruction
effort, but stress the data cache more".

Well, it's not even just two axes - the instruction side can be broken down
into "more instruction dispatch but less branching" vs the reverse, and the
data side also looks like "trading more reads for fewer writes" and even
"more L1 work for fewer L3 misses".


So yes, wallclock is the only metric. And that needs benchmarks of the
appropriate shape and scale for the real workloads on the CPUs.

Dammit, which really means "benchmark something close to your production
code".

Nicholas Clark
Re: Robin Hood Hashing for the perl core
On Fri, 10 Sept 2021 at 13:48, Nicholas Clark <nick@ccl4.org> wrote:

> 2) security
>

> the way we split the linked list hash table is a function of 1 bit of the
> hash value. If we didn't put effort into perturbing the iteration order to
> conceal the bucket order, an attacker who can control the hash table's
> keys can infer that single bit, work backwards from there to expose the
> internal random state, and then with that pre-compute keys that collide
>
...


> hence, is the approach of salting the insertion order no less secure
> than what we have currently?
>

When bucket perturbing was introduced we were using a weak hash function.
I added the perturbing to make attacks on the seed harder. Now that we use
Siphash I think bucket perturbing can be ditched as an
unnecessary complication.


> But, this *might* be worth doing for us, if there are lots of hashes with
> 1 to 4 keys. (version objects seem to be 4 keys, MRO lookup tables are 1
> key, at worst 2) - "official" size 4 buckets with a load factor of 1.0,
> meaning actually allocating 7 buckets and storing up to 4 keys.
>

Doesn't sound like a good idea to me. But you know best. :-)


>
> > I also believe that the penalty of a poor hash function is higher, but I
> > don't know that I can prove it. (Eg, a poor hash function that produces
> > too many collisions causes more damage to open addressing than to
> > chain-addressed) but with SipHash as the standard hash function this
> > isn't a problem.
>
> I get this impression too. The folks blogging about this stress that "you
> need a good hash function" but never go into much more detail than that.
>
>
Yeah, so don't worry about it. Siphash is a good hash function. The only
problem is that it is not 32 bit. Do we care anymore?
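
(If a 32 bit value were still wanted somewhere, the usual trick - purely
illustrative, not something the core does today - is to fold the 64 bit
output down:)

    #include <stdint.h>

    /* Illustrative: XOR the high word into the low word so both halves of
       the 64 bit SipHash value contribute to the 32 bit result. */
    static inline uint32_t fold_hash64_to_32(uint64_t h) {
        return (uint32_t)(h ^ (h >> 32));
    }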


>
> > > I *did* need to patch Cpanel::JSON::XS and Sereal to remove
> > > -Werror-no-declaration-after-statement from their C compiler flags.
> > >
> >
> > Why? Doesn't this just make it unportable to AIX and old MS builds?
>
> No, not AIX :-)
>
> https://www.nntp.perl.org/group/perl.perl5.porters/2021/06/msg260331.html
>
> The PSC proposes that we no longer support VC11.0 and earlier.


> ...
>
> VC12.0 is capable of *targeting* XP and Windows Server 2003
> (build host requirement is Windows 7/Windows Server 2012)
>

We only introduced the flag so we would stop breaking our Windows builds.
If that isn't a problem anymore then yay.


>
> > > 2) The type of PL_strtab
> > >
> >
> > I am not sure I get this one, can you explain more.
>
> Currently PL_strtab *is* (almost) a regular SV of type SVt_PVHV.
> It's marked as having "unshared" hash keys, but you can (in theory) do that
> to any other hash.
>

Yes, that is on my todo list to play with. For certain types of hashes IMO
the PL_strtab is a massive pessimization. We have to do a double store and
a lot of complex crap and we end up having one hidden hash that scales to
the size of the set of unique keys in all hashes, which can chew up a lot
of memory in the background and can play havoc with preloading of data.

Eg, consider I want to build 10 hashes with 10 million keys each: I'm
actually building 11 hashes, one with 100M keys and ten with 10M each. And
if I build all of them prefork, and then build a totally different hash with
similar keys, I start COW'ing the hidden hash.

I have dreamed that we would/could change how hashes work, and only use
PL_strtab when they are blessed. So when a hash starts off it would be
unshared. When it gets blessed it gets converted into a shared hash. Eg,
maybe introduce a "no shared_keys;" pragma, or something like that.

The idea of PL_strtab as far as I know was to dedupe keys used in objects
where there is a high level of repetition of the keys. When you are building
large hashes of arbitrary strings, or aggregations in memory with lots of
breakouts that are unlikely to overlap, or are building hashes prefork, the
value is really questionable.


> The only crazy thing about it is that the "value" is a reference count
> rather than a pointer.
>
> All the current hash code assumes a fixed HE * structure, hence a fixed
> size.
>
> Whereas the ABH internals were written to be flexible and handle hashes of
> "any size you like, as long as they start with a pointer to the key"
>
> This means that the shared string table entries can be just a pointer to
> a key - ie sizeof(void *)
>
> With the reference count moved *into* the HEK.
>
> 1) This makes some of the other code simpler - sharing a HEK is cheaper
> 2) The open addressing array wastes less space for each empty "slot"
>
>
> but this means that PL_strtab can no longer be passed into regular HV*
> functions to get stats from it.
>

That probably only breaks some (inconsequential) code I wrote. AFAIK almost
nobody knows about PL_strtab. I wouldn't worry about it.
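
For anyone skimming, the layout change being described above is roughly
this (an illustrative sketch of the shapes involved, not the actual perl or
ABH headers):

    struct hek_sketch;            /* shared key: string, length, hash, ... */

    /* Current chained hash: each entry is a separately allocated HE on a
       linked list; PL_strtab reuses the same HE shape, with the "value"
       slot holding a reference count instead of an SV pointer. */
    struct he_sketch {
        struct he_sketch  *next;  /* chain link */
        struct hek_sketch *key;
        void              *value; /* SV*, or a refcount for PL_strtab */
    };

    /* Proposed open addressing hash: entries live directly in the flat
       array, so an ordinary entry is just key + value ... */
    struct abh_entry_sketch {
        struct hek_sketch *key;
        void              *value;
    };

    /* ... and a shared string table entry can shrink to a single pointer,
       with the reference count moved into the HEK itself. */
    struct abh_strtab_entry_sketch {
        struct hek_sketch *key;   /* refcount lives inside the HEK */
    };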


>
> > If speed is the objective AFAUI the underlying math for the birthday
> > paradox says what this should be, IIRC basically something in the 60% to
> > ~66%. After that the collision probability starts to approach 1
> > relatively quickly. As far as I know there is no difference on this for
> > open hashing versus chain, the underlying costs are related to the
> > probability of collision which is purely determined by how many elements
> > are filled, if anything open addressing benefits more from a lower load
> > factor than chain addressing I would think.
>
> Interesting, 60%-66%. I'd read this mail then forgotten the details when
> exploring - of 0.5, 0.625 and 0.75, 0.625 is the fastest


> (I typed 0.675 in my other messages. That was a typo. My fault. It was
> always 0.625 - 5/8)
>
>
> I'd read that *generally* open addressing benefits from lower load factors,
> for most collision strategies, it's better to finish soon.
>
> (Of those that I can remember, linear probing has to keep going until it
> finds an empty slot, so higher load factors mean that slot will be further,
> and both insert and lookup pay the penalty of that long probe distance.
> I think that all the others are similar - you can't know that you have a
> hash miss until you reach an empty slot, or hit your max probe distance)
>
> Whereas Robin Hood hashing's probe distance invariants mean that a lookup
> can stop *before* it finds an empty slot. Folks blogging about their
> exploits in writing hash tables seemed to suggest that it could support
> a higher load factor.
>
> But 0.625 seemed good.
>
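
(As an aside, the early exit described in the quoted text looks roughly
like this - an illustrative Robin Hood lookup over a power-of-two table,
not the actual ABH code:)

    #include <stdint.h>
    #include <string.h>

    struct rh_entry {
        const char *key;          /* NULL means the slot is empty */
        uint64_t    hash;
    };

    /* tab has mask+1 slots, mask is 2**k - 1 */
    static const char *
    rh_lookup(const struct rh_entry *tab, uint64_t mask,
              const char *key, uint64_t hash)
    {
        const uint64_t want = hash & mask;   /* bucket this key "wants"   */
        uint64_t dist, i, theirs;
        for (dist = 0; ; dist++) {
            i = (want + dist) & mask;
            if (!tab[i].key)
                return NULL;                 /* empty slot: definite miss */
            theirs = (i - (tab[i].hash & mask)) & mask;
            if (theirs < dist)
                return NULL;                 /* that entry is closer to its
                                                home than we are, so the
                                                invariant says our key can't
                                                be any further along      */
            if (tab[i].hash == hash && strcmp(tab[i].key, key) == 0)
                return tab[i].key;           /* hit */
        }
    }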

I am pretty sure that you always want a lower load factor with open
addressing than you do with chain addressing: for any given load factor,
open addressing will be worse in terms of collisions (which is fine - open
addressing is fast because of memory locality and more compact memory
utilization).

Consider the probability of collision when you insert a new unique key into
a hash that contains 2 keys, both of which collide. With chain addressing
the chance of collision will be 1/k (where k is the number of buckets), as
we will have 2 items in one bucket; with open addressing it will be 2/k, as
we will have two buckets occupied. We know that with only very few entries
in the hash table we will start seeing collisions, so we can assume that
most of the time a chain addressing scheme will use fewer buckets than an
open addressing scheme.

So I would argue you want to tune open addressing to a lower load factor
than chain addressing to get the best insert performance from it.

I am guessing that 50% for open addressing would give the same profile as
2/3rds. (I was going to say you should do the math, but then I did it
myself):

The number of used buckets in a chain-addressed hash with 'd' buckets that
contains 'n' items is expected to be: d - (d * ((d-1)/d)**n). (When d=365
this is the same as asking how many distinct birthdays there should be with
'n' people in the room.) The probability of a collision is then
(d - (d * ((d-1)/d)**n))/d.

Plugging some numbers into that, it turns out that when n is about 2/3rds
of d you get a collision rate of ~50%: (Which explains why 2/3rds is
favoured).

perl -le'for my $bits (3..24) { my $d= 1<<$bits; my $n= 2*$d/3;
    my $dm1od= ($d-1)/$d; my $r= ($d - ($d * ($dm1od**$n)));
    printf "n=%11d d=%11d chain:%.4f\n", $n, $d, $r/$d;}'
n=          5 d=          8 chain:0.5094
n=         10 d=         16 chain:0.4976
n=         21 d=         32 chain:0.4920
n=         42 d=         64 chain:0.4893
n=         85 d=        128 chain:0.4879
n=        170 d=        256 chain:0.4873
n=        341 d=        512 chain:0.4869
n=        682 d=       1024 chain:0.4868
n=       1365 d=       2048 chain:0.4867
n=       2730 d=       4096 chain:0.4866
n=       5461 d=       8192 chain:0.4866
n=      10922 d=      16384 chain:0.4866
n=      21845 d=      32768 chain:0.4866
n=      43690 d=      65536 chain:0.4866
n=      87381 d=     131072 chain:0.4866
n=     174762 d=     262144 chain:0.4866
n=     349525 d=     524288 chain:0.4866
n=     699050 d=    1048576 chain:0.4866
n=    1398101 d=    2097152 chain:0.4866
n=    2796202 d=    4194304 chain:0.4866
n=    5592405 d=    8388608 chain:0.4866
n=   11184810 d=   16777216 chain:0.4866

Thus to have the same collision properties you want to set the max load
factor for the open addressed scheme to 50%. Note that the approximation of
2/3rds I chose produces a rough collision probability of just under 49%.
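
The same comparison in C, as a sanity check (illustrative; it assumes a
uniform hash, and that for open addressing a fresh insert collides with
probability n/d, since every stored key occupies its own slot):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double d = 1 << 16;                   /* number of buckets   */
        const double loads[] = { 0.50, 0.625, 2.0 / 3.0, 0.75 };
        size_t k;
        for (k = 0; k < sizeof loads / sizeof loads[0]; k++) {
            double n = loads[k] * d;                /* stored keys         */
            double chain = 1.0 - pow((d - 1.0) / d, n);  /* birthday-style */
            double open  = n / d;                   /* one slot per key    */
            printf("load=%.3f  chain=%.4f  open=%.4f\n",
                   loads[k], chain, open);
        }
        return 0;
    }

Chain addressing at a 2/3 load lands at roughly the same collision rate as
open addressing at 1/2, which matches the table above.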

Cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"