Mailing List Archive: HvIS_EMPTY (was Re: Robin Hood Hashing for the perl core)

On Tue, Aug 03, 2021 at 04:54:56PM +0100, hv@crypt.org wrote:

> :I think we can patch them all. Several are just truth tests - I propose
> :a new macro HvIS_EMPTY(hv) to replace them, which we also add to ppport.h
>
> I agree with leonerd that we should do this one right away.
>
> :Others just iterate over all the hash buckets - I propose we add
> :hv_foreach() to core and ppport.h to handle these cleanly.
>
> Does that cover all remaining uses of HvARRAY that you found? Ie could
> we provide HvIS_EMPTY and hv_foreach and then deprecate HvARRAY?

Sigh, tied hashes...

So, the initial "implementation" of HvIS_EMPTY() was !HvTOTALKEYS(), which
made sense, as it's answering the same "logical" question as !HvARRAY() -
is this hash empty? Except that it answers it correctly.

But then I realised that "half" the fun with the XS interface to hashes is
that almost all of it doesn't hide the implementation details of tied
hashes. The only real exception is the existing iteration interface, and
also my implementation of hv_foreach().

So, it feels like a design regression to add a new interface which is both
"you're on your own with tied hashes" and inconsistent with foreach. Hence,
the better implementation is this:

PERL_STATIC_INLINE bool
Perl_HvIS_EMPTY(const HV *hv) {
return !SvRMAGICAL(hv) && !HvTOTALKEYS(hv);
}

at which point I tested that in the perl core (in place of the previous),
and discovered that

1) Devel::Peek::Dump would SEGV on a tied hash - previously untested...
https://github.com/Perl/perl5/commit/8937088f79a80f8c
2) Most of the other uses in the core probably weren't exposed to tied
hashes, but would crash similarly if one arrived, because they dereference
HvARRAY(), "knowing" that it can't be NULL...

so ultimately you can't have a sane HvIS_EMPTY() without also migrating the
following code from explicit iteration of HvARRAY() to an API that (also)
handles tied hashes.

So, the answer to the actual question about CPAN - I think that modules
partition roughly like this

1) HvIS_EMPTY

Sub-Disable

2) just replace HvARRAY iteration with hv_foreach:

Class-Accessor-Inherited-XS
Data-Recursive
Devel-Arena (but build fails on blead)
JavaBin

3) HvIS_EMPTY and hv_foreach

Devel-Size
Internals-DumpArenas
namespace-clean-xs
Sub-Disable

4) HvIS_EMPTY and hv_foreach, but needs special handling for PL_strtab

Devel-MAT-Dumper
Devel-SizeMe (but build fails on blead)

5) Just remove the existing code...

Cairo
Glib

(these both "inherit" the code from pp_defined, but are only interested
in scalars)

perlec (but build fails on blead)

autovivification

just make it return 0 - it also "borrows" the code from pp_defined.
Whilst it does pass that code AVs and HVs, it takes equivalent (and
correct) code paths if it doesn't check AvARRAY and HvARRAY. The change
would mean
static OP *a_pp_root_unop(pTHX)
static OP *a_pp_root_binop(pTHX)
would instead hit:
return oi->old_pp(aTHX);

6) Unclear, and don't pass tests on blead

B-C - unclear if anything needs fixing - tests have failed for some time

7) Unclear, and don't build on blead

JSPL - wraps obsolete spidermonkey implementation
mod_perl - modperl_perl_hv_fetch_he - could be replaced with hv_common?

8) Need code changes

Hash-Spy - it's swapping the guts of hashes around. It needs different code.

Sereal-Encoder
- might technically be possible with hv_foreach, but I'd already
prototyped "different code"
Storable
- ditto

threads-tbb
- doesn't build (arm32)/some tests abort with coredumps (x86_64)
uses interfaces Intel have marked as deprecated...
*as written* it would need "new code" for its iterator, but it
could *probably* be refactored to the right "shape" for
hv_foreach. However, I don't see a future for the module...

XS-Framework
- Has two uses of HvARRAY.
The macro XS_HV_ITER effectively *is* hv_foreach
Hash.begin()/end() - I think that these can be directly mapped to
the existing HvABH API
(Might need to add an "iterator is equal" function to enable
comparisons with Hash.end(), but the ABH iterator API was
purposefully pretty close to the C++ STL iteration "shape")

> I'm nervous that there are enough uses of the various affected structures
> on CPAN that it seems likely there'll be yet more in the DarkPAN; I think
> we should aim to mitigate as much as possible in advance with a combination
> of new interfaces and deprecations.

I think that it's 23 distributions out of 2726 on CPAN that have XS code.
To me it looks like many of the ones in that list are doing "other"
internals things that are sufficiently invasive that they're all at some
risk of breaking on any major release.

Of all the companies that I've worked for (or seen inside) I think that only
two have had custom XS code, and none of it was dealing with data structures
more complex than scalars. In both cases they were also competent enough to
build their own Perl binary rather than using whatever the OS gave them, and
competent enough to fix their own XS code.

So

1) I'm really not sure how much *XS* code there is in places we can't see
2) It's XS code. It won't install if it won't compile
3) The changes we need here are going to be compile time failures

> Additionally there is a fair amount of discussion in the notes about
> randomization to avoid algorithmic complexity attacks, but I see none
> about the converse - turning off the randomization to allow reproducible
> behaviour, eg for debugging. I think that's also important, and also
> merits some commentary.

Agree. The branch was the "proof of concept". Can I even make it work?

This part would follow logically from cleaning up the existing hash
randomisation setup, which I'd not done either.

Nicholas Clark