Mailing List Archive: PERL_RC_STACK branch: first cut

PERL_RC_STACK branch: first cut

Feb 26, 2023, 8:06 AM

Post #1 of 36 (652 views)

TL;DR:

The branch containing the first stage of my "make the stack ref-counted"
work is nearly ready for merging. In a default build, the core is mostly
unchanged: only when built with -DPERL_RC_STACK does it behave
differently. It needs testing on Windows. I need to decide whether to
merge soon or wait until after 5.38. Once in core it will affect people
who write or modify PP functions. Currently an RC build runs about 20%
slower; the second stage of this work will to optimise things - I expect
this to make an RC perl run at about the same speed as a standard perl. At
the bottom of this email is a cut+paste from a new section in perlguts
which explains how it works. I'm not sure how the API functions get added
to ppport.h.

In full:

The PR request

Smoke me/davem/rc stack2 (PR #20858)

is a branch which allows the perl argument stack to be reference-counted.
Subject to CI testing and feedback, it's ready for merging.

This work has been a lot harder, taken a lot longer, and is more invasive
than I originally expected.

When merged, in default builds:

* the stack still isn't reference-counted
* there should be no change in performance
* most PP functions are unchanged except that their declaration
has changed from PP(pp_const) to PP_wrapped(pp_const, 0, 0) and similar;
on default builds the PP_wrapped() macro ignores args 2 and 3 and does
the same as PP().
* A number of PP functions have already been updated to use the new regime;
this means getting rid of dSP, and replacing PUSHs()/POPs() etc with
rpp_push1() and rpp_popfree_1() inline functions etc, which on default
builds don't modify the reference count.

On builds configured with PERL_RC_STACK:

* items pushed on and popped off the stack have their reference count
adjusted.
* PP functions (such as pp_const()) which haven't been updated to use the
new regime yet are wrapped via the PP_wrapped() macro - this wrapper
function adjusts args before and after calling the real PP function,
and is a big overhead for lightweight pp functions. In the second phase
of this project, such PP functions will be modified to use rpp_() style
functions directly.
* XS code is all assumed to be non-RC-aware, and pp_entersub() will call
those those XS functions via a compatibility wrapper which will slow
things down a bit. There is not yet an API to allow XS functions to be
written in an RC-aware way.
* Any XS code which includes a custom OP with its own PP function, or
contains a PP function intended to override a core PP function ,won't
work unmodified;. Either the PP function needs to be wrapped, or
updated to use rpp_() functions.

The first 16 commits in this branch are generic fixups for things I
spotted while working on this branch, and could be committed even if this
branch was rejected overall.

Overall, an RC build perl runs the test suite approximately 20% slower.
I expect that difference to vanish once most small/hot pp functions have
been unwrapped.

Nothing under dist/ or cpan/ apart from dist/threads/ has needed to be
modified to run on RC builds, indicating that the change is mostly
seamless. Similarly, the only things under lib/ and ext/ that needed
tweaking were XS-APItest and Devel-Peek. Moose and all its dependencies
install ok. (But Coro currently crashes and burns).

I expect that perl will continue through several releases with
PERL_RC_STACK not the default, then switch over to being the default, and
then some releases later it will become mandatory, and all the conditional
code can be removed from the core. I.e. similar to how PERL_OP_PARENT was
gradually incorporated.

I've added a new section to perlguts,

=head1 Reference-counted argument stack

which should be required reading for any core hackers. I've cut and pasted
it at the bottom of this email for convenience.

The current state CI testing is that it mostly passes (this is for non-RC
builds), except that: Windows msvc142 keeps failing to compile things due
to not liking macros within a macro call, e.g.

FOO(bar,
#ifdef X
1
#else
0
#endif
);

which I'm currently whack-a-moleing.

Windows mingw64 is failing the build somewhere around threads-shared/ with
an exit code 2 but no obvious error message. I don;t know whether that's
an ongoing issue or an issue with my branch.

Also, win32 hasn't been tested with a PERL_RC_STACK build yet - it's quite
possible there are bugs in the cloning / fake fork code which don't get
exercised on Linux builds.

There is a whole new API with functions having an rpp_ prefix, plus a
handful of other new functions. I'm not clear how these get added to
PPPort. It usually seems to happen by magic. The API isn't complete yet;
as I work on unwrapping the bulk of the PP functions, it will become
clearer what extra 'nice to have' functions should be added. For example
I suspect that rpp_push_1(&PL_sv_undef) will be common enough to get its
own function, which will also be more efficient as it will skip
incrementing the refcount of an immortal SV.

Finally, here's a cut+paste from perlguts:

========================================================================

=head1 Reference-counted argument stack

=head2 Introduction

As of perl 5.38 (XXX check version) there is a build option,
C<PERL_RC_STACK>, not enabled by default, which requires that items pushed
onto, or popped off the argument stack have their reference counts
adjusted. It is intended that eventually this will be the default way and
finally the only way to configure perl.

The macros which manipulate the stack such as PUSHs() and POPs() don't
adjust the reference count of the SV. Most of the time this is fine, since
something else is keeping the SV alive while on the argument stack, such
a pointer from the TEMPs stack, or from the pad (e.g. a lexical variable
or a C<PADTMP>). Occasionally this can go horribly wrong. For example,
this code:

my @a = (1,2,3);
sub f { @a = (); print "(@_)\n" };
f(@a, 4);

may print undefined or random freed values, since some of the elements of
@_, which have been aliased to the elements of @a, have been freed.
C<PERL_RC_STACK> is intended to fix this by making each SV pointer on the
argument stack increment the reference count (RC) of the SV by one.

In this new environment, unmodified existing PP and XS functions, which
have been written assuming a non reference-counted (non-RC for short)
stack, are called via special wrapper functions which adjust the stack
before and after. At the moment there is no API to write an RC XS
function, so all XS code will continue to be called via a wrapper (which
makes them slightly slower), but means that in general, CPAN distributions
containing XS code code continue to work without modification.

However, PP functions, either in perl core, or those in XS functions used
to implement custom ops or to override the pp functions for built-in ops,
need dealing with specially. For the latter, they can just be wrapped;
this involves the least work, but has a performance impact. In the longer
term, and for core PP functions, they need unwrapping and rewriting using
a new API. With this, the old macros such as PUSHs() have been replaced
with a new set of (mostly inline) functions with a common prefix, such as
rpp_push_1(). "RPP" stands for "reference-counted push and pop functions".
The new functions modify the reference count on C<PERL_RC_STACK> builds,
while leaving them unadjusted otherwise. Thus in core they generally work
in both cases, while in XS code they are portable to older perl versions
via C<PPPort> (XXX assuming that they get been added to C<PPPort>).

The rest of this section is mainly concerned with how to convert existing
PP functions and how to write new PP functions to use the new C<rpp_> API.

A reference-counted perl can be built using the PERL_RC_STACK define.
For development and debugging purposes, it is best to enable leaking
scalar debugging too, as that displays extra information about scalars
that have leaked or been prematurely freed, and it also enables extra
assertions in the macros and functions which manipulate the stack:

Configure -DDEBUGGING \
-Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

=head2 Reference counted stack states

In the new regime, the current argument stack can be in one of three
states, which can be determined by the shown expression.

=over

=item * not reference-counted

!AvREAL(PL_curstack)

In this case, perl will assume when emptying the stack (such as during a
croak()) that the items on it don't need freeing. This is the traditional
perl behaviour. On C<PERL_RC_STACK> builds, such stacks will be rarely
encountered.

=item * fully reference-counted

AvREAL(PL_curstack) && !PL_curstackinfo->si_stack_nonrc_base

All the items on the stack are reference counted, and will be freed by
functions like rpp_popfree_1() or if perl croak()s. This is the normal
state of the stack in C<PERL_RC_STACK> builds.

=item * partially reference-counted (split)

AvREAL(PL_curstack) && PL_curstackinfo->si_stack_nonrc_base > 0

In this case, items on the stack from the index C<si_stack_nonrc_base>
upwards are non-RC; those below are RC. This state occurs when a PP or XS
function has been wrapped. In this case, the wrapper function pushes a
non-RC copy of the arg pointers above the cut then calls the real
function. When that returns, any returned args have their ref counts
bumped up. See below for more details.

=back

Note that perl uses a stack-of-stacks, and the AvREAL() and
C<si_stack_nonrc_base> states are per stack. When perl starts up, the main
stack is RC, but by default, new stacks pushed in XS code via PUSHSTACKi()
are non-RC, so it is quite possible to get a mixture. The perl core itself
uses the new push_stackinfo() function which replaces PUSHSTACKi() and
allows you to specify that the new stack should be RC by default.
(XXX core hasn't been updated mostly yet to use push_stackinfo())

Most places in the core assume a particular RC environment. In particular,
it is assumed that within a runops loop, all the PP functions are
RC-aware, either because they have been (re)written to be aware, or
because they have been wrapped. Whenever a runops loop is entered via
CALLRUNOPS(), it will check the current state of the stack, and if it's
not fully RC, will temporarily update its contents to be fully RC before
entering the main runops loop. Then if necessary it will restore the stack
to its old state on return. This means that functions like call_sv(),
which can be called from any environment (e.g. RC core or wrapped and
temporarily non-RC XS code) will always do the Right Thing when invoking
the runops loop, no matter what the current stack state is.

Similarly, croaks and the like (which can occur anywhere) have to be able
to handle both stack types. So there are a few places in core - call_sv(),
eval_sv() etc, Perl_die_unwind() and S_my_exit_jump(), which have been
specially crafted to handle both cases; everything else can assume a fixed
environment.

=head2 Wrapping

Normally a core PP function is declared like this:

PP(pp_foo)
{
...
}

This expands to something like:

OP* Perl_foo(pTHX)
{
...
}

When such a function needs to be wrapped, it is instead declared as:

PP_wrapped(pp_foo, nargs, nlists)
{
...
}

which on non-RC builds, expands to the same as PP() (the extra args are
ignored). On RC builds it expands to something like

OP* Perl_pp_foo(pTHX)
{
return Perl_pp_wrap(aTHX_ S_Perl_pp_foo_norc, nargs, nlists);
}

STATIC OP* S_Perl_pp_foo_norc(pTHX)
{
...
}

Here the externally visible PP function calls pp_wrap(), which adjusts
the stack contents, then calls the hidden real body of the PP function,
then on return, adjusts the stack back.

There is an API macro, XSPP_wrapped(), intended for use on PP functions
declared in XS code, It is identical to PP_wrapped(), except that it
doesn't prepend a C<Perl_> prefix to the function name.

The C<nargs> and C<nlists> parameters to the macro are numeric constants
or simple expressions which specify how many arguments the PP function
expects, or how many lists it expects. For example,

PP_wrapped(pp_add, 2, 0); /* consumes two args off the stack */

PP_wrapped(pp_readline, /* consumes two or three args */
((PL_op->op_flags & OPf_STACKED) ? 2 : 1), 0);

PP_wrapped(pp_push, 0, 1); /* consumes one list */

PP_wrapped(pp_aassign, 0, 2); /* consumes two lists */

To understand what pp_wrap() does, consider calling Perl_pp_foo() which
expects three arguments. On entry the stack may look like:

... A+ B+ C+

(where the C<+> indicates that the pointers to A, B and C are each
reference counted). The wrapper function pp_wrap() marks a cut at the
current stack position using C<si_stack_nonrc_base>, then, based on the
value of C<nargs>, pushes a copy of those three pointers above the cut:

... A+ B+ C+ | A0 B0 C0

(where the C<0> indicates that the pointers aren't RC), then calls the
=head1 Reference-counted argument stack

=head2 Introduction

As of perl 5.38 (XXX check version) there is a build option,
C<PERL_RC_STACK>, not enabled by default, which requires that items pushed
onto, or popped off the argument stack have their reference counts
adjusted. It is intended that eventually this will be the default way and
finally the only way to configure perl.

The macros which manipulate the stack such as PUSHs() and POPs() don't
adjust the reference count of the SV. Most of the time this is fine, since
something else is keeping the SV alive while on the argument stack, such
a pointer from the TEMPs stack, or from the pad (e.g. a lexical variable
or a C<PADTMP>). Occasionally this can go horribly wrong. For example,
this code:

my @a = (1,2,3);
sub f { @a = (); print "(@_)\n" };
f(@a, 4);

may print undefined or random freed values, since some of the elements of
@_, which have been aliased to the elements of @a, have been freed.
C<PERL_RC_STACK> is intended to fix this by making each SV pointer on the
argument stack increment the reference count (RC) of the SV by one.

In this new environment, unmodified existing PP and XS functions, which
have been written assuming a non reference-counted (non-RC for short)
stack, are called via special wrapper functions which adjust the stack
before and after. At the moment there is no API to write an RC XS
function, so all XS code will continue to be called via a wrapper (which
makes them slightly slower), but means that in general, CPAN distributions
containing XS code code continue to work without modification.

However, PP functions, either in perl core, or those in XS functions used
to implement custom ops or to override the pp functions for built-in ops,
need dealing with specially. For the latter, they can just be wrapped;
this involves the least work, but has a performance impact. In the longer
term, and for core PP functions, they need unwrapping and rewriting using
a new API. With this, the old macros such as PUSHs() have been replaced
with a new set of (mostly inline) functions with a common prefix, such as
rpp_push_1(). "RPP" stands for "reference-counted push and pop functions".
The new functions modify the reference count on C<PERL_RC_STACK> builds,
while leaving them unadjusted otherwise. Thus in core they generally work
in both cases, while in XS code they are portable to older perl versions
via C<PPPort> (XXX assuming that they get been added to C<PPPort>).

The rest of this section is mainly concerned with how to convert existing
PP functions and how to write new PP functions to use the new C<rpp_> API.

A reference-counted perl can be built using the PERL_RC_STACK define.
For development and debugging purposes, it is best to enable leaking
scalar debugging too, as that displays extra information about scalars
that have leaked or been prematurely freed, and it also enables extra
assertions in the macros and functions which manipulate the stack:

Configure -DDEBUGGING \
-Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

=head2 Reference counted stack states

In the new regime, the current argument stack can be in one of three
states, which can be determined by the shown expression.

=over

=item * not reference-counted

!AvREAL(PL_curstack)

In this case, perl will assume when emptying the stack (such as during a
croak()) that the items on it don't need freeing. This is the traditional
perl behaviour. On C<PERL_RC_STACK> builds, such stacks will be rarely
encountered.

=item * fully reference-counted

AvREAL(PL_curstack) && !PL_curstackinfo->si_stack_nonrc_base

All the items on the stack are reference counted, and will be freed by
functions like rpp_popfree_1() or if perl croak()s. This is the normal
state of the stack in C<PERL_RC_STACK> builds.

=item * partially reference-counted (split)

AvREAL(PL_curstack) && PL_curstackinfo->si_stack_nonrc_base > 0

In this case, items on the stack from the index C<si_stack_nonrc_base>
upwards are non-RC; those below are RC. This state occurs when a PP or XS
function has been wrapped. In this case, the wrapper function pushes a
non-RC copy of the arg pointers above the cut then calls the real
function. When that returns, any returned args have their ref counts
bumped up. See below for more details.

=back

Note that perl uses a stack-of-stacks, and the AvREAL() and
C<si_stack_nonrc_base> states are per stack. When perl starts up, the main
stack is RC, but by default, new stacks pushed in XS code via PUSHSTACKi()
are non-RC, so it is quite possible to get a mixture. The perl core itself
uses the new push_stackinfo() function which replaces PUSHSTACKi() and
allows you to specify that the new stack should be RC by default.
(XXX core hasn't been updated mostly yet to use push_stackinfo())

Most places in the core assume a particular RC environment. In particular,
it is assumed that within a runops loop, all the PP functions are
RC-aware, either because they have been (re)written to be aware, or
because they have been wrapped. Whenever a runops loop is entered via
CALLRUNOPS(), it will check the current state of the stack, and if it's
not fully RC, will temporarily update its contents to be fully RC before
entering the main runops loop. Then if necessary it will restore the stack
to its old state on return. This means that functions like call_sv(),
which can be called from any environment (e.g. RC core or wrapped and
temporarily non-RC XS code) will always do the Right Thing when invoking
the runops loop, no matter what the current stack state is.

Similarly, croaks and the like (which can occur anywhere) have to be able
to handle both stack types. So there are a few places in core - call_sv(),
eval_sv() etc, Perl_die_unwind() and S_my_exit_jump(), which have been
specially crafted to handle both cases; everything else can assume a fixed
environment.

=head2 Wrapping

Normally a core PP function is declared like this:

PP(pp_foo)
{
...
}

This expands to something like:

OP* Perl_foo(pTHX)
{
...
}

When such a function needs to be wrapped, it is instead declared as:

PP_wrapped(pp_foo, nargs, nlists)
{
...
}

which on non-RC builds, expands to the same as PP() (the extra args are
ignored). On RC builds it expands to something like

OP* Perl_pp_foo(pTHX)
{
return Perl_pp_wrap(aTHX_ S_Perl_pp_foo_norc, nargs, nlists);
}

STATIC OP* S_Perl_pp_foo_norc(pTHX)
{
...
}

Here the externally visible PP function calls pp_wrap(), which adjusts
the stack contents, then calls the hidden real body of the PP function,
then on return, adjusts the stack back.

There is an API macro, XSPP_wrapped(), intended for use on PP functions
declared in XS code, It is identical to PP_wrapped(), except that it
doesn't prepend a C<Perl_> prefix to the function name.

The C<nargs> and C<nlists> parameters to the macro are numeric constants
or simple expressions which specify how many arguments the PP function
expects, or how many lists it expects. For example,

PP_wrapped(pp_add, 2, 0); /* consumes two args off the stack */

PP_wrapped(pp_readline, /* consumes two or three args */
((PL_op->op_flags & OPf_STACKED) ? 2 : 1), 0);

PP_wrapped(pp_push, 0, 1); /* consumes one list */

PP_wrapped(pp_aassign, 0, 2); /* consumes two lists */

To understand what pp_wrap() does, consider calling Perl_pp_foo() which
expects three arguments. On entry the stack may look like:

... A+ B+ C+

(where the C<+> indicates that the pointers to A, B and C are each
reference counted). The wrapper function pp_wrap() marks a cut at the
current stack position using C<si_stack_nonrc_base>, then, based on the
value of C<nargs>, pushes a copy of those three pointers above the cut:

... A+ B+ C+ | A0 B0 C0

(where the C<0> indicates that the pointers aren't RC), then calls the
real PP function, S_Perl_pp_foo_norc(). That function processes A, B and C,
pops them off the stack, and pushes some result SVs. None of this
manipulation adjusts any RCs. On return to pp_wrap(), the stack may look
something like:

... A+ B+ C+ | X0 Y0

The wrapper function bumps up the RCs of X and Y, decrements A B C,
shifts the results down and sets C<si_stack_nonrc_base> to zero, leaving
the stack as:

... X+ Y+

In places like pp_entersub(), a similar wrapping (via the function
xs_wrap()) is done when calling XS subs.

A complex calling environment might have multiple nested stacks with
different RC states. Perl starts off with an RC stack. Then for example,
pp_entersub() is called, which (via xs_wrap()) splits the stack and
executes the XS function in a non-RC environment. That function may call
PUSHSTACKi(), which creates a new non-RC stack, then calls call_sv(), which
does CALLRUNOPS(), which causes the new stack to temporarily become RC.
Then a tied method is called, which pushes a new RC stack, and so on. (XXX
currently tied methods actually push a non-RC stack. To be fixed soon).

=head2 (Re)writing a PP function using the rpp_ API

Wrapping a pp function has a performance overhead, and is there mainly as
a temporary crutch. Eventually, PP functions should be updated to use
rpp_() functions, and any new PP functions should be written this way from
scratch and thus not ever need wrapping.

The traditional PP stack API consisted of a C<dSP> declaration, plus a
number of macros to push, pop and extend the stack. A I<very simplified>
pp_add() function might look something like:

PP(pp_add)
{
dSP;
dTARGET;
IV right = SvIV(POPs);
IV left = SvIV(POPs);
TARGi(left + right, 1);
PUSHs(TARG);
PUTBACK;
return NORMAL;
}

which expands to something like:

{
SV **sp = PL_stack_sp;
Sv *targ = PAD_SV(PL_op->op_targ);
IV right = SvIV(*sp--);
IV left = SvIV(*sp--);
sv_setiv(targ, left + right);
*++sp = targ;
PL_stack_sp = sp;
return PL_op->op_next;
}

The whole C<dSP> thing harks back to the days before decent optimising
compilers. It was always error-prone, e.g. if you forgot a C<PUTBACK> or
C<SPAGAIN>. The new API always just accesses C<PL_stack_sp> directly. In
fact the first step of upgrading a PP function is always to remove the
C<dSP> declaration. This has the happy side effect that any old-style
macros left in the pp function which implicitly use C<sp> will become
compile errors. The existence of a C<dSP> somewhere in core is a good sign
that that function still needs updating.

An obvious question is: why not just modify the definitions of the PUSHs()
etc macros to modify reference counts on RC builds? The issue is that perl
can croak at basically any point in execution (e.g. the SvIV() above might
call FETCH() on a tied variable which then croaks). Thus at all times the
RC of each SV must be properly accounted for. For example, doing something
like

SV *sv = SvREFCNT_dec(*PL_stack--);
IV i = SvIV(sv);

means that C<sv> leaks if SvIV() triggers a croak. Also, SvIV() would be
accessing a freed SV if C<sv> had an RC of 1.

The new regime has the general outline that arguments are left on the
stack until they are finished with, then removed and their reference count
adjusted at that point. With the new API, the pp_add() function looks
something like:

{
dTARGET;
IV right = SvIV(PL_stack_sp[ 0]);
IV left = SvIV(PL_stack_sp[-1]);
TARGi(left + right, 1);
rpp_replace_2_1(targ);
return NORMAL;
}

The rpp_replace_2_1() function pops two values off the stack and pushes
one new value on, while adjusting reference counts as appropriate
(depending on whether built with C<PERL_RC_STACK> or not).

The rpp_() functions in the new API are as follows.

new function approximate old equivant
------------ -----------------------

rpp_extend(n) EXTEND(SP, n)

rpp_push_1(sv) PUSHs(sv)
rpp_push_2(sv1, sv2)) PUSHs(sv1); PUSHs(sv2)
rpp_xpush_1(sv) XPUSHs(sv)
rpp_xpush_2(sv1, sv2)) EXTEND(SP,2); PUSHs(sv1); PUSHs(sv2);

rpp_push_1_norc(sv) mPUSHs(sv) // on RC bulds, skips RC++;
// on non-RC builds, mortalises
rpp_popfree_1() (void)POPs;
rpp_popfree_2() (void)POPs; (void)POPs;
rpp_popfree_to(svp) PL_stack_sp = svp;
rpp_obliterate_stack_to(svp) // see description below

sv = rpp_pop_1_norc() sv = SvREFCNT_inc(POPs)

rpp_replace_1_1(sv) (void)POPs; PUSHs(sv);
rpp_replace_2_1(sv) (void)POPs; (void)POPs; PUSHs(sv);

rpp_try_AMAGIC_1() tryAMAGICun_MG()
rpp_try_AMAGIC_2() tryAMAGICbin_MG()

(no replacement) dATARGET // just write the macro body in full

Other new C and perl functions related to reference-counted stacks are:

push_stackinfo(type,rc) PUSHSTACKi(type)
pop_stackinfo() POPSTACK()
switch_argstack(to) SWITCHSTACK(from,to)

(Internals::stack_refcounted() & 1) # perl built with PERL_RC_STACK

Note that rpp_popfree_1() etc aren't direct replacements for C<POPs>. The
rpp_() variants don't return a value and are intended to be called when
the SV is finished with. So

SV *sv = POPs;
... do stuff with sv ...

becomes

SV *sv = *PL_stack_sp;
... do stuff with sv ...
rpp_popfree_1(); /* does SvREFCNT_dec(*PL_stack_sp--) */

The rpp_replace_M_N() functions are shortcuts for popping and freeing C<M>
items then pushing and bumping up the RCs of C<N> items. Note that they
handle edge cases such as an old and new SV being the same.

rpp_popfree_to(svp) is designed to replace code like

PL_stack_sp = PL_stack_base + cx->blk_oldsp;

which typically appears in list ops or scope exits when the arguments are
finished with. Left unaltered, all the SVs above C<oldsp> would leak. The
new approach is

rpp_popfree_to(PL_stack_base + cx->blk_oldsp);

There is a rarely-used variant of this, rpp_obliterate_stack_to(), which
pops the stack back to the specified index regardless of the current RC
state of the stack. So for example if the stack is split, it will only
adjust the RCs of any SVs which are below the split point, while
rpp_popfree_to() would mindlessly free I<all> SVs (on RC builds anyway).
For normal PP functions you should only ever use rpp_popfree_to(), which
is faster.

There are no new equivalents for all the convenience macros like POPi()
and (shudder) dPOPPOPiirl(). These should be replaced with the rpp_()
functions above and with the conversions and variable declarations being
made explicit, e.g. dPOPPOPiirl() becomes:

IV right = SvIV(PL_stack_sp[ 0]);
IV left = SvIV(PL_stack_sp[-1]);
rpp_popfree_2();

A couple of the rpp_() functions with C<norc> in their names don't
adjust the reference count on RC builds, but do on non-RC builds (so the
reverse of most C<rpp> functions).

rpp_push_1_norc(sv) does a simple C<*++PL_stack_sp = sv> on RC builds. It
is typically used to "root" a newly-created SV, which already has an RC of
1. On non-RC builds it mortalises the SV instead. So for example, code
which used to look like

mPUSHs(newSViv(i));

which expanded to the equivalent of:

PUSHs(sv_2mortal(newSViv(i));

should be rewritten as:

rpp_push_1_norc(newSViv(i));

This is because when the stack is reference-counted, rooting the SV on the
stack is sufficient; there's no longer a need to root it via the temps
stack.

Similarly, on RC builds, C<sv = rpp_pop_1_norc()> does a simple
C<sv = *PL_stack_sv--> without adjusting the reference count, while on
non-RC builds it actually increments the SV's reference count. It is
intended for cases where you immediately want to increment the reference
count again after popping, e.g. where the SV is to be immediately embedded
somewhere. For example this code:

SV *sv = PL_stack_sp[0];
SvREFCNT_inc(sv);
av_store(av, i, sv); /* in real life should check return value */
rpp_popfree_1();

can be more efficiently written as

av_store(av, i, rpp_pop_1_norc());

By using this function, the code works correctly on both RC and non-RC
builds.

The macros which appear at the start of many PP functions to check for
unary or binary op overloading (among other things) have been replaced
with rpp_try_AMAGIC_1() and _2() inline functions, which now rely on the
calling PP function to choose whether to return immediately rather than
the return being hidden away in the macro.

In the spirit of hiding away less in macros, C<dATARGET> hasn't been given
a replacement; where its effect is needed, it is now written out in full;
see pp_add() for an example.

Finally, a couple of rpp() functions provide information rather than
manipulate the stack.

rpp_is_lone(sv) indicates whether C<sv>, assumed to be still on the stack,
it kept alive only by a reference-counted pointer from the argument and/or
temps stacks, and thus is a candidate for some optimisations (like
skipping the copying of return arguments from a subroutine call).

rpp_stack_is_rc() indicates whether the current stack is currently
reference-counted. It's used mainly in a few places like call_sv() which
can be called from anywhere, and thus have to deal with both cases.

So for example, rather than using rpp_xpush_1(), call_sv() has lines like:

rpp_extend(1);
*++PL_stack_sp = sv;
#ifdef PERL_RC_STACK
if (rpp_stack_is_rc())
SvREFCNT_inc_simple_void_NN(sv);
#endif

which works on both standard builds and RC builds, and works whether
call_sv() is called from a standard PP function (rpp_stack_is_rc() is
true) or from a wrapped PP or XS function (rpp_stack_is_rc() is false).
Note that you're unlikely to need to use this function, as in most places,
such as PP or XS functions, it is always RC or non-RC respectively. In
fact, under C<DEBUG_LEAKING_SCALARS>, PUSHs() and similar macros include
an C<assert(!rpp_stack_is_rc())>, while rpp_push_1() and similar functions
have C<assert(rpp_stack_is_rc())>.

The macros for pushing new stackinfos have been replaced with inline
functions which don't rely on C<dSP> being in scope, and which have less
ambiguous names: they make it clear that a new I<stackinfo> is being
pushed, rather than just some sort of I<stack>. push_stackinfo() also has
a boolean argument indicating whether the new argument stack should be
reference-counted or not. For backwards compatibility, PUSHSTACKi(type) is
defined to be push_stackinfo(type, 0).

Some test scripts check for things like leaks by testing that the
reference count of a particular variable has an expected value. If this
is different on a perl built with C<PERL_RC_STACK>, then the perl
function Internals::stack_refcounted() can be used. This returns an
integer, the lowest bit of which indicates that perl was built with
C<PERL_RC_STACK>. Other bits are reserved for future use and should be
masked out.
real PP function, S_Perl_pp_foo_norc(). That function processes A, B and C,
pops them off the stack, and pushes some result SVs. None of this
manipulation adjusts any RCs. On return to pp_wrap(), the stack may look
something like:

... A+ B+ C+ | X0 Y0

The wrapper function bumps up the RCs of X and Y, decrements A B C,
shifts the results down and sets C<si_stack_nonrc_base> to zero, leaving
the stack as:

... X+ Y+

In places like pp_entersub(), a similar wrapping (via the function
xs_wrap()) is done when calling XS subs.

A complex calling environment might have multiple nested stacks with
different RC states. Perl starts off with an RC stack. Then for example,
pp_entersub() is called, which (via xs_wrap()) splits the stack and
executes the XS function in a non-RC environment. That function may call
PUSHSTACKi(), which creates a new non-RC stack, then calls call_sv(), which
does CALLRUNOPS(), which causes the new stack to temporarily become RC.
Then a tied method is called, which pushes a new RC stack, and so on. (XXX
currently tied methods actually push a non-RC stack. To be fixed soon).

=head2 (Re)writing a PP function using the rpp_ API

Wrapping a pp function has a performance overhead, and is there mainly as
a temporary crutch. Eventually, PP functions should be updated to use
rpp_() functions, and any new PP functions should be written this way from
scratch and thus not ever need wrapping.

The traditional PP stack API consisted of a C<dSP> declaration, plus a
number of macros to push, pop and extend the stack. A I<very simplified>
pp_add() function might look something like:

PP(pp_add)
{
dSP;
dTARGET;
IV right = SvIV(POPs);
IV left = SvIV(POPs);
TARGi(left + right, 1);
PUSHs(TARG);
PUTBACK;
return NORMAL;
}

which expands to something like:

{
SV **sp = PL_stack_sp;
Sv *targ = PAD_SV(PL_op->op_targ);
IV right = SvIV(*sp--);
IV left = SvIV(*sp--);
sv_setiv(targ, left + right);
*++sp = targ;
PL_stack_sp = sp;
return PL_op->op_next;
}

The whole C<dSP> thing harks back to the days before decent optimising
compilers. It was always error-prone, e.g. if you forgot a C<PUTBACK> or
C<SPAGAIN>. The new API always just accesses C<PL_stack_sp> directly. In
fact the first step of upgrading a PP function is always to remove the
C<dSP> declaration. This has the happy side effect that any old-style
macros left in the pp function which implicitly use C<sp> will become
compile errors. The existence of a C<dSP> somewhere in core is a good sign
that that function still needs updating.

An obvious question is: why not just modify the definitions of the PUSHs()
etc macros to modify reference counts on RC builds? The issue is that perl
can croak at basically any point in execution (e.g. the SvIV() above might
call FETCH() on a tied variable which then croaks). Thus at all times the
RC of each SV must be properly accounted for. For example, doing something
like

SV *sv = SvREFCNT_dec(*PL_stack--);
IV i = SvIV(sv);

means that C<sv> leaks if SvIV() triggers a croak. Also, SvIV() would be
accessing a freed SV if C<sv> had an RC of 1.

The new regime has the general outline that arguments are left on the
stack until they are finished with, then removed and their reference count
adjusted at that point. With the new API, the pp_add() function looks
something like:

{
dTARGET;
IV right = SvIV(PL_stack_sp[ 0]);
IV left = SvIV(PL_stack_sp[-1]);
TARGi(left + right, 1);
rpp_replace_2_1(targ);
return NORMAL;
}

The rpp_replace_2_1() function pops two values off the stack and pushes
one new value on, while adjusting reference counts as appropriate
(depending on whether built with C<PERL_RC_STACK> or not).

The rpp_() functions in the new API are as follows.

new function approximate old equivant
------------ -----------------------

rpp_extend(n) EXTEND(SP, n)

rpp_push_1(sv) PUSHs(sv)
rpp_push_2(sv1, sv2)) PUSHs(sv1); PUSHs(sv2)
rpp_xpush_1(sv) XPUSHs(sv)
rpp_xpush_2(sv1, sv2)) EXTEND(SP,2); PUSHs(sv1); PUSHs(sv2);

rpp_push_1_norc(sv) mPUSHs(sv) // on RC bulds, skips RC++;
// on non-RC builds, mortalises
rpp_popfree_1() (void)POPs;
rpp_popfree_2() (void)POPs; (void)POPs;
rpp_popfree_to(svp) PL_stack_sp = svp;
rpp_obliterate_stack_to(svp) // see description below

sv = rpp_pop_1_norc() sv = SvREFCNT_inc(POPs)

rpp_replace_1_1(sv) (void)POPs; PUSHs(sv);
rpp_replace_2_1(sv) (void)POPs; (void)POPs; PUSHs(sv);

rpp_try_AMAGIC_1() tryAMAGICun_MG()
rpp_try_AMAGIC_2() tryAMAGICbin_MG()

(no replacement) dATARGET // just write the macro body in full

Other new C and perl functions related to reference-counted stacks are:

push_stackinfo(type,rc) PUSHSTACKi(type)
pop_stackinfo() POPSTACK()
switch_argstack(to) SWITCHSTACK(from,to)

(Internals::stack_refcounted() & 1) # perl built with PERL_RC_STACK

Note that rpp_popfree_1() etc aren't direct replacements for C<POPs>. The
rpp_() variants don't return a value and are intended to be called when
the SV is finished with. So

SV *sv = POPs;
... do stuff with sv ...

becomes

SV *sv = *PL_stack_sp;
... do stuff with sv ...
rpp_popfree_1(); /* does SvREFCNT_dec(*PL_stack_sp--) */

The rpp_replace_M_N() functions are shortcuts for popping and freeing C<M>
items then pushing and bumping up the RCs of C<N> items. Note that they
handle edge cases such as an old and new SV being the same.

rpp_popfree_to(svp) is designed to replace code like

PL_stack_sp = PL_stack_base + cx->blk_oldsp;

which typically appears in list ops or scope exits when the arguments are
finished with. Left unaltered, all the SVs above C<oldsp> would leak. The
new approach is

rpp_popfree_to(PL_stack_base + cx->blk_oldsp);

There is a rarely-used variant of this, rpp_obliterate_stack_to(), which
pops the stack back to the specified index regardless of the current RC
state of the stack. So for example if the stack is split, it will only
adjust the RCs of any SVs which are below the split point, while
rpp_popfree_to() would mindlessly free I<all> SVs (on RC builds anyway).
For normal PP functions you should only ever use rpp_popfree_to(), which
is faster.

There are no new equivalents for all the convenience macros like POPi()
and (shudder) dPOPPOPiirl(). These should be replaced with the rpp_()
functions above and with the conversions and variable declarations being
made explicit, e.g. dPOPPOPiirl() becomes:

IV right = SvIV(PL_stack_sp[ 0]);
IV left = SvIV(PL_stack_sp[-1]);
rpp_popfree_2();

A couple of the rpp_() functions with C<norc> in their names don't
adjust the reference count on RC builds, but do on non-RC builds (so the
reverse of most C<rpp> functions).

rpp_push_1_norc(sv) does a simple C<*++PL_stack_sp = sv> on RC builds. It
is typically used to "root" a newly-created SV, which already has an RC of
1. On non-RC builds it mortalises the SV instead. So for example, code
which used to look like

mPUSHs(newSViv(i));

which expanded to the equivalent of:

PUSHs(sv_2mortal(newSViv(i));

should be rewritten as:

rpp_push_1_norc(newSViv(i));

This is because when the stack is reference-counted, rooting the SV on the
stack is sufficient; there's no longer a need to root it via the temps
stack.

Similarly, on RC builds, C<sv = rpp_pop_1_norc()> does a simple
C<sv = *PL_stack_sv--> without adjusting the reference count, while on
non-RC builds it actually increments the SV's reference count. It is
intended for cases where you immediately want to increment the reference
count again after popping, e.g. where the SV is to be immediately embedded
somewhere. For example this code:

SV *sv = PL_stack_sp[0];
SvREFCNT_inc(sv);
av_store(av, i, sv); /* in real life should check return value */
rpp_popfree_1();

can be more efficiently written as

av_store(av, i, rpp_pop_1_norc());

By using this function, the code works correctly on both RC and non-RC
builds.

The macros which appear at the start of many PP functions to check for
unary or binary op overloading (among other things) have been replaced
with rpp_try_AMAGIC_1() and _2() inline functions, which now rely on the
calling PP function to choose whether to return immediately rather than
the return being hidden away in the macro.

In the spirit of hiding away less in macros, C<dATARGET> hasn't been given
a replacement; where its effect is needed, it is now written out in full;
see pp_add() for an example.

Finally, a couple of rpp() functions provide information rather than
manipulate the stack.

rpp_is_lone(sv) indicates whether C<sv>, assumed to be still on the stack,
it kept alive only by a reference-counted pointer from the argument and/or
temps stacks, and thus is a candidate for some optimisations (like
skipping the copying of return arguments from a subroutine call).

rpp_stack_is_rc() indicates whether the current stack is currently
reference-counted. It's used mainly in a few places like call_sv() which
can be called from anywhere, and thus have to deal with both cases.

So for example, rather than using rpp_xpush_1(), call_sv() has lines like:

rpp_extend(1);
*++PL_stack_sp = sv;
#ifdef PERL_RC_STACK
if (rpp_stack_is_rc())
SvREFCNT_inc_simple_void_NN(sv);
#endif

which works on both standard builds and RC builds, and works whether
call_sv() is called from a standard PP function (rpp_stack_is_rc() is
true) or from a wrapped PP or XS function (rpp_stack_is_rc() is false).
Note that you're unlikely to need to use this function, as in most places,
such as PP or XS functions, it is always RC or non-RC respectively. In
fact, under C<DEBUG_LEAKING_SCALARS>, PUSHs() and similar macros include
an C<assert(!rpp_stack_is_rc())>, while rpp_push_1() and similar functions
have C<assert(rpp_stack_is_rc())>.

The macros for pushing new stackinfos have been replaced with inline
functions which don't rely on C<dSP> being in scope, and which have less
ambiguous names: they make it clear that a new I<stackinfo> is being
pushed, rather than just some sort of I<stack>. push_stackinfo() also has
a boolean argument indicating whether the new argument stack should be
reference-counted or not. For backwards compatibility, PUSHSTACKi(type) is
defined to be push_stackinfo(type, 0).

Some test scripts check for things like leaks by testing that the
reference count of a particular variable has an expected value. If this
is different on a perl built with C<PERL_RC_STACK>, then the perl
function Internals::stack_refcounted() can be used. This returns an
integer, the lowest bit of which indicates that perl was built with
C<PERL_RC_STACK>. Other bits are reserved for future use and should be
masked out.

========================================================================

--
This email is confidential, and now that you have read it you are legally
obliged to shoot yourself. Or shoot a lawyer, if you prefer. If you have
received this email in error, place it in its original wrapping and return
for a full refund. By opening this email, you accept that Elvis lives.

RE: PERL_RC_STACK branch: first cut [ In reply to ]

perl5-porters at perl

Feb 26, 2023, 11:39 PM

Post #2 of 36 (651 views)