On 3/28/24 07:05, Max via perl5-porters wrote:
> Dear Perl 5 Porters,
>
> I am updating from Perl 5.26.1 to 5.38.2. These versions behave
> differently when interpolating strings. While 5.26.1 evaluates each
> @{[ $Var++ ]} from left to right, one after another, 5.38.2 seems to
> evaluate every @{[ $Var++ ]} first and construct the string afterwards,
> returning a wrong value at the point where the first $Var is
> interpolated.
>
> ############### TestInterpolateString.pl ####################
> $Var = 0;
> $String = "Interpolated string: @{[ $Var++ ]} $Var @{[ $Var++ ]} $Var";
> print $String . "\n";
> 1;
> ######################################################
>
> $ /opt/perl/5.26.1/bin/perl TestInterpolateString.pl
> Interpolated string: 0 1 1 2
> $ /opt/perl/5.38.2/bin/perl TestInterpolateString.pl
> Interpolated string: 0 2 1 2
> $ /opt/perl/5.34.0/bin/perl TestInterpolateString.pl
> Interpolated string: 0 2 1 2
> ######################################################
>
> Would you agree that this is a bug in 5.38.2, or is the change
> intended? I prefer the 5.26.1 behavior, as it is more logical and
> allows consecutive changes within the string by each @{[ ]}
> expression. Do you think future Perl versions will fall back to the
> behavior of Perl 5.26.1?
>
> Thank you in advance.
>
> Sincerely,
> Max
>
The behavior you mentioned changed between 5.26 and 5.28 in the
following commit, which introduced the multiconcat op:
#####
$ gitshowf e839e6ed99c6b25aee589f56bb58de2f8fa00f41
commit e839e6ed99c6b25aee589f56bb58de2f8fa00f41
Author: David Mitchell <davem@iabyn.nospamdeletethisbit.com>
AuthorDate: Tue Aug 8 18:42:14 2017 +0100
Commit: David Mitchell <davem@iabyn.nospamdeletethisbit.com>
CommitDate: Tue Oct 31 15:31:26 2017 +0000

Add OP_MULTICONCAT op

Allow multiple OP_CONCAT, OP_CONST ops, plus optionally an OP_SASSIGN
or OP_STRINGIFY, to be combined into a single OP_MULTICONCAT op, which can
make things a *lot* faster: 4x or more.

In more detail: it will optimise into a single OP_MULTICONCAT, most
expressions of the form

    LHS RHS

where LHS is one of

    (empty)
    my $lexical =
    $lexical =
    $lexical .=
    expression =
    expression .=

and RHS is one of

    (A . B . C . ...)            where A,B,C etc are expressions and/or
                                 string constants

    "aAbBc..."                   where a,A,b,B etc are expressions and/or
                                 string constants

    sprintf "..%s..%s..", A,B,.. where the format is a constant string
                                 containing only '%s' and '%%' elements,
                                 and A,B, etc are scalar expressions (so
                                 only a fixed, compile-time-known number of
                                 args: no arrays or list context function
                                 calls etc)

It doesn't optimise other forms, such as

    ($a . $b) . ($c. $d)

    ((($a .= $b) .= $c) .= $d);

(although sub-parts of those expressions might be converted to an
OP_MULTICONCAT). This is partly because it would be hard to maintain the
correct ordering of tie or overload calls.

The compiler uses heuristics to determine when to convert: in general,
expressions involving a single OP_CONCAT aren't converted, unless some
other saving can be made, for example if an OP_CONST can be eliminated, or
in the presence of 'my $x = .. ' which OP_MULTICONCAT can apply
OPpTARGET_MY to, but OP_CONST can't.

The multiconcat op is of type UNOP_AUX, with the op_aux structure directly
holding a pointer to a single constant char* string plus a list of segment
lengths. So for

    "a=$a b=$b\n";

the constant string is "a= b=\n", and the segment lengths are (2,3,1).

If the constant string has different non-utf8 and utf8 representations
(such as "\x80") then both variants are pre-computed and stored in the aux
struct, along with two sets of segment lengths.

For all the above LHS types, any SASSIGN op is optimised away. For a LHS
of '$lex=', '$lex.=' or 'my $lex=', the PADSV is optimised away too.

For example where $a and $b are lexical vars, this statement:

    my $c = "a=$a, b=$b\n";

formerly compiled to

    const[PV "a="] s
    padsv[$a:1,3] s
    concat[t4] sK/2
    const[PV ", b="] s
    concat[t5] sKS/2
    padsv[$b:1,3] s
    concat[t6] sKS/2
    const[PV "\n"] s
    concat[t7] sKS/2
    padsv[$c:2,3] sRM*/LVINTRO
    sassign vKS/2

and now compiles to:

    padsv[$a:1,3] s
    padsv[$b:1,3] s
    multiconcat("a=, b=\n",2,4,1)[$c:2,3] vK/LVINTRO,TARGMY,STRINGIFY

In terms of how much faster it is, this code:

    my $a = "the quick brown fox jumps over the lazy dog";
    my $b = "to be, or not to be; sorry, what was the question again?";

    for my $i (1..10_000_000) {
        my $c = "a=$a, b=$b\n";
    }

runs 2.7 times faster, and if you throw utf8 mixtures in it gets even
better. This loop runs 4 times faster:

    my $s;
    my $a = "ab\x{100}cde";
    my $b = "fghij";
    my $c = "\x{101}klmn";

    for my $i (1..10_000_000) {
        $s = "\x{100}wxyz";
        $s .= "foo=$a bar=$b baz=$c";
    }

The main ways in which OP_MULTICONCAT gains its speed are:

* any OP_CONSTs are eliminated, and the constant bits (already in the
  right encoding) are copied directly from the constant string attached to
  the op's aux structure.

* It optimises away any SASSIGN op, and possibly a PADSV op on the LHS, in
  all cases; OP_CONCAT only did this in very limited circumstances.

* Because it has a holistic view of the entire concatenation expression,
  it can do the whole thing in one efficient go, rather than creating and
  copying intermediate results. pp_multiconcat() goes to considerable
  efforts to avoid inefficiencies. For example it will only SvGROW() the
  target once, and to the exact size needed, no matter what mix of utf8
  and non-utf8 appear on the LHS and RHS. It never allocates any
  temporary SVs except possibly in the case of tie or overloading.

* It does all its own appending and utf8 handling rather than calling
  out to functions like sv_catsv().

* It's very good at handling the LHS appearing on the RHS; for example in

      $x = "abcd";
      $x = "-$x-$x-";

  It will do roughly the equivalent of the following (where targ is $x);

      SvPV_force(targ);
      SvGROW(targ, 11);
      p = SvPVX(targ);
      Move(p, p+1, 4, char);
      Copy("-", p, 1, char);
      Copy("-", p+5, 1, char);
      Copy(p+1, p+6, 4, char);
      Copy("-", p+10, 1, char);
      SvCUR(targ) = 11;
      p[11] = '\0';

  Formerly, pp_concat would have used multiple PADTMPs or temporary SVs to
  handle situations like that.

The code is quite big; both S_maybe_multiconcat() and pp_multiconcat()
(the main compile-time and runtime parts of the implementation) are over
700 lines each. It turns out that when you combine multiple ops, the
number of edge cases grows exponentially ;-)
#####
We certainly haven't had this described as a bug until now, but I'll let
Dave Mitchell and others comment further.