Mailing List Archive

defined $a, where $a is actually a HASH
The summary - I think that this change is correct and should stay.


The details - I did this. It's in blead. The commit message explains the
reasoning. The code removed is deadweight, and likely contains 2 (more)
branches, which are one of the banes of performant code:


commit 2517717a8902648de577d07f548caba6e3f2ecea
Author: Nicholas Clark <nick@ccl4.org>
Date: Mon Jul 19 07:01:15 2021 +0000

The cases for SVt_PVAV and SVt_PVHV in pp_defined are unreachable.

Remove them, and hit to the C compiler that it's unlikely that someone used
`defined` on a subroutine.

These have been unreachable since `defined @array` and `defined %hash`
became syntax errors. Whilst the same PP code is used for // and //=,
expressions such as`@a // @b` put the left array (or hash) in scalar
context, meaning that it always returns a define value.
(Should we warn on these?)

diff --git a/pp_hot.c b/pp_hot.c
index c693b301bb..fcc5e14454 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1368,24 +1368,14 @@ PP(pp_defined)
}

defined = FALSE;
- switch (SvTYPE(sv)) {
- case SVt_PVAV:
- if (AvMAX(sv) >= 0 || SvGMAGICAL(sv) || (SvRMAGICAL(sv) && mg_find(sv, PERL_MAGIC_tied)))
- defined = TRUE;
- break;
- case SVt_PVHV:
- if (HvARRAY(sv) || SvGMAGICAL(sv) || (SvRMAGICAL(sv) && mg_find(sv, PERL_MAGIC_tied)))
- defined = TRUE;
- break;
- case SVt_PVCV:
+ if (UNLIKELY(SvTYPE(sv) == SVt_PVCV)) {
if (CvROOT(sv) || CvXSUB(sv))
defined = TRUE;
- break;
- default:
+ }
+ else {
SvGETMAGIC(sv);
if (SvOK(sv))
defined = TRUE;
- break;
}

if (is_dor) {




It turns out that it *is* reached by (at least) one testcase on CPAN:

https://metacpan.org/release/SYBER/XS-Framework-1.5.0/source/t/upgrade.t#L14

# undefs to HV
$obj = bless \(my $a = undef), 'AAA';
XS::Framework::obj2hv($obj);
ok($obj->{key} = 1);
is($obj->{key}, 1);
ok defined $a;


Note, at this point, $a *isn't* a scalar. It's a hash. *Not* a reference to
a hash. An actual hash.

If I add:

use Devel::Peek; Dump ($a);
my $copy = $a;


I get this:


$ /home/nick/Sandpit/snap-v5.35.2-247-gc243917ec3-g/bin/perl5.35.3 -Mblib t/upgrade.t
ok 1
ok 2
not ok 3
# Failed test at t/upgrade.t line 14.
SV = PVHV(0x562e3f5f8fa0) at 0x562e3f706ff8
REFCNT = 2
FLAGS = (OBJECT,SHAREKEYS)
STASH = 0x562e3f60faf0 "AAA"
ARRAY = 0x562e3f625db0 (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
Elt "key" HASH = 0x2f975654
SV = IV(0x562e3f60fb88) at 0x562e3f60fb98
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
Bizarre copy of HASH in scalar assignment at t/upgrade.t line 16.
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 255 just after 3.



Given that both:

1) You can only create this situation from XS code
2) What is created is *not* legal in Perl-space


I don't think that what I changed was "supported" behaviour (or even
"specified").

Hence I think that the behaviour in blead should stay, and the distribution
should change this line in this regression test, to express its testing
intent in some other way.

I'm not quite sure *what* the defined test is needed for, as there's this
test for `defined $a` after the call to obj2hv() but no corresponding call
after obj2av(). Maybe it should just go. Otherwise, taking a reference to $a
and then using Scalar::Util::reftype on that exposes the storage type
(which changes from 'SCALAR' to 'HASH') - ie:

$ diff -u t/upgrade.t.orig t/upgrade.t
--- t/upgrade.t.orig 2021-07-13 11:02:17.000000000 +0000
+++ t/upgrade.t 2021-08-19 13:42:54.401902362 +0000
@@ -3,15 +3,17 @@
use Test::More;
use Test::Deep;
use XS::Framework;
+use Scalar::Util qw(reftype);

my $obj;

# undefs to HV
$obj = bless \(my $a = undef), 'AAA';
+is(reftype(\$a), 'SCALAR');
XS::Framework::obj2hv($obj);
ok($obj->{key} = 1);
is($obj->{key}, 1);
-ok defined $a;
+is(reftype(\$a), 'HASH');
XS::Framework::obj2hv($obj);


Nicholas Clark
Re: defined $a, where $a is actually a HASH [ In reply to ]
Nicholas Clark <nick@ccl4.org> wrote:
:The summary - I think that this change is correct and should stay.
:
:
:The details - I did this. It's in blead. The commit message explains the
:reasoning. The code removed is deadweight, and likely contains 2 (more)
:branches, which are one of the banes of performant code:
[...]
: These have been unreachable since `defined @array` and `defined %hash`
: became syntax errors.

I think it would be wise to have an assert under DEBUGGING, especially
since you have already discovered that they have _not_ been unreachable.

: Whilst the same PP code is used for // and //=,
: expressions such as`@a // @b` put the left array (or hash) in scalar
: context, meaning that it always returns a define value.
: (Should we warn on these?)

If C<defined(@a)> is a syntax error, then I think C<@a // @b> should
be too: I feel C<A // B> should as far as possible be identical to
C<defined(A) ? A : B> except for the double-evaluation of A.

Even better would be making C<@a || @b> equivalent to C<@a ? @a : @b>)
(even in list context), and parallel-ly C<@a && @b> to C<!@a ? @a : @b>.

@neilb: this probably belongs in the ongoing PSC quirks discussion,
if it isn't already there.

Hugo
Re: defined $a, where $a is actually a HASH [ In reply to ]
> : These have been unreachable since `defined @array` and `defined %hash`
> : became syntax errors.
>


looks like I missed several meetings. I still expect

no strict;
print (defined(@arr) ? 1 : 2);
@arr = (1);
print (defined(@arr) ? 1 : 2);
print (defined(%h) ? 1 : 2);
%h = (1,2);
print (defined(%h) ? 1 : 2);
print "\n";

to print 2121.

I have apparently never needed to do that, but the new thing is something
like *h->{HASH}, right?


--
"Lay off that whiskey, and let that cocaine be!" -- Johnny Cash
Re: defined $a, where $a is actually a HASH [ In reply to ]
David Nicol <davidnicol@gmail.com> writes:

>> : These have been unreachable since `defined @array` and `defined %hash`
>> : became syntax errors.
>>
>
>
> looks like I missed several meetings. I still expect
>
> no strict;
> print (defined(@arr) ? 1 : 2);
> @arr = (1);
> print (defined(@arr) ? 1 : 2);
> print (defined(%h) ? 1 : 2);
> %h = (1,2);
> print (defined(%h) ? 1 : 2);
> print "\n";
>
> to print 2121.

That's been deprecated since 5.6:

https://perldoc.perl.org/perl56delta#defined(@array)-is-deprecated

and became a syntax error in 5.22:

https://perldoc.perl.org/perl5220delta#defined(@array)-and-defined(%25hash)-are-now-fatal-errors

> I have apparently never needed to do that, but the new thing is something
> like *h->{HASH}, right?

What new thing?

The thing Nicholas mentioned breaking is XS code storing a hash or array
(PVHV or PVAV), not a reference to one (an IV with SVf_ROK), in a scalar
variable, and then doing defined($foo), which is not a valid thing to do.

- ilmari
Re: defined $a, where $a is actually a HASH [ In reply to ]
On Thu, Aug 19, 2021 at 04:35:00PM -0500, David Nicol wrote:
> > : These have been unreachable since `defined @array` and `defined %hash`
> > : became syntax errors.
> >
>
>
> looks like I missed several meetings. I still expect
>
> no strict;
> print (defined(@arr) ? 1 : 2);
> @arr = (1);
> print (defined(@arr) ? 1 : 2);
> print (defined(%h) ? 1 : 2);
> %h = (1,2);
> print (defined(%h) ? 1 : 2);
> print "\n";
>
> to print 2121.
>
> I have apparently never needed to do that, but the new thing is something
> like *h->{HASH}, right?

C<< defined AGGREGATE >> returned information about the internal state
of the aggregate, and wasn't about whether there was a glob or not.

It worked for lexicals:

$ ~/perl/5.10.0-debug/bin/perl -le 'my @x; print defined @x; @x = 1; print defined @x; @x = (); print defined @x'

1
1

but it was misleading, since while lexicals do get initialized when
introduced, that internal state isn't reset:

$ ~/perl/5.10.0-debug/bin/perl -le 'for (1,2) { my @x; print defined @x; @x = 1; print defined @x; @x = (); print defined @x; }'

1
1
1
1
1

The equivalent would be something like:

use B;
my $x = B::svref_2object(\@x);
my $def = $x->MAX >= 0;

but why do that?

Tony
Re: defined $a, where $a is actually a HASH [ In reply to ]
On Thu, Aug 19, 2021 at 05:18:10PM +0100, hv@crypt.org wrote:
> Nicholas Clark <nick@ccl4.org> wrote:
> :The summary - I think that this change is correct and should stay.
> :
> :
> :The details - I did this. It's in blead. The commit message explains the
> :reasoning. The code removed is deadweight, and likely contains 2 (more)
> :branches, which are one of the banes of performant code:
> [...]
> : These have been unreachable since `defined @array` and `defined %hash`
> : became syntax errors.
>
> I think it would be wise to have an assert under DEBUGGING, especially
> since you have already discovered that they have _not_ been unreachable.

Good point. I did this:

commit 034242a8a539b024648b18318271eabddf40ca48
Author: Nicholas Clark <nick@ccl4.org>
Date: Wed Sep 8 08:14:28 2021 +0000

In pp_defined assert that the SV is not a hash or array.

The code that handled hashes and arrays was removed by commit 2517717a8902:
The cases for SVt_PVAV and SVt_PVHV in pp_defined are unreachable.

Remove them, and hit to the C compiler that it's unlikely that someone used
`defined` on a subroutine.

These have been unreachable since `defined @array` and `defined %hash`
became syntax errors. Whilst the same PP code is used for // and //=,
expressions such as`@a // @b` put the left array (or hash) in scalar
context, meaning that it always returns a define value.
(Should we warn on these?)

However, it turns out that that the removed code was reachable by XS code
generating data structures that would be "illegal" in pure Perl (eg also
triggering "Bizarre copy of ..." errors). Hence with -DDEBUGGING, add logic
to report problematic cases, instead of silently continuing with changed
behaviour.

diff --git a/pp_hot.c b/pp_hot.c
index 17683ff000..43c61bd272 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -1365,6 +1365,15 @@ PP(pp_defined)
RETSETNO;
}

+ /* Historically what followed was a switch on SvTYPE(sv), handling SVt_PVAV,
+ * SVt_PVCV, SVt_PVHV and "default". `defined &sub` is still valid syntax,
+ * hence we still need the special case PVCV code. But AVs and HVs now
+ * should never arrive here... */
+#ifdef DEBUGGING
+ assert(SvTYPE(sv) != SVt_PVAV);
+ assert(SvTYPE(sv) != SVt_PVHV);
+#endif
+
if (UNLIKELY(SvTYPE(sv) == SVt_PVCV)) {
if (CvROOT(sv) || CvXSUB(sv))
defined = TRUE;


Nicholas Clark
Re: defined $a, where $a is actually a HASH [ In reply to ]
On Wed, 8 Sep 2021 09:56:47 +0000
Nicholas Clark <nick@ccl4.org> wrote:

> +#ifdef DEBUGGING
> + assert(SvTYPE(sv) != SVt_PVAV);
> + assert(SvTYPE(sv) != SVt_PVHV);
> +#endif

(as per IRC): Pretty sure you don't need to #ifdef that, because
assert() does nothing on non-DEBUGGING builds anyway.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/