Mailing List Archive

PV vs NV/IV distinctions - crowd-testing required
*dons the PSC hat once again...*

It is currently the case that the way Perl turns scalars into strings
will cause both the POK ("has a string") and the IOK or NOK
("has some kind of number") flags to be set. Both flags are also set
when Perl tries to use a value that was previously a string as a
number. This means that in the following scenario, both values come out
identical:

my $was_POK = "10";
my $was_IOK = 10;

print "The values are $was_POK, $was_IOK"
if $was_POK == 10 and $was_IOK == 10;

# At this point, both scalars have both POK and IOK flags
## (Actually for the nitpickers, this may be the NOK flag, to be
## honest I'm not entirely sure but the distinction isn't important
## today)
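
For anyone who wants to look at the flags directly, here is a minimal
sketch using the core B module. The helper name public_flags is made up
for this illustration, and which flags show up on $was_IOK depends on
whether the perl you run it on already contains the change discussed
below.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list the public "OK" flags set on a scalar.
sub public_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @names;
    push @names, 'IOK' if $flags & B::SVf_IOK();
    push @names, 'NOK' if $flags & B::SVf_NOK();
    push @names, 'POK' if $flags & B::SVf_POK();
    return join ',', @names;
}

my $was_POK = "10";
my $was_IOK = 10;

# Use the number as a string (interpolation) and the string as a number.
my $text = "The values are $was_POK, $was_IOK";
my $both = ($was_POK == 10 and $was_IOK == 10);

print "was_POK: ", public_flags(\$was_POK), "\n";
print "was_IOK: ", public_flags(\$was_IOK), "\n";
```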


Nicholas Clark has a PR that makes a small change to the internal Perl
function that stringifies a scalar:

https://github.com/Perl/perl5/pull/18958

The upshot of this is to change the way the flags are set on the SV
after this stringification, so that the POK flag is no longer set.
While this change is (currently) invisible from pure-Perl code, it makes
an important distinction at the XS level, for example allowing things
like serialisation or encoding functions to tell the difference between
"the programmer intends this to be a string" and "the programmer
intends this to be a number".

This seems like a good idea going forward. With this change alone,
XS modules for things like JSON serialisation can use this new
information to produce less surprising output. Values that programmers
expect to be numbers will still be numbers even after you try to
debug-print them and they get
stringified. They become much less fragile.
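
As an illustration of the fragility in question, the sketch below
encodes the same scalar before and after a debug-style interpolation.
It uses the core JSON::PP module only because it ships with perl; the
variable names are invented for the example. Whether the second line
shows a number or a quoted string depends on the perl and JSON::PP
versions in use, so no output is promised here.

```perl
use strict;
use warnings;
use JSON::PP ();

my $json = JSON::PP->new->canonical;

my $count  = 10;
my $before = $json->encode([$count]);

# A debug print is enough to stringify the scalar and cache the result.
my $debug = "count is $count";

my $after = $json->encode([$count]);

print "before: $before\n";
print "after:  $after\n";
```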

We've been slightly hesitant to merge this though, because such a
change does have the potential for some far-reaching consequences.
There is no doubt code around somewhere - on CPAN or the DarkPAN -
which may now behave differently because of it. Indeed, some may say
this is the point - we can now begin to make a distinction between
"intended as a string" and "intended as a number" where previously we
could not. It is an area which needs further exploration.

Already some modules have been tested against this branch and found to
be working fine:

https://github.com/Perl/perl5/pull/18958#issuecomment-874044458

We'd like more input though. If anyone has some code that is likely to
care about this distinction (primarily we're thinking data
serialisation and similar, but no doubt it'll pop up in other places
too), could you test it and let us know? Or, at the very least, point us
in the direction of some code that we can test for you.

It would be nice to get this in place, because it would be the first
step towards having some sort of Perl-visible distinction in these
different programmer intents. That's a longer and more interesting
discussion for another time though. For now we just want to know:

Will PR #18958 cause any breakage? Is it a good idea to merge it now?

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: PV vs NV/IV distinctions - crowd-testing required
On Thu, Jul 22, 2021 at 9:41 PM Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

>
> Nicholas Clark has a PR that makes a small change to the internal Perl
> function that stringifies a scalar:
>
> https://github.com/Perl/perl5/pull/18958
>
> The upshot of this is to change the way the flags are set on the SV
> after this stringification, so that the POK flag is no longer set.
>
>
But what if it's the PV slot that holds the numeric value we want to use ?
Doesn't it get lost if the POK flag is unset ?

In the following contrived example, the PV , NV and IV slots all end up
with different (unique) values.
But it's the value held in the PV slot that is the one we're interested in.

###############################
use strict;
use warnings;
use Math::BigInt;
use POSIX;
use Devel::Peek;

my $num;
my $str = '98765' x 70; # could be any value

# If the PV is greater than DBL_MAX
# assign the PV to a Math::BigInt object.
# Else, assign the PV to an IV/UV/NV

if($str > POSIX::DBL_MAX) # $str is now POK && NOK and/or IOK
                          # irrespective of the value in
                          # the PV slot
{
  $num = Math::BigInt->new($str);
}
else
{
  $num = $str + 0;
}

print "$num\n";
Dump $str;
###############################

I haven't actually looked at PR #18958.
Which branch is it ? I should probably give it a whirl.

Cheers,
Rob
Re: PV vs NV/IV distinctions - crowd-testing required
On Fri, 23 Jul 2021 00:20:27 +1000
sisyphus <sisyphus359@gmail.com> wrote:

> But what if it's the PV slot that holds the numeric value we want to
> use ? Doesn't it get lost if the POK flag is unset ?

No. The PV slot being valid is signaled by the private pPOK flag, not
the (public) POK flag.

SVs contain two sets of flags that, before this PR is applied, are
basically handled identically. There are almost no situations in perl
currently that result in the private flags differing from the public
ones.

The intention of this PR is to split them apart a bit as per their
original intention - that the private flags say "this cached SV slot
field is valid", vs the public flags that say "... and you should
actually use it".
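
A minimal sketch of that point, based on the contrived example quoted
above: after the numeric comparison the string is still intact and the
private pPOK flag is still set. It reads the flags through the core B
module; the variable $too_big is invented for this illustration. The PR
only concerns numbers that later get stringified, so a scalar that
started life as a string should behave the same way with or without it.

```perl
use strict;
use warnings;
use B ();
use POSIX ();
use Math::BigInt;

my $str = '98765' x 70;    # 350 digits, far beyond DBL_MAX

# The numeric comparison caches a number alongside the string.
my $too_big = $str > POSIX::DBL_MAX();

my $flags = B::svref_2object(\$str)->FLAGS;
print "PV slot valid (pPOK): ",
    ($flags & B::SVp_POK() ? "yes" : "no"), "\n";
print "string unchanged:     ",
    ($str eq '98765' x 70 ? "yes" : "no"), "\n";

# The full string is still available to hand over to Math::BigInt.
my $num = $too_big ? Math::BigInt->new($str) : $str + 0;
print "$num\n";
```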

> I haven't actually looked at PR #18958.
> Which branch is it ? I should probably give it a whirl.

GitHub says that it's nwc10:IV-vs-PV

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: PV vs NV/IV distinctions - crowd-testing required
2021-7-22 20:41 Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:

> *dons the PSC hat once again...*
>
> It is currently the case that the way Perl turns scalars into strings
> will cause both the POK ("has a string") and the IOK or NOK
> ("has some kind of number") flags to be set. Both flags are also set
> when Perl tries to use a value that was previously a string as a
> number. This means that in the following scenario, both values come out
> identical:
>

I'm not in a position to judge this issue myself.

However, I feel that it is tied to the real-world problem of JSON output.

I would like to know whether the JSON module authors respond well to it.

JSON, JSON::PP, JSON::XS, Mojo::JSON, Cpanel::JSON

Is it possible to get in touch with them?
Re: PV vs NV/IV distinctions - crowd-testing required
On Thu, Jul 22, 2021 at 9:41 PM Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

> *dons the PSC hat once again...*
>
> This means that in the following scenario, both values come out
> identical:
>
> my $was_POK = "10";
> my $was_IOK = 10;
>
> print "The values are $was_POK, $was_IOK"
> if $was_POK == 10 and $was_IOK == 10;
>
> # At this point, both scalars have both POK and IOK flags
> ## (Actually for the nitpickers, this may be the NOK flag, to be
> ## honest I'm not entirely sure but the distinction isn't important
> ## today)
>
>
Just an aside on that:
If I change the four occurrences of the integer value (10) to a floating
point value (say 10.2), then neither the POK nor the pPOK flag gets set on
$was_IOK.
On perl-5.34.0, according to Devel::Peek, $was_IOK ends up with only the
NOK and pNOK flags set.

######################
use warnings;
use Devel::Peek;

my $was_POK = "10.2";
my $was_IOK = 10.2;

print "The values are $was_POK, $was_IOK"
if $was_POK == 10.2 and $was_IOK == 10.2;

Dump $was_POK;
Dump $was_IOK;
######################

I don't know if there's a reason for that difference in behaviour.
I don't know if that difference is in any way relevant to PR#18958

Cheers,
Rob
Re: PV vs NV/IV distinctions - crowd-testing required
On Fri, 23 Jul 2021 12:55:34 +1000
sisyphus <sisyphus359@gmail.com> wrote:

> Just an aside on that:
> If I change the four occurrences of the integer value (10) to a floating
> point value (say 10.2), then neither the POK nor the pPOK flag gets set on
> $was_IOK.
> On perl-5.34.0, according to Devel::Peek, $was_IOK ends up with only the
> NOK and pNOK flags set.
>
> ######################
> use warnings;
> use Devel::Peek;
>
> my $was_POK = "10.2";
> my $was_IOK = 10.2;
>
> print "The values are $was_POK, $was_IOK"
> if $was_POK == 10.2 and $was_IOK == 10.2;
>
> Dump $was_POK;
> Dump $was_IOK;
> ######################
>
> I don't know if there's a reason for that difference in behaviour.
> I don't know if that difference is in any way relevant to PR#18958
>
> Cheers,
> Rob

NVs are treated specially: the idea is that we don't want to cache their
stringification, because 'use locale' might change the radix point
character. See https://github.com/Perl/perl5/issues/11872 for more
details.

As you have (indirectly) shown, since $was_POK is both NOK and POK, its
string form *is* cached, and thus it isn't sensitive to "use locale".
I'm not sure if it's a bug or not.
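
A small sketch for checking this on a given perl: stringify an NV once,
then look at whether a string got cached on it. It uses the core B
module, and the variable names are invented for the example. What it
prints for pPOK may differ between perl versions, so no output is
promised here.

```perl
use strict;
use warnings;
use B ();

my $nv = 10.2;
my $as_string = "$nv";    # stringify the NV once

my $flags = B::svref_2object(\$nv)->FLAGS;

# pPOK=0 means the stringification was not cached on the scalar.
printf "NOK=%d pPOK=%d\n",
    ($flags & B::SVf_NOK()) ? 1 : 0,
    ($flags & B::SVp_POK()) ? 1 : 0;
```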
Re: PV vs NV/IV distinctions - crowd-testing required
On Fri, Jul 23, 2021 at 1:55 PM Tomasz Konojacki <me@xenu.pl> wrote:


> NVs are treated specially, the idea is that we don't want to cache their
> stringification, because 'use locale' might change the radix point
> character. See https://github.com/Perl/perl5/issues/11872 for more
> details.
>
>
The upshot of this is that the proposed type of change won't affect NVs.
It's affecting only numeric scalars that subsequently acquire a setting in
the PV slot, and NVs don't subsequently acquire such a setting for the
reason you've indicated.
(Also, the value contained in the NV's PV slot would often be different to
the value in the NV slot. I wondered whether that might have been another
reason for the behaviour .... but I doubt that was ever actually a
consideration.)

So, I didn't need to worry about the condition if(SvNOK(arg) &&
!SvPOK(arg)) being altered by this type of change, but it's a different
story wrt the condition if(SvIOK(arg) && !SvPOK(arg)).
However, the latter is something I've never found a need to do.


> As you have (indirectly) shown, since $was_POK is both NOK and POK, its
> string form *is* cached, and thus it isn't sensitive to "use locale".
> I'm not sure if it's a bug or not.
>

I hope that, if it *is* a bug, then it's one that doesn't get fixed.
It would be a pity to lose a string like '1.23e5000' (which would generally
assign as a finite value to Math::BigFloat) just because one first checked
the magnitude of it via some numeric comparison.
Re: PV vs NV/IV distinctions - crowd-testing required
On Thu, 22 Jul 2021 12:40:21 +0100
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> We'd like more input though. If anyone has some code that is likely to
> care about this distinction (primarily we're thinking data
> serialisation and similar, but no doubt it'll pop up in other places
> too) can you test it and let us know. Or, at the very least, point us
> in the direction of some code that we can test for you.
>
> It would be nice to get this in place, because it would be the first
> step towards having some sort of Perl-visible distinction in these
> different programmer intents. That's a longer and more interesting
> discussion for another time though. For now we just want to know:
>
> Will PR #18958 cause any breakage? Is it a good idea to merge it
> now?

It's been quite a few months now and so far nobody's come along and
said "no this is a bad idea because ..."

I propose therefore we should look into actually merging this.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: PV vs NV/IV distinctions - crowd-testing required
2022-1-26 9:04 Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:

> On Thu, 22 Jul 2021 12:40:21 +0100
> "Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:
>
> > We'd like more input though. If anyone has some code that is likely to
> > care about this distinction (primarily we're thinking data
> > serialisation and similar, but no doubt it'll pop up in other places
> > too) can you test it and let us know. Or, at the very least, point us
> > in the direction of some code that we can test for you.
> >
> > It would be nice to get this in place, because it would be the first
> > step towards having some sort of Perl-visible distinction in these
> > different programmer intents. That's a longer and more interesting
> > discussion for another time though. For now we just want to know:
> >
> > Will PR #18958 cause any breakage? Is it a good idea to merge it
> > now?
>
> It's been quite a few months now and so far nobody's come along and
> said "no this is a bad idea because ..."
>
> I propose therefore we should look into actually merging this.
>
>
>
I feel this is an improvement and breaks little; however, the actual
effect is not known.

Unexpected incompatibilities can occur.

Will it be merged just after the Perl 5.36 release?