Mailing List Archive

PSC #040(?) - Perl SV Flags special
Present: Neil, Paul, Rik plus Karl, Nick, Yves

Background: The overall problem of trying to track "programmer intent"
on things like strings-vs-numbers; useful for situations like encoders
for JSON/MsgPack/etc, Sereal, as well as general in-language use in
places that Perl does not currently support.

We spent the first part of the meeting on a fairly informal discussion
trying to drill down into what the real problem is that we're trying to
address. The consensus overall was that the existing SV flags (the 3
"public" POK, NOK, IOK, plus 3 "private" pPOK, pNOK, pIOK) are a poor
interface for XS code (and by extension perl code) to ask about the
intended nature of Perl values.

The discussion quickly agreed that the meaning of the private flags is
fairly clear - private flags are set when the corresponding `struct SV`
field is set to a value, and thus that field is meaningful to be read
again.

What is far less clear is what the public flags are meant to mean. We
each spent a while discussing our personal "mental model" of these
flags, eventually discovering that there isn't a clear consensus on
what they really mean in precise detail; there are edge cases all round.
For instance, PV->IV conversion currently turns on both IOK and pIOK,
but PV->NV conversion turns on pNOK without turning on NOK. Looking
historically there have been a number of changes to the behaviours of
public flags over the years. Sometime around the 5.18 era Chip made a
big change.

Nick writes:
The big Chip patch was in 2012:
https://github.com/Perl/perl5/commit/4bac9ae47b5ad7845a24e26b0e95609805de688a

Effectively in its comment about "This scheme did not cover ROK,
however" it's referencing this commit:

https://github.com/Perl/perl5/commit/463ee0b2acbd047c27e8b5393cdd8398881824c5

(I'd love to say "just follow that link to be surprised" but you'd all
think that it was a Rickroll, so I'll observe that it's "perl 5.0
alpha 4")

which added the flags SVp_POK ...IOK and ..NOK and added references
for the first time, but didn't add a private flag for references. I
assume because it didn't occur to Larry that magic get might return a
reference, not just values.

In summary: it's a mess.

Having spent enough time on that we decided it might be better to
attack the problem from a different route, by carefully defining what
sort of API shape we want, in terms of what operations should be
supported and how they should behave and interact with one another.

A central theme of the problem is that while we have many API-defined
functions/macros for setting SV value fields (e.g. `sv_setiv()`) and
querying them again (e.g. `SvIV()`), we don't actually have anything to
ask type-intent of values (e.g. the "is this a string?" question). All
we have is large bodies of XS code that try to do this by testing the
various public (or at times private) xOK flags, but without clear
guidance from core they often cope poorly in weird corner cases. Core's
behaviour of smudging additional flags on also doesn't help here - it's
far too easy to end up with SVs that are both POK and IOK, having
forgotten if they were originally strings or numbers. Nick's PR
(https://github.com/Perl/perl5/pull/18958) will certainly help this
situation but that isn't sufficient.

The suggested next steps here involve creating a long list of "test
cases": situations that perform various kinds of operations on
values/variables, each specifying the properties of the results and
the side-effects on the variables involved. Likely many of these
properties will take the form of "appears to be a string" or "appears
to be a number" or similar.

These test cases would suggest the form of an `XS::APItest`-like `.t`
file which can live in Perl core, containing all the tests. Most
critically it would require defining a new API to ask questions of the
form "appears to be (some sort of data shape)" about the SVs. For
instance, while the recent "stable-bools" branch added `SvIsBOOL()`
there aren't similar test functions for other value shapes like strings
and numbers. So that would need thinking about somehow.

Once an API shape is settled on, we can then look at how to implement
the predicate tests. Likely many of them will take the form of SV flags
tests, in combination with maybe some amount of behavioural change in
how core perl sets these flags. But it is important to stress that the
SV flags are very much the "core internals" of how the tests would
work. The test functions themselves would be the expected API that XS
authors would use (and likely pureperl authors via some wrapper
functions in maybe Scalar::Util or more likely whatever our new
std/builtins/functions namespace ends up becoming; see PSC #034).
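A toy C model of that split might look like the following. Everything in it (ToySV, the TOY_* bits, toy_looks_like_string() and friends) is hypothetical and is not Perl's real SV layout or API; it only illustrates how a caller that tests a cache flag directly gets confused once a conversion has been cached, while a predicate function is free to consult whatever internal bit actually answers the question.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy model only: these are NOT Perl's real SV flags or struct layout. */
enum {
    TOY_IOK        = 1 << 0, /* integer cache slot holds a valid value */
    TOY_POK        = 1 << 1, /* string cache slot holds a valid value */
    TOY_WAS_STRING = 1 << 2  /* hypothetical "intent" bit, set only at creation */
};

typedef struct {
    unsigned flags;
    long iv;
    char pv[32];
} ToySV;

static ToySV toy_new_int(long v) {
    ToySV sv = { TOY_IOK, v, "" };
    return sv;
}

static ToySV toy_new_str(const char *s) {
    ToySV sv = { TOY_POK | TOY_WAS_STRING, 0, "" };
    snprintf(sv.pv, sizeof sv.pv, "%s", s);
    return sv;
}

/* Numeric read: convert once, cache the result, and turn on TOY_IOK.
   This is the "smudging" that leaves a string looking numeric. */
static long toy_read_iv(ToySV *sv) {
    if (!(sv->flags & TOY_IOK)) {
        sv->iv = strtol(sv->pv, NULL, 10);
        sv->flags |= TOY_IOK;
    }
    return sv->iv;
}

/* String read: format once, cache the result, and turn on TOY_POK. */
static const char *toy_read_pv(ToySV *sv) {
    if (!(sv->flags & TOY_POK)) {
        snprintf(sv->pv, sizeof sv->pv, "%ld", sv->iv);
        sv->flags |= TOY_POK;
    }
    return sv->pv;
}

/* What a lot of existing code does: peek at a cache flag directly. */
static bool naive_is_number(const ToySV *sv) {
    return (sv->flags & TOY_IOK) != 0;
}

/* Hypothetical predicate API: callers ask about intent, and only these
   functions know which internal bits answer the question. */
static bool toy_looks_like_string(const ToySV *sv) {
    return (sv->flags & TOY_WAS_STRING) != 0;
}

static bool toy_looks_like_number(const ToySV *sv) {
    return !toy_looks_like_string(sv);
}
```

After toy_read_iv() on a value created from "42", naive_is_number() says "number" while toy_looks_like_string() still says "string". The design point is that the internal bit could later be changed or re-derived without touching callers that only use the predicates.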

Acknowledging the fact that there is currently a large body of existing
XS code which tests particular flag combinations, it would still be
useful if the new behaviour of core internals was largely compatible
with what decisions this existing code makes, at least in the common
unambiguous cases.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: PSC #040(?) - Perl SV Flags special
On Thu, Oct 14, 2021 at 11:18:42AM +0100, Paul "LeoNerd" Evans wrote:

> What is far less clear is what the public flags are meant to mean. We

After the call, I think I came to the conclusion that (pragmatically) the
public flags really are more "implementation detail" than "public".

https://github.com/Perl/perl5/pull/18958#issuecomment-937622729

I don't know if it's even written down anywhere, but the basic "data
structure with cached values and flags" design paradigm seems to be

* have a data structure with space to cache conversions and flags
* have macros which inline small code which can quickly test the flags
for common cases, and where possible use cached values
* else call into a function for the slow path, which can hopefully
cache values and set flags to permit fast paths in the future
* arrange all the various SV data structures so that the cache slots
are at the same offsets from (stored) pointers, so that the reading
code in the macros doesn't need to branch
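As a rough sketch of that paradigm, here is a fast-path macro plus a slow-path function over a hypothetical toy type (not perl's struct sv, and ignoring the shared-offsets point in the last bullet):

```c
#include <stdlib.h>

/* Hypothetical toy value, not perl's struct sv: the original string plus
   one cache slot for its numeric conversion. */
typedef struct {
    unsigned flags;   /* VAL_NUM_CACHED is set once num is valid */
    double num;       /* cached numeric conversion */
    const char *str;  /* the value as originally supplied */
} Value;

#define VAL_NUM_CACHED 0x1u

static unsigned slow_path_calls; /* counts slow-path entries, for the demo */

/* Slow path: do the real conversion, cache it, and set the flag so that
   later reads can take the fast path. */
static double value_2num(Value *v) {
    slow_path_calls++;
    v->num = strtod(v->str, NULL);
    v->flags |= VAL_NUM_CACHED;
    return v->num;
}

/* Fast path: an inline flag test that reads the cache when it is valid
   and only calls the function otherwise. Like any macro of this shape it
   evaluates its argument more than once, so pass it a plain pointer. */
#define VALUE_NUM(v) (((v)->flags & VAL_NUM_CACHED) ? (v)->num : value_2num(v))
```

Reading the same Value twice enters value_2num() only once; the second read is just the flag test and a load.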


The second problem we're juggling is that Perl doesn't have a type
system (old news!) but in this context it's that it doesn't have a
*numeric* type system.

(Bonus fun)

For example,

$c = $a + $b;

is "obviously" numeric addition.

But what should the result of this be?

18014398509481984 + 2e16;

or even

1e16 + 2e16;


The point/problem being that all these values *can* be stored as 64 bit
integers, but effectively printing them out *as* 64 bit integers creates
false precision.
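A small C illustration of the false-precision point: 1e16 + 2e16 is exactly 3e16 both as a double and as a 64-bit integer, but the two renderings claim very different precision. The use of %.15g assumes the 15 significant digits typical of an IEEE-double perl build; render_both() and renders_as() are made-up helpers.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: render one double the way a 64-bit integer would
   print, and the way %.15g (15 significant digits) prints it. The cast is
   only meaningful because the values used here are integral and in range. */
static void render_both(double d, char as_int[32], char as_float[32]) {
    snprintf(as_int, 32, "%lld", (long long)d);
    snprintf(as_float, 32, "%.15g", d);
}

/* Convenience check for confirming the renderings quoted below. */
static bool renders_as(double d, const char *want_int, const char *want_float) {
    char as_int[32], as_float[32];
    render_both(d, as_int, as_float);
    return strcmp(as_int, want_int) == 0 && strcmp(as_float, want_float) == 0;
}
```

Here 1e16 + 2e16 renders as "30000000000000000" versus "3e+16", and 18014398509481984.0 + 2e16 as "38014398509481984" versus "3.8014398509482e+16". The integer form shows 17 digits that the literal 2e16 never claimed.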

(This doesn't matter in JavaScript because all maths *is* 64 bit IEEE
floating point. But we're faking things internally to also offer 64 bit
integer maths on 64 bit platforms. And we don't have a type system...)


And related to this, the following is valid and doesn't warn:

$ perl -wle 'print " 18014398509481984" + "18014398509481984 "'
36028797018963968


(consider scripts that process data coming in from text files, where those
files are a bit sloppy with their whitespace.)
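For code outside perl that wants the same "sloppy whitespace is fine" leniency for integers, one way to express it in C is sketched below. parse_int_lenient() is a made-up name; it deliberately rejects any other trailing junk, and it is not a reimplementation of perl's own numeric parsing, which handles floats and other cases too.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal integer, tolerating leading and
   trailing whitespace but nothing else. strtoll() already skips leading
   whitespace; the loop below skips trailing whitespace. */
static bool parse_int_lenient(const char *s, long long *out) {
    char *end;
    errno = 0;
    long long v = strtoll(s, &end, 10);
    if (end == s || errno == ERANGE)
        return false;                 /* no digits at all, or out of range */
    while (isspace((unsigned char)*end))
        end++;                        /* allow trailing whitespace */
    if (*end != '\0')
        return false;                 /* something other than whitespace followed */
    *out = v;
    return true;
}
```

With this, " 18014398509481984" and "18014398509481984 " both parse and sum to 36028797018963968, while "42 bar" is rejected.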


We currently have all these useful behaviours. It's a juggling act to keep
them all working whilst also changing things/adding more "things we test for".

> https://github.com/Perl/perl5/commit/463ee0b2acbd047c27e8b5393cdd8398881824c5
>
> (I'd love to say "just follow that link to be surprised" but you'd all
> think that it was a Rickroll, so I'll observe that it's "perl 5.0
> alpha 4")

This, on the other hand...

https://geizhals.at/?=PHPE9568F35-D428-11d2-A769-00AA001ACF42

> forgotten if they were originally strings or numbers. Nick's PR
> (https://github.com/Perl/perl5/pull/18958) will certainly help this
> situation but that isn't sufficient.

I think that it's pretty close to sufficient for distinguishing "numeric"
vs "string". In that:

> The suggested next steps here involve creating a long list of "test
> cases"; situations involving performing various kinds of operations on
> values/variables, and specifying what are the properties of results,
> and side-effects on variables within it. Likely many of these
> properties will take the form of "appears to be a string" or "appears
> to be a number" or similar.

Yves presented some test cases that were new to me, which I can see cause
code paths to be taken deep in the conversion routines that don't set flags
consistently with how that PR assumes they should. But from what I
skimmed, they seemed fixable (within the limits of "string" vs "number").

Specifically, I think that the testing regime should cover every combination of

1) I create a value (an integer literal, a floating point literal, or a string
containing something that is, or is *close* to either)
2) Maybe I copy it
3) I read it (or the copy) as (in an integer expression, in a floating point
expression, in a string context)
(internally a conversion might be cached, and flags might change as a result)
4) Maybe I copy it again

5) Can the new API still report correctly what step (1) was?
6) If it's used in addition or other "maybe IV/maybe NV" arithmetic, does it
behave the same way as if used immediately after step 1?
(same choice of IV vs NV? Same warnings?)
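The matrix in steps (1) to (4) can be enumerated mechanically. Here is a C sketch; the six creation labels are one reading of step (1) (a literal integer or float, or a string that is, or is close to, either), and for_each_case() and case_fn are hypothetical names. The callback is where checks (5) and (6) would go.

```c
#include <stddef.h>

/* One reading of step (1): the ways a value can start out. */
static const char *const creations[] = {
    "integer literal",
    "floating point literal",
    "string that is an integer",
    "string close to an integer",
    "string that is a float",
    "string close to a float",
};

/* Step (3): the contexts the value (or its copy) is read in. */
static const char *const reads[] = {
    "integer expression",
    "floating point expression",
    "string context",
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical callback: checks (5) and (6) would live here. */
typedef void (*case_fn)(const char *create, int copied, const char *read,
                        int copied_again);

/* Visit every (create, copy?, read as, copy again?) combination and
   return how many cases there are. fn may be NULL to just count. */
static size_t for_each_case(case_fn fn) {
    size_t n = 0;
    for (size_t c = 0; c < COUNT(creations); c++)
        for (int copied = 0; copied <= 1; copied++)
            for (size_t r = 0; r < COUNT(reads); r++)
                for (int again = 0; again <= 1; again++) {
                    if (fn)
                        fn(creations[c], copied, reads[r], again);
                    n++;
                }
    return n;
}
```

That is 6 * 2 * 3 * 2 = 72 cases, before multiplying by however many concrete values stand in for each label.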


I think that this is viable for "string" vs "numeric"

(vs undef vs boolean vs reference vs "you're on your own here because it's a
dualvar")


I'm not sure that we can push this as far as distinguishing between "started
as an integer literal" vs "started as a floating point literal", *and* I'm not sure if we
need to. The big problem we're trying to solve here is correctly generating
formats such as JSON and YAML that *are* sensitive to strings vs numbers,
and I didn't think that they (or their other-language consumers) were
sensitive to "what sort of a number is it?"

Nicholas Clark
Re: PSC #040(?) - Perl SV Flags special
On Fri, Oct 15, 2021 at 4:03 AM Nicholas Clark <nick@ccl4.org> wrote:

> I'm not sure that we can push this to distinguishing between "started as an
> integer literal" vs "started a floating point", *and* I'm not sure if we
> need to. The big problem we're trying to solve here is correctly generating
> formats such as JSON and YAML that *are* sensitive to strings vs numbers,
> and I didn't think that they (or their other-language consumers) were
> sensitive to "what sort of a number is it?"
>

I can't speak to the others, but JSON does distinguish integers and floats;
some implementations have deigned to preserve roundtripping NVs with a
trailing '.0' though there are a couple problems with this. Perl NVs pop
into existence at some unexpected times, such as the result of 2**2. And
this isn't as important IME as creating, say, a JSON document with boolean
values.
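For reference, the trailing '.0' approach can be sketched in C like this. json_format_nv() is a hypothetical helper, not any particular JSON module's code: it picks the shortest of %.15g, %.16g and %.17g that round-trips through strtod(), then appends ".0" if the text would otherwise read as an integer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical encoder helper. Returns 0 on success, -1 if d is NaN or
   infinite (not valid JSON numbers) or the buffer is too small. */
static int json_format_nv(double d, char *buf, size_t n) {
    if (n < 32)
        return -1;                    /* %.17g output always fits in 32 bytes */
    if (d != d || d - d != 0.0)
        return -1;                    /* NaN, or +/-Inf */
    for (int prec = 15; prec <= 17; prec++) {
        snprintf(buf, n, "%.*g", prec, d);
        if (strtod(buf, NULL) == d)
            break;                    /* shortest form that round-trips */
    }
    if (strpbrk(buf, ".eE") == NULL) {
        size_t len = strlen(buf);
        if (len + 3 > n)
            return -1;
        memcpy(buf + len, ".0", 3);   /* mark it as a float, with the NUL */
    }
    return 0;
}
```

Under this rule an NV holding 4 is emitted as 4.0, which is exactly the surprise when 2**2 produces an NV.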

-Dan
Re: PSC #040(?) - Perl SV Flags special
On Fri, Oct 15, 2021 at 10:18:36AM -0400, Dan Book wrote:
> On Fri, Oct 15, 2021 at 4:03 AM Nicholas Clark <nick@ccl4.org> wrote:
>
> > I'm not sure that we can push this to distinguishing between "started as an
> > integer literal" vs "started a floating point", *and* I'm not sure if we
> > need to. The big problem we're trying to solve here is correctly generating
> > formats such as JSON and YAML that *are* sensitive to strings vs numbers,
> > and I didn't think that they (or their other-language consumers) were
> > sensitive to "what sort of a number is it?"
> >
>
> I can't speak to the others, but JSON does distinguish integers and floats;
> some implementations have deigned to preserve roundtripping NVs with a
> trailing '.0' though there are a couple problems with this. Perl NVs pop
> into existence at some unexpected times, such as the result of 2**2. And
> this isn't as important IME as creating, say, a JSON document with boolean
> values.

Oh my. How does (or "can?") Javascript itself manage to roundtrip JSON like
this?


How to implement C<**> in perl to return integers for integer input wasn't
obvious, given that the C library builtins are pow (and also powl these
days).

Hence it kept its pre-5.8.0 behaviour of always returning an NV.
It's consistent.

If there's a good way to do it in (or close to) O(1) for integers, I'd
love to know.
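One candidate is exponentiation by squaring, which needs O(log e) multiplications. With a 64-bit unsigned result and a base of at least 2, any exponent of 64 or more overflows, so the loop runs only a handful of times before it either finishes or detects overflow, at which point a caller could fall back to pow(). This sketch covers non-negative integers only, ipow_u64() is a hypothetical name, and it says nothing about how perl's pp_pow is actually structured.

```c
#include <stdbool.h>
#include <stdint.h>

/* Multiply two uint64_t values, reporting overflow instead of wrapping. */
static bool mul_u64(uint64_t a, uint64_t b, uint64_t *out) {
    if (a != 0 && b > UINT64_MAX / a)
        return false;
    *out = a * b;
    return true;
}

/* Hypothetical helper: exact base**exp by squaring. Returns false on
   overflow so the caller can fall back to floating point. */
static bool ipow_u64(uint64_t base, unsigned exp, uint64_t *out) {
    if (base <= 1) {                      /* 0**0 == 1, 0**n == 0, 1**n == 1 */
        *out = (base == 0 && exp != 0) ? 0 : 1;
        return true;
    }
    uint64_t result = 1;
    while (exp > 0) {
        if (exp & 1u) {
            if (!mul_u64(result, base, &result))
                return false;
        }
        exp >>= 1;
        /* Only square when more bits remain; if base**2 overflows here,
           the final result (which needs a higher power) would too. */
        if (exp > 0 && !mul_u64(base, base, &base))
            return false;
    }
    *out = result;
    return true;
}
```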

Nicholas Clark
Re: PSC #040(?) - Perl SV Flags special
On Fri, 15 Oct 2021 08:02:54 +0000
Nicholas Clark <nick@ccl4.org> wrote:

> After the call, I think I came to the conclusion that (pragmatically)
> the public flags really are "implementation detail" than "public".

...

Yes; the more we look at this the more it feels like the flags aren't
really "private" vs "public" so much as "private and nobody looks at
them" vs "private, but other people are nosey and poke about at them
anyway". We don't have a proper public API to look at these, which is
what we're aiming to fix.

> The second problem we're juggling is that that Perl doesn't have a
> type system (old news!) but in this context it's that it doesn't have
> a *numeric* type system.

Let's steal Scheme's ;)

Hah - no, I joke. But I do mention it because people often comment that
Perl's "type system" is broken because it doesn't distinguish numbers
vs strings. But then many other languages don't distinguish integers
vs. reals. Scheme in particular has a whole numeric tower that can, for
example, represent "exact" rationals such as 1/3 as true two-integer
quotients, a fundamentally different type from "inexact" floats, which
are commonly implemented as IEEE doubles.

From Scheme's perspective almost any other type system is lacking in
fine-grained detail.

...

> (This doesn't matter in JavaScript because all maths *is* 64 bit IEEE
> floating point. But we're faking things internally to also offer 64
> bit integer maths on 64 bit platforms. And we don't have a type
> system...)

This is why I understood that JSON (which was originally inspired by
JavaScript) wouldn't distinguish true integers from floats. Omitting the
decimal point is simply a formatting optimisation: some numerical values
can be expressed exactly without a decimal point and trailing fractional
digits.


--
Paul "LeoNerd" Evans
Re: PSC #040(?) - Perl SV Flags special
On Fri, Oct 15, 2021 at 10:34 AM Nicholas Clark <nick@ccl4.org> wrote:

> On Fri, Oct 15, 2021 at 10:18:36AM -0400, Dan Book wrote:
> > On Fri, Oct 15, 2021 at 4:03 AM Nicholas Clark <nick@ccl4.org> wrote:
> >
> > > I'm not sure that we can push this to distinguishing between "started as an
> > > integer literal" vs "started a floating point", *and* I'm not sure if we
> > > need to. The big problem we're trying to solve here is correctly generating
> > > formats such as JSON and YAML that *are* sensitive to strings vs numbers,
> > > and I didn't think that they (or their other-language consumers) were
> > > sensitive to "what sort of a number is it?"
> > >
> >
> > I can't speak to the others, but JSON does distinguish integers and floats;
> > some implementations have deigned to preserve roundtripping NVs with a
> > trailing '.0' though there are a couple problems with this. Perl NVs pop
> > into existence at some unexpected times, such as the result of 2**2. And
> > this isn't as important IME as creating, say, a JSON document with boolean
> > values.
>
> Oh my. How does (or "can?") Javascript itself manage to roundtrip JSON like
> this?
>
>
> How to implement C<**> in perl to return integers for integer input wasn't
> obvious, given that the C library builtins are pow (and also powl these
> days).
>
> Hence it kept its pre-5.8.0 behaviour of always returning an NV.
> It's consistent.
>
> If there's a good way to do it in (or close to) O(1) for integers, I'd
> love to know.
>

** is certainly not the only case which does this unexpectedly. '42' +
'bar' also ends up as an NV, though this is less commonly encountered. See
https://github.com/rurban/Cpanel-JSON-XS/pull/63 for the full discussion.

-Dan
Re: PSC #040(?) - Perl SV Flags special
On Thu, Oct 14, 2021 at 11:18:42AM +0100, Paul LeoNerd Evans wrote:

> The suggested next steps here involve creating a long list of "test
> cases"; situations involving performing various kinds of operations on
> values/variables, and specifying what are the properties of results,
> and side-effects on variables within it. Likely many of these
> properties will take the form of "appears to be a string" or "appears
> to be a number" or similar.
>
> These test cases would suggest the form of an `XS::APItest`-like `.t`
> file which can live in Perl core, containing all the tests. Most
> critically it would require defining a new API to ask questions of the
> form "appears to be (some sort of data shape)" about the SVs. For
> instance, while the recent "stable-bools" branch added `SvIsBOOL()`
> there aren't similar test functions for other value shapes like strings
> and numbers. So that would need thinking about somehow.

While I daren't suggest that I've got it right, this might be a good
place to start as it looks at all kinds of weird edge-cases:
https://metacpan.org/release/DCANTRELL/Scalar-Type-0.1.2/source/t/all.t

although given the current limitations, those tests are aimed at
divining whether we can accurately represent a value as a number (or as
an int) as opposed to whether it was originally a number/int.
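In C terms, the "can this be accurately represented as an int" question for a floating point value reduces to a range check plus an integrality check. nv_fits_int64() is a hypothetical helper; like the tests above, it answers "is this representable?", not "was this originally an integer?".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if d holds an integral value that int64_t can
   represent exactly. -2^63 and 2^63 are exact doubles, so the valid range
   is [-2^63, 2^63). NaN fails both comparisons and is rejected. */
static bool nv_fits_int64(double d, int64_t *out) {
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0))
        return false;
    int64_t i = (int64_t)d;   /* in range, so the truncating cast is defined */
    if ((double)i != d)
        return false;         /* d had a fractional part */
    if (out)
        *out = i;
    return true;
}
```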

--
David Cantrell | Reality Engineer, Ministry of Information

European immigration: making Britain great since AD43