Mailing List Archive: Native data checking in Perl

Native data checking in Perl

curtis.poe at gmail

May 17, 2023, 1:17 AM

Post #1 of 43 (913 views)

Hi all,

I’ve been reading through past P5P messages to get a better sense of what
the porters are willing to accept for declaring “types.” I’ll use the term
data “checks” to avoid confusion about type systems.

For a few months, Damian and I have been hashing out a system of checks for
Perl. In particular, we looked at some of the most popular solutions:
Moo/se and Type::Tiny (working within Perl’s constraints), and also Dios and
Zydeco (working to extend Perl’s constraints), plus Raku and some
diversions into other languages.

Unfortunately, Damian deeply regrets that, for personal reasons, he is
unable to continue to work on the project, but is still wholeheartedly in
favor of its stated goals.

At present, there’s a working implementation in a private repo and we
support things like this:

sub foo :returns(ARRAY[INT]) ( $max_size :of(UINT) ) {...}
my $count :of(INT) = 0;

In fact, we support a lot more than that and have almost 200K tests
passing, though it's still very much alpha.

In reading P5P and other discussions, it was pleasant to see that there’s
little objection to the notion of checks, but the disagreements largely
arise over syntax and then discussion dies down again. In sharing our work
privately with others, the strongest objection has been the syntax. People
want this (punting on return values):

sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
my Int $count = 0;

Or this:

sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
my $count is Int = 0;

Or something else entirely.

Our proposal used attributes simply because we were concerned that P5P
would reject the my Dog $spot syntax (and I like the attribute syntax)
because it’s already taken (though not frequently used). However, given the
feedback, I think this didn’t need to be a concern.

Can we discuss this? I think the *semantics* are largely in place. It’s the
*syntax* where things keep breaking down. If we can find an agreed syntax,
I can take the existing work and write up a spec (that’s actually the hard
part, but most of it’s done, pending syntax changes).

The “design the thing and hand it over” approach worked for Corinna. This will be a
much more important change, but it can work here, too.

Best,
Ovid

Re: Native data checking in Perl

leonerd at leonerd

May 17, 2023, 2:27 AM

Post #2 of 43 (913 views)

On Wed, 17 May 2023 10:17:57 +0200
Ovid <curtis.poe@gmail.com> wrote:

> we were concerned that P5P
> would reject the my Dog $spot syntax (and I like the attribute syntax)
> because it’s already taken (though not frequently used). However,
> given the feedback, I think this didn’t need to be a concern.

I'm in favour of using that syntax *exactly because* it is reserved for
exactly this reason. Please stop avoiding using it because of that -
this is what it's there for :)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: Native data checking in Perl

perl5-porters at perl

May 17, 2023, 2:32 AM

Post #3 of 43 (913 views)

Hi there,

On Wed, 17 May 2023, Ovid wrote:

> ... declaring “types.” ...

Wouldn't that be nice? :)

> ... disagreements largely arise over syntax ...

Unsurprisingly.

> ... People want this ...
>
> sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
> my Int $count = 0;
>
> Or this:
>
> sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
> my $count is Int = 0;
>
> Or something else entirely.

Coming from a number of other languages before Perl, the first of
those feels natural to me. The second makes my flesh crawl. As for
the others, I wouldn't know what I might like or not until I saw it.
(Also, ignoring what might be a UInt/Uint typo, I kinda like UInt.)

> Can we discuss this? I think the *semantics* are largely in
> place. It’s the *syntax* where things keep breaking down. ...

Isn't there more to it than that? One of my personal hobbyhorses is
that I've fallen into bear-traps using 64-bit integers, where just
using a scalar in a print statement changed the scalar's value without
any warning. When used two lines later in the same sub, it's clear
that my 64 flags have been converted from an integer type to a float
type, with the result that the rounding errors have totally trashed my
64 flags. I've had to jump through hoops to get things to look sane.

I'm not sure what these proposals would do to help me. I'm not saying
that it makes them bad proposals, just that I don't know if they'd be
any help to me if they were implemented. It seems like there's quite
a large mountain of code Out There that would need majorly overhauling
to make general use of something like this. OTOH I imagine there's an
opportunity for a great deal of efficiency improvement if you can say
"this thingy is never going to need to store anything but an integer".

I wonder if something like a more accessible interface to XS might be
more rewarding; if you want types, use a language which offers them.
Around here, more or less daily I link code I wrote in assembler with
code I wrote in C and C++, because by doing it that way it was easier
to get exactly what I wanted. You might even be able to avoid some of
the argument about syntax this way. Of course your alpha code would
then likely end up in the bin, sorry.

Incidentally I haven't used the term 'integer' as a placeholder for
some generic typedef. I'd be happy to start with plain old integers,
and worry about other types in a decade or two.

I hope this helps, even if it's only a little.

--

73,
Ged.

Re: Native data checking in Perl

curtis.poe at gmail

May 17, 2023, 4:56 AM

Post #4 of 43 (913 views)

On Wed, May 17, 2023 at 11:32 AM G.W. Haywood via perl5-porters <
perl5-porters@perl.org> wrote:

> > sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
> > my Int $count = 0;
> >
> > Or this:
> >
> > sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
> > my $count is Int = 0;
>
> Coming from a number of other languages before Perl, the first of
> those feels natural to me.

I suspect most feel the same way.

> I'm not sure what these proposals would do to help me. I'm not saying
> that it makes them bad proposals, just that I don't know if they'd be
> any help to me if they were implemented. It seems like there's quite
> a large mountain of code Out There that would need majorly overhauling
> to make general use of something like this.

Two things:

1. Regarding the 64-bit bug you mentioned (but didn't include above), if
there's a pre-existing bug, checks don't make the situation worse. If
checks introduce a new bug, that's an issue. Sounds like you hit a
pre-existing bug.
2. There's no mountain of code. All checks are optional and you could
introduce them for a single "problematic" variable if you choose:

sub foo ($name, $field, HashRef[Int] $totals) {...}

> OTOH I imagine there's an
> opportunity for a great deal of efficiency improvement if you can say
> "this thingy is never going to need to store anything but an integer".
>

Probably, but that's not an issue for now and probably shouldn't be for the
MVP.

I just want to have some convergence on syntax before I rewrite the spec.

Best,
Ovid

Re: Native data checking in Perl

May 18, 2023, 12:04 PM

Post #5 of 43 (913 views)

On Wed, May 17, 2023 at 10:17:57AM +0200, Ovid wrote:
> At present, there’s a working implementation in a private repo and we
> support things like this:
>
> sub foo :returns(ARRAY[INT]) ( $max_size :of(UINT) ) {...}
> my $count :of(INT) = 0;
[snip]
> Can we discuss this? I think the *semantics* are largely in place. It’s the
> *syntax* where things keep breaking down.

Well I for one would like to know (in general terms) what the proposed
semantics are, because it's not obvious to me from your email. And off
the top of my head I can think of several possible semantics.

So for example, what does

my $count :of(INT) = ...;

actually do?

Some obvious guesses are that:

* it warns or croaks if the value about to be assigned isn't an int:

my $count :of(INT) = "99"; # string, not int

* it warns or croaks if the value about to be assigned isn't coercible to
an int:

my $count :of(INT) = "99"; # ok
my $count :of(INT) = "foo"; # fails
my $count :of(INT) = undef; # fails
my $count :of(INT) = 1.2; # fails?

* it doesn't warn/croak, but ensures that $count ends up as an int:

my $count :of(INT) = "99";
Dump $count; # shows that $count is an IV, not a PV.

* It also (or not) ensures that any further assignments to $count
during its lifetime also do one of the things suggested above

my $count :of(INT) = 1;
$count = "foo"; # fails - or not?

--
Indomitable in retreat, invincible in advance, insufferable in victory
-- Churchill on Montgomery

Re: Native data checking in Perl

chris at prather

May 18, 2023, 1:25 PM

Post #6 of 43 (913 views)

On Thu, May 18, 2023 at 3:05 PM Dave Mitchell <davem@iabyn.com> wrote:
>
> On Wed, May 17, 2023 at 10:17:57AM +0200, Ovid wrote:
> > At present, there’s a working implementation in a private repo and we
> > support things like this:
> >
> > sub foo :returns(ARRAY[INT]) ( $max_size :of(UINT) ) {...}
> > my $count :of(INT) = 0;
> [snip]
> > Can we discuss this? I think the *semantics* are largely in place. It’s the
> > *syntax* where things keep breaking down.

I'm not Ovid, and I don't have the hair to play him on TV but I have
spent a lot of time thinking about this (and had some opinionated
conversations with Ovid about it too).

> Well I for one would like to know (in general terms) what the proposed
> semantics are, because it's not obvious to me from your email. And off
> the top of my head I can think of several possible semantics.
>
> So for example, what does
>
> my $count :of(INT) = ...;
>
> actually do?

Well the semantics of Type::Tiny and Moose in general are (when boiled
down to their simplest form) a name associated with some kind of
subroutine that returns a boolean. Someone somewhere would declare
something like `$checks{INT} = sub { m/\d+/ };` Then `my $count
:of(INT) = ...` would call the code associated with INT on the
assignment to $count and if the code returns false it throws an
exception.
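
As a rough sketch of the "name associated with a boolean subroutine" semantics described above: this is plain Perl, not the implementation in the private repo. The `%checks` registry and the `checked_assign` helper are hypothetical names, and the anchored regex is an assumption about what an INT check might look like.

```perl
use strict;
use warnings;

# Hypothetical registry: check name => predicate returning true or false.
my %checks = (
    INT => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?\d+\z/ },
);

# Hypothetical helper standing in for `my $var :of(NAME) = $value`.
# It looks up the predicate by name and throws if the value is rejected.
sub checked_assign {
    my ($name, $value) = @_;
    my $check = $checks{$name} or die "Unknown check '$name'\n";
    die "Value fails check $name\n" unless $check->($value);
    return $value;
}

my $count = checked_assign(INT => 99);                   # passes
my $ok    = eval { checked_assign(INT => 'foo'); 1 };    # throws, so $ok is undef
print "count=$count, second assignment ", ($ok ? "passed" : "failed"), "\n";
```

Whether a string such as "99" should pass is exactly the policy question discussed below; this predicate happens to accept it.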

> Some obvious guesses are that:
>
> * it warns or croaks if the value about to be assigned isn't an int:
>
> my $count :of(INT) = "99"; # string, not int

Yes, but see below.

> * it warns or croaks if the value about to be assigned isn't coercible to
> an int:
>
> my $count :of(INT) = "99"; # ok
> my $count :of(INT) = "foo"; # fails
> my $count :of(INT) = undef; # fails
> my $count :of(INT) = 1.2; # fails?

Even "99" and 1.2 should IMO fail, but it depends on the
implementation of the check code for INT, and what defaults we set for
implicit coercions. Moose requires you flag coercible attributes
(coerce => 1 on an attribute definition), this being Perl and
coercions being baked so deeply into the language I don't think that
would fly in general but allowing someone to disable them would be
incredibly useful/nice, e.g.:

no coercion;
my $count :of(INT) = "99"; # warn/croak/die
my $count :of(INT) = 1.2; # warn/croak/die

> * it doesn't warn/croak, but ensures that $count ends up as an int:
>
> my $count :of(INT) = "99";
> Dump $count; # shows that $count is an IV, not a PV.

Probably but see above, I'd like to be able to make that
warn/croak/die in some circumstances.

> * It also (or not) ensures that any further assignments to $count
> during its lifetime also do one of the things suggested above
>
> my $count :of(INT) = 1;
> $count = "foo"; # fails - or not?

Yeah that should IMO fail, it should with Moose or Type::Tiny … as
long as the check code is associated with the variable it would be
called when the variable is set or when it would trigger a coercion.
For example:

my $count :of(INT) = 99; # all good so far
say ''.$count; # I want to be able to make this warn/croak/die

Being able to make the system warn/croak/die on the latter would
really help track down problems with data that's destined for places
where "99" and 99 are sometimes not equal. We do this sometimes with
uninitialized values, it'd be nice to be able to tell Perl what I'm
expecting and have it tell me when that's not what I've got.

Without having seen much of Ovid and Damian's code (but having had a
long conversation with Ovid about it a couple months ago), it seems to
me that having a way to "efficiently" associate a variable with
potentially user-defined code that checks that the value it's about to
contain belongs to a set of allowed values, and to warn/croak/die when
it doesn't … would make finding whole classes of data related bugs
easier. The words "efficiently" and "check ... allowed values" are
carrying a _lot_ of water there but that's the basic semantics that I
think Ovid was assuming everyone agreed with. It seems to be where
CPAN has settled.

-Chris

Re: Native data checking in Perl

curtis.poe at gmail

May 18, 2023, 11:21 PM

Post #7 of 43 (913 views)

On Thu, May 18, 2023 at 9:05 PM Dave Mitchell <davem@iabyn.com> wrote:

> > Can we discuss this? I think the *semantics* are largely in place. It’s
> the
> > *syntax* where things keep breaking down.
>
> Well I for one would like to know (in general terms) what the proposed
> semantics are, because it's not obvious to me from your email. And off
> the top of my head I can think of several possible semantics.

I understand where you're coming from. I was strongly hoping we could at
least get some consensus on what P5P would be willing to accept in terms of
syntax so that when the rest comes along, part of the issue is out of the
way.

However, now that you've asked the questions, I don't think I can
reasonably dodge them. However, I don't want to get bogged down in a debate
on semantics at this time. Suggestions around this area have died too many
times, despite the apparent support for the idea.

> So for example, what does
>
> my $count :of(INT) = ...;
>
> actually do?
>

By default this is fatal, but it can be told to warn instead:

> my $count :of(INT) = "99"; # string, not int
>
> * it warns or croaks if the value about to be assigned isn't coercible to
> an int:
>
> my $count :of(INT) = "99"; # ok
> my $count :of(INT) = "foo"; # fails
> my $count :of(INT) = undef; # fails
> my $count :of(INT) = 1.2; # fails?
>
> * it doesn't warn/croak, but ensures that $count ends up as an int:
>
> my $count :of(INT) = "99";
> Dump $count; # shows that $count is an IV, not a PV.
>
> * It also (or not) ensures that any further assignments to $count
> during its lifetime also do one of the things suggested above
>
> my $count :of(INT) = 1;
> $count = "foo"; # fails - or not?
>
>
> --
> Indomitable in retreat, invincible in advance, insufferable in victory
> -- Churchill on Montgomery
>

--
Curtis "Ovid" Poe
--
CTO, All Around the World
World-class software development and consulting
https://allaroundtheworld.fr/

Re: Native data checking in Perl

curtis.poe at gmail

May 18, 2023, 11:23 PM

Post #8 of 43 (913 views)

The original, obviously incomplete email was sent by accident before I
completed it. I'll send another one soon.

> By default this is fatal, but it can be told to warn instead:
>
>
>

Re: Native data checking in Perl

curtis.poe at gmail

May 19, 2023, 12:16 AM

Post #9 of 43 (913 views)

On Thu, May 18, 2023 at 9:05 PM Dave Mitchell <davem@iabyn.com> wrote:

> On Wed, May 17, 2023 at 10:17:57AM +0200, Ovid wrote:
> Well I for one would like to know (in general terms) what the proposed
> semantics are, because it's not obvious to me from your email. And off
> the top of my head I can think of several possible semantics.
>

I understand where you're coming from. I was strongly hoping we could at
least get some consensus on what P5P would be willing to accept in terms of
syntax so that when the rest comes along, part of the issue is out of the
way.

However, now that you've asked the questions, I don't think I can
reasonably dodge them. However, I don't want to get bogged down in a debate
on semantics at this time. Suggestions around this area have died too many
times, despite the apparent support for the idea.

> So for example, what does
>
> my $count :of(INT) = ...;
>
> actually do?
>

By default it's fatal. However, you can ask it to warn instead, or even
disable checks completely.

> my $count :of(INT) = "99"; # ok
> my $count :of(INT) = "foo"; # fails
> my $count :of(INT) = undef; # fails
> my $count :of(INT) = 1.2; # fails?
>

You have the above correct. Assigning a reference also fails, with the
exception that if the reference is blessed and it has numeric overloading,
it should overload as an integer and that value will be used.

The checks do not coerce any data beyond what Perl does natively (e.g., the
string "99" becomes a number). User-defined coercions are specced but not
intended to be in the MVP. They are orthogonal to checks, including being
defined separately, because if you need to disable checks for performance
reasons (or because the check turns out to be wrong), you can't safely
disable coercions without breaking the code.
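
A minimal sketch of the four cases above in plain Perl, assuming an INT check that relies only on Perl's native string-to-number conversion. The function name `passes_int` is hypothetical. Objects with numeric overloading are deliberately not handled here, and edge cases such as "Inf" are not addressed.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical INT predicate mirroring the cases discussed above.
sub passes_int {
    my ($value) = @_;
    return 0 unless defined $value;               # undef fails
    return 0 if ref $value;                       # references fail in this sketch
    return 0 unless looks_like_number($value);    # "foo" fails
    return $value == int($value) ? 1 : 0;         # 1.2 fails, "99" passes
}

for my $candidate ("99", "foo", undef, 1.2) {
    my $label = defined $candidate ? qq{"$candidate"} : 'undef';
    printf "%-6s => %s\n", $label, passes_int($candidate) ? 'ok' : 'fails';
}
```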

> my $count :of(INT) = "99";
> Dump $count; # shows that $count is an IV, not a PV.

Great question. This is not specced, but clearly something we'd want to
know. At present, it's a PV. The way checks currently work is that the
check is made after the assignment. This is an implementation artefact. I
think IV would probably be better, but I'd rather bikeshed that later.

> my $count :of(INT) = 1;
> $count = "foo"; # fails - or not?

Yes, that fails. For Moo/se, assuming you use the methods and don't "reach
inside" the object, the isa declaration on the attribute is respected. For
Type::Tiny, it's only checked once and after that, all bets are off (this
isn't a criticism. I use the module religiously, but I suspect the
performance implications are the driving factor in this decision).

So right now, the leading contenders for the syntax appear to be something
like (with the case of the check name and/or the name of the attribute to
be argued separately):

1. my INT $count = 0;
2. my $count :of(INT) = 0;
3. my $count is INT = 0;

(With the latter being "isa" instead of "is" for classes.
https://www.nntp.perl.org/group/perl.perl5.porters/2019/11/msg256683.html).

So far, the first version appears to be the most popular.

I'm rather partial to the second version for reasons I won't go into here,
but I won't argue for that position. I simply would like to gain a
consensus on what we'd like to do, taking into account what is actually
possible.

One thing which might be worth taking into consideration is how we use
Perl. Perl's primary type system revolves around how we *structure* data,
irrespective of the *kinds* of data we have (avoiding the word "type"
because it's so overloaded). Thus, we might find ourselves doing this:

my Hash[Hash[Int]] $grades = {
    yusef => { math    => [19, 13, 20],
               biology => [12, 18, 17] },
    jing  => { math    => [20, 11, 17],
               biology => [7, 18, 18] }
};

This is important because I've been checking through several codebases to
see the kinds of checks people are writing and sometimes they get very
complex and span multiple lines. Languages like Java avoid having "ugly"
declarations in type declarations because they wrap everything in classes:

Grades studentGrades; # "Grades" is a class name

Because Perlers like to throw around complex data structures and sometimes
those structures are one-offs, having to go through the tedium of declaring
a new "check" for every one might mean complex check declarations. The
above might become: Hash[Hash[Int[1..20]]] and if those grades could be
floats, we'd need even more syntax for whether or not ranges are inclusive
or exclusive.

That would be trivial if it was a user-defined type:

check Grade :isa(NUM) ($n) {
0 < $n <= 20; # zero to twenty, but zero itself is not allowed
}

But at that point, the user needs to define the check in Perl. Such common
use cases should probably be native (for performance), meaning that any
syntax we prefer should take this into account. Currently, that's handled
with:

my $grade :of(NUM[0 <.. 20]) = get_grade($student); # 0 < $grade <= 20

Does the following work?

my NUM[0 <.. 20] $grade = get_grade($student);

That turns the original check into:

my Hash[Hash[NUM[0 <.. 20]]] $grades = ...;

Things are starting to get ugly there (please ignore inconsistent case in
checks; we'll get to that later), but given how we tend to use Perl, it's
kind of unavoidable. So any decision on what syntax we prefer
("prefix/postfix" and "attributes/no attributes") should take this into
account.
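
To make the inclusive/exclusive range question concrete, here is a sketch in plain Perl of what a check like NUM[0 <.. 20] would have to test. The `num_range` factory and its `exclusive_min` / `exclusive_max` options are hypothetical names for illustration only, not part of the proposal.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical factory: builds a predicate for a numeric range.
# exclusive_min => 1 mimics the proposed `0 <.. 20` (lower bound excluded).
sub num_range {
    my ($min, $max, %opt) = @_;
    return sub {
        my ($n) = @_;
        return 0 unless defined $n && looks_like_number($n);
        return 0 if $opt{exclusive_min} ? $n <= $min : $n < $min;
        return 0 if $opt{exclusive_max} ? $n >= $max : $n > $max;
        return 1;
    };
}

my $is_grade = num_range(0, 20, exclusive_min => 1);    # 0 < $n <= 20

for my $n (0, 0.5, 20, 21) {
    printf "%-4s %s\n", $n, $is_grade->($n) ? 'ok' : 'fails';
}
```

A native syntax would need to express all four combinations of bounds, which is part of why the declaration syntax gets crowded.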

Best,
Ovid

Re: Native data checking in Perl

May 19, 2023, 12:20 AM

Post #10 of 43 (913 views)

On Fri, May 19, 2023 at 08:21:56AM +0200, Ovid wrote:
> I understand where you're coming from. I was strongly hoping we could at
> least get some consensus on what P5P would be willing to accept in terms of
> syntax so that when the rest comes along, part of the issue is out of the
> way.

I'm happy for any discussion on semantics to be deferred for now and for
this thread to concentrate (for now) on syntax. I just wanted a general
feel for what the proposed syntax is actually supposed to do, (e.g.
whether it imposes a type on the variable throughout its life, or whether
it's a constraint checker for the value of an initial assignment, or
whatever) which might influence people's choice of syntax.

So I look forward to the remainder of your truncated reply :-)

--
O Unicef Clearasil!
Gibberish and Drivel!
-- "Bored of the Rings"

Re: Native data checking in Perl

May 19, 2023, 12:59 AM

Post #11 of 43 (913 views)

On Fri, 19 May 2023 09:16:43 +0200, Ovid <curtis.poe@gmail.com> wrote:

> So right now, the leading contenders for the syntax appear to be something
> like (with the case of the check name and/or the name of the attribute to
> be argued separately):
>
> 1. my INT $count = 0;
> 2. my $count :of(INT) = 0;
> 3. my $count is INT = 0;
>
> (With the latter being "isa" instead of "is" for classes.
> https://www.nntp.perl.org/group/perl.perl5.porters/2019/11/msg256683.html).
>
> So far, the first version appears to be the most popular.

If there is now way to specify your own types/type-classes, I'd support 1 and 2

1 because it is the most intuitive and easy to use, read and maintain

2 because I can imagine one wants to allow

my $used :of(INT,REAL) = 0;

with any combination of acceptable types, making unacceptable type fatal

In Raku one can build types out of existing types, restrictions in both
definedness and range

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org

Re: Native data checking in Perl

perl5-porters at perl

May 19, 2023, 4:39 AM

Post #12 of 43 (913 views)

On 2023-05-19 09:59, perl5@tux.freedom.nl wrote:
> On Fri, 19 May 2023 09:16:43 +0200, Ovid <curtis.poe@gmail.com> wrote:
>
>> So right now, the leading contenders for the syntax appear to be something
>> like (with the case of the check name and/or the name of the attribute to
>> be argued separately):
>>
>> 1. my INT $count = 0;
>> 2. my $count :of(INT) = 0;
>> 3. my $count is INT = 0;
>>
>> (With the latter being "isa" instead of "is" for classes.
>> https://www.nntp.perl.org/group/perl.perl5.porters/2019/11/msg256683.html).
>>
>> So far, the first version appears to be the most popular.
> If there is now way to specify your own types/type-classes, I'd support 1 and 2
>
> 1 because it is the most intuitive and easy to use, read and maintain
>
> 2 because I can imagine one wants to allow
>
> my $used :of(INT,REAL) = 0;
>
> with any combination of acceptable types, making unacceptable type fatal
>
> In Raku one can build types out of existing types, restrictions in both
> definedness and range
>

When I played with (tersely) applying "value-restrictions" to variables,
I generally used the format:

T_Int my ($x, $y) = (...);

as that is easy to implement in current Perl, with a sub T_Int {...}.

IIRC, it was easier than using the format

my ($x, $y) :T_Int = (...);

In the linked msg256683.html, Dave Mitchell writes:
"In particular, we shouldn't use a type prefix before the variable name to
specify a constraint; that should be reserved for a hypothetical future
type system."
which IMO remains important.

I would love to be able to write:

sub succ (UInt64 $_u) :UInt64 {
      state UInt64 $__MUL = 0x27bb2ee687b0b0fd;
      state UInt64 $__ADD = 0x00000000b504f32d;
      return $_u * $__MUL + $__ADD; # should ignore any overflows
}

or rather terser:

{ use UInt64;
    sub succ ($_u) {
        state $__MUL = 0x27bb2ee687b0b0fd;
        state $__ADD = 0x00000000b504f32d;
        return $_u * $__MUL + $__ADD;
    }
}

-- Ruud

Re: Native data checking in Perl

fawaka at gmail

May 19, 2023, 5:52 AM

Post #13 of 43 (913 views)

On Fri, May 19, 2023 at 9:17 AM Ovid <curtis.poe@gmail.com> wrote:

> One thing which might be worth taking into consideration is how we use
> Perl. Perl's primary type system revolves around how we *structure* data,
> irrespective of the *kinds* of data we have (avoiding the word "type"
> because it's so overloaded). Thus, we might find ourselves doing this:
>
> my Hash[Hash[Int]] $grades = {
> yusef => { math => [19, 13, 20],
> biology => [12, 18, 17] },
> jing => { math => [20, 11, 17],
> biology => [7, 18, 18] }
> };
>

I don't think that is possible in a useful way. That is only workable if
values were typed instead of variables. And I don't think we can type
values retroactively. It's not that you can't do this (potentially
expensive) check, but you can't maintain its invariance in any meaningful
way. It would still allow modifying $grades in ways that would defy the
given constraint.

These are not types, they're type refinements. That doesn't make it useless
but it does make it severely limited.

Leon

Re: Native data checking in Perl

curtis.poe at gmail

May 19, 2023, 6:29 AM

Post #14 of 43 (913 views)

On Fri, May 19, 2023 at 2:52 PM Leon Timmermans <fawaka@gmail.com> wrote:

>> my Hash[Hash[Int]] $grades = {
>> yusef => { math => [19, 13, 20],
>> biology => [12, 18, 17] },
>> jing => { math => [20, 11, 17],
>> biology => [7, 18, 18] }
>> };
>>
>
> I don't think that is possible in a useful way. That is only workable if
> values were typed instead of variables. And I don't think we can type
> values retroactively. It's not that you can't do this (potentially
> expensive) check, but you can't maintain its invariance in any meaningful
> way. It would still allow modifying $grades in ways that would defy the
> given constraint.
>

Yeah, that's a serious issue we encountered. Consider this:

my @records Hash[Array[Int]] = { bobbie => [3,2,1] };
$records[0] = { bobbie => [3,2,"foo"] }; # fatal
$records[0]{bobbie}[-1] = "foo"; # legal

For the above, the last two lines are kind of equivalent in terms of data
(though obviously the reference changes in the fatal version). However, the
last line "works" because we simply documented that checks are
prerequisites for *assignment to the variable*, not to *assignments to
the contents* of that variable. There are simply too many problems with the
latter, particularly in light of the fact that this is Perl and we didn't
have this from the start.
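
The same distinction can be reproduced in current Perl with a tied scalar. This is only a sketch under stated assumptions: `CheckedScalar` and `is_hash_of_int_arrays` are hypothetical names, a scalar holding a reference is used instead of the array above, and tie is not how the private implementation works as far as this thread says.

```perl
use strict;
use warnings;

# Hypothetical tied-scalar class: runs a predicate on every STORE.
package CheckedScalar {
    sub TIESCALAR {
        my ($class, $check) = @_;
        return bless { check => $check, value => undef }, $class;
    }
    sub FETCH { $_[0]{value} }
    sub STORE {
        my ($self, $new) = @_;
        die "check failed\n" unless $self->{check}->($new);
        $self->{value} = $new;
    }
}

# Hand-written stand-in for a Hash[Array[Int]] check.
sub is_hash_of_int_arrays {
    my ($v) = @_;
    return 0 unless ref $v eq 'HASH';
    for my $list (values %$v) {
        return 0 unless ref $list eq 'ARRAY';
        return 0 if grep { !defined($_) || $_ !~ /\A-?\d+\z/ } @$list;
    }
    return 1;
}

tie my $record, 'CheckedScalar', \&is_hash_of_int_arrays;
$record = { bobbie => [3, 2, 1] };                             # passes the check
my $ok = eval { $record = { bobbie => [3, 2, 'foo'] }; 1 };    # STORE dies
$record->{bobbie}[-1] = 'foo';                                 # no STORE, so not checked
print "reassignment ", ($ok ? "passed" : "failed"),
      ", last element is now '$record->{bobbie}[-1]'\n";
```

Only assignment to the variable goes through STORE; writing through the reference never touches the tied scalar, which is why the final modification slips past the check.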

Best,
Ovid


Re: Native data checking in Perl

curtis.poe at gmail

May 19, 2023, 6:35 AM

Post #15 of 43 (913 views)

On Fri, May 19, 2023 at 9:59 AM <perl5@tux.freedom.nl> wrote:

> If there is now way to specify your own types/type-classes, I'd support 1
> and 2
>
>
I think you meant "no way" instead of "now way"?

We do have a spec for specifying your own checks. Classes are easy and
don't require a new check:

my $dog :of(OBJ[Animal::Dog]) = get_dog('spot');

User-defined checks are easy. If you want a scalar to never decrease in
value:

check Monotonic :isa(NUM) ($value, %value) { $value >= $value{old} }

> 2 because I can imagine one wants to allow
>
> my $used :of(INT,REAL) = 0;
>
> with any combination of acceptable types, making unacceptable type fatal
>

That's handled:

my $used :of( INT | REAL ) = 0;
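
In terms of plain predicates, a union such as INT | REAL amounts to "pass if any alternative passes". A minimal sketch, where `any_of`, `$is_int` and `$is_real` are hypothetical names and the two predicates are only assumptions about what INT and REAL might accept:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical combinator: the result passes if any given predicate passes.
sub any_of {
    my @checks = @_;
    return sub {
        my ($value) = @_;
        for my $check (@checks) {
            return 1 if $check->($value);
        }
        return 0;
    };
}

# Assumed stand-ins for the INT and REAL checks.
my $is_int  = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?\d+\z/ };
my $is_real = sub { defined $_[0] && !ref $_[0] && looks_like_number($_[0]) };

my $int_or_real = any_of($is_int, $is_real);

for my $value (0, 1.5, 'foo') {
    printf "%-4s %s\n", $value, $int_or_real->($value) ? 'ok' : 'fails';
}
```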

Best,
Ovid


Re: Native data checking in Perl

May 19, 2023, 6:56 AM

Post #16 of 43 (913 views)

On Fri, 19 May 2023 15:35:02 +0200, Ovid <curtis.poe@gmail.com> wrote:

> On Fri, May 19, 2023 at 9:59 AM <perl5@tux.freedom.nl> wrote:
>
> > If there is now way to specify your own types/type-classes, I'd support 1
> > and 2
>
> I think you meant "no way" instead of "now way"?

I did

> We do have a spec for specifying your own checks. Classes are easy and
> don't require a new check:
>
> my $dog :of(OBJ[Animal::Dog]) = get_dog('spot');

????

> User-defined checks are easy. If you want a scalar to never decrease in
> value:
>
> check Monotonic :isa(NUM) ($value, %value) { $value >= $value{old} }

I probably need to read that more than twice to let it land

> > 2 because I can imagine one wants to allow
> >
> > my $used :of(INT,REAL) = 0;
> >
> > with any combination of acceptable types, making unacceptable type fatal
>
> That's handled:
>
> my $used :of( INT | REAL ) = 0;

Yeah!

> Ovid

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org

Re: Native data checking in Perl

happy.barney at gmail

May 19, 2023, 1:26 PM

Post #17 of 43 (913 views)

A few notes / questions (and many more pending):

- should this be available for literal value? eg:

$foo = 1 :is (Float);
$foo = undef :is (Maybe[Str]);

- should this be available for expressions? eg:

return 1 :is (Float)
return :is (Maybe[Str]);
return foo :is (Float) (1, 2, 3);
return do :is (Maybe[Str]) ...;

- should this be available in boolean context?

if ($foo :is (Float)) { ... }

- this can supplement tainted (IMHO a very good idea, though the current
implementation is not sufficient)

foo (1 :is (Tainted))

And one comment.

Generally it's considered a good practice to not use primitive types /
constraints / ....
It would be nice (and an easy feature to sell) to not allow primitive types in
:is(),
forcing user to declare domain specific types, eg:

instead of
$limit :is (UINT)

program should contain:
check Limit :extends(UINT) { $_ <= 32 };
$limit :is (Limit)

Brano

Re: Native data checking in Perl

darren at darrenduncan

May 19, 2023, 3:53 PM

Post #18 of 43 (913 views)

A few things I would want to see:

I prefer the consistency of having the name of an entity always on the left and
everything describing it to be on the right of the name.

Ideally the ordering is something that would still work if we were defining the
same interface in a hash ref, say, where the identifiers in question are usually
the keys.

I like the greater overall consistency of the "item keyword typedef" being used
everywhere, the same for "is" as "returns" etc.

Something else I want to see in a clean and consistent way is both supporting
data checks that correspond directly and unambiguously to Perl native types
(undef, boolean, integer, float, arrayref, hashref, <corinna object>, etc),
while also supporting extensibility for any user-defined predicate function (a
sub that takes any value and returns true or false if it is or isn't of the
type) which would be the basis for more complex checks, either passable by name
or subref.

We would want something that makes it easy to have the metadata needed to
"compile" Perl code to something C-like for performance if we wished to.

The unambiguous ways to refer to Perl native types must not be user-overridable;
a code parser or compiler must be able to know that if, say, "Int" or
"ArrayRef" is seen, it always means the native one, not possibly something the
user replaced it with.

There could be pre-defined 1:1 correspondences, where it makes sense, that could
be overridden if there is a good reason to support that, and otherwise users
should have to choose names that are different from the built-ins for theirs.

-- Darren Duncan

On 2023-05-17 4:56 a.m., Ovid wrote:
> On Wed, May 17, 2023 at 11:32 AM G.W. Haywood via perl5-porters wrote:
>
> > sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
> > my Int $count = 0;
> >
> > Or this:
> >
> > sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
> > my $count is Int = 0;
>
> Coming from a number of other languages before Perl, the first of
> those feels natural to me.
>
> I suspect most feel the same way.

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 4:09 PM

Post #19 of 43 (913 views)

On 2023-05-18 1:25 p.m., Chris Prather wrote:
> Even "99" and 1.2 should IMO fail, but it depends on the
> implementation of the check code for INT, and what defaults we set for
> implicit coercions. Moose requires you flag coercible attributes
> (coerce => 1 on an attribute definition), this being Perl and
> coercions being baked so deeply into the language I don't think that
> would fly in general but allowing someone to disable them would be
> incredibly useful/nice, e.g.:
>
> no coercion;
> my $count :of(INT) = "99"; # warn/croak/die
> my $count :of(INT) = 1.2; # warn/croak/die
>
>> * it doesn't warn/croak, but ensures that $count ends up as an int:
>>
>> my $count :of(INT) = "99";
>> Dump $count; # shows that $count is an IV, not a PV.
>
> Probably but see above, I'd like to be able to make that
> warn/croak/die in some circumstances.

My general thought is that it is important for Perl to let users explicitly
declare what they expect, for example:

my $count1 :is(Int);
my $count2 :is(Int) :coerce(Int);

The first example would throw an exception if anything that isn't already an Int
is assigned to it, and the second example would try to coerce the thing to an Int.

What :is takes is a predicate resulting in true or false depending on its input;
what :coerce takes is a mapping function that returns an Int value derived from
its input. An optimization can be that the :coerce one is only invoked when :is
is false.
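
A minimal sketch of that split in plain Perl, assuming a hypothetical `checked_setter` helper and a regex-based stand-in for the Int predicate (neither is part of the proposal). The predicate answers yes or no; the mapping function is only consulted when the predicate says no.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a setter that enforces an :is-style predicate
# and optionally falls back to a :coerce-style mapping function.
sub checked_setter {
    my (%opt) = @_;
    my ($is, $coerce) = @opt{qw(is coerce)};
    return sub {
        my ($value) = @_;
        return $value if $is->($value);        # already valid: no coercion
        die "check failed and no coercion declared\n" unless $coerce;
        my $mapped = $coerce->($value);        # only runs when :is is false
        die "coercion produced an invalid value\n" unless $is->($mapped);
        return $mapped;
    };
}

# Stand-in predicate for Int: a defined non-reference that looks like an integer.
my $is_int = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ };

my $strict  = checked_setter(is => $is_int);
my $lenient = checked_setter(is => $is_int, coerce => sub { int $_[0] });

my $count1 = $strict->(42);                  # accepted unchanged
my $count2 = $lenient->(1.9);                # mapped through int()
my $ok     = eval { $strict->(1.9); 1 };     # strict setter dies, so $ok is undef
```

Re-running the predicate on the coerced value keeps the guarantee intact even when a mapping function is buggy.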

The :is and :coerce can be used anywhere one may declare a type, whether on a
local variable or a parameter declaration or a returns etc, or in the returns
case maybe :returns would be analogous to :is and there could be a
:returns-coerce analogous to the :coerce.

I'm not suggesting specific keywords, but that we have distinct ones for these
purposes.

The strict :is is the most important, the :coerce is something nice to add.

There can be multiple :coerce functions that coerce to the same type, depending
on desired semantics, and in the general case, :coerce is really just a generic
title for what ideally would be a set of functions that are specific in their
behavior, kind of like how I feel floor(), ceil(), to_zero(), to_even() etc are
better rounding function names than round().

And the fact these options can exist makes it even better for consistency to
have the "entity keyword type" format rather than "type entity" format because
the former is extensible. Or if you want to have both, "type entity" should
ALWAYS mean strict exception throwing yes/no and NOT coerce ever, and one can
use the :foo syntax to indicate coercion when desired.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 4:12 PM

Post #20 of 43 (913 views)

On 2023-05-19 12:20 a.m., Dave Mitchell wrote:
> On Fri, May 19, 2023 at 08:21:56AM +0200, Ovid wrote:
>> I understand where you're coming from. I was strongly hoping we could at
>> least get some consensus on what P5P would be willing to accept in terms of
>> syntax so that when the rest comes along, part of the issue is out of the
>> way.
>
> I'm happy for any discussion on semantics to be deferred for now and for
> this thread to concentrate (for now) on syntax. I just wanted a general
> feel for what the proposed syntax is actually supposed to do, (e.g.
> whether it imposes a type on the variable throughout its life, or whether
> it's a constraint checker for the value of an initial assignment, or
> whatever) which might influence people's choice of syntax.
>
> So I look forward to the remainder of your truncated reply :-)

One can't really put off semantics in order to discuss syntax; discussing syntax
is meaningless without knowing what that syntax DOES. Especially when there may
be several different behaviors to represent that would have distinct syntax, eg
:is vs :coerce etc. -- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 4:44 PM

Post #21 of 43 (913 views)

I also want to firmly put my support behind the concept that Perl undef is
effectively its own singleton type and whenever one declares that something is
an "int" or "float" or whatever else it is implicitly the case that those are
DEFINED, and guaranteed to contain a valid int/float/etc at all times.

When one wants something to be allowed to be undefined, that should be declared
with an explicit union such as "int|undef" and if one just says "int" then the
latter is excluded.

This will obviously mean we will need a well documented "default value" for
every type, eg if one declares "my Int $foo;" without assigning it a value, then
$foo implicitly contains zero, assuming we don't want to require an explicit
assignment or declaration of a default value to remove any doubt, though that
might possibly get unwieldy with more complicated types.

This also means that if one declares eg "my UserDefType $foo;" then this ALSO
does NOT allow undef, and must be some object of that type or whatever.

I can see this being resolved by any of:

1. Have some standard way for a class to declare what its "default" value is,
where that is useful, for example explicitly declaring it is their concept of
zero. Then $foo implicitly gets that default.

2. Although for some classes it doesn't make sense to require a default, eg if
an object represents an active connection to some external resource. Then the
$foo line is an error but in the general case that would only be known at
runtime and could result in a no-default-value exception or something.

3. Have a compile-time requirement that any "my Foo $foo" must either explicitly
assign a value or must be of the form "my Foo|undef $foo".

Of course, then there is the related matter, if we have a type union, then whose
default from the union is used as the default of the union as a whole? I would
suggest the easiest to understand option is treat the union as being an ordered
rather than unordered set and go left to right, using the default of the
left-most option that defines one. Note that having "undef" in the list would
only make the default "undef" if it appears first or all options before it don't
declare a default.
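
The left-to-right rule can be sketched as follows. The data layout (a union as an ordered list of hashes, each optionally carrying a `default` key) and the `union_default` name are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical representation: a union is an ordered list of members, each a
# hash that may or may not declare a 'default'.
# Returns (found, value) so that a declared default of undef stays distinguishable
# from "no member declares a default".
sub union_default {
    my (@members) = @_;
    for my $member (@members) {
        return (1, $member->{default}) if exists $member->{default};
    }
    return (0, undef);
}

my %Int    = (name => 'Int',    default => 0);
my %Handle = (name => 'Handle');                  # declares no default
my %Undef  = (name => 'Undef',  default => undef);

# Handle|Int|Undef: Handle has no default, so Int's zero wins.
my ($found1, $value1) = union_default(\%Handle, \%Int, \%Undef);

# Undef|Int: undef appears first, so the default is undef.
my ($found2, $value2) = union_default(\%Undef, \%Int);

# Handle alone: nothing to fall back on.
my ($found3, $value3) = union_default(\%Handle);
```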

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 12:49 AM

Post #22 of 43 (913 views)

On Fri, May 19, 2023 at 10:26 PM Branislav Zahradník <happy.barney@gmail.com>
wrote:

> - should this be available for literal value? eg:
> - should this be available in boolean context?
> - this can supplement tainted (imho very good idea though current
> implementation not sufficient)
>

I really like these ideas. Will add them to my notes now, but probably not
something for an MVP.

> Generally it's considered a good practice to not use primitive types /
> constraints / ....
> It will be nice (and feature easy to sell) to not allow primitive types in
> :is(),
> forcing user to declare domain specific types, eg:
>

A number of code bases were consulted prior to this to see real-world uses.
Quite often people have a "one off" check for a variable or sub that doesn't
justify the value of a domain-specific constraint (D

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 1:22 AM

Post #23 of 43 (913 views)

On Sat, May 20, 2023 at 12:53 AM Darren Duncan <darren@darrenduncan.net>
wrote:

> I like the greater overall consistency of the "item keyword typedef" being
> used
> everywhere, the same for "is" as "returns" etc.
>

Noted.

> Something else I want to see in a clean and consistent way is both
> supporting
> data checks that correspond directly and unambiguously to Perl native
> types
> (undef, boolean, integer, float, arrayref, hashref, <corinna object>,
> etc),
>

That's generally there (with some hand-waving because the devil's in the
details).

> while also supporting extensibility for any user-defined predicate
> function (a
> sub that takes any value and returns true or false if it is or isn't of
> the
> type) which would be the basis for more complex checks, either passable by
> name
> or subref.
>

Currently we don't have predicate functions defined, but there's a robust
definition for user-defined checks.

> We would want something that makes it easy to have the metadata needed to
> "compile" Perl code to something C-like for performance if we wished to.
>

Post-MVP.

> The unambiguous ways to refer to Perl native types must not be user
> overridable,
> a code parser or compiler must be able to know it is safe that if say
> "Int" or
> "ArrayRef" is seen, it always means the native one, not possibly something
> the
> user replaced it with.
>

Already planned.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 1:29 AM

Post #24 of 43 (913 views)

On Sat, May 20, 2023 at 1:09 AM Darren Duncan <darren@darrenduncan.net>
wrote:

> My general thought is that it is important for Perl to let users
> explicitly
> declare what they expect, for example:
>
> my $count1 :is(Int);
> my $count2 :is(Int) :coerce(Int);
>

Coercions are a "nice-to-have" and absolutely post-MVP. I'd be half-tempted
to say "not allowed."

1. They can't be disabled or downgraded to warnings, even if they're
   wrong/buggy
2. Coercing any reference value can alter the calling code's data
3. Implicit coercions like you show above are a minefield (do we *really*
   want references coerced to integers?)
4. Explicit coercions declared via the coercion keyword give us
   action-at-a-distance that can be hard to debug (because debugging a
   post-coercion value doesn't mean it's easy to get the pre-coercion value)

I'm not saying "no coercions" simply because I don't have that authority,
but they have enough pitfalls that they need to be handled carefully.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 1:40 AM

Post #25 of 43 (913 views)

On Sat, May 20, 2023 at 1:44 AM Darren Duncan <darren@darrenduncan.net>
wrote:

> When one wants something to be allowed to be undefined, that should be
> declared
> with an explicit union such as "int|undef" and if one just says "int" then
> the
> latter is excluded.
>

That's already in the spec.

> This will obviously mean we will need a well documented "default value"
> for
> every type, eg if one declares "my Int $foo;" without assigning it a
> value, then
> $foo implicitly contains zero, assuming we don't want to require an
> explicit
> assignment or declaration of a default value to remove any doubt, though
> that
> might possibly get unwieldy with more complicated types.
>

I think magic defaults are a very bad idea and you *can't* do it. Just a
few checks we support:

open my $fh :of(HANDLE), '<', $file;
while (my ( $key, $value :of(GLOB) ) = each %main:: ) { ... }
my $version :of(VSTR) = v5.22.0;
my $spot :of(OBJ[Dog]) = get_dog('spot');

There's plenty more where those came from. Not one of those has a sane
default.

Your example of an ENUM or type union is another good example of why the
default would be bad. Imagine an enum with seven allowed values.
Arbitrarily choosing the first one for me because I forgot an initial
assignment? Bugs waiting to happen.

We also support some common subchecks, such as TUPLE and DICT. There's no
way a default will be reasonable for those.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

perl5-porters at perl

May 20, 2023, 2:13 AM

Post #26 of 43 (654 views)

Op 20-05-2023 om 10:40 schreef Ovid:
> On Sat, May 20, 2023 at 1:44 AM Darren Duncan
> <darren@darrenduncan.net> wrote:
>
> When one wants something to be allowed to be undefined, that
> should be declared
> with an explicit union such as "int|undef" and if one just says
> "int" then the
> latter is excluded.
>
>
> That's already in the spec.
>
> This will obviously mean we will need a well documented "default
> value" for
> every type, eg if one declares "my Int $foo;" without assigning it
> a value, then
> $foo implicitly contains zero, assuming we don't want to require
> an explicit
> assignment or declaration of a default value to remove any doubt,
> though that
> might possibly get unwieldy with more complicated types.
>
>
> I think magic defaults are a very bad idea and you /can't/ do it. Just
> a few checks we support:
>
> open my $fh :of(HANDLE), '<', $file;
> while (my ( $key, $value :of(GLOB) ) = each %main:: ) { ... }
> my $version :of(VSTR) = v5.22.0;
> my $spot :of(OBJ[Dog]) = get_dog('spot');
> There's plenty more where those came from. Not one of those has a sane
> default.
>
> Your example of an ENUM or type union is another good example of why
> the default would be bad. Imagine an enum with seven allowed values.
> Arbitrarily choosing the first one for me because I forgot an initial
> assignment? Bugs waiting to happen.
>
> We also support some common subchecks, such as TUPLE and DICT. There's
> no way default will be reasonable for those.

Does that mean that the default will be the same as it is now, or will
it mean that a value must be assigned and defaults are disallowed?

M4

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 2:35 AM

Post #27 of 43 (654 views)

On Sat, May 20, 2023 at 11:13 AM Martijn Lievaart via perl5-porters <
perl5-porters@perl.org> wrote:

> Does that mean that the default will be the same as it is now, or will it
> mean that a value must be assigned and defaults are disallowed?
>

Not sure exactly what you're asking (I probably just need more coffee).
Currently, you, the developer, will have to assign a value to a checked
variable.

my $count :of(INT) = 0;
my $count :of(INT); # fatal because undef isn't an integer
my $count :of(INT|UNDEF); # allowed

Damian and I went a few rounds over that one, but it's important. We're
retrofitting a check system onto a language where this has always been
ad-hoc. Consider the following pseudo-code:

var count Int;

For many languages where "types" are part of the initial spec, that
declaration is legal, but you get a compile-time type failure if you try
to use that value prior to assignment. For dynamic languages, that's often
a runtime error.

That doesn't work for Perl because it's perfectly fine to $count++ if that
variable is undefined. And what happens for this?

sub foo () {
my $count :of(INT);
foreach my $cust (get_customers()) {
next unless meets_conditions($cust);
$count++;
}
return $count;
}

We lose all guarantees. This is a silly example because $count will coerce
to zero, but we can easily create more complex examples where allowing an
uninitialized variable is problematic. To quote part of Damian's response on
this:

> To be truly useful – and safe – these checks have to be guarantees that
> subsequent code can rely upon the values stored in checked variables.
> Allowing checks to be initially skirted utterly destroys that certainty.
>
> I understand that undef is special. And convenient. It’s Perl’s single
> most highly auto-coercive value. But that’s exactly why we can’t allow
> checked values to remain uninitialized. Because undef will coerce to just
> about anything, which utterly breaks the guarantee that a checked value
> is not just anything, but rather is something highly specific.
>
> In short, my view is: if you want your variable to be sloppy and
> unchecked, then be sloppy and don’t make it checked. If you make it
> checked, then don’t expect it to still be sloppy and (initially)
> unchecked.

In short, this is going to be a contentious issue, but I'd rather we be
strict at first just to err on the side of caution.
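
The counting example can be run in today's Perl, without any checks, to see where the guarantee would slip. This is a sketch only: `get_customers` and `meets_conditions` are hypothetical stand-ins, written so that no customer qualifies.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the helpers used in the example.
sub get_customers    { return ({ active => 0 }, { active => 0 }) }
sub meets_conditions { return $_[0]{active} }

sub foo {
    my $count;                        # imagine this were declared :of(INT)
    foreach my $cust (get_customers()) {
        next unless meets_conditions($cust);
        $count++;
    }
    return $count;                    # never assigned if nobody qualified
}

my $result = foo();
print defined $result ? "count: $result\n" : "count is undef\n";
```

When no customer meets the conditions, the increment never runs and the caller receives undef from a variable it believed to be an integer, which is the situation that requiring an initial assignment rules out.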

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

lukasmai.403 at gmail

May 20, 2023, 2:46 AM

Post #28 of 43 (654 views)

On 20.05.23 11:35, Ovid wrote:
> Not sure exactly what you're asking (I probably just need more coffee).
> Currently, you, the developer, will have to assign a value to a checked
> variable.
>
> my $count :of(INT) = 0;
> my $count :of(INT); # fatal because undef isn't an integer
> my $count :of(INT|UNDEF); # allowed
>

Semantically, Perl so far does not have initialization, only assignment.
The syntax 'my $x = 0' means "create the local variable $x, then assign
0 to it". So what happens in the proposed system if we break the two
steps apart?

my $count :of(INT) = do { goto L; };
L: say $count; # ?

Also, what do these do?

my $count :of(INT) += 0;

${\my $count :of(INT)} = 0;

Re: Native data checking in Perl [ In reply to ]

perl5-porters at perl

May 20, 2023, 2:51 AM

Post #29 of 43 (654 views)

Op 20-05-2023 om 11:35 schreef Ovid:
> On Sat, May 20, 2023 at 11:13 AM Martijn Lievaart via perl5-porters
> <perl5-porters@perl.org> wrote:
>
> Does that mean that the default will be the same as it is now, or
> will it mean that a value must be assigned and defaults are
> disallowed?
>
>
> Not sure exactly what you're asking (I probably just need more coffee).

Thanks, clear. Just to clarify, I was trying to ask if you must assign a
value *always* when using these type checks, to which you clearly
answered 'yes'. So leaving a variable uninitialized will be forbidden,
which is fine with me.

Thanks,

M4

Re: Native data checking in Perl [ In reply to ]

May 20, 2023, 3:05 AM

Post #31 of 43 (654 views)

On Sat, 20 May 2023 11:35:36 +0200, Ovid <curtis.poe@gmail.com> wrote:

> On Sat, May 20, 2023 at 11:13 AM Martijn Lievaart via perl5-porters <
> perl5-porters@perl.org> wrote:
>
> > Does that mean that the default will be the same as it is now, or will it
> > mean that a value must be assigned and defaults are disallowed?
>
> Not sure exactly what you're asking (I probably just need more coffee).
> Currently, you, the developer, will have to assign a value to a checked
> variable.
>
> my $count :of(INT) = 0;
> my $count :of(INT); # fatal because undef isn't an integer
> my $count :of(INT|UNDEF); # allowed

I guess that - at least in *my* perception - :of(INT|UNDEF) will be
used *a lot* more than :of(INT). I deduce that from my use of types in
Raku. I find it non-intuitive to make the most-used case take this
amount of syntax.

> Damian and I went a few rounds over that one, but it's important. We're
> retrofitting a check system onto a language where this has always been
> ad-hoc. Consider the following pseudo-code:
>
> var count Int;
>
> For many languages where "types" are part of the initial spec, that
> declaration is illegal, but you get a compile-time type failure if you try
> to use that value prior to assignment. For dynamic languages, that's often
> a runtime error.
>
> That doesn't work for Perl because it's perfectly fine to $count++ if that
> variable is undefined. And what happens for this?
>
> sub foo () {
> my $count :of(INT);
> foreach my $cust (get_customers()) {
> next unless meets_conditions($cust);
> $count++;
> }
> return $count;
> }
>
> We lose all guarantees. This is a silly example because $count will coerce
> to zero, but we can easily create more complex examples where allowing an
> uninitialized variable is problematic. To quote part of Damian's response on
> this:
>
> > To be truly useful – and safe – these checks have to be guarantees that
> > subsequent code can rely upon the values stored in checked variables.
> > Allowing checks to be initially skirted utterly destroys that certainty.
> >
> > I understand that undef is special. And convenient. It’s Perl’s single
> > most highly auto-coercive value. But that’s exactly why we can’t allow
> > checked values to remain uninitialized. Because undef will coerce to just
> > about anything, which utterly breaks the guarantee that a checked value
> > is not just anything, but rather is something highly specific.
> >
> > In short, my view is: if you want your variable to be sloppy and
> > unchecked, then be sloppy and don’t make it checked. If you make it
> > checked, then don’t expect it to still be sloppy and (initially)
> > unchecked.
>
> In short, this is going to be a contentious issue, but I'd rather we be
> strict at first just to err on the side of caution.
>
> Best,
> Ovid

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org

Re: Native data checking in Perl [ In reply to ]

happy.barney at gmail

May 20, 2023, 5:38 AM

Post #33 of 43 (654 views)

> Semantically, Perl so far does not have initialization, only assignment.
> The syntax 'my $x = 0' means "create the local variable $x, then assign
> 0 to it". So what happens in the proposed system if we break the two
> steps apart?
>
> my $count :of(INT) = do { goto L; };
> L: say $count; # ?
>
> Also, what do these do?
>
> my $count :of(INT) += 0;
>
> ${\my $count :of(INT)} = 0;
>

Nice points. Maybe time to revive my earlier idea about "not-exist" (to
differentiate between uninitialized and undefined).

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 7:04 AM

Post #34 of 43 (654 views)

On Sat, May 20, 2023 at 12:14 PM Lukas Mai <lukasmai.403+p5p@gmail.com>
wrote:

The following responses are my assumptions based on the current spec. I
could be mistaken, or these could be bad ideas.

>
> my $count :of(INT) = do { goto L; };
> L: say $count; # ?
>

My assumption would be that we check on assignment, so $count is undefined
in the say. Whatever value the do {} block returns then gets assigned to
$count, though we're possibly in a weird situation where $count will not be
assigned to on the my line.

> Also, what do these do?
>
> my $count :of(INT) += 0;
>

We only check on assignment, so the + part of that would be fetching the
value, undef, and adding zero to it, leaving zero to be assigned. Thus,
no error on the check.
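
The fetch-then-add behaviour is already observable in current Perl, with no check attached. The sketch below splits the declaration and the compound assignment into two statements and installs a warning handler only to count any warnings raised.

```perl
use strict;
use warnings;

# Collect any warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $count;          # undef, as a freshly declared variable is
$count += 0;        # fetches undef, adds zero, stores the result

print "value: $count, warnings: ", scalar(@warnings), "\n";
```

Perl exempts `+=` on an undefined variable from the "uninitialized" warning, so the stored value is a plain zero and nothing is reported; a check that runs only on the store would see a valid integer.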

> ${\my $count :of(INT)} = 0;
>

I think it should be fatal since you're not assigning anything to $count
directly. Instead, you're taking a reference to it and assigning to the
dereference. However, the checks attach to the variable, not the data (or in
this case, the reference).

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 7:15 AM

Post #35 of 43 (654 views)

On Sat, May 20, 2023 at 4:04 PM Ovid <curtis.poe@gmail.com> wrote:

> ${\my $count :of(INT)} = 0;
>>
>
> I think it should be fatal since you're not assigning anything to $count
> directly. Instead, you're taking a reference to it and assigning to the
> dereference. However, the checks attach to the variable, not the data (or in
> this case, the reference).
>

Actually, I think what I said might not be coherent. Ignore that. The
answer is "I'm not sure."

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 10:57 AM

Post #36 of 43 (654 views)

On 2023-05-20 1:29 a.m., Ovid wrote:
> On Sat, May 20, 2023 at 1:09 AM Darren Duncan wrote:
> My general thought is that it is important for Perl to let users explicitly
> declare what they expect, for example:
>
> my $count1 :is(Int);
> my $count2 :is(Int) :coerce(Int);
>
> Coercions are a "nice-to-have" and absolutely post-MVP. I'd be half-tempted to
> say "not allowed."
>
> 1. They can't be disabled or downgraded to warnings, even if they're wrong/buggy
> 2. Coercing any reference value can alter the calling code's data
> 3. Implicit coercions like you show above are a minefield (do we /really/ want
> references coerced to integers?)
> 4. Explicit coercions declared via the coercion keyword give us
> action-at-a-distance that can be hard to debug (because debugging a
> post-coercion value doesn't mean it's easy to get the pre-coercion value)
>
> I'm not saying "no coercions" simply because I don't have that authority, but
> they have enough pitfalls that they need to be handled carefully.

I should make an important clarification.

I don't actually want or like coercions at all.

My preference is that everything is strict yes/no, and people just use explicit
mapping functions eg floor() on assignment when applicable.

What I was actually arguing for is that IF there are coercions, they should be
explicit as such and are always an alternative where the default is no coercions.

My stated proposal before was more about placating those who feel coercions are
important to have in Perl because it's the Perl way or whatever.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 11:01 AM

Post #37 of 43 (654 views)

On 2023-05-20 1:40 a.m., Ovid wrote:
> On Sat, May 20, 2023 at 1:44 AM Darren Duncan wrote:
> When one wants something to be allowed to be undefined, that should be declared
> with an explicit union such as "int|undef" and if one just says "int" then the
> latter is excluded.
>
> That's already in the spec.
>
> This will obviously mean we will need a well documented "default value" for
> every type, eg if one declares "my Int $foo;" without assigning it a value,
> then
> $foo implicitly contains zero, assuming we don't want to require an explicit
> assignment or declaration of a default value to remove any doubt, though that
> might possibly get unwieldy with more complicated types.
>
> I think magic defaults are a very bad idea and you /can't/ do it. Just a few
> checks we support:
>
> open my $fh :of(HANDLE), '<', $file;
> while (my ( $key, $value :of(GLOB) ) = each %main:: ) { ... }
> my $version :of(VSTR) = v5.22.0;
> my $spot :of(OBJ[Dog]) = get_dog('spot');
> There's plenty more where those came from. Not one of those has a sane default.
>
> Your example of an ENUM or type union is another good example of why the
> default would be bad. Imagine an enum with seven allowed values. Arbitrarily
> choosing the first one for me because I forgot an initial assignment? Bugs
> waiting to happen.
>
> We also support some common subchecks, such as TUPLE and DICT. There's no way
> default will be reasonable for those.
> Best,
> Ovid

I read this that you advocate for requiring users to explicitly assign to or
provide an initial value to a variable/parameter/etc when declaring it with a
type/check, and I'm perfectly fine with that, and I think it was one of the
options I gave. The "defaults" idea was more about an alternative if we
consider it important to be able to declare a variable etc without giving it an
initial value. -- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 12:55 PM

Post #38 of 43 (654 views)

On 2023-05-20 3:05 a.m., perl5@tux.freedom.nl wrote:
> On Sat, 20 May 2023 11:35:36 +0200, Ovid <curtis.poe@gmail.com> wrote:
>> Not sure exactly what you're asking (I probably just need more coffee).
>> Currently, you, the developer, will have to assign a value to a checked
>> variable.
>>
>> my $count :of(INT) = 0;
>> my $count :of(INT); # fatal because undef isn't an integer
>> my $count :of(INT|UNDEF); # allowed
>
> I guess that - at least in *my* perception - :of(INT|UNDEF) will be
> used *a lot* more than :of(INT). I deduce that from my use of types in
> Raku. I feel it non-intuitive to make the most used case take this
> amount of syntax.

I would argue that most programs that allow undef in more contexts than they
disallow it, such as you mention, are poorly designed and rely on bad habits.
It is essentially the same as the very widespread problem across the whole
programming world of baking in nullability and making it the default everywhere.

Allowing undef is something we want to actively discourage, an enormous source
of problems, and users should explicitly opt-in to it where they want it rather
than having to explicitly opt-out where they don't.

The tersest way to allow undef by default is to not mark something with a type
in the first place; if you care enough to mark something with a type, I would
hope one wants it more strict.

That being said, I can see good cause to provide an optional shorthand for
"foo|undef", given how common and simple it is. I suggest something like
"foo?", which would have this meaning. It doesn't have to be a trailing question
mark; however, there is existing precedent in other languages for that. I think
C# may do it that way.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 12:58 PM

Post #39 of 43 (654 views)

On 2023-05-20 12:55 p.m., Darren Duncan wrote:
> On 2023-05-20 3:05 a.m., perl5@tux.freedom.nl wrote:
>> On Sat, 20 May 2023 11:35:36 +0200, Ovid <curtis.poe@gmail.com> wrote:
>>> Not sure exactly what you're asking (I probably just need more coffee).
>>> Currently, you, the developer, will have to assign a value to a checked
>>> variable.
>>>
>>>      my $count :of(INT) = 0;
>>>      my $count :of(INT);       # fatal because undef isn't an integer
>>>      my $count :of(INT|UNDEF); # allowed
>>
>> I guess that - at least in *my* perception - :of(INT|UNDEF) will be
>> used *a lot* more than :of(INT). I deduce that from my use of types in
>> Raku. I feel it non-intuitive to make the most used case take this
>> amount of syntax.
>
> I would argue that most programs allowing undef in more contexts than
> disallowing it such as you mention are poorly designed, relying on bad habits.
> It is essentially the same as the very widespread problem across the whole
> programming world that bake in and default nullability everywhere.
>
> Allowing undef is something we want to actively discourage, an enormous source
> of problems, and users should explicitly opt-in to it where they want it rather
> than having to explicitly opt-out where they don't.
>
> The tersest way to allow undef by default is to not mark something with a type
> in the first place; if you care enough to mark something with a type, I would
> hope one wants it more strict.
>
> That being said, I can see good cause to provide an optional shorthand for
> "foo|or" undef given how it is so common and simple. I suggest something like
> "foo?" which would have this meaning. It doesn't have to be a trailing question
> mark, however there is existing precedent in other languages for that. I think
> maybe C# does it that way maybe.

Another thing I meant to say before: Having "int" NOT include undef is also the
principle of least surprise, and it is more consistent. When you say "int" you
MEAN "just int"; same as saying "int" excludes "float", which isn't "int", it
excludes undef, which isn't "int". -- Darren Duncan

Re: Native data checking in Perl

happy.barney at gmail

May 20, 2023, 2:55 PM

Post #40 of 43 (654 views)

Another question:

Should constraint names share a namespace with functions or not?

e.g. int (builtin) vs int (constraint)

Both approaches have pros:
- if yes, a check can be imported by name
- if no, you get a clear separation that allows mixing new code into an old
codebase without conflicts

Moreover, introducing the idea of separate symbol namespaces (with the
possibility of declaring your own namespace) would open up another way of
extending Perl.
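
As a rough illustration of the "separate namespace" option, the sketch below keeps checks in their own registry instead of in the symbol table. The %CHECKS hash and the check_value helper are hypothetical and only emulate the idea in plain Perl; they are not part of any proposed implementation.

```perl
use strict;
use warnings;

# Hypothetical registry that lives apart from the symbol table, so a check
# named "int" cannot collide with the built-in function int().
my %CHECKS = (
    int => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ },
);

# Look a check up by name in the registry and apply it to a value.
sub check_value {
    my ($name, $value) = @_;
    my $check = $CHECKS{$name} or die "Unknown check '$name'\n";
    return $check->($value) ? 1 : 0;
}

print "builtin int(3.7):   ", int(3.7), "\n";                # truncates
print "check 'int' on 3.7: ", check_value(int => 3.7), "\n"; # not an integer
print "check 'int' on 3:   ", check_value(int => 3), "\n";
```

Nothing has to be imported to use the check, and the built-in int() keeps its meaning; the cost is that checks need their own lookup rules.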

Re: Native data checking in Perl

darren at darrenduncan

May 20, 2023, 5:22 PM

Post #41 of 43 (654 views)

On 2023-05-20 2:55 p.m., Branislav Zahradník wrote:
> Another question:
>
> should constraint names share namespace with functions or not ?
>
> eg: int (builtin) vs int (constraint)
>
> Both approaches have pros:
> - when yes, check can be imported by name
> - when no, you can have clear separation allowing to mix new code into old
> codebase without conflicts
>
> Moreover, introducing idea of separated symbol namespaces (with possibility to
> declare own namespace)
> will bring another possibility to extending Perl

Maybe the answer can be some of both options.

At such time that user-defined predicate functions are supported for this
purpose, which apparently isn't in the first version, the answer would have to
be yes, it's a shared namespace, at least for those.

As for the ones that are explicitly both system/language-defined and NOT
user-overridable, it would make sense for those to be in a separate namespace
and also automatically available without any "use" statement, or implied by
"use v5.40" etc.

There would then have to be a way syntactically to distinguish the two
categories, say when one says "int" in a type/check reference.

-- Darren Duncan

Re: Native data checking in Perl

darren at darrenduncan

May 20, 2023, 5:28 PM

Post #42 of 43 (654 views)

On 2023-05-20 5:22 p.m., Darren Duncan wrote:
> On 2023-05-20 2:55 p.m., Branislav Zahradník wrote:
>> Another question:
>>
>> should constraint names share namespace with functions or not ?
>>
>> eg: int (builtin) vs int (constraint)
>>
>> Both approaches have pros:
>> - when yes, check can be imported by name
>> - when no, you can have clear separation allowing to mix new code into old
>> codebase without conflicts
>>
>> Moreover, introducing idea of separated symbol namespaces (with possibility to
>> declare own namespace)
>> will bring another possibility to extending Perl
>
> Maybe the answer can be some of both options.
>
> At such time that user-defined predicate functions are supported for this
> purpose, which apparently isn't in the first version, the answer would have to
> be yes its a shared namespace, at least for those.
>
> As far as the ones that are explicitly both system/language-defined and NOT
> user-overridable, it would make sense for those to be a separate namespace, and
> also automatically available without any "use" statement, or implied by "use
> 5.40" etc.
>
> There would then have to be a way syntactically to distinguish the two
> categories, say when one says "int" in a type/check reference.

Perhaps this is also something builtin:: would be useful for, assuming users
can't override those, as a way of using the shared namespace exclusively, which
may be preferred in the interest of general extensibility. That is, design this
as if generic function support is how it is done, but have the first version
raise an error if anything other than a built-in is used, which allows some
parts of the implementation to take shortcuts and not actually work for
anything other than built-ins. I'm not advocating this, but it's an idea. -- Darren Duncan
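
A minimal sketch of that idea, assuming a hypothetical %BUILTIN_CHECK table and resolve_check helper (neither exists in Perl, and the "builtin::int" name is used here only as a string, not as a real builtin:: function): names are resolved as if any function could serve as a check, but anything that is not a built-in is refused.

```perl
use strict;
use warnings;

# Hypothetical set of built-in checks; the real list is not given in the thread.
my %BUILTIN_CHECK = (
    int => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ },
    str => sub { defined $_[0] && !ref $_[0] },
);

# Resolve a check name the way a generic function lookup might, but refuse
# anything that is not a built-in in this first version.
sub resolve_check {
    my ($name) = @_;
    (my $short = $name) =~ s/\Abuiltin:://;
    return $BUILTIN_CHECK{$short}
        // die "Check '$name' is not a built-in; user-defined checks are not supported yet\n";
}

my $is_int = resolve_check('builtin::int');
print "42 passes builtin::int\n" if $is_int->(42);

# A user-defined name is rejected rather than silently ignored.
my $resolved = eval { resolve_check('My::positive'); 1 };
print "error: $@" unless $resolved;
```

Because callers already go through a single lookup, a later version could add user-defined checks without changing how existing code refers to them.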

Re: Native data checking in Perl

curtis.poe at gmail

May 20, 2023, 11:34 PM

Post #43 of 43 (654 views)

On Sat, May 20, 2023 at 11:55 PM Branislav Zahradník <happy.barney@gmail.com>
wrote:

> should constraint names share namespace with functions or not ?
>
> eg: int (builtin) vs int (constraint)
>
> Both approaches have pros:
> - when yes, check can be imported by name
> - when no, you can have clear separation allowing to mix new code into old
> codebase without conflicts
>

They should not share the same namespace. "Checks imported by name" is a
pain. I'm tired of going through codebases and manually curating the
imports from Type::Tiny and friends, just as I'm tired of writing a new
signature using a type I *know* exists but which I forgot to import.
Languages which allow you to specify types don't have this limitation for a
good reason: it's painful manual drudgery which the developer should not
need to handle. There are currently almost 30 built-in checks, a number that
is probably going to grow, and user-defined checks will only add to the
problem.

There would be no need to specify lists of which checks you allow or not
because they do not share the same namespace.

Best,
Ovid