Mailing List Archive: Native data checking in Perl

Native data checking in Perl

curtis.poe at gmail

May 17, 2023, 1:17 AM

Post #1 of 43 (876 views)

Hi all,

I’ve been reading through past P5P messages to get a better sense of what
the porters are willing to accept for declaring “types.” I’ll use the term
data “checks” to avoid confusion about type systems.

For a few months, Damian and I have been hashing out a system of checks for
Perl. In particular, we looked at some of the most popular solutions:
Moo/se and Type::Tiny (working within Perl’s constraints), and also Dios and
Zydeco (working to extend Perl’s constraints), plus Raku and some
diversions into other languages.

Unfortunately, Damian deeply regrets that, for personal reasons, he is
unable to continue to work on the project, but is still wholeheartedly in
favor of its stated goals.

At present, there’s a working implementation in a private repo and we
support things like this:

sub foo :returns(ARRAY[INT]) ( $max_size :of(UINT) ) {...}
my $count :of(INT) = 0;

In fact, we support a lot more than that and have almost 200K tests
passing, though it's still very much alpha.

In reading P5P and other discussions, it was pleasant to see that there’s
little objection to the notion of checks, but the disagreements largely
arise over syntax and then discussion dies down again. In sharing our work
privately with others, the strongest objection has been the syntax. People
want this (punting on return values):

sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
my Int $count = 0;

Or this:

sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
my $count is Int = 0;

Or something else entirely.

Our proposal used attributes simply because we were concerned that P5P
would reject the my Dog $spot syntax (and I like the attribute syntax)
because it’s already taken (though not frequently used). However, given the
feedback, I think this didn’t need to be a concern.

Can we discuss this? I think the *semantics* are largely in place. It’s the
*syntax* where things keep breaking down. If we can find an agreed syntax,
I can take the existing work and write up a spec (that’s actually the hard
part, but most of it’s done, pending syntax changes).

The “design the thing and hand it over” approach worked for Corinna. This will be a
much more important change, but it can work here, too.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

leonerd at leonerd

May 17, 2023, 2:27 AM

Post #2 of 43 (876 views)

On Wed, 17 May 2023 10:17:57 +0200
Ovid <curtis.poe@gmail.com> wrote:

> we were concerned that P5P
> would reject the my Dog $spot syntax (and I like the attribute syntax)
> because it’s already taken (though not frequently used). However,
> given the feedback, I think this didn’t need to be a concern.

I'm in favour of using that syntax *exactly because* it is reserved for
exactly this reason. Please stop avoiding using it because of that -
this is what it's there for :)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: Native data checking in Perl [ In reply to ]

perl5-porters at perl

May 17, 2023, 2:32 AM

Post #3 of 43 (876 views)

Hi there,

On Wed, 17 May 2023, Ovid wrote:

> ... declaring “types.” ...

Wouldn't that be nice? :)

> ... disagreements largely arise over syntax ...

Unsurprisingly.

> ... People want this ...
>
> sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
> my Int $count = 0;
>
> Or this:
>
> sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
> my $count is Int = 0;
>
> Or something else entirely.

Coming from a number of other languages before Perl, the first of
those feels natural to me. The second makes my flesh crawl. As for
the others, I wouldn't know what I might like or not until I saw it.
(Also, ignoring what might be a UInt/Uint typo, I kinda like UInt.)

> Can we discuss this? I think the *semantics* are largely in
> place. It’s the *syntax* where things keep breaking down. ...

Isn't there more to it than that? One of my personal hobbyhorses is
that I've fallen into bear-traps using 64-bit integers, where just
using a scalar in a print statement changed the scalar's value without
any warning. When used two lines later in the same sub, it's clear
that my 64 flags have been converted from an integer type to a float
type, with the result that the rounding errors have totally trashed my
64 flags. I've had to jump through hoops to get things to look sane.

I'm not sure what these proposals would do to help me. I'm not saying
that it makes them bad proposals, just that I don't know if they'd be
any help to me if they were implemented. It seems like there's quite
a large mountain of code Out There that would need majorly overhauling
to make general use of something like this. OTOH I imagine there's an
opportunity for a great deal of efficiency improvement if you can say
"this thingy is never going to need to store anything but an integer".

I wonder if something like a more accessible interface to XS might be
more rewarding; if you want types, use a language which offers them.
Around here, more or less daily I link code I wrote in assembler with
code I wrote in C and C++, because by doing it that way it was easier
to get exactly what I wanted. You might even be able to avoid some of
the argument about syntax this way. Of course your alpha code would
then likely end up in the bin, sorry.

Incidentally I haven't used the term 'integer' as a placeholder for
some generic typedef. I'd be happy to start with plain old integers,
and worry about other types in a decade or two.

I hope this helps, even if it's only a little.

--

73,
Ged.

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 17, 2023, 4:56 AM

Post #4 of 43 (876 views)

On Wed, May 17, 2023 at 11:32 AM G.W. Haywood via perl5-porters <
perl5-porters@perl.org> wrote:

> > sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
> > my Int $count = 0;
> >
> > Or this:
> >
> > sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
> > my $count is Int = 0;
>
> Coming from a number of other languages before Perl, the first of
> those feels natural to me.

I suspect most feel the same way.

> I'm not sure what these proposals would do to help me. I'm not saying
> that it makes them bad proposals, just that I don't know if they'd be
> any help to me if they were implemented. It seems like there's quite
> a large mountain of code Out There that would need majorly overhauling
> to make general use of something like this.

Two things:

1. Regarding the 64-bit bug you mentioned (but didn't include above), if
there's a pre-existing bug, checks don't make the situation worse. If
checks introduce a new bug, that's an issue. Sounds like you hit a
pre-existing bug.
2. There's no mountain of code. All checks are optional and you could
introduce them for a single "problematic" variable if you choose:

sub foo ($name, $field, HashRef[Int] $totals) {...}

> OTOH I imagine there's an
> opportunity for a great deal of efficiency improvement if you can say
> "this thingy is never going to need to store anything but an integer".
>

Probably, but that's not an issue for now and probably shouldn't be for the
MVP.

I just want to have some convergence on syntax before I rewrite the spec.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

May 18, 2023, 12:04 PM

Post #5 of 43 (876 views)

On Wed, May 17, 2023 at 10:17:57AM +0200, Ovid wrote:
> At present, there’s a working implementation in a private repo and we
> support things like this:
>
> sub foo :returns(ARRAY[INT]) ( $max_size :of(UINT) ) {...}
> my $count :of(INT) = 0;
[snip]
> Can we discuss this? I think the *semantics* are largely in place. It’s the
> *syntax* where things keep breaking down.

Well I for one would like to know (in general terms) what the proposed
semantics are, because it's not obvious to me from your email. And off
the top of my head I can think of several possible semantics.

So for example, what does

my $count :of(INT) = ...;

actually do?

Some obvious guesses are that:

* it warns or croaks if the value about to be assigned isn't an int:

my $count :of(INT) = "99"; # string, not int

* it warns or croaks if the value about to be assigned isn't coercible to
an int:

my $count :of(INT) = "99"; # ok
my $count :of(INT) = "foo"; # fails
my $count :of(INT) = undef; # fails
my $count :of(INT) = 1.2; # fails?

* it doesn't warn/croak, but ensures that $count ends up as an int:

my $count :of(INT) = "99";
Dump $count; # shows that $count is an IV, not a PV.

* It also (or not) ensures that any further assignments to $count
during its lifetime also do one of the things suggested above

my $count :of(INT) = 1;
$count = "foo"; # fails - or not?

--
Indomitable in retreat, invincible in advance, insufferable in victory
-- Churchill on Montgomery

Re: Native data checking in Perl [ In reply to ]

chris at prather

May 18, 2023, 1:25 PM

Post #6 of 43 (876 views)

On Thu, May 18, 2023 at 3:05 PM Dave Mitchell <davem@iabyn.com> wrote:
>
> On Wed, May 17, 2023 at 10:17:57AM +0200, Ovid wrote:
> > At present, there’s a working implementation in a private repo and we
> > support things like this:
> >
> > sub foo :returns(ARRAY[INT]) ( $max_size :of(UINT) ) {...}
> > my $count :of(INT) = 0;
> [snip]
> > Can we discuss this? I think the *semantics* are largely in place. It’s the
> > *syntax* where things keep breaking down.

I'm not Ovid, and I don't have the hair to play him on TV but I have
spent a lot of time thinking about this (and had some opinionated
conversations with Ovid about it too).

> Well I for one would like to know (in general terms) what the proposed
> semantics are, because it's not obvious to me from your email. And off
> the top of my head I can think of several possible semantics.
>
> So for example, what does
>
> my $count :of(INT) = ...;
>
> actually do?

Well the semantics of Type::Tiny and Moose in general are (when boiled
down to their simplest form) a name associated with some kind of
subroutine that returns a boolean. Someone somewhere would declare
something like `$checks{INT} = sub { m/\d+/ };` Then `my $count
:of(INT) = ...` would call the code associated with INT on the
assignment to $count and if the code returns false it throws an
exception.
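
A minimal sketch of that name-to-predicate model in current Perl. The %checks registry and the checked_assign() helper are hypothetical names used only for illustration; they are not part of the proposal, Moose, or Type::Tiny. The INT pattern here is anchored so that it has to match the whole value rather than any digit inside it.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical registry: check name => predicate returning true or false.
my %checks = (
    INT => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A[-+]?[0-9]+\z/ },
);

# Hypothetical helper standing in for "my $var :of(NAME) = $value".
sub checked_assign {
    my ($name, $value) = @_;
    my $check = $checks{$name} or croak "Unknown check '$name'";
    croak "Value fails check $name" unless $check->($value);
    return $value;
}

my $count = checked_assign(INT => 99);                  # passes the INT predicate
my $ok    = eval { checked_assign(INT => 'foo'); 1 };   # croaks, so $ok stays undef
print defined $ok ? "accepted\n" : "rejected: $@";
```

Whether a failure warns or dies, and whether "99" counts as an INT, is exactly what the predicate and the surrounding policy decide; the sketch picks one answer for each.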

> Some obvious guesses are that:
>
> * it warns or croaks if the value about to be assigned isn't an int:
>
> my $count :of(INT) = "99"; # string, not int

Yes, but see below.

> * it warns or croaks if the value about to be assigned isn't coercible to
> an int:
>
> my $count :of(INT) = "99"; # ok
> my $count :of(INT) = "foo"; # fails
> my $count :of(INT) = undef; # fails
> my $count :of(INT) = 1.2; # fails?

Even "99" and 1.2 should IMO fail, but it depends on the
implementation of the check code for INT, and what defaults we set for
implicit coercions. Moose requires you flag coercible attributes
(coerce => 1 on an attribute definition), this being Perl and
coercions being baked so deeply into the language I don't think that
would fly in general but allowing someone to disable them would be
incredibly useful/nice, e.g.:

no coercion;
my $count :of(INT) = "99"; # warn/croak/die
my $count :of(INT) = 1.2; # warn/croak/die

> * it doesn't warn/croak, but ensures that $count ends up as an int:
>
> my $count :of(INT) = "99";
> Dump $count; # shows that $count is an IV, not a PV.

Probably but see above, I'd like to be able to make that
warn/croak/die in some circumstances.

> * It also (or not) ensures that any further assignments to $count
> during its lifetime also do one of the things suggested above
>
> my $count :of(INT) = 1;
> $count = "foo"; # fails - or not?

Yeah that should IMO fail, it should with Moose or Type::Tiny … as
long as the check code is associated with the variable it would be
called when the variable is set or when it would trigger a coercion.
For example:

my $count :of(INT) = 99; # all good so far
say ''.$count; # I want to be able to make this warn/croak/die

Being able to make the system warn/croak/die on the latter would
really help track down problems with data that's destined for places
where "99" and 99 are sometimes not equal. We do this sometimes with
uninitialized values, it'd be nice to be able to tell Perl what I'm
expecting and have it tell me when that's not what I've got.

Without having seen much of Ovid and Damian's code (but having had a
long conversation with Ovid about it a couple months ago), it seems to
me that having a way to "efficiently" associate a variable with
potentially user-defined code that checks that the value it's about to
contain belongs to a set of allowed values, and to warn/croak/die when
it doesn't … would make finding whole classes of data related bugs
easier. The words "efficiently" and "check ... allowed values" are
carrying a _lot_ of water there but that's the basic semantics that I
think Ovid was assuming everyone agreed with. It seems to be where
CPAN has settled.

-Chris

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 18, 2023, 11:21 PM

Post #7 of 43 (876 views)

On Thu, May 18, 2023 at 9:05 PM Dave Mitchell <davem@iabyn.com> wrote:

> > Can we discuss this? I think the *semantics* are largely in place. It’s
> the
> > *syntax* where things keep breaking down.
>
> Well I for one would like to know (in general terms) what the proposed
> semantics are, because it's not obvious to me from your email. And off
> the top of my head I can think of several possible semantics.

I understand where you're coming from. I was strongly hoping we could at
least get some consensus on what P5P would be willing to accept in terms of
syntax so that when the rest comes along, part of the issue is out of the
way.

However, now that you've asked the questions, I don't think I can
reasonably dodge them. However, I don't want to get bogged down in a debate
on semantics at this time. Suggestions around this area have died too many
times, despite the apparent support for the idea.

>
>
> So for example, what does
>
> my $count :of(INT) = ...;
>
> actually do?
>

By default this is fatal, but it can be told to warn instead:

> my $count :of(INT) = "99"; # string, not int
>
> * it warns or croaks if the value about to be assigned isn't coercible to
> an int:
>
> my $count :of(INT) = "99"; # ok
> my $count :of(INT) = "foo"; # fails
> my $count :of(INT) = undef; # fails
> my $count :of(INT) = 1.2; # fails?
>
> * it doesn't warn/croak, but ensures that $count ends up as an int:
>
> my $count :of(INT) = "99";
> Dump $count; # shows that $count is an IV, not a PV.
>
> * It also (or not) ensures that any further assignments to $count
> during its lifetime also do one of the things suggested above
>
> my $count :of(INT) = 1;
> $count = "foo"; # fails - or not?
>
>
> --
> Indomitable in retreat, invincible in advance, insufferable in victory
> -- Churchill on Montgomery
>

--
Curtis "Ovid" Poe
--
CTO, All Around the World
World-class software development and consulting
https://allaroundtheworld.fr/

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 18, 2023, 11:23 PM

Post #8 of 43 (876 views)

The original, obviously incomplete email was sent by accident before I
completed it. I'll send another one soon.

> By default this is fatal, but it can be told to warn instead:
>
>
>

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 19, 2023, 12:16 AM

Post #9 of 43 (876 views)

On Thu, May 18, 2023 at 9:05 PM Dave Mitchell <davem@iabyn.com> wrote:

> On Wed, May 17, 2023 at 10:17:57AM +0200, Ovid wrote:
> Well I for one would like to know (in general terms) what the proposed
> semantics are, because it's not obvious to me from your email. And off
> the top of my head I can think of several possible semantics.
>

I understand where you're coming from. I was strongly hoping we could at
least get some consensus on what P5P would be willing to accept in terms of
syntax so that when the rest comes along, part of the issue is out of the
way.

However, now that you've asked the questions, I don't think I can
reasonably dodge them. However, I don't want to get bogged down in a debate
on semantics at this time. Suggestions around this area have died too many
times, despite the apparent support for the idea.

> So for example, what does
>
> my $count :of(INT) = ...;
>
> actually do?
>

By default it's fatal. However, you can ask it to warn instead, or even
disable checks completely.

> my $count :of(INT) = "99"; # ok
> my $count :of(INT) = "foo"; # fails
> my $count :of(INT) = undef; # fails
> my $count :of(INT) = 1.2; # fails?
>

You have the above correct. Assigning a reference also fails, with the
exception that if the reference is blessed and it has numeric overloading,
it should overload as an integer and that value will be used.

The checks do not coerce any data beyond what Perl does natively (e.g., the
string "99" becomes a number). User-defined coercions are specced but not
intended to be in the MVP. They are orthogonal to checks, including being
defined separately, because if you need to disable checks for performance
reasons (or because the check turns out to be wrong), you can't safely
disable coercions without breaking the code.
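
A rough sketch of those INT rules as an ordinary predicate in current Perl. passes_int() and the Local::Counter class are hypothetical names for illustration, and the exact pattern for "looks like an integer" is an assumption rather than anything from the spec.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use overload ();

# Hypothetical predicate approximating the INT rules described above.
sub passes_int {
    my ($value) = @_;
    return 0 unless defined $value;                  # undef fails
    if (ref $value) {
        # Plain references fail; objects need numeric ('0+') overloading.
        return 0 unless blessed($value);
        my $to_num = overload::Method($value, '0+') or return 0;
        $value = $value->$to_num(undef, '');         # call the overload handler directly
        return 0 unless defined $value;
    }
    return $value =~ /\A[-+]?[0-9]+\z/ ? 1 : 0;      # "99" passes; "foo" and 1.2 fail
}

# Hypothetical class with numeric overloading, for demonstration only.
package Local::Counter {
    use overload '0+' => sub { $_[0]{n} }, fallback => 1;
    sub new { my ($class, $n) = @_; return bless { n => $n }, $class }
}

print join(',', map { passes_int($_) }
    "99", "foo", undef, 1.2, [], Local::Counter->new(7)), "\n";
```

The only conversion involved is the one Perl or the object's own overloading already performs; nothing here rewrites the value being checked.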

> my $count :of(INT) = "99";
> Dump $count; # shows that $count is an IV, not a PV.

Great question. This is not specced, but clearly something we'd want to
know. At present, it's a PV. The way checks currently work is that the
check is made after the assignment. This is an implementation artefact. I
think IV would probably be better, but I'd rather bikeshed that later.
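
For anyone who wants to see the PV/IV distinction without reading Devel::Peek output, the scalar's flags can be inspected through the core B module. This sketch only looks at ordinary scalars in current Perl and says nothing about how the checks themselves are implemented; slots() is a hypothetical helper name.

```perl
use strict;
use warnings;
use B ();

# Report which public value slots a scalar currently has.
sub slots {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @slots;
    push @slots, 'IV' if $flags & B::SVf_IOK();
    push @slots, 'PV' if $flags & B::SVf_POK();
    return join '+', @slots;
}

my $string = "99";   # assigned from a string literal
my $number = 99;     # assigned from an integer literal

print 'string: ', slots($string), "\n";
print 'number: ', slots($number), "\n";
```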

> my $count :of(INT) = 1;
> $count = "foo"; # fails - or not?
>

Yes, that fails. For Moo/se, assuming you use the methods and don't "reach
inside" the object, the isa declaration on the attribute is respected. For
Type::Tiny, it's only checked once and after that, all bets are off (this
isn't a criticism. I use the module religiously, but I suspect the
performance implications are the driving factor in this decision).
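
For comparison, the "fails on every later assignment" behaviour can be approximated in current Perl with tie. This is only an illustration of the semantics; the Local::CheckedScalar class is hypothetical, and nothing here is claimed to match how the private implementation works.

```perl
use strict;
use warnings;

# Hypothetical tie class: runs a predicate on every assignment.
package Local::CheckedScalar {
    use Carp qw(croak);

    sub TIESCALAR {
        my ($class, $name, $check) = @_;
        return bless { name => $name, check => $check, value => undef }, $class;
    }
    sub FETCH { return $_[0]{value} }
    sub STORE {
        my ($self, $new) = @_;
        croak "Value fails check $self->{name}" unless $self->{check}->($new);
        $self->{value} = $new;
        return;
    }
}

my $is_int = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A[-+]?[0-9]+\z/ };

tie my $count, 'Local::CheckedScalar', INT => $is_int;
$count = 1;                              # passes the check
my $ok = eval { $count = "foo"; 1 };     # STORE croaks, so $ok stays undef
print "count is still $count\n" unless $ok;
```

Tied scalars are slow, which is one reason a native mechanism is attractive.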

So right now, the leading contenders for the syntax appear to be something
like (with the case of the check name and/or the name of the attribute to
be argued separately):

1. my INT $count = 0;
2. my $count :of(INT) = 0;
3. my $count is INT = 0;

(With the latter being "isa" instead of "is" for classes.
https://www.nntp.perl.org/group/perl.perl5.porters/2019/11/msg256683.html).

So far, the first version appears to be the most popular.

I'm rather partial to the second version for reasons I won't go into here,
but I won't argue for that position. I simply would like to gain a
consensus on what we'd like to do, taking into account what is actually
possible.

One thing which might be worth taking into consideration is how we use
Perl. Perl's primary type system revolves around how we *structure* data,
irrespective of the *kinds* of data we have (avoiding the word "type"
because it's so overloaded). Thus, we might find ourselves doing this:

my Hash[Hash[Int]] $grades = {
    yusef => { math    => [19, 13, 20],
               biology => [12, 18, 17] },
    jing  => { math    => [20, 11, 17],
               biology => [7, 18, 18] }
};

This is important because I've been checking through several codebases to
see the kinds of checks people are writing and sometimes they get very
complex and span multiple lines. Languages like Java avoid having "ugly"
declarations in type declarations because they wrap everything in classes:

Grades studentGrades; // "Grades" is a class name

Because Perlers like to throw around complex data structures and sometimes
those structures are one-offs, having to go through the tedium of declaring
a new "check" for every one might mean complex check declarations. The
above might become: Hash[Hash[Int[1..20]]] and if those grades could be
floats, we'd need even more syntax for whether or not ranges are inclusive
or exclusive.

That would be trivial if it was a user-defined type:

check Grade :isa(NUM) ($n) {
0 < $n <= 20; # zero to twenty, but zero itself is not allowed
}

But at that point, the user needs to define the check in Perl. Such common
use cases should probably be native (for performance), meaning that any
syntax we prefer should take this into account. Currently, that's handled
with:

my $grade :of(NUM[0 <.. 20]) = get_grade($student); # 0 < $grade <= 20
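
To make the inclusive/exclusive question concrete, here is a small sketch of a range predicate builder in current Perl. num_range() and its argument names (above, at_least, below, at_most) are hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical builder: returns a predicate for a numeric range.
# 'above' and 'below' are exclusive bounds; 'at_least' and 'at_most' are inclusive.
sub num_range {
    my (%bound) = @_;
    return sub {
        my ($n) = @_;
        return 0 unless defined $n && !ref $n && looks_like_number($n);
        return 0 unless $n == $n;                                 # rejects NaN
        return 0 if exists $bound{above}    && $n <= $bound{above};
        return 0 if exists $bound{at_least} && $n <  $bound{at_least};
        return 0 if exists $bound{below}    && $n >= $bound{below};
        return 0 if exists $bound{at_most}  && $n >  $bound{at_most};
        return 1;
    };
}

# Same range as NUM[0 <.. 20]: zero excluded, twenty included.
my $is_grade = num_range(above => 0, at_most => 20);
print join(',', map { $is_grade->($_) } 0, 0.5, 20, 20.5), "\n";
```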

Does the following work?

my NUM[0 <.. 20] $grade = get_grade($student);

That turns the original check into:

my Hash[Hash[NUM[0 <.. 20]]] $grades = ...;

Things are starting to get ugly there (please ignore inconsistent case in
checks; we'll get to that later), but given how we tend to use Perl, it's
kind of unavoidable. So any decision on what syntax we prefer
("prefix/postfix" and "attributes/no attributes") should take this into
account.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

May 19, 2023, 12:20 AM

Post #10 of 43 (876 views)

On Fri, May 19, 2023 at 08:21:56AM +0200, Ovid wrote:
> I understand where you're coming from. I was strongly hoping we could at
> least get some consensus on what P5P would be willing to accept in terms of
> syntax so that when the rest comes along, part of the issue is out of the
> way.

I'm happy for any discussion on semantics to be deferred for now and for
this thread to concentrate (for now) on syntax. I just wanted a general
feel for what the proposed syntax is actually supposed to do, (e.g.
whether it imposes a type on the variable throughout its life, or whether
it's a constraint checker for the value of an initial assignment, or
whatever) which might influence people's choice of syntax.

So I look forward to the remainder of your truncated reply :-)

--
O Unicef Clearasil!
Gibberish and Drivel!
-- "Bored of the Rings"

Re: Native data checking in Perl [ In reply to ]

May 19, 2023, 12:59 AM

Post #11 of 43 (876 views)

On Fri, 19 May 2023 09:16:43 +0200, Ovid <curtis.poe@gmail.com> wrote:

> So right now, the leading contenders for the syntax appear to be something
> like (with the case of the check name and/or the name of the attribute to
> be argued separately):
>
> 1. my INT $count = 0;
> 2. my $count :of(INT) = 0;
> 3. my $count is INT = 0;
>
> (With the latter being "isa" instead of "is" for classes.
> https://www.nntp.perl.org/group/perl.perl5.porters/2019/11/msg256683.html).
>
> So far, the first version appears to be the most popular.

If there is now way to specify your own types/type-classes, I'd support 1 and 2

1 because it is the most intuitive and easy to use, read and maintain

2 because I can imagine one wants to allow

my $used :of(INT,REAL) = 0;

with any combination of acceptable types, making unacceptable type fatal

In Raku one can build types out of existing types, restrictions in both
definedness and range

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org

Re: Native data checking in Perl [ In reply to ]

perl5-porters at perl

May 19, 2023, 4:39 AM

Post #12 of 43 (876 views)

On 2023-05-19 09:59, perl5@tux.freedom.nl wrote:
> On Fri, 19 May 2023 09:16:43 +0200, Ovid <curtis.poe@gmail.com> wrote:
>
>> So right now, the leading contenders for the syntax appear to be something
>> like (with the case of the check name and/or the name of the attribute to
>> be argued separately):
>>
>> 1. my INT $count = 0;
>> 2. my $count :of(INT) = 0;
>> 3. my $count is INT = 0;
>>
>> (With the latter being "isa" instead of "is" for classes.
>> https://www.nntp.perl.org/group/perl.perl5.porters/2019/11/msg256683.html).
>>
>> So far, the first version appears to be the most popular.
> If there is now way to specify your own types/type-classes, I'd support 1 and 2
>
> 1 because it is the most intuitive and easy to use, read and maintain
>
> 2 because I can imagine one wants to allow
>
> my $used :of(INT,REAL) = 0;
>
> with any combination of acceptable types, making unacceptable type fatal
>
> In Raku one can build types out of existing types, restrictions in both
> definedness and range
>

When I played with (tersely) applying "value-restrictions" to variables,
I generally used the format:

T_Int my ($x, $y) = (...);

as that is easy to implement in current Perl, with a sub T_Int {...}.

IIRC, it was easier than using the format

my ($x, $y) :T_Int = (...);

In the linked msg256683.html, Dave Mitchell writes:
"In particular, we shouldn't use a type prefix before the variable name to
specify a constraint; that should be reserved for a hypothetical future
type system."
which IMO remains important.

I would love to be able to write:

sub succ (UInt64 $_u) :UInt64 {
      state UInt64 $__MUL = 0x27bb2ee687b0b0fd;
      state UInt64 $__ADD = 0x00000000b504f32d;
      return $_u * $__MUL + $__ADD; # should ignore any overflows
}

or rather terser:

{ use UInt64;
    sub succ ($_u) {
        state $__MUL = 0x27bb2ee687b0b0fd;
        state $__ADD = 0x00000000b504f32d;
        return $_u * $__MUL + $__ADD;
    }
}

-- Ruud

Re: Native data checking in Perl [ In reply to ]

fawaka at gmail

May 19, 2023, 5:52 AM

Post #13 of 43 (876 views)

On Fri, May 19, 2023 at 9:17 AM Ovid <curtis.poe@gmail.com> wrote:

> One thing which might be worth taking into consideration is how we use
> Perl. Perl's primary type system revolves around how we *structure* data,
> irrespective of the *kinds* of data we have (avoiding the word "type"
> because it's so overloaded). Thus, we might find ourselves doing this:
>
> my Hash[Hash[Int]] $grades = {
> yusef => { math => [19, 13, 20],
> biology => [12, 18, 17] },
> jing => { math => [20, 11, 17],
> biology => [7, 18, 18] }
> };
>

I don't think that is possible in a useful way. That is only workable if
values were typed instead of variables. And I don't think we can type
values retroactively. It's not that you can't do this (potentially
expensive) check, but you can't maintain its invariance in any meaningful
way. It would still allow modifying $grades in ways that would defy the
given constraint.

These are not types, they're type refinements. That doesn't make it useless
but it does make it severely limited.

Leon

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 19, 2023, 6:29 AM

Post #14 of 43 (876 views)

On Fri, May 19, 2023 at 2:52 PM Leon Timmermans <fawaka@gmail.com> wrote:

> my Hash[Hash[Int]] $grades = {
>> yusef => { math => [19, 13, 20],
>> biology => [12, 18, 17] },
>> jing => { math => [20, 11, 17],
>> biology => [7, 18, 18] }
>> };
>>
>
> I don't think that is possible in a useful way. That is only workable if
> values were typed instead of variables. And I don't think we can type
> values retroactively. It's not that you can't do this (potentially
> expensive) check, but you can't maintain its invariance in any meaningful
> way. It would still allow modifying $grades in ways that would defy the
> given constraint.
>

Yeah, that's a serious issue we encountered. Consider this:

my @records Hash[Array[Int]] = { bobbie => [3,2,1] };
$records[0] = { bobbie => [3,2,"foo"] }; # fatal
$records[0]{bobbie}[-1] = "foo"; # legal

For the above, the last two lines are kind of equivalent in terms of data
(though obviously the reference changes in the fatal version). However, the
last line "works" because we simply documented that checks are
prerequisites for *assignment to the variable*, not to *assignments to
the contents* of that variable. There are simply too many problems with the
latter, particularly in light of the fact that this is Perl and we didn't
have this from the start.
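
The difference can be reproduced in current Perl with an explicit check at the point of assignment. This is a sketch under assumptions: is_hash_of_int_arrays() and set_record() are hypothetical helpers standing in for the Hash[Array[Int]] check, not code from the implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Hash[Array[Int]] check.
sub is_hash_of_int_arrays {
    my ($value) = @_;
    return 0 unless ref($value) eq 'HASH';
    for my $list (values %$value) {
        return 0 unless ref($list) eq 'ARRAY';
        for my $item (@$list) {
            return 0 unless defined $item && !ref $item
                && $item =~ /\A[-+]?[0-9]+\z/;
        }
    }
    return 1;
}

# The check runs only when an element of the array is assigned.
sub set_record {
    my ($records, $index, $value) = @_;
    die "check failed\n" unless is_hash_of_int_arrays($value);
    $records->[$index] = $value;
    return;
}

my @records;
set_record(\@records, 0, { bobbie => [3, 2, 1] });

# Replacing the element goes through the check and is rejected.
my $replaced = eval { set_record(\@records, 0, { bobbie => [3, 2, "foo"] }); 1 };

# Reaching inside the stored structure never triggers the check.
$records[0]{bobbie}[-1] = "foo";

print 'replacement accepted: ', ($replaced ? 'yes' : 'no'), "\n";
print 'still valid after inner write: ',
    (is_hash_of_int_arrays($records[0]) ? 'yes' : 'no'), "\n";
```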

Best,
Ovid

--
Curtis "Ovid" Poe
--
CTO, All Around the World
World-class software development and consulting
https://allaroundtheworld.fr/

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 19, 2023, 6:35 AM

Post #15 of 43 (876 views)

On Fri, May 19, 2023 at 9:59 AM <perl5@tux.freedom.nl> wrote:

> If there is now way to specify your own types/type-classes, I'd support 1
> and 2
>
>
I think you meant "no way" instead of "now way"?

We do have a spec for specifying your own checks. Classes are easy and
don't require a new check:

my $dog :of(OBJ[Animal::Dog]) = get_dog('spot');

User-defined checks are easy. If you want a scalar to never decrease in
value:

check Monotonic :isa(NUM) ($value, %value) { $value >= $value{old} }
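
A check that compares the new value with the old one can be imitated today with a tied scalar. The Local::Monotonic class below is a hypothetical sketch, and letting the first assignment through unconditionally is an assumption, since there is no old value to compare against yet.

```perl
use strict;
use warnings;

# Hypothetical tie class: refuses any assignment that lowers the value.
package Local::Monotonic {
    use Carp qw(croak);
    use Scalar::Util qw(looks_like_number);

    sub TIESCALAR { my ($class) = @_; return bless { value => undef }, $class }
    sub FETCH     { return $_[0]{value} }
    sub STORE {
        my ($self, $new) = @_;
        croak "Not a number" unless defined $new && looks_like_number($new);
        my $old = $self->{value};
        # Assumption: the first assignment has no old value and always passes.
        croak "Value decreased from $old to $new" if defined $old && $new < $old;
        $self->{value} = $new;
        return;
    }
}

tie my $high_water, 'Local::Monotonic';
$high_water = 3;
$high_water = 5;                            # fine: 5 >= 3
my $ok = eval { $high_water = 4; 1 };       # croaks: 4 < 5
print "still $high_water\n" unless $ok;
```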

> 2 because I can imagine one wants to allow
>
> my $used :of(INT,REAL) = 0;
>
> with any combination of acceptable types, making unacceptable type fatal
>

That's handled:

my $used :of( INT | REAL ) = 0;
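
In predicate terms, a union like this is just "pass if any member passes". A minimal sketch, where any_of() is a hypothetical combinator and the patterns used for INT and REAL are assumptions, since the thread does not define them:

```perl
use strict;
use warnings;

# Hypothetical combinator: true if at least one of the given checks passes.
sub any_of {
    my @checks = @_;
    return sub {
        my ($value) = @_;
        for my $check (@checks) {
            return 1 if $check->($value);
        }
        return 0;
    };
}

# Assumed definitions of the two member checks.
my $is_int  = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A[-+]?[0-9]+\z/ };
my $is_real = sub {
    defined $_[0] && !ref $_[0]
        && $_[0] =~ /\A[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)\z/;
};

my $is_int_or_real = any_of($is_int, $is_real);
print join(',', map { $is_int_or_real->($_) } 0, 1.5, "foo"), "\n";
```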

Best,
Ovid

--
Curtis "Ovid" Poe
--
CTO, All Around the World
World-class software development and consulting
https://allaroundtheworld.fr/

Re: Native data checking in Perl [ In reply to ]

May 19, 2023, 6:56 AM

Post #16 of 43 (876 views)

On Fri, 19 May 2023 15:35:02 +0200, Ovid <curtis.poe@gmail.com> wrote:

> On Fri, May 19, 2023 at 9:59 AM <perl5@tux.freedom.nl> wrote:
>
> > If there is now way to specify your own types/type-classes, I'd support 1
> > and 2
>
> I think you meant "no way" instead of "now way"?

I did

> We do have a spec for specifying your own checks. Classes are easy and
> don't require a new check:
>
> my $dog :of(OBJ[Animal::Dog]) = get_dog('spot');

????

> User-defined checks are easy. If you want a scalar to never decrease in
> value:
>
> check Monotonic :isa(NUM) ($value, %value) { $value >= $value{old} }

I probably need to read that more than twice to let it land

> > 2 because I can imagine one wants to allow
> >
> > my $used :of(INT,REAL) = 0;
> >
> > with any combination of acceptable types, making unacceptable type fatal
>
> That's handled:
>
> my $used :of( INT | REAL ) = 0;

Yeah!

> Ovid

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org

Re: Native data checking in Perl [ In reply to ]

happy.barney at gmail

May 19, 2023, 1:26 PM

Post #17 of 43 (876 views)

Few notes / questions (and many more pending):

- should this be available for literal value? eg:

$foo = 1 :is (Float);
$foo = undef :is (Maybe[Str]);

- should this be available for expressions? eg:

return 1 :is (Float)
return :is (Maybe[Str]);
return foo :is (Float) (1, 2, 3);
return do :is (Maybe[Str]) ...;

- should this be available in boolean context?

if ($foo :is (Float)) { ... }

- this can supplement tainted (imho very good idea though current
implementation not sufficient)

foo (1 :is (Tainted))

And one comment.

Generally it's considered a good practice to not use primitive types /
constraints / ....
It would be nice (and an easy feature to sell) to not allow primitive types in
:is(),
forcing user to declare domain specific types, eg:

instead of
$limit :is (UINT)

program should contain:
check Limit :extends(UINT) { $_ <= 32 };
$limit :is (Limit)

Brano

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 3:53 PM

Post #18 of 43 (876 views)

A few things I would want to see:

I prefer the consistency of having the name of an entity always on the left and
everything describing it to be on the right of the name.

Ideally the ordering is something that would still work if we were defining the
same interface in a hash ref, say, where the identifiers in question are usually
the keys.

I like the greater overall consistency of the "item keyword typedef" being used
everywhere, the same for "is" as "returns" etc.

Something else I want to see in a clean and consistent way is both supporting
data checks that correspond directly and unambiguously to Perl native types
(undef, boolean, integer, float, arrayref, hashref, <corinna object>, etc),
while also supporting extensibility for any user-defined predicate function (a
sub that takes any value and returns true or false if it is or isn't of the
type) which would be the basis for more complex checks, either passable by name
or subref.
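
A rough sketch of that idea in today's Perl, assuming a check is nothing more than a predicate sub that can be passed by name or by subref. The %CHECK registry, passes_check() and the Limit-style check are hypothetical names used only for illustration; they are not part of the proposal.

```perl
use strict;
use warnings;

# Hypothetical registry of named checks (illustrative only).
my %CHECK = (
    Int      => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ },
    ArrayRef => sub { ref $_[0] eq 'ARRAY' },
);

# Accept a check either by name or as a code reference.
sub passes_check {
    my ( $check, $value ) = @_;
    my $predicate = ( ref $check eq 'CODE' ? $check : $CHECK{$check} )
        or die "Unknown check: $check\n";
    return $predicate->($value) ? 1 : 0;
}

# A more complex check built on top of a simpler one.
my $is_limit = sub {
    passes_check( 'Int', $_[0] ) && $_[0] >= 0 && $_[0] <= 32;
};

print "Int(42):      ", passes_check( 'Int',      42 ),  "\n";
print "Limit(42):    ", passes_check( $is_limit,  42 ),  "\n";
print "Limit(7):     ", passes_check( $is_limit,  7 ),   "\n";
print "ArrayRef([]): ", passes_check( 'ArrayRef', [] ),  "\n";
```

Note that this string-based Int predicate accepts "99" as well as 99; whether a native check should do so is exactly the kind of semantic question under discussion.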

We would want something that makes it easy to have the metadata needed to
"compile" Perl code to something C-like for performance if we wished to.

The unambiguous ways to refer to Perl native types must not be user overridable;
a code parser or compiler must be able to know that if, say, "Int" or
"ArrayRef" is seen, it always means the native one, not possibly something the
user replaced it with.

There could be pre-defined 1:1 counterparts, where it makes sense, that could be
overridden if there is a good reason to support that; otherwise users should
have to choose names for theirs that are different from the built-ins.

-- Darren Duncan

On 2023-05-17 4:56 a.m., Ovid wrote:
> On Wed, May 17, 2023 at 11:32 AM G.W. Haywood via perl5-porters wrote:
>
> > sub foo ( UInt $max_size ) returns ArrayRef[Int] {...}
> > my Int $count = 0;
> >
> > Or this:
> >
> > sub foo ( $max_size is Uint ) returns ArrayRef[Int] {...}
> > my $count is Int = 0;
>
> Coming from a number of other languages before Perl, the first of
> those feels natural to me.
>
> I suspect most feel the same way.

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 4:09 PM

Post #19 of 43 (876 views)

On 2023-05-18 1:25 p.m., Chris Prather wrote:
> Even "99" and 1.2 should IMO fail, but it depends on the
> implementation of the check code for INT, and what defaults we set for
> implicit coercions. Moose requires you flag coercible attributes
> (coerce => 1 on an attribute definition), this being Perl and
> coercions being baked so deeply into the language I don't think that
> would fly in general but allowing someone to disable them would be
> incredibly useful/nice, e.g.:
>
> no coercion;
> my $count :of(INT) = "99"; # warn/croak/die
> my $count :of(INT) = 1.2; # warn/croak/die
>
>> * it doesn't warn/croak, but ensures that $count ends up as an int:
>>
>> my $count :of(INT) = "99";
>> Dump $count; # shows that $count is an IV, not a PV.
>
> Probably but see above, I'd like to be able to make that
> warn/croak/die in some circumstances.

My general thought is that it is important for Perl to let users explicitly
declare what they expect, for example:

my $count1 :is(Int);
my $count2 :is(Int) :coerce(Int);

The first example would throw an exception if anything that isn't already an Int
is assigned to it, and the second example would try to coerce the thing to an Int.

What :is takes is a predicate resulting in true or false depending on its input;
what :coerce takes is a mapping function that returns an Int value derived from
its input. An optimization can be that the :coerce one is only invoked when :is
is false.
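
As a minimal sketch of those semantics in current Perl: the predicate is tried first, and the mapping function is only consulted when the predicate fails. The names checked_assign(), $is_int and $to_int_truncate are made up for this illustration, and truncation is just one of several possible coercions to an integer.

```perl
use strict;
use warnings;

# Predicate: true only for values that already look like an integer.
my $is_int = sub {
    my ($value) = @_;
    return ( defined $value && !ref $value && $value =~ /\A-?[0-9]+\z/ ) ? 1 : 0;
};

# One possible coercion: truncate toward zero. Other coercions (floor,
# ceil, ...) could target the same type with different semantics.
my $to_int_truncate = sub {
    my ($value) = @_;
    die "Cannot coerce to Int\n"
        if !defined $value
        || ref $value
        || $value !~ /\A-?[0-9]+(?:\.[0-9]+)?\z/;
    return int $value;
};

# Hypothetical helper: apply the strict check, falling back to the
# coercion (if one was given) only when the check fails.
sub checked_assign {
    my ( $value, $is, $coerce ) = @_;
    return $value if $is->($value);
    if ($coerce) {
        my $coerced = $coerce->($value);
        return $coerced if $is->($coerced);
    }
    die "Value failed check\n";
}

# Like ":is(Int) :coerce(Int)": the value is coerced.
my $count2 = checked_assign( "1.2", $is_int, $to_int_truncate );

# Like ":is(Int)" alone: the same value is rejected.
my $ok = eval { checked_assign( "1.2", $is_int ); 1 };

print "count2 = $count2, strict assignment ",
    ( $ok ? "succeeded" : "failed" ), "\n";
```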

The :is and :coerce can be used anywhere one may declare a type, whether on a
local variable or a parameter declaration or a returns etc, or in the returns
case maybe :returns would be analogous to :is and there could be a
:returns-coerce analogous to the :coerce.

I'm not suggesting specific keywords, but that we have distinct ones for these
purposes.

The strict :is is the most important, the :coerce is something nice to add.

There can be multiple :coerce functions that coerce to the same type, depending
on desired semantics, and in the general case, :coerce is really just a generic
title for what ideally would be a set of functions that are specific in their
behavior, kind of like how I feel floor(), ceil(), to_zero(), to_even() etc are
better rounding function names than round().

And the fact these options can exist makes it even better for consistency to
have the "entity keyword type" format rather than "type entity" format because
the former is extensible. Or if you want to have both, "type entity" should
ALWAYS mean strict exception throwing yes/no and NOT coerce ever, and one can
use the :foo syntax to indicate coercion when desired.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 4:12 PM

Post #20 of 43 (876 views)

On 2023-05-19 12:20 a.m., Dave Mitchell wrote:
> On Fri, May 19, 2023 at 08:21:56AM +0200, Ovid wrote:
>> I understand where you're coming from. I was strongly hoping we could at
>> least get some consensus on what P5P would be willing to accept in terms of
>> syntax so that when the rest comes along, part of the issue is out of the
>> way.
>
> I'm happy for any discussion on semantics to be deferred for now and for
> this thread to concentrate (for now) on syntax. I just wanted a general
> feel for what the proposed syntax is actually supposed to do, (e.g.
> whether it imposes a type on the variable throughout its life, or whether
> it's a constraint checker for the value of an initial assignment, or
> whatever) which might influence people's choice of syntax.
>
> So I look forward to the remainder of your truncated reply :-)

One can't really put off semantics in order to discuss syntax; discussing syntax
is meaningless without knowing what that syntax DOES. Especially when there may
be several different behaviors to represent that would have distinct syntax, eg
:is vs :coerce etc.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 19, 2023, 4:44 PM

Post #21 of 43 (876 views)

I also want to firmly put my support behind the concept that Perl undef is
effectively its own singleton type and whenever one declares that something is
an "int" or "float" or whatever else it is implicitly the case that those are
DEFINED, and guaranteed to contain a valid int/float/etc at all times.

When one wants something to be allowed to be undefined, that should be declared
with an explicit union such as "int|undef" and if one just says "int" then the
latter is excluded.

This will obviously mean we will need a well documented "default value" for
every type, eg if one declares "my Int $foo;" without assigning it a value, then
$foo implicitly contains zero, assuming we don't want to require an explicit
assignment or declaration of a default value to remove any doubt, though that
might possibly get unwieldy with more complicated types.

This also means that if one declares eg "my UserDefType $foo;" then this ALSO
does NOT allow undef, and must be some object of that type or whatever.

I can see this being resolved by any of:

1. Have some standard way for a class to declare what its "default" value is,
where that is useful, for example explicitly declaring it is their concept of
zero. Then $foo implicitly gets that default.

2. Although for some classes it doesn't make sense to require a default, eg if
an object represents an active connection to some external resource. Then the
$foo line is an error but in the general case that would only be known at
runtime and could result in a no-default-value exception or something.

3. Have a compile-time requirement that any "my Foo $foo" must either explicitly
assign a value or must be of the form "my Foo|undef $foo".

Of course, then there is the related matter, if we have a type union, then whose
default from the union is used as the default of the union as a whole? I would
suggest the easiest to understand option is treat the union as being an ordered
rather than unordered set and go left to right, using the default of the
left-most option that defines one. Note that having "undef" in the list would
only make the default "undef" if it appears first or all options before it don't
declare a default.
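
To make the left-to-right rule concrete, here is a small sketch in current Perl. The %DEFAULT_FOR table and union_default() are hypothetical and exist only for this illustration; a type with no entry in the table is treated as declaring no default.

```perl
use strict;
use warnings;

# Hypothetical table of per-type defaults. "undef" declares a default
# whose value is undef; "Handle" deliberately has no entry.
my %DEFAULT_FOR = (
    'Int'   => sub { 0 },
    'Str'   => sub { '' },
    'undef' => sub { undef },
);

# Walk the union members left to right and use the first declared default.
# Returns (1, $value) when a default was found, (0) otherwise.
sub union_default {
    my (@members) = @_;
    for my $type (@members) {
        my $maker = $DEFAULT_FOR{$type} or next;
        my $value = $maker->();
        return ( 1, $value );
    }
    return (0);
}

for my $union ( [qw(Int undef)], [qw(Handle undef Int)], [qw(Handle)] ) {
    my ( $found, $value ) = union_default(@$union);
    my $shown = !$found ? 'no default' : defined $value ? "'$value'" : 'undef';
    print join( '|', @$union ), " => $shown\n";
}
```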

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 12:49 AM

Post #22 of 43 (876 views)

On Fri, May 19, 2023 at 10:26 PM Branislav Zahradník <happy.barney@gmail.com>
wrote:

> - should this be available for literal value? eg:
> - should this be available in boolean context?
> - this can supplement tainted (imho very good idea though current
> implementation not sufficient)
>

I really like these ideas. Will add them to my notes now, but probably not
something for an MVP.

> Generally it's considered a good practice to not use primitive types /
> constraints / ....
> It will be nice (and feature easy to sell) to not allow primitive types in
> :is(),
> forcing user to declare domain specific types, eg:
>

A number of code bases were consulted prior to this to see real-world uses.
Quite often people have a "one off" check for a variable or sub that doesn't
justify the value of a domain-specific constraint (D

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 1:22 AM

Post #23 of 43 (876 views)

On Sat, May 20, 2023 at 12:53 AM Darren Duncan <darren@darrenduncan.net>
wrote:

> I like the greater overall consistency of the "item keyword typedef" being
> used everywhere, the same for "is" as "returns" etc.
>

Noted.

> Something else I want to see in a clean and consistent way is both
> supporting data checks that correspond directly and unambiguously to Perl
> native types (undef, boolean, integer, float, arrayref, hashref,
> <corinna object>, etc),
>

That's generally there (with some hand-waving because the devil's in the
details).

> while also supporting extensibility for any user-defined predicate
> function (a sub that takes any value and returns true or false if it is or
> isn't of the type) which would be the basis for more complex checks, either
> passable by name or subref.
>

Currently we don't have predicate functions defined, but there's a robust
definition for user-defined checks.

> We would want something that makes it easy to have the metadata needed to
> "compile" Perl code to something C-like for performance if we wished to.
>

Post-MVP.

> The unambiguous ways to refer to Perl native types must not be user
> overridable, a code parser or compiler must be able to know it is safe
> that if say "Int" or "ArrayRef" is seen, it always means the native one,
> not possibly something the user replaced it with.
>

Already planned.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 1:29 AM

Post #24 of 43 (876 views)

On Sat, May 20, 2023 at 1:09 AM Darren Duncan <darren@darrenduncan.net>
wrote:

> My general thought is that it is important for Perl to let users explicitly
> declare what they expect, for example:
>
> my $count1 :is(Int);
> my $count2 :is(Int) :coerce(Int);
>

Coercions are a "nice-to-have" and absolutely post-MVP. I'd be half-tempted
to say "not allowed."

1. They can't be disabled or downgraded to warnings, even if they're
   wrong/buggy
2. Coercing any reference value can alter the calling code's data
3. Implicit coercions like you show above are a minefield (do we *really*
   want references coerced to integers?)
4. Explicit coercions declared via the coercion keyword give us
   action-at-a-distance that can be hard to debug (because debugging a
   post-coercion value doesn't mean it's easy to get the pre-coercion value)
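
As a small illustration of the second pitfall in current Perl (the two sub names are made up for this example and are not part of the proposal): a coercion that fixes up a referenced array in place silently changes the caller's data, while a copying coercion avoids that at the cost of returning a different reference.

```perl
use strict;
use warnings;

# Hypothetical coercion for an "array of integers" check that fixes up
# the elements in place.
sub coerce_int_array_in_place {
    my ($aref) = @_;
    $_ = int $_ for @$aref;    # modifies the caller's array
    return $aref;
}

# A copying variant leaves the caller's data alone, but the result is no
# longer the same reference the caller passed in.
sub coerce_int_array_copy {
    my ($aref) = @_;
    return [ map { int $_ } @$aref ];
}

my @prices = ( 1.5, 2.25 );

my $copy = coerce_int_array_copy( \@prices );
print "after copy:     @prices (copy: @$copy)\n";

coerce_int_array_in_place( \@prices );
print "after in-place: @prices\n";
```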

I'm not saying "no coercions" simply because I don't have that authority,
but they have enough pitfalls that they need to be handled carefully.

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 1:40 AM

Post #25 of 43 (876 views)

On Sat, May 20, 2023 at 1:44 AM Darren Duncan <darren@darrenduncan.net>
wrote:

> When one wants something to be allowed to be undefined, that should be
> declared with an explicit union such as "int|undef" and if one just says
> "int" then the latter is excluded.
>

That's already in the spec.

> This will obviously mean we will need a well documented "default value" for
> every type, eg if one declares "my Int $foo;" without assigning it a value,
> then $foo implicitly contains zero, assuming we don't want to require an
> explicit assignment or declaration of a default value to remove any doubt,
> though that might possibly get unwieldy with more complicated types.
>

I think magic defaults are a very bad idea and you *can't* do it. Just a
few checks we support:

open my $fh :of(HANDLE), '<', $file;
while (my ( $key, $value :of(GLOB) ) = each %main:: ) { ... }
my $version :of(VSTR) = v5.22.0;
my $spot :of(OBJ[Dog]) = get_dog('spot');

There's plenty more where those came from. Not one of those has a sane
default.

Your example of an ENUM or type union is another good example of why the
default would be bad. Imagine an enum with seven allowed values.
Arbitrarily choosing the first one for me because I forgot an initial
assignment? Bugs waiting to happen.

We also support some common subchecks, such as TUPLE and DICT. There's no
way a default will be reasonable for those.

Best,
Ovid