Mailing List Archive: Native data checking in Perl

Re: Native data checking in Perl [ In reply to ]

perl5-porters at perl

May 20, 2023, 2:13 AM

Post #26 of 43 (628 views)

On 20-05-2023 at 10:40, Ovid wrote:
> On Sat, May 20, 2023 at 1:44 AM Darren Duncan
> <darren@darrenduncan.net> wrote:
>
> When one wants something to be allowed to be undefined, that
> should be declared
> with an explicit union such as "int|undef" and if one just says
> "int" then the
> latter is excluded.
>
>
> That's already in the spec.
>
> This will obviously mean we will need a well documented "default
> value" for
> every type, eg if one declares "my Int $foo;" without assigning it
> a value, then
> $foo implicitly contains zero, assuming we don't want to require
> an explicit
> assignment or declaration of a default value to remove any doubt,
> though that
> might possibly get unwieldy with more complicated types.
>
>
> I think magic defaults are a very bad idea and you /can't/ do it. Just
> a few checks we support:
>
> open my $fh :of(HANDLE), '<', $file;
> while (my ( $key, $value :of(GLOB) ) = each %main:: ) { ... }
> my $version :of(VSTR) = v5.22.0;
> my $spot :of(OBJ[Dog]) = get_dog('spot');
>
> There's plenty more where those came from. Not one of those has a sane
> default.
>
> Your example of an ENUM or type union is another good example of why
> the default would be bad. Imagine an enum with seven allowed values.
> Arbitrarily choosing the first one for me because I forgot an initial
> assignment? Bugs waiting to happen.
>
> We also support some common subchecks, such as TUPLE and DICT. There's
> no way default will be reasonable for those.

Does that mean that the default will be the same as it is now, or will
it mean that a value must be assigned and defaults are disallowed?

M4

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 2:35 AM

Post #27 of 43 (628 views)

On Sat, May 20, 2023 at 11:13 AM Martijn Lievaart via perl5-porters <
perl5-porters@perl.org> wrote:

> Does that mean that the default will be the same as it is now, or will it
> mean that a value must be assigned and defaults are disallowed?
>

Not sure exactly what you're asking (I probably just need more coffee).
Currently, you, the developer, will have to assign a value to a checked
variable.

my $count :of(INT) = 0;
my $count :of(INT); # fatal because undef isn't an integer
my $count :of(INT|UNDEF); # allowed

Damian and I went a few rounds over that one, but it's important. We're
retrofitting a check system onto a language where this has always been
ad-hoc. Consider the following pseudo-code:

var count Int;

For many languages where "types" are part of the initial spec, that
declaration is illegal, but you get a compile-time type failure if you try
to use that value prior to assignment. For dynamic languages, that's often
a runtime error.

That doesn't work for Perl because it's perfectly fine to $count++ if that
variable is undefined. And what happens for this?

sub foo () {
my $count :of(INT);
foreach my $cust (get_customers()) {
next unless meets_conditions($cust);
$count++;
}
return $count;
}

We lose all guarantees. This is a silly example because $count will coerce
to zero, but we can easily create more complex examples where allowing an
uninitialized variable is problematic. To quote part of Damian's response on
this:

> To be truly useful – and safe – these checks have to be guarantees that
> subsequent code can rely upon the values stored in checked variables.
> Allowing checks to be initially skirted utterly destroys that certainty.
>
> I understand that undef is special. And convenient. It’s Perl’s single
> most highly auto-coercive value. But that’s exactly why we can’t allow
> checked values to remain uninitialized. Because undef will coerce to just
> about anything, which utterly breaks the guarantee that a checked value is
> not just anything, but rather is something highly specific.
>
> In short, my view is: if you want your variable to be sloppy and unchecked,
> then be sloppy and don’t make it checked. If you make it checked, then
> don’t expect it to still be sloppy and (initially) unchecked.

In short, this is going to be a contentious issue, but I'd rather we be
strict at first just to err on the side of caution.

Best,
Ovid
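
The behaviour described above can be reproduced in today's Perl, with no checks involved. The following is only a sketch: count_matching and its predicate argument are hypothetical stand-ins for foo(), get_customers() and meets_conditions(). It shows that an uninitialized counter increments silently and that the function hands back undef when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the items that satisfy a predicate, but
# never initialises the counter, as in the foo() example above.
sub count_matching {
    my ($predicate, @items) = @_;
    my $count;                      # starts as undef, no check attached
    for my $item (@items) {
        next unless $predicate->($item);
        $count++;                   # undef++ quietly becomes 1
    }
    return $count;
}

my $some = count_matching(sub { $_[0] > 2 }, 1, 2, 3, 4);   # two items match
my $none = count_matching(sub { $_[0] > 9 }, 1, 2, 3, 4);   # nothing matches

print "some: $some\n";
print "none: ", (defined $none ? $none : 'undef'), "\n";
```

The second call returns undef rather than 0, which is the kind of value a caller expecting an integer would not anticipate.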

Re: Native data checking in Perl [ In reply to ]

lukasmai.403 at gmail

May 20, 2023, 2:46 AM

Post #28 of 43 (628 views)

On 20.05.23 11:35, Ovid wrote:
> Not sure exactly what you're asking (I probably just need more coffee).
> Currently, you, the developer, will have to assign a value to a checked
> variable.
>
> my $count :of(INT) = 0;
> my $count :of(INT); # fatal because undef isn't an integer
> my $count :of(INT|UNDEF); # allowed
>

Semantically, Perl so far does not have initialization, only assignment.
The syntax 'my $x = 0' means "create the local variable $x, then assign
0 to it". So what happens in the proposed system if we break the two
steps apart?

my $count :of(INT) = do { goto L; };
L: say $count; # ?

Also, what do these do?

my $count :of(INT) += 0;

${\my $count :of(INT)} = 0;

Re: Native data checking in Perl [ In reply to ]

perl5-porters at perl

May 20, 2023, 2:51 AM

Post #29 of 43 (628 views)

On 20-05-2023 at 11:35, Ovid wrote:
> On Sat, May 20, 2023 at 11:13 AM Martijn Lievaart via perl5-porters
> <perl5-porters@perl.org> wrote:
>
> Does that mean that the default will be the same as it is now, or
> will it mean that a value must be assigned and defaults are
> disallowed?
>
>
> Not sure exactly what you're asking (I probably just need more coffee).

Thanks, clear. Just to clarify, I was trying to ask if you must assign a
value *always* when using these type checks, to which you clearly
answered 'yes'. So leaving a variable uninitialized will be forbidden,
which is fine with me.

Thanks,

M4

Re: Native data checking in Perl [ In reply to ]

May 20, 2023, 3:05 AM

Post #31 of 43 (628 views)

On Sat, 20 May 2023 11:35:36 +0200, Ovid <curtis.poe@gmail.com> wrote:

> On Sat, May 20, 2023 at 11:13 AM Martijn Lievaart via perl5-porters <
> perl5-porters@perl.org> wrote:
>
> > Does that mean that the default will be the same as it is now, or will it
> > mean that a value must be assigned and defaults are disallowed?
>
> Not sure exactly what you're asking (I probably just need more coffee).
> Currently, you, the developer, will have to assign a value to a checked
> variable.
>
> my $count :of(INT) = 0;
> my $count :of(INT); # fatal because undef isn't an integer
> my $count :of(INT|UNDEF); # allowed

I guess that - at least in *my* perception - :of(INT|UNDEF) will be
used *a lot* more than :of(INT). I deduce that from my use of types in
Raku. I find it non-intuitive to make the most used case take this
amount of syntax.

> Damian and I went a few rounds over that one, but it's important. We're
> retrofitting a check system onto a language where this has always been
> ad-hoc. Consider the following pseudo-code:
>
> var count Int;
>
> For many languages where "types" are part of the initial spec, that
> declaration is illegal, but you get a compile-time type failure if you try
> to use that value prior to assignment. For dynamic languages, that's often
> a runtime error.
>
> That doesn't work for Perl because it's perfectly fine to $count++ if that
> variable is undefined. And what happens for this?
>
> sub foo () {
> my $count :of(INT);
> foreach my $cust (get_customers()) {
> next unless meets_conditions($cust);
> $count++;
> }
> return $count;
> }
>
> We lose all guarantees. This is a silly example because $count will coerce
> to zero, but we can easily create more complex examples where allowing an
> uninitialized variable is problematic. To quote part of Damian's response on
> this:
>
> > To be truly useful – and safe – these checks have to be guarantees that
> > subsequent code can rely upon the values stored in checked variables.
> > Allowing checks to be initially skirted utterly destroys that certainty.
> >
> > I understand that undef is special. And convenient. It’s Perl’s single
> > most highly auto-coercive value. But that’s exactly why we can’t allow
> > checked values to remain uninitialized. Because undef will coerce to just
> > about anything, which utterly breaks the guarantee that a checked value is
> > not just anything, but rather is something highly specific.
> >
> > In short, my view is: if you want your variable to be sloppy and unchecked,
> > then be sloppy and don’t make it checked. If you make it checked, then
> > don’t expect it to still be sloppy and (initially) unchecked.
>
> In short, this is going to be a contentious issue, but I'd rather we be
> strict at first just to err on the side of caution.
>
> Best,
> Ovid

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.37 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org


Re: Native data checking in Perl [ In reply to ]

happy.barney at gmail

May 20, 2023, 5:38 AM

Post #33 of 43 (628 views)

> Semantically, Perl so far does not have initialization, only assignment.
> The syntax 'my $x = 0' means "create the local variable $x, then assign
> 0 to it". So what happens in the proposed system if we break the two
> steps apart?
>
> my $count :of(INT) = do { goto L; };
> L: say $count; # ?
>
> Also, what do these do?
>
> my $count :of(INT) += 0;
>
> ${\my $count :of(INT)} = 0;
>

Nice points. Maybe time to revive my earlier idea about "not-exist" (to
differentiate between uninitialized and undefined).

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 7:04 AM

Post #34 of 43 (628 views)

On Sat, May 20, 2023 at 12:14 PM Lukas Mai <lukasmai.403+p5p@gmail.com>
wrote:

The following responses are my assumptions based on the current spec. I
could be mistaken, or these could be bad ideas.

>
> my $count :of(INT) = do { goto L; };
> L: say $count; # ?
>

My assumption would be that we check on assignment, so $count is undefined
in the say. Whatever value the do {} block returns then gets assigned to
$count, though we're possibly in a weird situation where $count will not be
assigned to on the my line.

> Also, what do these do?
>
> my $count :of(INT) += 0;
>

We only check on assignment, so the + part of that would be fetching the
value, undef, and adding zero to it, leaving zero to be assigned. Thus,
no error on the check.

> ${\my $count :of(INT)} = 0;
>

I think it should be fatal since you're not assigning anything to $count
directly. Instead, you're taking a reference to it and assigning to the
dereference. However, the checks attach to the variable, not the data (or in
this case, the reference).

Best,
Ovid

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 7:15 AM

Post #35 of 43 (628 views)

On Sat, May 20, 2023 at 4:04 PM Ovid <curtis.poe@gmail.com> wrote:

>> ${\my $count :of(INT)} = 0;
>
> I think it should be fatal since you're not assigning anything to $count
> directly. Instead, you're taking a reference to it and assigning to the
> dereference. However, the checks attach to the variable, not the data (or in
> this case, the reference).
>

Actually, I think what I said might not be coherent. Ignore that. The
answer is "I'm not sure."

Best,
Ovid
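
For comparison, the closest thing in today's Perl to a check that "attaches to the variable" and fires "on assignment" is probably a tied scalar. The sketch below is an emulation only: CheckedInt is a hypothetical class name, and nothing in the thread says the proposal would be implemented with tie. It shows how such a mechanism treats the += and ${\ ...} cases raised above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ":of(INT)": a tied scalar whose STORE
# rejects anything that is not an integer. Not the proposed implementation.
package CheckedInt {
    sub TIESCALAR { my ($class) = @_; my $value; return bless \$value, $class }
    sub FETCH     { my ($self) = @_; return $$self }
    sub STORE     {
        my ($self, $new) = @_;
        die "not an integer\n"
            unless defined $new && !ref $new && $new =~ /\A-?[0-9]+\z/;
        return $$self = $new;
    }
}

package main;

tie my $count, 'CheckedInt';

# Nothing has been stored yet, so reading the variable yields undef.
my $before = defined $count ? 'defined' : 'undef';

# FETCH returns undef, undef + 0 is 0, and STORE(0) passes the check.
$count += 0;
my $after_add = $count;

# The tie is on the variable, so assigning through a reference
# still goes through STORE.
${ \$count } = 5;
my $via_ref = $count;

# A non-integer is rejected at assignment time; the old value survives.
my $ok = eval { $count = 'abc'; 1 };

print "before: $before, after +=: $after_add, via ref: $via_ref, ",
      "bad store accepted: ", ($ok ? 'yes' : 'no'), "\n";
```

Under this emulation the variable can still be read as undef before its first assignment, which is exactly the gap the "must assign at declaration" rule is meant to close.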

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 10:57 AM

Post #36 of 43 (628 views)

On 2023-05-20 1:29 a.m., Ovid wrote:
> On Sat, May 20, 2023 at 1:09 AM Darren Duncan wrote:
> My general thought is that it is important for Perl to let users explicitly
> declare what they expect, for example:
>
> my $count1 :is(Int);
> my $count2 :is(Int) :coerce(Int);
>
> Coercions are a "nice-to-have" and absolutely post-MVP. I'd be half-tempted to
> say "not allowed."
>
> 1. They can't be disabled or downgraded to warnings, even if they're wrong/buggy
> 2. Coercing any reference value can alter the calling code's data
> 3. Implicit coercions like you show above are a minefield (do we /really/ want
> references coerced to integers?)
> 4. Explicit coercions declared via the coercion keyword give us
> action-at-a-distance that can be hard to debug (because debugging a
> post-coercion value doesn't mean it's easy to get the pre-coercion value)
>
> I'm not saying "no coercions" simply because I don't have that authority, but
> they have enough pitfalls that they need to be handled carefully.

I should make an important clarification.

I don't actually want or like coercions at all.

My preference is that everything is strict yes/no, and people just use explicit
mapping functions eg floor() on assignment when applicable.

What I was actually arguing for is that IF there are coercions, they should be
explicit as such and are always an alternative where the default is no coercions.

My stated proposal before was more about placating those who feel coercions are
important to have in Perl because it's the Perl way or whatever.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 11:01 AM

Post #37 of 43 (628 views)

On 2023-05-20 1:40 a.m., Ovid wrote:
> On Sat, May 20, 2023 at 1:44 AM Darren Duncan wrote:
> When one wants something to be allowed to be undefined, that should be declared
> with an explicit union such as "int|undef" and if one just says "int" then the
> latter is excluded.
>
> That's already in the spec.
>
> This will obviously mean we will need a well documented "default value" for
> every type, eg if one declares "my Int $foo;" without assigning it a value,
> then
> $foo implicitly contains zero, assuming we don't want to require an explicit
> assignment or declaration of a default value to remove any doubt, though that
> might possibly get unwieldy with more complicated types.
>
> I think magic defaults are a very bad idea and you /can't/ do it. Just a few
> checks we support:
>
> open my $fh :of(HANDLE), '<', $file;
> while (my ( $key, $value :of(GLOB) ) = each %main:: ) { ... }
> my $version :of(VSTR) = v5.22.0;
> my $spot :of(OBJ[Dog]) = get_dog('spot');
> There's plenty more where those came from. Not one of those has a sane default.
>
> Your example of an ENUM or type union is another good example of why the
> default would be bad. Imagine an enum with seven allowed values. Arbitrarily
> choosing the first one for me because I forgot an initial assignment? Bugs
> waiting to happen.
>
> We also support some common subchecks, such as TUPLE and DICT. There's no way
> default will be reasonable for those.
> Best,
> Ovid

I read this that you advocate for requiring users to explicitly assign to or
provide an initial value to a variable/parameter/etc when declaring it with a
type/check, and I'm perfectly fine with that, and I think it was one of the
options I gave. The "defaults" idea was more about an alternative if we
consider it important to be able to declare a variable etc without giving it an
initial value. -- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 12:55 PM

Post #38 of 43 (628 views)

On 2023-05-20 3:05 a.m., perl5@tux.freedom.nl wrote:
> On Sat, 20 May 2023 11:35:36 +0200, Ovid <curtis.poe@gmail.com> wrote:
>> Not sure exactly what you're asking (I probably just need more coffee).
>> Currently, you, the developer, will have to assign a value to a checked
>> variable.
>>
>> my $count :of(INT) = 0;
>> my $count :of(INT); # fatal because undef isn't an integer
>> my $count :of(INT|UNDEF); # allowed
>
> I guess that - at least in *my* perception - :of(INT|UNDEF) will be
> used *a lot* more than :of(INT). I deduce that from my use of types in
> Raku. I feel it non-intuitive to make the most used case take this
> amount of syntax.

I would argue that most programs that allow undef in more contexts than they
disallow it, such as you mention, are poorly designed and rely on bad habits.
It is essentially the same very widespread problem, seen across the whole
programming world, of baking in nullability and making it the default everywhere.

Allowing undef is something we want to actively discourage, an enormous source
of problems, and users should explicitly opt-in to it where they want it rather
than having to explicitly opt-out where they don't.

The tersest way to allow undef by default is to not mark something with a type
in the first place; if you care enough to mark something with a type, I would
hope one wants it more strict.

That being said, I can see good cause to provide an optional shorthand for
"foo|undef" given how it is so common and simple. I suggest something like
"foo?" which would have this meaning. It doesn't have to be a trailing question
mark, however there is existing precedent in other languages for that. I think
C# may do it that way.

-- Darren Duncan
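
The "foo?" shorthand suggested above can be sketched as a simple rewrite to "foo|UNDEF" before the check is evaluated. This is an illustration only: the %CHECK table and the expand_check and passes functions are hypothetical names, the spelling "INT?" is an assumption, and only two of the built-in checks are modelled.

```perl
use strict;
use warnings;

# Hypothetical table covering just two checks, for illustration.
my %CHECK = (
    INT   => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ },
    UNDEF => sub { !defined $_[0] },
);

# Rewrite a trailing "?" into an explicit union with UNDEF.
sub expand_check {
    my ($spec) = @_;
    $spec =~ s/\A(\w+)\?\z/$1|UNDEF/;
    return $spec;
}

# A value passes a union if it passes any one of its members.
sub passes {
    my ($spec, $value) = @_;
    for my $name (split /\|/, expand_check($spec)) {
        my $check = $CHECK{$name} or die "unknown check '$name'\n";
        return 1 if $check->($value);
    }
    return 0;
}

printf "%-10s expands to %s\n", 'INT?', expand_check('INT?');
printf "INT  accepts undef: %s\n", passes('INT',  undef) ? 'yes' : 'no';
printf "INT? accepts undef: %s\n", passes('INT?', undef) ? 'yes' : 'no';
```

With this reading, plain "INT" stays strict and the opt-in to undef costs a single character.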

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 12:58 PM

Post #39 of 43 (628 views)

On 2023-05-20 12:55 p.m., Darren Duncan wrote:
> On 2023-05-20 3:05 a.m., perl5@tux.freedom.nl wrote:
>> On Sat, 20 May 2023 11:35:36 +0200, Ovid <curtis.poe@gmail.com> wrote:
>>> Not sure exactly what you're asking (I probably just need more coffee).
>>> Currently, you, the developer, will have to assign a value to a checked
>>> variable.
>>>
>>>      my $count :of(INT) = 0;
>>>      my $count :of(INT);       # fatal because undef isn't an integer
>>>      my $count :of(INT|UNDEF); # allowed
>>
>> I guess that - at least in *my* perception - :of(INT|UNDEF) will be
>> used *a lot* more than :of(INT). I deduce that from my use of types in
>> Raku. I feel it non-intuitive to make the most used case take this
>> amount of syntax.
>
> I would argue that most programs allowing undef in more contexts than
> disallowing it such as you mention are poorly designed, relying on bad habits.
> It is essentially the same as the very widespread problem across the whole
> programming world that bake in and default nullability everywhere.
>
> Allowing undef is something we want to actively discourage, an enormous source
> of problems, and users should explicitly opt-in to it where they want it rather
> than having to explicitly opt-out where they don't.
>
> The tersest way to allow undef by default is to not mark something with a type
> in the first place; if you care enough to mark something with a type, I would
> hope one wants it more strict.
>
> That being said, I can see good cause to provide an optional shorthand for
> "foo|undef" given how it is so common and simple. I suggest something like
> "foo?" which would have this meaning. It doesn't have to be a trailing question
> mark, however there is existing precedent in other languages for that. I think
> C# may do it that way.

Another thing I meant to say before: Having "int" NOT include undef is also the
principle of least surprise, and it is more consistent. When you say "int" you
MEAN "just int"; same as saying "int" excludes "float", which isn't "int", it
excludes undef, which isn't "int". -- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

happy.barney at gmail

May 20, 2023, 2:55 PM

Post #40 of 43 (628 views)

Another question:

Should constraint names share a namespace with functions or not?

e.g. int (builtin) vs int (constraint)

Both approaches have pros:
- when yes, checks can be imported by name
- when no, there is a clear separation, which allows new code to be mixed into
an old codebase without conflicts

Moreover, introducing the idea of separate symbol namespaces (with the
possibility of declaring one's own namespace) would open up another way of
extending Perl.

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 5:22 PM

Post #41 of 43 (628 views)

On 2023-05-20 2:55 p.m., Branislav Zahradník wrote:
> Another question:
>
> should constraint names share namespace with functions or not ?
>
> eg: int (builtin) vs int (constraint)
>
> Both approaches have pros:
> - when yes, check can be imported by name
> - when no, you can have clear separation allowing to mix new code into old
> codebase without conflicts
>
> Moreover, introducing idea of separated symbol namespaces (with possibility to
> declare own namespace)
> will bring another possibility to extending Perl

Maybe the answer can be some of both options.

At such time that user-defined predicate functions are supported for this
purpose, which apparently isn't in the first version, the answer would have to
be yes, it's a shared namespace, at least for those.

As far as the ones that are explicitly both system/language-defined and NOT
user-overridable, it would make sense for those to be a separate namespace, and
also automatically available without any "use" statement, or implied by "use
5.40" etc.

There would then have to be a way syntactically to distinguish the two
categories, say when one says "int" in a type/check reference.

-- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

darren at darrenduncan

May 20, 2023, 5:28 PM

Post #42 of 43 (628 views)

On 2023-05-20 5:22 p.m., Darren Duncan wrote:
> On 2023-05-20 2:55 p.m., Branislav Zahradník wrote:
>> Another question:
>>
>> should constraint names share namespace with functions or not ?
>>
>> eg: int (builtin) vs int (constraint)
>>
>> Both approaches have pros:
>> - when yes, check can be imported by name
>> - when no, you can have clear separation allowing to mix new code into old
>> codebase without conflicts
>>
>> Moreover, introducing idea of separated symbol namespaces (with possibility to
>> declare own namespace)
>> will bring another possibility to extending Perl
>
> Maybe the answer can be some of both options.
>
> At such time that user-defined predicate functions are supported for this
> purpose, which apparently isn't in the first version, the answer would have to
> be yes its a shared namespace, at least for those.
>
> As far as the ones that are explicitly both system/language-defined and NOT
> user-overridable, it would make sense for those to be a separate namespace, and
> also automatically available without any "use" statement, or implied by "use
> 5.40" etc.
>
> There would then have to be a way syntactically to distinguish the two
> categories, say when one says "int" in a type/check reference.

Perhaps this might also be something builtin:: is useful for, assuming users
can't override those, as a way of just using the shared namespace exclusively,
which in the interest of general extensibility may be preferred. That is,
design this as if generic function support is how it is done, but the first
version raises an error if anything other than a built-in is used, which allows
some parts of the implementation to shortcut and not actually work for anything
other than builtins. I'm not advocating this but it's an idea. -- Darren Duncan

Re: Native data checking in Perl [ In reply to ]

curtis.poe at gmail

May 20, 2023, 11:34 PM

Post #43 of 43 (628 views)

On Sat, May 20, 2023 at 11:55 PM Branislav Zahradník <happy.barney@gmail.com>
wrote:

> should constraint names share namespace with functions or not ?
>
> eg: int (builtin) vs int (constraint)
>
> Both approaches have pros:
> - when yes, check can be imported by name
> - when no, you can have clear separation allowing to mix new code into old
> codebase without conflicts
>

They should not share the same namespace. "checks imported by name" is a
pain. I'm tired of going through codebases and manually curating the
imports from Type::Tiny and friends, just as I'm tired of writing a new
signature using a type I *know* exists but which I forgot to import.
Languages which allow you to specify types don't have this limitation for a
good reason: it's painful manual drudgery which the developer should not
need to handle. There are currently almost 30 built-in checks, that's
probably going to grow, and user-defined checks will only add to the
problem.

There would be no need to specify lists of which checks you allow or not
because they do not share the same namespace.

Best,
Ovid
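
A rough model of "checks do not share the function namespace" in today's Perl is a registry that is looked up by name instead of the symbol table. The sketch below is an assumption about how that separation could look: %CHECKS and satisfies are hypothetical names, not part of the spec. It shows a check called int coexisting with the builtin int() without any import.

```perl
use strict;
use warnings;

# Hypothetical registry: check names live in their own hash, not in the
# symbol table, so nothing is imported and nothing can clash with a sub.
my %CHECKS = (
    int => sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ },
);

sub satisfies {
    my ($check_name, $value) = @_;
    my $check = $CHECKS{$check_name}
        or die "unknown check '$check_name'\n";
    return $check->($value) ? 1 : 0;
}

my $truncated = int(3.7);                # the builtin function truncates
my $is_int    = satisfies('int', 3.7);   # the check of the same name rejects

print "int(3.7) as a function: $truncated\n";
print "3.7 passes the int check: ", ($is_int ? 'yes' : 'no'), "\n";
```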