Mailing List Archive: Data checks semantics

May 25, 2023, 1:22 PM

Hi all,

No one responded to my email linking the full data checks spec. I'm not
surprised because it was linking to a huge and overwhelming document. I've
now had the time to summarize the key semantics that we've defined.

I hope these can help clarify some things and help the conversation move
forward. Note that the syntax is how we defined it in the Data::Checks
module. Consider it "for information purposes" only. We're largely focusing
on semantics here. I came up with eight key points, with the third point
being troublesome.

*1. Checks are on the variable, not the data*

my $foo :of(INT) = 4;
$foo = 'hello'; # fatal

However:

my $foo :of(INT) = 4;
my $bar = $foo;
$bar = 'hello'; # legal

This is because we don't want checks to have "infectious" side effects that
might surprise you. The developer should have full control over the data
checks.
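
A rough way to see this behaviour in today's Perl is a tied scalar that validates every store. The `CheckedScalar` class and the `$is_int` predicate below are hypothetical stand-ins for `:of(INT)`; this is a sketch of the semantics, not how Data::Checks is implemented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for `:of(INT)`: validate every store to the variable.
package CheckedScalar {
    sub TIESCALAR {
        my ($class, $check, $value) = @_;
        die "initial value fails the check\n" unless $check->($value);
        return bless { check => $check, value => $value }, $class;
    }
    sub FETCH { return $_[0]{value} }
    sub STORE {
        my ($self, $new) = @_;
        die "assigned value fails the check\n" unless $self->{check}->($new);
        $self->{value} = $new;
        return;
    }
}

# Assumed definition of INT for this sketch: an optionally signed digit string.
my $is_int = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ };

tie my $foo, 'CheckedScalar', $is_int, 4;

my $bar = $foo;    # copies the value; the check stays with $foo
$bar = 'hello';    # legal: $bar is an ordinary, unchecked scalar

# Assigning a non-integer to $foo dies inside STORE, so the eval fails.
my $foo_ok = eval { $foo = 'hello'; 1 } ? 1 : 0;
```

The check lives in the magic attached to `$foo`, so copying the value into `$bar` carries nothing along with it.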

*2. No type inference*

No surprises. The developer should have full control over the data checks.

*3. Checks are on assignment to the variable*
*This is probably the most problematic bit.*

A check applied to a variable is not an invariant on that variable. It's a
prerequisite for assignment to that variable.

An invariant on the variable would guarantee that the contents of the
variable always meet a given constraint; a "prerequisite for assignment"
only guarantees that each value assigned to the variable, or to one of its
elements, meets the constraint at the moment it is assigned.

So an array such as `my @data :of(HASH[INT])` only requires that each
element of `@data` must be assigned a hashref whose values are integers.
Compare two ways of subsequently modifying an element (with the caveat that
the two lines aren't exactly equivalent):

$data[$idx] = { $key => 'not an integer' }; # fatal
$data[$idx]{$key} = 'not an integer'; # not fatal !

The second assignment is not modifying `@data` directly, only retrieving a
value from it and modifying the contents of an entirely different variable
through the retrieved reference value.
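
The same distinction can be reproduced with a tied array, which only sees stores made directly to its elements. `CheckedArray` and `is_hash_of_int` are hypothetical names standing in for `:of(HASH[INT])`; treat this as a sketch of the semantics under that assumption rather than the module's mechanism.

```perl
use strict;
use warnings;
use Tie::Array;    # core module; provides Tie::StdArray

# Hypothetical stand-in for `my @data :of(HASH[INT])`.
package CheckedArray {
    use parent -norequire, 'Tie::StdArray';

    sub is_hash_of_int {
        my ($value) = @_;
        return 0 unless ref $value eq 'HASH';
        for my $item (values %$value) {
            return 0 unless defined $item && $item =~ /\A-?[0-9]+\z/;
        }
        return 1;
    }

    # Only direct element assignment is guarded in this sketch.
    sub STORE {
        my ($self, $idx, $value) = @_;
        die "element must be a hashref of integers\n"
            unless is_hash_of_int($value);
        $self->[$idx] = $value;
        return;
    }
}

tie my @data, 'CheckedArray';
$data[0] = { count => 1 };    # passes the check

# Direct assignment goes through STORE, so the check runs and dies.
my $direct_ok = eval { $data[0] = { count => 'not an integer' }; 1 } ? 1 : 0;

# This only fetches the hashref, then writes into the hash itself.
my $nested_ok = eval { $data[0]{count} = 'not an integer'; 1 } ? 1 : 0;
```

After both statements, `$data[0]{count}` holds a non-integer even though every assignment to `@data` itself passed the check.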

We *could* specify that checks are invariants, instead of prerequisites,
but that would require that any reference value stored within a checked
arrayref or hashref would have to have checks automatically and recursively
applied to them as well, which would greatly increase the cost of checking,
and might also lead to unexpected action-at-a-distance, when the
now-checked references are modified through some other access mechanism.

Moreover, we would have to ensure that such auto-subchecked references were
appropriately “de-checked” if they are ever removed from the checked
container. And how would we manage any conflict if the nested referents
happened to have their own (possibly inconsistent) checks?

So the checks are simply assertions on direct assignments, rather than
invariants over a variable’s entire nested data structure.

This is unsatisfying, but we're playing with the matches we have, not the
flamethrower we want.

*4. Signature checks*
We need to work out the syntax, but the current plan is something like this:

sub count_valid :returns(UINT) (@customers :of(OBJ[Customer])) {
    ...
}

The `@customers` variable should maintain the check in the body of the sub,
but the return check is applied once and only once on the data returned at
the time that it's returned.
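
Written by hand in current Perl, the intent might look roughly like the following. The `Customer` class, its `valid` field and its `is_valid` method are invented here so the sketch runs. Note one difference from the plan above: this version checks the arguments only at entry, whereas `@customers` is meant to keep its check throughout the body.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical Customer class, only here so the sketch runs.
package Customer {
    sub new      { my ($class, %args) = @_; return bless {%args}, $class }
    sub is_valid { return $_[0]{valid} ? 1 : 0 }
}

sub count_valid {
    my (@customers) = @_;

    # Stand-in for `@customers :of(OBJ[Customer])`, applied at entry only.
    for my $customer (@customers) {
        die "argument is not a Customer object\n"
            unless blessed($customer) && $customer->isa('Customer');
    }

    my $count = grep { $_->is_valid } @customers;

    # Stand-in for `:returns(UINT)`: checked once, on the value being returned.
    die "return value is not an unsigned integer\n"
        unless $count =~ /\A[0-9]+\z/;
    return $count;
}

my @customers = (Customer->new(valid => 1), Customer->new(valid => 0));
my $valid     = count_valid(@customers);

# A plain string is not a Customer, so the call dies.
my $rejected = eval { count_valid('not a customer'); 1 } ? 0 : 1;
```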

*5. Scalars require valid assignments*
my $total :of(NUM); # fatal, because undef fails the check

This is per previous discussions. Many languages allow this:

int foo;

But as soon as you assign something to `foo`, it's fatal if it's not an
integer. For Perl, that's a bit tricky as there's no clear difference
between uninitialized and undef. While using that variable prior to
assignment is fatal in many languages, that would be more difficult in
Perl. Thus, we require a valid assignment.

As a workaround, this is bad, but valid:

my $total :of(INT|UNDEF);

This restriction doesn't apply to arrays or hashes because being empty
trivially passes the check.
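
Modelling checks as plain predicates makes both points easy to see: `undef` fails `INT` unless the check explicitly allows it, and an empty container has no elements that could fail. The names `any_of` and `all_pass` are hypothetical helpers for this sketch, and the `INT` definition is an assumption.

```perl
use strict;
use warnings;

# Assumed definitions of the INT and UNDEF checks for this sketch.
my $INT   = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ };
my $UNDEF = sub { !defined $_[0] };

# Hypothetical combinator standing in for `INT|UNDEF`.
sub any_of {
    my @checks = @_;
    return sub {
        my ($value) = @_;
        for my $check (@checks) {
            return 1 if $check->($value);
        }
        return 0;
    };
}

# A container passes when every element it currently holds passes.
sub all_pass {
    my ($check, @items) = @_;
    for my $item (@items) {
        return 0 unless $check->($item);
    }
    return 1;
}

my $undef_is_int   = $INT->(undef) ? 1 : 0;                    # fails the check
my $undef_allowed  = any_of($INT, $UNDEF)->(undef) ? 1 : 0;    # passes
my $empty_array_ok = all_pass($INT);                           # nothing to fail
```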

*6. Fatal*
By default, a failed check is fatal. We have provisions to downgrade them
to warnings or disable them completely.
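
The syntax for those provisions isn't shown here, so the following is only an illustration of the three behaviours. The `$CHECK_MODE` variable and the `enforce` helper are invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical switch; the real mechanism and its spelling are not shown here.
our $CHECK_MODE = 'fatal';    # 'fatal', 'warn', or 'off'

sub enforce {
    my ($check, $value, $message) = @_;
    return 1 if $CHECK_MODE eq 'off';    # the check is never run
    return 1 if $check->($value);
    die "$message\n" if $CHECK_MODE eq 'fatal';
    warn "$message\n";                   # downgraded to a warning
    return 0;
}

my $is_int = sub { defined $_[0] && $_[0] =~ /\A-?[0-9]+\z/ };

my (@outcome, @warnings);
for my $mode (qw(fatal warn off)) {
    local $CHECK_MODE    = $mode;
    local $SIG{__WARN__} = sub { push @warnings, @_ };    # capture, don't print
    my $survived = eval { enforce($is_int, 'hello', 'not an integer'); 1 } ? 1 : 0;
    push @outcome, "$mode=$survived";
}
```

Only the `fatal` mode stops the program; `warn` records one warning and carries on, and `off` skips the check entirely.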

*7. Internal representation*
my $foo :of(INT) = "0";
Dump($foo);

"0" naturally coerces to an integer, so that's allowed. However, we don't
plan (for the MVP) to guarantee that Dump shows an IV instead of a PV.
We're hoping that can be addressed post-MVP.
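
To see what "IV instead of a PV" means without reading a full Dump, the core `B` module can report which slots of a scalar are currently valid. The `storage_kind` helper is a hypothetical name for this sketch; it says nothing about how a checked variable would behave.

```perl
use strict;
use warnings;
use B ();    # core module for inspecting Perl's internals

# Report which slots of a scalar currently hold a valid value.
sub storage_kind {
    my ($value) = @_;
    my $flags  = B::svref_2object(\$value)->FLAGS;
    my $is_int = ($flags & B::SVf_IOK()) ? 1 : 0;
    my $is_str = ($flags & B::SVf_POK()) ? 1 : 0;
    return $is_int && $is_str ? 'IV+PV'
         : $is_int            ? 'IV'
         : $is_str            ? 'PV'
         :                      'other';
}

my $from_string = storage_kind("0");        # string literal: PV
my $from_number = storage_kind(0);          # integer literal: IV
my $numified    = storage_kind("0" + 0);    # explicit numification: IV
```

A check that accepts `"0"` as an integer but stores it unchanged would leave the variable in the first state, which is what Dump would then show.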

*8. User-defined checks*

Users should be able to define their own checks:

check LongStr :params($N :of(PosInt)) :isa(STR) ($n) { length $n >= $N }

The above would allow this:

my $name :of(LongStr[10]) = get_name(); # must be at least 10 characters

The body of a check definition should return a true or false value, or
die/croak with a more useful message.

A user-defined check is *not* allowed to change the value of the variable
passed in. Otherwise, we could not safely disable checks on demand
(coercions are not planned for the MVP, but we have them specced and they
use a separate syntax).
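
A parameterised check of this kind can be approximated today with a closure. The sketch below is a hypothetical emulation of `LongStr[10]`, not the proposed `check` syntax, and its readings of `PosInt` and `STR` are assumptions. The predicate works on a copy of the value, so it cannot alter the caller's variable, which is what makes it safe to skip.

```perl
use strict;
use warnings;

# Hypothetical emulation of the parameterised check `LongStr[$N]`.
sub LongStr {
    my ($min) = @_;

    # Assumed reading of `:params($N :of(PosInt))`: an integer of at least 1.
    die "LongStr parameter must be a positive integer\n"
        unless defined $min && $min =~ /\A[1-9][0-9]*\z/;

    return sub {
        my ($value) = @_;    # a copy, so the caller's variable is untouched
        # Rough stand-in for `:isa(STR)`: a defined non-reference.
        return 0 unless defined $value && !ref $value;
        return length($value) >= $min ? 1 : 0;
    };
}

my $long_enough = LongStr(10);
my $accepted    = $long_enough->('Alexandria Smith');    # 16 characters
my $refused     = $long_enough->('Ann');                 # 3 characters
my $bad_param   = eval { LongStr(0); 1 } ? 0 : 1;        # 0 is not a PosInt
```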

I was thinking user-defined checks should be post-MVP, but it's unclear to
me how useful checks would be without them. That's a discussion for later.

Best,
Ovid
