Mailing List Archive

Pre-PPC: native data checks for Perl
Hi all,

Happy New Year!

This is a pre-PPC for asking if we want a full PPC focused on data checks,
an innovative approach to ensuring data integrity in Perl without
explicitly using the overloaded term "type". This would have been sent
sooner, but internal bike-shedding combined with some contentious
discussion on P5P left me needing to recharge my batteries.

After considerable development, including a comprehensive spec and a robust
implementation passing nearly 200,000 tests, we're at a pivotal point. Our
solution currently employs attributes for data checks, as seen here:

sub fibonacci :returns(UINT) ($nth :of(PositiveInt)) {
...
}

However, this approach has sparked debate. We considered alternatives like
prefix or postfix data checks for better clarity and efficiency:

sub fibonacci (PositiveInt $nth) :returns(UINT) {
...
}

or

sub fibonacci ($nth PositiveInt) :returns(UINT) {
...
}

The crux of our discussion is the syntax for return types. We're striving
to integrate seamlessly with Perl's existing syntax while introducing this
powerful new feature. Most proposed syntaxes for declaring data checks on
returned values have had issues of one sort or another.
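
For comparison, the checks these declarations express can be written by hand in current Perl. The following is only a minimal sketch: the helper names `is_positive_int` and `is_uint` are hypothetical, and the meanings assumed here for PositiveInt (an integer greater than zero) and UINT (a non-negative integer) are guesses for illustration, not definitions taken from the spec.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Assumed semantics, for illustration only; the spec may define these differently.
sub is_positive_int ($value) {
    return defined $value && !ref($value) && $value =~ /\A[0-9]+\z/ && $value > 0;
}

sub is_uint ($value) {
    return defined $value && !ref($value) && $value =~ /\A[0-9]+\z/;
}

sub fibonacci ($nth) {
    # Parameter check: roughly what a PositiveInt check on $nth would express.
    die "fibonacci: \$nth must be a positive integer\n"
        unless is_positive_int($nth);

    my ($prev, $curr) = (0, 1);
    ($prev, $curr) = ($curr, $prev + $curr) for 2 .. $nth;

    # Return check: roughly what :returns(UINT) would express.
    die "fibonacci: result is not an unsigned integer\n"
        unless is_uint($curr);
    return $curr;
}

print fibonacci(10), "\n";
```

Every proposed syntax above is a way of moving those two `die ... unless` lines out of the body and into the declaration.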

The proposal would offer a rich array of features including a core type
hierarchy, user-defined types, parameterized checks, check expressions,
coercions, and more, all designed to enhance Perl's robustness and
reliability.

- We have a spec:
https://gist.github.com/thoughtstream/08b7fd48b09c99ae47d6d9f82b913986
(warning: it's massive)
- We have an implementation: https://github.com/Perl-Apollo/oshun
(almost 200,000 tests are passing)
- We have a problem: the syntax

I believe this would represent a huge step forward for Perl, and I'm
looking forward to your feedback.

Best,
Ovid
Re: Pre-PPC: native data checks for Perl
On Wed, 3 Jan 2024 16:48:08 +0100
Ovid <curtis.poe@gmail.com> wrote:

> However, this approach has sparked debate. We considered alternatives
> like prefix or postfix data checks for better clarity and efficiency:
>
> sub fibonacci (PositiveInt $nth) :returns(UINT) {
> ...
> }

I most prefer that as a syntax, for params at least. Unsure about the
return value though. Hrmmmm.

> The crux of our discussion is the syntax for return types. We're
> striving to integrate seamlessly with Perl's existing syntax while
> introducing this powerful new feature. Most proposed syntaxes for
> declaring data checks on returned values have had issues of one sort
> or another.

Yeah... I can imagine.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Pre-PPC: native data checks for Perl
On Wed, Jan 3, 2024 at 5:42 PM Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:


> > sub fibonacci (PositiveInt $nth) :returns(UINT) {
> > ...
> > }
>
> I most prefer that as a syntax, for params at least.


I have absolutely no position at this point about parameter or regular
variable check syntax. I'm so burnt out on the conversation that so long as
we have an agreement on something that's a forward-compatible syntax, I'll
probably buy into it.


> > The crux of our discussion is the syntax for return types. We're
> > striving to integrate seamlessly with Perl's existing syntax while
> > introducing this powerful new feature. Most proposed syntaxes for
> > declaring data checks on returned values have had issues of one sort
> > or another.
>
> Yeah... I can imagine.
>

Yup. Here's Python (similar to Swift and other languages):

def fibonacci(nth: int) -> int:
...

Ignoring that those are only type hints for linters (which Perl probably
can't have), we use the -> syntax for method calls and dereferencing. I'm
unsure that would fly.

Here's Java:

int fibonacci (int nth) {
...

Replacing sub or method keyword with the name of the data check is dead in
the water.

Kotlin:

fun fibonacci (nth: Int): Int {
...

That looks nice, but it also looks like an attribute.

Other languages do things like func fibonacci<INT> (nth: UINT) ...

I don't see that working well, either.

Syntax of return checks is the single biggest blocker. Syntax of variable
and parameter checks (attributes, prefix, or postfix) is the secondary
blocker. I just want to push something forward, so I'd like these two
issues sorted, and then the bike-shedding over a PPC could begin.

Best,
Ovid
Re: Pre-PPC: native data checks for Perl
Syntax should be consistent: the same behaviour should be expressed the same
way, regardless of context.

Syntax is otherwise irrelevant, provided it is clear and easy to learn (and
consistent and optional). What is important is behaviour: an independent
symbol space, runtime symbol availability, and so on.

Also, syntax should not favour an MVP in a way that makes the next step(s)
inconsistent, hard to read, or hard to implement.
Re: Pre-PPC: native data checks for Perl
On 2024-01-03 11:00 a.m., Branislav Zahradník wrote:
> syntax should be consistent - same behaviour should be expressed same way,
> regardless of context.

I agree, this is one of the most important things, the consistency.

And that is one reason I strongly believe that it is best to have the type AFTER
the thing it describes rather than before, in all contexts (parameters, return
types, variables, etc), because that scales best: it still looks nice when
check definitions are more than a few characters long.

> syntax is also irrelevant - should it be clear and easy to learn (and consistent
> and optional)
> what is important is behaviour - independent symbol space, runtime symbol
> availability, and so on.
>
> also syntax should not favour mvp making next step(s) inconsistent / hard to
> read / hard to implement.

I generally agree with all of this.

-- Darren Duncan
Re: Pre-PPC: native data checks for Perl
On Wed, 3 Jan 2024 at 22:27, Darren Duncan <darren@darrenduncan.net> wrote:

> On 2024-01-03 11:00 a.m., Branislav Zahradník wrote:
> > syntax should be consistent - same behaviour should be expressed same
> > way, regardless of context.
>
> I agree, this is one of the most important things, the consistency.
>
> And that is one reason I strongly believe that it is best to have the
> type AFTER the thing it describes rather than before, in all contexts
> (parameters, return types, variables, etc), because that scales best
> for looking nice to having check definitions that are more than a few
> characters long.
>
>
I didn't want to be so direct :-)
It also scales best with these requirements (if valid):
- optional data contract definition
- independent symbol space for data contract names
- capability to intermix contract names and unaugmented perl code, eg:
(meaning) Int where { 10 < $_ < 10 }

along with LeoNerd's attribute work, this also leads back to my earliest
request for help regarding independent symbol spaces, along with declare /
declared syntax

Best regards,
Brano
Re: Pre-PPC: native data checks for Perl
On Wed, 3 Jan 2024 16:42:15 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> > However, this approach has sparked debate. We considered
> > alternatives like prefix or postfix data checks for better clarity
> > and efficiency:
> >
> > sub fibonacci (PositiveInt $nth) :returns(UINT) {
> > ...
> > }
>
> I most prefer that as a syntax, for params at least. Unsure about the
> return value though. Hrmmmm.

Actually the more I think on this, the more I think that actually
putting a constraint name /after/ the thing that is being constrained
makes more sense. It's extra useful information that doesn't get in the
way of upfront understanding of the thing. So maybe constraint checks come
afterwards:

sub fibonacci($nth PositiveInt) UINT
{
...
}

It means as a human reader you can kind of ignore all those bits and
skim over them, and they don't get too much in the way.

sub fibonacci($nth ##--------) ##--
{ ... }

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Pre-PPC: native data checks for Perl
On Wed, 3 Jan 2024 16:48:08 +0100
Ovid <curtis.poe@gmail.com> wrote:

> Hi all,
>
> Happy New Year!
>
> This is a pre-PPC for asking if we want a full PPC focused on data
> checks, an innovative approach to ensuring data integrity in Perl
> without explicitly using the overloaded term "type".

Sure, feel free to write a PPC as this overall ability feels like
something we want in the language. But keep in mind there'll be lots of
back-and-forth about small details.

Also keep in mind that a spec alone does not lead to an implementation.
I'm already still working on `feature 'class'` and imagine I will be
for some time yet. I would like to see "value constraints" as a Perl
feature, but I wouldn't want to see a spec sit around unimplemented for
years as an untouched PPC document.

(I would also add on a personal note that, given a choice between
writing a spec for a new thing vs helping finish the implementation of
an already-in-progress thing, perhaps picking the latter would be
best?)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Pre-PPC: native data checks for Perl
On Thu, Jan 4, 2024 at 6:23 PM Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:


> (I would also add on a personal note that, given a choice between
> writing a spec for a new thing vs helping finish the implementation of
> an already-in-progress thing, perhaps picking the latter would be
> best?)
>
>
Yes, I completely agree with you. I noticed there wasn't much progress
lately on the class feature and I mistakenly assumed that was because you
were busy. I'm sorry about that. I'll shoot you an email later and work
with you to create a plan to finish the class syntax.

Best,
Ovid
Re: Pre-PPC: native data checks for Perl
On 2024-01-04 17:55, Paul "LeoNerd" Evans wrote:

> [...]
> Actually the more I think on this, the more I think that actually
> putting a constraint name /after/ the thing that is being constrained
> makes more sense. It's extra useful information that doesn't get in the
> way upfront understanding of the thing. So maybe constraint checks come
> afterwards:
>
> sub fibonacci($nth PositiveInt) UINT
> {
> ...
> }
>
> It means as a human reader you can kindof ignore all those bits and
> skim over them, and they don't get too much in the way.
>
> sub fibonacci($nth ##--------) ##--
> { ... }

Presumably one could still code like:

  PositiveInt my ($x, \@ids);

where PositiveInt is an (only run-time?) sub-call
that can apply constraints to its parameters.


And

  my ($x, @ids) :PositiveInt;


And even

  my PositiveInt ($x, @ids) :DUMMY;


-- Ruud
Re: Pre-PPC: native data checks for Perl
- there can be check PositiveInt, sub PositiveInt (glob PositiveInt and
format PositiveInt at the same time)
- syntax should be consistent for every usage (pseudocode)
- implicit variables: $x : PositiveInt
- scoped variables: my / our / local / state / field
- multiple checks per declaration: my ($length : PositiveInt, @values :
Array of NegativeInt) = ...
- signatures
- return values
- "cast" operation: ($x + $y) : PositiveInt

Let's wait for Ovid's pre-RFC, and prepare use-cases until then.

Personally I prefer the name "contract", and probably an extended attribute
syntax that does not rely on a balanced THING.
(e.g. IMHO it's better to say contract Class [ can (foo (Int)) ] than check
Class [ can (foo (Int)) ])
Re: Pre-PPC: native data checks for Perl
> On 5 Jan 2024, at 03:55, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:
>
> On Wed, 3 Jan 2024 16:42:15 +0000
> "Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:
>>>
[…]
> Actually the more I think on this, the more I think that actually
> putting a constraint name /after/ the thing that is being constrained
> makes more sense. It's extra useful information that doesn't get in the
> way upfront understanding of the thing. So maybe constraint checks come
> afterwards:
>
> sub fibonacci($nth PositiveInt) UINT
> {
> ...
> }
>
> It means as a human reader you can kindof ignore all those bits and
> skim over them, and they don't get too much in the way.
>
> sub fibonacci($nth ##--------) ##--
> { ... }
>
> --
> Paul "LeoNerd" Evans
>
> leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
> http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

+1 from me. Even in languages that arrange these elements differently, this is how my monkey brain parses the information.

James
Re: Pre-PPC: native data checks for Perl
It comes across to me that the vast majority of those who commented on this
proposal strongly prefer NAME CHECK order over CHECK NAME order, for a number
of both pragmatic and aesthetic reasons, plus similarity with a subset of
other languages, plus it's what the Oshun proof of concept currently does.

Meanwhile, it comes across to me that the only arguments for CHECK NAME order
are that it is similar to a different subset of other languages and that the
Perl parser already implements it to a degree as an unrealized feature.

Given this, I feel that NAME CHECK order is the clear winner for Oshun to go
with; while not unanimous, it has a very clear, strong lead.

-- Darren Duncan

On 2024-01-05 11:35 p.m., James Watson wrote:
> On 5 Jan 2024, at 03:55, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:
>> Actually the more I think on this, the more I think that actually
>> putting a constraint name /after/ the thing that is being constrained
>> makes more sense. It's extra useful information that doesn't get in the
>> way upfront understanding of the thing. So maybe constraint checks come
>> afterwards:
>>
>> sub fibonacci($nth PositiveInt) UINT
>> {
>> ...
>> }
>>
>> It means as a human reader you can kindof ignore all those bits and
>> skim over them, and they don't get too much in the way.
>>
>> sub fibonacci($nth ##--------) ##--
>> { ... }
>
> +1 from me. Even in languages that arrange these elements differently, this is how my monkey brain parses the information.
Re: Pre-PPC: native data checks for Perl
> While it comes across to me that the only arguments for CHECK NAME order
> are that it is similar to a different subset of other languages and that
> the Perl parser already implements it to a degree as an unrealized
> feature.
>
As I mentioned earlier, BAREWORD NAME syntax should be reserved for altering
the underlying variable storage implementation.

As an example, some hypothetical hash implementations:
my builtin::hash::insert_order %hash_with_keys_in_insert_order;
my builtin::hash::ignore_case %hash;
my builtin::hash::lru_order %hash_with_keys_in_lru_order;

That also raises a question: what is Readonly? A storage implementation or a
data check?

Best regards,
Brano
Re: Pre-PPC: native data checks for Perl
On 2024-01-06 10:41 p.m., Branislav Zahradník wrote:
> that also rises question: what is Readonly? storage implementation or data check?

Between those 2 options, Readonly is absolutely a storage implementation.

It implements a write-once read-many store.

Readonly is definitely not a data check because it isn't acting to limit the
content of a variable to values that satisfy a specific type/constraint.

If Readonly were a data check then its effect would be to define the trivial
type/constraint corresponding to the empty set (the counterpart of the universal
type/constraint), meaning there is no possible value that could satisfy being
held in the variable.

Alternately it could be fair to argue that Readonly is a third option, neither
storage implementation nor data check, however otherwise it is clearly a storage
implementation, as far as users are concerned.

-- Darren Duncan
Re: Pre-PPC: native data checks for Perl
> Alternately it could be fair to argue that Readonly is a third option,
> neither storage implementation nor data check, however otherwise it is
> clearly a storage implementation, as far as users are concerned.
>

That question has bothered me since Oshun was announced.
Being a naive optimist, I believe this PPC will evolve beyond a basic
implementation and that we should look forward in naming as well (I favour
"data contract" as it will look good in propagation / documentation as well).


>
> -- Darren Duncan
>
>
Re: Pre-PPC: native data checks for Perl
On 2024-01-06 11:11 p.m., Branislav Zahradník wrote:
> That question is something that bothers me since Oshun was announced.
> Being naive optimist I believe this PPC will evolve beyond basic implementation
> and that
> we should look forward in naming as well (I favour "data contract" as it will
> look good in
> propagation / documentation as well)

For my part I have no problem with using the term "data contract" which you
propose; it seems no worse than "check". -- Darren Duncan
Re: Pre-PPC: native data checks for Perl
On Sun, Jan 7, 2024 at 8:34 AM Darren Duncan <darren@darrenduncan.net>
wrote:


> For my part I have no problem with using the term "data contract" which
> you
> propose; it seems no worse than "check". -- Darren Duncan
>

As an aside, if possible, I'll probably be spending time trying to help
Paul with the class feature, so can't return to this right away.

However, "data contract" can't work. Due to how Perl variables work, we can
easily check assignments to the top-level structure, but for nested
structures, it becomes prohibitively expensive to check them. Thus, we have a
system which makes it much easier to validate that our data is what we
expect it to be, but if we have a complex data structure and we're
assigning internally to the references, we can't easily promise that the
data is still correct. See also:
https://gist.github.com/thoughtstream/08b7fd48b09c99ae47d6d9f82b913986#applying-a-check-to-a-parameter-variable

Thus, this is not a contract. Trying to recurse through large data
structures is simply too expensive.

Best,
Ovid
Re: Pre-PPC: native data checks for Perl
On Sun, 7 Jan 2024 at 11:23, Ovid <curtis.poe@gmail.com> wrote:

> On Sun, Jan 7, 2024 at 8:34 AM Darren Duncan <darren@darrenduncan.net>
> wrote:
>
>
>> For my part I have no problem with using the term "data contract" which
>> you
>> propose; it seems no worse than "check". -- Darren Duncan
>>
>
> As an aside, if possible, I'll probably be spending time trying to help
> Paul with the class feature, so can't return to this right away.
>
> However, "data contract" can't work. Due to how Perl variables work, we
> can easily check assignments to the top-level structure, but for nested
> structures, it becomes prohibitively expensive to check them. Thus, we a
> system which makes it much easier to validate that our data is what we
> expect it to be, but if we have a complex data structure and we're
> assigning internally to the references, we can't easily promise that the
> data is still correct. See also:
> https://gist.github.com/thoughtstream/08b7fd48b09c99ae47d6d9f82b913986#applying-a-check-to-a-parameter-variable
>
> Thus, this is not a contract. Trying to recurse through large data
> structures is simply too expensive.
>

You probably meant "how Perl variables currently work" :-)

An approach I have used elsewhere: backreferences.

my @foo;
my $foo = { foo => \@foo };
- on assignment, @foo will call "owner -> check contract"

push @{ $foo->{foo} }, ...;
- again, the owners of @foo will be queried to check the contract

this way you will perform contract check only on affected paths
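
The backreference idea can be approximated today with `tie`. What follows is a rough sketch under stated assumptions, not a proposal for how core would do it: the class `NotifyingArray` and its `add_owner` method are hypothetical names, and owners are registered by hand here, whereas a real implementation would presumably register them when the reference is assigned.

```perl
use strict;
use warnings;

# Hypothetical tied array that asks every registered owner to approve
# a write before it happens.
package NotifyingArray {
    use parent 'Tie::Array';    # supplies PUSH, CLEAR, EXTEND and friends

    sub TIEARRAY  { my ($class) = @_; return bless { data => [], owners => [] }, $class }
    sub FETCH     { my ($self, $i) = @_; return $self->{data}[$i] }
    sub FETCHSIZE { my ($self) = @_; return scalar @{ $self->{data} } }
    sub STORESIZE { my ($self, $n) = @_; $#{ $self->{data} } = $n - 1 }

    # An owner registers the element check that its own contract implies.
    sub add_owner {
        my ($self, $name, $check) = @_;
        push @{ $self->{owners} }, [ $name, $check ];
    }

    sub STORE {
        my ($self, $i, $value) = @_;
        for my $owner (@{ $self->{owners} }) {
            my ($name, $check) = @$owner;
            die "write rejected by contract of $name\n" unless $check->($value);
        }
        $self->{data}[$i] = $value;
    }
}

my $backing = tie my @list, 'NotifyingArray';
@list = (1 .. 9);

# %hash holds a reference to @list and registers itself as an owner.
my %hash = (foo => \@list);
$backing->add_owner('%hash{foo}', sub {
    my ($v) = @_;
    return defined $v && $v =~ /\A[0-9]+\z/ && $v > 0 && $v < 10;
});

my $rejected = !eval { push @list, 10; 1 };   # violates the owner's contract
push @{ $hash{foo} }, 5;                      # satisfies it, so it is stored
```

Only the arrays that have owners pay for the extra checks, but every write to such an array goes through a method call, which is a real cost of its own.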

And I'll repeat myself again: it is about the end game. It doesn't need to be
fully implemented from the beginning (putting it down as a "todo" topic will be enough).




>
> Best,
> Ovid
>
Re: Pre-PPC: native data checks for Perl
On 07-01-2024 at 11:22, Ovid wrote:
> On Sun, Jan 7, 2024 at 8:34 AM Darren Duncan <darren@darrenduncan.net>
> wrote:
>
> For my part I have no problem with using the term "data contract"
> which you
> propose; it seems no worse than "check". -- Darren Duncan
>
>
> As an aside, if possible, I'll probably be spending time trying to
> help Paul with the class feature, so can't return to this right away.
>
> However, "data contract" can't work. Due to how Perl variables work,
> we can easily check assignments to the top-level structure, but for
> nested structures, it becomes prohibitively expensive to check them.
> Thus, we a system which makes it much easier to validate that our data
> is what we expect it to be, but if we have a complex data structure
> and we're assigning internally to the references, we can't easily
> promise that the data is still correct. See also:
> https://gist.github.com/thoughtstream/08b7fd48b09c99ae47d6d9f82b913986#applying-a-check-to-a-parameter-variable
>
> Thus, this is not a contract. Trying to recurse through large data
> structures is simply too expensive.


I may be wrong, but doesn't this depend on the check itself? If the
check implements deep checking, it implements deep checking, if it
doesn't, it doesn't. Which is just a part of the contract. So I would
say contract is fine, and I think preferable.
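
To make the distinction concrete, here is a small sketch of a shallow and a deep check written as plain subs. The names `check_arrayref` and `check_arrayref_of_int` are made up for illustration and are not taken from the spec or from Oshun.

```perl
use strict;
use warnings;

# Hypothetical check subs; the names are illustrative only.
sub is_int {
    my ($value) = @_;
    return defined $value && !ref($value) && $value =~ /\A-?[0-9]+\z/ ? 1 : 0;
}

# Shallow: promises only that the value is an array reference.
sub check_arrayref {
    my ($value) = @_;
    return ref($value) eq 'ARRAY' ? 1 : 0;
}

# Deep: additionally promises that every element is an integer,
# at a cost proportional to the number of elements.
sub check_arrayref_of_int {
    my ($value) = @_;
    return 0 unless check_arrayref($value);
    for my $elem (@$value) {
        return 0 unless is_int($elem);
    }
    return 1;
}

my $data = [1, 2, "three"];
my $shallow_ok = check_arrayref($data);          # looks at the container only
my $deep_ok    = check_arrayref_of_int($data);   # looks at the elements too
```

Each sub states exactly what it promises at the moment it runs, and nothing more.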


HTH,

M4
Re: Pre-PPC: native data checks for Perl
On 07.01.24 21:19, Martijn Lievaart via perl5-porters wrote:
>
>
> I may be wrong, but doesn't this depend on the check itself? If the
> check implements deep checking, it implements deep checking, if it
> doesn't, it doesn't. Which is just a part of the contract. So I would
> say contract is fine, and I think preferable.

Consider this case:

    my @array = (1, 2, 3);
    my $ref :ArrayRef(Int) = \@array;  # presumably OK
    $array[0] = "hello";  # validate here??
    say $ref->[0];  # not an integer anymore??

If you start modifying structures that are referenced from elsewhere
(arbitrarily deeply), you can't easily find all the "checked" places
from which a particular scalar is reachable (in order to re-run all
their checks) and it would be slow. So even if a check implements deep
checking, that only saves you if the check is actually triggered and
gets to run, which it probably won't if inner structures are also
reachable through a non-checked path.
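
The same effect can be reproduced in current Perl with a tied scalar standing in for the checked variable. This is only a sketch: the class `CheckedArrayRefOfInt` is a hypothetical stand-in that runs a deep check on every assignment, and it is not how Oshun implements `:ArrayRef(Int)`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a checked variable: a tied scalar whose STORE
# runs a deep "array reference of integers" check.
package CheckedArrayRefOfInt {
    sub TIESCALAR { my ($class) = @_; my $value; return bless \$value, $class }
    sub FETCH     { my ($self) = @_; return $$self }
    sub STORE {
        my ($self, $new) = @_;
        die "not an array reference\n" unless ref($new) eq 'ARRAY';
        for my $elem (@$new) {
            die "array contains a non-integer element\n"
                unless defined $elem && $elem =~ /\A-?[0-9]+\z/;
        }
        $$self = $new;
    }
}

my @array = (1, 2, 3);
tie my $ref, 'CheckedArrayRefOfInt';
$ref = \@array;            # STORE runs and the deep check passes

$array[0] = "hello";       # written through the unchecked name: STORE never runs
my $leaked = $ref->[0];    # the checked variable now exposes a non-integer

# Assigning the same bad data directly to the checked variable is still caught.
my $rejected = !eval { $ref = ["hello"]; 1 };
```

The deep check is correct every time it runs; the problem is that nothing makes it run when the write arrives through `@array`.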
Re: Pre-PPC: native data checks for Perl
> I may be wrong, but doesn't this depend on the check itself? If the check
> implements deep checking, it implements deep checking, if it doesn't, it
> doesn't. Which is just a part of the contract. So I would say contract is
> fine, and I think preferable.
>
I think Ovid refers to the situation where your data structure contains a
reference with a weaker contract than the contract of the data structure
element, e.g.:
my %hash; # contract: key foo is an arrayref of positive integers less than 10
my @list; # contract: list of integers

@list = (1 .. 9);
%hash = ( foo => \@list ); # at assignment, the contract is valid

push @list, 10;
# this breaks the contract of %hash


> HTH,
>
> M4
>
>
>
Re: Pre-PPC: native data checks for Perl
On 07-01-2024 at 22:47, Lukas Mai wrote:
> On 07.01.24 21:19, Martijn Lievaart via perl5-porters wrote:
>>
>>
>> I may be wrong, but doesn't this depend on the check itself? If the
>> check implements deep checking, it implements deep checking, if it
>> doesn't, it doesn't. Which is just a part of the contract. So I would
>> say contract is fine, and I think preferable.
>
> Consider this case:
>
>     my @array = (1, 2, 3);
>     my $ref :ArrayRef(Int) = \@array;  # presumably OK
>     $array[0] = "hello";  # validate here??
>     say $ref->[0];  # not an integer anymore??
>
> If you start modifying structures that are referenced from elsewhere
> (arbitrarily deeply), you can't easily find all the "checked" places
> from which a particular scalar is reachable (in order to re-run all
> their checks) and it would be slow. So even if a check implements deep
> checking, that only saves you if the check is actually triggered and
> gets to run, which it probably won't if inner structures are also
> reachable through a non-checked path.


That's not the point I tried to make. The question is whether the name
'contract' is appropriate even if checks aren't deep. I tried to argue
that it is.


HTH,

M4