Mailing List Archive

Text vs bytes WAS Re: PPC Elevator Pitch for Perl::Types
In all my years of writing Perl, and JS code to consume Perl APIs, the only truly vexing type problem I found was Perl’s inability to distinguish text from byte strings.

FWIW.

-FG

> On Aug 16, 2023, at 00:38, Oodler 577 via perl5-porters <perl5-porters@perl.org> wrote:
>
> # Proposed Perl Changes Elevator Pitch for Perl::Types
>
> This document is the first step in the Proposed Perl Changes process:
>
> https://github.com/Perl/PPCs/blob/main/README.md
>
> Respectfully submitted by the _Perl::Types Committee_:
> * Will Braswell (WBRASWELL), Co-Founder & Chairman
> * Brett Estrade (OODLER), Co-Founder & 1st Vice Chairman
> * Zakariyya Mughal (ZMUGHAL), Co-Founder & 2nd Vice Chairman
> * John Napiorkowski (JJNAPIORK)
> * Darren Duncan (DUNCAND)
> * Nedzad Hrnjica (NHRNJICA)
> * Rohit Manjrekar (MANJREKAR)
> * Paul Millard (MAGUDAS)
> * Joshua Day (HAX)
> * Tyler Bird (BIRDTY)
> * Robbie Hatley (HATLEYSFT)
> * David Warner (WESTBASE)
> * Daniel Mera (DMERA)
> * Duong Vu (DVU)
> * Rajan Shah (RSHAH)
>
> ## Here is a problem
>
> Perl lacks sufficient exposure of the already-existing real natural
> Perl data types for use by the programmer. This has led to false
> claims that the Perl interpreter "has no data types". This has
> also led to countless programmer-hours spent devising synthetic
> or unnatural type systems that rely entirely on fuzzy data guessing
> via regular expressions, etc.
>
> Fortunately, the Perl compiler already provides the capability to
> expose the underlying real native C data types which can be used
> by Perl programmers to incrementally improve performance, eventually
> achieving the full native speed of compiled C code. Among other
> features, the Perl compiler also enables real natural data type
> checking with identical behavior in both dynamic (interpreted) mode
> and static (compiled) mode.
>
> https://metacpan.org/dist/RPerl
>
> The data type subsystem of the Perl compiler is currently in the
> process of being extracted and refactored as an independent CPAN
> distribution called `Perl::Types`. This distribution provides new,
> core capabilities and thus should be included in the Perl core
> distribution as a "Dual-Life" module.
>
> https://github.com/Dual-Life
>
> ## Here is the syntax that we're proposing
>
> The Perl interpreter already has an unused slot in the grammar for
> this very purpose:
>
> `my TYPE $var;`
>
> The Perl interpreter and the Perl compiler already provide the
> basic data types to be used in the grammar slot above:
>
> * `boolean` (`SV` when `SvIsBOOL(sv)` is true, new in 5.36)
> * `integer` (`IV`)
> * `number` (`NV`)
> * `string` (`PV`)
> * `array` (`AV`) & `arrayref` (`RV(AV)`)
> * `hash` (`HV`) & `hashref` (`RV(HV)`)
> * user-defined classes
>
> Custom data structures are declared with compound or nested data
> types composed from the basic types above, for example:
>
> * `integer::arrayref`
> * `integer::arrayref::arrayref`
> * `integer::arrayref::hashref`
> * `integer::arrayref::arrayref::hashref`
> * `string::arrayref`
> * `MyClass::arrayref`
> * etc.
>
> Attempting to utilize incompatible data types gives the same behavior
> and same errors in both interpreted mode and compiled mode, for
> example:
>
> ```
> #!/usr/bin/perl
> use Perl::Types;
>
> sub squared {
>     { my number $RETURN_TYPE };
>     ( my number $base ) = @ARG;
>     return $base ** 2;
> }
>
> squared(2); # FINE
> squared(2.3); # FINE
> squared(to_number('2')); # FINE
> my number $foo = 23;
> squared($foo); # FINE
>
> squared(); # ERROR ENVxx, TYPE-CHECKING MISMATCH: number value expected but ...
> squared('2'); # ERROR ENVxx, TYPE-CHECKING MISMATCH: number value expected but ...
> my string $bar = 'howdy';
> squared($bar); # ERROR ENVxx, TYPE-CHECKING MISMATCH: number value expected but ...
> squared([2]); # ERROR ENVxx, TYPE-CHECKING MISMATCH: number value expected but ...
> ```
>
> The syntax for arrays and hashes is similarly straightforward:
> ```
> sub multiply_array {
>     { my number::arrayref $RETURN_TYPE };
>     ( my integer $input_integer, my number::arrayref $input_array ) = @ARG;
>     my number::arrayref $output_array = [
>         $input_integer * $input_array->[0],
>         $input_integer * $input_array->[1],
>         $input_integer * $input_array->[2]
>     ];
>     return $output_array;
> }
>
> sub multiply_hash {
>     { my number::hashref $RETURN_TYPE };
>     ( my integer $input_integer, my number::hashref $input_hash ) = @ARG;
>     my number::hashref $output_hash = {
>         a => $input_integer * $input_hash->{a},
>         b => $input_integer * $input_hash->{b},
>         c => $input_integer * $input_hash->{c}
>     };
>     return $output_hash;
> }
> ```
>
> ## Here are the benefits of this
>
> The primary benefit of including `Perl::Types` in the Perl core
> distribution is that it will provide a greatly-needed capability
> of exposing the underlying C-level data types and data structures
> to every Perl programmer, so they may utilize Perl data types to
> achieve a number of benefits including but not limited to:
>
> * increased performance
> * code correctness
> * type safety (type checking)
> * memory safety (bounds checking)
> * documentation of code intent
> * potential for polymorphism
> * potential for derived or synthetic types
>
> Additionally, this foundational support for enabling Perl data
> types at the interpreter level will allow for the creation of
> synthetic or unnatural data types and data structures to be built
> on top of `Perl::Types` itself. For example, the capabilities
> provided in `Perl::Types` are necessary for any effort to introduce
> real natural type checking constraints related to Perl's new `class`
> keyword.
>
> ## Here are potential problems
>
> Because the Perl interpreter has already implemented both the real
> natural Perl data types, as well as the required `my TYPE $var`
> syntax construct, we therefore do not foresee the introduction of
> any significant issues by including `Perl::Types` in the Perl core
> distribution.
>
> The largest barrier to adoption for the Perl compiler is the need
> for numerous non-Perl dependencies, such as the C++ compiler and
> other libraries. This barrier will be completely removed for the
> `Perl::Types` distribution, due to being refactored out of the
> compiler distribution and removal of all non-core dependencies.
>
> On Behalf of the _Perl::Types Committee_,
> Brett Estrade
>
> --
> oodler@cpan.org
> oodler577@sdf-eu.org
> SDF-EU Public Access UNIX System - http://sdfeu.org
> irc.perl.org #openmp #pdl #native

Re: Text vs bytes WAS Re: PPC Elevator Pitch for Perl::Types
On 2023-08-16 3:40 a.m., Felipe Gasper via perl5-porters wrote:
> In all my years of writing Perl, and JS code to consume Perl APIs, the only truly vexing type problem I found was Perl’s inability to distinguish text from byte strings.
> FWIW.
> -FG

I agree that this is very important.

The Perl 5.36 update that let us finally distinguish booleans from numbers and
text was a huge step forward. Thank you again to everyone who made that work.

I feel that a core enhancement similar to the booleans one, letting us
thoroughly distinguish whether a string is intended as raw octets or as
characters, would be extremely valuable, and it is something that I could use.
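
For readers less familiar with the issue, here is a minimal sketch of the ambiguity (an illustration added for clarity, not code from this thread; the variable names are hypothetical). It uses only the core `Encode` module and the built-in `utf8::` functions. A decoded character string and its UTF-8 encoded octets are both plain scalars, and the internal UTF8 flag describes storage rather than intent:

```perl
use strict;
use warnings;
use Encode qw(encode);

# The same word, once as characters and once as UTF-8 encoded octets.
my $text  = "caf\x{e9}";               # 4 characters, the last is U+00E9
my $bytes = encode('UTF-8', $text);    # 5 octets (U+00E9 becomes 0xC3 0xA9)

# Both are ordinary string scalars; nothing records which one is "text".
printf "text length:  %d\n", length $text;
printf "bytes length: %d\n", length $bytes;

# Mistaking the octets for text and encoding again goes unnoticed by Perl
# and silently produces double-encoded data.
my $double = encode('UTF-8', $bytes);
printf "double-encoded length: %d\n", length $double;

# The internal UTF8 flag is not a text marker: upgrading the storage format
# sets the flag but leaves the string's value unchanged.
my $upgraded = $text;
utf8::upgrade($upgraded);
printf "same value after upgrade: %s\n", ( $upgraded eq $text ? 'yes' : 'no' );
printf "flag set after upgrade:   %s\n", ( utf8::is_utf8($upgraded) ? 'yes' : 'no' );
```

Because the flag can differ between two strings that compare equal, it cannot stand in for a real text-versus-octets distinction, which is why a core-level marker analogous to the 5.36 boolean tracking would be needed.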

This matter is orthogonal to the Perl::Types and Oshun proposals/discussions,
but having it would be a big help to both of them, and to my independent
work on data interchange/serialization mechanisms/formats.

-- Darren Duncan