Mailing List Archive

No implicit coercion?
Hi all,

This is not a pre-RFC, but I think most of us have been bitten at times by bugs like this:

    my @values = (in => "data.csv");
    $values[2]++;

Except that $values[2] was the string "n/a" and now it's "1".

The core of the idea is simple. It would be lovely to have something to prevent implicit coercion in a given lexical scope:

    use explicit;
    my $num = 3;   # integer
    $num += .42;    # fatal because it creates a float

Instead, we would have to do this:

    use explicit;
    my $num = 4;
    $num = float($num);
    # or `$num = 4.0;`
    $num += .42; # works just fine

This would also be fatal:

    use explicit;
    my $value = {};
    say ++$value;

Is this possible? I realize there are many edge cases.

Best,
Ovid
-- 
IT consulting, training, specializing in Perl, databases, and agile development
http://www.allaroundtheworld.fr/. 

Buy my book! - http://bit.ly/beginning_perl
Re: No implicit coercion? [ In reply to ]
> On Dec 27, 2021, at 13:35, Ovid via perl5-porters <perl5-porters@perl.org> wrote:
>
> Hi all,
>
> This is not a pre-RFC, but I think most of us have been bitten at times by bugs like this:
>
> my @values = (in => "data.csv");
> $values[2]++;
>
> Except that $values[2] was the string "n/a" and now it's "1".
>
> The core of the idea is simple. it would be lovely to have something to prevent implicit coercion in a given lexical scope:
>
> use explicit;
> my $num = 3; # integer
> $num += .42; # fatal because it creates a float
>
> Instead, we have to do this:
>
> use explicit;
> my $num = 4;
> $num = float($num);
> # or `$num = 4.0;`
> $num += .42; # works just fine
>
> This would also be fatal:
>
> use explicit;
> my $value = {};
> say ++$value;
>
> Is this possible? I realize there are many edge cases.

+1

IMO a “stricter” mode that refuses to do “nonsensical” things like numeric operations on non-numeric SVs, scalar ops on references, etc. would be one of the most helpful features Perl could add. (Magic notwithstanding, of course.)

We all know Perl “happily confuses” numbers and numeric strings, but other cases seem like places where the language could (fairly?) readily assist with bug-hunting.

Is there a list anywhere of such behaviours?

-F
Re: No implicit coercion? [ In reply to ]
On Monday, 27 December 2021, 19:43:02 CET, Felipe Gasper <felipe@felipegasper.com> wrote:

> IMO a “stricter” mode that refuses to do “nonsensical” things like numeric operations on
> non-numeric SVs, scalar ops on references, etc. would be one of the most helpful features
> Perl could add. (Magic notwithstanding, of course.)
>
> We all know Perl “happily confuses” numbers and numeric strings, but other cases seem like
> places where the language could (fairly?) readily assist with bug-hunting.
>
> Is there a list anywhere of such behaviours?

Another example is JSON:

    {"value":"0"}

If someone sends something as a string and I'm expecting a number, I can only guess if "0" is supposed to be the string "0", the number zero, or "false".

Most of the time the Perl code does the right thing, until it doesn't. And that can cause weird errors far away from where the parsing occurs.

(Yes, I'm aware of the new booleans for Perl: https://www.nntp.perl.org/group/perl.perl5.porters/2021/11/msg261993.html)

Best,
Ovid
-- 
IT consulting, training, specializing in Perl, databases, and agile development
http://www.allaroundtheworld.fr/. 

Buy my book! - http://bit.ly/beginning_perl
Re: No implicit coercion? [ In reply to ]
> On Dec 27, 2021, at 14:00, Ovid <curtis_ovid_poe@yahoo.com> wrote:
>
> On Monday, 27 December 2021, 19:43:02 CET, Felipe Gasper <felipe@felipegasper.com> wrote:
>
>> IMO a “stricter” mode that refuses to do “nonsensical” things like numeric operations on
>> non-numeric SVs, scalar ops on references, etc. would be one of the most helpful features
>> Perl could add. (Magic notwithstanding, of course.)
>>
>> We all know Perl “happily confuses” numbers and numeric strings, but other cases seem like
>> places where the language could (fairly?) readily assist with bug-hunting.
>>
>> Is there a list anywhere of such behaviours?
>
> Another example is JSON:
>
> {value:"0"}
>
> If someone sends something as a string and I'm expecting a number, I can only guess if "0" is supposed to be the string "0", the number zero, or "false".
>
> Most of the time the Perl code does the right thing, until it doesn't. And that can cause weird error far away from where the parsing occurs.
>
> (Yes, I'm aware of the new booleans for Perl: https://www.nntp.perl.org/group/perl.perl5.porters/2021/11/msg261993.html)

For “numeric strings” like this I’m not sure it’s as clear-cut that a failure should happen.

The recent discussions concerning IOK, IOKp, et al. *may* make it more feasible to fail in these cases, but IMO it would make more sense to solve the less-murky cases first.

-F
Re: No implicit coercion? [ In reply to ]
On Mon, Dec 27, 2021 at 12:35 PM Ovid via perl5-porters <perl5-porters@perl.org> wrote:

>
> my @values = (in => "data.csv");
> $values[2]++;
>
> Except that $values[2] was the string "n/a" and now it's "1".
>

Huh? $values[2] was undefined.



> The core of the idea is simple. it would be lovely to have something to
> prevent implicit coercion in a given lexical scope:
>
> use explicit;
> my $num = 3; # integer
> $num += .42; # fatal because it creates a float
>

What am I missing?
    sub increment_integers_or_die {
        defined(wantarray()) and croak("increment_integers_or_die called in non-void context");
        for my $numb (@_){
            /\D/ and die "NOT AN INTEGER: >>>$numb<<<\n";
            $numb++;
        }
    }

--
"Lay off that whiskey, and let that cocaine be!" -- Johnny Cash
Re: No implicit coercion? [ In reply to ]
Not commenting on this specific issue, but some
time ago, there was a document collecting Perl
idiosyncrasies; maybe this should be added?

Maybe the recent thing related to "state" should
also be. I think there have been others recently
that fall into this category.

Cheers,
Brett

* Ovid via perl5-porters <perl5-porters@perl.org> [2021-12-27 18:35:05 +0000]:

> Hi all,
>
> This is not a pre-RFC, but I think most of us have been bitten at times by bugs like this:
>
>     my @values = (in => "data.csv");
>     $values[2]++;
>
> Except that $values[2] was the string "n/a" and now it's "1".
>
> The core of the idea is simple. it would be lovely to have something to prevent implicit coercion in a given lexical scope:
>
>     use explicit;
>     my $num = 3;   # integer
>     $num += .42;    # fatal because it creates a float
>
> Instead, we have to do this:
>
>     use explicit;
>     my $num = 4;
>     $num = float($num);
>     # or `$num = 4.0;`
>     $num += .42; # works just fine
>
> This would also be fatal:
>
>     use explicit;
>     my $value = {};
>     say ++$value;
>
> Is this possible? I realize there are many edge cases.
>
> Best,
> Ovid
> --
> IT consulting, training, specializing in Perl, databases, and agile development
> http://www.allaroundtheworld.fr/.
>
> Buy my book! - http://bit.ly/beginning_perl
>

--
oodler@cpan.org
oodler577@sdf-eu.org
SDF-EU Public Access UNIX System - http://sdfeu.org
irc.perl.org #openmp #pdl #native
Re: No implicit coercion? [ In reply to ]
On Monday, 27 December 2021, 20:28:22 CET, David Nicol <davidnicol@gmail.com> wrote:

> On Mon, Dec 27, 2021 at 12:35 PM Ovid via perl5-porters <perl5-porters@perl.org> wrote:
> >
> >     my @values = (in => "data.csv");
> >     $values[2]++;
> >
> > Except that $values[2] was the string "n/a" and now it's "1".
>
> Huh? $values[2] was undefined.

I was imagining, from the example, that csv() returned an array ref (should have been an AoA for the example) and that a third field in which you're expecting a number was actually the string "n/a".

Some protest that we need to make sure our data is pristine to avoid that, but I'm sure most of us can tell stories of dirty data that can't be cleaned at the source.
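
A minimal sketch of guarding against this kind of dirty field today, without any new pragma. The rows are made-up in-memory data standing in for whatever csv() would return, and `looks_like_number` from the core Scalar::Util module is only one possible test (it also accepts values such as "1e5" or "Inf"), so this is an illustration rather than a recommendation:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Made-up rows; the third field is expected to be numeric.
my @rows = (
    [ 'alice', 'x', '41'  ],
    [ 'bob',   'y', 'n/a' ],
);

my @rejected;
for my $row (@rows) {
    if (looks_like_number($row->[2])) {
        $row->[2]++;                    # safe: the field really is a number
    }
    else {
        push @rejected, $row->[2];      # keep the dirty value for reporting
    }
}

print "incremented: $rows[0][2]\n";
print "rejected: @rejected\n";
```

This is exactly the sort of extra checking code the proposal hopes to make unnecessary.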

> >  The core of the idea is simple. it would be lovely to have something to prevent implicit coercion in a given lexical scope:
> >
> >   use explicit;
> >   my $num = 3;   # integer
> >   $num += .42;    # fatal because it creates a float
> >
>
> What am I missing?
>
> sub increment_integers_or_die {
>   defined(wantarray()) and croak("increment_integers_or_die called in non-void context");
>   for my $numb (@_){
>         /\D/ and die "NOT AN INTEGER: >>>$numb<<<\n";
>         $numb++;
>   }
> }
 
I'm sorry, David, but that's a perfect example of what we don't want to do (or maybe I'm just missing what you're trying to say).

We can think of code as business code (I'd say functional, but the term's overloaded) and structural. Business code is the code that does what it says on the ticket we've gotten. Structural code is the stuff that ties all of the business code together (the beautiful house versus the foundation, girders, wiring, etc.). If we have "pre-fab" structural code that solves common business code problems, that's less of that code that we have to write. For example, when was the last time most of us had to write iterator classes in Perl? I haven't had to do it for years, except for lazy iterators (hmm ...) because I can iterate directly over arrays and hashes.

Sometimes it's nice to have a bit more strictness in the language, since Perl happily, and often silently, coerces our IVs, NVs, and PVs into each other (but I'm pretty sure it doesn't coerce your HVs, AVs, or GVs), even if we don't want that. So we have to write extra structural code to get our functional behavior. This means:

* More work for the developer to implement
* More things for the maintainer to learn
* More bugs as the newly implemented functional code is written incorrectly (as it is in your example)
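
For reference, one possible reading of what is wrong with the quoted subroutine: the bare `/\D/` matches against `$_` rather than `$numb`, `croak` is never imported, and negative numbers and empty strings are not handled. A hedged sketch of a corrected version follows; the stricter regex and the validate-everything-first order are assumptions, not something stated in the thread:

```perl
use strict;
use warnings;
use Carp qw(croak);

# Increment every argument in place, but only if all of them are integers.
sub increment_integers_or_die {
    croak "increment_integers_or_die called in non-void context"
        if defined wantarray;

    # Validate everything first so a failure leaves the caller's data untouched.
    for my $numb (@_) {
        my $shown = defined $numb ? $numb : 'undef';
        croak "NOT AN INTEGER: >>>$shown<<<"
            unless defined $numb && $numb =~ /\A-?[0-9]+\z/;
    }

    # @_ aliases the caller's variables, so this modifies them directly.
    $_++ for @_;
    return;
}

my @counts = (1, "2", -5);
increment_integers_or_die(@counts);
print "@counts\n";
```

Even in this small form the helper needs an import, a regex and an aliasing trick, which is the maintenance cost the list above describes.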

More and more static languages are benefiting from being able to use dynamic features (the JVM has had invokedynamic for years). More and more dynamic languages are benefiting from being able to use static features. Sometimes that's a huge win.

I've had times where the following has bitten me hard in JSON generation (pretty sure most of us have had things like this):

    $ perl -MDevel::Peek -E 'my $x = 3; say Dump($x)'
    SV = IV(0x1308244c0) at 0x1308244d0
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 3


    $ perl -MDevel::Peek -E 'my $x = 3; warn $x; say Dump($x)'
    3
    SV = PVIV(0x149824c20) at 0x1498244d0
      REFCNT = 1
      FLAGS = (IOK,POK,pIOK,pPOK)
      IV = 3
      PV = 0x600002510270 "3"\0
      CUR = 1
      LEN = 10

In the second example, we now have a PV value for the scalar and we had a string in the JSON instead of an integer. The cause was that the client was *logging* their data, and I had to dig around to figure out exactly where to put the "0+$var" as late as possible to fix this. Just for once, I would like to not have to worry about those issues any more.
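
A small sketch of the "0+$var" workaround, using the core B module to look at the same flags that Devel::Peek prints. The helper name `flags_of` is made up for illustration, and the flags reported for the original scalar may differ between Perl versions, so only the fresh copy is described with any confidence:

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list which public flags (IOK, NOK, POK) a scalar carries.
sub flags_of {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @set;
    push @set, 'IOK' if $flags & B::SVf_IOK();
    push @set, 'NOK' if $flags & B::SVf_NOK();
    push @set, 'POK' if $flags & B::SVf_POK();
    return join ',', @set;
}

my $x = 3;
my $log_line = "value: $x";   # logging stringifies $x and may cache a PV on it
my $for_json = 0 + $x;        # fresh numeric copy made just before encoding

print "original: ", flags_of($x), "\n";          # may or may not include POK
print "copy:     ", flags_of($for_json), "\n";   # integer flag only
```

The copy has to be made after the last stringification, which is why the fix ends up "as late as possible".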

Having at least IV, NV, PV, (and the new boolean type) not auto-coerce into one another for a given lexical scope (or data structure) would be heaven-sent.

Best,
Ovid
-- 
IT consulting, training, specializing in Perl, databases, and agile development
http://www.allaroundtheworld.fr/. 

Buy my book! - http://bit.ly/beginning_perl
Re: No implicit coercion? [ In reply to ]
On Mon, 27 Dec 2021 18:35:05 +0000 (UTC), Ovid via perl5-porters <perl5-porters@perl.org> wrote:

> Hi all,
>
> This is not a pre-RFC, but I think most of us have been bitten at times by bugs like this:
>
>     my @values = (in => "data.csv");

I guess you mean `my @values = csv (in => "data.csv");`
                               ^^^^

>     $values[2]++;
>
> Except that $values[2] was the string "n/a" and now it's "1".
>
> The core of the idea is simple. it would be lovely to have something to prevent implicit coercion in a given lexical scope:
>
>     use explicit;
>     my $num = 3;   # integer
>     $num += .42;    # fatal because it creates a float

I would oppose. *IF* I would want this, and yes, that happens, I want
strongly typed variables and not something more global that changes all
variable behavior.

my Int $num = 3;
$num += .42; # Fatal
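
Core Perl 5 does not enforce a declaration like this at run time. As a rough approximation only, a tied scalar can reject non-integer stores; the class name `IntOnly` below is hypothetical and not an existing module:

```perl
use strict;
use warnings;

package IntOnly;    # hypothetical class name, not an existing module
use Carp qw(croak);

sub TIESCALAR {
    my ($class, $initial) = @_;
    my $self = bless \(my $slot), $class;
    $self->STORE($initial);
    return $self;
}

sub FETCH { my ($self) = @_; return $$self }

sub STORE {
    my ($self, $value) = @_;
    croak "IntOnly: not an integer: " . (defined $value ? $value : 'undef')
        unless defined $value && $value =~ /\A-?[0-9]+\z/;
    $$self = $value;
    return;
}

package main;

tie my $num, 'IntOnly', 3;
$num += 1;                            # fine, still an integer
my $ok = eval { $num += .42; 1 };     # STORE croaks on 4.42
print "num is still $num\n" unless $ok;
```

A tie checks only what is stored in that one variable, which matches the per-variable granularity asked for here, at the cost of a method call on every access.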

> Instead, we have to do this:
>
>     use explicit;
>     my $num = 4;
>     $num = float($num);
>     # or `$num = 4.0;`
>     $num += .42; # works just fine
>
> This would also be fatal:
>
>     use explicit;
>     my $value = {};
>     say ++$value;
>
> Is this possible? I realize there are many edge cases.

Even *if* possible, I would rather see that energy put into typed
variables instead. And I would use them. Typed variables also open
up to multi-subs as in Raku.

> Ovid

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.33 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
Re: No implicit coercion? [ In reply to ]
On Mon, Dec 27, 2021 at 06:35:05PM +0000, Ovid via perl5-porters wrote:

Re-ordering the sections...

> The core of the idea is simple. it would be lovely to have something to prevent implicit coercion in a given lexical scope:
>
>     use explicit;
>     my $num = 3;   # integer
>     $num += .42;    # fatal because it creates a float
>
> Instead, we have to do this:
>
>     use explicit;
>     my $num = 4;
>     $num = float($num);
>     # or `$num = 4.0;`
>     $num += .42; # works just fine

Making explicit floating point <=> integer conversion workable is fraught
(with complexity and confusion). If Perl *has* a design architecture at all,
part of it is that scalars are just scalars, and you use operators to
declare whether they are to be treated as strings, numbers
(or something else). There isn't *any* existing distinction between whether
your numbers are integers, floating points, (or bigints or anything else
that can be implemented via overloading)

I'm going to write 4.0 to mean a floating point four, and 4 to mean an integer
four here

What's the result of this:

$result = int 4.2;

Is it 4.0 or 4?

I could make a case for either answer. Type conversion, or truncation towards
0?

Similarly abs. sqrt probably isn't ambiguous (you want integer square roots,
you convert to float and truncate back), but there might be others I didn't
think of.


More generally, under `use explicit` the return type of addition:

$c = $a + $b;

         $a    FLOAT    INT
   $b
 FLOAT         float    croak
 INT           croak    int*

* but croak on overflow

1) The above table means that we have an op that is still polymorphic.
2) It can return int or float (or croak), depending on inputs.
3) It doesn't handle overloaded objects
4) There's no clean way to say "I'd like floating point maths always", other
than

$c = float($a) + float($b);

as you need to cast first to avoid that corner case of integer overflow

And under `no explicit`, `$a + $b` does whatever it currently does?

(which isn't that table above, even with the croaks removed)
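
To make the table concrete, here is a rough sketch of it as an ordinary function. Everything in it is an assumption for illustration: `kind` and `explicit_add` are made-up names, and classifying values by their internal flags (via the core B module) is fragile, since those flags can change as a scalar is used:

```perl
use strict;
use warnings;
use B ();
use Carp qw(croak);

# Hypothetical classifier based on internal flags; strings and references are 'other'.
sub kind {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 'other' if $flags & (B::SVf_POK() | B::SVf_ROK());
    return 'int'   if $flags & B::SVf_IOK();
    return 'float' if $flags & B::SVf_NOK();
    return 'other';
}

# Hypothetical stand-in for "+" under the proposed pragma.
sub explicit_add {
    my ($ka, $kb) = (kind($_[0]), kind($_[1]));
    croak "cannot add $ka and $kb" unless $ka eq $kb && $ka ne 'other';
    my $sum = $_[0] + $_[1];
    croak "integer overflow" if $ka eq 'int' && kind($sum) ne 'int';
    return $sum;
}

print explicit_add(3, 4), "\n";          # int + int
print explicit_add(1.5, 2.25), "\n";     # float + float
print "croaked: $@" unless eval { explicit_add(3, .42); 1 };
print "croaked: $@" unless eval { explicit_add(~0, 1); 1 };
```

The sketch ignores overloaded objects entirely, which is point 3 in the list above.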


The "overloading" mentioned above feels like a can of worms. I'm really not
sure if the current overloading interface maps to what is needed. Or what
would work. Conversion is declared as:

conversion => 'bool "" 0+ qr',


There's no integer/floating point distinction. And I suspect that trying to
constrain it to those two hinders bigint/bigfloat/bigrat interaction, and I
don't think that just adding a third numeric type is going to be sufficient.



If one takes out the floating point/integer conversion control:


>     my @values = (in => "data.csv");
>     $values[2]++;
>
> Except that $values[2] was the string "n/a" and now it's "1".

Fatalising string to numeric seems easy both conceptually and implementation
wise. Arguably it's just a matter of spelling - how to turn it on, and what
the error message should say. I think the mechanisms have been in place since
5.6.0:

$ perl -we 'eval { use warnings FATAL => "numeric"; $values[2] = "n/a"; $values[2]++; }; print "We got: $@"'
We got: Argument "n/a" treated as 0 in increment (++) at -e line 1.
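
The same mechanism can be confined to one lexical scope. A minimal sketch, with `strict_increment` as a made-up helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: numeric warnings are fatal only inside this sub's lexical scope.
sub strict_increment {
    my ($value) = @_;
    use warnings FATAL => 'numeric';
    return $value + 1;
}

print strict_increment("41"), "\n";
my $lived = eval { strict_increment("n/a"); 1 };
print "died: $@" unless $lived;
```

Code outside the sub keeps the usual non-fatal warning, which is the lexical scoping asked for in the original post.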


However, there's no way to fatalise conversion in the other direction.
It seems a bit strange to name the pragma as "explicit conversion" when it's
more about explicit lossy conversion.

> This would also be fatal:
>
>     use explicit;
>     my $value = {};
>     say ++$value;

Being able to fatalise this (and independently stringification of references)
has been talked about for a while.
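
For context, a short sketch of what the quoted snippet does today: the reference is numified to its address and incremented, with no warning. `refaddr` comes from the core Scalar::Util module; the variable names are illustrative:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $value = {};
my $addr  = refaddr($value);   # the address Perl uses when a reference is numified

++$value;                      # no warning: the hashref is silently replaced by a number

print ref($value) eq '' ? "no longer a reference\n" : "still a reference\n";
print $value == $addr + 1 ? "value is address + 1\n" : "unexpected value\n";
```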

I'm not sure if anyone had a plan that considered references to objects with
overloading

As ever, there seems to be an arms race between folks who create such objects

"it's semantically a scalar. Treat it as such. I put effort into making
it walk and quack like a kosher duck"

and folks who receive such objects

"your Trojan duck is actually made of wood. It's inedible"


Other missed corner cases are whether it matters that this would need to be
turned off in order to compare two references (for equality, and for ordering
such as sorting), and generating hash keys from references.

And how to avoid the actual error message getting lost if a reference is
mistakenly stringified when generating an error message. (Likely to be missed
during testing, and only bite in production.)

> Is this possible? I realize there are many edge cases.

I think that attempting to constrain numbers to just "integers" and
"floating point" has too many edge cases to be realistically implementable.
It's also inflexible, as there are a lot more classes of numbers than
"native sized integers" and "native sized binary floating point".


Faulting lossy string to number conversion, and faulting references used as
strings or numbers seems viable, but I think that they belong as 2 (or
even 3) different names.

Nicholas Clark
Re: No implicit coercion? [ In reply to ]
As type coercion is the feature -- it's on in plain Perl, but it's still a
feature -- the "PMAW" for anyone who remembers those e-mails twenty years
ago -- how about invoking this mode with C<no coercion> instead of C<use
explicit>?

Because "turning on a mode where the thing doesn't happen" requires more
mental gymnastics than "turning off something that by default is on".
Re: No implicit coercion? [ In reply to ]
On Tue, Dec 28, 2021 at 10:07:38AM -0600, David Nicol wrote:
> as type coercion is the feature -- it's on a plain Perl, but it's still a
> feature -- the "PMAW" for anyone who remembers those e-mails twenty years
> ago -- how about invoking this mode with C<no coercion> instead of C<use
> explicit> ?

I already wrote an idea of `no stringification` a while ago:

https://metacpan.org/release/PEVANS/stringification-0.01_004

Having (a? another?) pragma to disallow reference->number conversions,
and string<->number conversions might also be useful.

I don't think the above implementation is good though - to work nicely it
really must be a core feature. This is one of those ideas that can be
experimented with in a syntax way as a CPAN module, but actual
implementation really does require core.

--

Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: No implicit coercion? [ In reply to ]
I just saw this proposal now, holidays.

Ovid, now THIS proposal in principle is something I can strongly get behind,
with just the details to discuss.

THIS proposal I feel more properly addresses the real problem that the
so-called-3-valued-logic proposal seemed to be aimed at a symptom of and that I
was never really comfortable with.

I also disagree with H.Merijn Brand's proposal that such functionality is tied
to declared typed variables, because that approach is too limited.

A proper general solution addresses arbitrarily complex expressions at every
stage of the expression tree, whose inputs and outputs are typically anonymous
and not named variables.

Ovid's proposal and explicitly typed variables are complementary features, one
is not a substitute for the other.

Also, applied consistently, Ovid's no implicit conversions proposal would NOT
block re-assignment to a variable of values of different types, because no
coercion is happening here. Again, complementary ideas. So eg "+=" would be
blocked but "=" would not.

I also strongly support David Nicol's proposal to call the pragma "no coercion"
rather than "use explicit" because the former is very clear about what is going on,
disabling an active behavior, while the latter is way too ambiguous ("explicit
what?") and doesn't say it's about type coercion.

I also believe that Ovid's proposal should also handle the "undef" situation,
meaning that "no coercion" ALSO causes attempts to use undef as if it were a
defined value to fail. You can test if an undef value is undef and you can
assign undef to things, but you can't add to it or catenate it etc.
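
Part of this is already reachable with the existing warnings machinery. A minimal sketch, assuming a made-up helper `total`, in which using undef as a value is fatal while testing for it and assigning it remain legal:

```perl
use strict;
use warnings;

# Hypothetical helper: inside this sub, using undef as a value is fatal.
sub total {
    use warnings FATAL => 'uninitialized';
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

print total(1, 2, 3), "\n";
my $lived = eval { total(1, undef, 3); 1 };
print "died: $@" unless $lived;

# Testing for and assigning undef stay legal.
my $maybe = undef;
print "undefined\n" unless defined $maybe;
```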

Note that in a stricter interpretation of Ovid's proposal that I prefer, using
anything in a boolean context ALSO fails unless it is explicitly a boolean using
that new real boolean feature.

And so, I see these things as all being distinct and mutually exclusive and that
implicit coercion between them should be an error under the pragma:

- undef
- boolean
- integer
- float
- character string
- octet string

And that's before we get to any kind of reference type.

Note that I consider regexes to be specific to character strings and feeding
them anything else should fail.

To go along with the pragma, there should be a simple consistent built-in
function/operator for asking, using the strict definition consistent with
otherwise using a debugging-type module, what kind of thing some $foo is.

And I mean for example builtin::is_int() only returns true for a PVIV or
whatever it's called, and NOT for a string that matches "\d+".

As for the set of explicit coercion functions, it should not simply be one
overloaded one per target type named after the target type unless there is a
clearly defined and reasonable single best way to convert.

So for example, "undef" can reasonably cast to a single value of every other
type, boolean can reasonably cast to integer and float in one way, and integer
to float in one way.

But beyond that there shouldn't really be single generic explicit casts; instead
there should be one or more cast functions/operators that are explicit in how they
are casting.

For example, all float to integer casts should be named after what rounding
method they are using when the float isn't an exact integer. So no "to_int" but
yes to the likes of "int_round_down/floor", "int_round_up/ceiling",
"int_round_to_zero/truncate" etc.

Also number to string casts should be specifying the numeric base such as
"to_text_base_10" or "to_text_base_16" or alternately just have people use a
sprintf type function instead for that.
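
As a sketch of what such explicitly named casts might look like in plain Perl today: the function names are the hypothetical ones suggested above, built on `floor` and `ceil` from the core POSIX module plus the built-in `int` and `sprintf`:

```perl
use strict;
use warnings;
use POSIX qw(floor ceil);

# Hypothetical cast names, taken from the suggestions above.
sub int_round_down    { return floor($_[0]) }
sub int_round_up      { return ceil($_[0]) }
sub int_round_to_zero { return int($_[0]) }
sub to_text_base_10   { return sprintf '%d', $_[0] }
sub to_text_base_16   { return sprintf '%x', $_[0] }   # intended for non-negative integers

printf "%d %d %d\n", int_round_down(-4.2), int_round_up(-4.2), int_round_to_zero(-4.2);
print to_text_base_10(255), " ", to_text_base_16(255), "\n";
```

The negative input shows why the name matters: rounding down and rounding towards zero give different answers for -4.2.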

So while forbidding implicit conversions is the main goal, we want any explicit
conversion routines to also be explicit on HOW they are converting, such as the
examples I gave, or a lot of the benefits of the pragma are lost.

I have thought about all this a lot further while doing some of my own language
design projects, so can go down the rabbit hole further if I'm asked.

-- Darren Duncan

On 2021-12-27 10:35 a.m., Ovid via perl5-porters wrote:
> Hi all,
>
> This is not a pre-RFC, but I think most of us have been bitten at times by bugs like this:
>
>     my @values = (in => "data.csv");
>     $values[2]++;
>
> Except that $values[2] was the string "n/a" and now it's "1".
>
> The core of the idea is simple. it would be lovely to have something to prevent implicit coercion in a given lexical scope:
>
>     use explicit;
>     my $num = 3;   # integer
>     $num += .42;    # fatal because it creates a float
>
> Instead, we have to do this:
>
>     use explicit;
>     my $num = 4;
>     $num = float($num);
>     # or `$num = 4.0;`
>     $num += .42; # works just fine
>
> This would also be fatal:
>
>     use explicit;
>     my $value = {};
>     say ++$value;