Mailing List Archive

Pre-RFC: Phaser Expressions
TL;DR: Add `BEGIN expr...` syntax to hoist the evaluation time of an
expression into BEGIN time, without creating a new scope.

Regular `my` variable assignment happens at runtime, even though the
variable itself is visible much earlier. Usually we don't notice this
problem, but occasionally awkward things happen.

For example, in a unit test or self-contained script file, it's common
to put little helper classes at the end of the file so they don't
clutter up the main logic. This means that assignment statements in
that package happen "too late" for the program to work properly. E.g.
consider

    #!/usr/bin/perl
    use v5.36;

    Greeter::say_hello();

    package Greeter {
        my $message = "Hello, world";
        sub say_hello { say $message }
    }

Here, while the variable is known to exist and thus the code compiles
OK, the assignment hasn't actually happened yet and so `undef` is
printed, with a warning.
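
A minimal sketch of the same effect that runs on current Perl. The accessor name `message` is made up for illustration, so that the value can be inspected instead of printed directly:

```perl
use strict;
use warnings;

# This call runs before the package block below has been executed,
# so the assignment to $message has not happened yet.
my $early = Greeter::message();

package Greeter {
    my $message = "Hello, world";    # runtime assignment
    sub message { return $message }  # hypothetical accessor
}

# By now the package block has run and the assignment has happened.
my $late = Greeter::message();

print "early: ", ( defined $early ? $early : "undef" ), "\n";
print "late:  $late\n";
```

The first call sees an undefined value, the second one sees the string.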

The first reaction to this might be to wrap the line in a BEGIN block,
but then the variable is hidden by the braces of that BEGIN block.
Often this is solved in practice by splitting the declaration and
assignment in two:

    package Greeter {
        my $message; BEGIN { $message = "Hello, world"; }
        ...

This is a bit messy, and also a DRY failure - we've named the variable
twice.
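
For reference, the split form does work on current Perl. A minimal sketch, again with a made-up accessor `message` standing in for `say_hello`:

```perl
use strict;
use warnings;

# Runs before the package block below is reached at runtime.
my $early = Greeter::message();

package Greeter {
    my $message;                           # declaration, visible to the sub
    BEGIN { $message = "Hello, world" }    # assignment done at compile time
    sub message { return $message }        # hypothetical accessor
}

print "early call saw: $early\n";
```

Here even the early call sees the string, at the cost of naming the variable twice.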

It would be great if `BEGIN` (and by extension the admittedly
less-useful phasers of INIT, CHECK and UNITCHECK) could also be
prefixes for expressions, allowing one to write:

    package Greeter {
        BEGIN my $message = "Hello, world";
        sub say_hello { say $message }
    }

There are no extra braces here, so the variable isn't hidden by the
block but is visible to subsequent code. But being prefixed by BEGIN we
can see it is evaluated at BEGIN time and its side-effects (namely, the
assignment of a value into the variable) have already happened.


Phaser expressions in non-void context would also be evaluated once, at
the appropriate time, and replaced by a compile-time constant containing
the result. This might also be handy for many situations where
currently folks `use constant ...` to get a constant that's only ever
used once.

    use Digest::MD5 'md5_hex';

    foreach ( 1 .. 100 ) {
        say "The MD5 hash is ", BEGIN(md5_hex("my content here"));
    }

This would act like an "anonymous scalar", being roughly equivalent to:

    use Digest::MD5 'md5_hex';

    my $md5; BEGIN { $md5 = md5_hex("my content here"); }

    foreach ( 1 .. 100 ) {
        say "The MD5 hash is ", $md5;
    }

except without polluting the lexical namespace with an additional name.


Such syntax would also handily solve the problem of `my` being used for
class data, as discussed by:

https://github.com/Ovid/Cor/issues/44#issuecomment-968818780

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Pre-RFC: Phaser Expressions [ In reply to ]
> On Monday, 15 November 2021, 12:47:51 CET, Paul "LeoNerd" Evans  <leonerd@leonerd.org.uk> wrote:
>
> TL;DR: Add `BEGIN expr...` syntax to hoist the evaluation time of an
>  expression into BEGIN time, without creating a new scope.

Yes, please! I get tired of doing the following:

    use Getopt::Long;
    my %opt_for;
    BEGIN {
        GetOptions( \%opt_for, ... );
        ...
    }

I would love to simply do:

    use Getopt::Long;
    BEGIN GetOptions( \my %opt_for, ... );

In fact, grepping through a few codebases shows a number of places where we use BEGIN blocks with variables declared before them.

Best,
Ovid
-- 
IT consulting, training, specializing in Perl, databases, and agile development
http://www.allaroundtheworld.fr/. 

Buy my book! - http://bit.ly/beginning_perl
Re: Pre-RFC: Phaser Expressions [ In reply to ]
I don't oppose this Pre-RFC.

However, I am worried that the number of RFCs will keep growing before
more important things are completed.

For example: isa, try/catch/finally, match, signatures.


Re: Pre-RFC: Phaser Expressions [ In reply to ]
On Mon, 15 Nov 2021 at 19:47, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

> TL;DR: Add `BEGIN expr...` syntax to hoist the evaluation time of an
> expression into BEGIN time, without creating a new scope.
>

From the example use-cases, it seems that it's not so much the _scope_
that's an issue - what we're really looking for is a value-returning BEGIN
block?

e.g. promote BEGIN to behave like `do`, but execute the contents once the
closing brace is encountered, and provide a list of constant values:

my ($list, $of, @constants) = BEGIN { ... };

(yes, it's not as simple as that - it'd break existing usage since we'd
have to start putting semicolons after BEGIN { ... } - but details like
this would be something for an RFC stage)

The bareword expression prefix isn't too convincing, seems like it would
make it harder to review code and see exactly where the boundaries of the
BEGIN-hoisted pieces are. See `return ... or next` for example.
Re: Pre-RFC: Phaser Expressions [ In reply to ]
On Tue, Nov 16, 2021 at 2:10 AM Tom Molesworth via perl5-porters
<perl5-porters@perl.org> wrote:
>
> On Mon, 15 Nov 2021 at 19:47, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:
>>
>> TL;DR: Add `BEGIN expr...` syntax to hoist the evaluation time of an
>> expression into BEGIN time, without creating a new scope.
>
>
> From the example use-cases, it seems that it's not so much the _scope_ that's an issue - what we're really looking for is a value-returning BEGIN block?
>
> e.g. promote BEGIN to behave like `do`, but execute the contents once the closing brace is encountered, and provide a list of constant values:
>
> my ($list, $of, @constants) = BEGIN { ... };

Something like this doesn't solve half of what this proposal is meant
to. While the return from the BEGIN would be calculated at compile
time, the variables would still not be populated until runtime. So
later attempts to use the variables at compile time would find the
variables uninitialized. This is what the expression prefix form is
meant to solve. Without the braces, the variables can be populated at
compile time without changing their scope.
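
To make the timing concrete, here is a rough sketch in today's Perl. The `my $computed; BEGIN { ... }` pair is only a stand-in for the hypothetical value-returning BEGIN block, and all variable names are invented for the example:

```perl
use strict;
use warnings;

# Stand-in for a value-returning BEGIN block: the value itself is
# computed at compile time ...
my $computed;
BEGIN { $computed = 6 * 7 }

# ... but this assignment is still an ordinary runtime statement.
my $answer = $computed;

# A later compile-time use of $answer runs before that assignment.
my $seen_at_compile_time;
BEGIN { $seen_at_compile_time = defined $answer ? "defined" : "undef" }

print "compile time saw: $seen_at_compile_time\n";
print "runtime sees:     $answer\n";
```

The later BEGIN block finds `$answer` undefined, even though the value it will eventually hold was already known at compile time.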

>
> (yes, it's not as simple as that - it'd break existing usage since we'd have to start putting semicolons after BEGIN { ... } - but details like this would be something for an RFC stage)
>
> The bareword expression prefix isn't too convincing, seems like it would make it harder to review code and see exactly where the boundaries of the BEGIN-hoisted pieces are. See `return ... or next` for example.
Re: Pre-RFC: Phaser Expressions [ In reply to ]
>
> > my ($list, $of, @constants) = BEGIN { ... };
>
> Something like this doesn't solve half of what this proposal is meant
> to. While the return from the BEGIN would be calculated at compile
> time, the variables would still not be populated until runtime. So
> later attempts to use the variables at compile time would find the
> variables uninitialized. This is what the expression prefix form is
> meant to solve. Without the braces, the variables can be populated at
> compile time without changing their scope.
>
>
Such syntax follows KIM (see Ovid's latest article).
Moreover, such syntax could also be used with other keywords: not only my,
but also our, state and local.
Re: Pre-RFC: Phaser Expressions [ In reply to ]
From the keyboard of Branislav Zahradník [16.11.21,11:57]:

>
>
> >
> > my ($list, $of, @constants) = BEGIN { ... };
>
> Something like this doesn't solve half of what this proposal is meant
> to. While the return from the BEGIN would be calculated at compile
> time, the variables would still not be populated until runtime. So
> later attempts to use the variables at compile time would find the
> variables uninitialized. This is what the expression prefix form is
> meant to solve. Without the braces, the variables can be populated at
> compile time without changing their scope.
>
>
> such syntax follows KIM (referencing latest Ovid's article)
> More, such syntax may be usable also for another keywords, not only my, but also our,
> state, local

This syntax opens a can of worms and sacrifices well established perl
syntax principles for a small gain of syntactic sugar.

The RHS of an assignment is evaluated prior to assignment to the LHS.
In the compile phase, statements are examined, variables and subcalls
are allocated and woven into a syntax tree - this is my (poor) notion
of the compile time phase, where runtime may be forced via BEGIN, use.
The idea of making BEGIN blocks into part of an expression blurs the
lines between compile-time and runtime semantics for no good reason.

Quick, what happens here?

use strict;
my $oo;
$foo = do { BEGIN { 1 } };
print $foo,$/;

BEGIN, CHECK, INIT, END blocks etc. are not subroutines for a very good
reason: they are not arbitrarily callable and do nothing but establish a
special kind of flow control for code wrt execution time, nothing more.

If we follow that path of allowing returns from BEGIN et al, we should -
if only for the sake of orthogonality - also allow return values for bare
blocks and other constructs of flow control, e.g. foreach loops:

my ($foo, $ref, @bar) = foreach($thingy->sub($baz)) {
...
# what gets assigned? the last values calculated?
}

We have map() for that and other constructs.
Please let us keep perl as simple as possible.

0--gg-

--
_($_=" "x(1<<5)."?\n".q?/)Oo. G?\ /
/\_?/(q /
---------------------------- \__(m.====?.(_("always off the crowd"))."?
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Pre-RFC: Phaser Expressions [ In reply to ]
Missing f in

use strict;
my $foo; # <---- here
$foo = do { BEGIN { 1 } };
print $foo,$/;

Sorry for that. - I probably should have wrapped that BEGIN into some
subroutine for more clarity.

0--gg-

From the keyboard of shmem [17.11.21,18:12]:
[...]
> use strict;
> my $oo;
> $foo = do { BEGIN { 1 } };
> print $foo,$/;

Re: Pre-RFC: Phaser Expressions [ In reply to ]
> Quick, what happens here?
>
> use strict;
> my $oo;
> $foo = do { BEGIN { 1 } };
> print $foo,$/;
>
> BEGIN, CHECK, INIT, END blocks etc. are not subroutines for a very good
> reason: they are not arbitrarily callable and don't but establish some
> special kind of flow control for code wrt execution time, nothing more.
>
>
You got it a little bit wrong. I'm not talking about treating phasers as
value expressions.
I'm talking about a new grammar rule; in short:

statement: variable-list '=' BEGIN block;

vs

statement: BEGIN variable-list = expression

In both cases you (or whoever implements this) will have to do some kind of
AST transformation
(I know, perly doesn't have an AST layer) so that expressions with phasers
other than BEGIN will make sense:

INIT my $initialized = 1;
vs.
my $initialized = INIT { 1 };

The first variant would be misleading (especially for newbies - yes, I still
believe that there is a willingness to attract some).
Re: Pre-RFC: Phaser Expressions [ In reply to ]
I don't like it.

That said, if we must do this I'd like to see it postfix, like the other
non-bracketing modifiers that happen at the ends of expressions, which are if,
while, unless, for, and possibly something else I missed. We would
introduce a new postfix expression keyword, "at" or "during" or "atphase"
or something like that, of the same class as postfix if and friends, and it
would only be syntactically correct when followed by one of the "phasers."

It would look like this:

my $initialized = 1 at INIT;

That's my two sunken rai stones.

On Wed, Nov 17, 2021 at 1:59 PM Branislav Zahradník <happy.barney@gmail.com>
wrote:

> You got it little bit wrong. I'm not talking about treating phasers as a
> value expressions
> I'm talking about new grammar rule, shortly speaking:
>
> statement: variable-list '=' BEGIN block;
>
> vs
>
> statement: BEGIN variable-list = expression
>
> in both cases you (or someone who will implement this) has to some kind of
> AST transformation
> (I know, perly doesn't have AST layer) so expressions with different then
> BEGIN will make sense:
>
> INIT my $initialized = 1;
> vs.
> my $initialized = INIT { 1 };
>
> first variant will be misleading (especially for newbies - yes, I still
> believe that there is a willingness to attract some)
>
>

--
"Lay off that whiskey, and let that cocaine be!" -- Johnny Cash
Re: Pre-RFC: Phaser Expressions [ In reply to ]
On Thu, 18 Nov 2021 at 05:32, David Nicol <davidnicol@gmail.com> wrote:

> I don't like it.
>
> That said, if we must do this I'd like to see it postfix, like the other
> non-bracketing modifiers that happen and ends of expressions, which are if,
> while, unless, for, and possibly something else I missed. We would
> introduce a new postfix expression keyword, "at" or "during" or "atphase"
> or something like that, of the same class as postfix if and friends, and it
> would only be syntactically correct when followed by one of the "phasers."
>
> it would look like this:
>
> my $initialized = 1 at INIT;
>
> That's my two sunken rai stones.
>
>
Hm, we can read such an expression as "default value of the variable,
available since phase X". Using my previous proposal:

my $foo := :default => BEGIN { 1 };
my $bar := :default => INIT { 1 };

Advantage: new grammar, new behaviour, no conflict with existing terms,
chains, and so on (the new grammar is unambiguous as far as I'm aware)
Disadvantage: new grammar

Details of phaser default value:
- evaluated in phase
- assigned to variable on first read
- destroyed after evaluation if already initialized
Re: Pre-RFC: Phaser Expressions [ In reply to ]
Eww.

On Thu, Nov 18, 2021 at 2:16 AM Branislav Zahradník <happy.barney@gmail.com>
wrote:

>
>> my $initialized = 1 at INIT;
>>
> Hm, we can read such expression as "default value of variable available
> since phase X". Using my previous proposal:
>
> my $foo := :default => BEGIN { 1 };
> my $bar := :default => INIT { 1 };
>

I don't understand why you're limiting the proposed new grammar to variable
assignment.

The keywords "my" and "our" are syntactic sugar for two separate
operations.

1. declare a lexical variable, that might be an alias for a package
variable
2. when used as an L-value, initialize said variable when execution
reaches the code

For ease, the two can be combined.

If you don't want them combined, don't combine them.

my $foo; BEGIN {$foo = 'set at compile time'}

What's the problem?

My counter-proposal

1. solves the only problem I see, which is that you don't like the look
of curly braces and want to reduce their quantity: "my $foo = 27 at BEGIN;"
has no curlies and is pretty concise and doesn't introduce any new syntax,
just a new keyword "at" of a class of keywords we already have.
2. opens the door for more flexible postfix flow control modifiers,
like { $bar = 3 if $foo if $baz } instead of { $baz and $foo and $bar = 3
}, which would mean the same thing exactly. Unless I missed the change,
postfix flow control modifiers currently don't stack, possibly due to the
confusion over which level would own the topic, to which I would say "the
innermost, obviously."
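
For comparison, a small sketch of what can be written in current Perl to get the effect of the stacked form. The sub names are made up for the example, and the stacked `if ... if` spelling itself is not shown because it does not parse today:

```perl
use strict;
use warnings;

# Chained low-precedence `and`: the assignment only happens when both
# $baz and $foo are true.
sub set_bar_chained {
    my ( $foo, $baz ) = @_;
    my $bar;
    $baz and $foo and $bar = 3;
    return $bar;
}

# The same logic with one postfix modifier nested inside a block `if`.
sub set_bar_nested {
    my ( $foo, $baz ) = @_;
    my $bar;
    if ($baz) { $bar = 3 if $foo }
    return $bar;
}

for my $args ( [ 1, 1 ], [ 0, 1 ], [ 1, 0 ] ) {
    my $got = set_bar_chained(@$args);
    print "foo=$args->[0] baz=$args->[1] => ",
        ( defined $got ? $got : "undef" ), "\n";
}
```

Both subs return 3 only when both conditions hold, and undef otherwise.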





--
"Lay off that whiskey, and let that cocaine be!" -- Johnny Cash