Hey David,
TL;DR:
* Standard Perl provides guidelines for writing clear and unambiguous
Perl code. It aims to improve readability and parsability for both
humans and machines. It focuses on writing Perl in a way that reduces
syntax complexity without changing the language's essence. It's not a
new Perl, just a suggestion on how to write Perl more clearly so we
can build strong tools for it.
* Guacamole, utilizing the Marpa parser, is one such tool designed to
parse Perl code that follows the "standard" guidelines into a
structured format. This enables advanced code analysis, refactoring,
and generation by understanding the code's structure and intent. You
still run it with perl as you did before - that part doesn't change.
It also doesn't remove Perl features. Those are all still supported
since you're still running it with the perl interpreter.
* RPerl differs significantly as it is a parser that translates Perl
code into C++. It requires adherence to a specific subset of the Perl
language, offering speed at the expense of the broader Perl syntax's
flexibility and compatibility.
Below is a much more detailed answer, trying to break down all the
elements. Sorry for how tedious it is. If the TL;DR is enough, please
ignore the rest. :)
On Mon, Feb 19, 2024 at 1:01 AM David Christensen
<dpchrist@holgerdanske.com> wrote:
>
> On 2/17/24 09:36, Ovid wrote:
> > ... https://metacpan.org/pod/standard ...
>
>
> Thank you for mentioning the Perl module "standard".
>
>
> Going a short way down the rabbit hole, I see:
>
> https://metacpan.org/pod/standard
>
> https://metacpan.org/pod/Guacamole
>
> https://blogs.perl.org/users/jeffrey_kegler/2011/11/what-is-the-marpa-algorithm.html
>
> https://metacpan.org/dist/Marpa-R2/view/pod/Marpa_R2.pod
>
> https://metacpan.org/dist/Marpa-R2/view/pod/Scanless/DSL.pod
>
>
> So, a domain specific language (DSL) for describing programming
> languages according to their grammar, a DSL grammatical description of a
> programming language similar to Perl that fits within the constraints of
> the DSL and related tools, a module for validating the syntax of
> programs written in this variant Perl, a module capable of producing
> parser/evaluator objects per a DSL grammatical description, and those
> objects are capable of running programs written in the described
> programming language (notably, the Perl variant) (?).
>
>
> Future plans include modifying the Perl variant DSL grammatical
> description to facilitate learning, usage, and security, to facilitate
> language additions (such as macros), and to facilitate tool chain
> improvements (such as IDE integration)?
>
>
> I would be interested in a compiler. Would "standard" and related
> support this?
There are multiple parts involved here:
* Syntax: Just the rules of what the language looks like, how to write
a function call or declare a function, etc.
* Parser: Breaking down the string that represents the code into a
tree of operations ("there's an addition with two variables")
* Virtual Machine (VM): Running the operations that the parser recognized
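To make the "parser" part concrete, here is a toy sketch in plain
Perl. It only understands variables, numbers, and "+". The node layout
("op", "left", "right", and so on) and the function names are made up
for this example; this is not how perl or Guacamole represent things.

```perl
use strict;
use warnings;

# Toy parser: turns a string such as '$x + $y' into a tree of operations.
# The node layout is invented for this illustration.
sub parse_sum {
    my ($source) = @_;

    # Tokenize: variables, integers, and the plus sign.
    my @tokens = $source =~ m{ ( \$\w+ | \d+ | \+ ) }gxms;

    my $tree = parse_term( \@tokens );
    while ( @tokens && $tokens[0] eq '+' ) {
        shift(@tokens);
        $tree = +{
            'op'    => 'add',
            'left'  => $tree,
            'right' => parse_term( \@tokens ),
        };
    }

    return $tree;
}

# A term is either a variable or a number.
sub parse_term {
    my ($tokens) = @_;
    my $token = shift( @{$tokens} );

    if ( !defined($token) || $token eq '+' ) {
        die("Expected a variable or a number\n");
    }

    if ( substr( $token, 0, 1 ) eq '$' ) {
        return +{ 'op' => 'variable', 'name' => $token };
    }

    return +{ 'op' => 'number', 'value' => $token };
}

my $tree = parse_sum('$x + $y');
print( $tree->{'op'}, "\n" );              # add
print( $tree->{'left'}->{'name'}, "\n" );  # $x
print( $tree->{'right'}->{'name'}, "\n" ); # $y
```

Nothing here runs the addition. The parser only says "there's an
addition with two variables", and running it would be the VM's job.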
The Perl language has "perl" - an interpreter which parses and runs
Perl code. Putting aside the "running" part of it, just parsing Perl
is famously practically impossible. This is because Perl has syntax
which can be interpreted in multiple ways (i.e., it is ambiguous), and
only by running the code (or enough of it) can you resolve the
ambiguity.
A good example of this is prototypes, which tell perl how to parse the
rest of the code. "When you see this function, you parse the rest of
the code after it in the following manner."
Example code: foo bar 1, { a => 2 }
This code is ambiguous. Without knowing the prototype for the "foo()"
and "bar()" subroutines, you are not entirely sure what's going on.
The perl interpreter has the same problem.
* Does the "bar()" subroutine receive two parameters ("1" and "{ a =>
2 }") or does it only get "1" and the "foo()" subroutine get the
result of "bar()" as a first parameter and "{ a => 2 }" as a second
parameter?
* Is "{ a => 2 }" a coderef or a hashref? When you look at it now,
it's a hashref, but what about "{ ( a => 2 ) }"? That's a coderef.
What about "{; a => 2 }"? Coderef again. Anyway, confusing.
* Oh, and if you accidentally put a newline between "{" and "a", the
"a" will be called as a subroutine, rather than automatically treated
as a string.
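A small sketch of the braces problem using map, where perl has to
guess whether "{" opens a block or an anonymous hash. The two
disambiguated spellings below are the ones perl's own documentation
for map suggests; the variable names are just for illustration.

```perl
use strict;
use warnings;

my @names = ( 'a', 'b' );

# "{;" forces a block: map returns a flat list of key/value pairs.
my %seen = map {; $_ => 1 } @names;

# "+{" forces an anonymous hash: map returns one hashref per element.
# Because this is an expression rather than a block, it needs the comma.
my @records = map +{ $_ => 1 }, @names;

print( scalar( keys(%seen) ), "\n" );    # 2
print( ref( $records[0] ),    "\n" );    # HASH
```

Same braces, same contents, two different meanings, and the only thing
telling them apart is one extra character in front.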
So to resolve these options (at least the first one), the perl
interpreter will load the code that defines the prototypes for "foo()"
and "bar()", executing any BEGIN {} block it finds along the way, and
then be able to parse the rest.
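Here is a minimal sketch of how a prototype changes the parse. The
subroutine names (greedy, unary, count_args) are invented for the
example. The two calls at the bottom look identical in shape, but perl
groups the arguments differently depending on the prototype it saw
earlier.

```perl
use strict;
use warnings;

# No prototype: a list operator that swallows everything to its right.
sub greedy { return scalar(@_) }

# Prototype ($): parsed as a unary operator taking exactly one argument.
sub unary ($) { return scalar(@_) }

# Reports how many arguments reached it.
sub count_args { return scalar(@_) }

# Parsed as count_args( greedy( 1, 2 ) )
my $without_prototype = count_args greedy 1, 2;

# Parsed as count_args( unary(1), 2 )
my $with_prototype = count_args unary 1, 2;

print( "without prototype: $without_prototype\n" );    # 1
print( "with prototype: $with_prototype\n" );          # 2
```

You cannot tell which grouping you get by looking at the call alone.
You have to find the definition first, which is exactly what perl
does.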
Perl's syntax is quite vast, which creates this problem. However,
instead of saving a few characters and leaving the code ambiguous,
you can keep those characters and make it clearer:
> foo( bar(1), +{ "a" => 2 } )
> foo( bar( 1, +{ "a" => 2 } ) )
Both of these ways are still compliant with the Perl syntax (the
definition of the language) and they will be executed perfectly fine -
exactly the same way as before, except they leave no room for
ambiguity. The parentheses make it clear which are the parameters for
which function. It's also clear these are functions. An added benefit
is that the perl interpreter knows what was supposed to happen, so if
there's an issue, it will give you a better error:
$ perl -e'foo( 1, +{ "a" => 2 } )'
Undefined subroutine &main::foo called at -e line 1.
$ perl -e'foo 1, +{ "a" => 2 }'
Number found where operator expected at -e line 1, near "foo 1"
(Do you need to predeclare foo?)
syntax error at -e line 1, near "foo 1"
Execution of -e aborted due to compilation errors.
In the first case, we told perl that "foo()" is a subroutine. We were
clear on which were the parameters. When we tried running it, it knew
that subroutine was not available and it told you exactly that.
In the second case, perl had a parsing error in which it's not exactly
clear what happened. It says "there's a symbol here called foo that
I'm not familiar with." Yeah, it helps if you're technical enough to
understand, but otherwise, you're not necessarily sure what happened.
perl isn't necessarily sure either. We're not even getting into issues
with indirect notation, in which perl thinks it could've been
something entirely different and then gives you an error for this
entirely different syntax which you didn't necessarily even know was a
thing.
Standard Perl is a document that suggests how to write Perl in a way
that doesn't confuse perl or developers. It says "don't write foo 1,
but instead foo(1)". That's basically it. Not all of its suggestions
are well received. We've gotten accustomed to not quoting strings in
various cases (left of "=>" and as keys in hash access), but those
also provide room for ambiguity. Imagine "sub foo {'hello'}
$hash{foo}" - you might say "oh, the key is foo" but another developer
might assume it calls the subroutine foo and the key is hello. What
about "$hash{+foo}" or "$hash{-foo}" - you might think that however
these behave, they behave the same, but that's not true. Plus does
something different from minus during parsing. Standard Perl says "just quote
all strings". It makes it easier to read and easier to parse.
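A quick sketch of the hash key cases, assuming a subroutine foo() that
returns 'hello' as in the example above. The hash contents are made up
so that each lookup shows which key perl actually used.

```perl
use strict;
use warnings;

sub foo { return 'hello' }

my %hash = (
    'foo'   => 'the string foo',
    'hello' => 'the return value of foo()',
    '-foo'  => 'the string -foo',
);

print( $hash{foo},  "\n" );    # the string foo
print( $hash{+foo}, "\n" );    # the return value of foo()
print( $hash{-foo}, "\n" );    # the string -foo

# Quoted (or explicitly called), nothing is left to guess:
print( $hash{'foo'},   "\n" );    # the string foo
print( $hash{ foo() }, "\n" );    # the return value of foo()
```

The bare word is quoted for you, the plus turns it into a subroutine
call, and the minus goes back to treating it as a string with a dash
in front.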
There are a lot of other benefits besides "not confusing developers"
and "getting better warnings and errors from the parser."
If you can understand all the code you see without having to run any of
it, you can use a program to analyze it. That's where Guacamole and
Marpa come in. Marpa allows you to define syntax (either using code or
using a BNF string) and it will then parse it for you and create the
tree of operations. You can then use the tree to do what you want.
Guacamole includes the BNF string and a few utilities around that.
When you load the "standard" pragma, it will take your code and try to
parse it with Guacamole. If you wrote your code in a way that's
ambiguous, it will fail and tell you where the issue is. You could
then correct your code. If your code finally loads successfully, it
means you adhered to all of Standard Perl and now you can use
Guacamole utilities against your code.
Guacamole has a few cool tools that use the op tree. It can output
code strings back (like B::Deparse). Since we have a tree of
operations and can generate code from another tree of operations, we
can easily define in code how code should be structured ("a function
called foo with the following parameters") and we could generate this
code. Imagine Perl::Critic finding a problem and rewriting the code
into the correct code. (Or imagine being able to find similar code
based on its operations tree even if the strings are not the same - we
could detect that they accomplish the same thing.)
I tried doing something like this manually with PPI -
Ref::Util::Rewriter. You can see how odd the code is and, despite the
comments, I'm still not sure what I was trying to do in every line of
code or why. Instead, we can easily say "find the following code
structure and replace it with this code structure - now generate that
code string and recreate my program with this new code instead." Some
companies (like Meta) have systems like that which can rewrite vast
amounts of code across the entire code-base with 100% accuracy. It is
definitely a killer feature for a language, which Perl does not have,
but statically parseable languages have.
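To give a feel for "find this structure and replace it with that
structure", here is a rough sketch of the kind of rewrite
Ref::Util::Rewriter is after: turning ref($data) eq 'ARRAY' into
is_arrayref($data). The is_*ref names are real Ref::Util functions,
but the tree layout and the rewrite() and to_code() helpers are
invented for this example. This is not Guacamole's actual output or
API.

```perl
use strict;
use warnings;

# Hypothetical node layout, invented for this sketch (not Guacamole's):
#   call:  { 'type' => 'call',  'name' => ..., 'args' => [ ... ] }
#   binop: { 'type' => 'binop', 'op' => ..., 'left' => ..., 'right' => ... }
#   leaf:  { 'type' => 'variable' or 'string', 'value' => ... }

my %replacement_for = (
    'ARRAY' => 'is_arrayref',
    'HASH'  => 'is_hashref',
    'CODE'  => 'is_coderef',
);

# Return a new tree in which ref(X) eq 'TYPE' becomes is_typeref(X).
sub rewrite {
    my ($node) = @_;
    my $type = $node->{'type'};

    if ( $type eq 'call' ) {
        my @args = map { rewrite($_) } @{ $node->{'args'} };
        return +{ %{$node}, 'args' => \@args };
    }

    if ( $type ne 'binop' ) {
        return $node;
    }

    my $left  = rewrite( $node->{'left'} );
    my $right = rewrite( $node->{'right'} );

    if (   $node->{'op'} eq 'eq'
        && $left->{'type'} eq 'call'
        && $left->{'name'} eq 'ref'
        && $right->{'type'} eq 'string'
        && exists( $replacement_for{ $right->{'value'} } ) )
    {
        return +{
            'type' => 'call',
            'name' => $replacement_for{ $right->{'value'} },
            'args' => $left->{'args'},
        };
    }

    return +{ %{$node}, 'left' => $left, 'right' => $right };
}

# Turn a tree back into a code string.
sub to_code {
    my ($node) = @_;
    my $type = $node->{'type'};

    if ( $type eq 'variable' ) { return $node->{'value'}; }
    if ( $type eq 'string' )   { return "'" . $node->{'value'} . "'"; }

    if ( $type eq 'call' ) {
        my @args = map { to_code($_) } @{ $node->{'args'} };
        return $node->{'name'} . '(' . join( ', ', @args ) . ')';
    }

    return join( ' ',
        to_code( $node->{'left'} ),
        $node->{'op'},
        to_code( $node->{'right'} ) );
}

my $before = +{
    'type' => 'binop',
    'op'   => 'eq',
    'left' => +{
        'type' => 'call',
        'name' => 'ref',
        'args' => [ +{ 'type' => 'variable', 'value' => '$data' } ],
    },
    'right' => +{ 'type' => 'string', 'value' => 'ARRAY' },
};

print( to_code($before), "\n" );               # ref($data) eq 'ARRAY'
print( to_code( rewrite($before) ), "\n" );    # is_arrayref($data)
```

The matching is done on structure, not on strings, so spacing, quoting
style, or line breaks in the original source would not matter.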
PPI does document parsing - trying to understand the elements as best
it can, but eventually throwing it to you with a shrug. You can't
blame it. It can't know everything because the code is ambiguous and
PPI can't run code to get enough context. If you read Perl::Critic
policies, you will see a lot of the code with PPI is moving around
trying to figure out what a thing is - just like I'm doing with
Ref::Util::Rewriter. But with Guacamole, you know for sure. There's no
"I don't know, it's just a word - you figure it out." Instead, it
tells you what every element is.
>
>
>
> How do "standard" and related compare to RPerl?
>
> https://metacpan.org/dist/RPerl/view/script/rperl
>
There's a big difference between the two. My memory of RPerl is hazy
(so I might be corrected on the thread), but basically RPerl is a
parser that generates C++ code that compiles to a native binary,
meaning the OS is able to run the program directly instead of needing
a VM as the runner. RPerl doesn't use the "perl" VM for anything. It
doesn't use it to parse or to run. It parses on its own to generate
the corresponding C++ code. So RPerl needs to implement everything it
wants to support from the Perl syntax by itself, which is not easy to
do, because the Perl language can do quite a lot.
Standard Perl is just a document that suggests how to write Perl in a
way that can be statically parsed by anything, including the "perl" VM
or RPerl.
Guacamole is only a parser (using Marpa as the actual parsing
framework); it doesn't run the code either. It just generates a big
tree of operations from the code it reads.
Standard Perl and Guacamole don't move you away from perl, they just
suggest how to write it and - wherever you do write the way it
suggests - it can now provide tooling to do cool stuff. It's still
Perl and you're still running it with perl.
RPerl, on the other hand, says "write this variation of Perl and we
could compile it into C++ and run it really fast"[1]. That variation
isn't just about syntax, but about actual features, since you are
moving away from perl and running your code with RPerl. Guacamole is
just a tool on the side.
A different way to see it is:
* If you follow what Standard Perl suggests, nothing in your code
changes other than being more readable (for people and programs). Now
you can use Guacamole to do really cool things, still running your
code exactly as you did until now. Standard Perl supports practically
everything (with few exceptions like prototypes which make your code
ambiguous without adding any new capabilities, speed, or features).
* Standard Perl is also gradual in the sense that, if you follow the
standard for only 10% of your codebase, you can use the Guacamole
utilities on those 10%. Update the rest of your code, and you can use
those utilities there too. It's not an all or nothing approach. The
running is still done by "perl" itself, exactly as before.
* To follow Standard Perl and Guacamole, the Perl language and
interpreter don't need any updates or changes. It's just a slightly
more constrained way of writing it. The Perl 5 Porters needn't change
anything for you to just use it.
* Using RPerl means replacing the "perl" binary, and you can only use
as many of the Perl features as RPerl supports.
Sawyer.
[1] One capability it doesn't support is magic. If you try to use it,
it will not work because RPerl doesn't implement the concept of magic
which the "perl" VM does. Things like DBI will not run on RPerl. The
importance of databases has already been expressed by others on the
thread.