Mailing List Archive

Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
I'm looking at getting around to implementing PPC 0019 finally.

https://github.com/Perl/PPCs/blob/main/ppcs/ppc0019-qt-string.md

First interesting question: Should qt() strings be sub-lexed, or not..?

To explain this question, I'll first need to draw attention to an
annoying quirk of how existing strings like q() and qq() work.

When the lexer encounters a quote-start operator like q or qq, the
first thing it does is look at what the delimiting characters are, and
then it scans ahead looking for the end marker. While looking, it knows
how to count handed pairs *of that marker* and ignore escaped versions,
but it doesn't know anything else. Once it has found the bounds of that
string quoting form, it goes off into a separate parse phase to
understand the inner contents of it, which then get inserted at the
parse point.

q(this is the contents) and now we are outside

q(we can count (inner) parentheses) and now this is outside

q(we ignore \( escaped parens) and now this is outside

but that's as far as it goes. Note that it *does not* understand perl
code inside qq() strings.

eval: qq(This ${\ somefunc ')' } is not valid)
Compile error: Can't find string terminator "'" anywhere before EOF at
(eval 7) line 1.

What went wrong here?

Remember - the lexer first looks at the quoting marker, and then tries
to find the end. It found the end.

qq(This ${\ somefunc ')

################## ^-- Oh look here's the end.

That inside then gets passed into a sub-lexer to parse, and then gets
inserted back into the original syntax

qq(###################)' } is not valid)

Oops. Well, that definitely doesn't look like valid perl code - offhand
I don't know if the parse error comes from the sub-lex inside or the
main parse outside, but either way, it failed.
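
To make the failure concrete, here is a minimal sketch of that "find the end first" step. It is an illustration only, not perl's real scanner: the find_end helper and its behaviour are assumptions made for this example. It counts nested pairs of the delimiter, skips backslash-escaped characters, and knows nothing about quotes or code inside the string.

```perl
use strict;
use warnings;

# Hypothetical helper, not perl's actual lexer: given source text and the
# offset of an opening delimiter, return the offset of the matching close.
sub find_end {
    my ($src, $start) = @_;
    my $open  = substr($src, $start, 1);
    my %pair  = ('(' => ')', '[' => ']', '{' => '}', '<' => '>');
    my $close = $pair{$open} // $open;   # non-bracketing: same char closes
    my $depth = 0;

    for (my $i = $start + 1; $i < length $src; $i++) {
        my $c = substr($src, $i, 1);
        if ($c eq '\\') { $i++; next }   # skip the escaped character
        if ($c eq $close) {
            return $i if $depth == 0;    # unnested close: this is the end
            $depth--;
        }
        elsif ($c eq $open) { $depth++ } # nested opening delimiter
    }
    return -1;                           # no terminator found
}

my $src  = q{qq(This ${\ somefunc ')' } is not valid)};
my $end  = find_end($src, 2);
my $body = substr($src, 3, $end - 3);

print "string body: [$body]\n";
print "left over:   [", substr($src, $end + 1), "]\n";
```

The body stops at the ")" inside the quoted ')' and everything after it is handed back to the outer parse, which is the situation described above.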


So with that in mind - what do we feel about the new qt() string syntax?

I.e. what do people feel -should- be the behaviour of a construction
like

sub f { ... }

say qt(Is this { f(")") } valid syntax?);

Should it:

1) Yield a parse error similar to the ones given in the example above?

2) Parse as valid perl code yielding a similar result to:

say 'Is this ', f(")"), ' valid syntax?';

3) Something else?


I feel that interpretation 2 might be most useful and powerful, but
would be inconsistent with existing behaviour of existing operators.
Interpretation 1 is certainly easier to achieve as it reüses existing
parser structures, but given the whole point is to interpolate code
inside the {braces} it might lead to weird annoying cases that don't
work so well.

Does anyone have any good examples one way or other from other
languages that have a similar construction?

(Cross-posted to https://github.com/Perl/PPCs/issues/47)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 1/11/24 10:30, Paul "LeoNerd" Evans wrote:
> Should it:
>
> 1) Yield a parse error similar to the ones given in the example above?
>
> 2) Parse as valid perl code yielding a similar result to:
>
> say 'Is this ', f(")"), ' valid syntax?';
>
> 3) Something else?
>
>
> I feel that interpretation 2 might be most useful and powerful, but
> would be inconsistent with existing behaviour of existing operators.

I would also agree that #2 is better for users of perl, but would be a
significant burden to the implementors of syntax highlighting. 
Currently those syntax highlighters get to take advantage of the same
easy parsing.  If you force them to dive into a full perl parse they
might have to re-structure their entire code to be able to recursively
call into it.  You would end up in the short term with most editors not
bothering to fix that, and then having misleading syntax highlighting
which could confuse users worse than option #1 would have.

On the general topic of string interpolations, I did some recent
exploration into this for CodeGen::Cpppp and decided that the nicest
extension of string interpolation would be to make "${{ }}" parse as a
code block.  It's more characters, but reads quite nicely and doesn't
get in the way of code generation.  qt// would be fairly horrible for
code generation if every { needs escaped. Also, "${{ }}" could be added
to the regular interpolation contexts and not break back-compat since it
would have been an error, before.

Compared to the ppc 19 examples:

    # Simple scalar interpolation
    qt<Greetings, {$title} {$name}>;

    # Interpolation of method calls
    qt"Greetings, {$user->title} {$user->name}";

    # Interpolation of various expressions
    qt{It has been {$since{n}} {$since{units}} since your last login};
    qt{...a game of {join q{$"}, $favorites->{game}->name_words->@*}};

You would get

    # Simple scalar interpolation
    "Greetings, $title $name";

    # Interpolation of method calls
    "Greetings, ${{$user->title}} ${{$user->name}}";

    # Interpolation of various expressions
    "It has been $since{n} $since{units} since your last login";
    "...a game of ${{ $favorites->{game}->name_words->@* }}";
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 11.01.24 19:26, Michael Conrad wrote:
>
> Also, "${{ }}" could be added
> to the regular interpolation contexts and not break back-compat since it
> would have been an error, before.

$ perl -wE 'say "hello, ${{k => \q(world)}->{k}}"'
hello, world
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 11.01.24 19:26, Michael Conrad wrote:
> I would also agree that #2 is better for users of perl, but would be a
> significant burden to the implementors of syntax highlighting.
> Currently those syntax highlighters get to take advantage of the same
> easy parsing.  If you force them to dive into a full perl parse they
> might have to re-structure their entire code to be able to recursively
> call into it.  You would end up in the short term with most editors not
> bothering to fix that, and then having misleading syntax highlighting
> which could confuse users worse than option #1 would have.

But you already can interpolate arbitrary code into strings:

"@{['just', 'an', 'example', 2+2]}"

If you want to highlight that sensibly, you already need some sort of
recursive embedding.
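
As a runnable illustration of the two existing idioms for embedding code in a double-quoted string (the shout helper is made up for this example):

```perl
use strict;
use warnings;

sub shout { uc shift }    # hypothetical helper for illustration

# @{[ ... ]} builds an anonymous array from any list expression and
# immediately dereferences it; the elements are joined with $" (a space).
my $list = "@{['just', 'an', 'example', 2+2]}";

# ${\ ... } takes a reference to the value of an expression and
# immediately dereferences it.
my $scalar = "result: ${\ shout('hello') }";

print "$list\n";      # just an example 4
print "$scalar\n";    # result: HELLO
```

Both forms contain arbitrary Perl expressions, so a highlighter that wants to colour them has to re-enter its expression grammar from inside the string.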

(Also, JavaScript does #2 with its `... ${ ... } ...` construct and I
don't hear developer tools complaining about that.)
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
Oops.

On 1/11/24 13:26, Michael Conrad wrote:
> Also, "${{ }}" could be added to the regular interpolation contexts
> and not break back-compat since it would have been an error, before.

Actually existing syntax allows

   say "${{ a => \1 }->{a}}"

so my suggestion would indeed break back-compat.

Also my final example could have just been @{ }.   The ppc0019 examples
really aren't showing why qt would be an advantage.  Better
examples would be to show infix operators and things like that.
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
imho it should use a pluggable grammar mechanism - qt should behave like any
pluggable keyword,
and when the start of a code section is detected, it should use perl's grammar.

that may require some adjustments (I already described the mechanism here some
time ago).

such mechanism allows easy nesting without escaping, eg from javascript
`a ${ `b ${ 1 + 2 }` }`

btw, what about code?
qt { sub foo { }; foo } }

Brano

On Thu, 11 Jan 2024 at 16:31, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk>
wrote:

> I'm looking at getting around to implementing PPC 0019 finally.
>
> https://github.com/Perl/PPCs/blob/main/ppcs/ppc0019-qt-string.md
>
> First interesting question: Should qt() strings be sub-lexed, or not..?
>
> To explain this question, I'll first need to draw attention to an
> annoying quirk of how existing strings like q() and qq() work.
>
> When the lexer encounters a quote-start operator like q or qq, the
> first thing it does is look at what the delimiting characters are, and
> then it scans ahead looking for the end marker. While looking, it knows
> how to count handed pairs *of that marker* and ignore escaped versions,
> but it doesn't know anything else. Once it has found the bounds of that
> string quoting form, it goes off into a separate parse phase to
> understand the inner contents of it, which then get inserted at the
> parse point.
>
> q(this is the contents) and now we are outside
>
> q(we can count (inner) parentheses) and now this is outside
>
> q(we ignore \( escaped parens) and now this is outside
>
> but that's as far as it goes. Note that it *does not* understand perl
> code inside qq() strings.
>
> eval: qq(This ${\ somefunc ')' } is not valid)
> Compile error: Can't find string terminator "'" anywhere before EOF at
> (eval 7) line 1.
>
> What went wrong here?
>
> Remember - the lexer first looks at the quoting marker, and then tries
> to find the end. It found the end.
>
> qq(This ${\ somefunc ')
>
> ################## ^-- Oh look here's the end.
>
> That inside then gets passed into a sub-lexer to parse, and then gets
> inserted back into the original syntax
>
> qq(###################)' } is not valid)
>
> Oops. Well, that definitely doesn't look like valid perl code - offhand
> I don't know if the parse error comes from the sub-lex inside or the
> main parse outside, but either way, it failed.
>
>
> So with that in mind - what do we feel about the new qt() string syntax?
>
> I.e. what do people feel -should- be the behaviour of a construction
> like
>
> sub f { ... }
>
> say qt(Is this { f(")") } valid syntax?);
>
> Should it:
>
> 1) Yield a parse error similar to the ones given in the example above?
>
> 2) Parse as valid perl code yielding a similar result to:
>
> say 'Is this ', f(")"), ' valid syntax?';
>
> 3) Something else?
>
>
> I feel that interpretation 2 might be most useful and powerful, but
> would be inconsistent with existing behaviour of existing operators.
> Interpretation 1 is certainly easier to achieve as it reüses existing
> parser structures, but given the whole point is to interpolate code
> inside the {braces} it might lead to weird annoying cases that don't
> work so well.
>
> Does anyone have any good examples one way or other from other
> languages that have a similar construction?
>
> (Cross-posted to https://github.com/Perl/PPCs/issues/47)
>
> --
> Paul "LeoNerd" Evans
>
> leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
> http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
>
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 1/11/24 13:35, Lukas Mai wrote:
> On 11.01.24 19:26, Michael Conrad wrote:
>> I would also agree that #2 is better for users of perl, but would be
>> a significant burden to the implementors of syntax highlighting. 
>> Currently those syntax highlighters get to take advantage of the same
>> easy parsing.  If you force them to dive into a full perl parse they
>> might have to re-structure their entire code to be able to
>> recursively call into it.  You would end up in the short term with
>> most editors not bothering to fix that, and then having misleading
>> syntax highlighting which could confuse users worse than option #1
>> would have.
>
> But you already can interpolate arbitrary code into strings:
>
>     "@{['just', 'an', 'example', 2+2]}"
>
> If you want to highlight that sensibly, you already need some sort of
> recursive embedding.
>
> (Also, JavaScript does #2 with its `... ${ ... } ...` construct and I
> don't hear developer tools complaining about that.)

Are you aware of highlighters that parse those inner expressions?  The
ones I've seen just highlight the whole string the same color and don't
bother.
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 11.01.24 19:39, Michael Conrad wrote:
> On 1/11/24 13:35, Lukas Mai wrote:
>>
>> But you already can interpolate arbitrary code into strings:
>>
>>     "@{['just', 'an', 'example', 2+2]}"
>>
>> If you want to highlight that sensibly, you already need some sort of
>> recursive embedding.
>>
>> (Also, JavaScript does #2 with its `... ${ ... } ...` construct and I
>> don't hear developer tools complaining about that.)
>
> Are you aware of highlighters that parse those inner expressions?  The
> ones I've seen just highlight the whole string the same color and don't
> bother.

Vim certainly tries.
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On Thu, 11 Jan 2024 13:26:12 -0500
Michael Conrad <mike@nrdvana.net> wrote:

> I would also agree that #2 is better for users of perl, but would be
> a significant burden to the implementors of syntax highlighting. 
> Currently those syntax highlighters get to take advantage of the same
> easy parsing.  If you force them to dive into a full perl parse they
> might have to re-structure their entire code to be able to
> recursively call into it.  You would end up in the short term with
> most editors not bothering to fix that, and then having misleading
> syntax highlighting which could confuse users worse than option #1
> would have.

Having actually written a Perl syntax highlighter for multiple
use-cases including text editors [1], I can say that actually option #2
is _easier_. Trying to do that "find the end then sublex" is a harder,
more complex structure to express in most grammar engines than the
more regular structure of a recursive grammar.

[1]: https://github.com/tree-sitter-perl/tree-sitter-perl/

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 1/11/24 13:48, Paul "LeoNerd" Evans wrote:
> On Thu, 11 Jan 2024 13:26:12 -0500
> Michael Conrad <mike@nrdvana.net> wrote:
>
>> I would also agree that #2 is better for users of perl, but would be
>> a significant burden to the implementors of syntax highlighting.
>> Currently those syntax highlighters get to take advantage of the same
>> easy parsing.  If you force them to dive into a full perl parse they
>> might have to re-structure their entire code to be able to
>> recursively call into it.  You would end up in the short term with
>> most editors not bothering to fix that, and then having misleading
>> syntax highlighting which could confuse users worse than option #1
>> would have.
> Having actually written a Perl syntax highlighter for multiple
> use-cases including text editors [1], I can say that actually option #2
> is _easier_. Trying to do that "find the end then sublex" is a harder
> more complex structure to express in most grammar engines, than the
> more regular structure of a recursive grammar.
>
> [1]: https://github.com/tree-sitter-perl/tree-sitter-perl/
>
Well, I stand corrected.  Actually Vim and highlight.js already think that

   "$test ${ '"' }"

is a valid string, while Scintilla (Scite, Notepad++) and VSCode
correctly detect the end of the string at the internal "

So vim and highlight.js wouldn't require any effort to handle qt{} with
embedded parsing, but scintilla and vscode might.
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On Thu, Jan 11, 2024 at 03:30:50PM +0000, Paul "LeoNerd" Evans wrote:
> I'm looking at getting around to implementing PPC 0019 finally.
>
> https://github.com/Perl/PPCs/blob/main/ppcs/ppc0019-qt-string.md
>
> First interesting question: Should qt() strings be sub-lexed,
> or not..?
>
> I.e. what do people feel -should- be the behaviour of a construction
> like
>
> sub f { ... }
>
> say qt(Is this { f(")") } valid syntax?);
>
> Should it:
>
> 1) Yield a parse error similar to the ones given in the example
> above?
>
> 2) Parse as valid perl code yielding a similar result to:
>
> say 'Is this ', f(")"), ' valid syntax?';
>
> 3) Something else?
>
>
> I feel that interpretation 2 might be most useful and powerful, but
> would be inconsistent with existing behaviour of existing operators.
> Interpretation 1 is certainly easier to achieve as it reüses existing
> parser structures, but given the whole point is to interpolate code
> inside the {braces} it might lead to weird annoying cases that don't
> work so well.
>

I definitely prefer 1) because I think the wording of the PPC implies
that qt-strings are parsed just like the other quoted constructs (look
for the end, skip escaped delimiters). It's also consistent with how
parsing of quoted constructs has been documented for the past 25+ years
(75e14d17912ce8a35d5c2b04c0c6e30b903ab97f in June 1998).

I'm so used to using ${\} and @{[]} in double-quoted strings that I
didn't really see the benefits of qt right away. It clearly declutters
complex expressions, though.

Thinking about how qt should be parsed, I think it should be consistent
with this paragraph from perldoc ("Gory details of parsing quoted
constructs"):

When searching for single-character delimiters, escaped delimiters
and "\\" are skipped. For example, while searching for terminating
"/", combinations of "\\" and "\/" are skipped. If the delimiters
are bracketing, nested pairs are also skipped. For example, while
searching for a closing "]" paired with the opening "[", combinations
of "\\", "\]", and "\[" are all skipped, and nested "[" and "]" are
skipped as well. However, when backslashes are used as the delimiters
(like "qq\\" and "tr\\\"), nothing is skipped. During the search for
the end, backslashes that escape delimiters or other backslashes are
removed (exactly speaking, they are not copied to the safe location).

and also that anything outside of {} should be treated as single-quoted
strings, meaning that all these produce the same result (the literal
`$bloop`):

'$bloop'
qt($bloop)
qt({'$bloop'})
qt({('$bloop'\)})

and like q(), the only character that needs to be escaped is the
delimiter.

We are quite used to, as Perl users, picking our delimiters, and being
careful about embedding them inside our single or double-quoted
strings. Escaping or balancing our delimiters is something we already
do commonly.

Expanding on your example by adding a newline to the string literal in
the embedded code, I think this would be valid syntax:

sub f { shift }
say qt(Is this { f("\)\n") } valid syntax?);

and would print

Is this )
valid syntax?

i.e. the code run in the template is `f(")\n")`.

Now the documentation in perldoc actually says that \\ is also skipped,
meaning you'd have to escape all \, which sounds less than ideal when
embedding code. (And my example above contradicts this.)

If you consider the current behaviour around \ in single-quoted strings,
it is already a bit confusing:

$ perl -E 'say for q(\1), q(\\2), q(\\\3), q(\\\\4)'
\1
\2
\\3
\\4

My take on escaping in qt-strings would be that when searching for
single-character delimiters, *only* the escaped delimiters are skipped.
Lone backslashes would be left alone.

So we'd have:

$ perl -E 'say for qt(\1), qt(\\2), qt(\\\3), qt(\\\\4)'
\1
\\2
\\\3
\\\\4

and

$ perl -E 'say for qt(\)), qt(\\)), qt(\\\)), qt(\\\\))'
)
\)
\\)
\\\)

Which I guess is actually your solution 3).
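
A small model of that rule, for checking the table above. This is only a sketch: proposed_unescape is a hypothetical helper that works on the raw text already found between the delimiters, and it does not model how the end of the string is located.

```perl
use strict;
use warnings;

# Hypothetical model of the proposed rule: a backslash is dropped only when
# it directly precedes a delimiter character; every other backslash is kept.
sub proposed_unescape {
    my ($raw, $open, $close) = @_;
    (my $out = $raw) =~ s/\\([\Q$open$close\E])/$1/g;
    return $out;
}

my $bs = '\\';    # one literal backslash

# One to four backslashes followed by the closing delimiter.
for my $n (1 .. 4) {
    my $raw = ($bs x $n) . ')';
    printf "%-6s -> %s\n", $raw, proposed_unescape($raw, '(', ')');
}

# Backslashes that do not precede a delimiter are left alone.
print proposed_unescape($bs x 3 . '3', '(', ')'), "\n";
```

The loop reproduces the four results listed for qt(\)) through qt(\\\\)), and the last line reproduces the qt(\\\3) case.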

It should be possible to describe qt-string by saying that:

qt( prefix { ... } suffix )

runs exactly like this:

' prefix ' . do { ... } . ' suffix '

(and dies with "Unimplemented" in this case)

Actually, in the corner case where there's neither prefix nor suffix
(`qt({...})`), the `do` could be in list context and propagate it. So
it's really `scalar do {...}`. The PPC already says the code is run as a
scalar expression.
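
A quick check of that equivalence in today's Perl, written out by hand since qt does not exist yet. The qt sources in the comments are hypothetical; only the desugared forms actually run.

```perl
use strict;
use warnings;

my @words = qw(to sublex or not);

# Hypothetical source:  qt(Words: { join "-", @words } !)
my $with_text = 'Words: ' . scalar(do { join "-", @words }) . ' !';

# Hypothetical source:  qt({ @words })   (no prefix, no suffix)
my @list_ctx   = (do { @words });           # context leaks in: four elements
my @scalar_ctx = (scalar(do { @words }));   # forced scalar: one element

print "$with_text\n";
print scalar(@list_ctx), " vs ", scalar(@scalar_ctx), "\n";
```

With a prefix or suffix the concatenation already imposes scalar context, so the explicit scalar only matters in the bare `qt({...})` case.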

--
Philippe Bruhat (BooK)

Beauty may be a curse, but not as great a curse as stupidity.
(Moral from Groo The Wanderer #11 (Epic))
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
> I definitely prefer 1) because I think the wording of the PPC implies
> that qt-strings are parsed just like the other quoted constructs (look
> for the end, skip escaped delimiters). It's also consistent with how
> parsing of quoted constructs has been documented for the past 25+ years
> (75e14d17912ce8a35d5c2b04c0c6e30b903ab97f in June 1998).
>

... it's buggy and messy ... try quoted construct in quoted construct in
quoted construct


>
> I'm so used to using ${\} and @{[]} in double-quoted strings that I
> didn't really see the benefits of qt right away. It clearly declutters
> complex expressions, though.
>

I think qt should not exist as a separate construct, but should instead be a
feature controlling how interpolated literals are treated in a given block.


>
> Thinking about how qt should be parsed, I think it should be consistent
> with this paragraph from perldoc ("Gory details of parsing quoted
> constructs"):
>

With pluggable grammars things have changed (and will change more).
Perl shouldn't parse interpolated strings as a single literal; instead there
should be a rule of the form:
interpolated: string_literal? (interpolation | string_literal)*

> and also that anything outside of {} should be treated as single-quoted
> strings, meaning that all these produce the same result (the literal
> `$bloop`):
>
> '$bloop'
> qt($bloop)
> qt({'$bloop'})
> qt({('$bloop'\)})
>
> and like q(), the only character one needs to be escaped is the
> delimiter.
>
> We are quite used to, as Perl users, picking our delimiters, and being
> careful about embedding them inside our single or double-quoted
> strings. Escaping or balancing our delimiters is something we already
> do commonly.
>

Even a single-character delimiter should be treated as "paired" when parsing
interpolated strings.

"We are quite used to" points to a bias.
The focus should be on the future, not on the past (however comfortable we
have become with it).


>
> Expanding on your example by adding a newline to the string literal in
> the embedded code, I think this would be valid syntax:
>
> sub f { shift }
> say qt(Is this { f("\)\n") } valid syntax?);
>

again, try to nest these things three times ...

Brano
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
Paul "LeoNerd" Evans writes:

> I'm looking at getting around to implementing PPC 0019 finally.
>
> https://github.com/Perl/PPCs/blob/main/ppcs/ppc0019-qt-string.md

Thank you. That would be fantastic!

One stupid question that I couldn't see the answer to in your email or
the PPC: what does the t stand for? It isn't an obvious mnemonic for
‘expression’ or ‘code block’ or ‘braces’.

> I feel that interpretation 2 might be most useful and powerful, but
> would be inconsistent with existing behaviour of existing operators.
>
> Does anyone have any good examples one way or other from other
> languages that have a similar construction?

Raku evaluates {...} blocks in double-quoted strings, and goes for
interpolation interpretation 2 — this works:

say qq[Right square bracket is Unicode {ord(']')}.];

Try it out: https://glot.io/snippets/gsdx63bezo

I think that would be the behaviour of least surprise for users:

• Somebody who hasn't thought about the issue would just write code like
the above not noticing they have a potentially problematic embedded
closing delimiter.

• Somebody who realizes the potential issue would avoid it by picking a
different delimiter — which would still work fine.

Michael Conrad writes:

> I would also agree that #2 is better for users of perl, but would be a
> significant burden to the implementors of syntax highlighting.

I don't feel that should massively be taken into account. Syntax-
highlighting Perl is already awkward, and if I'm writing something
particularly esoterically nested, I don't necessarily expect syntax
highlighters to get it right.

In this case if a Perl programmer cares about making the syntax
highlighting work, they can always switch the delimiter to something
else. But it doesn't follow that perl itself shouldn't cope with the
potentially confusing delimiter.

Philippe Bruhat (BooK) writes:

> I definitely prefer 1) because I think the wording of the PPC implies
> that qt-strings are parsed just like the other quoted constructs (look
> for the end, skip escaped delimiters). It's also consistent with how
> parsing of quoted constructs has been documented for the past 25+ years
> (75e14d17912ce8a35d5c2b04c0c6e30b903ab97f in June 1998).

Conversely, when adding a new feature, it's good to remedy any
shortcomings in existing features. ‘You can embed a code block as an
expression’ is simpler both to teach and market as a feature than ‘You
can embed a code block as an expression, unless it happens to include
the closing character for the surrounding string, in which case it might
not work.’.

Also, qt is distinct from existing quoting mechanisms in that it's the
first one whose whole raison d'être is to interpolate a block with
beginning and ending markers. As such, users may reasonably have
expectations of what can be in it that isn't particularly influenced by
what can be done inside other types of quotes.

> I'm so used to using ${\} and @{[]} in double-quoted strings that I
> didn't really see the benefits of qt right away. It clearly declutters
> complex expressions, though.

While "${...}" and "@{...}" clearly can already be used to interpolate
arbitrary expressions, it's reasonable to think of them as being for
interpolating the values from variables (presumably influenced by the
similar shell quoting constructs). Many programmers are going to think
of what's allowed there as being ‘variable-like’ rather than
‘block-like’, so existing expectations don't necessarily apply.

That instead of a variable you can jump through the hoops of
constructing a reference and immediately dereferencing doesn't mean that
it's a good template for how interpolating arbitrary expressions should
be done.

> Thinking about how qt should be parsed, I think ... that anything
> outside of {} should be treated as single-quoted strings

I think that is indeed the intent of PPC 0019, which says in the
Rationale section: “The proposed qt operator only looks for one special
token in a string literal: {.”

However, I see the equivalence in the Specification section implies that
the constant text A and B would be subject to qq interpretation. That
contradiction in the PPC should probably be resolved before
implementation starts.

(Technically it's only a contradiction if you read A and B as being
placeholders for arbitrary strings. If you read them as simply being the
exact strings 'A ' and ' B' then whether they are treated as q or qq
strings is irrelevant, since they're the same in both.)

But ... I'm concerned that not having \n interpreted as a line-break in
qt strings could make them less useful.

If you currently have something like:

say "name: $name\nage: ${\(($now - $dob)->year)}\nclass: $class";

or, equivalently:

say "name: $name\nage: " . ($now - $dob)->year . "\nclass: $class";

then this could be considered an improvement:

say qt[name: {$name}\nage: {($now - $dob)->year}\nclass: {$class}];

Whereas one of these perhaps less so:

say qt[name: {$name}{"\n"}age: {($now - $dob)->year}{"\n"}class: {$class}];
say qt[name: {"$name\n"}age: {($now - $dob)->year . "\n"}class: {$class}];

Programmers would have to choose whether they wanted the convenience of
{...} expressions or the convenience of \n line-breaks, but not get
both.

That could lead to messy code which mostly uses qq (because the
interpolated values are simple variable look-ups) but has a few lines of
qt mingled among them, and the programmer needs to remember to treat
line-breaks differently on the qt lines.

It also makes it harder to refactor old code to ‘upgrade’ from qq
strings to qt strings.

If you have code with qq strings in it and wish to embed an expression
(or change an existing ref-and-deref embed to use {...} syntax), and you
know that none of your existing strings use literal braces in them (a
reasonable assumption in many cases), then it would be nice and
straightforward for the process to be:

1. Change qq to qt (or " to qt").
2. Wrap all existing variable interpolations in braces.

and then you're ready to put {...} expressions in there.

Whereas having the additional step:

3. Find all \-special characters and replace them with something that
does the equivalent in qt strings.

could make it significantly more work. It would be nice if qt could
straightforwardly do everything that qq does.

Note, I'm *not* suggesting that qt should also continue to interpolate
$var and @var variables outside of {...} blocks. Those already cause
problems with, for instance, prices in dollars and literal email
addresses in qq strings, and {$var} and {@var} are straightforward
alternatives. Whereas \ already needs to be escapable regardless (see
below), and its {...}-based alternative is much clunkier.

\X (for arbitrary X) would need to become {qq[\X]} (in case qt"..." is the
outer delimiters), for instance turning:

"$zap\n$kapow"

into:

qt"{$zap}{qq[\n]}{$kapow}"

That nesting makes it heavy both on the punctuation (which this PPC is
rightly trying to avoid for expression interpolation) and mentally, in
having to consider a nested string within an expression within a string
— which seems like a lot just for a line-break.

There will be many cases where something simpler can be done, of course.
But the fact they'll vary makes the qq-to-qt translation more awkward:
it's no longer a straightforward identical change for each \X in the
string.

So I think that possibly having qt be like qq, except that it does {...}
interpolation instead of $var and @var interpolation may be the most
useful combination.

> My take on escaping in qt-strings would be that when searching for
> single-character delimiters, *only* the escaped delimiters are skipped.
> Lone backslashes would be left alone.

If you have an escape mechanism then you need to allow escaping the
escape character, so that it can appear at the end of the string, in
something like 'X\\' (because 'X\' would be interpreted as an X followed
by a literal single quote and then ... whatever the next bit of your
program is, because it hasn't found the closing quote mark yet).

Similarly for something like q[a[\\]b] if you want a literal backslash
just before a non-escaped delimiter.

'X\Y' is fine without being escaped, because there's no ambiguity, so
escape characters don't usually have to be escaped, but it needs to be
possible to do so.
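
The three cases, as they behave in single-quoted strings today:

```perl
use strict;
use warnings;

my $trailing = 'X\\';        # X followed by one literal backslash
my $nested   = q[a[\\]b];    # backslash just before an unescaped, nested ]
my $middle   = 'X\Y';        # unambiguous, so the backslash is kept as-is

printf "%s (%d chars)\n", $_, length $_ for $trailing, $nested, $middle;
```

This prints `X\` (2 chars), `a[\]b` (5 chars) and `X\Y` (3 chars). Without `\\` being recognised as an escape, neither of the first two could be written with those delimiters.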

Smylers
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On Fri, 12 Jan 2024 11:32:05 +0100 (CET)
"Smylers  " via perl5-porters <perl5-porters@perl.org> wrote:

> One stupid question that I couldn't see the answer to in your email or
> the PPC: what does the t stand for? It isn't an obvious mnemonic for
> ‘expression’ or ‘code block’ or ‘braces’.

I don't quite recall where it came from but I have a vague memory it
might be "quoted template".

> Raku evaluates {...} blocks in double-quoted strings, and goes for
> interpolation interpretation 2 — this works:
>
> say qq[Right square bracket is Unicode {ord(']')}.];
>
> Try it out: https://glot.io/snippets/gsdx63bezo
>
> I think that would be the behaviour of least surprise for users:
>
> • Somebody who hasn't thought about the issue would just write code
> like the above not noticing they have a potentially problematic
> embedded closing delimiter.
>
> • Somebody who realizes the potential issue would avoid it by picking
> a different delimiter — which would still work fine.

Yes, it sounds from descriptions as though Ruby and JavaScript follow similar
rules there, so we'd be in good company.

> Michael Conrad writes:
>
> > I would also agree that #2 is better for users of perl, but would
> > be a significant burden to the implementors of syntax highlighting.
> >
>
> I don't feel that should massively be taken into account. Syntax-
> highlighting Perl is already awkward, and if I'm writing something
> particularly esoterically nested, I don't necessarily expect syntax
> highlighters to get it right.

Plus I wouldn't be surprised if half of the existing highlighters fail
to correctly implement the rules on existing q/qq/etc. anyway. ;) It
took us quite some effort in tree-sitter-perl to get the correct set of
behaviours, and that's from knowing a lot of the corner-case traps.


> Philippe Bruhat (BooK) writes:
>
> > I definitely prefer 1) because I think the wording of the PPC
> > implies that qt-strings are parsed just like the other quoted
> > constructs (look for the end, skip escaped delimiters). It's also
> > consistent with how parsing of quoted constructs has been
> > documented for the past 25+ years
> > (75e14d17912ce8a35d5c2b04c0c6e30b903ab97f in June 1998).
>
> Conversely, when adding a new feature, it's good to remedy any
> shortcomings in existing features. ‘You can embed a code block as an
> expression’ is simpler both to teach and market as a feature than ‘You
> can embed a code block as an expression, unless it happens to include
> the closing character for the surrounding string, in which case it
> might not work.’.
>
> Also, qt is distinct from existing quoting mechanisms in that it's the
> first one whose whole raison d'être is to interpolate a block with
> beginning and ending markers. As such, users may reasonably have
> expectations of what can be in it that isn't particularly influenced
> by what can be done inside other types of quotes.

Yes I'm inclined to agree. The entire point is to embed more complex
code structures and I think this kind of thing would come up often
enough to make people think about it more, as compared to the case in q/qq
strings where it's very rarely of interest. I think about the only time
I'm ever aware of it is if I try to interpolate elements of a hash that
need quoting around the key names; e.g. this won't work:

say "My user name is $data->{"user-name"}";

Whereas my suggested interpretation 2, would permit

say qt"My user name is {$data->{"user-name"}}";
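
As a sketch of the situation in current Perl: the first form fails to
compile because the inner double quote ends the string early, while
the two alternatives below it work. The hash contents are made up for
illustration, and a string eval is used only so the compile failure
can be observed at run time.

```perl
use strict;
use warnings;

my $data = { 'user-name' => 'example' };

# The inner " terminates the string, so this source does not compile.
my $broken   = q{my $data = {}; "My user name is $data->{"user-name"}"};
my $compiles = defined eval $broken;
print "compiles: ", ($compiles ? "yes" : "no"), "\n";

# Two forms that do work today
my $with_single = "My user name is $data->{'user-name'}";
my $with_concat = "My user name is " . $data->{"user-name"};
print "$with_single\n$with_concat\n";
```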

> > Thinking about how qt should be parsed, I think ... that anything
> > outside of {} should be treated as single-quoted strings
>
> I think that is indeed the intent of PPC 0019, which says in the
> Rationale section: “The proposed qt operator only looks for one
> special token in a string literal: {.”
>
> However, I see the equivalence in the Specification section implies
> that the constant text A and B would be subject to qq interpretation.
> That contradiction in the PPC should probably be resolved before
> implementation starts.

Oooh, yes - another fine question that I hadn't spotted first time
around.

> (Technically it's only a contradiction if you read A and B as being
> placeholders for arbitrary strings. If you read them as simply being
> the exact strings 'A ' and ' B' then whether they are treated as q or
> qq strings is irrelevant, since they're the same in both.)
>
> But ... I'm concerned that not having \n interpreted as a line-break
> in qt strings could make them less useful.
...
> It also makes it harder to refactor old code to ‘upgrade’ from qq
> strings to qt strings.
...

Yes, a lot of interesting thoughts there that basically come down to
another set of choices, on how to handle the non-{} parts of the qt
string contents.

I think there's three options here, in order of size

1) Treat the characters exactly like q()

2) Treat the characters like the \X-aware parts of qq() but without
$... and @... interpolations

3) Treat the characters exactly like qq(), including variable
interpolations

Clearly the PPC doesn't intend for option 3 - it shouldn't support full
variable interpolations like qq(). There aren't any examples in the PPC
to distinguish options 1 or 2, but I think the intention of the part
you quote, where it says "equivalent to" an example using qq() instead,
suggests we probably should still support those escapes.

That is, an example like:

print qt(My name is { $self->name }\n);

would emit an actual newline sequence, rather than a literal
backslash-n combination.
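
Since qt() isn't implemented yet, the following is only a sketch of
what that example would be expected to produce under option 2, written
with today's operators. The Person class and the name "Alice" are
invented purely for illustration.

```perl
use strict;
use warnings;

package Person {
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub name { $_[0]{name} }
}

my $self = Person->new(name => "Alice");

# Assumed meaning of  qt(My name is { $self->name }\n)  under option 2:
# the block is evaluated, \n becomes a newline, and nothing else in the
# constant text is interpolated.
my $out = "My name is " . $self->name . "\n";
print $out;
```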

While this does create yet another kind of weird quoting context that
has its own unique rules, that was already the case for qt() the moment
we picked the rules for {}. That brings the count up to at least four
that I can see - q(), qq(), m() and qt().

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On Fri, Jan 12, 2024 at 12:19:21PM +0000, Paul "LeoNerd" Evans wrote:
> On Fri, 12 Jan 2024 11:32:05 +0100 (CET)
> "Smylers  " via perl5-porters <perl5-porters@perl.org> wrote:
>
> > Philippe Bruhat (BooK) writes:
> >
> > > I definitely prefer 1) because I think the wording of the PPC
> > > implies that qt-strings are parsed just like the other quoted
> > > constructs (look for the end, skip escaped delimiters). It's also
> > > consistent with how parsing of quoted constructs has been
> > > documented for the past 25+ years
> > > (75e14d17912ce8a35d5c2b04c0c6e30b903ab97f in June 1998).
> >
> > Conversely, when adding a new feature, it's good to remedy any
> > shortcomings in existing features. ‘You can embed a code block as an
> > expression’ is simpler both to teach and market as a feature than ‘You
> > can embed a code block as an expression, unless it happens to include
> > the closing character for the surrounding string, in which case it
> > might not work.’.

After discussing with Paul during this week's PSC meeting and reading
your email, I agree with you both. 2 is the better solution. There
were a bunch of issues I didn't think through.

The old quoting mechanisms will remain the same (for backwards
compatibility), but they shouldn't hold back a new and better one.

> Yes, a lot of interesting thoughts there that basically come down to
> another set of choices, on how to handle the non-{} parts of the qt
> string contents.
>
> I think there's three options here, in order of size
>
> 1) Treat the characters exactly like q()
>
> 2) Treat the characters like the \X-aware parts of qq() but without
> $... and @... interpolations
>
> 3) Treat the characters exactly like qq(), including variable
> interpolations
>
> Clearly the PPC doesn't intend for option 3 - it shouldn't support full
> variable interpolations like qq(). There aren't any examples in the PPC
> to distinguish options 1 or 2, but I think the intention of the part
> you quote, where it says "equivalent to" an example using qq() instead,
> suggests we probably should still support those escapes.

Yes, supporting existing character escapes (but not interpolation)
outside of the {} is the most useful option.

--
Philippe Bruhat (BooK)

When it is time for voting- / In the West or in the East-
Why must we always settle for- / The man we hate the least?
(Intro poem to Groo The Wanderer #108 (Epic))
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On 12.01.24 13:19, Paul "LeoNerd" Evans wrote:
> Yes, a lot of interesting thoughts there that basically come down to
> another set of choices, on how to handle the non-{} parts of the qt
> string contents.
>
> I think there's three options here, in order of size
>
> 1) Treat the characters exactly like q()
>
> 2) Treat the characters like the \X-aware parts of qq() but without
> $... and @... interpolations
>
> 3) Treat the characters exactly like qq(), including variable
> interpolations
>
> Clearly the PPC doesn't intend for option 3 - it shouldn't support full
> variable interpolations like qq(). There aren't any examples in the PPC
> to distinguish options 1 or 2, but I think the intention of the part
> you quote, where it says "equivalent to" an example using qq() instead,
> suggests we probably should still support those escapes.
>
> That is, an example like:
>
> print qt(My name is { $self->name }\n);
>
> would emit an actual newline sequence, rather than a literal
> backslash-n combination.
>
> While this does create yet another kind of weird quoting context that
> has its own unique rules, that was already the case for qt() the moment
> we picked the rules for {}. That brings the count up to at least four
> that I can see - q(), qq(), m() and qt().

It's not that weird to support backslash escapes, but not $/@
interpolation. IIRC that's exactly how tr/// works.
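
A small sketch of that tr/// behaviour (variable names are only
illustrative; the /r flag needs perl 5.14 or later):

```perl
use strict;
use warnings;

my $x = "a";   # deliberately unused by tr below

# \t in the search list is understood as a tab character
my $tabs = "one\ttwo" =~ tr/\t/ /r;

# $x is NOT interpolated: the search list is the two characters $ and x
my $literal = 'a$x' =~ tr/$x/_/r;

print "$tabs\n$literal\n";
```

The second result is "a__": the "a" is left alone, showing that the
value of $x played no part in the search list.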

Also, you might want to exclude \Q, \L, \l, \U, \u, \F, \E from the set
of supported backslash escapes because they represent dynamic string
transformations that don't really make sense if variable interpolation
isn't available (and their parsing rules are a mess, which is why I
didn't bother in Quote::Code).
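
To illustrate why those escapes are tied to interpolation, here is a
short sketch of how two of them behave in ordinary qq strings (the
variable names are only illustrative):

```perl
use strict;
use warnings;

my $name = "perl";

# \U ... \E upper-cases whatever the enclosed text interpolates to
my $upper = "Hello \U$name\E!";

# \Q ... \E applies quotemeta to the interpolated result
my $quoted = "\Q$name.pm\E";

print "$upper\n$quoted\n";
```

In both cases the transformation applies to the value the variable
holds at run time, which is why there is little for these escapes to
do in a string form without $/@ interpolation.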
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On Fri, 12 Jan 2024 19:37:20 +0100
Lukas Mai <lukasmai.403+p5p@gmail.com> wrote:

> Also, you might want to exclude \Q, \L, \l, \U, \u, \F, \E from the
> set of supported backslash escapes because they represent dynamic
> string transformations that don't really make sense if variable
> interpolation isn't available (and their parsing rules are a mess,
> which is why I didn't bother in Quote::Code).

Yup, I was already thinking exactly that :)

I'll write up some more notes on the PPC doc and send it as a PR I
think.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Design question on PPC 0019 "quoted template strings" - To Sublex or not to Sublex
On Fri, 12 Jan 2024 21:01:02 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> I'll write up some more notes on the PPC doc and send it as a PR I
> think.

I've written a PR to clarify the rules about escapes in quoting.
Comments/votes welcome:

https://github.com/Perl/PPCs/pull/48

(/cc all the folks who have commented thusly)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/