Mailing List Archive: f-strings in the grammar

f-strings in the grammar

pablogsal at gmail

Sep 20, 2021, 4:18 AM

Post #1 of 32 (2052 views)

Hi,

I have started a project to move the parsing of f-strings to the parser and
the grammar. Apart
from some maintenance improvements (we can drop a considerable amount of
hand-written code),
there are some interesting things we **could** (emphasis on could) get out
of this and I wanted
to discuss what people think about them.

* The parser will likely allow "\n" characters and backslashes in f-string
expressions, which is currently impossible:

>>> f"blah blah {'\n'} blah"
File "<stdin>", line 1
f"blah blah {'\n'} blah"
^
SyntaxError: f-string expression part cannot include a backslash

* The parser will allow nesting quote characters. This means that we
**could** allow reusing the same quote type in nested expressions
like this:

f"some text { my_dict["string1"] } more text"

* The parser will naturally allow more control over error messages and AST
positions.

* The **grammar** of the f-string will be fully specified without
ambiguity. Currently, the "grammar" that we have in the docs
(https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals)
is not really a formal grammar: not only does it mix lexing details with
grammar details (the definition of "literal_char"), but it is also not
compatible with the current Python lexing scheme. For instance, it
recognizes "{{" as its own token, which the language doesn't allow because
something like "{{a:b}:c}" is tokenized as "{", "{", "a" ... not as "{{", "a".
Adding a formal grammar could help syntax highlighters, IDEs, parsers and
other tools to make sure they properly recognize everything that there is.
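
To make the tokenization point concrete, here is a minimal sketch using the standard library `tokenize` module. It tokenizes "{{a:b}:c}" as ordinary source code (not inside an f-string), and the helper name `token_strings` is only illustrative, not something from the branch:

```python
import io
import tokenize


def token_strings(source: str) -> list[str]:
    """Return the text of every operator and name token in `source`."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.OP, tokenize.NAME)
    ]


# Each brace is its own token; there is no "{{" token.
print(token_strings("{{a:b}:c}\n"))
```

The two opening braces come out as separate "{" tokens, which is the mismatch with the documented "grammar" described above.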

There may be some other advantages that we have not explored yet.

The work is at a point where the main idea works (all the grammar is
already there and working), but we need to make sure that all existing
errors and specifics are properly ported to the new code, which is still a
considerable amount of work, so I wanted to make sure we are on the
same page before we decide to invest more time in this (Batuhan is helping
me with this and Lysandros will likely join us). We are doing
this work in this branch:
https://github.com/we-like-parsers/cpython/blob/fstring-grammar

Tell me what you think.

P.S. If you are interested in helping with this project, please reach out to
me. If we decide to go ahead we can use your help! :)

Regards from cloudy London,
Pablo Galindo Salgado

Re: f-strings in the grammar

Erlend-A at innova

Sep 20, 2021, 5:13 AM

Post #2 of 32 (2052 views)

On 20 Sep 2021, at 13:18, Pablo Galindo Salgado <pablogsal@gmail.com> wrote:

We are doing this work in this branch: https://github.com/we-like-parsers/cpython/blob/fstring-grammar

That link is broken. Assuming you mean https://github.com/we-like-parsers/cpython/tree/fstring-grammar?

E

Re: f-strings in the grammar

storchaka at gmail

Sep 20, 2021, 5:46 AM

Post #3 of 32 (2052 views)

20.09.21 14:18, Pablo Galindo Salgado wrote:
> * The parser will likely have "\n" characters and backslashes in
> f-strings expressions, which currently is impossible:

What about characters "\x7b", "\x7d", "\x5c", etc?

What about newlines in single quotes? Currently this works:

f'''{1 +
2}'''

But this does not:

f'{1 +
2}'

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IJEJ5UVVKHEH6QGXZ3LONZPAITNMBULL/
Code of Conduct: http://python.org/psf/codeofconduct/

Re: f-strings in the grammar

pablogsal at gmail

Sep 20, 2021, 6:06 AM

Post #4 of 32 (2052 views)

>
> What about characters "\x7b", "\x7d", "\x5c", etc?
> What about newlines in single quotes? Currently this works:

This is from the current branch:

>>> f"ble { '\x7b' }"
'ble {'

>>> f"{1 +
... 2}"
'3'

>>> f'{1 +
... 2}'
'3'
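
For anyone who wants to check how their own interpreter treats these cases, a small probe along the following lines might help. It is only a sketch: `accepts` and `CANDIDATES` are made-up names, and the result depends entirely on which Python build runs it, so no expected output is given.

```python
def accepts(source: str) -> bool:
    """Return True if this interpreter can compile `source` as an expression."""
    try:
        compile(source, "<probe>", "eval")
    except SyntaxError:
        return False
    return True


# Snippets taken from this thread. They are only compiled, never evaluated,
# so names such as my_dict do not need to exist.
CANDIDATES = {
    "backslash in expression": r'''f"blah blah {'\n'} blah"''',
    "newline in single-quoted f-string": "f'{1 +\n2}'",
    "reused quote type": r'''f"some text { my_dict["string1"] } more text"''',
}

for label, snippet in CANDIDATES.items():
    print(f"{label}: {'accepted' if accepts(snippet) else 'rejected'}")
```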


Re: f-strings in the grammar

tjreedy at udel

Sep 20, 2021, 8:19 AM

Post #5 of 32 (2052 views)

On 9/20/2021 7:18 AM, Pablo Galindo Salgado wrote:

> there are some interesting things we **could** (emphasis on could) get
> out of this and I wanted
> to discuss what people think about them.
>
> * The parser will allow nesting quote characters. This means that we
> **could** allow reusing the same quote type in nested expressions
> like this:
>
> f"some text { my_dict["string1"] } more text"

I believe that this will disable regex-based processing, such as syntax
highlighters, as in IDLE. I also think that it will be sometimes
confusing to human readers.
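
As an illustration of the concern, here is a deliberately naive regex of the kind a simple highlighter might use for double-quoted strings. It is not IDLE's actual pattern, and `DOUBLE_QUOTED` and `find_strings` are hypothetical names:

```python
import re

# A double quote, then any run of non-quote characters or backslash escapes,
# then a closing double quote. No knowledge of f-string braces at all.
DOUBLE_QUOTED = re.compile(r'"(?:[^"\\\n]|\\.)*"')


def find_strings(source: str) -> list[str]:
    """Return every span the naive pattern considers a string literal."""
    return DOUBLE_QUOTED.findall(source)


LINE = 'f"some text { my_dict["string1"] } more text"'
print(find_strings(LINE))
```

The pattern reports two separate strings and leaves string1 outside both, so a tool built on it would colour the proposed syntax incorrectly.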

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TWSJKE4KKSW7YD3OCHKGKJC52VUG6FY5/

Re: f-strings in the grammar

tjreedy at udel

Sep 20, 2021, 8:21 AM

Post #6 of 32 (2052 views)

On 9/20/2021 8:46 AM, Serhiy Storchaka wrote:
> 20.09.21 14:18, Pablo Galindo Salgado wrote:
>> * The parser will likely have "\n" characters and backslashes in
>> f-strings expressions, which currently is impossible:
>
> What about characters "\x7b", "\x7d", "\x5c", etc?
>
> What about newlines in single quotes? Currently this works:
>
> f'''{1 +
> 2}'''
>
> But this does not:
>
> f'{1 +
> 2}'

The latter is an error with or without the 'f' prefix and I think that
this should continue to be the case.

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Z2IQGYH77V72D7TEDMIFWOHTN4MHKIAB/

Re: f-strings in the grammar

eric at trueblade

Sep 20, 2021, 8:34 AM

Post #7 of 32 (2052 views)

On 9/20/2021 11:21 AM, Terry Reedy wrote:
> On 9/20/2021 8:46 AM, Serhiy Storchaka wrote:
>> 20.09.21 14:18, Pablo Galindo Salgado wrote:
>>> * The parser will likely have "\n" characters and backslashes in
>>> f-strings expressions, which currently is impossible:
>>
>> What about characters "\x7b", "\x7d", "\x5c", etc?
>>
>> What about newlines in single quotes? Currently this works:
>>
>> f'''{1 +
>> 2}'''
>>
>> But this does not:
>>
>> f'{1 +
>> 2}'
>
> The latter is an error with or without the 'f' prefix and I think that
> this should continue to be the case.
>
The thought is that anything that's within braces {} and is a valid
expression should be allowed. Basically, the opening brace puts you in
"parse expression" mode. Personally, I'd be okay with this particular
change.

Eric

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QWCISJYDR6LYXOD4DAKUTA3EYV3XQQIM/

Re: f-strings in the grammar

eric at trueblade

Sep 20, 2021, 8:48 AM

Post #8 of 32 (2052 views)

On 9/20/2021 11:19 AM, Terry Reedy wrote:
> On 9/20/2021 7:18 AM, Pablo Galindo Salgado wrote:
>
>> there are some interesting things we **could** (emphasis on could)
>> get out of this and I wanted
>> to discuss what people think about them.
>>
>> * The parser will allow nesting quote characters. This means that we
>> **could** allow reusing the same quote type in nested expressions
>> like this:
>>
>> f"some text { my_dict["string1"] } more text"
>
> I believe that this will disable regex-based processing, such as
> syntax highlighters, as in IDLE. I also think that it will be
> sometimes confusing to human readers.

When I initially wrote f-strings, it was an explicit design goal to be
just like existing strings, but with a new prefix. That's why there are
all of the machinations in the parser for scanning within f-strings: the
parser had already done its duty, so there needed to be a separate stage
to decode inside the f-strings. Since they look just like regular
strings, most tools could add the lowest possible level of support just
by adding 'f' to existing prefixes they support: 'r', 'b', 'u'. The
upside is that if you don't care about what's inside an f-string, your
work is done.

I definitely share your concern about making f-strings more complicated
to parse for tool vendors: basically all editors, alternative
implementations, etc.: really anyone who parses python source code. But
maybe we've already crossed this bridge with the PEG parser. Although I
realize there's a difference between lexing and parsing. While the PEG
parser just makes parsing more complicated, this change would make what
was lexing into a more sophisticated parsing problem.

In 2018 or 2019 at PyCon in Cleveland I talked to several tool vendors.
It's been so long ago that I don't remember who, but I'm pretty sure it
was PyCharm and 2 or 3 other editors. All of them supported making this
change, even understanding the complications it would cause them. I
don't recall if I talked to anyone who maintains an alternative
implementation, but we should probably discuss it with MicroPython,
Cython, PyPy, etc., and understand where they stand on it.

In general I'm supportive of this change, because as Pablo points out
there are definite benefits. But I think if we do accept it we should
understand what sort of burden we're putting on tool and implementation
authors. It would probably be a good idea to discuss it at the upcoming
dev sprints.

Eric

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DN5HB7CBS7I2FXI74UBM4ZZVMSNVDQ57/

Re: f-strings in the grammar

guido at python

Sep 20, 2021, 8:53 AM

Post #9 of 32 (2052 views)

The current restrictions will also confuse some users (e.g. those used to
bash, and IIRC JS, where the rules are similar to what Pablo is proposing).


--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>

Re: f-strings in the grammar

tagrain at gmail

Sep 20, 2021, 8:54 AM

Post #10 of 32 (2052 views)

I don't think the Python syntax should be beholden to syntax highlighting tools. Eventually, some syntax feature that PEG enables will require every parser or highlighter to switch to a similar or more powerful parsing tool.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FI6QGN5GJJMVXOEM3VDZ7CAKIEUU2S4R/

Re: f-strings in the grammar

pablogsal at gmail

Sep 20, 2021, 8:56 AM

Post #11 of 32 (2052 views)

Thanks a lot, Eric, for your message! I actually share some of these worries
myself, and that's why I wanted to have a bigger conversation.

I also wanted to make clear that the change doesn't force us to do
*everything*. This means that we can absolutely have some of the
improvements but not others (for example, allowing backslashes but not
nesting). So it is important to be clear that this is not "all or nothing".
We just need to decide what set of things in the design space we want :)


Re: f-strings in the grammar

Sep 20, 2021, 9:03 AM

Post #12 of 32 (2052 views)

> The current restrictions will also confuse some users (e.g. those used to bash, and IIRC JS, where the rules are similar to what Pablo is proposing).
> --
> --Guido van Rossum (python.org/~guido <http://python.org/~guido>)

WRT the similar syntax in bash (and similar shells), there are two options:

"string `code` string"

"string $(code) string"

The latter, $(), allows fully-featured nesting in the way Pablo is suggesting:

"string $(code "string2 $(code2) string2" code) string"

The former, using backticks, does not allow nesting directly, but it allows extra backslashes inside the backticks to escape the nested ones, like this:

"string `code "string2 \`code2\` string2" code` string"

This can be nested infinitely using lots of backslashes. Is this worth considering as another option? It doesn't have the disadvantage of complicating lexing (as much), although nesting with backslashes is quite ugly. IMO nesting things in f-strings would be ugly anyway, so I don't think that would matter too much.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/R5NNGXYOU74VEXCBF7API7EFRGLN7MWJ/

Re: f-strings in the grammar

ucodery at gmail

Sep 20, 2021, 9:26 AM

Post #13 of 32 (2052 views)

I just want to say that I am very excited to see where this goes. As an
author of a package that tries to recreate compiled f-strings at runtime,
they are a hard thing to generate given the current tools within Python.

On Mon, Sep 20, 2021 at 4:23 AM Pablo Galindo Salgado <pablogsal@gmail.com>
wrote:

>
> Tell me what you think.
>
> P.S. If you are interested to help with this project, please reach out to
> me. If we decide to go ahead we can use your help! :)
>

I don't know the CPython API very well, but if there is anything I can do
to help, I would be happy to assist.

Regards,
Jeremiah

Re: f-strings in the grammar

tjreedy at udel

Sep 20, 2021, 10:53 AM

Post #14 of 32 (2052 views)

On 9/20/2021 11:48 AM, Eric V. Smith wrote:
>
> When I initially wrote f-strings, it was an explicit design goal to be
> just like existing strings, but with a new prefix. That's why there are
> all of the machinations in the parser for scanning within f-strings: the
> parser had already done its duty, so there needed to be a separate stage
> to decode inside the f-strings. Since they look just like regular
> strings, most tools could add the lowest possible level of support just
> by adding 'f' to existing prefixes they support: 'r', 'b', 'u'.

Which is what I did with IDLE. Of course 'just add' was complicated by
uppercase being allowed and 'f' being compatible with 'r' but not 'u' or
'b'.
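
To show why "just add 'f'" is less trivial than it sounds, here is a rough sketch that enumerates the legal prefixes (any case, 'f' combinable with 'r' but not with 'u' or 'b') and builds a matcher from them. This is not IDLE's code; `string_prefixes`, `STRING_START` and `starts_string` are names invented for the example.

```python
import itertools
import re


def string_prefixes() -> set[str]:
    """Enumerate every legal non-empty string prefix, in every order and case."""
    base = ["b", "r", "u", "f", "br", "fr"]
    prefixes = set()
    for prefix in base:
        for ordering in itertools.permutations(prefix):
            # Each letter may independently be lower or upper case.
            for cased in itertools.product(*[(ch, ch.upper()) for ch in ordering]):
                prefixes.add("".join(cased))
    return prefixes


PREFIXES = string_prefixes()

# An optional prefix that must be followed immediately by a quote character.
STRING_START = re.compile(
    "(?:" + "|".join(sorted(PREFIXES, key=len, reverse=True)) + ")?(?=['\"])"
)


def starts_string(text: str) -> bool:
    """Return True if `text` begins with a (possibly prefixed) string literal."""
    return STRING_START.match(text) is not None


print(len(PREFIXES), starts_string('Rf"x"'), starts_string('uf"x"'))
```

Even this only locates where a string starts; it says nothing about what is inside the braces, which is the part the proposal would change.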

> I definitely share your concern about making f-strings more complicated
> to parse for tool vendors: basically all editors, alternative
> implementations, etc.: really anyone who parses python source code. But
> maybe we've already crossed this bridge with the PEG parser.

I think we are on the far side of the bridge with contextual keywords.
I don't believe the new code for highlighting the new match statement is
exactly correct. As I remember, properly classifying '_' in all the
examples we created was too difficult, and maybe not possible.

> Although I
> realize there's a difference between lexing and parsing. While the PEG
> parser just makes parsing more complicated, this change would make what
> was lexing into a more sophisticated parsing problem.

I have no love for the RE code. I would try ast.parse if I was not sure
it would be too slow. I would be happy if a simplified and fast minimal
lexer/parser were added for everyone to use. It would not have to make
exactly the same distinctions that IDLE currently does.

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SAYU6SMP4KT7G7AQ6WVQYUDOSZPKHJMS/

Re: f-strings in the grammar

guido at python

Sep 20, 2021, 1:25 PM

Post #15 of 32 (2052 views)

On Mon, Sep 20, 2021 at 1:07 PM Patrick Reader <_@pxeger.com> wrote:

> > The current restrictions will also confuse some users (e.g. those used
> to bash, and IIRC JS, where the rules are similar to what Pablo is
> proposing).
> > --
> > --Guido van Rossum (python.org/~guido <http://python.org/~guido>)
>
> WRT the similar syntax in bash (and similar shells), there are two options:
>
> "string `code` string"
>
> "string $(code) string"
>
> The latter, $(), allows fully-featured nesting in the way Pablo is
> suggesting:
>
> "string $(code "string2 $(code2) string2" code) string"
>
> The former, using backticks, does not allow nesting directly, but it
> allows extra backslashes inside the backticks to escape the nested ones,
> like this:
>
> "string `code "string2 \`code2\` string2" code` string"
>
> This can be nested infinitely using lots of backslashes. Is this worth
> considering as another option? It doesn't have the disadvantage of
> complicating lexing (as much), although nesting with backslashes is quite
> ugly. IMO nesting things in f-strings would be ugly anyway, so I don't
> think that would matter too much.
>

F-strings are more like $(...), since the interpolation syntax uses {...}
delimiters. So it probably should work that way. JS interpolation works
that way too, see
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#nesting_templates
.

I wouldn't want to do anything to bring `backticks` back in the language.

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>

Re: f-strings in the grammar

brett at python

Sep 20, 2021, 2:36 PM

Post #16 of 32 (2051 views)

On Mon, Sep 20, 2021 at 8:58 AM Thomas Grainger <tagrain@gmail.com> wrote:

> I don't think the Python syntax should be beholden to syntax highlighting
> tools. Eventually, some syntax feature that PEG enables will require every
> parser or highlighter to switch to a similar or more powerful parsing tool.
>

But that's not how syntax highlighting works in editors. You typically
don't get to choose the parsing tool used for syntax highlighting, you just
define the grammar using whatever is provided by the editor (which has
always been regexes based on my experience). So there's no way to "require"
every editor out there to switch to a PEG parser or equivalent to support
Python's grammar because that's asking every editor to change how syntax
highlighting is implemented at a lower level.

Having said all that, I think as long as we understand that this is a
side-effect then it's fine; syntax highlighting is usually not tied to
semantics in an editor so it shouldn't be a blocker on this. If people care
they simply won't use the same type of quotes in their code (which I bet is
what most people will do unless Black says otherwise).

But I also think this means we definitely have to get a parser module for
tools together as this is way more potential breakage than just parentheses
for `with` statements and I don't know if formatting tools can just move to
the AST module at that point.

Re: f-strings in the grammar

pablogsal at gmail

Sep 20, 2021, 2:50 PM

Post #17 of 32 (2051 views)

>> But I also think this means we definitely have to get a parser module

What is a "parser module" in this context? Because that will massively
change depending on who you ask. We already expose APIs that return AST
objects that can be used for all sorts of things and a tokenizer module that
exposes some form of lexing that is relatively close to the one that
CPython uses internally. The only missing piece would be something that
returns a CST with enough information to reconstruct the source, but at this
point that is absolutely arbitrary because nothing in CPython would use
that tree. Not only that, but the requirements for such a CST will change
quite a lot depending on who you ask, and that has a big impact on the APIs
that we would need to offer.

Offering a parser module here can involve quite a high maintenance cost
without the certainty that it will be useful to all sets of users.

That is also without considering that many tools parsing Python code are not
written in Python and will not be able to leverage it.
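
As a small illustration of what the existing AST API already gives tools for f-strings, consider the sketch below. It relies only on the public `ast` module (`ast.unparse` needs Python 3.9 or later); `describe_fstring` is a hypothetical helper written for this example.

```python
import ast


def describe_fstring(source: str) -> list[str]:
    """List the literal and expression parts of a single f-string expression."""
    tree = ast.parse(source, mode="eval")
    joined = tree.body
    if not isinstance(joined, ast.JoinedStr):
        raise ValueError("expected a single f-string expression")
    parts = []
    for node in joined.values:
        if isinstance(node, ast.Constant):
            parts.append(f"literal {node.value!r}")
        elif isinstance(node, ast.FormattedValue):
            parts.append(f"expression {ast.unparse(node.value)}")
    return parts


print(describe_fstring('f"some text {x + 1} more text"'))
```

The tree exposes the structure but not the original quote style or spacing, which is the kind of information a CST would have to carry.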


Re: f-strings in the grammar

stephenjturnbull at gmail

Sep 20, 2021, 6:36 PM

Post #18 of 32 (2051 views)

Eric V. Smith writes:

> >> But this does not:
> >>
> >> f'{1 +
> >> 2}'
> >
> > The latter is an error with or without the 'f' prefix and I think that
> > this should continue to be the case.
> >
> The thought is that anything that's within braces {} and is a valid
> expression should be allowed.

-0 FWIW, some thoughts specific to me, I don't know how
representative they might be of others.

I guess you could argue that the braces are a kind of expression-level
parenthesis, but I don't "see" them that way. I see *one* string with
eval'able format expressions embedded in it, so that single-quoted
strings can't have embedded newlines. I also don't see the braces as
expression-level syntax (after all, they already have two different
meanings at expression level), I see them as part of f-string syntax.
So even with triple-quoted strings, my eyes "want" to see parentheses
or line continuation (which already work).

I'm sure I could get used to the syntax. But ...

Is this syntax useful? Or is it just a variant of purity trying to
escape Pandora's virtualbox? I mean, am I going to see it often
enough to get used to it? Or am I going to WTF at it for the rest of
my life?
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RPFHA55JDGX522UL2KXIRZKDPIOVDP66/

Re: f-strings in the grammar

david.mertz at gmail

Sep 20, 2021, 6:58 PM

Post #19 of 32 (2051 views)

I know I'm strongly -1 on allowing much more than currently exists for
f-strings. For basically the same reason Stephen explains.

Newlines inside braces, for example, go way too far away from readability.
Nested expressions also feel like an attractive nuisance. I use f-strings
all the time, but in much the same way that a thousand-character regular
expression is an abuse (even if perfectly well defined grammatically),
really complex f-strings look and feel much the same.


Re: f-strings in the grammar [ In reply to ]

guido at python

Sep 20, 2021, 7:04 PM

Post #20 of 32 (2051 views)

[Stephen J. Turnbull]

> Is this syntax useful? Or is it just a variant of purity trying to
> escape Pandora's virtualbox? I mean, am I going to see it often
> enough to get used to it? Or am I going to WTF at it for the rest of
> my life?
>

I don't know about the line breaks, but in recent weeks I've found myself
more than once having to remind myself that inside interpolations, you must
use the other type of quote. Things like

print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}")

Learning that inside {} you can write any expression is easy (it's a real
Aha! moment -- that's the power of f-strings). Remembering that you have to
switch up the quote characters there is hard -- it doesn't occur very
often, and the reason is obscure. By the time my mental parser has made it
to the argument list of removesuffix() it has already forgotten that it's
inside an f-string and my fingers just reach for my favorite quote
character.

And the error isn't really helping either:
```
>>> print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}")
File "<stdin>", line 1
print(f"{source.removesuffix(".py")}.c: $(srcdir)/{source}")
^
SyntaxError: f-string: unmatched '('
```
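
A small sketch for checking how a given interpreter treats the two spellings. The helper `compiles` is a hypothetical name introduced here, and the f-strings are only compiled, never evaluated, so `source` does not need to exist. As far as I know, reusing the quote type was later accepted in Python 3.12 (PEP 701); treat that version as an assumption and let the check report what your interpreter actually does.

```python
import sys


def compiles(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Same quote type reused inside the replacement field (the case above).
reused = 'f"{source.removesuffix(".py")}.c"'
# Workaround: switch to the other quote type inside the braces.
switched = "f\"{source.removesuffix('.py')}.c\""

print(sys.version_info[:2], compiles(reused), compiles(switched))
```

The switched form compiles on every version that has f-strings; only the reused form depends on the parser changes discussed in this thread.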

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>

Re: f-strings in the grammar [ In reply to ]

stephenjturnbull at gmail

Sep 20, 2021, 10:09 PM

Post #21 of 32 (2051 views)

Guido van Rossum writes:

> I don't know about the line breaks, but in recent weeks I've found myself
> more than once having to remind myself that inside interpolations, you must
> use the other type of quote.

My earlier remarks were specifically directed to line breaks.

I see the point, but I think the question should be readability, as
David points out. I don't think there's a problem with the opening
quote in your example. Even in an ordinary string literal it's
obvious to me that the embedded quotation marks are not intended to
terminate the string:

s = "Here is a singleton " and here is an initial for "something."

But how about that last quotation mark? I tried to construct a
similarly visually ambiguous f-string where braces "hide" the embedded
quotation marks, and couldn't do it without a trailing quote followed
immediately by an embedded literal line break.

So I'm cautiously sympathetic to this extension, as long as embedded
line breaks are not permitted in singly-quoted f-strings.

However, I myself will almost certainly automatically "correct" such
quotation marks if they are allowed. So this is unlikely to be a plus
or a minus for me.

Steve

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZILJFTV6UXO63F76PSY6VCPNGTLMYIMR/
Code of Conduct: http://python.org/psf/codeofconduct/

Re: f-strings in the grammar [ In reply to ]

Sep 20, 2021, 11:50 PM

Post #22 of 32 (2051 views)

I didn't mean to bring back backticks, but to use the semantics they have in shell languages of using backslashes to escape nested substitutions, like this:

f"string {code f\"string2 \{code2\} string2\" code} string"

Upon reflection though, I agree that since we already use brackets which lend themselves to nesting, it probably does make more sense to use them for nesting.

On 20/09/2021 21:25, Guido van Rossum wrote:
> On Mon, Sep 20, 2021 at 1:07 PM Patrick Reader <_@pxeger.com <http://pxeger.com>> wrote:
>
> > The current restrictions will also confuse some users (e.g. those used to bash, and IIRC JS, where the rules are similar to what Pablo is proposing).
> > --
> > --Guido van Rossum (python.org/~guido <http://python.org/~guido> <http://python.org/~guido>)
>
> WRT the similar syntax in bash (and similar shells), there are two options:
>
> "string `code` string"
>
> "string $(code) string"
>
> The latter, $(), allows fully-featured nesting in the way Pablo is suggesting:
>
> "string $(code "string2 $(code2) string2" code) string"
>
> The former, using backticks, does not allow nesting directly, but it allows extra backslashes inside the backticks to escape the nested ones, like this:
>
> "string `code "string2 \`code2\` string2" code` string"
>
> This can be nested infinitely using lots of backslashes. Is this worth considering as another option? It doesn't have the disadvantage of complicating lexing (as much), although nesting with backslashes is quite ugly. IMO nesting things in f-strings would be ugly anyway, so I don't think that would matter too much.
>
>
> F-strings are more like $(...), since the interpolation syntax uses {...} delimiters. So it probably should work that way. JS interpolation works that way too, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#nesting_templates .
>
> I wouldn't want to do anything to bring `backticks` back in the language.
>
> --
> --Guido van Rossum (python.org/~guido <http://python.org/~guido>)
> /Pronouns: he/him //(why is my pronoun here?)/ <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7COOVJPGJMDLYRS2WNQZMMOGVMBJBQFK/
Code of Conduct: http://python.org/psf/codeofconduct/

Re: f-strings in the grammar [ In reply to ]

ajm at flonidan

Sep 21, 2021, 4:47 AM

Post #23 of 32 (2051 views)

Pablo Galindo Salgado [mailto:pablogsal@gmail.com] wrote:
> We already expose APIs that return AST objects that can be used for all sorts of things and a tokenizer module that exposes some form of lexing that is relatively close to the one that CPython uses internally.

What do you envision tokenize.py will do with f-strings after this?
What would be the output of, say,
$ echo 'f"hello {world!r}."' | python3 -m tokenize
?

regards, Anders

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XXHWMINTPOLHLECS7BSHZPOC7RRN47T2/
Code of Conduct: http://python.org/psf/codeofconduct/

Re: f-strings in the grammar [ In reply to ]

pablogsal at gmail

Sep 21, 2021, 4:51 AM

Post #24 of 32 (2051 views)

>> What do you envision tokenize.py will do with f-strings after this?

It will emit new tokens: FSTRING_START FSTRING_MIDDLE '{' NAME
FSTRING_FORMAT '}' FSTRING_END
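
To see what a particular interpreter emits today, a minimal sketch using the standard `tokenize` module on Anders's example. It makes no assumption about the result: older versions report the whole literal as one STRING token, and the token names in a finished implementation may differ from the list sketched above.

```python
import io
import tokenize

# The same source line as in the shell example, fed through tokenize.
source = 'f"hello {world!r}."\n'
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok_name maps the numeric token type to its symbolic name.
    print(tokenize.tok_name[tok.type], repr(tok.string))
```

This is equivalent to the `python3 -m tokenize` invocation, just without the position columns.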

On Tue, 21 Sept 2021 at 12:50, Anders Munch <ajm@flonidan.dk> wrote:

> Pablo Galindo Salgado [mailto:pablogsal@gmail.com] wrote:
> > We already expose APIs that return AST objects that can be used for all
> sorts of things and a tokenizer module that exposes some form of lexing that
> is relatively close to the one that CPython uses internally.
>
> What do you envision tokenize.py will do with f-strings after this?
> What would be the output of, say,
> $ echo 'f"hello {world!r}."' | python3 -m tokenize
> ?
>
> regards, Anders

Re: f-strings in the grammar [ In reply to ]

eric at trueblade

Sep 21, 2021, 11:42 AM

Post #25 of 32 (2051 views)

To bring this back on track, I'll try and answer the questions from your
original email.

On 9/20/2021 7:18 AM, Pablo Galindo Salgado wrote:
> I have started a project to move the parsing of f-strings to the parser
> and the grammar. Apart
> from some maintenance improvements (we can drop a considerable
> amount of hand-written code),
> there are some interesting things we **could** (emphasis on could) get
> out of this and I wanted
> to discuss what people think about them.

I think this is all awesome.

My position is that if we make zero syntactic changes to f-strings, and
leave the functionality exactly as it is today, I think we should still
move the logic into the parser and grammar, as you suggested. As you
say, this would eliminate a lot of code, and in addition likely get us
better error messages. As for the things we could possibly add:

> * The parser will likely have "\n" characters and backslashes in
> f-strings expressions, which currently is impossible:
>
> >>> f"blah blah {'\n'} blah"
> File "<stdin>", line 1
> f"blah blah {'\n'} blah"
> ^
> SyntaxError: f-string expression part cannot include a backslash

I think supporting backslashes in strings inside f-string expressions
(the part inside {}) would be a big win, and should be the first thing
we allow. I often have to do this:

nl = '\n'
x = f"blah {nl if condition else ' '}"

Being able to write this more naturally would be a big win.
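
A runnable sketch of the workaround next to the direct spelling. The `chr(10)` variant and the helper `compiles` are additions for illustration, not part of the original message, and `condition` is given a value only so the snippet executes. The direct form is compiled from a string so that the snippet itself runs whether or not the interpreter accepts a backslash inside the braces.

```python
condition = True

# Workaround from the message above: bind the newline to a name first.
nl = "\n"
x = f"blah {nl if condition else ' '}"

# Alternative without a helper name: chr(10) is the newline character.
y = f"blah {chr(10) if condition else ' '}"


def compiles(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# The "natural" spelling, kept in a raw string so the backslash reaches
# the compiler unchanged.
direct = r'''f"blah {'\n' if condition else ' '}"'''

print(x == y, compiles(direct))
```

Both workarounds produce the same string; the last value printed shows whether the running interpreter already accepts the direct form.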

I don't recall exactly why, but I disallowed backslashes inside
expressions at the last minute before 3.6 was released. It might have
been because I was interpreting them in a way that didn't make sense if
a "real" parser were inspecting f-strings. The idea, even back then, was
to re-allow them when/if we moved f-string parsing into the parser
itself. I think it's time.

> * The parser will allow nesting quote characters. This means that we
> **could** allow reusing the same quote type in nested expressions
> like this:
>
> f"some text { my_dict["string1"] } more text"

I'm okay with this, with the caveat that I raised in another email: the
effect on non-Python tools and alternate Python implementations. To
restate that here: as long as we survey some (most?) of the affected
parties and they're okay with it (or at least it doesn't cause them a
gigantic amount of work), then I'm okay with it. This will of course be
subjective. My big concern is tools that today use regexes (or similar)
to recognize f-strings, and then completely ignore what's inside them.
They just want to "skip over" f-strings in the source code, maybe
because they're doing some sort of source-to-source transpiling, and
they're just going to output the f-strings as-is. It seems to me we're
creating a lot of work for such tools. Are there a lot of such tools? I
don't know: maybe there are none.
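
To make the concern concrete, a sketch of the kind of pattern such a tool might use. The pattern `NAIVE_FSTRING` is hypothetical and not taken from any real tool: it treats an f-string as an `f` prefix, a double quote, and everything up to the next unescaped double quote.

```python
import re

# Hypothetical "skip over the f-string" pattern: an f prefix, a double
# quote, then everything up to the next unescaped double quote.
NAIVE_FSTRING = re.compile(r'f"(?:[^"\\\n]|\\.)*"')

# Today's spelling: the inner string uses the other quote type.
current_style = '''f"some text { my_dict['string1'] } more text"'''
# Proposed spelling: the same quote type is reused inside the braces.
proposed_style = 'f"some text { my_dict["string1"] } more text"'

current_match = NAIVE_FSTRING.match(current_style).group(0)
proposed_match = NAIVE_FSTRING.match(proposed_style).group(0)

print(current_match == current_style)  # True: the whole literal is skipped
print(proposed_match)                  # f"some text { my_dict["
```

With the proposed spelling the pattern stops at the first inner quote, so a tool built this way would need to track brace nesting to find the end of the literal.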

> * The parser will naturally allow more control over error messages and
> AST positions.

This would be a good win.

> * The **grammar** of the f-string will be fully specified without
> ambiguity. Currently, the "grammar" that we have in the docs
> (https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals)
> is not really a formal grammar, because
> it not only mixes lexing details with grammar details (the definition
> of "literal_char") but is also incompatible with the current Python
> lexing scheme (for instance, it recognizes "{{" as its own token,
> which the language doesn't allow, because something like "{{a:b}:c}"
> is tokenized as "{", "{", "a" ... not as "{{", "a"). Adding a formal
> grammar could help syntax highlighters, IDEs, parsers and other tools
> to make sure they properly recognize everything that there is.

Also a big win.

> There may be some other advantages that we have not explored still.
>
> The work is at a point where the main idea works (all the grammar is
> already there and working), but we need to make sure that all existing
> errors and specifics are properly ported to the new code, which is a
> considerable amount of work still so I wanted to make sure we are on the
> same page before we decide to invest more time on this (Batuhan is
> helping me with this and Lysandros will likely join us). We are doing
> this work in this branch:
> https://github.com/we-like-parsers/cpython/blob/fstring-grammar
>
> Tell me what you think.
>
> P.S. If you are interested in helping with this project, please reach out
> to me. If we decide to go ahead we can use your help! :)

I'm interested in helping.

Thanks for your work on this.

Eric