Mailing List Archive: PEP 622 aspects

Jul 18, 2020, 10:57 AM

Post #1 of 3 (161 views)

PEP 622 authors,

Overall, the PEP describes the proposal quite nicely. However, I do indeed
have concerns and questions, some of which I describe in this email.

(1) Class pattern that does isinstance and nothing else.

If I understand the proposed semantics correctly, `Class()` is equivalent
to checking `isinstance(obj, Class)`, also when `__match_args__` is not
present. However, if a future match protocol is allowed to override this
behavior to mean something else, for example `Class() == obj`, then the
plain isinstance checks won't work anymore! I do find `Class() == obj` to
be a more intuitive and consistent meaning for `Class()` than plain
`isinstance` is.

Instead, the plain isinstance check would seem to be well described by a
pattern like `Class(...)`. This would allow isinstance checks for any
class, and there is even a workaround if you really want to refer to the
Ellipsis object. This is also related to the following point.

(2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)`

In the proposed semantics, cases like this are equivalent. I can see why
that is desirable in many cases, although `Class(x=1, ...)` would make it
more clear. A possible improvement might be to add an optional element to
`__match_args__` that separates optional arguments from required ones
(although "optional" is not the same as "don't care").

(3) Check for exhaustiveness at runtime

The PEP states:

> Check exhaustiveness at runtime
> The question is what to do if no case clause has a matching pattern, and
> there is no default case. An earlier version of the proposal specified that
> the behavior in this case would be to throw an exception rather than
> silently falling through.
> The arguments back and forth were many, but in the end the EIBTI (Explicit
> Is Better Than Implicit) argument won out: it's better to have the
> programmer explicitly throw an exception if that is the behavior they want.
> For cases such as sealed classes and enums, where the patterns are all
> known to be members of a discrete set, static checkers can warn about
> missing patterns.

I don't understand this argument. Would it not be more explicit to have an
`else` or `case _` branch to say what should happen in that case?

(4) Check for exhaustiveness by static checkers

About this, the PEP states:

> From a reliability perspective, experience shows that missing a case when
> dealing with a set of possible data values leads to hard to debug issues,
> thus forcing people to add safety asserts like this:
>
>     def get_first(data: Union[int, list[int]]) -> int:
>         if isinstance(data, list) and data:
>             return data[0]
>         elif isinstance(data, int):
>             return data
>         else:
>             assert False, "should never get here"
>
> PEP 484 specifies that static type checkers should support exhaustiveness
> in conditional checks with respect to enum values. PEP 586 later
> generalized this requirement to literal types.
> This PEP further generalizes this requirement to arbitrary patterns.

This seems reasonable. However, why is the standard for static and runtime
different? The corresponding runtime check is extremely easy and efficient
to do, so if this is an error according to static analysis, why not make it
an error at runtime too?

—Koos

Re: PEP 622 aspects

kohnt at tobiaskohn

Jul 19, 2020, 4:55 AM

Post #2 of 3 (158 views)

Hi Koos,

Let me try and address some of the concerns and questions you are
raising. I am replying here to two emails of yours so as to keep
traffic down.

Quoting Koos Zevenhoven <k7hoven@gmail.com>:

> (1) Class pattern that does isinstance and nothing else.
>
> If I understand the proposed semantics correctly, `Class()` is
> equivalent to checking `isinstance(obj, Class)`, also when
> `__match_args__` is not present. However, if a future match protocol
> is allowed to override this behavior to mean something else, for
> example `Class() == obj`, then the plain isinstance checks won't
> work anymore! I do find `Class() == obj` to be a more intuitive and
> consistent meaning for `Class()` than plain `isinstance` is.
>
> Instead, the plain isinstance check would seem to be well described
> by a pattern like `Class(...)`. This would allow isinstance checks
> for any class, and there is even a workaround if you really want to
> refer to the Ellipsis object. This is also related to the following
> point.
>
> (2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)`
>
> In the proposed semantics, cases like this are equivalent. I can see
> why that is desirable in many cases, although `Class(x=1, ...)` would
> make it more clear. A possible improvement might be to add an
> optional element to `__match_args__` that separates optional
> arguments from required ones (although "optional" is not the same as
> "don't care").

Please let me answer these two questions in reverse order, as I think
it makes more sense to tackle the second one first.

**2. ATTRIBUTES**

There actually is an important difference between `Class(x=1, y=_)`
and `Class(x=1)` and it won't do to just write `Class(x=1,...)`
instead. The form `Class(x=1, y=_)` ensures that the object has an
attribute `y`. In a way, this is where the "duck typing" is coming in.

The class of an object and its actual shape (i.e. the set of
attributes it has) are rather loosely coupled in Python: there is
usually nothing in the class itself that specifies what attributes an
object has (other than the good sense to add these attributes in
`__init__`). Conceptually, it therefore makes sense to not only
support `isinstance` but also `hasattr`/`getattr` as a means to
specify the shape/structure of an object.

Let me give a very simple example from Python's `ast` module. We
know that compound statements have a field `body` (for the suite) and
possibly even a field `orelse` (for the `else` part). But there is no
common superclass for compound statements. Hence, although it is
shared by several objects, you cannot detect this structure through
`isinstance` alone. By allowing you to explicitly specify attributes
in patterns, you can still use pattern matching notwithstanding:
```
match node:
    case ast.stmt(body=suite, orelse=else_suite) if else_suite:
        # a statement with a non-empty else-part
        ...
    case ast.stmt(body=suite):
        # a compound statement without else-part
        ...
    case ast.stmt():
        # a simple statement
        ...
```

The very basic form of class patterns could be described as
`C(a_1=P_1, a_2=P_2, ...)`, where `C` is a class to be checked through
`isinstance`, and the `a_i` are attribute names to be extracted by
means of `getattr` to then be matched against the subpatterns `P_i`.
In short: you specify the structure not only by class, but also by its
actual structure in form of required attributes.
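
Read as plain Python, this basic form amounts to roughly the following. It is a deliberately simplified model that only handles literal and wildcard subpatterns; `matches_class_pattern`, `ANY` and `_MISSING` are names invented for this sketch and are not part of the proposal.

```python
from types import SimpleNamespace

_MISSING = object()
ANY = object()  # stands in for the wildcard `_` in this sketch


def matches_class_pattern(subject, cls, **subpatterns):
    """Rough model of C(a_1=P_1, ...) with literal or wildcard subpatterns."""
    if not isinstance(subject, cls):
        return False
    for name, expected in subpatterns.items():
        value = getattr(subject, name, _MISSING)
        if value is _MISSING:
            return False  # a required attribute is absent
        if expected is not ANY and value != expected:
            return False
    return True


obj = SimpleNamespace(x=1, y=2)
print(matches_class_pattern(obj, SimpleNamespace, x=1, y=ANY))
print(matches_class_pattern(obj, SimpleNamespace, x=1, z=ANY))
```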

Particularly for very simple objects, it becomes annoying to specify
the attribute names each time. Take, for instance, the
`Num`-expression from the AST. It has just a single field `n` to hold
the actual number. But the AST objects also contain an attribute
`_fields = ('n',)` that not only lists the *relevant* attributes, but
also specifies an order. It thus makes sense to introduce a
convention that in `Num(x)` without argument name, the `x` corresponds
to the first field `n`. Likewise, you write `UnaryOp('+', item)`
without the attribute names because `_fields=('op', 'operand')`
already tells you what attributes are meant. That is essentially the
principle we adopted through introduction of `__match_args__`.

**1. MATCH PROTOCOL**

I am not entirely sure what you mean by `C() == obj`. In most cases
you could not actually create an instance of `C` without some
meaningful arguments for the constructor.

The idea of the match-protocol is very similar to how you can
already override the behaviour of `isinstance`. It is not meant to
completely change the semantics of what is already there, but to allow
you to customise it (in some exciting ways ^_^). Of course, as with
everything customisable, you could go off and do something funny with
it, but if it then breaks, that's quite on you.

On the caveat that this is **NOT PART OF THIS PEP (!)**, let me try
and explain why we would consider a match protocol in the first
place. The standard example to consider is complex numbers. In
Python complex numbers are represented in their "rectangular" form,
i.e. as `c = a + b*j` with a real and an imaginary part. However,
this is not the only way to represent a complex number. You could
equally write it in its polar form as `c = r * exp(j * phi)`.
Depending on the context, this second form has some advantages, e.g.,
when computing the root or power of `c`.

So, what we would like to do is write a pattern like this:
```
class Polar:
    def __init__(self, r, p=0):
        if isinstance(r, complex):
            r, p = rect_to_polar(r)
        self.radius = r
        self.phi = p

match some_complex_number:
    case Polar(radius=r, phi=p):
        ...
```
Naively, however, this will always fail because a complex number `c`
in Python is never an instance of my custom class `Polar`. Just
overriding the `isinstance` behaviour of `Polar` will not suffice,
either, because we are then trying to access attributes that are not
there (namely `radius` and `phi`). Our original approach was
therefore to allow `Polar` to swap the subject of pattern matching for
further processing inside a given case clause. An `instancecheck` on
steroids if you will. Something along the lines of:
```
class Polar:
    @staticmethod
    def __match__(original_subject):
        if isinstance(original_subject, Polar):
            return original_subject
        elif isinstance(original_subject, complex):
            return Polar(original_subject)
        else:
            return CANNOT_MATCH
```
There are various valid concerns with this initial idea of the match
protocol, and we will probably be aiming for a simpler, less complex
variant that addresses actual use cases as best as possible. But any
such future extension will be an opt-in extension of current semantics
and not a replacement that suddenly changes the meaning of class
pattern altogether.

Quoting Koos Zevenhoven <k7hoven@gmail.com>:

> This is related to one of my concerns regarding PEP 622. It may be
> tempting to see pattern matching as a form of assignment. However,
> that is quite a stretch, both conceptually and as a future direction.
> There is no way these 'match expressions' could be allowed in regular
> assignments – the way names are treated just needs to be different.
> And allowing them in walrus assignments doesn't make much sense either.

We probably agree here on an important aspect: in Python we cannot
simply extend the idea of patterns (as proposed by PEP 622) to general
assignments. But this in itself hardly says anything about the
validity of the proposal. Passing arguments to a function is clearly
a form of assignment that follows slightly different rules to "normal"
assignment as a stand-alone statement, not to mention special
assignment structures like `for` loops or `with` blocks. Up to a
certain point, it is even debatable what "assignment" means: some
functional languages would argue that they work without assignments
altogether because it is well hidden in parameter passing.

Then there are other statements such as `return` that are only valid
in the context of a function, although we certainly could attach
meaning to `return` at the module level as well. There are good
reasons not to do that, and yet this restriction does not make
`return` any less valid.

> Conceptually, it is strange to call this match operation an
> assignment. Most of the added power comes from checking that the
> object has a certain structure or contents – and in many cases, that
> is the only thing it does! As a (not always) handy side product, it
> is also able to assign things to specified targets. Even then, the
> whole pattern is not assigned to, only parts of it are.
>
> In mathematics, assignment (definition) and re-assignment is often
> denoted with the same sign as equality/identity, because it is
> usually clear from the context, which one is in question. Usually,
> however, it matters which one is in question. Therefore, as we well
> know, we have = for assignment, == for equality, and := to emphasize
> assignment. Matching is closer to ==, or almost :==.

In general, patterns have a "compare and assign" semantics, or
perhaps "filter and assign". And, indeed, you can forgo the
assignment aspect completely, but that's conceptually not what
patterns are meant for. Moreover, the syntax is flexible enough to
mask a lot of what would have been an assignment in the original
conception of a pattern.

I would claim this is quite similar to functions. In Python all
functions return a value, even though you might throw away that value
in many cases, particularly if the only value a function can return is
`None`. Still I would certainly not go as far as saying the concept
of returning a value is wrong because it might only apply in so many
cases. In the end, having one unifying concept for everything can be
quite helpful (although it is an abstraction that does not come easy
to novices but must be explicitly learned first).

Just as an aside: mathematics actually has neither assignments nor
re-assignments. That's a very "computer sciency" reading of
mathematical equations, just as `x = x + 1` in programming famously is
_not_ a mathematical equation but rather an assignment. As this
confusion is often an issue for students learning to program, I think
we should be careful to properly distinguish these concepts.

Kind regards,
Tobias

Re: PEP 622 aspects

k7hoven at gmail

Jul 19, 2020, 2:42 PM

Post #3 of 3 (157 views)

On Sun, Jul 19, 2020 at 3:00 PM Tobias Kohn <kohnt@tobiaskohn.ch> wrote:

> Quoting Koos Zevenhoven <k7hoven@gmail.com>:
>
> > (1) Class pattern that does isinstance and nothing else.
> >
> > If I understand the proposed semantics correctly, `Class()` is
> > equivalent to checking `isinstance(obj, Class)`, also when
> > `__match_args__` is not present. However, if a future match protocol
> > is allowed to override this behavior to mean something else, for
> > example `Class() == obj`, then the plain isinstance checks won't work
> > anymore! I do find `Class() == obj` to be a more intuitive and
> > consistent meaning for `Class()` than plain `isinstance` is.
> >
> > Instead, the plain isinstance check would seem to be well described
> > by a pattern like `Class(...)`. This would allow isinstance checks
> > for any class, and there is even a workaround if you really want to
> > refer to the Ellipsis object. This is also related to the following
> > point.
> >
> > (2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)`
> >
> > In the proposed semantics, cases like this are equivalent. I can see
> > why that is desirable in many cases, although `Class(x=1, ...)` would
> > make it more clear. A possible improvement might be to add an
> > optional element to `__match_args__` that separates optional
> > arguments from required ones (although "optional" is not the same as
> > "don't care").
>
>
> Please let me answer these two questions in reverse order, as I think it
> makes more sense to tackle the second one first.
>
Possibly. Although I do find (1) a more serious issue than (2). To not have
isinstance available by default in a consistent manner would definitely be
a problem in my opinion. But the way I proposed to solve (1) may affect the
user interpretations of (2).

> ***2. Attributes***
>
> There actually is an important difference between `Class(x=1, y=_)` and
> `Class(x=1)` and it won't do to just write `Class(x=1,...)` instead. The
> form `Class(x=1, y=_)` ensures that the object has an attribute `y`. In
> a way, this is where the "duck typing" is coming in.
>
Ok, that is indeed how the class pattern match algorithm works according
to the current PEP 622. Let me rephrase the title of problem (2) slightly
to accommodate this:

"(2) The meaning of e.g. `Class(x=1, y=_)` versus `Class(x=1)` (when the
object has attributes x, y and "x", "y" are in `__match_args__`)"

> The class of an object and its actual shape (i.e. the set of attributes it
> has) are rather loosely coupled in Python: there is usually nothing in the
> class itself that specifies what attributes an object has (other than the
> good sense to add these attributes in `__init__`).
>
Usually, it is bad practice to define classes whose interface is not or
cannot be specified. Python does, however, even allow hacks such as
tacking an extra attribute onto an object even though it doesn't really
"belong" there.

> Conceptually, it therefore makes sense to not only support `isinstance`
> but also `hasattr`/`getattr` as a means to specify the shape/structure of
> an object.
>
Here we agree (although not necessarily regarding "therefore").

> Let me give a very simple example from Python's `ast` module. We know
> that compound statements have a field `body` (for the suite) and possibly
> even a field `orelse` (for the `else` part). But there is no common
> superclass for compound statements. Hence, although it is shared by
> several objects, you cannot detect this structure through `isinstance`
> alone. By allowing you to explicitly specify attributes in patterns, you
> can still use pattern matching notwithstanding:
> ```
> match node:
>     case ast.stmt(body=suite, orelse=else_suite) if else_suite:
>         # a statement with a non-empty else-part
>         ...
>     case ast.stmt(body=suite):
>         # a compound statement without else-part
>         ...
>     case ast.stmt():
>         # a simple statement
>         ...
> ```
>
So this is an example of a combination of duck-typing and a class type. I
agree it's good to be able to have this type of matching available. I can
only imagine the thought process that led you to bring up this example, but
I feel that we got stuck on whether an attribute is present or not, which
is a side track regarding the issues I pointed out.

Python can be written in many ways, but I'm not sure that the above example
is representative of how duck typing usually works. I see a lot more
situations where you either care about isinstance or about some duck typing
pattern – usually not both.

> The very basic form of class patterns could be described as `C(a_1=P_1,
> a_2=P_2, ...)`, where `C` is a class to be checked through `isinstance`,
> and the `a_i` are attribute names to be extracted by means of `getattr`
> to then be matched against the subpatterns `P_i`. In short: you specify
> the structure not only by class, but also by its actual structure in form
> of required attributes.
>
Ok, back on track now. But this won't do, if we want to be able to access
isinstance for all classes by default. If this form is applied to all
classes, then no class will have anything different from that. My version
was a bit different: to introduce the *very basic* form that is spelled
Class(...), and this would have the same meaning (isinstance) for ALL
classes.

> Particularly for very simple objects, it becomes annoying to specify the
> attribute names each time. Take, for instance, the `Num`-expression from
> the AST. It has just a single field `n` to hold the actual number. But
> the AST objects also contain an attribute `_fields = ('n',)` that not
> only lists the *relevant* attributes, but also specifies an order. It thus
> makes sense to introduce a convention that in `Num(x)` without argument
> name, the `x` corresponds to the first field `n`. Likewise, you write
> `UnaryOp('+', item)` without the attribute names because
> `_fields=('op', 'operand')`
> already tells you what attributes are meant. That is essentially the
> principle we adopted through introduction of `__match_args__`.
>
Makes sense (at least up to the last sentence – if that is the purpose, it
is not obvious to me that it should be called __match_args__).

>
> ***1. Match Protocol***
>
> I am not entirely sure what you mean by `C() == obj`. In most cases you
> could not actually create an instance of `C` without some meaningful
> arguments for the constructor.
>
I mean exactly that – the case where, to match, the object needs to be
equal to C(). Constructing objects with no arguments is not uncommon at
all. Often it is an empty container, or in some sense the most basic form
of the object. Already in builtins, there are many examples: str(),
bytes(), dict(), int(), list(), tuple(), ...

> The idea of the match-protocol is very similar to how you can already
> override the behaviour of `isinstance`. It is not meant to completely
> change the semantics of what is already there, but to allow you to
> customise it (in some exciting ways ^_^). Of course, as with everything
> customisable, you could go off and do something funny with it, but if it
> then breaks, that's quite on you.
>
Agreed.

> On the caveat that this is ***not part of this PEP (!)***, let me try and
> explain why we would consider a match protocol in the first place. The
> standard example to consider is complex numbers.
>
[snipped complex numbers example – in short, I agree that a general match
protocol for "class patterns" should not enforce an isinstance check.]

[...]

––Koos