Mailing List Archive

Another take on PEP 622
Hi Everyone,

I feel there are still quite a few misconceptions around concerning
PEP 622 and the new pattern matching feature it proposes.  Please
allow me therefore to take another attempt at explaining the ideas
behind PEP 622 with a different approach.  Bear in mind that I
naturally cannot cover everything, though, and that some parts
presented here are still slightly simplified.


Let's start with perhaps the most crucial part:

PEP 622 does **NOT** propose to introduce a `switch`-statement!

Indeed, pattern matching is much more closely related to concepts like
regular expressions, sequence unpacking, visitor patterns, or function
overloading.  In particular, the patterns themselves share more
similarities with formal parameters than with expressions.

So, to start with I would like to invite you to think of PEP 622 not
so much as proposing a new control structure in the sense of
`if`-`elif`-`else`, `try`-`except`-`finally`, or even `switch`, but
much rather as addressing the question: what would function
overloading look like in Python?  Of course, this does not fully
capture pattern matching or do it justice, either, but it might offer
a better basis to start with.



1. Function Overloading
-----------------------

In a statically typed language, you might define a function `product`
that either accepts two numbers or an iterable something like the
following (I am sticking to Python syntax for simplicity):
```
def product(a: float, b: float) -> float:
    return a * b

def product(it: Iterable[float]) -> float:
    result = 1.0
    for x in it: result *= x
    return result
```
In Python, however, this needs to be done differently and the dispatch
logic has to be put inside the function itself:
```
def product(*args):
    if len(args) == 2 and isinstance(args[0], float) and
isinstance(args[1], float):
        return args[0] * args[1]
    elif len(args) == 1 and isinstance(args[0], Iterable):
        result = 1.0
        for x in args[0]: result *= x
        return result
```
It is this use case to begin with that pattern matching is addressing
by introducing a more declarative way.  Each `case` represents one
possibility of what the parameter list might look like:
```
def product(*args):
    match args:
        case (float(a), float(b)):
            return a * b
        case (Iterable(it),):
            result = 1.0
            for x in it: result *= x
            return result
```
And if you squint a little, you might even see that these parameter
lists could almost be written in C: `(float a, float b)`.

In the context of more functional languages, you might also have seen
some wilder stuff, where function overloading allows you to include
literals.  One of the most prevalent examples is the factorial
function, usually defined a bit like this:
```
def fact(0):
    return 1
def fact(n):
    return n * fact(n-1)
```
Again, when doing the same thing in Python, we put the dispatch logic
inside the function:
```
def fact(n):
    if n == 0:
        return 1
    else:
        return n * fact(n - 1)
```
It is only natural to also allow pattern matching to express this use case:
```
def fact(arg):
    match arg:
        case 0:
            return 1
        case n:
            return n * fact(n - 1)
```
And this here is where you probably start to see good old `switch`
coming in.  Indeed, pattern matching is powerful and versatile enough
to act as a `switch` replacement in many cases.  But given how we
arrived here, you might start to understand that this is a happy
accident and not by design of the structure itself.

There is one big elephant in the room, of course: why do we need the
`match`-line in the first place?  If all we want is to somehow express
function overloading, couldn't we just put the cases directly into the
function body?

In principle, yes, we could do that.  But!  As it turns out, there is
something to be gained by a clear separation.  When actually using
pattern matching, you might discover that you do not always want to
define a function.  Sometimes, it is quite convenient being able to
have a pattern like this, for instance:
```
def foo(*args):
    for arg in args:
        match arg:
            case ...
```

In all of these examples, it looks as if pattern matching is replacing
an `if`-`elif`-`else` chain.  However, this is not entirely accurate. 
What we really want to express is the function overloading in the
first place.  The `if`s are only needed to express the same idea in
Python for the time being.  In other words: the individual cases here
express independent implementations and we leave it up to the Python
interpreter to choose the right one.



2. Visitor Pattern and Dispatch
-------------------------------

If you wanted to implement something like function overloading based
on the type of the argument, you might use the _visitor pattern_ for
that purpose.  Thanks to Python's reflection capabilities, this is
quite easy and simple to do.  Take, e.g., this example of a function
`draw()` that accepts a range of different geometric shapes.
```
def draw(geoShape):
    class Matcher:
        def case_Point(self, x, y):
            canvas.moveTo(x, y)
            canvas.dot(1)

        def case_Line(self, x1, y1, x2, y2):
            canvas.moveTo(x1, y1)
            canvas.lineTo(x2, y2)

        def case_Circle(self, cX, cY, radius):
            canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)

        def match(self, subject):
            n = 'case_' + subject.__class__.__name__
            method = getattr(self, n)
            args = get_attribute_values(subject)
            method(*args)

    m = Matcher()
    m.match(geoShape)
```
Now let us look at the same thing written as pattern matching
according to PEP 622:
```
def draw(geoShape):
    match geoShape:
        case Point(x, y):
            canvas.moveTo(x, y)
            canvas.dot(1)

        case Line(x1, y1, x2, y2):
            canvas.moveTo(x1, y1)
            canvas.lineTo(x2, y2)

        case Circle(cX, cY, radius):
            canvas.ellipse(cX-radius, cY-radius, cX+radius, cY+radius)    
```
You might see that the design of pattern matching is not something
entirely new and alien in Python, but well founded in ideas and
principles already present.  Moreover, observe, once again, how the
variables in the patterns literally correspond to parameters in the
visitor pattern above.  In this instance, the `case` statements are
quite close to function definitions in their entire syntactic
structure (is it really that alien that `Point(x, y)` here does _not_
represent a function call?).

This example also demonstrates something else I have brought up
before: pattern matching does not strictly have the same semantics as
an `if`-`elif`-`else` chain.  It is thus not a simple control flow
construct, as the emphasis is less on the control flow itself and more
on the shape of the objects and data.

There are, however, some clear limitations of the visitor pattern as
implement in this way.  If we go and define a new subclass of one of
these geometric shapes---perhaps something like `HorizontalLine` as a
specialisation of `Line`---, the visitor pattern will break down
because there is no method `case_HorizontalLine`.  Pattern matching,
on the other hand, will still work as expected, because we are
actually working in the space of classes with their full inheritance
structure, and not only on the names of types.

Mind, pattern matching is not a replacement for the visitor pattern in
general, though.  Even with pattern matching in place, there are valid
use cases where you would certainly prefer the visitor pattern over
pattern matching.  This is only one example, after all, meant to
illustrate the idea behind pattern matching and why it might be a good
idea.



3. Shape and Structure
----------------------

Data is organised and comes in different shapes, structures, and
representations.  It is not always as easy as in the cases above,
where the class of an object already contains all the information you
need.  As a first example, the visitor pattern as illustrated above
could not easily differentiate between tuples of different lengths,
say: you would have to encode the length of the tuple somehow in the
method names, and then consider what happens if some objects does not
have a length, say, and thus requires a different encoding scheme.

In fact, the classic example of where pattern matching really starts
to shine is a tree structure that we want to simplify (or otherwise
process) according to specific rules.  This might be an abstract
syntax tree (AST) or a mathematical expression, but the principle is
always the same: the decision which rule to apply depends on a number
of different attributes, which are often scattered among nodes of
different depths in your tree.  The following illustrates this with a
tree that represents mathematical expressions:
```
def simplify(node):
    match node:
        case BinOp(Num(left), '+', Num(right)):
            return Num(left + right)
        case BinOp(left, ('+'|'-'), Num(0)):
            return left
        case UnaryOp('-', UnaryOp('-', x)):
            return x
        case UnaryOp('-', Num(x)):
            return Num(-x)
        case anythingElse:
            return anythingElse
```
If you draw an actual diagram of the structure that each case imposes
on the node, you will quickly find that pattern matching allows you to
express a larger variety of structure than you could encode in method
names.  Not to mention if you had to write everything with `if`s and
calls to `isinstance`, etc.  It is possible, of course, but pattern
matching introduces a very succinct way of expressing the structure
declaratively and then let the compiler and interpreter figure out how
to best compare the subject `node` against the different possibilities
and extract the data you are actually interested in.

By the way: in this case, the compiler might end up creating a
structure of nested `if`s, perhaps something like:
```
def simplify(node):
    TOS = node
    try:
        if isinstance(TOS, BinOp) and isinstance(TOS.right, Num):
            if TOS.right.n == 0 and TOS.op in ('+','-'):
                return TOS.left
            elif TOS.op == '+' and isinstance(TOS.left, Num):
                return Num(TOS.left.n + TOS.right.n)
        elif isinstance(TOS, UnaryOp) and TOS.op == '-':
            value = TOS.operand
            if isinstance(value, UnaryOp) and value.op == '-':
                return value.operand
            elif isinstance(value, Num):
                return value.n
        return TOS:
    finally:
        del TOS
```
This illustrates, once again, that pattern matching is not so much
specifying the control flow itself, but rather offering a set of
alternatives.  But there is neither the strict sequential structure of
`if`-`elif`-`else`, not the jump tables of `switch`.  And in case you
wondered: the entire `match` block is actually a mini-scope, in which
the subject `node` is defined.


Kind regards,
Tobias

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZAMFDQKKEJONDEQRGC5XKOUAFUTL6OG7/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Another take on PEP 622 [ In reply to ]
On 7/16/2020 9:51 PM, Tobias Kohn wrote:
> Hi Everyone,
>
> I feel there are still quite a few misconceptions around concerning PEP
> 622 and the new pattern matching feature it proposes.  Please allow me
> therefore to take another attempt at explaining the ideas behind PEP 622
> with a different approach.  Bear in mind that I naturally cannot cover
> everything, though, and that some parts presented here are still
> slightly simplified.

Thank you Tobias. I am a fan of looking at things from multiple
viewpoint. For 200 years, physicists argued about whether light is
waves or particles, before discovering that 'neither' and 'both' were
more correct.

> 1. Function Overloading
> 2. Visitor Pattern and Dispatch > 3. Shape and Structure
[snip, snip, snip]

In an assignment statement, the code to the left of '=' is a 'target
list' of 'target's, with some idiosyncratic rules. Even though it
might, misleadingly, be called a 'target expression', it is not an
expression to be evaluated. Similarly, the code between parentheses in
a def statement is a 'parameter list' of 'defparameter' and special
symbols, with other idiosyncratic rules. Both could be called 'binding
lists' or more generally, 'binding structures'.

To me, the important point of your point is that 'case' is somewhat
analogous to 'def', and that the stuff between 'case' and ':' is a
binding structure. We should call this structure a 'match structure'.
It is misleading and confusing to call it a 'match expression'. A match
structure consists of a 'match list' of 'match items' or 'matcher's and
an optional "'if' <condition>".

Matchers have a 3rd, new, and larger set of idiosyncratic rules. The
optional condition is an escape hatch for when expressing the intended
match constraint and corresponding match set is difficult or worse using
the match rules.

As with target and parameter items, untagged simple name matchers are
(and I now see, should be) binding targets. (The parameter list tags
are ':' for types and '=' for default values.) Unlike assignment
targets, dotted names and subscriptings are not binding targets. Like
parameter lists, match lists include literals. Match lists also include
syntactic structure not seen in the other binding structures.


--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NNZ67OMAV2CDV7GSX64SOLUAERJSF5HP/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Another take on PEP 622 [ In reply to ]
Hi Terry,

Thank you: I really like your wave/particules analogy.  I think that
pattern matching is indeed uniting different concepts to build a
stronger, more versatile structure.

I also like your concept of a general "binding structure" with
different forms, such as assignment, parameter passing, and (according
to PEP 622) patterns.  I feel that this categorisation puts pattern
matching quite in the correct place.  Of course, there is also the
second aspect of "bind these variables---/if you can/" or "analyse and
compare the structure", which is the other part of this particule/wave
duality.

Concerning the guards (optional conditions), I also think that you
summarised very nearly.  I tend to think of the patterns a bit like
the grammar of a programming language.  Something that is supposed to
be static and declarative (as far as this is possible in Python). 
Yet, some constraints in programming languages cannot be expressed by
a (context-free) grammar in a meaningful way.  For instance, the
grammar itself does not keep you from having two parameters in a
function sharing the same name.  This is a dynamic aspect, a
relationship between two otherwise independent parts of the overall
structure.  And that is best caught by the guards.  So, yes: as I see
it, the guards really add something that goes beyond the declarative
side of patterns.  Hence, I completely agree with your characterisation.

Kind regards,
Tobias

Quoting Terry Reedy <tjreedy@udel.edu>:

> On 7/16/2020 9:51 PM, Tobias Kohn wrote:
>> Hi Everyone,
>>
>> I feel there are still quite a few misconceptions around concerning
>> PEP 622 and the new pattern matching feature it proposes.  Please
>> allow me therefore to take another attempt at explaining the ideas
>> behind PEP 622 with a different approach.  Bear in mind that I
>> naturally cannot cover everything, though, and that some parts
>> presented here are still slightly simplified.
>
> Thank you Tobias.  I am a fan of looking at things from multiple
> viewpoint.  For 200 years, physicists argued about whether light is
> waves or particles, before discovering that 'neither' and 'both'
> were more correct.
>
>> 1. Function Overloading
>> 2. Visitor Pattern and Dispatch > 3. Shape and Structure
>
> [snip, snip, snip]
>
> In an assignment statement, the code to the left of '=' is a 'target
> list' of 'target's, with some idiosyncratic rules.  Even though it
> might, misleadingly, be called a 'target expression', it is not an
> expression to be evaluated.  Similarly, the code between parentheses
> in a def statement is a 'parameter list' of 'defparameter' and
> special symbols, with other idiosyncratic rules.  Both could be
> called 'binding lists' or more generally, 'binding structures'.
>
> To me, the important point of your point is that 'case' is somewhat
> analogous to 'def', and that the stuff between 'case' and ':' is a
> binding structure.  We should call this structure a 'match
> structure'. It is misleading and confusing to call it a 'match
> expression'.  A match structure consists of a 'match list' of 'match
> items' or 'matcher's and an optional "'if' <condition>".
>
> Matchers have a 3rd, new, and larger set of idiosyncratic rules. 
> The optional condition is an escape hatch for when expressing the
> intended match constraint and corresponding match set is difficult
> or worse using the match rules.
>
> As with target and parameter items, untagged simple name matchers
> are (and I now see, should be) binding targets.  (The parameter list
> tags are ':' for types and '=' for default values.)  Unlike
> assignment targets, dotted names and subscriptings are not binding
> targets.  Like parameter lists, match lists include literals.  Match
> lists also include syntactic structure not seen in the other binding
> structures.
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/NNZ67OMAV2CDV7GSX64SOLUAERJSF5HP/Code of Conduct:
> http://python.org/psf/codeofconduct/