Mailing List Archive

PEP 622 and variadic positional-only args
I've taken a look through PEP 622 and I've been thinking about how it
could be used with sympy.

In principle case/match and destructuring should be useful for sympy
because sympy has a class Basic which defines a common structure for
~1000 subclasses. There are a lot of places where it is necessary to
dispatch on the type of some object including in places that are
performance sensitive so those would seem like good candidates for
case/match. However, the PEP doesn't quite work as I hoped because it
only handles positional arguments indirectly and it does not seem to
directly handle types with variadic positional args.

The objects I refer to in sympy represent mathematical expressions e.g.:

>>> from sympy import *
>>> x, y = symbols('x, y')
>>> expr = x**2 + 2*x*y
>>> expr
x**2 + 2*x*y

You can see the structure of the object explicitly using sympy's srepr function:

>>> print(srepr(expr))
Add(Pow(Symbol('x'), Integer(2)), Mul(Integer(2), Symbol('x'), Symbol('y')))

There are a bunch of classes there (Add, Pow, Symbol, Mul, Integer)
but these are a tiny subset of the possibilities. The key feature of
Basic instances is that they have an .args attribute which can be used
to rebuild the object like:

>>> expr.args
(x**2, 2*x*y)
>>> type(expr)
<class 'sympy.core.add.Add'>
>>> type(expr)(*expr.args)
x**2 + 2*x*y
>>> type(expr)(*expr.args) == expr
True

This is known as the func-args invariant in sympy and is used to
destructure and rebuild the expression tree in different ways e.g. for
performing a substitution:

>>> expr.subs(x, 5)
10*y + 25
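
A minimal sketch of the func-args invariant, using toy stand-in classes rather than sympy itself. The `Basic`, `Symbol`, `Add` and `Mul` classes here and the `substitute` helper are hypothetical simplifications; unlike sympy they perform no evaluation, so the result is not simplified to `10*y + 25`.

```python
# Toy stand-ins for sympy's Basic hierarchy; these are NOT sympy's classes.
class Basic:
    def __init__(self, *args):
        self.args = args

    def __eq__(self, other):
        return type(self) is type(other) and self.args == other.args

    def __hash__(self):
        return hash((type(self).__name__, self.args))

    def __repr__(self):
        inner = ", ".join(repr(a) for a in self.args)
        return f"{type(self).__name__}({inner})"


class Symbol(Basic):
    pass


class Add(Basic):
    pass


class Mul(Basic):
    pass


def substitute(expr, old, new):
    """Return a copy of expr with every occurrence of old replaced by new."""
    if expr == old:
        return new
    if not isinstance(expr, Basic):
        return expr  # a leaf such as an int or a symbol name
    # Rebuild the node from its type and its recursively rewritten args.
    return type(expr)(*(substitute(a, old, new) for a in expr.args))


x, y = Symbol("x"), Symbol("y")
expr = Add(Mul(x, x), Mul(2, x, y))

rebuilt = type(expr)(*expr.args)   # the func-args invariant
replaced = substitute(expr, x, 5)  # destructure and rebuild with x -> 5
```

The only thing `substitute` needs to know about a node is its type and its `.args`, which is why the invariant makes generic tree rewriting possible.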

All Basic classes are strictly constructed using positional-only
arguments and not keyword arguments. In the PEP it seems that we can
handle positional arguments when their number is fixed by the type.
For example a simplified version of Pow could be:

class Pow:

    def __init__(self, base, exp):
        self.args = (base, exp)

    __match_args__ = ("base", "exp")

    @property
    def base(self):
        return self.args[0]

    @property
    def exp(self):
        return self.args[1]

Then I could match Pow in case/match with

obj = Pow(Symbol('x'), Integer(4))

match obj:
    case Pow(base, exp):
        # do stuff with base, exp
        ...

It seems awkward and inefficient though to go through __match_args__
and the base and exp property-methods to match the positional
arguments when they are already available as a tuple in obj.args. Note
that performance is a concern: just dispatching on isinstance() has a
measurable overhead in sympy code which is almost always CPU-bound.
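
To make the indirection concrete, here is a rough sketch of what a class pattern with positional sub-patterns has to do, following the semantics that later shipped in Python 3.10 (PEP 634). The helper `extract_positional` is hypothetical and only approximates the interpreter's behaviour; the simplified `Pow` is an assumption, not sympy's class.

```python
class Pow:
    __match_args__ = ("base", "exp")

    def __init__(self, base, exp):
        self.args = (base, exp)

    @property
    def base(self):
        return self.args[0]

    @property
    def exp(self):
        return self.args[1]


def extract_positional(cls, subject, count):
    """Approximate the lookups behind `case cls(p1, ..., p_count)`.

    Returns the values handed to the sub-patterns, or None on no match.
    """
    if not isinstance(subject, cls):
        return None
    names = getattr(cls, "__match_args__", ())
    if count > len(names):
        raise TypeError(
            f"{cls.__name__}() accepts {len(names)} positional sub-patterns "
            f"({count} given)"
        )
    try:
        # One attribute lookup (here: one property call) per sub-pattern.
        return tuple(getattr(subject, name) for name in names[:count])
    except AttributeError:
        return None


obj = Pow("x", 4)
values = extract_positional(Pow, obj, 2)
```

Each positional sub-pattern costs an `isinstance()` check plus one attribute lookup, and the end result is the same tuple that was already sitting in `obj.args`.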

The main problem though is with variadic positional arguments. For
example sympy has a symbolic Tuple class which is much like a regular
python tuple except that it takes multiple positional args rather than
a single iterable arg:

class Tuple:
    def __init__(self, *args):
        self.args = args

So now how do I match a 2-Tuple of two integers? I can't use
__match_args__ because that's a class attribute and different
instances have different numbers of args. It seems I can do this:

obj = Tuple(2, 4)

match obj:
    case Tuple(args=(2, 4)):
        ...

That's awkward though because it doesn't match the constructor syntax
which strictly uses positional-only args. It also doesn't scale well
with nesting:

obj = Tuple(Tuple(1, 2), Tuple(3, 4))

match obj:
    case Tuple(args=(Tuple(args=(1, 2)), Tuple(args=(3, 4)))):
        # handle ((1, 2), (3, 4)) case
        ...

Another option would be to fake a single positional argument for
matching purposes:

class Tuple:
    __match_args__ = ("args",)
    def __init__(self, *args):
        self.args = args

match obj:
    case Tuple((Tuple((1, 2)), Tuple((3, 4)))):
        ...

This requires an extra level of brackets for each node and also
doesn't match the actual constructor syntax: evaluating that pattern
in sympy turns each Tuple into a 1-Tuple containing another Tuple of
the args:

>>> t = Tuple((Tuple((1, 2)), Tuple((3, 4))))
>>> print(srepr(t))
Tuple(Tuple(Tuple(Tuple(Integer(1), Integer(2))),
Tuple(Tuple(Integer(3), Integer(4)))))

I've used Tuple in the examples above but the same applies to all
variadic Basic classes: Add, Mul, And, Or, FiniteSet, Union,
Intersection, ProductSet, ...

From a first glimpse of the proposal I thought I could do matches like this:

match obj:
    case Add(Mul(x, y), Mul(z, t)) if y == t:
    case Add(*terms):
    case Mul(coeff, *factors):
    case And(Or(A, B), Or(C, D)) if B == D:
    case Union(Interval(x1, y1), Interval(x2, y2)) if y1 == x2:
    case Union(Interval(x, y), FiniteSet(*p)) | Union(FiniteSet(*p), Interval(x, y)):
    case Union(*sets):

Knowing the sympy codebase each of those patterns would look quite
natural because they resemble the constructors for the corresponding
objects (as intended in the PEP). It seems instead that many of these
constructors would need to have args= so it becomes:

match obj:
    case Add(args=(Mul(args=(x, y)), Mul(args=(z, t)))) if y == t:
    case Add(args=terms):
    case Mul(args=(coeff, *factors)):
    case And(args=(Or(args=(A, B)), Or(args=(C, D)))) if B == D:
    case Union(args=(Interval(x1, y1), Interval(x2, y2))) if y1 == x2:
    case Union(args=(Interval(x, y), FiniteSet(args=p))) | Union(args=(FiniteSet(args=p), Interval(x, y))):
    case Union(args=sets):

Each of these looks less natural as they don't match the constructors
and the syntax gets messier with nesting.


Oscar
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/V6UC7QEG4WLQY6JBC4MEIK5UGF7X2GSD/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 622 and variadic positional-only args
That's a really interesting new use case you're bringing up.

You may have noticed that between v1 and v2 of the PEP we withdrew the
`__match__` protocol; we've been brainstorming about different forms a
future `__match__` protocol could take, once we have more practical
experience. One possible variant we've been looking at would be something
that would *only* be used for positional arguments -- `__match__` would
just return a tuple of values extracted from the object that can then be
matched by the interpreter's match machinery. Your use case could then
(almost, see below) be handled by having `__match__` just return
`self.args`.

I also think there's a hack that will work today, assuming your users
aren't going to write match statements with insanely long parameter lists
for class patterns: You can set `__match_args__ = ["__0__", "__1__",
"__2__", ..., "__25__"]`, and define 26 properties like this:
```
@property
def __0__(self):
    return self.args[0]
@property
def __1__(self):
    return self.args[1]
# etc.
```

But now for the caveat.

As the PEP currently stands, you don't *have* to specify all parameters in
a class pattern. For example, using a Point3d class that takes `(x, y, z)`,
you can write `Point3d(x, y)` as a shorthand for `Point3d(x, y, _)`. This is
intended to make life easy for classes that have several positional
arguments with defaults, where the constructor can also omit some of the
arguments. (It also avoids needing to have a special case for a class with
*no* arguments, which covers the important use case of *just* wanting to
check the type.) But for your use case it seems it would be less than ideal
to have `Add(Symbol('x'), Symbol('y'))` be a valid match for `x + y + z`. I
can think of a workaround (pass a sentinel to the pattern) but it would be
uglier than doubling the parentheses.

Note that this only applies to class patterns -- sequence patterns require
an explicit `*_` to ignore excess values. Because of this, writing
`Add(args=(...))` or `Add((...))` would circumvent the problem (but it
would have the problems you pointed out of course). When we design the
`__match__` protocol in the future we can make sure there's a way to
specify this. For example, we could pass *in* the number of positional
sub-patterns. This has been proposed, but we weren't sure of the use case
-- now we have one (unless I'm misunderstanding your intentions).

Thoughts?

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Re: PEP 622 and variadic positional-only args
Hi Oscar
On Wed, Jul 15, 2020 at 4:41 PM Oscar Benjamin
<oscar.j.benjamin@gmail.com> wrote:
> I've taken a look through PEP 622 and I've been thinking about how
> it could be used with sympy.
Thank you very much for taking the time to carefully elaborate an
interesting possible use case.  I find this very helpful and a great
basis for further/future discussions on the design of pattern matching.

A deliberate part of the current design was to address the
/structure/shape/ of objects rather than the constructor directly
(which could be an arbitrarily complex function after all).  Writing
`Add(args=(Mul(...), Mul(...)))` for instance is therefore consistent
as it reflects the /actual structure/ of your objects.  The
`__match_args__` is primarily intended for rather simple object
shapes, where it is quite obvious what attributes constitute the
object and in which order (similar to the `_fields` attribute in AST
nodes).

From this perspective, your use case makes an argument for something
I would call '/variadic shape/' (for lack of a better word).
Structurally, your objects behave like sequences or tuples, adorned
with a specific type/class---which, again, is currently expressed as
the "class(tuple)" pattern such as `Add((Mul(), Mul()))`.

There are two possibilities to approach this issue.  We could
introduce patterns that extract "sequential elements" via
`__getitem__` rather than attributes.  Or we could have a special
method `__match__` that might return a representation of the object's
data in sequential form.

The `__getitem__` approach turned out to come with quite a few
complications.  In short: it is very hard to assess an object's
possibly sequential structure in a non-destructive way.  Because of
the multiple cases in the new pattern matching structure, we cannot
just use an iterator as in unpacking.  And `__getitem__` itself is
extremely versatile, being used, e.g., for both sequences and
mappings.  We therefore ended up supporting only built-in structures
like tuples, lists, and dicts for now, for which the interpreter can
easily determine how to handle `__getitem__`.

The `__match__` protocol, on the other hand, is something that we
deferred so that we can make sure it really is powerful and well
designed enough to handle a wide range of use cases.  One of the more
interesting use cases I had in mind, for example, was to destructure
data that comes as byte strings (something that Rhodri James [1] has
brought up, too).  And I think you have just added another very
interesting use case to take into consideration.  But probably the
best course of action is really to gain some experience and collect
some additional use cases.

Kind regards,
Tobias

[1] https://mail.python.org/archives/list/python-dev@python.org/message/WD2E3K5TWR4E6PZBM4TKGHTJ7VDERTDG/
Re: PEP 622 and variadic positional-only args
On Thu, 16 Jul 2020 at 02:09, Guido van Rossum <guido@python.org> wrote:
>
> On Wed, Jul 15, 2020 at 4:41 PM Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
>>
>> I've taken a look through PEP 622 and I've been thinking about how it
>> could be used with sympy.
>>
>> In principle case/match and destructuring should be useful for sympy
>> because sympy has a class Basic which defines a common structure for
>> ~1000 subclasses. There are a lot of places where it is necessary to
>> dispatch on the type of some object including in places that are
>> performance sensitive so those would seem like good candidates for
>> case/match. However the PEP doesn't quite seem as I hoped because it
>> only handles positional arguments indirectly and it does not seem to
>> directly handle types with variadic positional args.
>>
[snip]
>>
>> From a first glimpse of the proposal I thought I could do matches like this:
>>
>> match obj:
>>     case Add(Mul(x, y), Mul(z, t)) if y == t:
>>     case Add(*terms):
>>     case Mul(coeff, *factors):
>>     case And(Or(A, B), Or(C, D)) if B == D:
>>     case Union(Interval(x1, y1), Interval(x2, y2)) if y1 == x2:
>>     case Union(Interval(x, y), FiniteSet(*p)) | Union(FiniteSet(*p), Interval(x, y)):
>>     case Union(*sets):
>>
>> Knowing the sympy codebase each of those patterns would look quite
>> natural because they resemble the constructors for the corresponding
>> objects (as intended in the PEP). It seems instead that many of these
>> constructors would need to have args= so it becomes:
>>
>> match obj:
>>     case Add(args=(Mul(args=(x, y)), Mul(args=(z, t)))) if y == t:
>>     case Add(args=terms):
>>     case Mul(args=(coeff, *factors)):
>>     case And(args=(Or(args=(A, B)), Or(args=(C, D)))) if B == D:
>>     case Union(args=(Interval(x1, y1), Interval(x2, y2))) if y1 == x2:
>>     case Union(args=(Interval(x, y), FiniteSet(args=p))) | Union(args=(FiniteSet(args=p), Interval(x, y))):
>>     case Union(args=sets):
>>
>> Each of these looks less natural as they don't match the constructors
>> and the syntax gets messier with nesting.
>
>
> That's a really interesting new use case you're bringing up.
>
> You may have noticed that between v1 and v2 of the PEP we withdrew the `__match__` protocol; we've been brainstorming about different forms a future `__match__` protocol could take, once we have more practical experience. One possible variant we've been looking at would be something that would *only* be used for positional arguments -- `__match__` would just return a tuple of values extracted from the object that can then be matched by the interpreter's match machinery. Your use case could then (almost, see below) be handled by having `__match__` just return `self.args`.

That would work, but something else just occurred to me: as I
understand it, __match_args__ is intended to correspond to the
parameter list of __init__/__new__, like

class Thing2:
    __match_args__ = ('first', 'second')

    def __init__(self, first, second):
        self.first = first
        self.second = second

That is deliberate so that matching can have a similar logic to the
way that arguments are handled when calling __init__:

match obj:
    case Thing2(1, 2):
    case Thing2(1, second=2):
    case Thing2(first=1, second=2):
    ...

Maybe __match_args__ could have a way to specify the variadic part of
a parameter list as well:

class ThingN:
    __match_args__ = ('first', 'second', '*rest')

    def __init__(self, first, second, *rest):
        self.first = first
        self.second = second
        self.rest = rest

Then you can match with

match obj:
    case ThingN(1, 2):
    case ThingN(1, 2, 3):
    case ThingN(first, second, *rest):
    case ThingN(first, *second_and_rest):
    case ThingN(*allargs):
    ...

The normal restrictions for combinations of keyword and positional
arguments could apply to the patterns as if __match_args__ was the
parameter list for a function.

Perhaps the * in '*rest' isn't as clear between quotes and some more
noticeable syntax could be used like

__match_args__ = ('first', 'second', ('rest',))


--
Oscar
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SMN5GZHGURK2AQKTBES5Z6XLWPDUBNB6/
Re: PEP 622 and variadic positional-only args
Oscar Benjamin's study of sympy is part of what prompted this, and does provide a concrete example of why constructors should be echoed.

I think in general, the matching has fallen into two categories:

(1) Simple literal-like matching, that mostly works OK. There is still some concern over what is a bind variable vs a match constraint, but it mostly works. And everyone agrees that it isn't the important or interesting part of the proposal.

(2) Object destructuring matches, that ... are not as close to resolution. It occurs to me that object creation is also a function call (albeit with an implicit self), so this may be a good place to build on Bound Signatures. (Think inspect.Parameter, but also containing the value.)

I hope (and think) that the result for sympy would be about what Oscar asked for (below), so I'll fill in with the more generic Point-based example.

class Point:
    def __init__(self, x, y, z=0, *, color=Color.BLACK): ...

case Point(23, y=y, oldcolor=color): # z doesn't matter

I have weak opinions on whether to require y=y (or y= or :=y or ...) to capture one of the variables when it isn't being renamed.
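
For readers unfamiliar with bound signatures, the standard library already pairs parameters with values through `inspect.Signature.bind`. The sketch below only shows that existing mechanism, not a matching protocol; the `Point` class is an illustrative assumption and uses a plain string in place of `Color.BLACK`.

```python
import inspect


class Point:
    def __init__(self, x, y, z=0, *, color="black"):
        self.x, self.y, self.z, self.color = x, y, z, color


# Signature of the constructor call, i.e. __init__ without self.
signature = inspect.signature(Point)

# Pair each parameter with the value a call would give it.
bound = signature.bind(23, 5, color="red")
bound.apply_defaults()  # fills in z=0

arguments = dict(bound.arguments)
```

A destructuring match built on this idea would run the pairing in the other direction, recovering per-parameter values from an existing object.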

Oscar Benjamin wrote:
> I've taken a look through PEP 622 and I've been thinking about how
> it could be used with sympy.

> ... The key feature of Basic instances is that they have an .args
> attribute which can be used to rebuild the object ...

> All Basic classes are strictly constructed using positional only
> arguments and not keyword arguments. In the PEP it seems that
> we can handle positional arguments when their number is fixed
> by the type. ... The main problem though is with variadic positional
> arguments. ...

> From a first glimpse of the proposal I thought I could do matches like this:
> match obj:
>     case Add(Mul(x, y), Mul(z, t)) if y == t:
>     case Add(*terms):
>     case Mul(coeff, *factors):
>     case And(Or(A, B), Or(C, D)) if B == D:
>     case Union(Interval(x1, y1), Interval(x2, y2)) if y1 == x2:
>     case Union(Interval(x, y), FiniteSet(*p)) | Union(FiniteSet(*p), Interval(x, y)):
>     case Union(*sets):
> Knowing the sympy codebase each of those patterns would look quite
> natural because they resemble the constructors for the corresponding
> objects (as intended in the PEP). It seems instead that many of these
> constructors would need to have args= so it becomes:
> match obj:
>     case Add(args=(Mul(args=(x, y)), Mul(args=(z, t)))) if y == t:
>     case Add(args=terms):
>     case Mul(args=(coeff, *factors)):
>     case And(args=(Or(args=(A, B)), Or(args=(C, D)))) if B == D:
>     case Union(args=(Interval(x1, y1), Interval(x2, y2))) if y1 == x2:
>     case Union(args=(Interval(x, y), FiniteSet(args=p))) | Union(args=(FiniteSet(args=p), Interval(x, y))):
>     case Union(args=sets):
> Each of these looks less natural as they don't match the constructors
> and the syntax gets messier with nesting.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/T2C2KI5DSXJ63MC2XMTXXC6E65VZ5FZK/