Mailing List Archive

Annotating pure functions to improve inline caching/optimization
Hello everyone,

The docs on the upcoming 3.11 release state:

> This [specializing adaptive interpreter] also brings in another
concept called inline caching, where Python caches the results of
expensive operations directly in the bytecode.

I wonder how this caching works, given that Python's dynamic nature means
that virtually every operation could have side effects, which would cause
wrong behaviour if its result were cached. The only mitigation I can
imagine is that caching occurs only for basic operations defined in the
standard library, where it is known that they are free of side effects,
i.e. "pure".

A web search did turn up some discussions[1,2] and a module related to
pure functions, but, as far as I can see, none related to optimization.

As an example, consider code like this:

```py
import cmath
import math
from typing import List, Tuple

@pure  # hypothetical decorator, see below
def rot_factor(angle_deg: float) -> complex:
    # This could also be a much more expensive calculation.
    return cmath.exp(angle_deg / 180 * cmath.pi * 1j)

# ...

res: List[Tuple[complex, complex, complex, float]] = []
for x in many:
    res.append((
        x * rot_factor(90),
        x * rot_factor(45),
        x * rot_factor(-45),
        x * math.sin(math.pi / 8),
    ))
```

The problem with this code is obvious: every loop iteration calls
`rot_factor()` with 90, 45 and -45 and gets back exactly the same set of
results. The last factor might already be inline cached by the
interpreter, since it probably knows that `math.pi` is a constant and
`math.sin()` is a pure function. Optimizing this by hand (not
considering a list comprehension or other more sophisticated
improvements) is easy, but not very pretty:

```py
f_p90 = rot_factor(90)
f_p45 = rot_factor(45)
f_m45 = rot_factor(-45)
f_sin = math.sin(math.pi / 8)

res: List[Tuple[complex, complex, complex, float]] = []
for x in many:
    res.append((
        x * f_p90,
        x * f_p45,
        x * f_m45,
        x * f_sin,
    ))
```

I actually find myself often factoring such data out of loops in Python,
whereas in C I would just leave that to the optimizer/compiler.

An available option would be to use `@lru_cache` on `rot_factor()`, but
that still performs the same dictionary lookup in every iteration, and it
may not work at all if the function arguments are not hashable.

Now, if the interpreter understood the `@pure` decorator on
`rot_factor()` indicated above, it would have the same opportunity to
cache the three results throughout the loop, basically producing the
manually-optimized code above. For these completely static arguments, it
could even precompute the results and embed them in the bytecode.
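Something similar already happens for constant arithmetic: CPython's compile-time optimizer folds expressions of literals into a single constant. A small illustration (the function name is made up; the exact bytecode varies by version, but the folded constant is observable):

```python
def scaled():
    return 2 * 3 * 7  # folded to 42 at compile time

# The folded result appears among the function's constants,
# so no multiplication happens at execution time.
print(42 in scaled.__code__.co_consts)  # True
```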


Has anything like this been considered already, or is the interpreter
itself capable of performing such optimizations?


Thanks and best regards,
Philipp



[1] 'pure' type annotation for mypy:
https://github.com/python/mypy/issues/4468

[2] pure-func module: https://pypi.org/project/pure-func/
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XDYZRC6L7LBPE3T6RO6S5IVY3J6IMRSJ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Annotating pure functions to improve inline caching/optimization
You should bring this up on discuss.python.org. It's not going to see
much if any discussion here.

Eric

On 9/14/2022 10:05 AM, Philipp Burch wrote:
> [...]
>
> Has anything like this been considered already, or is the interpreter
> itself capable to perform such optimizations?
Re: Annotating pure functions to improve inline caching/optimization
> I wonder how this caching works, given that the dynamic nature means
> that virtually every operation could have side effects, causing wrong
> behaviour when cached. The only mitigation for this that I can imagine
> is that caching just occurs for basic operations defined in the standard
> library, where it is known that they are free of side effects or "pure".
I've spent a fair amount of time exploring the new adaptive, inline-caching code generated by 3.11. "Inline caching" here does not mean result caching (like what a C/C++ compiler might do), but rather the caching of information used by the adaptive instructions. The bytecode stays the same, except for the adaptive instructions and the `CACHE` entries that follow each one; those entries must *always* be skipped over, since they would otherwise be misinterpreted as ordinary instructions.
Re: Annotating pure functions to improve inline caching/optimization
Hi Philipp, thanks for your interest in CPython.

How Python views your code isn't how you view it. CPython compiles source code into a lower-level intermediate representation called bytecode (or wordcode), which the interpreter then executes. You can read more about bytecode in the documentation for the dis module [1].

Operations are thus viewed at the bytecode level, and inline caching is done per bytecode instruction. A complicated program is split into many bytecode instructions, each of which performs a small operation relative to the larger program. This is why the guards used when reading from the inline cache can be very simple: the operations themselves are simple compared to something as big as a function. This is also how we can ensure that side effects in the operations don't break anything.
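To see this decomposition, you can inspect the instruction stream with the `dis` module (a quick sketch; the function name is made up, and opcode names differ between Python versions):

```python
import dis

def scale_and_shift(x, a, b):
    return a * x + b

# One source line becomes several small bytecode instructions:
# argument loads, a multiply, an add, and a return.
ops = [ins.opname for ins in dis.get_instructions(scale_and_shift)]
print(ops)
```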

> I actually find myself often factoring such data out of loops in Python, whereas in C I would just leave that to the optimizer/compiler.

The compiler in CPython can't really do that, because it's not safe in Python: the user could have overridden `__add__` to do more than just addition.
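A tiny example of why hoisting such operations is unsafe: even `+` can carry arbitrary side effects (the `Logger` class here is made up for illustration):

```python
class Logger:
    """An object whose + operator has a visible side effect."""
    def __init__(self):
        self.seen = []

    def __add__(self, other):
        self.seen.append(other)   # side effect: record every addend
        return other

log = Logger()
for i in range(3):
    log + i   # looks like pure arithmetic, but mutates `log`

print(log.seen)  # [0, 1, 2] -- hoisting the addition out of the loop would change behaviour
```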

[1] https://docs.python.org/3/library/dis.html
Re: Annotating pure functions to improve inline caching/optimization
On 15.09.22 00:05, Jeremiah Gabriel Pascual wrote:
> I've frequently explored the new adaptive, inline caching code generated by 3.11. "inline caching" does not mean result caching (like what C/C++ might do) here, but rather it should mean the caching of info used for the adaptive instructions. That means the bytecode stays the same except for the adaptive instructions and the changed `CACHE`s below each adaptive instruction, which should *always* be skipped due to the possibility of misinterpretation as other instructions.


Oh, thanks for the clarification. It looks like I hoped for too much
when reading about caching. Maybe sometime in the future, who knows...

Regards,
Philipp
Re: Annotating pure functions to improve inline caching/optimization
Hi Ken,

thank you for the explanations. Just one more comment:

>> I actually find myself often factoring such data out of loops in Python, whereas in C I would just leave that to the optimizer/compiler.
>
> The compiler in CPython can't really do that because it's not safe in Python. The user could've overridden `__add__` to do more things than just addition.

I'm well aware that Python, like dynamic languages in general, is
terrible territory for an optimizer, because there is basically nothing
it can be sure about. But that is exactly the point of marking functions
as "pure": it would not make the code technically safe to optimize, but
it would transfer that responsibility from the interpreter to the
programmer. If the programmer declares that something is safe to
optimize/cache, the interpreter can't be blamed if something breaks
when it actually does so.
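For what it's worth, that contract can at least be prototyped in pure Python today. A hypothetical `@pure` decorator (names made up) that simply memoizes on the programmer's promise; it gains nothing over `functools.lru_cache` performance-wise, since the lookup still happens at run time:

```python
import functools

def pure(func):
    """Trust the programmer's purity claim and memoize results.

    If func is *not* actually pure, stale results are the
    programmer's fault -- exactly the proposed division of blame.
    """
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@pure
def double(x):
    return x * 2

print(double(21))  # 42
print(double(21))  # 42, served from the cache this time
```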

On the other hand, a language as dynamic as Python probably is just not
designed for an optimizer, so either you do it by hand or use an
extension module to speed critical parts up. Fair enough.

Best regards,
Philipp