Mailing List Archive

Improve CPython tracing performance
Hi all,

Right now, when a debugger is active, the number of local variables can
affect the tracing speed quite a lot.

For instance, having tracing setup in a program such as the one below takes
4.64 seconds to run, yet, changing all the variables to have the same name
-- i.e.: change all assignments to `a = 1` (such that there's only a single
variable in the namespace), it takes 1.47 seconds (in my machine)... the
higher the number of variables, the slower the tracing becomes.

```
import time
t = time.time()

def call():
a = 1
b = 1
c = 1
d = 1
e = 1
f = 1

def noop(frame, event, arg):
return noop

import sys
sys.settrace(noop)

for i in range(1_000_000):
call()

print('%.2fs' % (time.time() - t,))
```

This happens because `PyFrame_FastToLocalsWithError` and
`PyFrame_LocalsToFast` are called inside the `call_trampoline` (
https://github.com/python/cpython/blob/master/Python/sysmodule.c#L946).

So, I'd like to simply remove those calls.

Debuggers can call `PyFrame_LocalsToFast` when needed -- otherwise
mutating non-current frames doesn't work anyways. As a note, pydevd already
has such a call:
https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b53ab113509d/_pydevd_bundle/pydevd_save_locals.py#L57
and PyPy also has a counterpart.

As for `PyFrame_FastToLocalsWithError`, I don't really see any reason to
call it at all.

i.e.: something as the code below prints the `a` variable from the `main()`
frame regardless of that and I checked all pydevd tests and nothing seems
to be affected (it seems that accessing f_locals already does this:
https://github.com/python/cpython/blob/cb9879b948a19c9434316f8ab6aba9c4601a8173/Objects/frameobject.c#L35,
so, I don't see much reason to call it at all).

```
def call():
import sys
frame = sys._getframe()
print(frame.f_back.f_locals)

def main():
a = 1
call()

if __name__ == '__main__':
main()
```

Does anyone see any issue with this?

If it's non controversial, is a PEP needed or just an issue to track it
would be enough to remove those 2 lines?

Thanks,

Fabio
Re: Improve CPython tracing performance [ In reply to ]
Le jeu. 29 oct. 2020 à 13:02, Fabio Zadrozny <fabiofz@gmail.com> a écrit :
> Debuggers can call `PyFrame_LocalsToFast` when needed -- otherwise mutating non-current frames doesn't work anyways. As a note, pydevd already has such a call: https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b53ab113509d/_pydevd_bundle/pydevd_save_locals.py#L57 and PyPy also has a counterpart.

Hum, if a trace or profile function is written in Python, reading
frame.f_locals does call PyFrame_FastToLocalsWithError(). So a Python
debugger/profiler would be ok with your code.

For a debugger/profiler written in C, it would be a backward
incompatible change. I agree that it would be reasonable to require it
to call PyFrame_FastToLocalsWithError().

> If it's non controversial, is a PEP needed or just an issue to track it would be enough to remove those 2 lines?

Incompatible changes should be well documented in What's New in Python
3.10. In this case, I don't think that a deprecation period is needed.

Just open an issue. Please post the URL to your issue in reply to your
email. It's even better if you can write a PR to implement your idea
;-)

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YRHHR6BWIQK3BBIEPOM47XSMXWUODYF7/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Improve CPython tracing performance [ In reply to ]
On Thu, Oct 29, 2020 at 9:45 AM Victor Stinner <vstinner@python.org> wrote:

> Le jeu. 29 oct. 2020 à 13:02, Fabio Zadrozny <fabiofz@gmail.com> a écrit :
> > Debuggers can call `PyFrame_LocalsToFast` when needed -- otherwise
> mutating non-current frames doesn't work anyways. As a note, pydevd already
> has such a call:
> https://github.com/fabioz/PyDev.Debugger/blob/0d4d210f01a1c0a8647178b2e665b53ab113509d/_pydevd_bundle/pydevd_save_locals.py#L57
> and PyPy also has a counterpart.
>
> Hum, if a trace or profile function is written in Python, reading
> frame.f_locals does call PyFrame_FastToLocalsWithError(). So a Python
> debugger/profiler would be ok with your code.
>
> For a debugger/profiler written in C, it would be a backward
> incompatible change. I agree that it would be reasonable to require it
> to call PyFrame_FastToLocalsWithError().
>
> > If it's non controversial, is a PEP needed or just an issue to track it
> would be enough to remove those 2 lines?
>
> Incompatible changes should be well documented in What's New in Python
> 3.10. In this case, I don't think that a deprecation period is needed.
>
> Just open an issue. Please post the URL to your issue in reply to your
> email. It's even better if you can write a PR to implement your idea
> ;-)
>


Ok, I've created https://bugs.python.org/issue42197 to track it.

--
Fabio
Re: Improve CPython tracing performance [ In reply to ]
On Fri, 30 Oct 2020 at 00:19, Fabio Zadrozny <fabiofz@gmail.com> wrote:
> On Thu, Oct 29, 2020 at 9:45 AM Victor Stinner <vstinner@python.org> wrote:
>>
>> > If it's non controversial, is a PEP needed or just an issue to track it would be enough to remove those 2 lines?
>>
>> Incompatible changes should be well documented in What's New in Python
>> 3.10. In this case, I don't think that a deprecation period is needed.
>>
>> Just open an issue. Please post the URL to your issue in reply to your
>> email. It's even better if you can write a PR to implement your idea
>> ;-)

Removing those calls would require a PEP, as it would break all sorts
of tools in cases that currently work correctly.

> Ok, I've created https://bugs.python.org/issue42197 to track it.

Please also have a look at PEP 558 and its draft reference
implementation at https://github.com/python/cpython/pull/3640

The way these trampoline calls currently work isn't just slow, it's
actually broken in various ways, and changing them to use a
write-through proxy instead of a dict-based snapshot means that the
cost of producing those dict-based snapshots simply because tracing is
turned on will go away.

The PEP itself didn't seem to be particularly controversial (at least
in its current form - earlier versions drew more objections), but
there's a bunch of preparatory work that needs to be done before it
could seriously be submitted for final review (specifically: the
write-through proxy isn't actually implementing the full mutable
mapping API. In order for it to do that without excessive code
duplication, the helper functions already written for ordered
dictionaries needed to moved out to a separate linkable module so that
the new write-through proxy can reuse them without taking a separate
copy of them)

Cheers,
Nick.

P.S. If that sounds like a request for help, that's because it is ;)

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RGP4UJFWUML6LQQPB4RALTBOXEHOEMHE/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Improve CPython tracing performance [ In reply to ]
On Fri, Oct 30, 2020 at 7:02 AM Nick Coghlan <ncoghlan@gmail.com> wrote:

> On Fri, 30 Oct 2020 at 00:19, Fabio Zadrozny <fabiofz@gmail.com> wrote:
> > On Thu, Oct 29, 2020 at 9:45 AM Victor Stinner <vstinner@python.org>
> wrote:
> >>
> >> > If it's non controversial, is a PEP needed or just an issue to track
> it would be enough to remove those 2 lines?
> >>
> >> Incompatible changes should be well documented in What's New in Python
> >> 3.10. In this case, I don't think that a deprecation period is needed.
> >>
> >> Just open an issue. Please post the URL to your issue in reply to your
> >> email. It's even better if you can write a PR to implement your idea
> >> ;-)
>
> Removing those calls would require a PEP, as it would break all sorts
> of tools in cases that currently work correctly.
>
> > Ok, I've created https://bugs.python.org/issue42197 to track it.
>
> Please also have a look at PEP 558 and its draft reference
> implementation at https://github.com/python/cpython/pull/3640
>
> The way these trampoline calls currently work isn't just slow, it's
> actually broken in various ways, and changing them to use a
> write-through proxy instead of a dict-based snapshot means that the
> cost of producing those dict-based snapshots simply because tracing is
> turned on will go away.
>
> The PEP itself didn't seem to be particularly controversial (at least
> in its current form - earlier versions drew more objections), but
> there's a bunch of preparatory work that needs to be done before it
> could seriously be submitted for final review (specifically: the
> write-through proxy isn't actually implementing the full mutable
> mapping API. In order for it to do that without excessive code
> duplication, the helper functions already written for ordered
> dictionaries needed to moved out to a separate linkable module so that
> the new write-through proxy can reuse them without taking a separate
> copy of them)
>
>
Hi Nick!

As a note, the current implementation does allow debuggers to mutate frame
locals -- as long as they understand that they need to call `
PyFrame_LocalsToFast ` when doing such a change -- potentially using ctypes
(I'm just mentioning this because PEP 558 seems to imply this isn't
possible).

i.e.: Debuggers already *must* call ` PyFrame_LocalsToFast ` if locals from
a frame which is not the current frame are being mutated, so, as far as I
can see a debugger is already broken if it isn't doing that -- some years
ago I even thought about exposing it in the frame API:
https://bugs.python.org/issue1654367, but in the end accessing it through
the C-API through ctypes does get the job done, debugger authors just need
to be aware of it -- PyPy also has a counterpart mentioned in that issue.

I agree that having f_locals be a proxy that does everything transparently
would be better, but unfortunately I don't currently have the time
available to help there... in your opinion, just removing those calls as I
proposed (requiring that debuggers call `PyFrame_LocalsToFast`) would be
acceptable? If you think it is, I'll proceed on creating the PEP, otherwise
I'll probably just drop it until f_locals becomes a proxy (in which case
I'd expect the `PyFrame_FastToLocalsWithError` and `PyFrame_LocalsToFast`
to be removed in the same commit which converts f_locals to a proxy).

Cheers,

Fabio
Re: Improve CPython tracing performance [ In reply to ]
On Fri, 30 Oct 2020 at 20:52, Fabio Zadrozny <fabiofz@gmail.com> wrote:
> On Fri, Oct 30, 2020 at 7:02 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
> As a note, the current implementation does allow debuggers to mutate frame locals -- as long as they understand that they need to call ` PyFrame_LocalsToFast ` when doing such a change -- potentially using ctypes (I'm just mentioning this because PEP 558 seems to imply this isn't possible).

If that's referring to the sentence that ends with ".. without even
reliably enabling the desired functionality of allowing debuggers like
pdb to mutate local variables.", the critical word there is
"reliably".

The existing API and implementation does allow you to change local
variables from trace functions (especially if you use ctypes to access
the C API rather than sticking solely to the Python level API), it
just also messes up your application state sometimes.

> i.e.: Debuggers already *must* call ` PyFrame_LocalsToFast ` if locals from a frame which is not the current frame are being mutated, so, as far as I can see a debugger is already broken if it isn't doing that -- some years ago I even thought about exposing it in the frame API: https://bugs.python.org/issue1654367, but in the end accessing it through the C-API through ctypes does get the job done, debugger authors just need to be aware of it -- PyPy also has a counterpart mentioned in that issue.

It isn't primarily debuggers that are a concern for backwards
compatibility, it's trace functions manipulating the state of the
frame being traced. In the current implementation, that feature relies
on those FastToLocals and LocalsToFast conversion calls to work.

> I agree that having f_locals be a proxy that does everything transparently would be better, but unfortunately I don't currently have the time available to help there... in your opinion, just removing those calls as I proposed (requiring that debuggers call `PyFrame_LocalsToFast`) would be acceptable? If you think it is, I'll proceed on creating the PEP, otherwise I'll probably just drop it until f_locals becomes a proxy (in which case I'd expect the `PyFrame_FastToLocalsWithError` and `PyFrame_LocalsToFast` to be removed in the same commit which converts f_locals to a proxy).

Checking the PR branch, the write-through proxy meant I was able to
get rid of the call to PyFrame_LocalsToFast, but I couldn't get rid of
the implicit snapshot refresh due to C API compatibility concerns:
https://github.com/python/cpython/pull/3640/files#diff-a3a5c73931235f7f344c072dc755d6508e13923db3f5d581c5e88652075871cb

Assuming the PEP is eventually accepted, though, then it would be
reasonable to subsequently deprecate that implicit refresh, and say
that any trace function calling such APIs needs to make its own call
to PyLocals_RefreshViews() first.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JS7NEH4KS5CX3PHO6RVYPHJPRWBVWLRW/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Improve CPython tracing performance [ In reply to ]
Le ven. 30 oct. 2020 à 11:02, Nick Coghlan <ncoghlan@gmail.com> a écrit :
> > Ok, I've created https://bugs.python.org/issue42197 to track it.
>
> Please also have a look at PEP 558 and its draft reference
> implementation at https://github.com/python/cpython/pull/3640

I don't think that the PEP 558 and bpo-42197 are incompatible.

Debuggers and profilers usually only care of specific frames or
function calls (ex: 10% of function calls or even a single function
call in a whole application). The problem is how to make them as
efficient as possible for "no operation" calls, when they don't care
about the current frame. Avoiding PyFrame_FastToLocalsWithError() to
enter the debugger/profile and avoiding PyFrame_LocalsToFast() on exit
sounds a simple and practical solution.

By the way, I created https://bugs.python.org/issue40421 to prepare
the C API to make the PyFrameObject structure opaque. Once it will be
opaque, we will have more freedom on the API exposed to inspect frame
locals.

IMO it's a good idea to require function calls to inspect frame
locals, and not let developers think that PyFrameObject.f_locals is
always up-to-date and can always be modified.

I dislike PyFrame_FastToLocalsWithError() and PyFrame_LocalsToFast()
names. Maybe we should have an even more "explicit" API. Example
(without error handling):
---
// Implementation: Call PyFrame_FastToLocalsWithError() and returns
PyFrameObject.f_locals
locals = PyFrame_GetLocals();

// ... the debugger modifies some local variables in 'locals' dict ...

// Implementation: Set PyFrameObject.f_locals and call PyFrame_LocalsToFast()
PyFrame_SetLocals(locals);
---

The good news is that PyFrame_GetLocals() and PyFrame_SetLocals() can
easily be reimplemented in Python 3.9 for my compatibility header
file:
https://github.com/pythoncapi/pythoncapi_compat

Such API avoids a complex proxy and simply reuses a regular dict
object (exiting PyFrameObject.f_locals).

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/65RHL2XKIYWMOCULGTI3HD3MSZSKVIBL/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Improve CPython tracing performance [ In reply to ]
On Fri, 30 Oct 2020 at 23:34, Victor Stinner <vstinner@python.org> wrote:
>
> Le ven. 30 oct. 2020 à 11:02, Nick Coghlan <ncoghlan@gmail.com> a écrit :
> > > Ok, I've created https://bugs.python.org/issue42197 to track it.
> >
> > Please also have a look at PEP 558 and its draft reference
> > implementation at https://github.com/python/cpython/pull/3640
>
> I don't think that the PEP 558 and bpo-42197 are incompatible.
>
> Debuggers and profilers usually only care of specific frames or
> function calls (ex: 10% of function calls or even a single function
> call in a whole application). The problem is how to make them as
> efficient as possible for "no operation" calls, when they don't care
> about the current frame. Avoiding PyFrame_FastToLocalsWithError() to
> enter the debugger/profile and avoiding PyFrame_LocalsToFast() on exit
> sounds a simple and practical solution.

Aye, I agree. I just don't think we can remove those implicit calls
without preparing a replacement API first.

> By the way, I created https://bugs.python.org/issue40421 to prepare
> the C API to make the PyFrameObject structure opaque. Once it will be
> opaque, we will have more freedom on the API exposed to inspect frame
> locals.
>
> IMO it's a good idea to require function calls to inspect frame
> locals, and not let developers think that PyFrameObject.f_locals is
> always up-to-date and can always be modified.

Aye, if direct access to the PyFrameObject.f_locals struct field goes
away, then the cache refresh can be added
to the accessor APIs, just as PEP 558 proposes doing for Python code.

> The good news is that PyFrame_GetLocals() and PyFrame_SetLocals() can
> easily be reimplemented in Python 3.9 for my compatibility header
> file:
> https://github.com/pythoncapi/pythoncapi_compat
>
> Such API avoids a complex proxy and simply reuses a regular dict
> object (exiting PyFrameObject.f_locals).

Unfortunately, cell variables mean that there's no way to make
snapshot-with-writeback logic consistently correct in the presence of
generators and coroutines (see
https://www.python.org/dev/peps/pep-0558/#resolving-the-issues-with-tracing-mode-behaviour
for the gist of that problem).

The early iterations of PEP 558 worked that way (like you, I wanted to
avoid having to actually implement the write-through proxy), but
Nathaniel kept finding holes where bugs like bpo-30744 could still
happen (hence the acknowledgement in the PEP), so I eventually
conceded defeat and accepted that the proxy was the only approach that
could ever be truly correct.

The core of the write-through proxy implementation isn't actually too
bad: https://github.com/python/cpython/blob/ed6e53be7794ec94924b69cd46d5b009633c6307/Objects/frameobject.c#L1315

The annoying parts are the outright copy-and-paste of odict code at
the end of the file, and the few remaining mutable mapping methods
that still raise NotImplementedError rather than doing what you'd
expect.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5OXVNEOSD5AZBSSDU23Y4UI45VOVRUHT/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Improve CPython tracing performance [ In reply to ]
Reply to an old thread.

On Sat, Oct 31, 2020 at 8:02 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
> > Debuggers and profilers usually only care of specific frames or
> > function calls (ex: 10% of function calls or even a single function
> > call in a whole application). The problem is how to make them as
> > efficient as possible for "no operation" calls, when they don't care
> > about the current frame. Avoiding PyFrame_FastToLocalsWithError() to
> > enter the debugger/profile and avoiding PyFrame_LocalsToFast() on exit
> > sounds a simple and practical solution.
>
> Aye, I agree. I just don't think we can remove those implicit calls
> without preparing a replacement API first.

Again, I don't think that it's incompatible. We can enforce calling
PyFrame_FastToLocalsWithError() at enter and PyFrame_LocalsToFast() at
exit for now, and enhance the API later.


> Unfortunately, cell variables mean that there's no way to make
> snapshot-with-writeback logic consistently correct in the presence of
> generators and coroutines (...)

I do understand that there are corner cases where it doesn't work, ok.

But https://bugs.python.org/issue42197 is a simple and ready to merge
solution which should optimize profilers and debuggers for the most
common case.

To be clear: currently, PyFrame_FastToLocalsWithError() is called at
enter and PyFrame_LocalsToFast() is called at exit. So asking
debuggers/profilers to call them explicitly doesn't make the situation
worse (nor better ;-)) for generators/coroutines, it would be exactly
the same behavior. It's just an optimization.

The PEP 558 is being discussed for 5 years and still a draft. I don't
think that it should hold bpo-42197 optimization.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MV2723DT7BLHBJPNM6K4WHLTWSRMHZLN/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Improve CPython tracing performance [ In reply to ]
On Wed, 20 Jan 2021 at 19:22, Victor Stinner <vstinner@python.org> wrote:
>
> Reply to an old thread.
>
> On Sat, Oct 31, 2020 at 8:02 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
> > > Debuggers and profilers usually only care of specific frames or
> > > function calls (ex: 10% of function calls or even a single function
> > > call in a whole application). The problem is how to make them as
> > > efficient as possible for "no operation" calls, when they don't care
> > > about the current frame. Avoiding PyFrame_FastToLocalsWithError() to
> > > enter the debugger/profile and avoiding PyFrame_LocalsToFast() on exit
> > > sounds a simple and practical solution.
> >
> > Aye, I agree. I just don't think we can remove those implicit calls
> > without preparing a replacement API first.
>
> Again, I don't think that it's incompatible. We can enforce calling
> PyFrame_FastToLocalsWithError() at enter and PyFrame_LocalsToFast() at
> exit for now, and enhance the API later.

PEP 558 makes `PyFrame_LocalsToFast()` raise an exception, so the two
approaches definitely aren't compatible :)

> To be clear: currently, PyFrame_FastToLocalsWithError() is called at
> enter and PyFrame_LocalsToFast() is called at exit. So asking
> debuggers/profilers to call them explicitly doesn't make the situation
> worse (nor better ;-)) for generators/coroutines, it would be exactly
> the same behavior. It's just an optimization.
>
> The PEP 558 is being discussed for 5 years and still a draft. I don't
> think that it should hold bpo-42197 optimization.

No, what should hold up the bpo-42197 PR is the fact that it's an API
compatibility break that shouldn't be done without a PEP.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PDHPZ3DQLT2Z4S5PTUKLB6FUJ3R676NI/
Code of Conduct: http://python.org/psf/codeofconduct/