Mailing List Archive

PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
I've updated PEP 683 for the feedback I've gotten. Thanks again for that!

The updated PEP text is included below. The largest changes involve
either the focus of the PEP (internal mechanism to mark objects
immortal) or the possible ways that things can break on older 32-bit
stable ABI extensions. All other changes are smaller.

Given the last round of discussion, I'm hoping this will be the last
round before we go to the steering council.

-eric

----------------

PEP: 683
Title: Immortal Objects, Using a Fixed Refcount
Author: Eric Snow <ericsnowcurrently@gmail.com>, Eddie Elizondo
<eduardo.elizondorueda@gmail.com>
Discussions-To:
https://mail.python.org/archives/list/python-dev@python.org/thread/TPLEYDCXFQ4AMTW6F6OQFINSIFYBRFCR/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2022
Python-Version: 3.11
Post-History: 15-Feb-2022, 19-Feb-2022, 28-Feb-2022
Resolution:


Abstract
========

Currently the CPython runtime maintains a
`small amount of mutable state <Runtime Object State_>`_ in the
allocated memory of each object. Because of this, otherwise immutable
objects are actually mutable. This can have a large negative impact
on CPU and memory performance, especially for approaches to increasing
Python's scalability.

This proposal mandates that, internally, CPython will support marking
an object as one for which that runtime state will no longer change.
Consequently, such an object's refcount will never reach 0, and so the
object will never be cleaned up. We call these objects "immortal".
(Normally, only a relatively small number of internal objects
will ever be immortal.) The fundamental improvement here
is that now an object can be truly immutable.

Scope
-----

Object immortality is meant to be an internal-only feature. So this
proposal does not include any changes to public API or behavior
(with one exception). As usual, we may still add some private
(yet publicly accessible) API to do things like immortalize an object
or tell if one is immortal. Any effort to expose this feature to users
would need to be proposed separately.

There is one exception to "no change in behavior": refcounting semantics
for immortal objects will differ in some cases from user expectations.
This exception, and the solution, are discussed below.

Most of this PEP focuses on an internal implementation that satisfies
the above mandate. However, those implementation details are not meant
to be strictly prescriptive. Instead, at the least they are included
to help illustrate the technical considerations required by the mandate.
The actual implementation may deviate somewhat as long as it satisfies
the constraints outlined below. Furthermore, the acceptability of any
specific implementation detail described below does not depend on
the status of this PEP, unless explicitly specified.

For example, the particular details of:

* how to mark something as immortal
* how to recognize something as immortal
* which subset of functionally immortal objects are marked as immortal
* which memory-management activities are skipped or modified for
immortal objects

are not only CPython-specific but are also private implementation
details that are expected to change in subsequent versions.

Implementation Summary
----------------------

Here's a high-level look at the implementation:

If an object's refcount matches a very specific value (defined below)
then that object is treated as immortal. The CPython C-API and runtime
will not modify the refcount (or other runtime state) of an immortal
object.
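
To illustrate, here is a minimal sketch of what immortality-aware
incref/decref could look like (the names and exact form here are
hypothetical; the magic value is discussed under
`_Py_IMMORTAL_REFCNT`_ below)::

    /* Hypothetical helper: immortality is a single bit test. */
    static inline int _Py_IsImmortal(PyObject *op)
    {
        return (op->ob_refcnt & _Py_IMMORTAL_BIT) != 0;
    }

    /* What Py_INCREF() would effectively do: */
    static inline void _Py_IncRef(PyObject *op)
    {
        if (_Py_IsImmortal(op)) {
            return;   /* leave the refcount (and cache line) untouched */
        }
        op->ob_refcnt++;
    }

    /* What Py_DECREF() would effectively do: */
    static inline void _Py_DecRef(PyObject *op)
    {
        if (_Py_IsImmortal(op)) {
            return;   /* an immortal object is never deallocated */
        }
        if (--op->ob_refcnt == 0) {
            _Py_Dealloc(op);
        }
    }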

Aside from the change to refcounting semantics, there is one other
possible negative impact to consider. A naive implementation of the
approach described below makes CPython roughly 4% slower. However,
the implementation is performance-neutral once known mitigations
are applied.


Motivation
==========

As noted above, currently all objects are effectively mutable. That
includes "immutable" objects like ``str`` instances. This is because
every object's refcount is frequently modified as the object is used
during execution. This is especially significant for a number of
commonly used global (builtin) objects, e.g. ``None``. Such objects
are used a lot, both in Python code and internally. That adds up to
a consistent high volume of refcount changes.

The effective mutability of all Python objects has a concrete impact
on parts of the Python community, e.g. projects that aim for
scalability like Instagram or the effort to make the GIL
per-interpreter. Below we describe several ways in which refcount
modification has a real negative effect on such projects.
None of that would happen for objects that are truly immutable.

Reducing CPU Cache Invalidation
-------------------------------

Every modification of a refcount causes the corresponding CPU cache
line to be invalidated. This has a number of effects.

For one, the write must be propagated to other cache levels
and to main memory. This has a small effect on all Python programs.
Immortal objects would provide a slight relief in that regard.

On top of that, multi-core applications pay a price. If two threads
(running simultaneously on distinct cores) are interacting with the
same object (e.g. ``None``) then they will end up invalidating each
other's caches with each incref and decref. This is true even for
otherwise immutable objects like ``True``, ``0``, and ``str`` instances.
CPython's GIL helps reduce this effect, since only one thread runs at a
time, but it doesn't completely eliminate the penalty.

Avoiding Data Races
-------------------

Speaking of multi-core, we are considering making the GIL
a per-interpreter lock, which would enable true multi-core parallelism.
Among other things, the GIL currently protects against races between
multiple concurrent threads that may incref or decref the same object.
Without a shared GIL, two running interpreters could not safely share
any objects, even otherwise immutable ones like ``None``.

This means that, to have a per-interpreter GIL, each interpreter must
have its own copy of *every* object. That includes the singletons and
static types. We have a viable strategy for that but it will require
a meaningful amount of extra effort and extra complexity.

The alternative is to ensure that all shared objects are truly immutable.
There would be no races because there would be no modification. This
is something that the immortality proposed here would enable for
otherwise immutable objects. With immortal objects,
support for a per-interpreter GIL
becomes much simpler.

Avoiding Copy-on-Write
----------------------

For some applications it makes sense to get the application into
a desired initial state and then fork the process for each worker.
This can result in a large performance improvement, especially
for memory usage. Several enterprise Python users (e.g. Instagram,
YouTube) have taken advantage of this. However, the above
refcount semantics drastically reduce the benefits and
have led to some sub-optimal workarounds.

Also note that "fork" isn't the only operating system mechanism
that uses copy-on-write semantics. Anything that uses ``mmap``
relies on copy-on-write, including sharing data from shared object
files between processes.


Rationale
=========

The proposed solution is obvious enough that both of this proposal's
authors came to the same conclusion (and implementation, more or less)
independently. The Pyston project `uses a similar approach <Pyston_>`_.
Other designs were also considered. Several possibilities have also
been discussed on python-dev in past years.

Alternatives include:

* use a high bit to mark "immortal" but do not change ``Py_INCREF()``
* add an explicit flag to objects
* implement via the type (``tp_dealloc()`` is a no-op)
* track via the object's type object
* track with a separate table

Each of the above makes objects immortal, but none of them address
the performance penalties from refcount modification described above.

In the case of per-interpreter GIL, the only realistic alternative
is to move all global objects into ``PyInterpreterState`` and add
one or more lookup functions to access them. Then we'd have to
add some hacks to the C-API to preserve compatibility for the
many objects exposed there. The story is much, much simpler
with immortal objects.


Impact
======

Benefits
--------

Most notably, the cases described in the above examples stand
to benefit greatly from immortal objects. Projects using pre-fork
can drop their workarounds. For the per-interpreter GIL project,
immortal objects greatly simplify the solution for existing static
types, as well as objects exposed by the public C-API.

In general, a strong immutability guarantee for objects enables Python
applications to scale like never before. This is because they can
then leverage multi-core parallelism without a tradeoff in memory
usage. This is reflected in most of the above cases.

Performance
-----------

A naive implementation shows `a 4% slowdown`_. We have demonstrated
a return to performance-neutral with a handful of basic mitigations
applied. See the `mitigation`_ section below.

On the positive side, immortal objects save a significant amount of
memory when used with a pre-fork model. Also, immortal objects provide
opportunities for specialization in the eval loop that would improve
performance.

.. _a 4% slowdown:
https://github.com/python/cpython/pull/19474#issuecomment-1032944709

Backward Compatibility
----------------------

Ideally this internal-only feature would be completely compatible.
However, it does involve a change to refcount semantics in some cases.
Only immortal objects are affected, but this includes high-use objects
like ``None``, ``True``, and ``False``.

Specifically, when an immortal object is involved:

* code that inspects the refcount will see a really, really large value
* the new noop behavior may break code that:

* depends specifically on the refcount to always increment or decrement
(or have a specific value from ``Py_SET_REFCNT()``)
* relies on any specific refcount value, other than 0 or 1
* directly manipulates the refcount to store extra information there

* in 32-bit pre-3.11 `Stable ABI`_ extensions,
objects may leak due to `Accidental Immortality`_
* such extensions may crash due to `Accidental De-Immortalizing`_
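
For example, here is roughly what refcount-inspecting code would
observe for an immortal object (the actual value is an
implementation detail)::

    Py_ssize_t before = Py_REFCNT(Py_None);  /* some very large value */
    Py_INCREF(Py_None);                      /* now effectively a no-op */
    Py_DECREF(Py_None);                      /* likewise */
    Py_ssize_t after = Py_REFCNT(Py_None);   /* after == before */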

Again, those changes in behavior only apply to immortal objects, not
most of the objects a user will access. Furthermore, users cannot mark
an object as immortal so no user-created objects will ever have that
changed behavior. Users that rely on any of the changing behavior for
global (builtin) objects are already in trouble. So the overall impact
should be small.

Also note that code which checks for refleaks should keep working fine,
unless it checks for hard-coded small values relative to some immortal
object. The problems noticed by `Pyston`_ shouldn't apply here since
we do not modify the refcount.

See `Public Refcount Details`_ below for further discussion.

Accidental Immortality
''''''''''''''''''''''

Hypothetically, a non-immortal object could be incref'ed so much
that it reaches the magic value needed to be considered immortal.
That means it would accidentally never be cleaned up
(by going back to 0).

On 64-bit builds, this accidental scenario is so unlikely that we need
not worry. Even if done deliberately by using ``Py_INCREF()`` in a
tight loop and each iteration only took 1 CPU cycle, it would take
2^60 cycles (if the immortal bit were 2^60). At a fast 5 GHz that would
still take nearly 250,000,000 seconds (over 2,500 days)!
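
In code, the deliberate version would be something like this
(a demonstration only, where ``obj`` is some non-immortal object)::

    /* Brute-force immortalization: 2^60 increfs, one per iteration.
       Even at one iteration per cycle on a 5 GHz core, this loop
       runs for thousands of days. */
    for (unsigned long long i = 0; i < (1ULL << 60); i++) {
        Py_INCREF(obj);
    }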

Also note that it is doubly unlikely to be a problem because it wouldn't
matter until the refcount got back to 0 and the object was cleaned up.
So any object that hit that magic "immortal" refcount value would have
to be decref'ed that many times again before the change in behavior
would be noticed.

Again, the only realistic way that the magic refcount would be reached
(and then reversed) is if it were done deliberately. (Of course, the
same thing could be done efficiently using ``Py_SET_REFCNT()`` though
that would be even less of an accident.) At that point we don't
consider it a concern of this proposal.

On 32-bit builds it isn't so obvious. Let's say the magic refcount
were 2^30. Using the same specs as above, it would take roughly a
quarter of a second to accidentally immortalize an object. Under
reasonable conditions, it is still highly unlikely that an object
would be accidentally immortalized. It would have to meet these criteria:

* targeting a non-immortal object (so not one of the high-use builtins)
* the extension increfs without a corresponding decref
(e.g. returns from a function or method)
* no other code decrefs the object in the meantime

Even at a much less frequent incref rate it would not take long to reach
accidental immortality (on 32-bit). However, it would then have to run
through the same number of (now noop-ing) decrefs before that one object
would be effectively leaking. This is highly unlikely, especially because
the calculations assume no decrefs at all.

Furthermore, this isn't all that different from how such 32-bit extensions
can already incref an object past 2^31 and turn the refcount negative.
If that were an actual problem then we would have heard about it.

Between all of the above cases, the proposal doesn't consider
accidental immortality a problem.

Stable ABI
''''''''''

The implementation approach described in this PEP is compatible
with extensions compiled to the stable ABI (with the exception
of `Accidental Immortality`_ and `Accidental De-Immortalizing`_).
Due to the nature of the stable ABI, unfortunately, such extensions
use versions of ``Py_INCREF()``, etc. that directly modify the object's
``ob_refcnt`` field. This will invalidate all the performance benefits
of immortal objects.

However, we do ensure that immortal objects (mostly) stay immortal
in that situation. We set the initial refcount of immortal objects to
a value high above the magic refcount value, but one that still matches
the high bit. Thus we can still identify such objects as immortal.
(See `_Py_IMMORTAL_REFCNT`_.) At worst, objects in that situation
would feel the effects described in the `Motivation`_ section.
Even then the overall impact is unlikely to be significant.

Accidental De-Immortalizing
'''''''''''''''''''''''''''

32-bit builds of older stable ABI extensions can take `Accidental Immortality`_
to the next level.

Hypothetically, such an extension could incref an object to a value on
the next highest bit above the magic refcount value. For example, if
the magic value were 2^30 and the initial immortal refcount were thus
2^30 + 2^29 then it would take 2^29 increfs by the extension to reach
a value of 2^31, making the object non-immortal.
(Of course, a refcount that high would probably already cause a crash,
regardless of immortal objects.)

The more problematic case is where such a 32-bit stable ABI extension
goes crazy decref'ing an already immortal object. Continuing with the
above example, it would take 2^29 asymmetric decrefs to drop below the
magic immortal refcount value. So an object like ``None`` could be
made mortal and subject to decref. That still wouldn't be a problem
until somehow the decrefs continue on that object until it reaches 0.
For many immortal objects, like ``None``, the extension will crash
the process if it tries to dealloc the object. For the other
immortal objects, the dealloc might be okay. However, there will
be runtime code expecting the formerly-immortal object to be around
forever. That code will probably crash.

Again, the likelihood of this happening is extremely small, even on
32-bit builds. It would require roughly a billion decrefs on that
one object without a corresponding incref. The most likely scenario is
the following:

A "new" reference to ``None`` is returned by many functions and methods.
Unlike with non-immortal objects, the 3.11 runtime will almost never
incref ``None`` before giving it to the extension. However, the
extension *will* decref it when done with it (unless it returns it).
Each time that exchange happens with the one object, we get one step
closer to a crash.

How realistic is it that some form of that exchange (with a single
object) will happen a billion times in the lifetime of a Python process
on 32-bit? If it is a problem, how could it be addressed?

As to how realistic, the answer isn't clear currently. However, the
mitigation is simple enough that we can safely proceed under the
assumption that it would be a problem.

Here are some possible solutions (only needed on 32-bit):

* periodically reset the refcount for immortal objects
(only enable this if a stable ABI extension is imported?)
* special-case immortal objects in tp_dealloc() for the relevant types
(but not int, due to frequency?)
* provide a runtime flag for disabling immortality

Alternate Python Implementations
--------------------------------

This proposal is CPython-specific. However, it does relate to the
behavior of the C-API, which may affect other Python implementations.
Consequently, the effect of changed behavior described in
`Backward Compatibility`_ above also applies here (e.g. if another
implementation is tightly coupled to specific refcount values, other
than 0, or to exactly how refcounts change, then it may be impacted).

Security Implications
---------------------

This feature has no known impact on security.

Maintainability
---------------

This is not a complex feature so it should not cause much mental
overhead for maintainers. The basic implementation doesn't touch
much code, so it should not have much impact on maintainability. There
may be some extra complexity due to performance penalty mitigation.
However, that should be limited to where we immortalize all
objects post-init and that code will be in one place.


Specification
=============

The approach involves these fundamental changes:

* add `_Py_IMMORTAL_REFCNT`_ (the magic value) to the internal C-API
* update ``Py_INCREF()`` and ``Py_DECREF()`` to no-op for objects with
the magic refcount (or its most significant bit)
* do the same for any other API that modifies the refcount
* stop modifying ``PyGC_Head`` for immortal GC objects ("containers")
* ensure that all immortal objects are cleaned up during
runtime finalization

Then setting any object's refcount to ``_Py_IMMORTAL_REFCNT``
makes it immortal.

(There are other minor, internal changes which are not described here.)

In the following sub-sections we dive into the details. First we will
cover some conceptual topics, followed by more concrete aspects like
specific affected APIs.

Public Refcount Details
-----------------------

In `Backward Compatibility`_ we introduced possible ways that user code
might be broken by the change in this proposal. Any contributing
misunderstanding by users is likely due in large part to the names of
the refcount-related API and to how the documentation explains those
API (and refcounting in general).

Neither the names nor the docs give clear answers
to the following questions:

* what behavior do users expect?
* what guarantees do we make?
* do we indicate how to interpret the refcount value they receive?
* what are the use cases under which a user would set an object's
refcount to a specific value?
* are users setting the refcount of objects they did not create?

As part of this proposal, we must make sure that users can clearly
understand on which parts of the refcount behavior they can rely and
which are considered implementation details. Specifically, they should
use the existing public refcount-related API and the only refcount
values with any meaning are 0 and 1. (Some code relies on 1 as an
indicator that the object can be safely modified.) All other values
are considered "not 0 or 1".
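
For example, the "1 means sole owner" idiom looks roughly like this
(a sketch of the general pattern, not of any specific CPython code)::

    static int can_mutate_in_place(PyObject *op)
    {
        /* A refcount of exactly 1 means the caller holds the only
           reference, so mutating the object in place is safe. */
        return Py_REFCNT(op) == 1;
    }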

This information will be clarified in the `documentation <Documentation_>`_.

Arguably, the existing refcount-related API should be modified to reflect
what we want users to expect. Something like the following:

* ``Py_INCREF()`` -> ``Py_ACQUIRE_REF()`` (or only support ``Py_NewRef()``)
* ``Py_DECREF()`` -> ``Py_RELEASE_REF()``
* ``Py_REFCNT()`` -> ``Py_HAS_REFS()``
* ``Py_SET_REFCNT()`` -> ``Py_RESET_REFS()`` and ``Py_SET_NO_REFS()``

However, such a change is not a part of this proposal. It is included
here to demonstrate the tighter focus on user expectations that would
benefit this change.

Constraints
-----------

* ensure that otherwise immutable objects can be truly immutable
* minimize performance penalty for normal Python use cases
* be careful when immortalizing objects that we don't actually expect
to persist until runtime finalization
* be careful when immortalizing objects that are not otherwise immutable
* ``__del__`` and weakrefs must continue working properly

Regarding "truly" immutable objects, this PEP doesn't impact the
effective immutability of any objects, other than the per-object
runtime state (e.g. refcount). So whether or not some immortal object
is truly (or even effectively) immutable can only be settled separately
from this proposal. For example, str objects are generally considered
immutable, but ``PyUnicodeObject`` holds some lazily cached data. This
PEP has no influence on how that state affects str immutability.

Immortal Mutable Objects
------------------------

Any object can be marked as immortal. We do not propose any
restrictions or checks. However, in practice the value of making an
object immortal relates to its mutability and depends on the likelihood
it would be used for a sufficient portion of the application's lifetime.
Marking a mutable object as immortal can make sense in some situations.

Many of the use cases for immortal objects center on immutability, so
that threads can safely and efficiently share such objects without
locking. For this reason a mutable object, like a dict or list, would
never be shared (and thus no immortality). However, immortality may
be appropriate if there is sufficient guarantee that the normally
mutable object won't actually be modified.

On the other hand, some mutable objects will never be shared between
threads (at least not without a lock like the GIL). In some cases it
may be practical to make some of those immortal too. For example,
``sys.modules`` is a per-interpreter dict that we do not expect to ever
get freed until the corresponding interpreter is finalized. By making
it immortal, we no longer incur the extra overhead during incref/decref.

We explore this idea further in the `mitigation`_ section below.

Implicitly Immortal Objects
---------------------------

If an immortal object holds a reference to a normal (mortal) object
then that held object is effectively immortal. This is because that
object's refcount can never reach 0 until the immortal object releases
it.

Examples:

* containers like ``dict`` and ``list``
* objects that hold references internally like ``PyTypeObject.tp_subclasses``
* an object's type (held in ``ob_type``)

Such held objects are thus implicitly immortal for as long as they are
held. In practice, this should have no real consequences since it
really isn't a change in behavior. The only difference is that the
immortal object (holding the reference) doesn't ever get cleaned up.

We do not propose that such implicitly immortal objects be changed
in any way. They should not be explicitly marked as immortal just
because they are held by an immortal object. That would provide
no advantage over doing nothing.

Un-Immortalizing Objects
------------------------

This proposal does not include any mechanism for taking an immortal
object and returning it to a "normal" condition. Currently there
is no need for such an ability.

On top of that, the obvious approach is to simply set the refcount
to a small value. However, at that point there is no way of knowing
which value would be safe. Ideally we'd set it to the value that it
would have been if it hadn't been made immortal. However, that value
has long been lost. Hence the complexities involved make it less
likely that an object could safely be un-immortalized, even if we
had a good reason to do so.

_Py_IMMORTAL_REFCNT
-------------------

We will add two internal constants::

    _Py_IMMORTAL_BIT - has the top-most available bit set (e.g. 2^62)
    _Py_IMMORTAL_REFCNT - has the two top-most available bits set

The actual top-most bit depends on existing uses for refcount bits,
e.g. the sign bit or some GC uses. We will use the highest bit possible
after consideration of existing uses.

The refcount for immortal objects will be set to ``_Py_IMMORTAL_REFCNT``
(meaning the value will be halfway between ``_Py_IMMORTAL_BIT`` and the
value at the next highest bit). However, to check if an object is
immortal we will compare (bitwise-and) its refcount against just
``_Py_IMMORTAL_BIT``.

The difference means that an immortal object will still be considered
immortal, even if somehow its refcount were modified (e.g. by an older
stable ABI extension).

Note that the top two bits of the refcount are already reserved for other
uses. That's why we are using the third top-most bit.
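
In concrete terms, on a 64-bit build it could look something like the
following (a sketch only; the exact bit positions and helper names are
implementation details)::

    /* The "e.g. 2^62" example from above. */
    #define _Py_IMMORTAL_BIT     ((Py_ssize_t)1 << 62)
    /* Both of the two top-most available bits set, i.e. halfway
       between _Py_IMMORTAL_BIT and the next highest bit. */
    #define _Py_IMMORTAL_REFCNT  (_Py_IMMORTAL_BIT + (_Py_IMMORTAL_BIT >> 1))

    /* Marking an object immortal is a plain assignment... */
    static inline void _Py_SetImmortal(PyObject *op)
    {
        op->ob_refcnt = _Py_IMMORTAL_REFCNT;
    }

    /* ...while the check tests only the bit, so drift in the lower
       bits (e.g. from an older stable ABI extension) is tolerated. */
    static inline int _Py_IsImmortal(PyObject *op)
    {
        return (op->ob_refcnt & _Py_IMMORTAL_BIT) != 0;
    }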

Affected API
------------

API that will now ignore immortal objects:

* (public) ``Py_INCREF()``
* (public) ``Py_DECREF()``
* (public) ``Py_SET_REFCNT()``
* (private) ``_Py_NewReference()``

API that exposes refcounts (unchanged but may now return large values):

* (public) ``Py_REFCNT()``
* (public) ``sys.getrefcount()``

(Note that ``_Py_RefTotal`` and ``sys.gettotalrefcount()``
will not be affected.)

Also, immortal objects will not participate in GC.

Immortal Global Objects
-----------------------

All runtime-global (builtin) objects will be made immortal.
That includes the following:

* singletons (``None``, ``True``, ``False``, ``Ellipsis``, ``NotImplemented``)
* all static types (e.g. ``PyLong_Type``, ``PyExc_Exception``)
* all static objects in ``_PyRuntimeState.global_objects`` (e.g. identifiers,
small ints)

The question of making them actually immutable (e.g. for
per-interpreter GIL) is not in the scope of this PEP.

Object Cleanup
--------------

In order to clean up all immortal objects during runtime finalization,
we must keep track of them.

For GC objects ("containers") we'll leverage the GC's permanent
generation by pushing all immortalized containers there. During
runtime shutdown, the strategy will be to first let the runtime try
to do its best effort of deallocating these instances normally. Most
of the module deallocation will now be handled by
``pylifecycle.c:finalize_modules()`` which cleans up the remaining
modules as best as we can. It will change which modules are available
during ``__del__``, but that's already explicitly undefined behavior in the
docs. Optionally, we could do some topological ordering to guarantee
that user modules will be deallocated first before the stdlib modules.
Finally, anything left over (if any) can be found through the permanent
generation GC list which we can clear after ``finalize_modules()``.

For non-container objects, the tracking approach will vary on a
case-by-case basis. In nearly every case, each such object is directly
accessible on the runtime state, e.g. in a ``_PyRuntimeState`` or
``PyInterpreterState`` field. We may need to add a tracking mechanism
to the runtime state for a small number of objects.

None of the cleanup will have a significant effect on performance.

.. _mitigation:

Performance Regression Mitigation
---------------------------------

In the interest of clarity, here are some of the ways we are going
to try to recover some of the lost `performance <Performance_>`_:

* at the end of runtime init, mark all objects as immortal
* drop refcount operations in code where we know the object is immortal
(e.g. ``Py_RETURN_NONE``; see the sketch below)
* specialize for immortal objects in the eval loop (see `Pyston`_)

Regarding that first point, we can apply the concept from
`Immortal Mutable Objects`_ in the pursuit of getting back some of
that 4% performance we lose with the naive implementation of immortal
objects. At the end of runtime init we can mark *all* objects as
immortal and avoid the extra cost in incref/decref. We only need
to worry about immutability with objects that we plan on sharing
between threads without a GIL.
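
Regarding the second point, ``Py_RETURN_NONE`` is a good example:
today it increfs ``None`` before returning it, but with an immortal
``None`` the incref can be dropped entirely. A sketch (not the
final macros)::

    /* current definition (simplified) */
    #define Py_RETURN_NONE  return Py_INCREF(Py_None), Py_None

    /* possible immortal-aware definition: no refcount traffic */
    #undef  Py_RETURN_NONE
    #define Py_RETURN_NONE  return Py_None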

Note that none of this section is part of the proposal.
The above is included here for clarity.

Possible Changes
----------------

* mark every interned string as immortal
* mark the "interned" dict as immortal if shared else share all interned strings
* (Larry,MvL) mark all constants unmarshalled for a module as immortal
* (Larry,MvL) allocate (immutable) immortal objects in their own memory page(s)

Documentation
-------------

The immortal objects behavior and API are internal implementation
details and will not be added to the documentation.

However, we will update the documentation to make public guarantees
about refcount behavior more clear. That includes, specifically:

* ``Py_INCREF()`` - change "Increment the reference count for object o."
to "Indicate taking a new reference to object o."
* ``Py_DECREF()`` - change "Decrement the reference count for object o."
to "Indicate no longer using a previously taken reference to object o."
* similar for ``Py_XINCREF()``, ``Py_XDECREF()``, ``Py_NewRef()``,
``Py_XNewRef()``, ``Py_Clear()``
* ``Py_REFCNT()`` - add "The refcounts 0 and 1 have specific meanings
and all others only mean code somewhere is using the object,
regardless of the value.
0 means the object is not used and will be cleaned up.
1 means code holds exactly a single reference."
* ``Py_SET_REFCNT()`` - refer to ``Py_REFCNT()`` about how values over 1
may be substituted with some other value

We *may* also add a note about immortal objects to the following,
to help reduce any surprise users may have with the change:

* ``Py_SET_REFCNT()`` (a no-op for immortal objects)
* ``Py_REFCNT()`` (value may be surprisingly large)
* ``sys.getrefcount()`` (value may be surprisingly large)

Other API that might benefit from such notes are currently undocumented.
We wouldn't add such a note anywhere else (including for ``Py_INCREF()``
and ``Py_DECREF()``) since the feature is otherwise transparent to users.


Reference Implementation
========================

The implementation is proposed on GitHub:

https://github.com/python/cpython/pull/19474


Open Issues
===========

* how realistic is the `Accidental De-Immortalizing`_ concern?


References
==========

.. _Pyston: https://mail.python.org/archives/list/python-dev@python.org/message/JLHRTBJGKAENPNZURV4CIJSO6HI62BV3/

Prior Art
---------

* `Pyston`_

Discussions
-----------

This was discussed in December 2021 on python-dev:

* https://mail.python.org/archives/list/python-dev@python.org/thread/7O3FUA52QGTVDC6MDAV5WXKNFEDRK5D6/#TBTHSOI2XRWRO6WQOLUW3X7S5DUXFAOV
* https://mail.python.org/archives/list/python-dev@python.org/thread/PNLBJBNIQDMG2YYGPBCTGOKOAVXRBJWY

Runtime Object State
--------------------

Here is the internal state that the CPython runtime keeps
for each Python object:

* `PyObject.ob_refcnt`_: the object's `refcount <refcounting_>`_
* `_PyGC_Head <PyGC_Head>`_: (optional) the object's node in a list of
`"GC" objects <refcounting_>`_
* `_PyObject_HEAD_EXTRA <PyObject_HEAD_EXTRA>`_: (optional) the
object's node in the list of heap objects

``ob_refcnt`` is part of the memory allocated for every object.
However, ``_PyObject_HEAD_EXTRA`` is allocated only if CPython was built
with ``Py_TRACE_REFS`` defined. ``PyGC_Head`` is allocated only if the
object's type has ``Py_TPFLAGS_HAVE_GC`` set. Typically this is only
container types (e.g. ``list``). Also note that ``PyObject.ob_refcnt``
and ``_PyObject_HEAD_EXTRA`` are part of ``PyObject_HEAD``.

.. _PyObject.ob_refcnt:
https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/object.h#L107
.. _PyGC_Head: https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/internal/pycore_gc.h#L11-L20
.. _PyObject_HEAD_EXTRA:
https://github.com/python/cpython/blob/80a9ba537f1f1666a9e6c5eceef4683f86967a1f/Include/object.h#L68-L72

.. _refcounting:

Reference Counting, with Cyclic Garbage Collection
--------------------------------------------------

Garbage collection is a memory management feature of some programming
languages. It means objects are cleaned up (e.g. memory freed)
once they are no longer used.

Refcounting is one approach to garbage collection. The language runtime
tracks how many references are held to an object. When code takes
ownership of a reference to an object or releases it, the runtime
is notified and it increments or decrements the refcount accordingly.
When the refcount reaches 0, the runtime cleans up the object.

With CPython, code must explicitly take or release references using
the C-API's ``Py_INCREF()`` and ``Py_DECREF()``. These macros happen
to directly modify the object's refcount (unfortunately, since that
causes ABI compatibility issues if we want to change our garbage
collection scheme). Also, when an object is cleaned up in CPython,
it also releases any references (and resources) it owns
(before its memory is freed).
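
For example, typical C-API code takes and releases references
explicitly (a sketch, assuming ``Python.h``)::

    static void
    refcount_demo(void)
    {
        PyObject *obj = PyList_New(0);  /* new reference; we own it */
        if (obj == NULL) {
            return;                     /* allocation failed */
        }
        Py_INCREF(obj);                 /* take a second reference */
        Py_DECREF(obj);                 /* release one of them */
        Py_DECREF(obj);                 /* release the last one: the
                                           refcount hits 0 and the list
                                           is cleaned up */
    }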

Sometimes objects may be involved in reference cycles, e.g. where
object A holds a reference to object B and object B holds a reference
to object A. Consequently, neither object would ever be cleaned up
even if no other references were held (i.e. a memory leak). The
most common objects involved in cycles are containers.

CPython has dedicated machinery to deal with reference cycles, which
we call the "cyclic garbage collector", or often just
"garbage collector" or "GC". Don't let the name confuse you.
It only deals with breaking reference cycles.

See the docs for a more detailed explanation of refcounting
and cyclic garbage collection:

* https://docs.python.org/3.11/c-api/intro.html#reference-counts
* https://docs.python.org/3.11/c-api/refcounting.html
* https://docs.python.org/3.11/c-api/typeobj.html#c.PyObject.ob_refcnt
* https://docs.python.org/3.11/c-api/gcsupport.html


Copyright
=========

This document is placed in the public domain or under the
CC0-1.0-Universal license, whichever is more permissive.

Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
On Mon, Feb 28, 2022 at 6:01 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
> The updated PEP text is included below. The largest changes involve
> either the focus of the PEP (internal mechanism to mark objects
> immortal) or the possible ways that things can break on older 32-bit
> stable ABI extensions. All other changes are smaller.

In particular, I'm hoping to get your thoughts on the "Accidental
De-Immortalizing" section. While I'm confident we will find a good
solution, I'm not yet confident about the specific solution. So
feedback would be appreciated. Thanks!

-eric

Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
On 09. 03. 22 4:58, Eric Snow wrote:
> On Mon, Feb 28, 2022 at 6:01 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
>> The updated PEP text is included below. The largest changes involve
>> either the focus of the PEP (internal mechanism to mark objects
>> immortal) or the possible ways that things can break on older 32-bit
>> stable ABI extensions. All other changes are smaller.
>
> In particular, I'm hoping to get your thoughts on the "Accidental
> De-Immortalizing" section. While I'm confident we will find a good
> solution, I'm not yet confident about the specific solution. So
> feedback would be appreciated. Thanks!

Hi,
I like the newest version, except this one section is concerning.


"periodically reset the refcount for immortal objects (only enable this
if a stable ABI extension is imported?)" -- that sounds quite expensive,
both at runtime and maintenance-wise.

"provide a runtime flag for disabling immortality" also doesn't sound
workable to me. We'd essentially need to run all tests twice every time
to make sure it stays working.


"Special-casing immortal objects in tp_dealloc() for the relevant types
(but not int, due to frequency?)" sounds promising.

The "relevant types" are those for which we skip calling incref/decref
entirely, like in Py_RETURN_NONE. This skipping is one of the optional
optimizations, so we're entirely in control of if/when to apply it. How
much would it slow things back down if it wasn't done for ints at all?



Some more reasoning for not worrying about de-immortalizing in types
without this optimization:
These objects will be de-immortalized with refcount around 2^29, and
then incref/decref go back to being paired properly. If 2^29 is much
higher than the true reference count at de-immortalization, this'll just
cause a memory leak at shutdown.
And it's probably OK to assume that the true reference count of an
object can't be anywhere near 2^29: most of the time, to hold a
reference you also need to have a pointer to the referenced object, and
there ain't enough memory for that many pointers (2^29 pointers at
4 bytes each would already need 2 GiB). This isn't a formally
sound assumption, of course -- you can incref a million times with a
single pointer if you pair the decrefs correctly. But it might be why we
had no issues with "int won't overflow", an assumption which would fail
with just 4× higher numbers.

Of course, this argument would apply to immortalization and 64-bit
builds as well. I wonder if there are holes in it :)

Oh, and if the "Special-casing immortal objects in tp_dealloc()" way is
valid, refcount values 1 and 0 can no longer be treated specially.
That's probably not a practical issue for the relevant types, but it's
one more thing to think about when applying the optimization.


There's also the other direction to consider: if an old stable-ABI
extension does unpaired *increfs* on an immortal object, it'll
eventually overflow the refcount.
When the refcount is negative, decref will currently crash if built with
Py_DEBUG, and I think we want to keep that check/crash. (Note that
either Python itself or any extension could be built with Py_DEBUG.)
Hopefully we can live with that, and hope anyone running with Py_DEBUG
will send a useful use case report.
Or is there another bit before the sign this'll mess up?

Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
> "periodically reset the refcount for immortal objects (only enable this
> if a stable ABI extension is imported?)" -- that sounds quite expensive,
> both at runtime and maintenance-wise.

As I understand it, the plan is to represent an immortal object by setting two high-order bits to 1. The higher bit is the actual test, and the one representing half of that is a safety margin.

When reducing the reference count, CPython already checks whether the refcount's new value is 0. It could instead check whether refcount & ~immortal_bit is 0, which would detect when the safety margin has been reduced to 0 -- and could then add it back in. Since the bit manipulation is not conditional, the only extra branch will occur when an object is about to be de-allocated, and that might be rare enough to be an acceptable cost. (It still doesn't prevent rollover from too many increfs, but ... that should indeed be rare in the wild.)
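
Concretely, I'd expect the decref path to look roughly like this
(hypothetical names, just to illustrate the idea):

    op->ob_refcnt--;
    if ((op->ob_refcnt & ~_Py_IMMORTAL_BIT) == 0) {
        if (op->ob_refcnt & _Py_IMMORTAL_BIT) {
            /* an immortal object's safety margin ran out;
               add the margin back in */
            op->ob_refcnt = _Py_IMMORTAL_REFCNT;
        }
        else {
            /* an ordinary object reached zero */
            _Py_Dealloc(op);
        }
    }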

-jJ

Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
On 10. 03. 22 3:35, Jim J. Jewett wrote:
>> "periodically reset the refcount for immortal objects (only enable this
>> if a stable ABI extension is imported?)" -- that sounds quite expensive,
>> both at runtime and maintenance-wise.
>
> As I understand it, the plan is to represent an immortal object by setting two high-order bits to 1. The higher bit is the actual test, and the one representing half of that is a safety margin.
>
> When reducing the reference count, CPython already checks whether the refcount's new value is 0. It could instead check whether refcount & ~immortal_bit is 0, which would detect when the safety margin has been reduced to 0 -- and could then add it back in. Since the bit manipulation is not conditional, the only extra branch will occur when an object is about to be de-allocated, and that might be rare enough to be an acceptable cost. (It still doesn't prevent rollover from too many increfs, but ... that should indeed be rare in the wild.)

The problem is that Py_DECREF is a macro, and its old behavior is
compiled into some extensions. We can't change this without breaking the
stable ABI.

Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
responses inline

-eric

On Wed, Mar 9, 2022 at 8:23 AM Petr Viktorin <encukou@gmail.com> wrote:
> "periodically reset the refcount for immortal objects (only enable this
> if a stable ABI extension is imported?)" -- that sounds quite expensive,
> both at runtime and maintenance-wise.

Are you talking just about "(only enable this if a stable ABI
extension is imported?)"? Such a check could certainly be expensive
but it doesn't have to be. However, I'm guessing that you are
actually talking about the mechanism to periodically reset the
refcount.

The actual periodic reset doesn't seem like it needs to be all that
expensive overall. It would just need to be in a place that gets
triggered often enough, but not too often such that the extra cost of
resetting the refcount would be a problem.

One important factor is whether we need to worry about potential
de-immortalization for all immortal objects or only for a specific
subset, like the most commonly used objects (at least most commonly
used by the problematic older stable ABI extensions). Mostly, we only
need to be concerned with the objects that are likely to trigger
de-immortalization in those extensions. Realistically, there aren't
many potential immortal objects that would be exposed to the
de-immortalization problem (e.g. None, True, False), so we could limit
this workaround to them.

A variety of options come to mind. In each case we would reset the
refcount of a given object if it is immortal. (We would also only do
so if the refcount actually changed--to avoid cache invalidation and
copy-on-write.)

If we need to worry about *all* immortal objects then I see several options:

1. in a single place where stable ABI extensions are likely to pass
all objects often enough
2. in a single place where all objects pass through often enough

On the other hand, if we only need to worry about a fixed set of
objects, the following options come to mind:

1. in a single place that is likely to be called by older stable ABI extensions
2. in a place that runs often enough, targeting a hard-coded group of
immortal objects (common static globals like None)
* perhaps in the eval breaker code, in exception handling, etc.
3. like (2) but rotate through subsets of the hard-coded group (to
reduce the overall cost)
4. like (2), but spread out in type-specific code (e.g. static
types could be reset in type_dealloc())

Again, none of those should be in code that runs often enough that the
overhead would add up.

> "provide a runtime flag for disabling immortality" also doesn't sound
> workable to me. We'd essentially need to run all tests twice every time
> to make sure it stays working.

Yeah, that makes it not worth it.

> "Special-casing immortal objects in tp_dealloc() for the relevant types
> (but not int, due to frequency?)" sounds promising.
>
> The "relevant types" are those for which we skip calling incref/decref
> entirely, like in Py_RETURN_NONE. This skipping is one of the optional
> optimizations, so we're entirely in control of if/when to apply it.

We would definitely do it for those types. NoneType and bool already
have a tp_dealloc that calls Py_FatalError() if triggered. The
tp_dealloc for str & tuple have special casing for some singletons
that do likewise. In PyType_Type.tp_dealloc we have a similar assert
for static types. In each case we would instead reset the refcount to
the initial immortal value. Regardless, in practice we may only need
to worry (as noted above) about the problem for the most commonly used
global objects, so perhaps we could stop there.

However, it depends on what the level of risk is, such that it would
warrant incurring additional potential performance/maintenance costs.
What is the likelihood of actual crashes due to pathological
de-immortalization in older stable ABI extensions? I don't have a
clear answer to offer on that but I'd only expect it to be a problem
if such extensions are used heavily in (very) long-running processes.

> How much would it slow things back down if it wasn't done for ints at all?

I'll look into that. We're talking about the ~260 small ints, so it
depends on how much they are used relative to all the other int
objects that are used in a program.

> Some more reasoning for not worrying about de-immortalizing in types
> without this optimization:
> These objects will be de-immortalized with refcount around 2^29, and
> then incref/decref go back to being paired properly. If 2^29 is much
> higher than the true reference count at de-immortalization, this'll just
> cause a memory leak at shutdown.
> And it's probably OK to assume that the true reference count of an
> object can't be anywhere near 2^29: most of the time, to hold a
> reference you also need to have a pointer to the referenced object, and
> there ain't enough memory for that many pointers. This isn't a formally
> sound assumption, of course -- you can incref a million times with a
> single pointer if you pair the decrefs correctly. But it might be why we
> had no issues with "int won't overflow", an assumption which would fail
> with just 4× higher numbers.

Yeah, if we're dealing with properly paired incref/decref then the
worry about crashing after de-immortalization is mostly gone. The
problem is where in the runtime we would simply not call Py_INCREF()
on certain objects because we know they are immortal. For instance,
Py_RETURN_NONE (outside the older stable ABI) would no longer incref,
while the problematic stable ABI extension would keep actually
decref'ing until we crash.

Again, I'm not sure what the likelihood of this case is. It seems
very unlikely to me.

> Of course, the this argument would apply to immortalization and 64-bit
> builds as well. I wonder if there are holes in it :)

With the numbers involved on 64-bit the problem is super unlikely due
to the massive numbers we're talking about, so we don't need to worry.
Or perhaps I misunderstood your point?

> Oh, and if the "Special-casing immortal objects in tp_dealloc()" way is
> valid, refcount values 1 and 0 can no longer be treated specially.
> That's probably not a practical issue for the relevant types, but it's
> one more thing to think about when applying the optimization.

Given the low chance of the pathological case, the nature of the
conditions where it might happen, and the specificity of 0 and 1
amongst all the possible values, I wouldn't consider this a problem.

> There's also the other direction to consider: if an old stable-ABI
> extension does unpaired *increfs* on an immortal object, it'll
> eventually overflow the refcount.
> When the refcount is negative, decref will currently crash if built with
> Py_DEBUG, and I think we want to keep that check/crash. (Note that
> either be Python itself or any extension could be built with Py_DEBUG.)
> Hopefully we can live with that, and hope anyone running with Py_DEBUG
> will send a useful use case report.

Yeah, the overflow case is not new. The only difference here (with
the specialized tp_dealloc) is that we'd reset the refcount for
immortal objects. Of course, we could adjust the check in Py_DECREF()
to accommodate both situations and keep crashing like it does now.

> Or is there another bit before the sign this'll mess up?

Nope, the sign bit is still highest-order.

Re: PEP 683: "Immortal Objects, Using a Fixed Refcount" (round 3)
On 12. 03. 22 2:45, Eric Snow wrote:
> responses inline

I'll snip some discussion for a reason I'll get to later, and get right
to the third alternative:


[...]
>> "Special-casing immortal objects in tp_dealloc() for the relevant types
>> (but not int, due to frequency?)" sounds promising.
>>
>> The "relevant types" are those for which we skip calling incref/decref
>> entirely, like in Py_RETURN_NONE. This skipping is one of the optional
>> optimizations, so we're entirely in control of if/when to apply it.
>
> We would definitely do it for those types. NoneType and bool already
> have a tp_dealloc that calls Py_FatalError() if triggered. The
> tp_dealloc for str & tuple have special casing for some singletons
> that do likewise. In PyType_Type.tp_dealloc we have a similar assert
> for static types. In each case we would instead reset the refcount to
> the initial immortal value. Regardless, in practice we may only need
> to worry (as noted above) about the problem for the most commonly used
> global objects, so perhaps we could stop there.
>
> However, it depends on what the level of risk is, such that it would
> warrant incurring additional potential performance/maintenance costs.
> What is the likelihood of actual crashes due to pathological
> de-immortalization in older stable ABI extensions? I don't have a
> clear answer to offer on that but I'd only expect it to be a problem
> if such extensions are used heavily in (very) long-running processes.
>
>> How much would it slow things back down if it wasn't done for ints at all?
>
> I'll look into that. We're talking about the ~260 small ints, so it
> depends on how much they are used relative to all the other int
> objects that are used in a program.

Not only that -- as far as I understand, it's only cases where we know
at compile time that a small int is being returned. AFAIK, that would be
fast branches of arithmetic code, but not much else.

If not optimizing small ints is OK performance-wise, then everything
looks good: we say that the “skip incref/decref” optimization can only
be done for types whose instances are *all* immortal, leave it to future
discussions to relax the requirement, and PEP 683 is good to go!

With that in mind I snipped your discussion of the previous alternative.
Going with this one wouldn't prevent us from doing something more clever
in the future.


>> Some more reasoning for not worrying about de-immortalizing in types
>> without this optimization:
>> These objects will be de-immortalized with refcount around 2^29, and
>> then incref/decref go back to being paired properly. If 2^29 is much
>> higher than the true reference count at de-immortalization, this'll just
>> cause a memory leak at shutdown.
>> And it's probably OK to assume that the true reference count of an
>> object can't be anywhere near 2^29: most of the time, to hold a
>> reference you also need to have a pointer to the referenced object, and
>> there ain't enough memory for that many pointers. This isn't a formally
>> sound assumption, of course -- you can incref a million times with a
>> single pointer if you pair the decrefs correctly. But it might be why we
>> had no issues with "int won't overflow", an assumption which would fail
>> with just 4× higher numbers.
>
> Yeah, if we're dealing with properly paired incref/decref then the
> worry about crashing after de-immortalization is mostly gone. The
> problem is where in the runtime we would simply not call Py_INCREF()
> on certain objects because we know they are immortal. For instance,
> Py_RETURN_NONE (outside the older stable ABI) would no longer incref,
> while the problematic stable ABI extension would keep actually
> decref'ing until we crash.
>
> Again, I'm not sure what the likelihood of this case is. It seems
> very unlikely to me.
>
>> Of course, the this argument would apply to immortalization and 64-bit
>> builds as well. I wonder if there are holes in it :)
>
> With the numbers involved on 64-bit the problem is super unlikely due
> to the massive numbers we're talking about, so we don't need to worry.
> Or perhaps I misunderstood your point?

That's true. However, as we're adjusting incref/decref documentation for
this PEP anyway, it looks like we could add “you should keep a pointer
around for each reference you hold”, and go from “super unlikely” to
“impossible in well-behaved code” :)

>> Oh, and if the "Special-casing immortal objects in tp_dealloc()" way is
>> valid, refcount values 1 and 0 can no longer be treated specially.
>> That's probably not a practical issue for the relevant types, but it's
>> one more thing to think about when applying the optimization.
>
> Given the low chance of the pathological case, the nature of the
> conditions where it might happen, and the specificity of 0 and 1
> amongst all the possible values, I wouldn't consider this a problem.

+1. But it's worth mentioning that it's not a problem.