Mailing List Archive

Heap types (PyType_FromSpec) must fully implement the GC protocol
Hi,

In the Python stdlib, many heap types currently don't fully implement
the GC protocol, which can prevent these types from being destroyed at
Python exit. As a side effect, some other Python objects can also
remain alive, and so are not destroyed either.

There is an on-going effort to destroy all Python objects at exit
(bpo-1635741). The problem gets worse when subinterpreters are
involved: it causes failures on the Refleaks buildbots, which prevent
us from spotting other regressions, and so these "leaks" / "GC bugs"
must be fixed as soon as possible. In my experience, many leaks spotted
by tests using subinterpreters were quite old; they were just ignored
previously.

It's a hard problem and I don't see any simple/obvious solution right
now, except for *workarounds* that I dislike. Maybe the only good
solution is to fix all heap types, one by one.


== Only the Python stdlib should be affected ==

PyType_FromSpec() was added to Python 3.2 by PEP 384 to define
"heap types" in C, but I'm not sure if it's popular in practice (ex:
Cython doesn't use it, but defines static types). I expect most
types to still be defined the old way (as static types) in the vast
majority of third party extension modules.

To be clear, static types are not affected by this email.

Third party extension modules using the limited C API (to use the
stable ABI) and PyType_FromSpec() can be affected (if they don't fully
implement the GC protocol).


== Heap type instances now store a strong reference to their type ==

In March 2019, the PyObject_Init() function was modified in bpo-35810
to keep a strong reference (INCREF) to the type if the type is a heap
type. This fixed the problem that a heap type could be destroyed before
its last instance was destroyed.


== GC and heap types ==

The new problem is that most heap types don't collaborate well with
the garbage collector. The garbage collector doesn't know anything
about Python objects, types, reference counting or anything. It only
uses the PyGC_Head header and the traverse functions. If an object
holds a strong reference to an object but its type does not define a
traverse function, the GC cannot guess/infer this reference.

A heap type must respect the following 3 conditions to collaborate with the GC:

* Have the Py_TPFLAGS_HAVE_GC flag;
* Define a traverse function (tp_traverse) which visits the type:
Py_VISIT(Py_TYPE(self));
* Instances must be tracked by the GC.

If one of these conditions is not met, the GC can fail to destroy a
type during a GC collection. If an instance is kept alive late while a
Python interpreter is being deleted, it's possible that the type is
never deleted, which can indirectly keep *many* objects alive, so
they are not deleted either.
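
A minimal sketch of how two of these conditions can be inspected from
Python, assuming CPython. The helper name gc_report() is hypothetical, and
the two flag values are copied from CPython's Include/object.h, so they are
an implementation detail rather than a public Python API. Whether the
traverse function really visits the type cannot be read from the flags; it
can be checked separately with gc.get_referents().

```python
import gc

# Assumption: bit values copied from CPython's Include/object.h.
Py_TPFLAGS_HEAPTYPE = 1 << 9
Py_TPFLAGS_HAVE_GC = 1 << 14


def gc_report(tp, instance=None):
    """Summarize how a type and one of its instances look to the GC."""
    flags = tp.__flags__
    return {
        # True for types allocated on the heap (e.g. PyType_FromSpec, class)
        "heap_type": bool(flags & Py_TPFLAGS_HEAPTYPE),
        # Condition 1: the type has the Py_TPFLAGS_HAVE_GC flag
        "have_gc": bool(flags & Py_TPFLAGS_HAVE_GC),
        # Condition 3: the instance is tracked by the GC
        "instance_tracked": (
            None if instance is None else gc.is_tracked(instance)
        ),
    }


class Example:
    pass


heap_report = gc_report(Example, Example())
static_report = gc_report(int, 42)
print(heap_report)
print(static_report)
```

A class defined in Python is a heap type, so it is a convenient reference
point when comparing against a type created with PyType_FromSpec().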

In practice, when a type is not deleted, a test using subinterpreters
starts to fail on the Refleaks buildbots since it leaks references.
Without subinterpreters, such a leak is simply ignored, even though
there is an on-going effort to delete Python objects at exit
(bpo-1635741).


== Boring traverse functions ==

Currently, there is no default traverse implementation which visits the type.

For example, I had to implement the following function for _thread.LockType:

    static int
    lock_traverse(lockobject *self, visitproc visit, void *arg)
    {
        /* The only strong reference held by a lock: its heap type. */
        Py_VISIT(Py_TYPE(self));
        return 0;
    }

It's a little bit annoying to have to implement the GC protocol
even though a lock cannot contain other Python objects: it's not a
container. It's just a thin wrapper around a C lock.

There is exactly one strong reference: to the type.


== Workaround: loop on gc.collect() ==

A workaround is to run gc.collect() in a loop until it returns 0 (no
object was collected).
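
A minimal sketch of this workaround in Python. The function name
collect_until_stable() and the max_rounds safety limit are assumptions made
for illustration; only gc.collect() comes from the gc module.

```python
import gc


def collect_until_stable(max_rounds=10):
    """Call gc.collect() until it finds nothing, return the total found."""
    total = 0
    for _ in range(max_rounds):
        found = gc.collect()  # number of unreachable objects found
        if found == 0:
            break
        total += found
    return total


class Node:
    pass


# Build a small reference cycle and drop the last external references.
first, second = Node(), Node()
first.other, second.other = second, first
del first, second

collected = collect_until_stable()
print(collected)
```

The upper bound on the number of rounds guards against a finalizer that
keeps creating new garbage on every pass.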


== Traverse automatically? Nope. ==

Pablo Galindo attempted to automatically visit the type in the traverse function:

https://bugs.python.org/issue40217
https://github.com/python/cpython/commit/0169d3003be3d072751dd14a5c84748ab63a249f

Moreover, What's New in Python 3.9 contains a long section suggesting
to implement a traverse function for this problem, but it doesn't
suggest to track instances:
https://docs.python.org/dev/whatsnew/3.9.html#changes-in-the-c-api

Visiting the type automatically caused too many problems, and so
instead, traverse functions were defined on heap types to visit the
type.

Currently in the master branch, 89 types are defined as heap types out
of a total of 206 types (117 types are defined statically). I don't
think that all of these 89 heap types respect the 3 conditions to
collaborate with the GC.


== How should we address this issue? ==

I'm not sure what should be done. Working around the issue by
triggering multiple GC collections? Emit a warning in development mode
if a heap type doesn't collaborate well with the GC?

If core developers miss these bugs and have trouble debugging them, I
expect that extension module authors would suffer even more.


== GC+heap type bugs became common ==

I have been fixing such GC issues for a year as part of the work on
cleaning Python objects at exit, which is also indirectly related to
subinterpreters. The behavior is surprising, and it's really hard to
dig into GC internals and understand what's going on. I wrote an
article on this kind of "GC bugs":
https://vstinner.github.io/subinterpreter-leaks.html

Today, I learnt the hard way that defining a traverse function is *not*
enough. The type constructor (tp_new) must also track instances! See my
fix for _multibytecodec related to CJK codecs:

https://github.com/python/cpython/commit/11ef53aefbecfac18b63cee518a7184f771f708c
https://bugs.python.org/issue42866


== Reference cycles are common ==

The GC only serves to break reference cycles. But reference cycles are
rare, right? Well...

First of all, most types create reference cycles involving themselves.
For example, the __mro__ tuple of a type contains the type itself,
which already creates a ref cycle. Type methods can also contain a
reference to the type.

=> The GC must break the cycle, otherwise the type cannot be destroyed
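
The cycle can be observed from Python. The following is a small sketch,
assuming CPython; the class names and the helper type_needs_gc() are made up
for the illustration.

```python
import gc
import weakref


class Example:
    pass


# The type references its __mro__ tuple, and the tuple references the type.
mro = Example.__mro__
type_to_mro = any(ref is mro for ref in gc.get_referents(Example))
mro_to_type = any(ref is Example for ref in gc.get_referents(mro))
print(type_to_mro, mro_to_type)


def type_needs_gc():
    """Return (alive after del, alive after gc.collect()) for a new class."""
    class Temp:
        pass

    ref = weakref.ref(Temp)
    del Temp  # drop the only name bound to the class
    alive_after_del = ref() is not None
    gc.collect()  # the GC breaks the cycles and the type is destroyed
    alive_after_collect = ref() is not None
    return alive_after_del, alive_after_collect


result = type_needs_gc()
print(result)
```

Reference counting alone does not destroy the class when its name is
deleted; only the GC collection does.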

When a function is defined in a Python module, the function
__globals__ is the module namespace (module.__dict__) which...
contains the function. Defining a function in a Python module also
creates a reference cycle which prevents the module namespace from
being deleted by reference counting alone.
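
A short sketch of this cycle, using a throwaway module built with
types.ModuleType so that it does not depend on any real module. The module
name "demo" and the function name callback are arbitrary.

```python
import types

# Create an empty module and define a function inside its namespace.
mod = types.ModuleType("demo")
exec("def callback():\n    return 42\n", mod.__dict__)

func = mod.callback

# The function points at the module namespace ...
same_namespace = func.__globals__ is mod.__dict__
# ... and the module namespace points back at the function.
namespace_has_func = func.__globals__["callback"] is func

print(same_namespace, namespace_has_func)
```

Anything that keeps func alive, such as a registered callback, therefore
keeps the whole namespace of its module alive as well.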

If a function is used as a callback somewhere, the whole module
remains "alive" until the reference to the callback is cleared.
For example, os.register_at_fork() and codecs.register() callbacks are
cleared really late during Python finalization. Currently, they are
basically the last objects which are cleared at Python exit. After
that, there is exactly one final GC collection.

=> The GC


== Debug GC issues ==

* gc.get_referents() and gc.get_referrers() can be used to check
traverse functions.
* gc.is_tracked() can be used to check if the GC tracks an object.
* Using the gdb debugger on gc_collect_main() helps to see which
objects are collected. See for example the finalize_garbage()
function, which calls finalizers on unreachable objects.
* The cause is usually a missing traverse function or a missing
Py_VISIT() in an existing traverse function.
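
The gc helpers listed above can be tried on a builtin container first. This
is only a sketch: the variable names are arbitrary, and a list is used
because its traverse function is known to visit its items.

```python
import gc

payload = object()
container = [payload]

# gc.get_referents() returns what the traverse function of the object
# visits: for a list, its items.
visits_payload = any(ref is payload for ref in gc.get_referents(container))

# gc.get_referrers() answers the reverse question, by calling the traverse
# functions of all objects tracked by the GC.
found_container = any(ref is container for ref in gc.get_referrers(payload))

# gc.is_tracked() tells whether the GC tracks an object at all.
tracked = (gc.is_tracked(container), gc.is_tracked(42), gc.is_tracked("text"))

print(visits_payload, found_container, tracked)
```

If an object is known to hold a reference that does not show up in
gc.get_referents(), its traverse function is probably missing a Py_VISIT().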


== __del__ hack for debugging ==

If you want to play with the issue or if you have to debug a GC issue,
you can use an object which logs a message when it's being deleted:

    class VerboseDel:
        def __del__(self):
            print("DELETE OBJECT")

    obj = VerboseDel()

Warning: creating such an object in a module also prevents the module
namespace from being destroyed when the last reference to the module is
deleted! __del__.__globals__ contains a reference to the module
namespace, and obj.__class__ contains a reference to the type... Yeah,
ref cycles and GC issues are fun!
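
A variant of this hack, as a sketch: it records deletions in a list instead
of printing, to show when an object is destroyed by reference counting and
when it has to wait for the GC. The names deleted and demo() are made up for
the example, and the automatic GC is disabled during the demo so that the
timing is deterministic.

```python
import gc

deleted = []


class VerboseDel:
    def __init__(self, name):
        self.name = name

    def __del__(self):
        deleted.append(self.name)


def demo():
    """Return the deletion log at three points in time."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        plain = VerboseDel("plain")
        del plain  # no cycle: destroyed immediately by refcounting
        after_plain = list(deleted)

        cyclic = VerboseDel("cyclic")
        cyclic.me = cyclic  # the object references itself
        del cyclic  # still alive: the cycle holds a reference
        after_del = list(deleted)

        gc.collect()  # the GC breaks the cycle and __del__ runs
        after_collect = list(deleted)
    finally:
        if was_enabled:
            gc.enable()
    return after_plain, after_del, after_collect


log = demo()
print(log)
```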


== Long email ==

Yeah, I like to put titles in my long emails. Enjoy. Happy hacking!


Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/C4ILXGPKBJQYUN5YDMTJOEOX7RHOD4S3/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On Sat, 9 Jan 2021 02:02:17 +0100
Victor Stinner <vstinner@python.org> wrote:
>
> It's an hard problem and I don't see any simple/obvious solution right
> now, except of *workarounds* that I dislike. Maybe the only good
> solution is to fix all heap types, one by one.

Ok. Why are we adding heap types to the stdlib exactly? Is the goal to
have exactly zero shared objects between subinterpreters?

Regards

Antoine.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4LSSGCFUUTPCIZTT6NDRUCYWSZ7CZKA6/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
Hi Victor,

Thank you for looking into these issues. They are very important to HPy too!

HPy currently only supports heap types for similar reasons to why they
are important to sub-interpreters -- their lifecycle can be managed by
the Python interpreter and they are not tied to the memory and life
cycle of the dynamic library containing the C extension. E.g. with
heap types the interpreter can control when a type is created and
destroyed and when it can be accessed.

We've run into some minor issues with the limitations in PyType_Slot
(https://docs.python.org/3/c-api/type.html#c.PyType_Slot.PyType_Slot.slot)
but we are working around them for the moment.

It would be useful to have some sense of where PyType_FromSpec is
headed -- e.g. is it a goal to have it support all of the features of
static types in the future -- so that we can perhaps help suggest /
implement small changes that head in the right direction and also
ensure that HPy is aligned with the immediate future of the C API.

Yours sincerely,
Simon Cross
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/GXZI7T2KGAU3BNKNW6E4CKDTECLZAUGX/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
Hi,

There are multiple PEPs covering heap types. The latest one refers to
other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin.
https://www.python.org/dev/peps/pep-0630/#motivation

The use case is to embed multiple Python instances (interpreters) in
the same application process, or to embed Python with multiple calls
to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static
types are causing different issues for these use cases.

Also, it's not possible to destroy static types at Python exit, which
goes against the on-going effort to destroy all Python objects at exit
(bpo-1635741).

Victor
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XTOLFGQIYXPZXRD6BL4XO2XB53VDBDWC/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 1/11/21 5:26 PM, Victor Stinner wrote:
> Hi,
>
> There are multiple PEPs covering heap types. The latest one refers to
> other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin.
> https://www.python.org/dev/peps/pep-0630/#motivation
>
> The use case is to embed multiple Python instances (interpreters) in
> the same application process, or to embed Python with multiple calls
> to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static
> types are causing different issues for these use cases.

If a type is immutable and has no references to heap-allocated objects,
it could stay as a static type.
The issue is that very many types don't fit that. For example, if some
method needs to raise a module-specific exception, that's a reference to
a heap-allocated type, because custom exceptions generally aren't static.


> Also, it's not possible to destroy static types at Python exit, which
> goes against the on-going effort to destroy all Python objects at exit
> (bpo-1635741).

I don't see why we would need to destroy immutable static objects. They
don't need to be freed.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/E6NSBPPMCJV5KPZCZJOLDAO74VUK25X6/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
Simon Cross wrote:
> We've run into some minor issues with the limitations in PyType_Slot
> (https://docs.python.org/3/c-api/type.html#c.PyType_Slot.PyType_Slot.slot)
> but we are working around them for the moment.
> It would be useful to have some sense of where PyType_FromSpec is
> headed -- e.g. is it a goal to have it support all of the features of
> static types in the future -- so that we can perhaps help suggest /
> implement small changes that head in the right direction and also
> ensure that HPy is aligned with the immediate future of the C API.

Yes, the goal is to have it support all the features of static types.
If you see something that's not in PEP 630 open issues (https://www.python.org/dev/peps/pep-0630/#open-issues),
I'd like to know.
I'm using https://github.com/encukou/abi3/issues to collect issues related to the stable ABI. Maybe some of HPy's issues are there already.

And fixes are always welcome, of course :)
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BFLYTI2WZZY5W3LKF5WBT4MZBTB6O7N6/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On Tue, Jan 12, 2021 at 3:28 PM Petr Viktorin <encukou@gmail.com> wrote:
> If a type is immutable and has no references to heap-allocated objects,
> it could stay as a static type.
> The issue is that very many types don't fit that. For example, if some
> method needs to raise a module-specific exception, that's a reference to
> a heap-allocated type, because custom exceptions generally aren't static.
> (...)
> I don't see why we would need to destroy immutable static objects. They
> don't need to be freed.

I'm not sure of your definition of "immutable" here. At the C level,
many immutable Python objects are mutable. For example, a str instance
*can* be modified at the C level, and computing hash(<my string>)
modifies the object as well (the internal cached hash value).

Any type contains at least one Python object: the __mro__ tuple. Most
types also contain a __subclass__ dictionary (by default, it's NULL).
These objects are created at Python startup, but not destroyed at
Python exit. See also tp_bases (tuple) and tp_dict (dict).
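
As a small illustration from the Python side, assuming CPython: the static
type int carries its __mro__ tuple, and its list of subclasses changes at
runtime when a subclass is defined. The class name MyInt is arbitrary.

```python
# int is a static type, yet it owns Python objects such as its __mro__ tuple.
mro = int.__mro__


class MyInt(int):
    pass


# Defining a subclass registers it on the static base type, so the state
# attached to int was modified at runtime.
registered = MyInt in int.__subclasses__()

print(mro, registered)
```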

I tried once to "finalize" static types, but it didn't go well:

* https://github.com/python/cpython/pull/20763
* https://bugs.python.org/issue1635741#msg371119

It doesn't look to be safe to clear static types. Many functions rely
on the fact that static types are "always there" and are never
finalized. Also, only a few static types are cleared by my PR: many
static types are left unchanged. For example, static types of the _io
module. It seems like a safer approach is to continue the work on
bpo-40077: "Convert static types to PyType_FromSpec()".

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OMDNS54ZVOAMJALDCFB7IVD26WF47WB4/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On Tue, 12 Jan 2021 15:22:36 +0100
Petr Viktorin <encukou@gmail.com> wrote:
> On 1/11/21 5:26 PM, Victor Stinner wrote:
> > Hi,
> >
> > There are multiple PEPs covering heap types. The latest one refers to
> > other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin.
> > https://www.python.org/dev/peps/pep-0630/#motivation
> >
> > The use case is to embed multiple Python instances (interpreters) in
> > the same application process, or to embed Python with multiple calls
> > to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static
> > types are causing different issues for these use cases.
>
> If a type is immutable and has no references to heap-allocated objects,
> it could stay as a static type.
> The issue is that very many types don't fit that. For example, if some
> method needs to raise a module-specific exception, that's a reference to
> a heap-allocated type, because custom exceptions generally aren't static.

Aren't we confusing two different things here?

- a mutable *type*, i.e. a type with mutable state attached to itself
(not to instances)

- a mutable *instance*, where the mutable state is per-instance

While it's very common for custom exceptions to have mutable instance
state (e.g. a backend-specific error number), I can't think of any
custom exception that has mutable state attached to the exception
*type*.

> > Also, it's not possible to destroy static types at Python exit, which
> > goes against the on-going effort to destroy all Python objects at exit
> > (bpo-1635741).
>
> I don't see why we would need to destroy immutable static objects. They
> don't need to be freed.

Right.

Regards

Antoine.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FNSTYH7AQQUJFCD354AWDXVZWCLNV462/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 1/12/21 4:09 PM, Victor Stinner wrote:
> On Tue, Jan 12, 2021 at 3:28 PM Petr Viktorin <encukou@gmail.com> wrote:
>> If a type is immutable and has no references to heap-allocated objects,
>> it could stay as a static type.
>> The issue is that very many types don't fit that. For example, if some
>> method needs to raise a module-specific exception, that's a reference to
>> a heap-allocated type, because custom exceptions generally aren't static.
>> (...)
>> I don't see why we would need to destroy immutable static objects. They
>> don't need to be freed.
>
> I'm not sure of your definition of "immutable" here. At the C level,
> many immutable Python objects are mutable. For example, a str instance
> *can* be modified with the C level, and computing hash(<my string>)
> modifies the object as well (the internal cached hash value).
>
> Any type contains at least one Python object: the __mro__ tuple. Most
> types also contain a __subclass__ dictionary (by default, it's NULL).
> These objects are created at Python startup, but not destroyed at
> Python exit. See also tp_bases (tuple) and tp_dict (dict).

Ah, right. __subclasses__ is the reason these need to be heap types (if
they allow subclassing, which – isn't).
If __mro__ is a tuple of static types, it could probably be made static
as well; hashes could be protected by a lock.


> I tried once to "finalize" static types, but it didn't go well:
>
> * https://github.com/python/cpython/pull/20763
> * https://bugs.python.org/issue1635741#msg371119
>
> It doesn't look to be safe to clear static types. Many functions rely
> on the fact that static types are "always there" and are never
> finalized. Also, only a few static types are cleared by my PR: many
> static types are left unchanged. For example, static types of the _io
> module. It seems like a safer approach is to continue the work on
> bpo-40077: "Convert static types to PyType_FromSpec()".

Yes, seems so. And perhaps this has enough subtle details to want a PEP?
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3TLWPQT76RZ2Q6HEUKFURB3J45AXYWFE/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 1/12/21 4:34 PM, Antoine Pitrou wrote:
> On Tue, 12 Jan 2021 15:22:36 +0100
> Petr Viktorin <encukou@gmail.com> wrote:
>> On 1/11/21 5:26 PM, Victor Stinner wrote:
>>> Hi,
>>>
>>> There are multiple PEPs covering heap types. The latest one refers to
>>> other PEPs: PEP 630 "Isolating Extension Modules" by Petr Viktorin.
>>> https://www.python.org/dev/peps/pep-0630/#motivation
>>>
>>> The use case is to embed multiple Python instances (interpreters) in
>>> the same application process, or to embed Python with multiple calls
>>> to Py_Initialize/Py_Finalize (sequentially, not in parallel). Static
>>> types are causing different issues for these use cases.
>>
>> If a type is immutable and has no references to heap-allocated objects,
>> it could stay as a static type.
>> The issue is that very many types don't fit that. For example, if some
>> method needs to raise a module-specific exception, that's a reference to
>> a heap-allocated type, because custom exceptions generally aren't static.
>
> Aren't we confusing two different things here?
>
> - a mutable *type*, i.e. a type with mutable state attached to itself
> (not to instances)
>
> - a mutable *instance*, where the mutable state is per-instance
>
> While it's very common for custom exceptions to have mutable instance
> state (e.g. a backend-specific error number), I can't think of any
> custom exception that has mutable state attached to the exception
> *type*.

You're right, exception types *could* generally be static. However, the
most common API for creating them, PyErr_NewException[WithDoc], creates
heap types.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZSQ4HGBIMFBOVXBUKB7C7UPIJPW76OLA/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 2021-01-12, Victor Stinner wrote:
> It seems like a safer approach is to continue the work on
> bpo-40077: "Convert static types to PyType_FromSpec()".

I agree that trying to convert static types is a good idea. Another
possible bonus might be that we can gain some performance by
integrating garbage collection with the Python object memory
allocator. Static types frustrate that effort.

Could we have something easier to use than PyType_FromSpec(), for
the purposes of converting existing code? I was thinking of
something like:

    static PyTypeObject Foo_TypeStatic = {
    }
    static PyTypeObject *Foo_Type;

    PyInit_foo(void)
    {
        Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
    }


The PyType_FromStatic() would return a new heap type, created by
copying the static type. The static type could be marked as being
unusable (e.g. with a type flag).
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RPG2TRQLONM2OCXKPVCIDKVLQOJR7EUU/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
One worry that I have in general with this move is the usage of
_PyType_GetModuleByDef to get the type object from the module
definition. This normally involves getting a TLS in every instance
creation, which can notably impact performance for some perf-sensitive
types or types that are created a lot.

On Tue, 12 Jan 2021 at 18:21, Neil Schemenauer <nas-python@arctrix.com>
wrote:

> On 2021-01-12, Victor Stinner wrote:
> > It seems like a safer approach is to continue the work on
> > bpo-40077: "Convert static types to PyType_FromSpec()".
>
> I agree that trying to convert static types is a good idea. Another
> possible bonus might be that we can gain some performance by
> integrating garbage collection with the Python object memory
> allocator. Static types frustrate that effort.
>
> Could we have something easier to use than PyType_FromSpec(), for
> the purposes of coverting existing code? I was thinking of
> something like:
>
> static PyTypeObject Foo_TypeStatic = {
> }
> static PyTypeObject *Foo_Type;
>
> PyInit_foo(void)
> {
> Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
> }
>
>
> The PyType_FromStatic() would return a new heap type, created by
> copying the static type. The static type could be marked as being
> unusable (e.g. with a type flag).
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 1/12/21 7:16 PM, Neil Schemenauer wrote:
> On 2021-01-12, Victor Stinner wrote:
>> It seems like a safer approach is to continue the work on
>> bpo-40077: "Convert static types to PyType_FromSpec()".
>
> I agree that trying to convert static types is a good idea. Another
> possible bonus might be that we can gain some performance by
> integrating garbage collection with the Python object memory
> allocator. Static types frustrate that effort.
>
> Could we have something easier to use than PyType_FromSpec(), for
> the purposes of coverting existing code? I was thinking of
> something like:
>
> static PyTypeObject Foo_TypeStatic = {
> }
> static PyTypeObject *Foo_Type;
>
> PyInit_foo(void)
> {
> Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
> }
>
>
> The PyType_FromStatic() would return a new heap type, created by
> copying the static type. The static type could be marked as being
> unusable (e.g. with a type flag).

Unfortunately, it's not just the creation that needs to be changed.
You also need to decref Foo_Type somewhere.

Your example is for "single-phase init" modules (pre-PEP 489). Those
don't have a dealloc hook, so they will leak memory (e.g. in multiple
Py_Initialize/Py_Finalize cycles).

Multi-phase init (PEP 489) allows multiple module instances of extension
modules. Assigning PyType_FromStatic's result to a static pointer would
mean that every instance of the module will create a new type, and
overwrite any existing one. And the deallocation will either leave a
dangling pointer or NULL the pointer for other module instances.

So, you need to make the type part of the module state, so that the
module has proper ownership of the type. And that means you need to
access the type from the module state any time you need to use it.

At that point, IMO, PyType_FromStatic saves you so little work that it's
not worth supporting a third variation of type creation code.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WAKYSLYIUZN7NPCE6G6SRRCJK5RELJQ3/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
Having used heap types extensively for JPype, I believe that converting all types to heap types would be a great benefit. There are still minor rough spots in which a static type can do things that heap types cannot (for example, you can derive from a type which is marked final when it is static but not when it is a heap type, such as function). But generally I found heap types to be much more flexible. I found that heap types were better in concept than static types, but because the majority of the API (and the examples on using the C API) were static, the heap type paths were less exercised. I eventually puzzled out most of the mysteries, but having everything be the same (except for old static types, which should be marked as immortal) likely has a lot of side benefits.

Of course the other issue that I have with heap types is that they currently lack the concept of meta classes. Thus there are things that you can do from the Python language that you can't do from the C API. See...

https://bugs.python.org/issue42617

The downside of course is that a lot of calls in the C API assume that a static type has a fixed address. Perhaps those calls could all be macros which evaluate to the address of the heap type.

But that is just my 2 cents.

--Karl

-----Original Message-----
From: Neil Schemenauer <nas-python@arctrix.com>
Sent: Tuesday, January 12, 2021 10:17 AM
To: Victor Stinner <vstinner@python.org>
Cc: Python Dev <python-dev@python.org>
Subject: [Python-Dev] Re: Heap types (PyType_FromSpec) must fully implement the GC protocol

On 2021-01-12, Victor Stinner wrote:
> It seems like a safer approach is to continue the work on
> bpo-40077: "Convert static types to PyType_FromSpec()".

I agree that trying to convert static types is a good idea. Another possible bonus might be that we can gain some performance by integrating garbage collection with the Python object memory allocator. Static types frustrate that effort.

Could we have something easier to use than PyType_FromSpec(), for the purposes of coverting existing code? I was thinking of something like:

static PyTypeObject Foo_TypeStatic = {
}
static PyTypeObject *Foo_Type;

PyInit_foo(void)
{
Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
}


The PyType_FromStatic() would return a new heap type, created by copying the static type. The static type could be marked as being unusable (e.g. with a type flag).
_______________________________________________
Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RPG2TRQLONM2OCXKPVCIDKVLQOJR7EUU/
Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DV4SPP2TTXGYMTMRMEO6TG5W7XPZKPXX/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 1/12/21 7:48 PM, Pablo Galindo Salgado wrote:
> One worry that I have in general with this move
> is the usage of _PyType_GetModuleByDef to get the type object
> from the module definition. This normally involves getting a TLS in
> every instance creation,

Not TLS, it's walking the MRO.

> which can impact notably performance for some perf-sensitive types or types
> that are created a lot.

But yes, that's right. _PyType_GetModuleByDef should not be used in
perf-sensitive spots, at least not without profiling.
There's often an alternative, though. Do you have any specific cases
you're concerned about?
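
To illustrate what "walking the MRO" means for the cost of the lookup, here is a rough pure-Python model. It is only a sketch: the real _PyType_GetModuleByDef is C code that compares module definitions, while this model uses a hypothetical `_defining_module` class attribute and matches on the module name. The helper name `get_module_by_walking_mro` is also made up for the example.

```python
import types


def get_module_by_walking_mro(cls, wanted_name):
    """Return the first module named wanted_name found along cls.__mro__.

    Pure-Python model only; "_defining_module" is a hypothetical attribute
    standing in for the module that CPython associates with a heap type.
    """
    for klass in cls.__mro__:
        # Look only at the class's own namespace, not inherited attributes.
        module = vars(klass).get("_defining_module")
        if module is not None and module.__name__ == wanted_name:
            return module
    raise TypeError(f"no class in the MRO belongs to module {wanted_name!r}")


ext = types.ModuleType("ext")


class Base:
    _defining_module = ext


class Child(Base):
    # A subclass defined elsewhere, for example in user code.
    pass


found = get_module_by_walking_mro(Child, "ext")
# Number of classes visited before the match: Child first, then Base.
steps = Child.__mro__.index(Base) + 1
```

The lookup cost grows with the distance between the instance's type and the class that belongs to the extension module, which is why it may matter in hot paths.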


> On Tue, 12 Jan 2021 at 18:21, Neil Schemenauer <nas-python@arctrix.com> wrote:
>
> On 2021-01-12, Victor Stinner wrote:
> > It seems like a safer approach is to continue the work on
> > bpo-40077: "Convert static types to PyType_FromSpec()".
>
> I agree that trying to convert static types is a good idea.  Another
> possible bonus might be that we can gain some performance by
> integrating garbage collection with the Python object memory
> allocator.  Static types frustrate that effort.
>
> Could we have something easier to use than PyType_FromSpec(), for
> the purposes of converting existing code?  I was thinking of
> something like:
>
>     static PyTypeObject Foo_TypeStatic = {
>     }
>     static PyTypeObject *Foo_Type;
>
>     PyInit_foo(void)
>     {
>         Foo_Type = PyType_FromStatic(&Foo_TypeStatic);
>     }
>
>
> The PyType_FromStatic() would return a new heap type, created by
> copying the static type.  The static type could be marked as being
> unusable (e.g. with a type flag).
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QYXDVMTI5CBKQOGYC557ER45IZZLJZGS/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On Tue, 12 Jan 2021 18:48:39 +0000
Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
> One worry that I have in general with this move
> is the usage of _PyType_GetModuleByDef to get the type object
> from the module definition. This normally involves getting a TLS in every
> instance creation,
> which can impact notably performance for some perf-sensitive types or types
> that are created a lot.

If it's inlined C TLS it should be fast (*). If it's Python's emulated
TLS then probably not :-)

(*) see https://godbolt.org/z/d7eKx7

Regards

Antoine.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CKABGMLSJLKDGKUOMXA6MKO36MEWZIIS/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 2021-01-12, Pablo Galindo Salgado wrote:
> One worry that I have in general with this move is the usage of
> _PyType_GetModuleByDef to get the type object from the module
> definition. This normally involves getting a TLS in every instance
> creation, which can impact notably performance for some
> perf-sensitive types or types that are created a lot.

I would say _PyType_GetModuleByDef is the problem. Why do we need
to use such an ugly approach (walking the MRO) when Python defined
classes don't have the same performance issue? E.g.

    class A:
        def b():
            pass
    A.b.__globals__

IMHO, we should be working to make types and functions defined in
extensions more like the pure Python versions.
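
To make the comparison concrete, here is a small runnable version of the snippet above. It only shows that a function defined in Python carries a direct reference to the globals dict of the code that defined it, so no MRO walk is needed; the `self` parameter is added for the example.

```python
class A:
    def b(self):
        pass


# A function defined in Python stores a reference to the globals dict
# of the namespace in which it was defined.
function_globals = A.b.__globals__

# That reference is the very same dict object, not a copy.
is_same_dict = function_globals is globals()
is_plain_dict = isinstance(function_globals, dict)
```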

Related, my "__namespace__" idea[1] might be helpful in reducing the
differences between pure Python modules and extension modules.
Rather than functions having a __globals__ property, which is a
dict, they would have a __namespace__, which is a module object.
Basically, functions and methods know which global namespace
(module) they have been defined in. For extension modules, when you
call a function or method defined in the extension, it could be
passed the module instance, by using the __namespace__ property.

Maybe I'm missing some details on why this approach wouldn't work.
However, at a high level, I don't see why it shouldn't. Maybe
performance would be an issue? Reducing the number of branches in
code paths like CALL_FUNCTION should help.

1. https://github.com/nascheme/cpython/tree/frame_no_builtins
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ERZFFBSO6J4G4X3V5QFWH6CBEEECEIAG/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 2021-01-12, Petr Viktorin wrote:
> Unfortunately, it's not just the creation that needs to be changed.
> You also need to decref Foo_Type somewhere.

Add the type to the module dict?
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7LSR4IASUNVVEAV6M6FRHZN7DABWHSBY/
Re: Heap types (PyType_FromSpec) must fully implement the GC protocol
On 1/12/21 8:23 PM, Neil Schemenauer wrote:
> On 2021-01-12, Pablo Galindo Salgado wrote:
>> One worry that I have in general with this move is the usage of
>> _PyType_GetModuleByDef to get the type object from the module
>> definition. This normally involves getting a TLS in every instance
>> creation, which can impact notably performance for some
>> perf-sensitive types or types that are created a lot.
>
> I would say _PyType_GetModuleByDef is the problem. Why do we need
> to use such an ugly approach (walking the MRO) when Python defined
> classes don't have the same performance issue? E.g.
>
>     class A:
>         def b():
>             pass
>     A.b.__globals__
>
> IMHO, we should be working to make types and functions defined in
> extensions more like the pure Python versions.
>
> Related, my "__namespace__" idea[1] might be helpful in reducing the
> differences between pure Python modules and extension modules.
> Rather than functions having a __globals__ property, which is a
> dict, they would have a __namespace__, which is a module object.
> Basically, functions and methods know which global namespace
> (module) they have been defined in. For extension modules, when you
> call a function or method defined in the extension, it could be
> passed the module instance, by using the __namespace__ property.
>
> Maybe I'm missing some details on why this approach wouldn't work.
> However, at a high level, I don't see why it shouldn't. Maybe
> performance would be an issue? Reducing the number of branches in
> code paths like CALL_FUNCTION should help.

The main difference between Python and C functions is that in C, you
need type safety. You can't store C state in a mutable dict (or module)
accessible from Python, because when users invalidate your C invariants,
you get a segfault rather than a nice AttributeError.
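
The Python side of that contrast can be sketched as follows. This is only an illustration with made-up names (`ext`, `cache`, `lookup`): state kept in a mutable module namespace can be removed by user code, and pure Python code then sees an ordinary exception. C code that cached a raw pointer to the same state would have no such safety net.

```python
import types

# Hypothetical module-like object holding state in a mutable namespace.
ext = types.ModuleType("ext")
ext.cache = {}


def lookup(module, key):
    # Python code re-reads the state through the namespace on every call.
    return module.cache.get(key)


# Works while the invariant "ext.cache is a dict" holds.
first = lookup(ext, "x")

# User code is free to break that invariant.
del ext.cache

try:
    lookup(ext, "x")
except AttributeError as exc:
    # Python reports the broken invariant as a normal exception.
    error = exc
```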

Making methods "remember" their context does work though, and has
already been implemented -- see PEP 573!
It uses the *defining class* instead of __namespace__, but you can get
the module from that quite easily.

The only place where it doesn't work is slot methods, which have a fixed C
API. For example:

PyObject *tp_repr(PyObject *self);
int tp_init(PyObject *self, PyObject *args, PyObject *kwds);

There is no good way to pass the method, module object, globals() or the
defining class to such functions.


> 1. https://github.com/nascheme/cpython/tree/frame_no_builtins
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TZVSCCCTUISV32U2OTE5LY7F3X5QAVCX/