Mailing List Archive

PyGC and PyObject_Malloc introspection
Hi there,

I've spent quite some time on memory profiling for Python now. I'm
struggling to get more information from the allocated memory right now
for what looks like a sad reason. :(

Supposedly PyObject_Malloc() returns some memory space to store a
PyObject. If that was true all the time, that would allow anyone to
introspect the allocated memory and understand why it's being used.

Unfortunately, this is not the case. Objects whose types are tracked by
the GC go through _PyObject_GC_Alloc() which changes the underlying
memory structure to be (PyGC_HEAD + PyObject).
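
As a rough illustration of that layout difference, here is a minimal sketch
that observes it from Python. It relies on the documented behaviour that
sys.getsizeof() adds the garbage collector overhead on top of the object's
own __sizeof__(). The helper name hidden_header_bytes is made up for this
example, and the exact number of bytes depends on the CPython version and
build.

```python
import sys


def hidden_header_bytes(obj) -> int:
    """Bytes that sys.getsizeof() adds on top of obj.__sizeof__().

    For GC-enabled types on a default build this is the header stored in
    front of the PyObject; for non-GC types it is 0.
    """
    return sys.getsizeof(obj) - obj.__sizeof__()


if __name__ == "__main__":
    # int and str are not GC types; list and dict are.
    for sample in (42, "text", [], {}):
        print(type(sample).__name__, hidden_header_bytes(sample))
```

The pointer handed out by the allocator for a list therefore starts that
many bytes before the address that id() reports on a default build.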

This is a bummer, as there is then no safe way that I can think of to know
if an allocated memory space is gc-tracked or gc-untracked. It therefore
makes it impossible to introspect the memory allocated by
PyObject_Malloc().

There are multiple ways to solve this, but I don't see any way right now
of doing this without slightly changing CPython.

Does anyone have an idea of how to work around this, or of the kind of
change that could be acceptable to mitigate the issue?

Thanks!

Cheers,
--
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QRX6U5XBXMHMT6YKIXERS3UT64ALYV27/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PyGC and PyObject_Malloc introspection
Hi Julien,

Isn't this similar to the problem you have with regular malloc? When you
call malloc() with some size, malloc will actually reserve more than that
for alignment purposes and for bookkeeping, and apart from some
platform-specific APIs like malloc_usable_size() you cannot query that
value.

Unless I am missing something here, there is also a problem with what you
propose: the Python allocators deal with memory, and they do not know what
that memory is going to be used for, so when you say:

> This is a bummer as there is then no safe way that I can think of to know
> if an allocated memory space is gc-tracked or gc-untracked.

My answer would be that that's because memory itself cannot be gc tracked,
only objects can, and the two belong to different categories. For example,
notice that the tracemalloc module does not report objects, it only reports
memory blocks, and you cannot ask tracemalloc to give you a list of all the
objects that were allocated because it does not have the notion of what an
object is.
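
A minimal sketch of that point, using only the documented tracemalloc API:
the snapshot statistics carry a traceback, a total size and a block count,
and nothing that identifies which objects live in those blocks. The
function name and the sample workload are made up for this example.

```python
import tracemalloc


def top_allocation_sites(limit: int = 3):
    """Return (traceback, size_in_bytes, block_count) for the top sites."""
    tracemalloc.start()
    # Sample workload: 100 separate 1000-byte allocations.
    payloads = [bytes(1000) for _ in range(100)]
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    stats = snapshot.statistics("lineno")[:limit]
    sites = [(str(stat.traceback), stat.size, stat.count) for stat in stats]
    return sites, len(payloads)


if __name__ == "__main__":
    sites, _ = top_allocation_sites()
    for where, size, count in sites:
        print(f"{where}: {size} bytes in {count} blocks")
```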

Adding the functionality that you suggest would couple the allocator with
other parts of the VM, and it also won't give you the answer that you need,
because what's underneath (pymalloc or libc malloc or whatever else) may
actually allocate even more than what you asked for.

Regards from cloudy London,
Pablo Galindo Salgado




On Thu, 14 Jan 2021 at 16:10, Julien Danjou <julien@danjou.info> wrote:

> [...]
Re: PyGC and PyObject_Malloc introspection
[Julien Danjou]
> ...
> Supposedly PyObject_Malloc() returns some memory space to store a
> PyObject. If that was true all the time, that would allow anyone to
> introspect the allocated memory and understand why it's being used.
>
> Unfortunately, this is not the case. Objects whose types are tracked by
> the GC go through _PyObject_GC_Alloc() which changes the underlying
> memory structure to be (PyGC_HEAD + PyObject).
>
> This is a bummer as there is then no safe way that I can think of to know
> if an allocated memory space is gc-tracked or gc-untracked. It makes it
> therefore impossible to introspect the memory allocated by
> PyObject_Malloc().

I'm not clear on exactly what it is you're after, but CPython faces
the same question all the time: _given_ a pointer to an object, is
there, or is there not, a GC header prepended? That's answered by
this C API function:

"""
int PyObject_IS_GC(PyObject *obj)

Returns non-zero if the object implements the garbage collector
protocol, otherwise returns 0.

The object cannot be tracked by the garbage collector if this function
returns 0.
"""

FYI, the implementation usually resolves it by looking at whether
obj's type object has the Py_TPFLAGS_HAVE_GC flag set.
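
For illustration only, the same flag test can be sketched from Python via
type.__flags__. The bit value 1 << 14 is taken from CPython's
Include/object.h, and the helper name type_has_gc_header is made up here.
This mirrors only the "usual" path described above and ignores types that
refine the answer with a tp_is_gc slot.

```python
# Bit value of Py_TPFLAGS_HAVE_GC as defined in CPython's Include/object.h.
Py_TPFLAGS_HAVE_GC = 1 << 14


def type_has_gc_header(obj) -> bool:
    """True if obj's type sets Py_TPFLAGS_HAVE_GC."""
    return bool(type(obj).__flags__ & Py_TPFLAGS_HAVE_GC)


if __name__ == "__main__":
    for sample in ([], {}, 42, "text"):
        print(type(sample).__name__, type_has_gc_header(sample))
```

Like PyObject_IS_GC(), this needs the object itself, not the raw pointer
returned by the allocator.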
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LBYGKGEPB5ZAFEQGOBIICNEDRYK4PIT3/
Re: PyGC and PyObject_Malloc introspection
On Thu, Jan 14 2021, Pablo Galindo Salgado wrote:

Hi Pablo,

> Isn't this a similar problem that you have with regular malloc? When you
> call malloc() with
> some size, malloc actually will reserve more than that for
> alignment purposes and for
> bookkeeping and apart from some platform-specific APIs
> like malloc_usable_size()
> you cannot query that value.

Not really. It's not a real problem if malloc reserves more memory than
the size you requested, as long as you still know the original size.
When working with the Python memory allocator API, you do have this
original size, so you're able to read the memory from its beginning (the
allocated pointer address) to its end.

> My answer would be that that's because memory itself cannot be gc tracked,
> only objects can and those belonging to different
> categories. For example, notice that the tracemalloc module does not report
> objects, it only reports memory blocks and you
> cannot ask tracemalloc to give you a list of all the objects that were
> allocated because it does not have the notion of what
> an object is.

Exactly, which is a bit of a bummer. Considering Python provides 3
different memory allocators, it'd be great if there was some way to
be sure that PyObject_Malloc pointers are actually PyObject, not
PyGC_HEAD.

--
Julien Danjou
// Free Software hacker
// https://julien.danjou.info
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MMCIEUV62ER6DWYG7FVIRCRD2KI3E4EZ/
Re: PyGC and PyObject_Malloc introspection
On Thu, Jan 14 2021, Tim Peters wrote:

> I'm not clear on exactly what it is you're after, but CPython faces
> the same question all the time: _given_ a pointer to an object, is
> there, or is there not, a GC header prepended? That's answered by
> this C API function:
>
> """
> int PyObject_IS_GC(PyObject *obj)
>
> Returns non-zero if the object implements the garbage collector
> protocol, otherwise returns 0.
>
> The object cannot be tracked by the garbage collector if this function
> returns 0.
> """
>
> FYI, the implementation usually resolves it by looking at whether
> obj's type object has the Py_TPFLAGS_HAVE_GC flag set.

Right, but that only works if you have the PyObject address.
If all you've got is a pointer returned by PyObject_Malloc(), you don't
know if the PyObject is at this pointer address, or after the PyGC_HEAD
header that is prepended. :(

--
Julien Danjou
/* Free Software hacker
https://julien.danjou.info */
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Z6GJUNKOFE2IN6EKQKDBBMX4SUZ36ITU/
Re: PyGC and PyObject_Malloc introspection
On Fri, 15 Jan 2021 09:36:11 +0100
Julien Danjou <julien@danjou.info> wrote:
> On Thu, Jan 14 2021, Tim Peters wrote:
>
> > I'm not clear on exactly what it is you're after, but CPython faces
> > the same question all the time: _given_ a pointer to an object, is
> > there, or is there not, a GC header prepended? That's answered by
> > this C API function:
> >
> > """
> > int PyObject_IS_GC(PyObject *obj)
> >
> > Returns non-zero if the object implements the garbage collector
> > protocol, otherwise returns 0.
> >
> > The object cannot be tracked by the garbage collector if this function
> > returns 0.
> > """
> >
> > FYI, the implementation usually resolves it by looking at whether
> > obj's type object has the Py_TPFLAGS_HAVE_GC flag set.
>
> Right, but that only works if you have the PyObject address.
> If all you've got is a pointer returned by PyObject_Malloc(), you don't
> know if the PyObject is at this pointer address, or after the PyGC_HEAD
> header that is prepended. :(

Also note that PyObject_Malloc() may also be used to allocate
non-objects, for example a bytearray's payload, IIRC.

Regards

Antoine.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DCXBZ6UJ5OKYCUL6MSWVQD6JT7YW5QON/
Re: PyGC and PyObject_Malloc introspection
> Exactly, which is a bit of a bummer. Considering Python provides 3
> different memory allocators, it'd be great if there was some way to
> be sure that PyObject_Malloc pointers are actually PyObject, not
> PyGC_HEAD.

The allocators are specialized based on the allocation strategy
and efficiency, not based on what you are going to use the memory
for. If you want to allocate a buffer using the object allocation
strategy because <insert reason>, then nobody is preventing you
from using PyObject_Malloc(). Even if we sanitize the whole stdlib to
conform to "only objects are allocated using PyObject_Malloc()",
3rd party extension modules and other beasts can do whatever, so you
can still crash if you decide to interpret the output as an object.
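
A small sketch of that situation, driving the C API through ctypes purely
for illustration. The helper name roundtrip_raw_buffer is made up, and the
code assumes a regular CPython build where PyObject_Malloc and
PyObject_Free are exported by ctypes.pythonapi. The block it allocates
holds raw bytes and never contains a PyObject.

```python
import ctypes

api = ctypes.pythonapi  # calls made through this handle hold the GIL
api.PyObject_Malloc.restype = ctypes.c_void_p
api.PyObject_Malloc.argtypes = [ctypes.c_size_t]
api.PyObject_Free.restype = None
api.PyObject_Free.argtypes = [ctypes.c_void_p]


def roundtrip_raw_buffer(payload: bytes) -> bytes:
    """Copy payload into a PyObject_Malloc() block and read it back."""
    ptr = api.PyObject_Malloc(len(payload))
    if not ptr:
        raise MemoryError("PyObject_Malloc() returned NULL")
    try:
        # The block is a plain byte buffer: no refcount, no type pointer.
        ctypes.memmove(ptr, payload, len(payload))
        return ctypes.string_at(ptr, len(payload))
    finally:
        api.PyObject_Free(ptr)


if __name__ == "__main__":
    print(roundtrip_raw_buffer(b"not a PyObject"))
```

A profiler that cast such a pointer to PyObject * would read arbitrary
payload bytes as a reference count and a type pointer.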


On Fri, 15 Jan 2021 at 08:43, Julien Danjou <julien@danjou.info> wrote:

> [...]
Re: PyGC and PyObject_Malloc introspection
On Fri, Jan 15 2021, Antoine Pitrou wrote:

> Also note that PyObject_Malloc() may also be used to allocate
> non-objects, for example a bytearray's payload, IIRC.

Interesting. What's the rationale for not using PyMem_Malloc() in such
cases?

--
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6GM4IOWIQWAQNBRAAPZBC5TRD2HZ7VTL/
Re: PyGC and PyObject_Malloc introspection
Le 15/01/2021 à 13:08, Julien Danjou a écrit :
> On Fri, Jan 15 2021, Antoine Pitrou wrote:
>
>> Also note that PyObject_Malloc() may also be used to allocate
>> non-objects, for example a bytearray's payload, IIRC.
>
> Interesting. What's the rationale for not using PyMem_Malloc() in such
> cases?

I think PyMem_Malloc() just redirects to PyObject_Malloc() nowadays.
Historically though, PyMem_Malloc() was a thin wrapper around the system
malloc(), and it could therefore be slower than PyObject_Malloc() for
small payloads.

Regards

Antoine.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6OH6N6YAGL6MRFJVOGLV3UYY5JCDRQKN/
Re: PyGC and PyObject_Malloc introspection
On Fri, Jan 15 2021, Pablo Galindo Salgado wrote:

>> Exactly, which is a bit of a bummer. Considering Python provides 3
>> different memory allocators, it'd be great if there was some way to
>> be sure that PyObject_Malloc pointers are actually PyObject, not
>> PyGC_HEAD.
>
> The allocators are specialized based on the allocation strategy
> and efficiency, not based on what you are going to use the memory
> for. If you want to allocate a buffer using the object allocation
> strategy because <insert reason>, then nobody is preventing you
> from using PyObject_Malloc(). Even if we sanitize the whole stdlib to
> conform to "only objects are allocated using PyObject_Malloc()",
> 3rd party extension modules and other beasts can do whatever, so you
> can still crash if you decide to interpret the output as an object.

Agreed.

Then the correct endpoint would more likely be PyObject_New(), but
there's no way to intercept such calls for statistical analysis
currently. And as you wrote, if some code decides to use PyMalloc()
directly, then that memory won't be tracked.

It sounds like the provided C API is a bit too low level for this,
preventing any kind of statistical analysis of the allocation patterns.
:(

--
Julien Danjou
# Free Software hacker
# https://julien.danjou.info
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UAWA2ADOQNI5IAGI3YVNE7EYKGSBIHZ5/
Re: PyGC and PyObject_Malloc introspection
> Then the correct endpoint would more likely be PyObject_New(), but
> there's no way to intercept such calls for statistical analysis
> currently. And as you wrote, if some code decides to use PyMalloc()
> directly, then that memory won't be tracked.

The one that CPython uses in debug mode to track all references for
sys.getobjects() is PyObject_Init(), which is heavily inlined for
performance reasons (the same as many of the other calls in the allocation
chain), so unfortunately it is not possible to intercept it using
LD_PRELOAD or even GOT patching.

> It sounds like the provided C API is a bit too low level for this,
> preventing any kind of statistical analysis of the allocation patterns.
> :(

Yes, allowing interception or customization would prevent inlining and
other optimizations, and would therefore involve a considerable
performance hit. As an experiment, I forced PyObject_Init and
_PyObject_Init to not be inlined, and that had a 7-13% speed impact
overall on the performance test suite.

If you want to track all object creation, your best bet IMHO is a debug
build and to use sys.getobjects().
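
A hedged sketch of what that could look like: count live objects per type
with sys.getobjects() when the interpreter provides it, and fall back to
gc.get_objects() otherwise. The fallback only sees GC-tracked objects, so
it misses ints, strings and the like. The function name is made up for
this example, and passing 0 to sys.getobjects() is assumed to mean "no
limit".

```python
import gc
import sys
from collections import Counter


def count_live_objects_by_type() -> Counter:
    """Count live objects per type name, as far as the build allows."""
    getobjects = getattr(sys, "getobjects", None)
    if getobjects is not None:
        # Only present on special builds; 0 is assumed to mean "all".
        objects = getobjects(0)
    else:
        # Regular builds: only objects tracked by the garbage collector.
        objects = gc.get_objects()
    return Counter(type(obj).__name__ for obj in objects)


if __name__ == "__main__":
    for name, count in count_live_objects_by_type().most_common(5):
        print(name, count)
```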

Regards from sunny London,
Pablo Galindo Salgado


On Fri, 15 Jan 2021 at 12:17, Julien Danjou <julien@danjou.info> wrote:

> [...]