Mailing List Archive

Question about garbage collection
Hi all

I have read that one should not have to worry about garbage collection
in modern versions of Python - it 'just works'.

I don't want to rely on that. My app is a long-running server, with
multiple clients logging on, doing stuff, and logging off. They can
create many objects, some of them long-lasting. I want to be sure that
all objects created are gc'd when the session ends.

I do have several circular references. My experience is that if I do not
take some action to break the references when closing the session, the
objects remain alive. Below is a very simple program to illustrate this.

Am I missing something? All comments appreciated.

Frank Millman

==================================================

import gc

class delwatcher:
    # This stores enough information to identify the object being watched.
    # It does not store a reference to the object itself.
    def __init__(self, obj):
        self.id = (obj.type, obj.name, id(obj))
        print('***', *self.id, 'created ***')
    def __del__(self):
        print('***', *self.id, 'deleted ***')

class Parent:
    def __init__(self, name):
        self.type = 'parent'
        self.name = name
        self.children = []
        self._del = delwatcher(self)

class Child:
    def __init__(self, parent, name):
        self.type = 'child'
        self.parent = parent
        self.name = name
        parent.children.append(self)
        self._del = delwatcher(self)

p1 = Parent('P1')
p2 = Parent('P2')

c1_1 = Child(p1, 'C1_1')
c1_2 = Child(p1, 'C1_2')
c2_1 = Child(p2, 'C2_1')
c2_2 = Child(p2, 'C2_2')

input('waiting ...')

# if next 2 lines are included, parent and child can be gc'd
# for ch in p1.children:
#     ch.parent = None

# if next line is included, child can be gc'd, but not parent
# p1.children = None

del c1_1
del p1
gc.collect()

input('wait some more ...')

--
https://mail.python.org/mailman/listinfo/python-list
Re: Question about garbage collection
> I do have several circular references. My experience is that if I do not
> take some action to break the references when closing the session, the
> objects remain alive. Below is a very simple program to illustrate this.
>
> Am I missing something? All comments appreciated.

Python has normal reference counting, but also has a cyclic garbage
collector. Here's plenty of detail about how it works:

https://devguide.python.org/internals/garbage-collector/index.html
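
A minimal sketch of the cyclic collector at work, reusing the Parent/Child shape from the original post. The stripped-down class bodies, the `build` helper and the weakref probe are illustrative assumptions, not Frank's code. In the toy program `c1_2` is never deleted, so `c1_2.parent` still reaches `p1` and `p1.children` still reaches `c1_1`; that appears to be why nothing is freed there, rather than the cycle itself.

```python
import gc
import weakref

class Parent:
    def __init__(self, name):
        self.name = name
        self.children = []

class Child:
    def __init__(self, parent, name):
        self.parent = parent          # child -> parent
        self.name = name
        parent.children.append(self)  # parent -> child (closes the cycle)

def build():
    p = Parent('P1')
    c1 = Child(p, 'C1_1')
    c2 = Child(p, 'C1_2')
    return p, c1, c2

# Case 1: one child name is kept, as in the toy program.
p, c1, c2 = build()
p_alive = weakref.ref(p)              # observes the parent without owning it
del c1, p
gc.collect()
still_reachable = p_alive() is not None   # c2.parent still reaches the parent

# Case 2: every outside name is dropped; only the cycle remains.
del c2
gc.collect()
collected = p_alive() is None
print(still_reachable, collected)
```

Once no name outside the cycle refers to any of its members, `gc.collect()` reclaims the whole group without any manual unlinking.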

Skip

Re: Question about garbage collection
Frank Millman wrote at 2024-1-15 15:51 +0200:
>I have read that one should not have to worry about garbage collection
>in modern versions of Python - it 'just works'.

There are still some isolated cases when not all objects
in an unreachable cycle are destroyed
(see e.g. step 2 of
"https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").
But Python's own objects (e.g. traceback cycles)
or instances of classes implemented in Python
should no longer be affected.

Thus, unless you use extensions implemented in C (with "legacy finalizers"),
garbage collection should not cause problems.


On the other hand, your application, too, must avoid memory leaks.
Caches of various forms (with data for several sessions) might introduce them.


Re: Question about garbage collection
> Frank Millman wrote at 2024-1-15 15:51 +0200:
> >I have read that one should not have to worry about garbage collection
> >in modern versions of Python - it 'just works'.

Dieter Maurer via Python-list writes:
> There are still some isolated cases when not all objects
> in an unreachable cycle are destroyed
> (see e.g. step 2 of
> "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").

Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.

...Akkana

Re: Question about garbage collection
On Tue, 16 Jan 2024 at 06:32, Akkana Peck via Python-list
<python-list@python.org> wrote:
>
> > Frank Millman wrote at 2024-1-15 15:51 +0200:
> > >I have read that one should not have to worry about garbage collection
> > >in modern versions of Python - it 'just works'.
>
> Dieter Maurer via Python-list writes:
> > There are still some isolated cases when not all objects
> > in an unreachable cycle are destroyed
> > (see e.g. step 2 of
> > "https://devguide.python.org/internals/garbage-collector/index.html#destroying-unreachable-objects").
>
> Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.
>

Got any examples of that?

ChrisA

Re: Question about garbage collection
I wrote:
> > Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.

Chris Angelico writes:
> Got any examples of that?

The big one for me was gdk-pixbuf, part of GTK. When you do something like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, but there's also the underlying C code that allocates memory for the pixbuf. When the object went out of scope, the Python object was automatically garbage collected, but the pixbuf data leaked. Calling gc.collect() caused the pixbuf data to be garbage collected too.

There used to be a post explaining this on the pygtk mailing list: the link was
http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html
but that page is gone now and I can't seem to find any other archives of that list (it's not on archive.org either). And this was from GTK2; I never checked whether the extra gc.collect() is still necessary in GTK3, but I figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map application, so there are a lot of small pixbufs being repeatedly read and then deallocated.

...Akkana

Re: Question about garbage collection
On Tue, 16 Jan 2024 at 13:49, Akkana Peck via Python-list
<python-list@python.org> wrote:
>
> I wrote:
> > > Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.
>
> Chris Angelico writes:
> > Got any examples of that?
>
> The big one for me was gdk-pixbuf, part of GTK. When you do something like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, but there's also the underlying C code that allocates memory for the pixbuf. When the object went out of scope, the Python object was automatically garbage collected, but the pixbuf data leaked. Calling gc.collect() caused the pixbuf data to be garbage collected too.
>
> There used to be a post explaining this on the pygtk mailing list: the link was
> http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html
> but that page is gone now and I can't seem to find any other archives of that list (it's not on archive.org either). And this was from GTK2; I never checked whether the extra gc.collect() is still necessary in GTK3, but I figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map application, so there are a lot of small pixbufs being repeatedly read and then deallocated.
>

Okay, so to clarify: the Python object will always be garbage
collected correctly, but a buggy third-party module might have
*external* resources (in that case, the pixbuf) that aren't properly
released. Either that, or there is a reference loop, which doesn't
necessarily mean you NEED to call gc.collect(), but it can help if you
want to get rid of them more promptly. (Python will detect such loops
at some point, but not always immediately.) But these are bugs in the
module, particularly the first case, and should be considered as such.
2003 is fully two decades ago now, and I would not expect that a
serious bug like that has been copied into PyGObject (the newer way of
using GTK from Python).

So, Python's garbage collection CAN be assumed to "just work", unless
you find evidence to the contrary.

ChrisA

Re: Question about garbage collection
On 1/15/2024 9:47 PM, Akkana Peck via Python-list wrote:
> I wrote:
>>> Also be warned that some modules (particularly if they're based on libraries not written in Python) might not garbage collect, so you may need to use other methods of cleaning up after those objects.
>
> Chris Angelico writes:
>> Got any examples of that?
>
> The big one for me was gdk-pixbuf, part of GTK. When you do something like gtk.gdk.pixbuf_new_from_file(), there's a Python object that gets created, but there's also the underlying C code that allocates memory for the pixbuf. When the object went out of scope, the Python object was automatically garbage collected, but the pixbuf data leaked.

This kind of thing can happen with PyQt, also. There are ways to
minimize it but I don't know if you can ever be sure all Qt C++ objects
will get deleted. It depends on the type of object and the circumstances.

> Calling gc.collect() caused the pixbuf data to be garbage collected too.
>
> There used to be a post explaining this on the pygtk mailing list: the link was
> http://www.daa.com.au/pipermail/pygtk/2003-December/006499.html
> but that page is gone now and I can't seem to find any other archives of that list (it's not on archive.org either). And this was from GTK2; I never checked whether the extra gc.collect() is still necessary in GTK3, but I figure leaving it in doesn't hurt anything. I use pixbufs in a tiled map application, so there are a lot of small pixbufs being repeatedly read and then deallocated.
>
> ...Akkana


Re: Question about garbage collection
> On 16 Jan 2024, at 03:49, Thomas Passin via Python-list <python-list@python.org> wrote:
>
> This kind of thing can happen with PyQt, also. There are ways to minimize it but I don't know if you can ever be sure all Qt C++ objects will get deleted. It depends on the type of object and the circumstances.

When this has been seen in the past it has been promptly fixed by the maintainer.

Barry





Re: Question about garbage collection
On 2024-01-15 3:51 PM, Frank Millman via Python-list wrote:
> Hi all
>
> I have read that one should not have to worry about garbage collection
> in modern versions of Python - it 'just works'.
>
> I don't want to rely on that. My app is a long-running server, with
> multiple clients logging on, doing stuff, and logging off. They can
> create many objects, some of them long-lasting. I want to be sure that
> all objects created are gc'd when the session ends.
>

I did not explain myself very well. Sorry about that.

My problem is that my app is quite complex, and it is easy to leave a
reference dangling somewhere which prevents an object from being gc'd.

This can create (at least) two problems. The obvious one is a memory
leak. The second is that I sometimes need to keep a reference from a
transient object to a more permanent structure in my app. To save myself
the extra step of removing all these references when the transient
object is deleted, I make them weak references. This works, unless the
transient object is kept alive by mistake and the weak ref is never removed.

I feel it is important to find these dangling references and fix them,
rather than wait for problems to appear in production. The only method I
can come up with is to use the 'delwatcher' class that I used in my toy
program in my original post.
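
One possible complement to the delwatcher idea, sketched under assumptions: once an object is known to have outlived its session, `gc.get_referrers()` can list the containers that still point at it. The `Form` class and the `registry` dict below are hypothetical stand-ins for a transient object and a long-lived structure holding a stray reference.

```python
import gc
import weakref

class Form:            # hypothetical transient object
    pass

registry = {}          # hypothetical long-lived structure

frm = Form()
registry['last_form'] = frm     # the stray reference we want to find
probe = weakref.ref(frm)        # watches the form without keeping it alive
del frm
gc.collect()

suspect = probe()               # not None: something still holds the form
found_in_registry = False
if suspect is not None:
    referrers = gc.get_referrers(suspect)
    for r in referrers:
        print(type(r).__name__)             # e.g. the dict(s) holding it
    found_in_registry = any(r is registry for r in referrers)
```

The module's own globals also show up as a referrer here, because `suspect` is itself a name; in a real hunt that entry can be ignored.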

I am surprised that this issue does not crop up more often. Does nobody
else have these problems?

Frank




Re: Question about garbage collection
On Tue, 16 Jan 2024 at 23:08, Frank Millman via Python-list
<python-list@python.org> wrote:
>
> On 2024-01-15 3:51 PM, Frank Millman via Python-list wrote:
> > Hi all
> >
> > I have read that one should not have to worry about garbage collection
> > in modern versions of Python - it 'just works'.
> >
> > I don't want to rely on that. My app is a long-running server, with
> > multiple clients logging on, doing stuff, and logging off. They can
> > create many objects, some of them long-lasting. I want to be sure that
> > all objects created are gc'd when the session ends.
> >
>
> I did not explain myself very well. Sorry about that.
>
> My problem is that my app is quite complex, and it is easy to leave a
> reference dangling somewhere which prevents an object from being gc'd.
>
> This can create (at least) two problems. The obvious one is a memory
> leak. The second is that I sometimes need to keep a reference from a
> transient object to a more permanent structure in my app. To save myself
> the extra step of removing all these references when the transient
> object is deleted, I make them weak references. This works, unless the
> transient object is kept alive by mistake and the weak ref is never removed.
>
> I feel it is important to find these dangling references and fix them,
> rather than wait for problems to appear in production. The only method I
> can come up with is to use the 'delwatcher' class that I used in my toy
> program in my original post.
>
> I am surprised that this issue does not crop up more often. Does nobody
> else have these problems?
>

It really depends on how big those dangling objects are. My personal
habit is to not worry about a few loose objects: I either make sure
that reference loops are deliberately broken at some point in time, or
keep things small.

An example of deliberately breaking a refloop would be when I track
websockets. Usually I'll tag the socket object itself with some kind
of back-reference to my own state, but I also need to be able to
iterate over all of my own state objects (let's say they're
dictionaries for simplicity) and send a message to each socket. So
there'll be a reference loop between the socket and the state. But at
some point, I will be notified that the socket has been disconnected,
and that's when I go to its state object and wipe out its
back-reference. It can then be disposed of promptly, since there's no
loop.
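
A rough sketch of that pattern, with every name invented for illustration: `FakeSocket` stands in for a real websocket object, and `on_connect` / `on_disconnect` stand in for whatever callbacks the server framework provides.

```python
import weakref

class FakeSocket:
    """Hypothetical stand-in for a websocket connection object."""
    def __init__(self):
        self.state = None
        self.sent = []

    def send(self, message):
        self.sent.append(message)

all_states = []   # every live state dict, so we can broadcast

def on_connect(sock):
    state = {'socket': sock}    # state -> socket
    sock.state = state          # socket -> state: this closes the loop
    all_states.append(state)

def broadcast(message):
    for state in all_states:
        state['socket'].send(message)

def on_disconnect(sock):
    state = sock.state
    all_states.remove(state)
    state['socket'] = None      # wipe the back-reference: no loop remains
    sock.state = None

sock = FakeSocket()
on_connect(sock)
broadcast('hello')
delivered = list(sock.sent)
probe = weakref.ref(sock)
on_disconnect(sock)
del sock
freed_promptly = probe() is None   # on CPython, freed by refcounting alone
```

Because the loop is broken by hand, the socket is released as soon as its last name goes away, without waiting for a collector pass.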

It takes a bit of care, but in general, large state objects won't have
these kinds of loops, and dangling references haven't caused me any
sort of major issues in production.

Where do you tend to "leave a reference dangling somewhere"? How is
this occurring? Is it a result of an incomplete transaction (like an
HTTP request that never finishes), or a regular part of the operation
of the server?

ChrisA

Re: Question about garbage collection
On 1/16/2024 4:17 AM, Barry wrote:
>
>
>> On 16 Jan 2024, at 03:49, Thomas Passin via Python-list <python-list@python.org> wrote:
>>
>> This kind of thing can happen with PyQt, also. There are ways to minimize it but I don't know if you can ever be sure all Qt C++ objects will get deleted. It depends on the type of object and the circumstances.
>
> When this has been seen in the past it has been promptly fixed by the maintainer.

The usual advice is to call deleteLater() on objects derived from PyQt
classes. I don't know enough about PyQt to know if this takes care of
all dangling reference problems, though.


Re: Question about garbage collection
On 2024-01-16 2:15 PM, Chris Angelico via Python-list wrote:
>
> Where do you tend to "leave a reference dangling somewhere"? How is
> this occurring? Is it a result of an incomplete transaction (like an
> HTTP request that never finishes), or a regular part of the operation
> of the server?
>

I have a class that represents a database table, and another class that
represents a database column. There is a one-to-many relationship and
they maintain references to each other.

In another part of the app, there is a class that represents a form, and
another class that represents the gui elements on the form. Again there
is a one-to-many relationship.

A gui element that represents a piece of data has to maintain a link to
its database column object. There can be a many-to-one relationship, as
there could be more than one gui element referring to the same column.

There are added complications which I won't go into here. The bottom
line is that on some occasions a form which has been closed does not get
gc'd.

I have been trying to reproduce the problem in my toy app, but I cannot
get it to fail. There is a clue there! I think I have just
over-complicated things.

I will start with a fresh approach tomorrow. If you don't hear from me
again, you will know that I have solved it!

Thanks for the input, it definitely helped.

Frank



Re: Question about garbage collection
On Wed, 17 Jan 2024 at 01:45, Frank Millman via Python-list
<python-list@python.org> wrote:
>
> On 2024-01-16 2:15 PM, Chris Angelico via Python-list wrote:
> >
> > Where do you tend to "leave a reference dangling somewhere"? How is
> > this occurring? Is it a result of an incomplete transaction (like an
> > HTTP request that never finishes), or a regular part of the operation
> > of the server?
> >
>
> I have a class that represents a database table, and another class that
> represents a database column. There is a one-to-many relationship and
> they maintain references to each other.
>
> In another part of the app, there is a class that represents a form, and
> another class that represents the gui elements on the form. Again there
> is a one-to-many relationship.

I don't know when you'd be "done" with the table, so I won't try to
give an example, but I'll try this one and maybe it'll give some ideas
that could apply to both.

When you open the form, you initialize it, display it, etc, etc. This
presumably includes something broadly like this:

class Form:
    def __init__(self):
        self.elements = []

class Element:
    def __init__(self, form):
        self.form = form
        form.elements.append(self)

frm = Form(...)
Element(frm, ...) # as many as needed
frm.show() # present it to the user

This is a pretty classic refloop. I don't know exactly what your setup
is, but most likely it's going to look something like this. Feel free
to correct me if it doesn't.

The solution here would be to trap the "form is no longer being
displayed" moment. That'll be some sort of GUI event like a "close" or
"delete" signal. When that comes through (and maybe after doing other
processing), you no longer need the form, and can dispose of it. The
simplest solution here is: Empty out frm.elements. That immediately
leaves the form itself as a leaf (no references to anything relevant),
and the elements still refer back to it, but once nothing ELSE refers
to the form, everything can be disposed of.
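
A minimal sketch of the "empty out frm.elements" step, assuming the GUI toolkit delivers some close notification. The `on_close` method name is hypothetical, and the weakref probe is only there to observe the result.

```python
import weakref

class Form:
    def __init__(self):
        self.elements = []

    def on_close(self):
        # Hypothetical handler for the toolkit's "close"/"delete" signal.
        self.elements.clear()   # form -> element links are gone

class Element:
    def __init__(self, form):
        self.form = form
        form.elements.append(self)

frm = Form()
for _ in range(3):
    Element(frm)                # each element is held only by frm.elements
probe = weakref.ref(frm)

frm.on_close()                  # the form is now a leaf
del frm                         # nothing else refers to it
form_freed = probe() is None    # on CPython, freed by refcounting alone
```

If something else still held one of the elements, that element's `form` attribute would keep the form alive until the element itself was released.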

> A gui element that represents a piece of data has to maintain a link to
> its database column object. There can be a many-to-one relationship, as
> there could be more than one gui element referring to the same column.

Okay, so the Element also refers to the corresponding Column. If the
Form and Element aren't in a refloop, this shouldn't be a problem.
However, if this is the same Table and Column that you referred to
above, that might be the answer to my question. Are you "done" with
the Table at the same time that the form is no longer visible? If so,
you would probably have something similar where the Form refers to the
Table, and the Table and Columns refer to each other... so the same
solution hopefully should work: wipe out the Table's list of columns.

> There are added complications which I won't go into here. The bottom
> line is that on some occasions a form which has been closed does not get
> gc'd.
>
> I have been trying to reproduce the problem in my toy app, but I cannot
> get it to fail. There is a clue there! I think I have just
> over-complicated things.

Definitely possible.

> I will start with a fresh approach tomorrow. If you don't hear from me
> again, you will know that I have solved it!
>
> Thanks for the input, it definitely helped.

Cool cool, happy to help.

ChrisA

Re: Question about garbage collection
> On 16 Jan 2024, at 13:17, Thomas Passin via Python-list <python-list@python.org> wrote:
>
> The usual advice is to call deleteLater() on objects derived from PyQt classes. I don't know enough about PyQt to know if this takes care of all dangling reference problems, though.

It works well and robustly.

Barry



Re: Question about garbage collection
> On 16 Jan 2024, at 12:10, Frank Millman via Python-list <python-list@python.org> wrote:
>
> My problem is that my app is quite complex, and it is easy to leave a reference dangling somewhere which prevents an object from being gc'd.

What I do to track these problems down is call gc.get_objects() and summarize the number of objects of each type. Part 2 is to take a second summary after an interval and print the delta.
Leaks show up as the count of a type increasing every time you sample.
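
The two steps can be sketched roughly as follows. The helper names `type_counts` and `growth`, the `Session` class and the simulated leak are assumptions for illustration, as is the `gc.collect()` call before counting, which is there so that only survivors are counted.

```python
import gc
from collections import Counter

def type_counts():
    """Count live gc-tracked objects by type name."""
    gc.collect()   # drop collectable garbage first
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def growth(before, after):
    """Return {type name: increase} for types whose count went up."""
    return {name: after[name] - before[name]
            for name in after if after[name] > before[name]}

class Session:          # hypothetical leaking type, for illustration only
    pass

leaked = []
before = type_counts()
leaked.extend(Session() for _ in range(5))   # simulate a leak
after = type_counts()
delta = growth(before, after)
print(delta.get('Session'))
```

In a server, `type_counts()` would be sampled at intervals, for example after each session closes, and any type whose count keeps rising between samples is a candidate leak.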


Barry



Re: Question about garbage collection
On 17/01/24 4:00 am, Chris Angelico wrote:
> class Form:
>     def __init__(self):
>         self.elements = []
>
> class Element:
>     def __init__(self, form):
>         self.form = form
>         form.elements.append(self)

If you make the reference from Element to Form a weak reference,
it won't keep the Form alive after it's been closed.
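
A sketch of that variant using the standard `weakref.ref`. The `_form` attribute and the `form` property are illustrative choices, not part of the quoted code.

```python
import weakref

class Form:
    def __init__(self):
        self.elements = []

class Element:
    def __init__(self, form):
        self._form = weakref.ref(form)   # weak: does not keep the Form alive
        form.elements.append(self)

    @property
    def form(self):
        return self._form()   # the Form, or None once it has been freed

frm = Form()
elem = Element(frm)
same_form = elem.form is frm
del frm                        # last strong reference to the Form
form_gone = elem.form is None  # on CPython, freed by refcounting alone
```

Callers of `elem.form` then have to handle the `None` case, which is the price of not owning the form.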

--
Greg

Re: Question about garbage collection
On 17/01/24 1:01 am, Frank Millman wrote:
> I sometimes need to keep a reference from a
> transient object to a more permanent structure in my app. To save myself
> the extra step of removing all these references when the transient
> object is deleted, I make them weak references.

I don't see how weak references help here at all. If the transient
object goes away, all references from it to the permanent objects also
go away.

A weak reference would only be of use if the reference went the other
way, i.e. from the permanent object to the transient object.

--
Greg

Re: Question about garbage collection
On 2024-01-17 3:01 AM, Greg Ewing via Python-list wrote:
> On 17/01/24 1:01 am, Frank Millman wrote:
>> I sometimes need to keep a reference from a transient object to a more
>> permanent structure in my app. To save myself the extra step of
>> removing all these references when the transient object is deleted, I
>> make them weak references.
>
> I don't see how weak references help here at all. If the transient
> object goes away, all references from it to the permanent objects also
> go away.
>
> A weak reference would only be of use if the reference went the other
> way, i.e. from the permanent object to the transient object.
>

You are right. I got my description above back-to-front. It is a pub/sub
scenario. A transient object makes a request to the permanent object to
be notified of any changes. The permanent object stores a reference to
the transient object and executes a callback on each change. When the
transient object goes away, the reference must be removed.
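
A minimal sketch of such a pub/sub arrangement, assuming the callbacks are bound methods so that the standard `weakref.WeakMethod` applies. The `Publisher` and `Subscriber` classes and their method names are hypothetical.

```python
import weakref

class Publisher:
    """Long-lived object that notifies subscribers of changes."""
    def __init__(self):
        self._subscribers = []   # list of weakref.WeakMethod

    def subscribe(self, bound_method):
        self._subscribers.append(weakref.WeakMethod(bound_method))

    def notify(self, value):
        live = []
        for ref in self._subscribers:
            callback = ref()          # None if the subscriber has been freed
            if callback is not None:
                callback(value)
                live.append(ref)
        self._subscribers = live      # prune dead entries

class Subscriber:
    """Transient object, e.g. a form field watching a column."""
    def __init__(self, log):
        self.log = log

    def on_change(self, value):
        self.log.append(value)

log = []
pub = Publisher()
sub = Subscriber(log)
pub.subscribe(sub.on_change)
pub.notify(1)
del sub            # the publisher's weak reference does not keep it alive
pub.notify(2)      # nobody left to call; the dead entry is pruned
print(log, len(pub._subscribers))
```

The pruning only happens if the subscriber really is freed; if it is kept alive by mistake elsewhere, it keeps receiving callbacks, which is the failure mode described earlier in the thread.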

Frank


Re: Question about garbage collection
So, here's some info about how to see what's going on with Python's
memory allocation: https://docs.python.org/3/library/tracemalloc.html
I haven't looked into this in a long time, but it used to be the
case that you needed to compile native modules (and probably Python
itself?) so that instrumentation was possible (I think the incref /
decref macros should give you a hint, because they would naturally have
to report some of that info).

Anyways. The problem of tracing memory allocation / deallocation in
Python can be roughly split into these categories:

1. Memory legitimately claimed by objects created through Python
runtime, but not reclaimed due to programmer error. I.e. the
programmer wrote a program that keeps references to objects which it
will never use again.
2. Memory claimed by native objects obtained by interacting with
Python's allocator. When working with the Python C API it's best to go
through Python's allocator for dynamic memory allocation and release.
However, that is somewhat cumbersome, and some module authors might not
know about it, or might not want to use it because they prefer a
different allocator. Sometimes library authors simply don't implement
memory deallocation well, which brings us to:
3. Memory claimed by any user-space code that is associated with the
Python process. This can be for example shared libraries loaded by
means of Python bindings, that is on top of the situation described
above.
4. System memory associated with the process. Some system calls need
to allocate memory on the system side. Typical examples are opening
files, creating sockets etc. Typically, the system will limit the
number of such objects, and the user program will hit the numerical
limit before it hits the memory limit, but it can also happen that
this will manifest as a memory problem (one example I ran into was
trying to run conda-build and it would fail due to enormous amounts of
memory it requested, but the specifics of the failure were due to it
trying to create new sub-processes -- another system resource that
requires memory allocation).

There isn't a universal strategy to cover all these cases. But, if
you have reasons to suspect (4), for example, you'd probably start by
using strace utility (on Linux) to see what system calls are executed.

For something like the (3), you could try to utilize Valgrind (but
it's a lot of work to set it up). It's also possible to use jemalloc
to profile a program, but you would have to build Python with its
allocator modified to use jemalloc (I've seen an issue in the Python
bug tracker where someone wrote a script to do that, so it should be
possible). Both of these are quite labor intensive and not trivial to
set up.

(2) can often be diagnosed with the tracemalloc module, and (1) is
something that Python's gc module can help with.
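
As a rough illustration of the tracemalloc side, two snapshots can be compared to see which source lines account for the growth. The `retained` list below is an artificial stand-in for memory an application holds on to.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

retained = [bytes(1000) for _ in range(1000)]   # simulate ~1 MB retained

after = tracemalloc.take_snapshot()
top = after.compare_to(before, 'lineno')   # differences grouped by source line
for stat in top[:3]:
    print(stat)                            # largest changes first
total_growth = sum(stat.size_diff for stat in top)
tracemalloc.stop()
```

In a long-running server the snapshots would be taken some time apart, for instance before and after a batch of sessions, rather than around a single statement.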

It's always better though to have an actual error and work from there.
Or, at least, have some monitoring data that suggests that your
application memory use increases over time. Otherwise you could be
spending a lot of time chasing problems you don't have.