
Re: [Patches] Reference cycle collection for Python
[I don't like to cross-post to patches and python-dev, but I think
this belongs in patches because it's a followup to Neil's post there
and also in -dev because of its longer-term importance.]

Thanks for the new patches, Neil!

We had a visitor here at CNRI today, Eric Tiedemann
<est@hyperreal.org>, who had a look at your patches before. Eric
knows his way around the Scheme, Lisp and GC literature, and presented
a variant on your approach which takes the bite out of the recursive
passes.

Eric had commented earlier on Neil's previous code, and I had used the
morning to make myself familiar with Neil's code. This was relatively
easy because Neil's code is very clear.

Today, Eric proposed to do away with Neil's hash table altogether --
as long as we're wasting memory, we might as well add 3 fields to each
container object rather than allocating the same amount in a separate
hash table. Eric expects that this will run faster, although this
obviously needs to be tried.

Container types are: dict, list, tuple, class, instance; plus
potentially user-defined container types such as kjbuckets. I have a
feeling that function objects should also be considered container
types, because of the cycle involving globals.

Eric's algorithm, then, consists of the following parts.

Each container object has three new fields: gc_next, gc_prev, and
gc_refs. (Eric calls the gc_refs "refcount-zero".)

We color objects white (initial), gray (root), black (scanned root).
(The terms are explained later; we believe we don't actually need bits
in the objects to store the color; see later.)

All container objects are chained together in a doubly-linked list --
this is the same as Neil's code except Neil does it only for dicts.
(Eric postulates that you need a list header.)

When GC is activated, all objects are colored white; we make a pass
over the entire list and set gc_refs equal to the refcount for each
object.
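
For concreteness, here is a rough sketch in C of the three fields and
the first pass.  The layout, the names and the refs/nrefs traversal
stand-in are all illustrative; they are not taken from Neil's patch or
from Eric's notes.

    /* Toy model of the proposed layout -- illustrative only.  All
     * containers are chained into one doubly-linked list whose header
     * is a dummy node that is never collected. */
    typedef struct obj obj;
    struct obj {
        long  ob_refcnt;     /* the normal Python reference count */
        int   is_container;  /* stand-in for a flag bit in the type */
        obj  *gc_next;       /* the three new fields */
        obj  *gc_prev;
        long  gc_refs;
        obj **refs;          /* objects this container references */
        int   nrefs;
    };

    /* Pass 1: copy each container's refcount into gc_refs. */
    static void
    gc_init_refs(obj *head)
    {
        for (obj *p = head->gc_next; p != head; p = p->gc_next)
            p->gc_refs = p->ob_refcnt;
    }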

Next, we make another pass over the list to collect the internal
references. Internal references are (just like in Neil's version)
references from other container types. In Neil's version, this was
recursive; in Eric's version, we don't need recursion, since the list
already contains all containers. So we simply visit the containers in
the list in turn, and for each one we go over all the objects it
references and subtract one from *its* gc_refs field. (Eric left out
the little detail that we need to be able to distinguish between
container and non-container objects amongst those references; this can
be a flag bit in the type field.)
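
Continuing the same toy sketch, the second pass might look like this;
note that the decrement lands on the *referenced* object's gc_refs,
not on the referencing container's:

    /* Pass 2: every container subtracts one from the gc_refs of each
     * *container* it references.  Afterwards gc_refs counts only the
     * references that come from outside the tracked containers. */
    static void
    gc_subtract_internal_refs(obj *head)
    {
        for (obj *p = head->gc_next; p != head; p = p->gc_next) {
            for (int i = 0; i < p->nrefs; i++) {
                obj *q = p->refs[i];
                if (q->is_container)
                    q->gc_refs--;    /* the referenced object's field */
            }
        }
    }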

Now, similar to Neil's version, all objects for which gc_refs == 0
have only internal references, and are potential garbage; all objects
for which gc_refs > 0 are "roots". These have references to them from
other places, e.g. from globals or stack frames in the Python virtual
machine.

We now start a second list, to which we will move all roots. The way
to do this is to go over the first list again and to move each object
that has gc_refs > 0 to the second list. Objects placed on the second
list in this phase are considered colored gray (roots).

Of course, some roots will reference some non-roots, which keeps those
non-roots alive. We now make a pass over the second list, where for
each object on the second list, we look at every object it references.
If a referenced object is a container and is still in the first list
(colored white) we *append* it to the second list (colored gray).
Because we append, objects thus added to the second list will
eventually be considered by this same pass; when we stop finding
objects that are still white, we stop appending to the second list,
and we will eventually terminate this pass. Conceptually, objects on
the second list that have been scanned in this pass are colored black
(scanned root); but there is no need to actually make the
distinction.

(How do we know whether an object pointed to is white (in the first
list) or gray or black (in the second)? We could use an extra
bitfield, but that's a waste of space. Better: we could set gc_refs
to a magic value (e.g. 0xffffffff) when we move the object to the
second list. During the meeting, I proposed to set the back pointer
to NULL; that might work too but I think the gc_refs field is more
elegant. We could even just test for a non-zero gc_refs field; the
roots moved to the second list initially all have a non-zero gc_refs
field already, and for the objects with a zero gc_refs field we could
indeed set it to something arbitrary.)
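
In the same toy sketch, the root-moving pass and the appending pass
could look roughly like this.  GC_REACHABLE is one possible choice of
magic value, and the sketch assumes gc_refs cannot drop below zero, so
that gc_refs == 0 identifies the objects still in the first list:

    /* Unlink p from its current list and append it to the list rooted
     * at head (standard doubly-linked-list splice). */
    static void
    gc_move_to(obj *p, obj *head)
    {
        p->gc_prev->gc_next = p->gc_next;
        p->gc_next->gc_prev = p->gc_prev;
        p->gc_prev = head->gc_prev;
        p->gc_next = head;
        head->gc_prev->gc_next = p;
        head->gc_prev = p;
    }

    #define GC_REACHABLE 0x7fffffffL   /* illustrative magic value */

    /* Pass 3: move the roots (gc_refs > 0) to the second list. */
    static void
    gc_move_roots(obj *young, obj *reachable)
    {
        obj *p = young->gc_next;
        while (p != young) {
            obj *next = p->gc_next;
            if (p->gc_refs > 0)
                gc_move_to(p, reachable);
            p = next;
        }
    }

    /* Pass 4: scan the second list; every white object (gc_refs == 0)
     * referenced from it is marked and appended, so it is scanned
     * later in this same loop.  The loop ends when no more white
     * objects turn up. */
    static void
    gc_propagate(obj *reachable)
    {
        for (obj *p = reachable->gc_next; p != reachable; p = p->gc_next) {
            for (int i = 0; i < p->nrefs; i++) {
                obj *q = p->refs[i];
                if (q->is_container && q->gc_refs == 0) {
                    q->gc_refs = GC_REACHABLE;   /* no longer white */
                    gc_move_to(q, reachable);
                }
            }
        }
    }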

Once we reach the end of the second list, all objects still left in
the first list are garbage. We can destroy them in a way similar to the
way Neil does this in his code. Neil calls PyDict_Clear on the
dictionaries, and ignores the rest. Under Neil's assumption that all
cycles (that he detects) involve dictionaries, that is sufficient. In
our case, we may need a type-specific "clear" function for containers
in the type object.
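
With such a per-type hook, the final phase could be as simple as the
sketch below; type_clear is a placeholder for something like
PyDict_Clear, and a real implementation has to be careful that
clearing does not free objects the loop still needs to visit:

    /* Final phase: everything still left in the first list is garbage.
     * Clearing the references each container holds breaks the cycles;
     * ordinary reference counting then frees the objects.  (A naive
     * loop like this can deallocate objects it has yet to traverse --
     * see Neil's caveat later in the thread.) */
    extern void type_clear(obj *p);   /* placeholder per-type hook */

    static void
    gc_break_cycles(obj *unreachable)
    {
        for (obj *p = unreachable->gc_next; p != unreachable;
             p = p->gc_next)
            type_clear(p);
    }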

We discussed more things, but not as thoroughly. Eric & Eric stressed
the importance of making excellent statistics available about the rate
of garbage collection -- probably as data structures that Python code
can read rather than debugging print statements. Eric T also sketched
an incremental version of the algorithm, usable for real-time
applications. This involved keeping the gc_refs field ("external"
reference counts) up-to-date at all times, which would require two
different versions of the INCREF/DECREF macros: one for
adding/deleting a reference from a container, and another for
adding/deleting a root reference. Also, a 4th color (red) was added,
to distinguish between scanned roots and scanned non-roots. We
decided not to work this out in more detail because the overhead cost
appeared to be much higher than for the previous algorithm; instead,
we recommend that for real-time requirements the whole GC be disabled
(there should be run-time controls for this, not just compile-time).
We also briefly discussed possibilities for generational schemes.

The general opinion was that we should first implement and test the
algorithm as sketched above, and then changes or extensions could be
made.

I was pleasantly surprised to find Neil's code in my inbox when we
came out of the meeting; I think it would be worthwhile to compare and
contrast the two approaches. (Hm, maybe there's a paper in it?)

The rest of the afternoon was spent discussing continuations,
coroutines and generators, and the fundamental reason why
continuations are so hard (the C stack getting in the way everywhere).
But that's a topic for another mail, maybe.

--Guido van Rossum (home page: http://www.python.org/~guido/)
RE: Re: [Patches] Reference cycle collection for Python
Very briefly:

[Guido]
> ...
> Today, Eric proposed to do away with Neil's hash table altogether --
> as long as we're wasting memory, we might as well add 3 fields to each
> container object rather than allocating the same amount in a separate
> hash table. Eric expects that this will run faster, although this
> obviously needs to be tried.

No, it doesn't <wink>: it will run faster.

> Container types are: dict, list, tuple, class, instance; plus
> potentially user-defined container types such as kjbuckets. I
> have a feeling that function objects should also be considered
> container types, because of the cycle involving globals.

Note that the list-migrating steps you sketch later are basically the same
as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few
years ago, right down to using appending to effect a breadth-first traversal
without requiring recursion -- except M&S doesn't have to bother accounting
for sources of refcounts. Since *this* scheme does more work per item per
scan, to be as fast in the end it has to touch less stuff than M&S. But the
more kinds of types you track, the more stuff this scheme will have to
chase.

The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable
meta-point <wink>: you balked at M&S the last time around because of the
apparent need for two link fields + a bit or two per object of a "chaseable
type". If that's no longer perceived as being a showstopper, M&S should be
reconsidered too.

I happen to be a fan of both approaches <wink>. The worst part of M&S-on-RC
(== the one I never had a good answer for) is that a non-cooperating
extension type E can't be chased, hence objects reachable only from objects
of type E never get marked, so are vulnerable to bogus collection. In the
Neil/Toby scheme, objects of type E merely act as sources of "external"
references, so the scheme fails safe (in the sense of never doing a bogus
collection due to non-cooperating types).

Hmm ... if both approaches converge on keeping a list of all chaseable
objects, and being careful of uncooperating types, maybe the only real
difference in the end is whether the root set is given explicitly (as in
traditional M&S) or inferred indirectly (but where "root set" has a
different meaning in the scheme you sketched).

> ...
> In our case, we may need a type-specific "clear" function for containers
> in the type object.

I think definitely, yes.

full-speed-sideways<wink>-ly y'rs - tim
Re: Re: [Patches] Reference cycle collection for Python
Guido van Rossum wrote:
>
> Thanks for the new patches, Neil!

Thanks from me too!
I notice, however, that hash_resize() still uses a malloc call
instead of PyMem_NEW. Neil, please correct this in your version
immediately ;-)

>
> We had a visitor here at CNRI today, Eric Tiedemann
> <est@hyperreal.org>, who had a look at your patches before. Eric
> knows his way around the Scheme, Lisp and GC literature, and presented
> a variant on your approach which takes the bite out of the recursive
> passes.

Avoiding the recursion is valuable, as long as we're optimizing the
implementation of one particular scheme. It doesn't bother me that
Neil's scheme is recursive, because I still perceive his code as a
proof of concept.

You're presenting here another scheme based on refcount arithmetic,
generalized for all container types. The linked list implementation
of this generalized scheme is not directly related to the logic.

I have some suspicions about the logic, so you'll probably want to elaborate
a bit more on it, and convince me that this scheme would actually work.

> Today, Eric proposed to do away with Neil's hash table altogether --
> as long as we're wasting memory, we might as well add 3 fields to each
> container object rather than allocating the same amount in a separate
> hash table.

I cannot agree so easily with this statement, but you should have expected
this from me :-) If we're about to optimize storage, I have good reasons
to believe that we don't need 3 additional slots per container (but 1 for
gc_refs, yes).

We could certainly envision allocating the containers within memory pools
of 4K (just as it is done in pymalloc, and close to what we have for
ints & floats). These pools would be labeled as "container's memory",
they would obviously be under our control, and we'd have additional slots
per pool, not per object. As long as we isolate the containers from the
rest, we can enumerate them easily by walking through the pools.
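
Just to make the idea concrete, a very rough sketch of such container
pools; the names, the 4K size and the assumption that live objects sit
packed at the front of each pool are all illustrative simplifications:

    #include <stddef.h>

    #define POOL_SIZE 4096

    /* Containers allocated from dedicated pools; the GC bookkeeping
     * lives in the pool header, not in each object. */
    typedef struct pool {
        struct pool *next;       /* chain of container pools */
        size_t       obj_size;   /* fixed object size in this pool */
        size_t       nused;      /* live objects (assumed packed) */
        char         mem[];      /* the objects themselves */
    } pool;

    /* Enumerate every container by walking the pools -- no per-object
     * link fields needed. */
    static void
    for_each_container(pool *pools, void (*fn)(void *op))
    {
        for (pool *p = pools; p != NULL; p = p->next)
            for (size_t i = 0; i < p->nused; i++)
                fn(p->mem + i * p->obj_size);
    }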

But I'm willing to defer this question for now, as it involves the object
allocators (the builtin allocators + PyObject_NEW for extension types E --
user objects of type E would be automatically taken into account for GC
if there's a flag in the type struct which identifies them as containers).

> Eric expects that this will run faster, although this obviously needs
> to be tried.

Definitely, although I trust Eric & Tim :-)

>
> Container types are: dict, list, tuple, class, instance; plus
> potentially user-defined container types such as kjbuckets. I have a
> feeling that function objects should also be considered container
> types, because of the cycle involving globals.

+ other extension container types. And I insist.
Don't forget that we're planning to merge types and classes...

>
> Eric's algorithm, then, consists of the following parts.
>
> Each container object has three new fields: gc_next, gc_prev, and
> gc_refs. (Eric calls the gc_refs "refcount-zero".)
>
> We color objects white (initial), gray (root), black (scanned root).
> (The terms are explained later; we believe we don't actually need bits
> in the objects to store the color; see later.)
>
> All container objects are chained together in a doubly-linked list --
> this is the same as Neil's code except Neil does it only for dicts.
> (Eric postulates that you need a list header.)
>
> When GC is activated, all objects are colored white; we make a pass
> over the entire list and set gc_refs equal to the refcount for each
> object.

Step 1: for all containers, c->gc_refs = c->ob_refcnt

>
> Next, we make another pass over the list to collect the internal
> references. Internal references are (just like in Neil's version)
> references from other container types. In Neil's version, this was
> recursive; in Eric's version, we don't need recursion, since the list
> already contains all containers. So we simple visit the containers in
> the list in turn, and for each one we go over all the objects it
> references and subtract one from *its* gc_refs field. (Eric left out
> the little detail that we need to be able to distinguish between
> container and non-container objects amongst those references; this can
> be a flag bit in the type field.)

Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c

I guess that you realize that after this step, gc_refs can be zero
or negative.

I'm not sure that you collect "internal" references here (references
from other container types). A list referencing 20 containers, being
itself referenced by one container + one static variable + two times
from the runtime stack, has an initial refcount == 4, so we'll end
up with gc_refs == -16.

A tuple referencing 1 list, referenced once by the stack, will end up
with gc_refs == 0.

Neil's scheme doesn't seem to have this "property".

>
> Now, similar to Neil's version, all objects for which gc_refs == 0
> have only internal references, and are potential garbage; all objects
> for which gc_refs > 0 are "roots". These have references to them from
> other places, e.g. from globals or stack frames in the Python virtual
> machine.
>

Agreed, some roots have gc_refs > 0
I'm not sure that all of them have it, though... Do they?

> We now start a second list, to which we will move all roots. The way
> to do this is to go over the first list again and to move each object
> that has gc_refs > 0 to the second list. Objects placed on the second
> list in this phase are considered colored gray (roots).
>

Step 3: Roots with gc_refs > 0 go to the 2nd list.
All c->gc_refs <= 0 stay in the 1st list.

> Of course, some roots will reference some non-roots, which keeps those
> non-roots alive. We now make a pass over the second list, where for
> each object on the second list, we look at every object it references.
> If a referenced object is a container and is still in the first list
> (colored white) we *append* it to the second list (colored gray).
> Because we append, objects thus added to the second list will
> eventually be considered by this same pass; when we stop finding
> objects that are still white, we stop appending to the second list,
> and we will eventually terminate this pass. Conceptually, objects on
> the second list that have been scanned in this pass are colored black
> (scanned root); but there is no need to actually make the
> distinction.
>

Step 4: Closure on reachable containers which are all moved to the 2nd list.

(Assuming that the objects are checked only via their type, without
involving gc_refs)

> (How do we know whether an object pointed to is white (in the first
> list) or gray or black (in the second)?

Good question? :-)

> We could use an extra bitfield, but that's a waste of space.
> Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when
> we move the object to the second list.

I doubt that this would work for the reasons mentioned above.

> During the meeting, I proposed to set the back pointer to NULL; that
> might work too but I think the gc_refs field is more elegant. We could
> even just test for a non-zero gc_refs field; the roots moved to the
> second list initially all have a non-zero gc_refs field already, and
> for the objects with a zero gc_refs field we could indeed set it to
> something arbitrary.)

Not sure that "arbitrary" is a good choice if the differentiation
is based solely on gc_refs.

>
> Once we reach the end of the second list, all objects still left in
> the first list are garbage. We can destroy them in a way similar to the
> way Neil does this in his code. Neil calls PyDict_Clear on the
> dictionaries, and ignores the rest. Under Neil's assumption that all
> cycles (that he detects) involve dictionaries, that is sufficient. In
> our case, we may need a type-specific "clear" function for containers
> in the type object.

Couldn't this be done in the object's dealloc function?

Note that both Neil's scheme and this one assume that garbage _detection_
and garbage _collection_ form one atomic operation. I must say that
I don't mind having some live garbage if it doesn't hurt my work.
IOW, the criterion used for triggering the detection phase _may_ eventually
differ from the one used for the collection phase. But this is where we
reach the incremental approaches, implying different reasoning as a
whole. My point is that the introduction of a "clear" function depends
on the adopted scheme, whose logic depends on pertinent statistics on
memory consumption of the cyclic garbage.

To make it simple, we first need stats on memory consumption, then we
can discuss objectively how to implement some particular GC scheme.
I second Eric on the need for excellent statistics.

>
> The general opinion was that we should first implement and test the
> algorithm as sketched above, and then changes or extensions could be
> made.

I'd like to see it discussed first in conjunction with (1) the possibility
of having a proprietary malloc, (2) the envisioned type/class unification.
Perhaps I'm getting too deep, but once something gets in, it's difficult
to take it out, even when a better solution is found subsequently. Although
I'm enthusiastic about this work on GC, I'm not in a position to evaluate
the true benefits of the proposed schemes, as I still don't have a basis
for evaluating how much garbage my program generates and whether it hurts
the interpreter compared to its overall memory consumption.

>
> I was pleasantly surprised to find Neil's code in my inbox when we
> came out of the meeting; I think it would be worthwhile to compare and
> contrast the two approaches. (Hm, maybe there's a paper in it?)

I'm all for it!

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: Re: [Patches] Reference cycle collection for Python
>>>>> "VM" == Vladimir Marangozov <marangoz@python.inrialpes.fr> writes:

[">>" == Guido explaining Eric Tiedemann's GC design]
>> Next, we make another pass over the list to collect the internal
>> references. Internal references are (just like in Neil's
>> version) references from other container types. In Neil's
>> version, this was recursive; in Eric's version, we don't need
>> recursion, since the list already contains all containers. So we
>> simply visit the containers in the list in turn, and for each one
>> we go over all the objects it references and subtract one from
>> *its* gc_refs field. (Eric left out the little detail that we
>> need to be able to distinguish between container and
>> non-container objects amongst those references; this can be a
>> flag bit in the type field.)

VM> Step 2: c->gc_refs = c->gc_refs -
VM> Nb_referenced_containers_from_c

VM> I guess that you realize that after this step, gc_refs can be
VM> zero or negative.

I think Guido's explanation is slightly ambiguous. When he says,
"subtract one from *its* gc_refs field," he means subtract one from the
_contained_ object's gc_refs field.

VM> I'm not sure that you collect "internal" references here
VM> (references from other container types). A list referencing 20
VM> containers, being itself referenced by one container + one
VM> static variable + two times from the runtime stack, has an
VM> initial refcount == 4, so we'll end up with gc_refs == -16.

The strategy is not that the container's gc_refs is decremented once
for each object it contains. Rather, the container decrements each
contained object's gc_refs by one. So you should never end up with
gc_refs < 0.
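
In terms of Vladimir's example: the list starts with gc_refs == 4 (a
copy of its refcount); during the second pass the one container that
references the list subtracts 1 from the list's gc_refs, leaving 3,
while the list's own traversal subtracts 1 from each of the 20
contained objects instead. So gc_refs never goes negative.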

>> During the meeting, I proposed to set the back pointer to NULL;
>> that might work too but I think the gc_refs field is more
>> elegant. We could even just test for a non-zero gc_refs field;
>> the roots moved to the second list initially all have a non-zero
>> gc_refs field already, and for the objects with a zero gc_refs
>> field we could indeed set it to something arbitrary.)

I believe we discussed this further and concluded that setting the
back pointer to NULL would not work. If we make the second list
doubly-linked (like the first one), it is trivial to end GC by
swapping the first and second lists. If we've zapped the back
pointers to NULL, then we have to go back and re-set them all.
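
A minimal sketch of why this is cheap with the toy doubly-linked
lists used earlier: with dummy header nodes, finishing the collection
can amount to exchanging the two headers, with no per-object pointers
to restore:

    /* Finish a collection by exchanging which header the collector
     * treats as the main container list and which as scratch. */
    static void
    gc_swap_lists(obj **all_containers, obj **scratch)
    {
        obj *tmp = *all_containers;
        *all_containers = *scratch;
        *scratch = tmp;
    }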

Jeremy
Re: Re: [Patches] Reference cycle collection for Python
On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote:
> Guido van Rossum wrote:
> > Once we reach the end of the second list, all objects still left in
> > the first list are garbage. We can destroy them in a similar to the
> > way Neil does this in his code. Neil calls PyDict_Clear on the
> > dictionaries, and ignores the rest. Under Neils assumption that all
> > cycles (that he detects) involve dictionaries, that is sufficient. In
> > our case, we may need a type-specific "clear" function for containers
> > in the type object.
>
> Couldn't this be done in the object's dealloc function?

No, I don't think so. The object still has references to it.
You have to be careful about how you break cycles so that memory
is not accessed after it is freed.


Neil

--
"If elected mayor, my first act will be to kill the whole lot of you, and
burn your town to cinders!" -- Groundskeeper Willie