Mailing List Archive

Design question: call __del__ for cyclical garbage?
We now have two implementations of Eric Tiedemann's idea: Neil and I
both implemented it. It's too soon to post the patch sets (both are
pretty rough) but I've got another design question.

Once we've identified a bunch of objects that are only referring to
each other (i.e., one or more cycles) we have to dispose of them.

The question is, how? We can't just call free on each of the objects;
some may not be allocated with malloc, and some may contain pointers
to other malloc'ed memory that also needs to be freed.

So we have to get their destructors involved. But how? Calling
ob->ob_type->tp_dealloc(ob) for an object who reference count is
unsafe -- this will destroy the object while there are still
references to it! Those references are all coming from other objects
that are part of the same cycle; those objects will also be
deallocated and they will reference the deallocated objects (if only
to DECREF them).

Neil uses the same solution that I use when finalizing the Python
interpreter -- find the dictionaries and call PyDict_Clear() on them.
(In his unpublished patch, he also clears the lists using
PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized
so that *every* object can define a tp_clear function in its type
object.)

As long as every cycle contains at least one dictionary or list
object, this will break cycles reliably and get rid of all the
garbage. (If you wonder why: clearing the dict DECREFs the next
object(s) in the cycle; if the last dict referencing a particular
object is cleared, the last DECREF will deallocate that object, which
will in turn DECREF the objects it references, and so forth. Since
none of the objects in the cycle has incoming references from outside
the cycle, we can prove that this will delete all objects as long as
there's a dict or list in each cycle.

However, there's a snag. It's the same snag as what finalizing the
Python interpreter runs into -- it has to do with __del__ methods and
the undefined order in which the dictionaries are cleared.

For example, it's quite possible that the first dictionary we clear is
the __dict__ of an instance, so this zaps all its instance variables.
Suppose this breaks the cycle, so then the instance itself gets
DECREFed to zero. Its deallocator will be called. If it's got a
__del__, this __del__ will be called -- but all the instance variables
have already been zapped, so it will fail miserably!

It's also possible that the __dict__ of a class involved in a cycle
gets cleared first, in which case the __del__ no longer "exists", and
again the cleanup is skipped.

So the question is: What to *do*?

My solution is to make an extra pass over all the garbage objects
*before* we clear dicts and lists, and for those that are instances
and have __del__ methods, call their __del__ ("by magic", as Tim calls
it in another post). The code in instance_dealloc() already does the
right thing here: it calls __del__, then discovers that the reference
count is > 0 ("I'm not dead yet" :-), and returns without freeing the
object. (This is also why I want to introduce a flag ensuring that
__del__ gets called by instance_dealloc at most once: later when the
instance gets DECREFed to 0, instance_dealloc is called again and will
correctly free the object; but we don't want __del__ called again.)
[.Note for Neil: somehow I forgot to add this logic to the code;
in_del_called isn't used! The change is obvious though.]

This still leaves a problem for the user: if two class instances
reference each other and both have a __del__, we can't predict whose
__del__ is called first when they are called as part of cycle
collection. The solution is to write each __del__ so that it doesn't
depend on the other __del__.

Someone (Tim?) in the past suggested a different solution (probably
found in another language): for objects that are collected as part of
a cycle, the destructor isn't called at all. The memory is freed
(since it's no longer reachable), but the destructor is not called --
it is as if the object lives on forever.

This is theoretically superior, but not practical: when I have an
object that creates a temp file, I want to be able to reliably delete
the temp file in my destructor, even when I'm part of a cycle!

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: Design question: call __del__ for cyclical garbage? [ In reply to ]
The __init__ rule for calling __del__ has me confused. Is this per-class or
per-object?

I.e. what will happen in the following case:

class Purse:
def __init__(self):
self.balance = WithdrawCashFromBank(1000)

def __del__(self):
PutCashBackOnBank(self.balance)
self.balance = 0

class LossyPurse(Purse):
def __init__(self):
Purse.__init__(self)
raise 'kaboo! kaboo!'

If the new scheme means that the __del__ method of Purse isn't called I think
I don't like it. In the current scheme I can always program defensively:
def __del__(self):
try:
b = self.balance
self.balance = 0
except AttributeError:
pass
else:
PutCashBackOnBank(b)
but in a new scheme with a per-object "__del__ must be called" flag I can't...
--
Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++
Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++
www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm
Re: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Guido about ways to cleanup cyclic garbage]

FYI, I'm using a special protocol for disposing of cyclic
garbage: the __cleanup__ protocol. The purpose of this call
is probably similar to Neil's tp_clear: it is intended to
let objects break possible cycles in their own storage scope,
e.g. instances can delete instance variables which they
know can cause cyclic garbage.

The idea is simple: give all power to the objects rather
than try to solve everything with one magical master plan.

The mxProxy package has details on the protocol. The __cleanup__
method is called by the Proxy when the Proxy is about to be deleted.
If all references to an object go through the Proxy, the
__cleanup__ method call can easily break cycles to have the
refcount reach zero in which case __del__ is called. Since the
object knows about this scheme it can take precautions to
make sure that __del__ still works after __cleanup__ was
called.

Anyway, just a thought... there are probably many ways to do
all this.

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Guido]
> ...
> Someone (Tim?) in the past suggested a different solution (probably
> found in another language): for objects that are collected as part of
> a cycle, the destructor isn't called at all. The memory is freed
> (since it's no longer reachable), but the destructor is not called --
> it is as if the object lives on forever.

Stroustrup has written in favor of this for C++. It's exactly the kind of
overly slick "good argument" he would never accept from anyone else <0.1

> This is theoretically superior, but not practical: when I have an
> object that creates a temp file, I want to be able to reliably delete
> the temp file in my destructor, even when I'm part of a cycle!

A member of the C++ committee assured me Stroustrup is overwhelmingly
opposed on this. I don't even agree it's theoretically superior: it relies
on the fiction that gc "may never occur", and that's just silly in practice.

You're moving down the Java path. I can't possibly do a better job of
explaining the Java rules than the Java Language Spec. does for itself. So
pick that up and study section 12.6 (Finalization of Class Instances). The
end result makes little sense to users, but is sufficient to guarantee that
Java itself never blows up.

Note, though, that there is NO good answer to finalizers in cycles! The
implementation cannot be made smart enough to both avoid trouble and "do the
right thing" from the programmer's POV, because the latter is unknowable.
Somebody has to lose, one way or another.

Rather than risk doing a wrong thing, the BDW collector lets cycles with
finalizers leak. But it also has optional hacks to support exceptions for
use with C++ (which sometimes creates self-cycles) and Java. See

http://reality.sgi.com/boehm_mti/finalization.html

for Boehm's best concentrated <wink> thoughts on the subject.

The only principled approach I know of comes out of the Scheme world.
Scheme has no finalizers, of course. But it does have gc, and the concept
of "guardians" was invented to address all gc finalization problems in one
stroke. It's extremely Scheme-like in providing a perfectly general
mechanism with no policy whatsoever. You (the Scheme programmer) can create
guardian objects, and "register" other objects with a guardian. At any
time, you can ask a guardian whether some object registered with it is
"ready to die" (i.e., the only thing keeping it alive is its registration
with the guardian). If so, you can ask it to give you one. Everything else
is up to you: if you want to run a finalizer, your problem. If there are
cycles, also your problem. Even if there are simple non-cyclic
dependencies, your problem. Etc.

So those are the extremes: BDW avoids blame by refusing to do anything.
Java avoids blame by exposing an impossibly baroque implementation-driven
finalization model. Scheme avoids blame by refusing to do anything "by
magic", but helps you to shoot yourself with the weapon of your choice.

That bad news is that I don't know of a scheme *not* at an extreme!

It's extremely un-Pythonic to let things leak (despite that it has let
things leak for a decade <wink>), but also extremely un-Pythonic to make
some wild-ass guess.

So here's what I'd consider doing: explicit is better than implicit, and in
the face of ambiguity refuse the temptation to guess. If a trash cycle
contains a finalizer (my, but that has to be rare. in practice, in
well-designed code!), don't guess, but make it available to the user. A
gc.guardian() call could expose such beasts, or perhaps a callback could be
registered, invoked when gc finds one of these things. Anyone crazy enough
to create cyclic trash with finalizers then has to take responsibility for
breaking the cycle themself. This puts the burden on the person creating
the problem, and they can solve it in the way most appropriate to *their*
specific needs. IOW, the only people who lose under this scheme are the
ones begging to lose, and their "loss" consists of taking responsibility.

when-a-problem-is-impossible-to-solve-favor-sanity<wink>-ly y'rs - tim
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
On Fri, 3 Mar 2000, Tim Peters wrote:
>...
> Note, though, that there is NO good answer to finalizers in cycles! The

"Note" ?? Not just a note, but I'd say an axiom :-)

By definition, you have two objects referring to each other in some way.
How can you *definitely* know how to break the link between them? Do you
call A's finalizer or B's first? If they're instances, do you just whack
their __dict__ and hope for the best?

>...
> So here's what I'd consider doing: explicit is better than implicit, and in
> the face of ambiguity refuse the temptation to guess. If a trash cycle
> contains a finalizer (my, but that has to be rare. in practice, in
> well-designed code!), don't guess, but make it available to the user. A
> gc.guardian() call could expose such beasts, or perhaps a callback could be
> registered, invoked when gc finds one of these things. Anyone crazy enough
> to create cyclic trash with finalizers then has to take responsibility for
> breaking the cycle themself. This puts the burden on the person creating
> the problem, and they can solve it in the way most appropriate to *their*
> specific needs. IOW, the only people who lose under this scheme are the
> ones begging to lose, and their "loss" consists of taking responsibility.

I'm not sure if Tim is saying the same thing, but I'll write down a
concreate idea for cleaning garbage cycles.

First, a couple observations:

* Some objects can always be reliably "cleaned": lists, dicts, tuples.
They just drop their contents, with no invocations against any of them.

Note that an instance without a __del__ has no opinion on how it is
cleaned.
(this is related to Tim's point about whether a cycle has a finalizer)

* The other objects may need to *use* their referenced objects in some way
to clean out cycles.

Since the second set of objects (possibly) need more care during their
cleanup, we must concentrate on how to solve their problem.

Back up a step: to determine where an object falls, let's define a
tp_clean type slot. It returns an integer and takes one parameter: an
operation integer.

Py_TPCLEAN_CARE_CHECK /* check whether care is needed */
Py_TPCLEAN_CARE_EXEC /* perform the careful cleaning */
Py_TPCLEAN_EXEC /* perform a non-careful cleaning */

Given a set of objects that require special cleaning mechanisms, there is
no way to tell where to start first. So... just pick the first one. Call
its tp_clean type slot with CARE_EXEC. For instances, this maps to
__clean__. If the instance does not have a __clean__, then tp_clean
returns FALSE meaning that it could not clean this object. The algorithm
moves on to the next object in the set.

If tp_clean returns TRUE, then the object has been "cleaned" and is moved
to the "no special care needed" list of objects, awaiting its reference
count to hit zero.

Note that objects in the "care" and "no care" lists may disappear during
the careful-cleaning process.

If the careful-cleaning algorithm hits the end of the careful set of
objects and the set is non-empty, then throw an exception:
GCImpossibleError. The objects in this set each said they could not be
cleaned carefully AND they were not dealloc'd during other objects'
cleaning.

[. it could be possible to define a *dynamic* CARE_EXEC that will succeed
if you call it during a second pass; I'm not sure this is a Good Thing
to allow, however. ]

This also implies that a developer should almost *always* consider writing
a __clean__ method whenever they write a __del__ method. That method MAY
be called when cycles need to be broken; the object should delete any
non-essential variables in such a way that integrity is retained (e.g. it
fails gracefully when methods are called and __del__ won't raise an
error). For example, __clean__ could call a self.close() to shut down its
operation. Whatever... you get the idea.

At the end of the iteration of the "care" set, then you may have objects
remaining in the "no care" set. By definition, these objects don't care
about their internal references to other objects (they don't need them
during deallocation). We iterate over this set, calling tp_clean(EXEC).
For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out
the references to other objects (but does not dealloc the object!). Again:
objects in the "no care" set will go away during this process. By the end
of the iteration over the "no care" set, it should be empty.

[. note: the iterations over these sets should probably INCREF/DECREF
across the calls; otherwise, the object could be dealloc'd during the
tp_clean call. ]

[. if the set is NOT empty, then tp_clean(EXEC) did not remove all possible
references to other objects; not sure what this means. is it an error?
maybe you just force a tp_dealloc on the remaining objects. ]

Note that the tp_clean mechanism could probably be used during the Python
finalization, where Python does a bunch of special-casing to clean up
modules. Specifically: a module does not care about its contents during
its deallocation, so it is a "no care" object; it responds to
tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they
can clear their dict (which contains a module reference which usually
causes a loop) during tp_clean(EXEC). Module cleanup is easy once objects
with CARE_CHECK have been handled -- all that funny logic in there is to
deal with "care" objects.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Tim]
> Note, though, that there is NO good answer to finalizers in cycles! The

[Greg Stein]
> "Note" ?? Not just a note, but I'd say an axiom :-)

An axiom is accepted without proof: we have plenty of proof that there's no
thoroughly good answer (i.e., every language that has ever addressed this
issue -- along with every language that ever will <wink>).

> By definition, you have two objects referring to each other in some way.
> How can you *definitely* know how to break the link between them? Do you
> call A's finalizer or B's first? If they're instances, do you just whack
> their __dict__ and hope for the best?

Exactly. The *programmer* may know the right thing to do, but the Python
implementation can't possibly know. Facing both facts squarely constrains
the possibilities to the only ones that are all of understandable,
predictable and useful. Cycles with finalizers must be a Magic-Free Zone
else you lose at least one of those three: even Guido's kung fu isn't
strong enough to outguess this.

[.a nice implementation sketch, of what seems an overly elaborate scheme,
if you believe cycles with finalizers are rare in intelligently designed
code)
]

Provided Guido stays interested in this, he'll make his own fun. I'm just
inviting him to move in a sane direction <0.9 wink>.

One caution:

> ...
> If the careful-cleaning algorithm hits the end of the careful set of
> objects and the set is non-empty, then throw an exception:
> GCImpossibleError.

Since gc "can happen at any time", this is very severe (c.f. Guido's
objection to making resurrection illegal). Hand a trash cycle back to the
programmer instead, via callback or request or whatever, and it's all
explicit without more cruft in the implementation. It's alive again when
they get it back, and they can do anything they want with it (including
resurrecting it, or dropping it again, or breaking cycles -- anything). I'd
focus on the cycles themselves, not on the types of objects involved. I'm
not pretending to address the "order of finalization at shutdown" question,
though (although I'd agree they're deeply related: how do you follow a
topological sort when there *isn't* one? well, you don't, because you
can't).

realistically y'rs - tim
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
On Fri, 3 Mar 2000, Tim Peters wrote:
>...
> [.a nice implementation sketch, of what seems an overly elaborate scheme,
> if you believe cycles with finalizers are rare in intelligently designed
> code)
> ]

Nah. Quite simple to code up, but a bit longer to explain in English :-)

The hardest part is finding the cycles, but Guido already posted a long
explanation about that. Once that spits out the doubly-linked list of
objects, then you're set.

1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed"
objects to a second list
2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is
returned, then the object was cleaned and moves to the "no care" list.
3) assert len(care-needed list) == 0
4) scan the no-care list calling tp_clean(EXEC)
5) (questionable) assert len(no-care list) == 0

The background makes it longer. The short description of the algorithm is
easy. Step (1) could probably be merged right into one of the scans in the
GC algorithm (e.g. during the placement into the "these are cyclical
garbage" list)

> Provided Guido stays interested in this, he'll make his own fun. I'm just
> inviting him to move in a sane direction <0.9 wink>.

hehe... Agreed.

> One caution:
>
> > ...
> > If the careful-cleaning algorithm hits the end of the careful set of
> > objects and the set is non-empty, then throw an exception:
> > GCImpossibleError.
>
> Since gc "can happen at any time", this is very severe (c.f. Guido's
> objection to making resurrection illegal).

GCImpossibleError would simply be a subclass of MemoryError. Makes sense
to me, and definitely allows for its "spontaneity."

> Hand a trash cycle back to the
> programmer instead, via callback or request or whatever, and it's all
> explicit without more cruft in the implementation. It's alive again when
> they get it back, and they can do anything they want with it (including
> resurrecting it, or dropping it again, or breaking cycles -- anything). I'd
> focus on the cycles themselves, not on the types of objects involved. I'm
> not pretending to address the "order of finalization at shutdown" question,
> though (although I'd agree they're deeply related: how do you follow a
> topological sort when there *isn't* one? well, you don't, because you
> can't).

I disagree. I don't think a Python-level function is going to have a very
good idea of what to do. IMO, this kind of semantics belong down in the
interpreter with a specific, documented algorithm. Throwing it out to
Python won't help -- that function will still have to use a "standard
pattern" for getting the cyclical objects to toss themselves. I think that
standard pattern should be a language definition. Without a standard
pattern, then you're saying the application will know what to do, but that
is kind of weird -- what happens when an unexpected cycle arrives?

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Tim Peters]
> ...If a trash cycle
> contains a finalizer (my, but that has to be rare. in practice, in
> well-designed code!),

This shows something Tim himself has often said -- he never programmed a
GUI. It's very hard to build a GUI (especially with Tkinter) which is
cycle-less, but the classes implementing the GUI often have __del__'s
to break system-allocated resources.

So, it's not as rare as we would like to believe, which is the reason
I haven't given this answer.

which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z.
--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
On Sat, 4 Mar 2000, Greg Stein wrote:

> I disagree. I don't think a Python-level function is going to have a very
> good idea of what to do
<snip>

Much better then the Python interpreter...

<snip>
> Throwing it out to Python won't help
<snip>
> what happens when an unexpected cycle arrives?

Don't delete it.
It's as simple as that, since it's a bug.

--
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
On Sat, 4 Mar 2000, Moshe Zadka wrote:
> On Sat, 4 Mar 2000, Greg Stein wrote:
> > I disagree. I don't think a Python-level function is going to have a very
> > good idea of what to do
> <snip>
>
> Much better then the Python interpreter...

If your function receives two instances (A and B), what are you going to
do? How can you know what their policy is for cleaning up in the face of a
cycle?

I maintain that you would call the equivalent of my proposed __clean__.
There isn't much else you'd be able to do, unless you had a completely
closed system, you expected cycles between specific types of objects, and
you knew a way to clean them up. Even then, you would still be calling
something like __clean__ to let the objects do whatever they needed.

I'm suggesting that __clean__ should be formalized (as part of tp_clean).
Throwing the handling "up to Python" isn't going to do much for you.

Seriously... I'm all for coding more stuff in Python rather than C, but
this just doesn't feel right. Getting the objects GC'd is a language
feature, and a specific pattern/method/recommendation is best formulated
as an interpreter mechanism.

> <snip>
> > Throwing it out to Python won't help
> <snip>
> > what happens when an unexpected cycle arrives?
>
> Don't delete it.
> It's as simple as that, since it's a bug.

The point behind this stuff is to get rid of it, rather than let it linger
on. If the objects have finalizers (which is how we get to this step!),
then it typically means there is a resource they must release. Getting the
object cleaned and dealloc'd becomes quite important.

Cheers,
-g

p.s. did you send in a patch for the instance_contains() thing yet?

--
Greg Stein, http://www.lyra.org/
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Tim sez "toss insane cycles back on the user"]

[Greg Stein]
> I disagree. I don't think a Python-level function is going to have a very
> good idea of what to do.

You've already assumed that Python coders know exactly what to do, else they
couldn't have coded the new __clean__ method your proposal relies on. I'm
taking what strikes me as the best part of Scheme's Guardian idea: don't
assume *anything* about what users "should" do to clean up their trash.
Leave it up to them: their problem, their solution. I think finalizers in
trash cycles should be so rare in well-written code that it's just not worth
adding much of anything in the implementation to cater to it.

> IMO, this kind of semantics belong down in the interpreter with a
> specific, documented algorithm. Throwing it out to Python won't help
> -- that function will still have to use a "standard pattern" for getting
> the cyclical objects to toss themselves.

They can use any pattern they want, and if the pattern doesn't *need* to be
coded in C as part of the implementation, it shouldn't be.

> I think that standard pattern should be a language definition.

I distrust our ability to foresee everything users may need over the next 10
years: how can we know today that the first std pattern you dreamed up off
the top of your head is the best approach to an unbounded number of problems
we haven't yet seen a one of <wink>?

> Without a standard pattern, then you're saying the application will know
> what to do, but that is kind of weird -- what happens when an unexpected
> cycle arrives?

With the hypothetical gc.get_cycle() function I mentioned before, they
should inspect objects in the list they get back, and if they find they
don't know what to do with them, they can still do anything <wink> they
want. Examples include raising an exception, dialing my home pager at 3am
to insist I come in to look at it, or simply let the list go away (at which
point the objects in the list will again become a trash cycle containing a
finalizer).

If several distinct third-party modules get into this act, I *can* see where
it could become a mess. That's why Scheme "guardians" is plural: a given
module could register its "problem objects" in advance with a specific
guardian of its own, and query only that guardian later for things ready to
die. This probably can't be implemented in Python, though, without support
for weak references (or lots of brittle assumptions about specific refcount
values).

agreeably-disagreeing-ly y'rs - tim
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Tim]
> ...If a trash cycle contains a finalizer (my, but that has to be rare.
> in practice, in well-designed code!),

[Moshe Zadka]
> This shows something Tim himself has often said -- he never programmed a
> GUI. It's very hard to build a GUI (especially with Tkinter) which is
> cycle-less, but the classes implementing the GUI often have __del__'s
> to break system-allocated resources.
>
> So, it's not as rare as we would like to believe, which is the reason
> I haven't given this answer.

I wrote Cyclops.py when trying to track down leaks in IDLE. The
extraordinary thing we discovered is that "even real gc" would not have
reclaimed the cycles. They were legitimately reachable, because, indeed,
"everything points to everything else". Guido fixed almost all of them by
explicitly calling new "close" methods. I believe IDLE has no __del__
methods at all now. Tkinter.py currently contains two.

so-they-contained-__del__-but-weren't-trash-ly y'rs - tim
Re: Design question: call __del__ for cyclical garbage? [ In reply to ]
I'm beginning to believe that handing cycles with finalizers to the
user is better than calling __del__ with a different meaning, and I
tentatively withdraw my proposal to change the rules for when __del__
is called (even when __init__ fails; I haven't had any complaints
about that either).

There seem to be two competing suggestions for solutions: (1) call
some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the
object; (2) Tim's proposal of an interface to ask the garbage
collector for a trash cycle with a finalizer (or for an object with a
finalizer in a trash cycle?).

Somehow Tim's version looks less helpful to me, because it *seems*
that whoever gets to handle the cycle (the main code of the program?)
isn't necessarily responsible for creating it (some library you didn't
even know was used under the covers of some other library you called).

Of course, it's also posssible that a trash cycle is created by code
outside the responsibility of the finalizer.

But still, I have a hard time understanding how Tim's version would be
used. Greg or Marc-Andre's version I understand.

What keeps nagging me though is what to do when there's a finalizer
but no cleanup method. I guess the trash cycle remains alive. Is
this acceptable? (I guess so, because we've given the programmer a
way to resolve the trash: provide a cleanup method.)

If we detect individual cycles (the current algorithm doesn't do that
yet, though it seems easy enough to do another scan), could we
special-case cycles with only one finalizer and no cleaner-upper?
(I'm tempted to call the finalizer because it seems little harm can be
done -- but then of course there's the problem of the finalizer being
called again when the refcount really goes to zero. :-( )

> Exactly. The *programmer* may know the right thing to do, but the Python
> implementation can't possibly know. Facing both facts squarely constrains
> the possibilities to the only ones that are all of understandable,
> predictable and useful. Cycles with finalizers must be a Magic-Free Zone
> else you lose at least one of those three: even Guido's kung fu isn't
> strong enough to outguess this.
>
> [.a nice implementation sketch, of what seems an overly elaborate scheme,
> if you believe cycles with finalizers are rare in intelligently designed
> code)
> ]
>
> Provided Guido stays interested in this, he'll make his own fun. I'm just
> inviting him to move in a sane direction <0.9 wink>.

My current tendency is to go with the basic __cleanup__ and nothing
more, calling each instance's __cleanup__ before clobbering
directories and lists -- which should break all cycles safely.

> One caution:
>
> > ...
> > If the careful-cleaning algorithm hits the end of the careful set of
> > objects and the set is non-empty, then throw an exception:
> > GCImpossibleError.
>
> Since gc "can happen at any time", this is very severe (c.f. Guido's
> objection to making resurrection illegal).

Not quite. Cycle detection is presumably only called every once in a
while on memory allocation, and memory *allocation* (as opposed to
deallocation) is allowed to fail. Of course, this will probably run
into various coding bugs where allocation failure isn't dealt with
properly, because in practice this happens so rarely...

> Hand a trash cycle back to the
> programmer instead, via callback or request or whatever, and it's all
> explicit without more cruft in the implementation. It's alive again when
> they get it back, and they can do anything they want with it (including
> resurrecting it, or dropping it again, or breaking cycles --
> anything).

That was the idea with calling the finalizer too: it would be called
between INCREF/DECREF, so the object would be considered alive for the
duration of the finalizer call.

Here's another way of looking at my error: for dicts and lists, I
would call a special *clear* function; but for instances, I would call
*dealloc*, however intending it to perform a *clear*.

I wish we didn't have to special-case finalizers on class instances
(since each dealloc function is potentially a combination of a
finalizer and a deallocation routine), but the truth is that they
*are* special -- __del__ has no responsibility for deallocating
memory, only for deallocating external resources (such as temp files).

And even if we introduced a tp_clean protocol that would clear dicts
and lists and call __cleanup__ for instances, we'd still want to call
it first for instances, because an instance depends on its __dict__
for its __cleanup__ to succeed (but the __dict__ doesn't depend on the
instance for its cleanup). Greg's 3-phase tp_clean protocol seems
indeed overly elaborate but I guess it deals with such dependencies in
the most general fashion.

> I'd focus on the cycles themselves, not on the types of objects
> involved. I'm not pretending to address the "order of finalization
> at shutdown" question, though (although I'd agree they're deeply
> related: how do you follow a topological sort when there *isn't*
> one? well, you don't, because you can't).

In theory, you just delete the last root (a C global pointing to
sys.modules) and you run the garbage collector. It might be more
complicated in practiceto track down all roots. Another practical
consideration is that now there are cycles of the form

<function object> <=> <module dict>

which suggests that we should make function objects traceable. Also,
modules can cross-reference, so module objects should be made
traceable. I don't think that this will grow the sets of traced
objects by too much (since the dicts involved are already traced, and
a typical program has way fewer functions and modules than it has
class instances). On the other hand, we may also have to trace
(un)bound method objects, and these may be tricky because they are
allocated and deallocated at high rates (once per typical method
call).

Back to the drawing board...

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: Design question: call __del__ for cyclical garbage? [ In reply to ]
Guido> What keeps nagging me though is what to do when there's a
Guido> finalizer but no cleanup method. I guess the trash cycle remains
Guido> alive. Is this acceptable? (I guess so, because we've given the
Guido> programmer a way to resolve the trash: provide a cleanup method.)

That assumes the programmer even knows there's a cycle, right? I'd like to
see this scheme help provide debugging assistance. If a cycle is discovered
but the programmer hasn't declared a cleanup method for the object it wants
to cleanup, a default cleanup method is called if it exists
(e.g. sys.default_cleanup), which would serve mostly as an alert (print
magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of
death, ...) as opposed to actually breaking any cycles. Presumably the
programmer would define sys.default_cleanup during development and leave it
undefined during production.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
Re: Design question: call __del__ for cyclical garbage? [ In reply to ]
On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote:
> So here's what I'd consider doing: explicit is better than implicit, and in
> the face of ambiguity refuse the temptation to guess.

I like Marc's suggestion. Here is my proposal:

Allow classes to have a new method, __cleanup__ or whatever you
want to call it. When tp_clear is called for an instance, it
checks for this method. If it exists, call it, otherwise delete
the container objects from the instance's dictionary. When
collecting cycles, call tp_clear for instances first.

Its simple and allows the programmer to cleanly break cycles if
they insist on creating them and using __del__ methods.


Neil
RE: Design question: call __del__ for cyclical garbage? [ In reply to ]
[Guido]
> I'm beginning to believe that handing cycles with finalizers to the
> user is better than calling __del__ with a different meaning,

You won't be sorry: Python has the chance to be the first language that's
both useful and sane here!

> and I tentatively withdraw my proposal to change the rules for when
> __del__is called (even when __init__ fails; I haven't had any complaints
> about that either).

Well, everyone liked the parenthetical half of that proposal, although
Jack's example did point out a real surprise with it.

> There seem to be two competing suggestions for solutions: (1) call
> some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the
> object; (2) Tim's proposal of an interface to ask the garbage
> collector for a trash cycle with a finalizer (or for an object with a
> finalizer in a trash cycle?).

Or a maximal strongly-connected component, or *something* -- unsure.

> Somehow Tim's version looks less helpful to me, because it *seems*
> that whoever gets to handle the cycle (the main code of the program?)
> isn't necessarily responsible for creating it (some library you didn't
> even know was used under the covers of some other library you called).

Yes, to me too. This is the Scheme "guardian" idea in a crippled form
(Scheme supports as many distinct guardians as the programmer cares to
create), and even in its full-blown form it supplies "a perfectly general
mechanism with no policy whatsoever".

Greg convinced me (although I haven't admitted this yet <wink>) that "no
policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't
be pushing the guardian idea any more (although see immediately below for an
immediate backstep on that <wink>).

> ...
> What keeps nagging me though is what to do when there's a finalizer
> but no cleanup method. I guess the trash cycle remains alive. Is
> this acceptable? (I guess so, because we've given the programmer a
> way to resolve the trash: provide a cleanup method.)

BDW considers it better to leak than to risk doing a wrong thing, and I
agree wholeheartedly with that. GC is one place you want to have a "100%
language".

This is where something like a guardian can remain useful: while leaking is
OK because you've given them an easy & principled alternative, leaking
without giving them a clear way to *know* about it is not OK. If gc pushes
the leaked stuff off to the side, the gc module should (say) supply an entry
point that returns all the leaked stuff in a list. Then users can *know*
they're leaking, know how badly they're leaking, and examine exactly the
objects that are leaking. Then they've got the info they need to repair
their program (or at least track down the 3rd-party module that's leaking).
As with a guardian, they *could* also build a reclamation scheme on top of
it, but that would no longer be the main (or even an encouraged) thrust.

> If we detect individual cycles (the current algorithm doesn't do that
> yet, though it seems easy enough to do another scan), could we
> special-case cycles with only one finalizer and no cleaner-upper?
> (I'm tempted to call the finalizer because it seems little harm can be
> done -- but then of course there's the problem of the finalizer being
> called again when the refcount really goes to zero. :-( )

"Better safe than sorry" is my immediate view on this -- you can't know that
the finalizer won't resurrect the cycle, and "finalizer called iff refcount
hits 0" is a wonderfully simple & predictable rule. That's worth a lot to
preserve, unless & until it proves to be a disaster in practice.


As to the details of cleanup, I haven't succeeded in making the time to
understand all the proposals. But I've done my primary job here if I've
harassed everyone into not repeating the same mistakes all previous
languages have made <0.9 wink>.

> ...
> I wish we didn't have to special-case finalizers on class instances
> (since each dealloc function is potentially a combination of a
> finalizer and a deallocation routine), but the truth is that they
> *are* special -- __del__ has no responsibility for deallocating
> memory, only for deallocating external resources (such as temp files).

And the problem is that __del__ can do anything whatsoever than can be
expressed in Python, so there's not a chance in hell of outguessing it.

> ...
> Another practical consideration is that now there are cycles of the form
>
> <function object> <=> <module dict>
>
> which suggests that we should make function objects traceable. Also,
> modules can cross-reference, so module objects should be made
> traceable. I don't think that this will grow the sets of traced
> objects by too much (since the dicts involved are already traced, and
> a typical program has way fewer functions and modules than it has
> class instances). On the other hand, we may also have to trace
> (un)bound method objects, and these may be tricky because they are
> allocated and deallocated at high rates (once per typical method
> call).

This relates to what I was trying to get at with my response to your gc
implementation sketch: mark-&-sweep needs to chase *everything*, so the set
of chased types is maximal from the start. Adding chased types to the
"indirectly infer what's unreachable via accounting for internal refcounts
within the transitive closure" scheme can end up touching nearly as much as
a full M-&-S pass per invocation. I don't know where the break-even point
is, but the more stuff you chase in the latter scheme the less often you
want to run it.

About high rates, so long as a doubly-linked list allows efficient removal
of stuff that dies via refcount exhaustion, you won't actually *chase* many
bound method objects (i.e., they'll usually go away by themselves).

Note in passing that bound method objects often showed up in cycles in IDLE,
although you usually managed to break those in other ways.

> Back to the drawing board...

Good! That means you're making real progress <wink>.

glad-someone-is-ly y'rs - tim
Re: Design question: call __del__ for cyclical garbage? [ In reply to ]
nascheme@enme.ucalgary.ca wrote:
>
> On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote:
> > So here's what I'd consider doing: explicit is better than implicit, and in
> > the face of ambiguity refuse the temptation to guess.
>
> I like Marc's suggestion. Here is my proposal:
>
> Allow classes to have a new method, __cleanup__ or whatever you
> want to call it. When tp_clear is called for an instance, it
> checks for this method. If it exists, call it, otherwise delete
> the container objects from the instance's dictionary. When
> collecting cycles, call tp_clear for instances first.
>
> Its simple and allows the programmer to cleanly break cycles if
> they insist on creating them and using __del__ methods.

Right :-)

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/