Mailing List Archive

Solution to finalisation problem [Re: fork()]
Tim Peters wrote:
>
> the combo of
> finalizers, cycles and resurrection seems to be a philosophical mess even in
> languages that take it all <ahem> seriously.

Indeed, which leads me to believe that the idea of giving objects
a __del__ method is fundamentally flawed in the first place.

Fortunately, there *is* a way to do finalisation that avoids all
these problems. Instead of an object doing its own finalisation,
it designates some *other* object to do its finalisation on its
behalf after it is dead.

The benefits of this are:

(1) The object doing the finalisation is fully alive and operates
in a predictable environment.

(2) The object which triggered the finalisation is fully dead
(its memory has already been reclaimed by the time the finaliser
is called) and there is no possibility of it being resurrected.

In Python, it might look something like this from the
user's point of view:

class FileWrapper:
    # Example of a class needing finalisation.
    # Has an instance variable 'file' which needs
    # to be closed.
    def __init__(self, file):
        self.file = file
        register_finaliser(self, FileWrapperFinaliser(file))

class FileWrapperFinaliser:
    def __init__(self, file):
        self.file = file
    def __finalise__(self):
        self.file.close()

In this example, register_finaliser() is a new built-in
method which stores the object and its finaliser in some
special global dict. Whenever an object is reclaimed (either
by refcount or M&S) the dict is checked for a finaliser for
that object. If one is found, its __finalise__ method is
called and it is removed from the dict.
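
For readers on a current interpreter, the scheme can be approximated with the standard library. This is only a sketch: it assumes Python 3.4 or later, where weakref.finalize plays roughly the role of the proposed register_finaliser(), and the helper close_file is a made-up name. It is not the built-in proposed in this message.

```python
import gc
import io
import weakref


def close_file(file):
    # The finaliser receives only the resource, never the wrapper itself.
    file.close()


class FileWrapper:
    """Wrapper whose clean-up is delegated to a separate callable."""

    def __init__(self, file):
        self.file = file
        # Assumption: weakref.finalize stands in for register_finaliser().
        # It holds only a weak reference to self, so registering does not
        # keep the wrapper alive.
        self._finaliser = weakref.finalize(self, close_file, file)


raw = io.StringIO("payload")
wrapper = FileWrapper(raw)
del wrapper      # the last strong reference to the wrapper goes away
gc.collect()     # not needed under CPython refcounting; harmless elsewhere
```

After the wrapper is reclaimed, raw.closed is true, and the callback never saw the wrapper object.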

Note that the finaliser object shouldn't be given a reference
to the original object (doing so would prevent it from ever
becoming unreachable), but only enough information to enable
the finalisation to be carried out.

If a scheme like this were adopted, it should completely
replace the existing __del__ method mechanism, so that there
would be no difference between the finalisation of cyclic
and non-cyclic trash. As such it would have to wait for
Python 2.0.

A-final-solution-to-the-finalisation-problem,
Greg
Solution to finalisation problem [Re: fork()]
[Tim]
> the combo of finalizers, cycles and resurrection seems to be a
> philosophical mess even in languages that take it all <ahem> seriously.

[Greg Ewing]
> Indeed, which leads me to believe that the idea of giving objects
> a __del__ method is fundamentally flawed in the first place.

Except it works so nice in "the usual" case (no cycle, no resurrection):
everyone agrees on what should be done then, and it's remarkably
surprise-free!

Toss resurrection in, and it's still not *much* of a debate: Python invokes
__del__ each time the refcount hits 0, Java does it only once, and the
choice seems arbitrary in the sense that there's no apparent killer argument
either way.

It's cycles that break its back, in Python and Java: the language is then
in the business of choosing an order in a context where it can't possibly be
smart enough to choose wisely. Java defines just enough to keep its own
internals sane, explicitly warning the programmer that it's not going to be
of much use to them. Guido wisely picked reference-counting so Python
didn't have to disappoint users similarly <wink>.

Anyway, I'm not sure __del__ deserves the blame for not being able to handle
every insane complication a user can throw at it. Like many another
feature, it's *fine* for what it's designed for, and simply can't be pushed
beyond that.

Hisao Suzuki earlier posted a link to an excellent paper describing proposed
"guardians" in Scheme. Like many things Schemeish, it defines a perfectly
general mechanism and absolutely no policy. Want to finalize cycles? Fine,
here's a hammer, build it yourself. Want to clean up I/O channels? Fine,
be our guest. 20,000 nails, and one hammer to pound them all in one at a
time <wink>.

> Fortunately, there *is* a way to do finalisation that avoids all
> these problems. Instead of an object doing its own finalisation,
> it designates some *other* object to do its finalisation on its
> behalf after it is dead.
>
> The benefits of this are:
>
> (1) The object doing the finalisation is fully alive and operates
> in a predictable environment.
>
> (2) The object which triggered the finalisation is fully dead
> (its memory has already been reclaimed by the time the finaliser
> is called) and there is no possibility of it being resurrected.

This appears isomorphic to the "register-for-finalization" approach dissed
in the aforementioned Scheme paper.

I'll leave it to Schemers to argue the Scheme case, but at least in Python
#2 is a drawback to people who use resurrection on purpose, e.g. to avoid
high initialization costs for large objects connecting to the outside world.
A __del__ in that case can consist of just enough to close the connection,
then put the expensively constructed object on a global "free list" for
immediate reuse (one reason to prefer Python's treatment of resurrection
over Java's -- in this case you definitely *want* __del__ to get called each
time the object is about to die).

> In Python, it might look something like this from the
> user's point of view:
>
> class FileWrapper:
>     # Example of a class needing finalisation.
>     # Has an instance variable 'file' which needs
>     # to be closed.
>     def __init__(self, file):
>         self.file = file
>         register_finaliser(self, FileWrapperFinaliser(file))
>
> class FileWrapperFinaliser:
>     def __init__(self, file):
>         self.file = file
>     def __finalise__(self):
>         self.file.close()
>
> In this example, register_finaliser() is a new built-in
> method which stores the object and its finaliser in some
> special global dict. Whenever an object is reclaimed (either
> by refcount or M&S) the dict is checked for a finaliser for
> that object. If one is found, its __finalise__ method is
> called and it is removed from the dict.

Presumably this needs to be some form of weak dict, else the very act of
registering would render the object immortal ... OK, that's easily doable
under the covers.

> Note that the finaliser object shouldn't be given a reference
> to the original object (doing so would prevent it from ever
> becoming unreachable), but only enough information to enable
> the finalisation to be carried out.

In an object with mutable state this can be quite inconvenient, don't you
think? Seems the "interesting state" would need to be factored out into
another container, shared by the object and its finalizer, referenced
indirectly by both.
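
The factoring might look like the sketch below. The class names are hypothetical, and weakref.finalize (Python 3.4 or later) is assumed as a stand-in for the proposed registration call. The mutable state lives in a separate container that both the object and its finaliser reach indirectly.

```python
import gc
import weakref


class _ChannelState:
    """Mutable state shared by the owner and its finaliser (hypothetical)."""

    def __init__(self):
        self.pending = []
        self.flushed = []

    def flush(self):
        self.flushed.extend(self.pending)
        self.pending.clear()


class Channel:
    def __init__(self):
        # The owner reaches its mutable state only through this container.
        self._state = _ChannelState()
        # The finaliser references the container, not the Channel itself.
        weakref.finalize(self, self._state.flush)

    def write(self, item):
        self._state.pending.append(item)


chan = Channel()
state = chan._state      # kept only so the outcome can be inspected
chan.write("a")
chan.write("b")
del chan
gc.collect()
```

The cost is the extra level of indirection on every access to the state, which is the inconvenience raised above.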

> If a scheme like this were adopted, it should completely
> replace the existing __del__ method mechanism, so that there
> would be no difference between the finalisation of cyclic
> and non-cyclic trash. As such it would have to wait for
> Python 2.0.
>
> A-final-solution-to-the-finalisation-problem,
> Greg

Seems OK and doable so far as it goes. Doubt people would like giving up
the easy convenience of __del__ in the vast bulk of cases where it's
non-problematic. In a sense, Python (& Java) get in trouble now by trying
to provide policy as well as mechanism; I've become convinced that there is
no sensible policy in the presence of cycles, so punting on policy there is
fine; but don't want the language to make users "build it all themselves" in
the simpler cases where a reasonable policy is clear.

don't-throw-the-del-out-with-the-cycle-water-ly y'rs - tim
Solution to finalisation problem [Re: fork()]
[Greg Ewing]
> Indeed, which leads me to believe that the idea of giving objects
> a __del__ method is fundamentally flawed in the first place.
Tim Peters (tim_one@email.msn.com) wrote:
: Except it works so nice in "the usual" case (no cycle, no resurrection):
: everyone agrees on what should be done then, and it's remarkably
: surprise-free!

The problem with this argument Tim is that it is very difficult for the
programmer to recognise when they are in "the usual case", since circular
references are easy to create. Hence it is very difficult for the
programmer to verify the correctness of their code.

Tim Peters (tim_one@email.msn.com) wrote:
: It's cycles that break its back, in Python and Java:

No, cycles are not the problem. The problem is that finalisers are run in
"incomplete contexts". These incomplete contexts arise because implementations
of GC delete or partially delete collected objects before calling finalisers.
That is a mistake. If you run *all* finalisers of *all* collected objects
before you actually delete or partially delete any of those objects then
all finalisers run in well defined contexts and there are no problems
with finalisers. That is the point of two pass collection.
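
A toy model of the two-pass idea, not a real collector. The Node class and the "alive" flag are illustrative assumptions; the point is only that every finaliser runs before any object is torn down.

```python
class Node:
    """Toy object with a finaliser and a reference to a peer."""

    def __init__(self, name):
        self.name = name
        self.peer = None
        self.alive = True

    def finalise(self, log):
        # Record whether the peer was still intact when we ran.
        log.append((self.name, self.peer.alive))


def two_pass_collect(garbage):
    log = []
    # Pass 1: run every finaliser while every object is still intact.
    for obj in garbage:
        obj.finalise(log)
    # Pass 2: only now tear the objects down.
    for obj in garbage:
        obj.alive = False
        obj.peer = None
    return log


a, b = Node("a"), Node("b")
a.peer, b.peer = b, a          # a two-object cycle
log = two_pass_collect([a, b])
```

Each finaliser in the cycle sees an intact peer, whichever order pass 1 visits them in.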

graham
--
As you grow up and leave the playground
where you kissed your prince and found your frog
Remember the jester that showed you tears
the script for tears
Solution to finalisation problem [Re: fork()]
[Greg Ewing]
> Indeed, which leads me to believe that the idea of giving objects
> a __del__ method is fundamentally flawed in the first place.

[Tim]
> Except it works so nice in "the usual" case (no cycle, no resurrection):
> everyone agrees on what should be done then, and it's remarkably
> surprise-free!

[Graham Matthews]
> The problem with this argument

Thought of it as an observation; unless there's some argument about what to
do in the absence of cycles that everyone has been too shy to make here
<wink>.

> Tim is that it is very difficult for the programmer to recognise when
> they are in "the usual case", since circular references are easy to
> create. Hence it is very difficult for the programmer to verify the
> correctness of their code.

"is" is too strong in both cases. "can be" I'll buy. In the majority of
classes I've written or seen, it's obvious that instances can never be part
of a cycle -- or, more rarely, obvious that they can be. Does get muddier
over time with "big classes", though (e.g., the implementation of IDLE
appears to be growing more cyclic relations over time, and that's not
unusual).

> ...
> No, cycles are not the problem. The problem is that finalisers are run in
> "incomplete contexts".

*A* problem I buy; not "the".

> These incomplete contexts arise because implementations of GC delete
> or partially delete collected objects before calling finalisers.
> That is a mistake.

Insert "some" after "because" & I agree.

> If you run *all* finalisers of *all* collected objects before you
> actually delete or partially delete any of those objects then all
> finalisers run in well defined contexts and there are no problems
> with finalisers.

Sure there are: finalizers in cycles remain largely useless in practice
without a way to specify the order in which they're invoked. Go back to the
Java part of the thread. The language can't know whether a cycle needs to
be finalized in a particular order, or, if it does, what that order may be.
In Java this has nothing to do with partial deletion: the spec is very
clear that the implementation must guarantee that all objects reachable from
a finalizer are wholly intact when the finalizer is run. What it doesn't
say-- because it can't --is that all objects reachable from a finalizer will
be in a "useful" state. That's left on the programmer's head, and when e.g.
child and parent can be finalized in either order or even simultaneously,
correct non-trivial finalizers are jokingly difficult to write.
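
A small illustration of "intact but not useful", with made-up classes. Nothing is deleted before the finalisers run, yet the outcome still depends on the order in which they are invoked.

```python
class Sink:
    def __init__(self):
        self.closed = False
        self.data = []

    def finalise(self):
        self.closed = True


class BufferedWriter:
    def __init__(self, sink):
        self.sink = sink
        self.buffer = ["tail"]

    def finalise(self):
        # Flushing only succeeds if the sink has not been closed yet.
        if not self.sink.closed:
            self.sink.data.extend(self.buffer)
        self.buffer = []


def run(order):
    sink = Sink()
    writer = BufferedWriter(sink)
    sink.owner = writer          # back-reference closes the cycle
    objs = {"writer": writer, "sink": sink}
    for name in order:
        objs[name].finalise()
    return sink.data


writer_first = run(["writer", "sink"])
sink_first = run(["sink", "writer"])
```

With the writer finalised first the buffered data reaches the sink; with the sink first it is silently lost, although both objects were wholly intact throughout.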

The Scheme guardian gimmick at least attempts to address this, by never
finalizing anything by magic, leaving it up to the programmer to explicitly
ask for an enumeration of the currently-finalizable objects and nuking them
or not however the programmer sees fit. If a cycle needs special treatment,
it's up to the programmer to realize that a cycle exists and "do the right
thing" by their own lights. The language doesn't help, but at least it
doesn't get in the way by doing a wrong thing either; it simply doesn't do
anything at all.
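
A rough Python approximation of the guardian idea, built on weakref callbacks. It is a sketch under assumptions: the Guardian class and its method names are invented here, and unlike a Scheme guardian it hands back a token supplied at registration rather than the dead object itself.

```python
import gc
import weakref


class Guardian:
    """Queue of clean-up tokens for objects that have died (a sketch)."""

    def __init__(self):
        self._refs = set()
        self._ready = []

    def register(self, obj, token):
        def on_death(ref):
            # Runs after obj is gone; just queue the token, do no clean-up.
            self._refs.discard(ref)
            self._ready.append(token)
        # The weak reference must be kept alive for its callback to fire.
        self._refs.add(weakref.ref(obj, on_death))

    def drain(self):
        """Return and clear the tokens whose objects have been reclaimed."""
        ready, self._ready = self._ready, []
        return ready


class Resource:
    pass


guardian = Guardian()
r1, r2 = Resource(), Resource()
guardian.register(r1, "handle-1")
guardian.register(r2, "handle-2")
del r1
gc.collect()
pending = guardian.drain()   # the program decides what to do with these
```

Nothing is cleaned up by magic: the program asks for the pending tokens when it chooses and applies whatever policy and ordering it likes.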

> That is the point of two pass collection.

Understood the first time; but if finalizers are to be invoked on cycle
members, that's just necessary, not sufficient.

a-problem-isn't-solved-until-the-users-agree-ly y'rs - tim
Solution to finalisation problem [Re: fork()]
Graham Matthews:
> If you run *all* finalisers of *all* collected objects before you
> actually delete or partially delete any of those objects then all
> finalisers run in well defined contexts and there are no problems
> with finalisers.
Tim Peters (tim_one@email.msn.com) wrote:
: Sure there are: finalizers in cycles remain largely useless in practice
: without a way to specify the order in which they're invoked. Go back to the
: Java part of the thread. The language can't know whether a cycle needs to
: be finalized in a particular order, or, if it does, what that order may be.
: In Java this has nothing to do with partial deletion: the spec is very
: clear that the implementation must guarantee that all objects reachable from
: a finalizer are wholly intact when the finalizer is run. What it doesn't
: say-- because it can't --is that all objects reachable from a finalizer will
: be in a "useful" state.

I don't understand this phrase "useful" state. If you run a two pass
collector then the order in which you finalise doesn't matter. All
objects will be in a state in which they can be finalised in pass 1 since
nothing is deleted in pass 1. I don't see the problem here at all.

Tim Peters (tim_one@email.msn.com) wrote:
: Understood the first time; but if finalizers are to be invoked on cycle
: members, that's just necessary, not sufficient.
:
At the risk of sounding completely brain-dead ... huh?!

graham

Solution to finalisation problem [Re: fork()]
Graham Matthews wrote:
>
> All
> objects will be in a state in which they can be finalised in pass 1 since
> nothing is deleted in pass 1. I don't see the problem here at all.

I think what we need at this point is some concrete examples.
What sorts of things do people actually *do* with __del__
methods at the moment?

As far as I can see, the only thing you really need a
finalizer for is freeing an external resource. (The only
non-external resource is memory, and the GC takes care
of that.)

If each external resource is wrapped up in an object which
encapsulates everything needed to identify that resource,
then that object can have a finaliser which does its job
without caring about the state of any other object.

The only way I can see that the 2-pass collection scheme
can fail is if the finaliser of object A destroys something
needed by the finaliser of object B. But if each object
having a finaliser looks after just one external resource,
how can that happen?

Anyone have a real-world example?

Greg
Solution to finalisation problem [Re: fork()]
Graham Matthews wrote:
: > All
: > objects will be in a state in which they can be finalised in pass 1 since
: > nothing is deleted in pass 1. I don't see the problem here at all.
Greg Ewing (greg.ewing@compaq.com) wrote:
: The only way I can see that the 2-pass collection scheme
: can fail is if the finaliser of object A destroys something
: needed by the finaliser of object B. But if each object
: having a finaliser looks after just one external resource,
: how can that happen?

What do you mean "destroy" here?

As I am sure Tim will tell you the only way a 2 pass collector can get
into trouble is if the finaliser for A changes some object O used by the
finaliser for B. In this case the ordering in which you run finalisers
is important. In other words a 2 pass collector solves the "which order
to finalise in" problem as long as finalisers don't change objects
used by other finalisers.

But here is where semantics comes into play. What do we want finalisation
to do? In my opinion finalisation should be *defined* as follows: all
objects collected by the collector are finalised *simultaneously*.
In implementation speak this means that objects are finalised relative to
the system state at the *beginning* of the finalisation process. This is
also relatively straightforward to implement I think. Alternatively we
can simply decide that finalisers that play with the state of other
finalisers are "wrong" programs and hence not worry about them. This is
much like the way Python treats threads playing with shared data (it's
up to the programmer to ensure that such code runs in a determinate
manner). Personally I think either solution is fine, since I do believe
that finalisers playing with the state of other finalisers is very rare,
and also "wrong" code.
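
One way to picture the "simultaneous" definition is the sketch below. It is a model only, with invented names: each finaliser reads a frozen copy of the state taken at the beginning and returns the writes it wants, which are applied after all finalisers have run.

```python
import copy
from types import MappingProxyType


def finalise_simultaneously(state, finalisers):
    """Run finalisers against a frozen copy of state; apply writes afterwards."""
    snapshot = MappingProxyType(copy.deepcopy(state))
    writes = {}
    for finaliser in finalisers:
        writes.update(finaliser(snapshot))
    state.update(writes)
    return state


def close_log(view):
    return {"log_open": False}


def record_status(view):
    # Reads a value that close_log changes; it always sees the initial one.
    return {"last_seen_log_open": view["log_open"]}


initial = {"log_open": True}
result_ab = finalise_simultaneously(dict(initial), [close_log, record_status])
result_ba = finalise_simultaneously(dict(initial), [record_status, close_log])
```

Both orders give the same result here. If two finalisers write the same key, the order still shows through, which is the case the "wrong programs" alternative would rule out.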

graham


Solution to finalisation problem [Re: fork()]
Greg Ewing <greg.ewing@compaq.com> said:
> I think what we need at this point is some concrete examples.
> What sorts of things do people actually *do* with __del__
> methods at the moment?

I use them for a few things. One is to free memory allocated
from a C library, or to remove a temporary file. This is
effectively your statement:

> As far as I can see, the only thing you really need a
> finalizer for is freeing an external resource.

Most of the uses of __del__ in the std. Python lib. are of this form.



I use __del__ to implement the "resource acquisition is initialization"
idea, as in:

class PreserveMod:
    """Internal class which returns the molecule to its original "mod"
    state"""
    def __init__(self, mol):
        self.mol = mol
        self.modify = mol.mod
    def __del__(self):
        self.mol.mod = self.modify

Looking at the std. lib., audiodev also uses this style.


Finally, I use __del__ to free cycles which might otherwise arise, as
in this (untested) example:

class Molecule:
    def __init__(self):
        self.atoms = []
        self.bonds = []
    def __del__(self):
        for atom in self.atoms:
            del atom.bonds

class Atom:
    def __init__(self, name):
        self.name = name
        self.bonds = []
    def add_bond(self, bond):
        self.bonds.append(bond)

class Bond:
    def __init__(self, atoms):
        assert len(atoms) == 2
        self.atoms = atoms

mol = Molecule()
a1 = Atom("C")
a2 = Atom("O")
mol.atoms.append(a1)
mol.atoms.append(a2)
b = Bond([a1, a2])
a1.add_bond(b)
a2.add_bond(b)


This has cycles between the atoms and the bonds, but as neither of
them point back to the molecule, I can use the deletion of the
molecule (ref count goes to 0) to trigger a cleanup of the cycles.

Andrew
dalke@bioreason.com
Solution to finalisation problem [Re: fork()]
[Greg Ewing]
> ...
> The only way I can see that the 2-pass collection scheme
> can fail is if the finaliser of object A destroys something
> needed by the finaliser of object B. But if each object
> having a finaliser looks after just one external resource,
> how can that happen?

Greg, this is kind of like asking "if each loop terminates, how can we have
an infinite loop?". That is, sure, if you write code that works, it works
<wink>.

> Anyone have a real-world example?

Not in *my* Python code <0.5 wink>. It's all too tricky for me. Doesn't
mean bigger minds than mine should be dissuaded, though.

Note that the Boehm-Demers-Weiser collector has most often been suggested
for Python (heck, it's a FAQ), and you may want to read about why *they*
guarantee that if A points to B, and both are registered for finalization,
A's finalizer gets called first; e.g., at

http://reality.sgi.com/boehm_mti/finalization.html

This is an expensive and inconvenient decision for them to have made, so you
can be sure they didn't take it lightly.

BTW, a little-known characteristic of BDW is that given a cycle containing
finalizers, they let the cycle leak rather than risk running finalizers in a
wrong order.

when-the-reactor-is-going-critical-you-don't-want-to-lock-the-doors-
before-the-people-get-out<wink>-ly y'rs - tim
Solution to finalisation problem [Re: fork()]
Tim Peters wrote:
>
> Greg, this is kind of like asking "if each loop terminates, how can we have
> an infinite loop?". That is, sure, if you write code that works, it works
> <wink>.

What I meant was, does anyone have a real example of a
case where there would be serious inconvenience if
finalisers couldn't play with each other's data?

Greg
Solution to finalisation problem [Re: fork()]
On Fri, 18 Jun 1999 10:08:06 +1200, Greg Ewing wrote:
>Tim Peters wrote:

>> Greg, this is kind of like asking "if each loop terminates, how can we have
>> an infinite loop?". That is, sure, if you write code that works, it works
>> <wink>.

>What I meant was, does anyone have a real example of a
>case where there would be serious inconvenience if
>finalisers couldn't play with each other's data?

Good question.

It seems to me that the only time we want finalizers is when we're the
only link to an external resource, and furthermore it seems that the
finalizer should do nothing more than close that external resource.

Wait. Perhaps I should have said "when we _represent_ an external
resource".

It sure seems to me that one basic definition of an external resource is
that it contains no references to Python objects (except for _perhaps_ a
table of lazy references).

If I'm correct, and we can claim that finalizers are only needed in this
case, perhaps we can get away with making some simplifying assumptions.

>Greg

--
-William "Billy" Tanksley
Utinam logica falsa tuam philosophiam totam suffodiant!
:-: May faulty logic undermine your entire philosophy!