Mailing List Archive

fork()
[Tim, rambling]
> So my plan for the hour <wink> is: if M&S finds a cycle, it
> will clean it all up iff no involved object has a __del__. If
> an object in a trash cycle does have a __del__, tough, the cycle
> won't be reclaimed. Instead we'll supply a function you can call
> if you care, that will return those objects (one at a time? in
> a list? whatever), and you do with them what you will; if you don't
> clean them up, you'll keep getting them back every time you call that
> function until you resurrect them or break the cycle by hand in
> the way that makes most sense to your algorithm.
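
[Editorial footnote: CPython later adopted something very close to this plan. For years, trash cycles containing a __del__ were parked in the gc.garbage list for the programmer to deal with, and since Python 3.4 (PEP 442) the collector finalizes and reclaims such cycles outright. A minimal sketch of the latter behavior, runnable under 3.4+; the Node class and finalized list are invented for illustration:]

```python
import gc

finalized = []

class Node:
    def __init__(self, name):
        self.name = name
        self.partner = None
    def __del__(self):
        finalized.append(self.name)

a, b = Node("a"), Node("b")
a.partner, b.partner = b, a   # create a reference cycle
del a, b                      # refcounts never reach zero
gc.collect()                  # cycle detector finalizes and reclaims both
print(sorted(finalized))      # -> ['a', 'b'] on Python 3.4+
```

[Under Python 2, the same cycle would instead have landed in gc.garbage, uncollected, exactly as Tim proposes here.]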

[Evan Simpson, scrambling <wink>.]
> Presuming the hour isn't up yet, I've a handful of scattered comments:
>
> . How about a callback instead of/in addition to a probe function?

Dunno. Not keen on burdening it with features before I think I know whether
it makes sense at *all* <wink>.

> That way you could flip a switch and get called even on cycles which
> are "del-clean", if you wanted to check for unintentional cycle creation.

A worthy goal indeed.

> . "del-cleanness" only requires that actual cycle participants lack
> __del__, not dangly bits, right?

"refcount rules" seem to work fine in the absence of cycles (at least nobody
has complained about them yet!), and once a cycle is purged the dangly bits
can follow those rules without further intervention. So, yes, your reading
seems to make good sense. For this hour, anyway <wink>.

> Or does this re-introduce analysis that you intended to throw out?

I'm trying to build on what Guido has in mind -- tracking only the dicts in
cycles is very attractive for time and space reasons, and there's sure no
analysis I could even dream of throwing out from *that* bare bones approach
<wink>. But Guido is resorting to "well, let's ignore __del__ entirely
then" because it doesn't collect enough info to do anything more ambitious
than that.

What it leaves behind is a list of dicts definitely alive, and another list
of dicts definitely unreachable *and* definitely involved in a cycle. What
I want to explore is whether we can build on that, by now making a deeper
analysis of *just* the stuff in the latter "doomed" list. By running a more
expensive M&S with the dead dicts as "the roots", the structure of the trash
can be deduced and the objects containing those dicts identified (if an
object contains a dead dict, the object must itself be trash -- else the
dict could not be dead).

That would take additional time roughly proportional to the amount of cyclic
trash, and influenced by the complexity of its structure. Which should be
"not much" and "not very" most often. Then I wave my hands a lot and
everyone is happy <snort>.
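
[Editorial sketch: the cheap first pass Tim builds on can be modeled in a few lines of Python. This is my own toy code, not Guido's; the name find_cyclic_trash and the data layout are invented. Subtract the references that tracked objects hold on one another from each total refcount; anything left with a positive count is referenced from outside, everything it can reach is alive, and the remainder is unreachable cyclic trash:]

```python
def find_cyclic_trash(objs, refs, refcount):
    """objs: tracked object ids; refs[o]: ids o references;
    refcount[o]: o's total refcount, external references included."""
    # Subtract internal references; what survives counts references
    # coming from outside the tracked set.
    external = dict(refcount)
    for o in objs:
        for child in refs[o]:
            if child in external:
                external[child] -= 1
    # Externally referenced objects are roots; propagate reachability.
    alive = set(o for o in objs if external[o] > 0)
    stack = list(alive)
    while stack:
        for child in refs[stack.pop()]:
            if child in external and child not in alive:
                alive.add(child)
                stack.append(child)
    # The rest is unreachable, hence necessarily cyclic trash.
    return [o for o in objs if o not in alive]

# A two-object cycle a <-> b, plus c held by one external reference:
refs = {"a": ["b"], "b": ["a"], "c": []}
refcount = {"a": 1, "b": 1, "c": 2}
print(find_cyclic_trash(["a", "b", "c"], refs, refcount))  # -> ['a', 'b']
```

[The "doomed" list this yields is the input to the deeper analysis Tim describes above.]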

> . (somewhat rhetorical) How the heck do you recognize what you've got when
> the machinery hands you an object which isn't otherwise reachable?

If you intend to use __del__ in objects involved in cycles, the obvious
answer is that it's your problem to make each object store enough info about
itself in itself so that your code knows what to do with it when it gets it
back; i.e., it's akin to asking what the heck you're supposed to do with a
floating-point number <wink>.

> I realize that there's no general answer to this, and that in some
> specific cases it would be enough to check __bases__, but the only answer
> which occurs to me off the bat is to uniquely mark objects which you
> expect to have to handle, or provide them with special __cleanup__
> methods, or the like.

Whatever works for you is fine by me. I personally expect I would use
__del__ in a cycle only to torture newbies on c.l.py with the obscure
possibilities <wink>.

> Now we have wide-open policy, which I recall you decrying <wink>.

When a language *can* do everything "the right way" on my behalf & without
bothering me, I appreciate it when it does. refcounting is like that.

But when it *can't* do "the right thing" by magic, it's obligated first to
do no harm, and second to give me a way to do it myself. That's where the
wide-open policy comes in; e.g., if objects X and Y are in a dead cycle, and
X's finalizer happens to need the services of an active Y in order to shut
itself down cleanly, the language does harm if it runs Y's finalizer
first -- and may also do harm if it refuses to finalize either. But the
language simply can't guess the intent of your code, so it should refuse to
act in ignorance, instead letting you do it explicitly however you require.
I don't think Python should fill in your "def" bodies with code it makes up
either <wink>.
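
[Editorial sketch making the X/Y hazard concrete in modern Python. PEP 442 keeps both objects' memory intact while the finalizers run, but the order is still the collector's choice: whichever finalizer runs second finds its peer already shut down. The Service class and log list are invented for illustration:]

```python
import gc

log = []

class Service:
    def __init__(self, name):
        self.name = name
        self.closed = False
        self.peer = None
    def __del__(self):
        # The language cannot know we'd prefer our peer still active here.
        if self.peer is not None and self.peer.closed:
            log.append("%s: peer already shut down!" % self.name)
        self.closed = True
        log.append("%s closed" % self.name)

x, y = Service("X"), Service("Y")
x.peer, y.peer = y, x   # dead cycle once both names are dropped
del x, y
gc.collect()            # both finalizers run, in an order we don't choose
```

[Exactly one of the two finalizers logs the "peer already shut down" complaint; which one depends on the collector, which is Tim's point.]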

> I guess that no matter what mechanism is implemented, certain
> badly-handled cases will appear and need to be handled with
> "well, don't do that!"

No, no, that's never ever happened in *Python* <wink>.

> finalization-is-like-life-you-can't-win-and-you-can't-break-even-ly y'rs
> Evan Simpson

luckily-though-you-do-get-to-die-in-the-end-ly y'rs - tim
fork()
In the view of some (or many?) Python programmers, to remove all
the references to an object is the obvious and deterministic way
to destroy the object. The predictability of destructor
invocation favored by some (or many?) Pythoneers is based on
this rule.

He or she consciously writes code to destroy the object based on
that rule. In just the same way, a C++ programmer consciously
writes `delete's to destroy `new'ed objects. The two intents are
very similar to each other. From _this_ point of view, what
"Tim Peters" <tim_one@email.msn.com> wrote
in article <000101beb886$628a1ea0$099e2299@tim>:
> To me, that merely describes the conditions under which the respective
> languages treat an object as immortal today. It makes good sense for
> Stroustrup to say that in C++ a garbage collector shouldn't invoke the
> finalizer for its flavor of immortal objects, but he's not arguing that
> position *because* they're immortal today, but instead because explicitly
> new'ed objects never have their finalizer invoked in C++ unless explicitly
> delete'd, and GC is not explicit delete'tion.

sounds too particular about superficial differences. However,

> Python has no such rules, so
> the core of his argument doesn't apply to Python without more strain than I
> can swallow <0.9 wink>.

is certainly consistent and valid in the _official_ semantics
described in the Python Reference Manual.

At the same time, in the current Python (or rather CPython), the
idea that to remove all references corresponds to the delete
operation in C++ (and thus being part of cycles in Python
corresponds to being never deleted in C++) is _also_ consistent.
And moreover, it is a valid working rule practically useful to
predict the object destruction.

I described it against what
Moshe Zadka <moshez@math.huji.ac.il> wrote in
article <Pine.SUN.3.95-heb-2.07.990611075346.4942G-100000@sunset.ma.huji.ac.il>
< On Fri, 11 Jun 1999, Guido van Rossum wrote:
< <snip/>
< > ...this almost seems acceptable: in the formal semantics, objects
< > that are part of cycles would live forever, and as an invisible
< > optimization we recycle their memory anyway if we know they are
< > unreachable. (I've got the feeling that I've seen this rule before
< > somewhere.)
< <snip/>
< No, that can't be. If we did, it were in Scheme, and after you put it in,
< you'd be morally obliged to add full-fledged lambda, tail-recursion and
< user visible continuations.

There I said that Guido's idea is quite consistent and orthodox
in a sense which has almost nothing to do with Scheme. If one
adopts the idea as I described above, one can write correctly
working programs, can predict their behavior more precisely, and
might think well of Guido's idea (even if not definitively; it
is not the sole consequence even for C++, you know).

At the same time, if one adopts the _official_ weaker semantics
(which is also written by Guido), one can still (needless to
say!) write correctly working programs, and the above idea of
Guido would lose some power of persuasion (even if not
rejected). To remove all the references _no_ longer means the
immediate destruction of the object.

All in all, each idea is a valid model for the current Python.
The definitive difference lies in the semantics of language it
relies on. Regarding the favor of destruction predictability
such as seen in this newsgroup, the current actual semantics is
perhaps as important as the official one.

Nevertheless, please do not think that I stick to one idea.
Actually I also have a third idea, though it might be useless
for the purpose of storage reclamation (see article
<u36756gcoc.fsf@ares.sys1.ngo.okisoft.co.jp> on 03 Jun 1999
12:56:03 +0900). I am not yet sure which one is better than
others in total ...


> Whose point <wink>? Java has always guaranteed to "destroy objects in an
> order that ensures that the destructor for one object doesn't refer to an
> object that has been previously destroyed", and indeed "without help from
> the programmer". So by "Java solved that one" I meant it satisfactorily
> addressed the points made in the quote. That doesn't mean there aren't
> other points to be made, but since the quote you gave didn't make any other
> points I don't feel bad about commenting on the points it did make <wink>.

Even if the storage of such objects is not yet reclaimed, their
finalizers may already have been invoked. This may be a real
problem for aggregation objects (if you ever rely on finalizers
in Java). Note that Stroustrup's "destroy" implies "call the
destructor".

The JLS says:
"12.6.2 Finalizer Invocations are Not Ordered

Java imposes no ordering on finalize method calls. Finalizers
may be called in any order, or even concurrently.

As an example, if a circularly linked group of unfinalized
objects becomes unreachable (or finalizer-reachable), then all
the objects may become finalizable together. Eventually, the
finalizers for these objects may be invoked, in any order, or
even concurrently using multiple threads. If the automatic
storage manager later finds that the objects are unreachable,
then their storage can be reclaimed."

Therefore, the authors of the JLS continue:

"It is straightforward to implement a Java class that will cause
a set of finalizer-like methods to be invoked in a specified
order for a set of objects when all the objects become
unreachable. Defining such a class is left as an exercise for
the reader."

You see, implementing such a class needs _help_ from the
programmer. It would look like the following. It might be very
tedious to write code like this from scratch every time, and
the component classes might lose flexibility and re-usability.

public class HeadOfAggregation {
    int count;
    int expected_count;
    // ...
    protected void finalize() { trigger(); }
    public void trigger() {
        boolean r = false;
        synchronized (this) {
            count++;
            r = (count == expected_count);
        }
        if (r) finalize_aggregation_totally();
    }
    // ...
}

public class ComponentOfAggregation_1 {
    HeadOfAggregation our_head;
    // ...
    protected void finalize() { our_head.trigger(); }
}

public class ComponentOfAggregation_2 {
    HeadOfAggregation our_head;
    // ...
    protected void finalize() { our_head.trigger(); }
}

--===-----========------------- Sana esprimo naskas sanan ideon.
SUZUKI Hisao suzuki611@okisoft.co.jp, suzuki@acm.org.
fork()
[Hisao Suzuki, still liking his analogy between explicitly delete'ing
in C++ and explicitly removing all references in CPython]

You say my position "sounds too particular about superficial differences",
and I say yours reaches too far from superficial similarities. We're never
going to agree on this one, so I'll leave it there.

> ...
> All in all, each idea is a valid model for the current Python.
> The definitive difference lies in the semantics of language it
> relies on. Regarding the favor of destruction predictability
> such as seen in this newsgroup, the current actual semantics is
> perhaps as important as the official one.

Trying to nudge this in a useful direction <wink>, the "current actual
semantics" don't suggest anything about what "should be" done in the
presence of cycles with destructors in a GC world, except for the obvious
"reclaim a cycle the instant it becomes unreachable". That is doable but--
I think --not cheaply enough to be practical. So what then? BTW, I'm told
that most of the C++ committee disagrees with Stroustrup on this one -- the
notion that an object can be destroyed without its destructor getting
invoked is distasteful at first bite, and leaves a bad taste even after
three helpings of creative rationalization <wink>.

If there's a good argument to made for it, it should rest where Guido left
it: the behavior falls out of the simplest implementation, and that's
really all there is to it: you *put up* with the semantics that fall out,
hoping they're not too bad in practice. Lots of things work like that
(floating-point arithmetic is a particularly miserable example <0.3 wink>);
sometimes it's just the best you can get. Let's not pretend it's a good
thing for "deep reasons", though!

>| Further the C++ 3rd Ed. says:
>| "It is possible to design a garbage collector to invoke the
>| destructors for objects that have been specifically
>| `registered' with the collector. However, there is no
>| standard way of `registering' objects. Note that it is
>| always important to destroy objects in an order that ensures
>| that the destructor for one object doesn't refer to an
>| object that has been previously destroyed. Such ordering
>| isn't easily achieved by a garbage collector without help
>| from the programmer."

[Tim sez Java solved that one]

[Hisao]
> Even if the storage of such objects is not yet reclaimed, their
> finalizers may already have been invoked. This may be a real
> problem for aggregation objects (if you ever rely on finalizers
> in Java). Note that Stroustrup's "destroy" implies "call the
> destructor".

We don't have an argument about Java (at least none that I've been able to
detect <wink>), and Stroustrup wasn't writing about Java. His "destroy"--
being about C++ --*also* implies "nuke the memory", which isn't the case in
Java. Java does guarantee that the memory of every object reachable from a
finalizer is wholly intact, so-- again --Java addresses the point Stroustrup
is making (which was in support of his view that GC not invoke destructors
at all). GC simply doesn't need any help from the programmer to guarantee
that much; although Stroustrup is right (Graham notwithstanding <wink>) that
it's not easily achieved.

[quotes from the JLS, and a Java aggregation example where the order
finalizers run in is important]
> ...
> You see, to implement such a class needs _help_ from the
> programmer.

Certainly. It's a different issue, though: GC doesn't need help to avoid
referencing bogus memory, it needs help to implement the intent of the
algorithm. If Stroustrup were concerned about the latter, he would not have
said "such ordering isn't easily achieved ... without help", he would have
said such ordering is *impossible* to achieve without programmer
intervention. He's a careful writer.

Even in the absence of cycles, CPython today doesn't (and can't!) guarantee
to run finalizers in the order your code may need; e.g., if you run this
today under 1.5.2:

class Parent:
    def __init__(self):
        self.kids = []
    def add_child(self, name):
        self.kids.append(Child(name))
    def __del__(self):
        global estate
        estate = 1000000
        print "parent died, leaving", estate, "dollars"
        for k in self.kids:
            print k.name, "wants a share of the cash!"

class Child:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        global estate
        me = estate / 2
        estate = estate - me
        print self.name, "died of grief with", me, "of those bucks"

p = Parent()
p.add_child("Nancy")
p.add_child("Tim")
del p

It prints:

parent died, leaving 1000000 dollars
Nancy wants a share of the cash!
Tim wants a share of the cash!
Nancy died of grief with 500000 of those bucks
Tim died of grief with 250000 of those bucks

But if you run it under the current CVS Python snapshot, Tim gets the half
million and poor Nancy gets 250000 (lists get destroyed in reverse order in
the development version).

The order of finalizer invocation certainly matters a lot here (well, to me
and my sister, or at least to the charities in our wills <wink>), and it's
impossible for Python-- or any other language --to guess the intent.

All that can be said is that refcount rules *usually* avoid such surprises;
in the end, though, it's still up to you to force ordering when you need it.
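
[Editorial sketch of forcing the order by hand, not taken from the thread: drop __del__ entirely and shut the aggregate down explicitly, so the program states its intent rather than hoping the collector guesses it. Floor division via // keeps the arithmetic runnable under modern Python:]

```python
class Parent:
    def __init__(self):
        self.kids = []
        self.estate = 0
    def close(self):
        self.estate = 1000000
        for k in self.kids:          # explicit, deterministic order
            k.close(self)

class Child:
    def __init__(self, name):
        self.name = name
        self.share = 0
    def close(self, parent):
        self.share = parent.estate // 2
        parent.estate = parent.estate - self.share

p = Parent()
p.kids = [Child("Nancy"), Child("Tim")]
p.close()
print([(k.name, k.share) for k in p.kids])  # -> [('Nancy', 500000), ('Tim', 250000)]
```

[Nancy gets her half million no matter what order the garbage collector would have chosen.]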

BTW, that does suggest one "natural" extension of CPython semantics to
cycles: since CPython today doesn't guarantee anything about the order in
which finalizers are called when more than one object loses its last
reference at the same time, it would be consistent to run finalizers for
members of GC'ed cycles in an arbitrary order too; and if your program
relies on a particular order, tough.

One bad aspect is that it's a rule the compiler+runtime can't enforce, in
the sense that violations can't be detected. Python tries (much harder than
C++) to avoid rules like that.

it's-a-good-game-that-can't-be-won<wink>-ly y'rs - tim
