Mailing List Archive

Tackling circular dependencies in 2.0?
I think in Python 2.0 it would be nice to have some way to reclaim circular
dependencies without the programmer explicitly having to do something like
implement a destroy() method and requiring other programmers to (remember
to) call it. I forget what the current state of affairs is w.r.t. future
memory management in Python. Not knowing anything much about memory
management, would it be possible to have a sort of mixed ref count/garbage
collection system where you only use the gc stuff as a last resort? My
thought is that it would be useful to use gc to find and reclaim circular
garbage.

can-you-tell-I-just-got-bitten-by-a-circular-reference?-ly y'rs

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098 | Python: Programming the way Guido indented...
Re: Tackling circular dependencies in 2.0? [ In reply to ]
Skip Montanaro wrote:
>
> I think in Python 2.0 it would be nice to have some way to reclaim circular
> dependencies without the programmer explicitly having to do something like
> implement a destroy() method and requiring other programmers to (remember
> to) call it. I forget what the current state of affairs is w.r.t. future
> memory management in Python. Not knowing anything much about memory
> management, would it be possible to have a sort of mixed ref count/garbage
> collection system where you only use the gc stuff as a last resort? My
> thought is that it would be useful to use gc to find and reclaim circular
> garbage.
>
> can-you-tell-I-just-got-bitten-by-a-circular-reference?-ly y'rs

If you don't want to wait until 2.0 becomes GA, you could try weak
references:

http://starship.skyport.net/~lemburg/mxProxy.html

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 102 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: Tackling circular dependencies in 2.0? [ In reply to ]
> Not knowing anything much about memory
> management, would it be possible to have a sort of mixed ref count/garbage
> collection system where you only use the gc stuff as a last resort? My
> thought is that it would be useful to use gc to find and reclaim circular
> garbage.

That's *sorta* what Perl does, 'though you can still be bitten by
circular refs in a long-running process. Err, long-running thread. You
see, Perl has a mark-and-sweep garbage collector which is run only on
thread shutdown.

From the "perlobj" man page:

[.code sample omitted because I'm sure many python-dev readers
automatically filter out any message that matches /[\$\@\%]\w+/
it's just a constructor that creates objects for a recursive data
structure]

If you create nodes like that, they (currently) won't go
away unless you break their self reference yourself. (In
other words, this is not to be construed as a feature, and
you shouldn't depend on it.)

Almost.

When an interpreter thread finally shuts down (usually when
your program exits), then a rather costly but complete
mark-and-sweep style of garbage collection is performed, and
everything allocated by that thread gets destroyed. This is
essential to support Perl as an embedded or a
multithreadable language. For example, this program
demonstrates Perl's two-phased garbage collection:

[more interesting code omitted]

Interesting idea, but I don't think it's what Skip had in mind.

Greg
--
Greg Ward - software developer gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive voice: +1-703-620-8990
Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913
RE: Tackling circular dependencies in 2.0? [ In reply to ]
[Skip Montanaro]
> I think in Python 2.0 it would be nice to have some way to
> reclaim circular dependencies without the programmer explicitly
> having to do something ...

This was debated (again) at great length on c.l.py just a few months ago.
Guido chimed in with a proposal to keep track of only the dicts that have
been allocated, and now and again mark everything reachable from the root
set and nuke whatever dicts don't end up marked. Cycles involving dicts
would get reclaimed this way, but not cycles not involving dicts. The
approach to destructors for objects in cycles was "tough -- they don't get
called". What to do about destructors for objects that are not themselves
involved in cycles but are reachable only from dead cycles (so are in fact
dead too) wasn't addressed. Seemed possible that stuff reachable from
ordinary dicts (not in a cycle, and neither reachable from a cycle) would
behave differently than today, since the "list of all dicts" may keep the
stuff artificially alive until the next mark+sweep, even if the refcount on
the stuff fell to zero; there's probably an OK way around that, though.

Anyway, Guido was aiming for the minimal changes that could possibly do real
good. It didn't pretend to reclaim all cycles, and was (IMO) too eager to
punt on the hard issues (the combo of cycles, destructors and resurrection
is a god-awful mess, even in theory; Scheme uses callbacks to dump the
problems back on the users Java has incredibily elaborate rules that are
both bulletproof and unusable; the Boehm collector lets objects with
destructors that are in cycles simply leak, rather than do a wrong thing;
Stroustrup has flip-flopped and most recently argued for Guido's "reclaim
the memory but don't call the destructors" approach, but a member of the C++
committee told me he's overwhelmingly opposed on this one (I know I would
oppose it)).

In any case, nothing has come of it, and no easy principled solution is in
sight. OTOH, if Guido balks at explaining what "__" means to 12-year-olds,
wait until he tries to explain immortal cyclic trash <wink>.

Perl scores points for its brute-force end-of-thread M&S, but I believe
complex user-level data structures are much rarer in Perl, simply due to the
clumsiness of the syntax and explicit reference model. Perl's version of
"nested functions" don't actually nest (all "def"s are floated to the top
level by the compiler, regardless of how deeply they're nested), so Perl's
lexical closures don't create cyclic trash either (well, they do, but in the
same sense there's a cycle between a Python module namespace and the
functions in that module -- Perl also special-cases the snot out of those
and busts those cycles by brute force).

So there you go! It needs to be solved and nobody has a clue <wink>.

if-java-hype-hadn't-suffered-exponential-decay-we-could-have-
dumped-it-on-the-jvm-by-now-ly y'rs - tim
Re: Tackling circular dependencies in 2.0? [ In reply to ]
Tim Peters wrote:
>
> [Skip Montanaro]
> > I think in Python 2.0 it would be nice to have some way to
> > reclaim circular dependencies without the programmer explicitly
> > having to do something ...
>
> This was debated (again) at great length on c.l.py just a few months ago.
> Guido chimed in with a proposal to keep track of only the dicts that have
> been allocated, and now and again mark everything reachable from the root
> set and nuke whatever dicts don't end up marked. Cycles involving dicts
> would get reclaimed this way, but not cycles not involving dicts. The
> approach to destructors for objects in cycles was "tough -- they don't get
> called". What to do about destructors for objects that are not themselves
> involved in cycles but are reachable only from dead cycles (so are in fact
> dead too) wasn't addressed. Seemed possible that stuff reachable from
> ordinary dicts (not in a cycle, and neither reachable from a cycle) would
> behave differently than today, since the "list of all dicts" may keep the
> stuff artificially alive until the next mark+sweep, even if the refcount on
> the stuff fell to zero; there's probably an OK way around that, though.

You could probably tackle the problem by doing local mark&sweep whenever
the ref count on a dictionary falls down to 1 (meaning that it is only
referenced from the list of all dicts). This is what I do in mxProxy's
weak reference implementation and to my surprise it solved all those
strange situations where objects are kept alive longer than they
would have normally.

> Anyway, Guido was aiming for the minimal changes that could possibly do real
> good. It didn't pretend to reclaim all cycles, and was (IMO) too eager to
> punt on the hard issues (the combo of cycles, destructors and resurrection
> is a god-awful mess, even in theory; Scheme uses callbacks to dump the
> problems back on the users Java has incredibily elaborate rules that are
> both bulletproof and unusable; the Boehm collector lets objects with
> destructors that are in cycles simply leak, rather than do a wrong thing;
> Stroustrup has flip-flopped and most recently argued for Guido's "reclaim
> the memory but don't call the destructors" approach, but a member of the C++
> committee told me he's overwhelmingly opposed on this one (I know I would
> oppose it)).

Not calling the destructor will cause leakage in all objects
allocating extra storage, such as lists, instances and
probably just about any dynamically sized object there is in
Python... solving the problem only half way. Plus you will
definitely run into trouble as soon as external resources
are involved, e.g. open files or connections to databases.

Perhaps we should give more power to the user instead of trying
to give him fuzzy feelings about what's happening underneath
the hood. Builtin weak references or other indirect ways
of accessing objects (e.g. by giving unique names to the
involved objects) can solve many of those circ. ref. problems.

BTW, I usually use an instrumented Python interpreter to track
down circular references: it uses a tracing hook in the
allocation/deallocation code of Python instances which is used
when Python is run in debugging mode (python -d). The hook
calls a function sys.traceinstances (if present) which allows
me to keep a track record of all allocated instances:

def traceinstances(action,inst):

""" Tracing hook. This is called whenever an instances is created
and destroyed. action is either 'create' or 'delete'; inst
points to the instance object.
"""
...

If anyone is interested I can post the patch (against Python 1.5).

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 101 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: Tackling circular dependencies in 2.0? [ In reply to ]
> Not calling the destructor will cause leakage in all objects
> allocating extra storage, such as lists, instances and
> probably just about any dynamically sized object there is in
> Python... solving the problem only half way. Plus you will
> definitely run into trouble as soon as external resources
> are involved, e.g. open files or connections to databases.

If I remember well, the only destructors not called would be __del__
methods, since the dependencies between to-be-deleted instances are
unknown to the collector. Regular (C-level) destructors would of
course be called.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: Tackling circular dependencies in 2.0? [ In reply to ]
Marc> BTW, I usually use an instrumented Python interpreter to track
Marc> down circular references: ...

Marc> If anyone is interested I can post the patch (against Python 1.5).

That would be interesting to look at. I found my latest circular reference
by building Python with Py_DEBUG defined and trudging through the output at
the end.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/~skip/
847-971-7098 | Python: Programming the way Guido indented...
Re: Tackling circular dependencies in 2.0? [ In reply to ]
Skip Montanaro wrote:
>
> Marc> BTW, I usually use an instrumented Python interpreter to track
> Marc> down circular references: ...
>
> Marc> If anyone is interested I can post the patch (against Python 1.5).
>
> That would be interesting to look at. I found my latest circular reference
> by building Python with Py_DEBUG defined and trudging through the output at
> the end.

Here it is:

--- Objects/orig/classobject.c Thu Jan 1 20:39:04 1998
+++ Objects/classobject.c Sat Aug 8 21:38:57 1998
@@ -403,10 +588,37 @@ PyInstance_New(class, arg, kw)
inst = NULL;
}
Py_DECREF(res);
}
}
+ /* sys.traceinstances hook */
+ if (Py_DebugFlag && inst) {
+ PyObject *fct;
+
+ fct = PySys_GetObject("traceinstances");
+ if (fct) {
+ PyObject *v,*arg;
+ PyObject *error_type, *error_value, *error_traceback;
+
+ /* Save and clear any exception */
+ PyErr_Fetch(&error_type,&error_value,
+ &error_traceback);
+ PyErr_Clear();
+ arg = Py_BuildValue("(sO)","create",(PyObject *)inst);
+ v = PyEval_CallObject(fct,arg);
+ Py_DECREF(arg);
+ if (!v) {
+ PyErr_Print();
+ PyErr_Clear();
+ }
+ else
+ Py_DECREF(v);
+ /* Restore exception state */
+ PyErr_Restore(error_type,error_value,
+ error_traceback);
+ }
+ }
return (PyObject *)inst;
}

/* Instance methods */

@@ -460,10 +672,31 @@ instance_dealloc(inst)
}
else
Py_DECREF(res);
Py_DECREF(del);
}
+ /* sys.traceinstances hook */
+ if (Py_DebugFlag) {
+ PyObject *fct;
+
+ fct = PySys_GetObject("traceinstances");
+ if (fct) {
+ PyObject *v,*arg;
+
+ /* Clear any previous exception */
+ PyErr_Clear();
+ arg = Py_BuildValue("(sO)","delete",(PyObject *)inst);
+ v = PyEval_CallObject(fct,arg);
+ Py_DECREF(arg);
+ if (!v) {
+ PyErr_Print();
+ PyErr_Clear();
+ }
+ else
+ Py_DECREF(v);
+ }
+ }
/* Restore the saved exception and undo the temporary revival */
PyErr_Restore(error_type, error_value, error_traceback);
/* Can't use DECREF here, it would cause a recursive call */
if (--inst->ob_refcnt > 0) {
#ifdef COUNT_ALLOCS


--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 101 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
Re: Tackling circular dependencies in 2.0? [ In reply to ]
Guido van Rossum wrote:
>
> > Not calling the destructor will cause leakage in all objects
> > allocating extra storage, such as lists, instances and
> > probably just about any dynamically sized object there is in
> > Python... solving the problem only half way. Plus you will
> > definitely run into trouble as soon as external resources
> > are involved, e.g. open files or connections to databases.
>
> If I remember well, the only destructors not called would be __del__
> methods, since the dependencies between to-be-deleted instances are
> unknown to the collector. Regular (C-level) destructors would of
> course be called.

Ok, so low-stuff will not break. But what about e.g. wrappers
around these low-level (C-level) objects written in Python, e.g.
database abstraction classes ?

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 101 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
RE: Tackling circular dependencies in 2.0? [ In reply to ]
[Guido van Rossum]
> If I remember well, the only destructors not called would be __del__
> methods, since the dependencies between to-be-deleted instances are
> unknown to the collector. Regular (C-level) destructors would of
> course be called.

Seems to me that the dependencies among to-be-deleted arbitrary C objects
are equally unknown to the collector. Assuming there's a fundamental
difference between objects implemented in Python and objects implemented in
C seems shaky on the face of it. Or if it's not just a convenient
assumption, on what is it based?

and-what-about-objects-implemented-in-fortran<wink>?-ly y'rs - tim
Re: Tackling circular dependencies in 2.0? [ In reply to ]
> [Guido van Rossum]
> > If I remember well, the only destructors not called would be __del__
> > methods, since the dependencies between to-be-deleted instances are
> > unknown to the collector. Regular (C-level) destructors would of
> > course be called.

[Tim]
> Seems to me that the dependencies among to-be-deleted arbitrary C objects
> are equally unknown to the collector. Assuming there's a fundamental
> difference between objects implemented in Python and objects implemented in
> C seems shaky on the face of it. Or if it's not just a convenient
> assumption, on what is it based?

My assumption was based on the standard Python objects. Cycles
necessarily have to include dictionaries or lists (modules, classes
and instances link to each other through dictionaries; ditto for
function objects; i'm crossing my fingers here for stack frame and
traceback objects :-) and I can do things to these to get rid of the
links without getting rid of the objects: del L[:] or D.clear().

Third party C objects might have interdependencies similar to those
found in Python instances, but my gut feeling is that these aren't as
problematic -- e.g. interdependencies between C modules are rare
(because the machinery is cumbersome) while they are common between
Python modules; and C code isn't susceptible to the problems that
Python destructors encounter when the modules they import have already
been destroyed.

I know, it's an unusual amount of handwaving...

--Guido van Rossum (home page: http://www.python.org/~guido/)