Mailing List Archive

Cyclops 0.9.4
http://www.python.org/ftp/python/contrib/System/Cyclops.py

# Module Cyclops version 0.9.4.
# Released to the public domain 18-Jul-1999,
# by Tim Peters (tim_one@email.msn.com).

Cyclops.py implements a CycleTracker class that provides major help in
finding and analyzing runtime cycles in Python programs.

Strong points:

+ Much faster than previous similar modules. Searching for cycles
takes time linear in the number of objects reachable from the
registered objects plus the number of arcs connecting them, and all
computation needed to display cycles is delayed until it's really
needed (if ever). Tens of thousands of objects and hundreds of
thousands of arcs can be analyzed in less than a minute on my creaky
old P5-166 system.

+ Many kinds of optional output reports, from a simple listing of
objects found in cycles, to a partitioning of cyclic objects into
maximal strongly-connected components.

+ Easy to add new types to the set of objects CycleTracker knows how
to "chase": pass appropriate functions to a CycleTracker instance's
chase_type() method. It's done this way instead of via subclassing
for speed.

+ An optional cycle-filter callback can be registered to ignore expected
cycles (typically those created by the Python implementation itself).

Weak points:

+ The problems this module helps to address are inherently difficult, and
CycleTracker doesn't make them any easier to understand or to fix -- it
only helps identify what and where they are. It's a tool, not a
solution.

+ All the optional abilities make for a steep learning curve.

+ Registering objects can distort the behavior of the program under
investigation, by keeping objects alive that would otherwise have died.
See the module docstring for discussion and workarounds.

<P><A HREF="http://www.python.org/ftp/python/contrib/System/Cyclops.py">
Cyclops.py 0.9.4</A> - tool for tracking cyclic structures in Python
programs. (18-Jul-99)
Cyclops 0.9.4 [ In reply to ]
Tim Peters <tim_one@email.msn.com> wrote:
>Cyclops.py implements a CycleTracker class that provides major help in
>finding and analyzing runtime cycles in Python programs.

Cool. How about a feature that displays only cycles created
since a marking function was called. I think this would be most
useful in programs that leak memory. You don't care if cycles
exist, you just don't want them to be continually created. The
way I see this working is you start up something like IDLE, do
some stuff, call the marking function, do more stuff, call cycle
showing function. Anyhow, thanks for this tool.


Neil
Cyclops 0.9.4 [ In reply to ]
[nascheme@ucalgary.ca]
> [This message has also been posted.]

Ditto.

> Cool. How about a feature that displays only cycles created
> since a marking function was called. I think this would be most
> useful in programs that leak memory. You don't care if cycles
> exist, you just don't want them to be continually created. The
> way I see this working is you start up something like IDLE, do
> some stuff, call the marking function, do more stuff, call cycle
> showing function.

You can do this yourself by installing a cycle filter, that returns false
whenever it sees a cycle that's been seen before. I don't think it could be
done more efficiently than that if implemented inside Cyclops, although
Cyclops knows how to subtract its internal references from the refcounts
delivered via get_rootset(). A user-written cycle filter can do that too
for its own references, but it is a little tricky to get straight.

Simpler, just call find_cycles() followed by show_stats() from time to time.
One of the output lines is "# cycles found: <n>". New cycles are being
found if & only if n is increasing.

Now if you actually *try* any of these and find them helpful in "real life",
pass it on! My experience in IDLE was that cycles weren't actually causing
the leaks, although up until it hit a dead end they were correlated with
leaks; the real cause was references in basic (reachable) objects that
weren't getting cleared out when the widgets they referenced got closed.

enough-to-make-you-dizzy-ly y'rs - tim
Cyclops 0.9.4 [ In reply to ]
Seems to me that Cyclops (or any other such thing
implemented in pure Python) is incapable of finding
what you *really* want to find -- which is groups
of objects not reachable from anywhere.

Greg
Cyclops 0.9.4 [ In reply to ]
[Greg Ewing]
> Seems to me that Cyclops (or any other such thing
> implemented in pure Python) is incapable of finding
> what you *really* want to find -- which is groups
> of objects not reachable from anywhere.

1. That can't be done in C today either (Python has no internal way to find
all allocated objects, and there's no protocol to implement pointer-chasing
either).

2. Unless you're fighting a bug in the interpreter's reference-counting code
(very rare), those groups of unreachable objects necessarily contain cycles.
That's why Cyclops focuses on cycles.

3. They're really not what you're looking for either, in my experience. All
forms of GC are mechanical approximations to what you really want: that an
object go away when it *won't* be referenced again, not when it *can't* be
referenced again. The latter is a conservative approximation to the former.

Most leaks I've seen are due to objects that are reachable, but where the
author didn't *expect* them to reachable. Cyclops can help with that too,
by calling get_rootset() at the end and seeing which registered objects
still have a non-0 (adjusted) refcount that you were expecting to be
history.

That's the mode it's being used in with IDLE now: things *are* still
reachable that nobody expects "should be" reachable. What Cyclops can't do
(yet <wink>) is tell you from *where* they can be reached. All it can say
for now is that they don't appear to be in cycles -- they're still reachable
"for some other" reason.

you-wanna-know-different-things-at-different-stages-ly y'rs - tim
Cyclops 0.9.4 [ In reply to ]
Tim Peters wrote:
>
> 2. Unless you're fighting a bug in the interpreter's reference-counting code
> (very rare), those groups of unreachable objects necessarily contain cycles.
> That's why Cyclops focuses on cycles.

I'm not sure that focusing on cycles is the right thing to
do, then. The cycles themselves aren't necessarily wrong;
what's wrong is the code that was meant to break them is
buggy. But you don't find that out until you leak memory.

I think the Python core needs an enhancement to allow
walking a list of all allocated objects. Cyclops in
conjuction with that would be very useful. Maybe a compile
time or run time option, since it would slow down
allocation/deallocation a little bit.

> That's the mode it's being used in with IDLE now: things *are* still
> reachable that nobody expects "should be" reachable. What Cyclops can't do
> (yet <wink>) is tell you from *where* they can be reached. All it can say
> for now is that they don't appear to be in cycles -- they're still reachable
> "for some other" reason.

So, again, cycles aren't really what you want to know about!

Greg
Cyclops 0.9.4 [ In reply to ]
Starting at the end:

[Greg Ewing]
> So, again, cycles aren't really what you want to know about!

You want to know about objects still living that you don't expect to be
living. No system can answer that for you! They can't know what you expect
<wink>.

Cycles have a special status in Python, so they're often a *useful* thing to
look for. For different apps, and for different times within a single app,
you may want to look for other things.

If you have specific objects in mind that you expect will be dead at time T,
you can register them with Cyclops before time T, and at time T Cyclops will
tell you whether they probably would have been dead had you not registered
them <wink>.

If you have no idea which objects to look for, you're not going to get help
short of divine revelation. An object living after you expect it to die is
an error in program logic, and as hard to track down as any other kind of
programmer flub; heck, harder than most, because usually an error of
omission (you neglected to lose all the references) -- there's no line of
code "to blame".

> ...
> I think the Python core needs an enhancement to allow
> walking a list of all allocated objects. Cyclops in
> conjuction with that would be very useful. Maybe a compile
> time or run time option, since it would slow down
> allocation/deallocation a little bit.

If what you want is a list of all reachable objects, Python needs three
different things:

1) A little internal cleanup to make it easy to get at "the root set" (the
minimal set of objects from which it's guaranteed that all live objects are
reachable). The lack of this in the core is why Cyclops (& Plumbo) makes
you "register" "interesting" objects.

2) A protocol for pointer-chasing, and changing extension modules to play
along with it. The lack of this in the core is why Cyclops has a funky set
of methods for installing type-specific pointer-chasing functions.

3) Likely a notion of "mark bits", to identify objects already seen.
Cyclops fakes these via internal dicts keying off object id.

All of those are also needed if Python is ever to move toward builtin
portable mark-&-sweep (optional or not). The advantage over "walking an
(explicit) list" is no overhead (time or space) unless & until it's used.
I'd like to see Python move this direction for several reasons (one of which
is indeed that then a Cyclops-like thingy could tell you not only that
something unexpected is still alive, but also from where it can be
reached -- chasing that chain back would be a real help).

in-the-meantime-it's-a-gloriously-fun-game<wink>-ly y'rs - tim
Cyclops 0.9.4 [ In reply to ]
In article <001301bed410$e59eafe0$642d2399@tim>,
Tim Peters <tim_one@email.msn.com> wrote:
>
>If you have no idea which objects to look for, you're not going to get help
>short of divine revelation. An object living after you expect it to die is
>an error in program logic, and as hard to track down as any other kind of
>programmer flub; heck, harder than most, because usually an error of
>omission (you neglected to lose all the references) -- there's no line of
>code "to blame".

Going off on a tangent, I want to whine a bit about a "bug" in my code
that I came across recently: I'm downloading web pages, cross-checking
HREFs against URLs already in the database. One URL on one particular
web page keeps causing my code to crash. Netscape's View | Source
didn't show anything wrong. Finally, I used urllib.urlopen() to get a
raw look at the page -- and it turned out that the URL had a null byte
at the end! (Yes, '<a href="http://foo/bar/\000">')

A null byte in the middle of a web page. Who'd'a thunk? <grrrrrr!>
--
--- Aahz (@netcom.com)

Androgynous poly kinky vanilla queer het <*> http://www.rahul.net/aahz/
Hugs and backrubs -- I break Rule 6 (if you want to know, do some research)
Cyclops 0.9.4 [ In reply to ]
Tim Peters wrote:
>
> You want to know about objects still living that you don't expect to be
> living. No system can answer that for you! They can't know what you expect
> <wink>.

I'm quite happy to go through the list of living objects
and decide for myself which ones should be dead. What I
meant was, what I really want to know is how those objects
got reached. If there are cycles involved, it wouldn't
hurt to be told about them, but they're not the most
important thing.

> If what you want is a list of all reachable objects,

No, I'd like a list of *all* objects, reachable or not.
Cyclops could then do what it does now, but for the
isolated islands of garbage as well.

> All of those are also needed if Python is ever to move toward builtin
> portable mark-&-sweep (optional or not). The advantage over "walking an
> (explicit) list" is no overhead (time or space) unless & until it's used.

If there were mark & sweep there wouldn't be unreachable
garbage, so in that case you are right -- there would be
no point in keeping a list. I'm only suggesting it as an
interim measure.

> then a Cyclops-like thingy could tell you not only that
> something unexpected is still alive, but also from where it can be
> reached

I don't understand why Cyclops couldn't be made to do
that now. It reached those objects somehow -- all it needs
to do is remember how!

I don't mean to sound ungrateful for Cyclops, by the way --
it's a great idea. All it needs is a little bit more help
from the Python core to make it even greater...

With-government-funding-I'm-sure-I-could-make-it-*really*-silly-oops-er-I-mean-useful,
Greg
Cyclops 0.9.4 [ In reply to ]
[Tim]
> You want to know about objects still living that you don't expect
> to be living. No system can answer that for you! They can't know
> what you expect <wink>.

[Greg Ewing]
> I'm quite happy to go through the list of living objects
> and decide for myself which ones should be dead.

Nobody can sift thru hundreds of thousands of live objects by eye, which
happens routinely in large long-running apps.

> What I meant was, what I really want to know is how those objects
> got reached.

Well, 100,000 objects can often be reached in 500,000 ways via one direct
link -- & it gets messier the longer the path length you consider. You have
to reduce the set of objects that are potentially interesting; Cyclops
requires you to identify them in advance; I'm not sure how it could be
easier to identify them later.

> If there are cycles involved, it wouldn't hurt to be told about
> them, but they're not the most important thing.

Cycles do have special status in CPython, though. I find them the most
fruitful thing to look at first. The more of them I see, though, I agree
they're not the whole banana they've been made out to be.

>> If what you want is a list of all reachable objects,

> No, I'd like a list of *all* objects, reachable or not.

All objects ever allocated, or all objects whose refcounts haven't yet
fallen to 0?

The former implies no object memory is ever recycled, which would make it
impractical for the large long-running apps most in *need* of help.

The latter requires a way to forget objects whose refcounts have reached 0,
and so likely requires a doubly-linked list of objects (so that dead
non-cyclic objects can unlink themselves efficiently when they die). Since
"even ints are boxed" in Python, that's a major memory hit.

A short while back Guido pitched a compromise, maintaining a list of all (&
only) allocated dicts (whether dicts whose refcount had reached zero would
be purged from this list wasn't resolved). Since each class and instance
object has a dict, this probably allows getting at 99% of problem cases.
Although descriptor-based slicing of large array objects seems to account
for all major leaks reported by others on c.l.py this month <wink/sigh>.

> Cyclops could then do what it does now, but for the isolated islands
> of garbage as well.

Isolated garbage in CPython necessarily contains cycles, and all isolated
garbage (cyclic or not) is necessarily reachable *from* cycles, so Cyclops
has *a* handle on that now. It doesn't need more information so much as it
needs ways to present less to the user. Let's say you have everything you
want, and even more: for every object ever allocated, Cyclops2 can give you
a timestamped history of every change ever made to it, every path from which
it could be reached at any given time, and similarly for every object
reachable from it.

Then what? Beats me -- every app seems different, and each requires a lot
of thought to whittle the last trillion machine cycles down to the seven
that matter. More info doesn't help without a clear plan for exploiting it.

>> All of those are also needed if Python is ever to move toward builtin
>> portable mark-&-sweep (optional or not). The advantage over "walking an
>> (explicit) list" is no overhead (time or space) unless & until
>> it's used.

> If there were mark & sweep there wouldn't be unreachable
> garbage, so in that case you are right -- there would be
> no point in keeping a list. I'm only suggesting it as an
> interim measure.

No, what I described is essential for the M half of portable M&S, but on top
of that a list is essential for the S half. Without a list of allocated
objects, Python *can't find* the "isolated islands of garbage" in order to
clean them up -- it could mark all reachable objects, but couldn't sweep the
unreachable ones.

You should take that as good news: if Python is ever to *reach* builtin
portable M&S, the list you want will be there in one form or another. But
in my repeated experience here, unreachable garbage is not the problem --
it's reachable stuff you expected *would* be garbage but isn't; and a list
isn't needed to find that.

>> then a Cyclops-like thingy could tell you not only that something
>> unexpected is still alive, but also from where it can be reached

> I don't understand why Cyclops couldn't be made to do
> that now. It reached those objects somehow -- all it needs
> to do is remember how!

I think you should try using it once <wink>. You have to explicitly
register the set of objects you're curious about, and in practice *that's*
(i.e., your registration) how it "reached them". "Yes, object X is
reachable in 12 ways, and here are the ways I know about: (1) you
registered it. (2) umm ... well, no other way I can find -- register more
stuff, chase more types, filter less away, and try again".

There was once another pile of code to trace out all ways of reaching each
registered object from all other registered objects, but that didn't turn up
anything but cycles (which it finds more cheaply via other means), plus
hundreds of thousands of useless paths ("OK, A.x can be reached from A"), so
I tossed it. I think I'll put it back in, though -- one of these days it's
bound to stumble into something worth finding even if by accident <0.6

> I don't mean to sound ungrateful for Cyclops, by the way --
> it's a great idea.

Greg, it's much more than an idea: it's code you can use <smile>.

> All it needs is a little bit more help from the Python core to make
> it even greater...

The core help I think it could best exploit was covered in the preceding msg
(namely, the "M" half of M&S machinery). That's not enough to find
unreachable islands unless an element of the island was registered, but
those appear to be the tail of the leaky dog.

canine-metaphors-in-service-of-a-barfworthy-problem-ly y'rs - tim
Cyclops 0.9.4 [ In reply to ]
Tim Peters wrote:
>
> Nobody can sift thru hundreds of thousands of live objects by eye, which
> happens routinely in large long-running apps.

I can look for certain kinds of object, though. If
I open ten BlargWindows and then close them, I know
there shouldn't be any BlargWindow instances left.
If there are, I can take a closer look at them to
find out what what's happening.

> Well, 100,000 objects can often be reached in 500,000 ways via one direct
> link -- & it gets messier the longer the path length you consider. You have
> to reduce the set of objects that are potentially interesting; Cyclops
> requires you to identify them in advance; I'm not sure how it could be
> easier to identify them later.

I'm not asking to be told about every possible way
of reaching every live object, obviously! What I mean
is, once I've found (by whatever means) a live object that
I know should be dead, I should be able to say "show me
a path from the root set to this object". I don't need all
paths, only one of them -- somewhere along that path there
will be a link that shouldn't be there, and it should be
fairly easy to spot.

That's what I meant when I said that cycles aren't what
I want to know about in that situation.

> All objects ever allocated, or all objects whose refcounts haven't yet
> fallen to 0?

The latter.

> Since
> "even ints are boxed" in Python, that's a major memory hit.

Which is why I'm happy for it to be a debugging option,
maybe requiring a different interpreter. Although if M&S
is ever added, we're going to have to live with this
overhead all the time -- or find a smarter way to handle
it.

> Let's say you have everything you
> want, and even more: for every object ever allocated, Cyclops2 can give you
> a timestamped history of every change ever made to it, every path from which
> it could be reached at any given time, and similarly for every object
> reachable from it.

Tim, this is expandio ad absurdum! I never asked for all that
information. All I said whas that it would be handy to have a way
to discover the existence of dead objects taking up room in
my heap. I never expected the idea to be this controversial...

> I think you should try using it once <wink>.

You're right. I'll shut up now.

Greg
Cyclops 0.9.4 [ In reply to ]
[Greg Ewing]
> ...
> I can look for certain kinds of object, though. If
> I open ten BlargWindows and then close them, I know
> there shouldn't be any BlargWindow instances left.
> If there are, I can take a closer look at them to
> find out what what's happening.

Perfectly reasonable. Under Cyclops <wink>, register those 10 by hand, or
put a .register(self) call in BlargWindows.__init__.

> ...
> I'm not asking to be told about every possible way
> of reaching every live object, obviously!

That you weren't asking for that was obvious, but what you were asking for
wasn't. Now it's clearer:

> What I mean is, once I've found (by whatever means) a live object that
> I know should be dead, I should be able to say "show me a path from the
> root set to this object". I don't need all paths, only one of them --
> somewhere along that path there will be a link that shouldn't be there,
> and it should be fairly easy to spot.

Sounds like a good idea to me too.

>> All objects ever allocated, or all objects whose refcounts haven't yet
>> fallen to 0?

> The latter.

>> Since "even ints are boxed" in Python, that's a major memory hit.

> Which is why I'm happy for it to be a debugging option,
> maybe requiring a different interpreter. Although if M&S
> is ever added, we're going to have to live with this
> overhead all the time -- or find a smarter way to handle
> it.

Guido suggested focussing on dicts; perhaps "containers" is sufficient
(being an object that contains a pointer -- doesn't seem any loss to exempt
ints, strings, floats, etc).

>> Let's say you have everything you want, and even more: ...

> Tim, this is expandio ad absurdum! I never asked for all that
> information.

Sure. My point was that even with perfect & complete information, the hard
problems remain.

> All I said whas that it would be handy to have a way to discover the
> existence of dead objects taking up room in my heap. I never expected
> the idea to be this controversial...

Na, *every* idea is *endlessly* controversial until someone actually
implements it; then it becomes an endless source of complaints <wink>.

You're absolutely right that Python has no way to tell you this now, with or
without Cyclops, and that it would be of some help. "Purify"-like tools can
do that for C, and I was blue-skying because such tools have never been
*enough* of a help. More info or better info is needed, but at this time we
can't get either.

dreamily y'rs - tim