Mailing List Archive

Crash in new "trashcan" mechanism.
[I'm re-sending as the attachment caused this to be held up for
administrative approval. I've forwarded the attachment to Chris -
anyone else just mail me for it]

I've struck a crash in the new trashcan mechanism (so I guess Chris
is gonna pay the most attention here). Although I can only provoke
this reliably in debug builds, I believe it also exists in release
builds, where it is just far more insidious.

Unfortunately, I also cannot create a simple crash case. But I
_can_ provide info on how you can reliably cause the crash.
Obviously only tested on Windows...

* Go to http://lima.mudlib.org/~rassilon/p2c/, and grab the
download, and unzip.

* Replace "transformer.py" with the attached version (multi-arg
append bites :-)

* Ensure you have a Windows "debug" build available, built from CVS.

* From the p2c directory, Run "python_d.exe gencode.py gencode.py"

You will get a crash, and the debugger will show you are destructing
a list with an invalid object. The crash occurs about 1000 iterations
after this code is first hit, and I can't narrow the crash condition
down :-(

If you open object.h, and disable the trashcan mechanism (by
changing the "xx", as the comments suggest) then it runs fine.

Hope this helps someone - I'm afraid I haven't a clue :-(

Mark.
Re: Crash in new "trashcan" mechanism.
About extensions and Trashcan.

Mark Hammond wrote:
...
> I've struck a crash in the new trashcan mechanism (so I guess Chris
> is gonna pay the most attention here). Although I can only provoke
> this reliably in debug builds, I believe it also exists in release
> builds, where it is just far more insidious.
>
> Unfortunately, I also cannot create a simple crash case. But I
> _can_ provide info on how you can reliably cause the crash.
> Obviously only tested on Windows...
...
> You will get a crash, and the debugger will show you are destructing
> a list with an invalid object. The crash occurs about 1000 iterations
> after this code is first hit, and I can't narrow the crash condition
> down :-(

The trashcan is built in quite a simple manner. It uses a list
to delay deletions when the nesting level gets deep.
The list operations are not thread safe.

A special case is handled: on destruction of the session, it
*could* happen that trashcan cannot handle errors, since
the thread state is already undefined.

But the general case of no interpreter lock is undefined and
forbidden.

In a discussion with Guido, we first thought that we would
need some thread safe object for the delay. Later on it
turned out that it must be generally *forbidden* to destroy
an object when the interpreter lock is not held.

Reason: An instance destruction might call __del__, and that
would run an interpreter without lock. Forbidden.
For that reason, I kept the list in place.

I think it is fine that it crashed.
There are obviously extension modules left
where the interpreter lock rule is violated.
The builtin Python code has been checked, there
are most probably no holes, including tkinter.
Or, I made a mistake in this little code:

void
_PyTrash_deposit_object(op)
PyObject *op;
{
PyObject *error_type, *error_value, *error_traceback;

if (PyThreadState_GET() != NULL)
PyErr_Fetch(&error_type, &error_value, &error_traceback);

if (!_PyTrash_delete_later)
_PyTrash_delete_later = PyList_New(0);
if (_PyTrash_delete_later)
PyList_Append(_PyTrash_delete_later, (PyObject *)op);

if (PyThreadState_GET() != NULL)
PyErr_Restore(error_type, error_value, error_traceback);
}

void
_PyTrash_destroy_list()
{
while (_PyTrash_delete_later) {
PyObject *shredder = _PyTrash_delete_later;
_PyTrash_delete_later = NULL;
++_PyTrash_delete_nesting;
Py_DECREF(shredder);
--_PyTrash_delete_nesting;
}
}

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
Re: Crash in new "trashcan" mechanism.
>>>>> "CT" == Christian Tismer <tismer@tismer.com> writes:

CT> I think it is fine that it crashed. There are obviously
CT> extension modules left where the interpreter lock rule is
CT> violated. The builtin Python code has been checked, there are
CT> most probably no holes, including tkinter. Or, I made a mistake
CT> in this little code:

I think I have misunderstood at least one of Mark's bug report and your
response. Does the problem Mark reported rely on extension code? I
thought the bug was triggered by running pure Python code. If that is
the case, then it can never be fine that it crashed. If the problem
relies on extension code, then there ought to be a way to write the
extension so that it doesn't cause a crash.

Jeremy

PS Mark: Is the transformer.py you attached different from the one in
the nondist/src/Compiler tree? It looks like the only differences are
with the whitespace.
Re: Crash in new "trashcan" mechanism.
Jeremy Hylton wrote:
>
> >>>>> "CT" == Christian Tismer <tismer@tismer.com> writes:
>
> CT> I think it is fine that it crashed. There are obviously
> CT> extension modules left where the interpreter lock rule is
> CT> violated. The builtin Python code has been checked, there are
> CT> most probably no holes, including tkinter. Or, I made a mistake
> CT> in this little code:
>
> I think I have misunderstood at least one of Mark's bug report and your
> response. Does the problem Mark reported rely on extension code? I
> thought the bug was triggered by running pure Python code. If that is
> the case, then it can never be fine that it crashed. If the problem
> relies on extension code, then there ought to be a way to write the
> extension so that it doesn't cause a crash.

Oh!

If it is so, then there is in fact a problem left
in the Kernel.
Mark, did you use an extension?

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
RE: Crash in new "trashcan" mechanism.
> If it is so, then there is in fact a problem left
> in the Kernel.
> Mark, did you use an extension?

I tried to explain this in private email:

This is pure Python code. The parser module is the only extension
being used. The crash _always_ occurs as a frame object is being
de-allocated, and _always_ happens as a builtin list object (a local
variable) is de-alloced by the frame.

Always the same line of Python code, always the same line of C code,
always the exact same failure.

Mark.
RE: Crash in new "trashcan" mechanism.
[Sorry - missed this bit]

> PS Mark: Is the transformer.py you attached different from the
> one in the nondist/src/Compiler tree? It looks like the only
> differences are with the whitespace.

The attached version is simply the "release" P2C transformer.py with
.append args fixed. I imagine it is very close to the CVS version
(and indeed I know for a fact that the CVS version also crashes).

My initial testing showed the CVS compiler did _not_ trigger this
bug (even though code that uses an identical transformer.py does),
so I just dropped back to P2C and stopped when I saw it :-)

Mark.
Re: Crash in new "trashcan" mechanism.
Mark Hammond wrote:
>
> > Can you perhaps tell me what the call stack says?
> > Is it somewhere, or are we in finalization code of the
> > interpreter?
>
> The crash is in _Py_Dealloc - op is a pointer, but all fields
> (ob_type, ob_refcnt, etc) are all 0 - hence the crash.
>
> Next up is list_dealloc - op is also trashed - ob_item != NULL
> (hence we are in the if condition, calling Py_XDECREF(), which
> triggers the _Py_Dealloc) - ob_size == 9, but all other fields are 0.
>
> Next up is Py_Dealloc()
>
> Next up is _PyTrash_Destroy()
>
> Next up is frame_dealloc()
>
> _Py_Dealloc()
>
> Next up is eval_code2() - the second last line - Py_DECREF(f) to
> cleanup the frame it just finished executing.
>
> Up the stack are lots more eval_code2() - we are just running the
> code - not shutting down or anything.

And you obviously do not have any threads, right?
And you are in the middle of a simple, heavy computing
application - nothing with GUI events happening?

That can only mean there is a bug in the Python core or in
the parser module, which happens to be exposed by trashcan
but isn't trashcan's fault.

Well. Trashcan might change the order of destruction a little.
This *should* not be a problem. But here is a constructed
situation where I can think of a problem, if we have
buggy code, somewhere:

Assume you have something like a tuple that holds other elements.
If there is a bug, like someone decref'ing an argument
in an arg tuple (which is always an error),
this error can hide for a million years:

a contains (b, c, d)

The C function decrefs a first, and then erroneously also one
of the contained elements. If b is already deallocated by
decref'ing a, it has refcount zero, but that doesn't hurt,
since the dead object is still there, and no mallocs have
taken place (unless a __del__ is triggered, of course).

This error would never be detected.
With trashcan, it could happen that destruction of a is
deferred, but by chance the delayed erroneous decref of b
might now happen before a's decref, and there may be mallocs
in between, since I have a growing list.

If my code is valid (and it appears so), then I guess
we have such a situation somewhere in the core code.

I-smell-some-long-nightshifts-again - ly y'rs - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
Re: Crash in new "trashcan" mechanism.
Christian Tismer wrote:
>
> About extensions and Trashcan.
> ...
> Or, I made a mistake in this little code:
>
> void
> _PyTrash_deposit_object(op)
> PyObject *op;
> {
> PyObject *error_type, *error_value, *error_traceback;
>
> if (PyThreadState_GET() != NULL)
> PyErr_Fetch(&error_type, &error_value, &error_traceback);
>
> if (!_PyTrash_delete_later)
> _PyTrash_delete_later = PyList_New(0);
> if (_PyTrash_delete_later)
> PyList_Append(_PyTrash_delete_later, (PyObject *)op);
>
> if (PyThreadState_GET() != NULL)
> PyErr_Restore(error_type, error_value, error_traceback);
> }

Maybe unrelated, but this code does not handle the case when
PyList_Append fails. If it fails, the object needs to be deallocated
as usual. Looking at the macros, I don't see how you can do that
because Py_TRASHCAN_SAFE_END, which calls the above function,
occurs after the finalization code...

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: Crash in new "trashcan" mechanism.
Vladimir Marangozov wrote:
>
> Christian Tismer wrote:
> >
> > About extensions and Trashcan.
> > ...
> > Or, I made a mistake in this little code:

> Maybe unrelated, but this code does not handle the case when
> PyList_Append fails. If it fails, the object needs to be deallocated
> as usual. Looking at the macros, I don't see how you can do that
> because Py_TRASHCAN_SAFE_END, which calls the above function,
> occurs after the finalization code...

Yes, it does not handle this case for the following reasons:

Reason 1)
If the append does not work, then the system is apparently
in an incredibly bad state, most probably broken!
Note that these actions only take place when we have a
recursion depth of 50 or so. That means we have already freed
some memory, and now we have trouble with this probably
little list. I won't touch broken memory management.

Reason 2)
If the append does not work, then we are not allowed to
deallocate the element at all. Trashcan was written in
order to avoid crashes for too deeply nested objects.
The current nesting level of 20 or 50 is of course
very low, but generally I would assume that the limit
is chosen for good reasons, and any deeper recursion
might cause a machine crash. Under this assumption,
the only thing you can do is to forget about the object.

Remark ad 1):
I had once changed the strategy to use a tuple construct instead.
Thinking of memory problems when the shredder list must be
grown, this could give an advantage. The optimum would be
if the destructor data structure is never bigger than the
smallest nested object. This would even allow me to recycle
these for the destruction, without any malloc at all.

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
Re: Crash in new "trashcan" mechanism.
Christian Tismer wrote:
>
> Vladimir Marangozov wrote:
> >
> > Maybe unrelated, but this code does not handle the case when
> > PyList_Append fails. If it fails, the object needs to be deallocated
> > as usual. Looking at the macros, I don't see how you can do that
> > because Py_TRASHCAN_SAFE_END, which calls the above function,
> > occurs after the finalization code...
>
> Yes, it does not handle this case for the following reasons:
> ...

Not enough good reasons to segfault. I suggest you move the
call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert
the condition there.

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: Crash in new "trashcan" mechanism.
Vladimir Marangozov wrote:
>
> Christian Tismer wrote:
> >
> > Vladimir Marangozov wrote:
> > >
> > > Maybe unrelated, but this code does not handle the case when
> > > PyList_Append fails. If it fails, the object needs to be deallocated
> > > as usual. Looking at the macros, I don't see how you can do that
> > > because Py_TRASHCAN_SAFE_END, which calls the above function,
> > > occurs after the finalization code...
> >
> > Yes, it does not handle this case for the following reasons:
> > ...
>
> Not enough good reasons to segfault. I suggest you move the
> call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert
> the condition there.

Sorry, I don't see what you are suggesting; I'm distracted.
Maybe you want to submit a patch, with a few more words
on what you mean and why you prefer to core dump
with stack overflow?

I'm busy seeking a bug in the core, not in that ridiculous code.
Somewhere there is a real bug, probably the one which I have
sought many times before, when I got weird crashes in the small
block heap of Windows. It was never solved, and it was never clear
whether it was Python or Windows memory management.
Maybe we just found another entrance to it.
It smells so very familiar: many many small tuples and we crash.

busy-ly y'rs - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
RE: Crash in new "trashcan" mechanism.
To answer Chris' earlier question: no threads, no GUI, no events.
The "parser" module is the only builtin module (apart from the
obvious - ntpath etc).

Greg and/or Bill can correct me if I am wrong - it is just P2C, and
it is just console based, mainline procedural code. It _is_ highly
recursive tho (and I believe this will turn out to be the key factor
in the crash)

> Somewhere is a real bug, probably the one which I was
> seeking many time before, when I got weird crashes in the small
> block heap of Windows. It was never solved, and never clear if
> it was Python or Windows memory management.

I am confident that this problem was my fault, in that I was
releasing a different version of the MFC DLLs than I had actually
built with. At least everyone with a test case couldn't repro it
after the DLL update.

This new crash is so predictable and always with the same data that
I seriously doubt the problem is in any way related.

> Maybe we just found another entrance to this.
> It smells so very familiar: many many small tuples and we crash.

Lists this time, but I take your point. I've got a busy next few
days, so if it still exists after that I will put some more effort
into it.

Mark.
Re: Crash in new "trashcan" mechanism.
Christian Tismer wrote:
>
> Vladimir Marangozov wrote:
> >
> > Not enough good reasons to segfault. I suggest you move the
> > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert
> > the condition there.
>
> Sorry, I don't see what you are suggesting, I'm distracted.

I was thinking about the following. Change the macros in object.h from:

#define Py_TRASHCAN_SAFE_BEGIN(op) \
{ \
++_PyTrash_delete_nesting; \
if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \

#define Py_TRASHCAN_SAFE_END(op) \
;} \
else \
_PyTrash_deposit_object((PyObject*)op);\
--_PyTrash_delete_nesting; \
if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \
_PyTrash_destroy_list(); \
} \

to:

#define Py_TRASHCAN_SAFE_BEGIN(op) \
{ \
++_PyTrash_delete_nesting; \
if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \
_PyTrash_deposit_object((PyObject*)op) != 0) { \

#define Py_TRASHCAN_SAFE_END(op) \
;} \
--_PyTrash_delete_nesting; \
if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \
_PyTrash_destroy_list(); \
} \

where _PyTrash_deposit_object returns 0 on success, -1 on failure. This
gives another last chance to the system to finalize the object, hoping
that the stack won't overflow. :-)

My point is that it is better to control whether _PyTrash_deposit_object
succeeds or not (and it may fail because of PyList_Append).

If this doesn't sound acceptable (because of the possible stack overflow)
it would still be better to abort in _PyTrash_deposit_object with an
exception "stack overflow on recursive finalization" when PyList_Append
fails. Leaving it unchecked is not nice -- especially in such extreme
situations.

Currently, if something fails, the object is not finalized (leaking
memory). Ok, so be it. What's not nice is that this happens silently
which is not the kind of tolerance I would accept from the Python runtime.

As to the bug: it's curious that, as Mark reported, without the trashcan
logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;)
some erroneous situation. I'd expect that if the trashcan macros are
implemented as above, the crash will go away (which wouldn't solve the
problem and would obviate the trashcan in the first place :-)

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: Crash in new "trashcan" mechanism.
Of course, this

Vladimir Marangozov wrote:
>
> to:
>
> #define Py_TRASHCAN_SAFE_BEGIN(op) \
> { \
> ++_PyTrash_delete_nesting; \
> if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \
> _PyTrash_deposit_object((PyObject*)op) != 0) { \
>

was meant to be this:

#define Py_TRASHCAN_SAFE_BEGIN(op) \
{ \
++_PyTrash_delete_nesting; \
if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \
_PyTrash_deposit_object((PyObject*)op) != 0) { \

--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
Re: Crash in new "trashcan" mechanism.
Vladimir Marangozov wrote:
>
> Christian Tismer wrote:
> >
> > Vladimir Marangozov wrote:

[yup, good looking patch]

> where _PyTrash_deposit_object returns 0 on success, -1 on failure. This
> gives another last chance to the system to finalize the object, hoping
> that the stack won't overflow. :-)
>
> My point is that it is better to control whether _PyTrash_deposit_object
> succeeds or not (and it may fail because of PyList_Append).
> If this doesn't sound acceptable (because of the possible stack overflow)
> it would still be better to abort in _PyTrash_deposit_object with an
> exception "stack overflow on recursive finalization" when PyList_Append
> fails. Leaving it unchecked is not nice -- especially in such extreme
> situations.

You bet that I *would* raise an exception if I could. Unfortunately
the destructors have no way to report an error, and they are
always called in a context where no error is expected (Py_DECREF macro).

I believe this *was* quite ok, until __del__ was introduced.
After that, it looks to me like a design flaw.
IMHO there should not be a single function in a system that needs heap
memory, and cannot report an error.

> Currently, if something fails, the object is not finalized (leaking
> memory). Ok, so be it. What's not nice is that this happens silently
> which is not the kind of tolerance I would accept from the Python runtime.

Yes, but what can I do? This isn't worse than before: deletion errors
die silently, this is the current concept. I don't agree with it,
but I'm not the one to change policy. In that sense, trashcan
was just compliant with a concept, without saying it is a good concept. :-)

> As to the bug: it's curious that, as Mark reported, without the trashcan
> logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;)
> some erroneous situation. I'd expect that if the trashcan macros are
> implemented as above, the crash will go away (which wouldn't solve the
> problem and would obviate the trashcan in the first place :-)

I think trashcan can be made *way* smarter:

Much better would be to avoid memory allocation
in trashcan at all. I'm wondering if that would be possible.
The idea is to catch a couple of objects at an earlier recursion
level, and use them as containers for later objects-to-be-deleted.
Not using memory at all, that's what I want. And it would
avoid all messing with errors in this context.

I hate Java dying silently, since it has not enough memory
to tell me that it has not enough memory :-)

but-before-implementing-this-*I*-will-need-to-become-*way*-smarter -
ly y'rs - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
Re: Crash in new "trashcan" mechanism.
Mark,

I know you are very busy, but I have no chance to build
a debug version, and probably there are more differences.

Can you perhaps try Vlad's patch and tell me if the outcome
changes? This would give me much more insight.
The change affects the macros and the function _PyTrash_deposit_object,
which now must report an error via the return value.

The macro code should be:

#define Py_TRASHCAN_SAFE_BEGIN(op) \
{ \
++_PyTrash_delete_nesting; \
if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \
_PyTrash_deposit_object((PyObject*)op) != 0) { \

#define Py_TRASHCAN_SAFE_END(op) \
;} \
--_PyTrash_delete_nesting; \
if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \
_PyTrash_destroy_list(); \
} \

And the _PyTrash_deposit_object code should be (untested):

int
_PyTrash_deposit_object(op)
PyObject *op;
{
PyObject *error_type, *error_value, *error_traceback;
int result = -1;

if (PyThreadState_GET() != NULL)
PyErr_Fetch(&error_type, &error_value, &error_traceback);

if (!_PyTrash_delete_later)
_PyTrash_delete_later = PyList_New(0);
if (_PyTrash_delete_later)
result = PyList_Append(_PyTrash_delete_later, (PyObject *)op);

/* restore the saved exception state before reporting the result */
if (PyThreadState_GET() != NULL)
PyErr_Restore(error_type, error_value, error_traceback);
return result;
}


The result of this would be really enlightening :-)

ciao - chris


Vladimir Marangozov wrote:
>
> Of course, this
>
> Vladimir Marangozov wrote:
> >
> > to:
> >
> > #define Py_TRASHCAN_SAFE_BEGIN(op) \
> > { \
> > ++_PyTrash_delete_nesting; \
> > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \
> > _PyTrash_deposit_object((PyObject*)op) != 0) { \
> >
>
> was meant to be this:
>
> #define Py_TRASHCAN_SAFE_BEGIN(op) \
> { \
> ++_PyTrash_delete_nesting; \
> if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \
> _PyTrash_deposit_object((PyObject*)op) != 0) { \
>
> --
> Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
> http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
RE: Crash in new "trashcan" mechanism.
The trashcan bug turns out to be trivial to describe, but not so
trivial to fix. Put simply, the trashcan mechanism conflicts
horribly with Py_TRACE_REFS :-(

The problem stems from the fact that the trashcan resurrects
objects. An object added to the trashcan has its refcount at zero,
but is then added to the trash list, taking its refcount
back to 1. Deleting the trash list then does a second deallocation of
the object, again taking the refcount back to zero, and this time
actually doing the destruction. By pure fluke, this works without
Py_DEBUG defined!

With Py_DEBUG defined, this first causes problems due to ob_type
being NULL. _Py_Dealloc() sets the ob_type field to NULL before
it calls the object deallocator. Thus, the trash object first hits
a zero refcount, and its ob_type is zapped. It is then resurrected,
but the ob_type value remains NULL. When the second deallocation
of the object happens, this NULL type forces the crash.

Changing the Py_DEBUG version of _Py_Dealloc() to not zap the type
doesn't solve the problem. The whole _Py_ForgetReference()
linked-list management also dies: the second time we attempt to
deallocate the object, the code that removes it from the
"alive objects" linked list fails - the object was already removed
the first time around.

I see these possible solutions:

* The trash mechanism is changed to keep a list of (address,
deallocator) pairs. This is a "cleaner" solution, as the list is
not considered to hold PyObjects as such, just blocks of memory to
be freed with a custom deallocator. Thus, we never end up in a
position where Python objects are resurrected - we just defer the
actual memory deallocation, rather than attempting a delayed object
destruction. This may not be as trivial to implement as to describe
:-)

* Debug builds disable the trash mechanism. Not desired as the
basic behaviour of the interpreter will change, making bug tracking
with debug builds difficult! If we went this way, I would (try to
:-) insist that the Windows debug builds dropped Py_DEBUG, as I
really want to avoid the scenario that switching to a debug build
changes the behaviour to this extent.

* Perform further hacks, so that Py_ForgetReference() gracefully
handles NULL linked-list elements etc.

Any thoughts?

Mark.
RE: Crash in new "trashcan" mechanism.
On Thu, 13 Apr 2000, Mark Hammond wrote:
>...
> I see these possible solutions:
>
> * The trash mechanism is changed to keep a list of (address,
> deallocator) pairs. This is a "cleaner" solution, as the list is
> not considered to hold PyObjects as such, just blocks of memory to
> be freed with a custom deallocator. Thus, we never end up in a
> position where Python objects are resurrected - we just defer the
> actual memory deallocation, rather than attempting a delayed object
> destruction. This may not be as trivial to implement as to describe
> :-)
>
> * Debug builds disable the trash mechanism. Not desired as the
> basic behaviour of the interpreter will change, making bug tracking
> with debug builds difficult! If we went this way, I would (try to
> :-) insist that the Windows debug builds dropped Py_DEBUG, as I
> really want to avoid the scenario that switching to a debug build
> changes the behaviour to this extent.
>
> * Perform further hacks, so that Py_ForgetReference() gracefully
> handles NULL linked-list elements etc.
>
> Any thoughts?

Option 4: lose the trashcan mechanism. I don't think the free-threading
issue was ever resolved.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Re: Crash in new "trashcan" mechanism.
Greg Stein wrote:
>
> On Thu, 13 Apr 2000, Mark Hammond wrote:
> >...
> > I see these possible solutions:
> >
> > * The trash mechanism is changed to keep a list of (address,
> > deallocator) pairs. This is a "cleaner" solution, as the list is
> > not considered to hold PyObjects as such, just blocks of memory to
> > be freed with a custom deallocator. Thus, we never end up in a
> > position where Python objects are resurrected - we just defer the
> > actual memory deallocation, rather than attempting a delayed object
> > destruction. This may not be as trivial to implement as to describe
> > :-)

This one sounds quite hard to implement.

> > * Debug builds disable the trash mechanism. Not desired as the
> > basic behaviour of the interpreter will change, making bug tracking
> > with debug builds difficult! If we went this way, I would (try to
> > :-) insist that the Windows debug builds dropped Py_DEBUG, as I
> > really want to avoid the scenario that switching to a debug build
> > changes the behaviour to this extent.

I vote for this one at the moment.

> > * Perform further hacks, so that Py_ForgetReference() gracefully
> > handles NULL linked-list elements etc.
> >
> > Any thoughts?
>
> Option 4: lose the trashcan mechanism. I don't think the free-threading
> issue was ever resolved.

Option 5: Forget about free threading, change trashcan in a way
that it doesn't change the order of destruction, doesn't need
memory at all, and therefore does not change anything if it is
disabled in debug mode.

cheers - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com
Re: Crash in new "trashcan" mechanism.
On Thu, 13 Apr 2000, Christian Tismer wrote:
> Greg Stein wrote:
>...
> > Option 4: lose the trashcan mechanism. I don't think the free-threading
> > issue was ever resolved.
>
> Option 5: Forget about free threading, change trashcan in a way
> that it doesn't change the order of destruction, doesn't need
> memory at all, and therefore does not change anything if it is
> disabled in debug mode.

hehe... :-)

Definitely possible. Seems like you could just statically allocate an
array of PyObject* and drop the pointers in there (without an INCREF or
anything). Place them there, in order. Dunno about the debug stuff, and
how that would affect it.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Re: Crash in new "trashcan" mechanism.
Greg Stein wrote:
>
> On Thu, 13 Apr 2000, Christian Tismer wrote:
> > Greg Stein wrote:
> >...
> > > Option 4: lose the trashcan mechanism. I don't think the free-threading
> > > issue was ever resolved.
> >
> > Option 5: Forget about free threading, change trashcan in a way
> > that it doesn't change the order of destruction, doesn't need
> > memory at all, and therefore does not change anything if it is
> > disabled in debug mode.
>
> hehe... :-)
>
> Definitely possible. Seems like you could just statically allocate an
> array of PyObject* and drop the pointers in there (without an INCREF or
> anything). Place them there, in order. Dunno about the debug stuff, and
> how that would affect it.

I could even better use the given objects-to-be-destroyed
as an explicit stack. Similar to what the debug dealloc does,
I may abuse the type pointer as a stack pointer. Since
the refcount is zero, it can be abused to store a type code
(we have only 5 types to distinguish here), and there is enough
room for some state like a loop counter as well.

Given that, I can build a destructor without recursion, but
with an explicit stack and iteration. It would not interfere
with anything, since it actually does the same thing, just
in a different way, but in the same order, without mallocs.

Should I try it? (say no and I'll do it anyway:)

ciao - chris

--
Christian Tismer :^) <mailto:tismer@appliedbiometrics.com>
Applied Biometrics GmbH : Have a break! Take a ride on Python's
Kaunstr. 26 : *Starship* http://starship.python.net
14163 Berlin : PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF
where do you want to jump today? http://www.stackless.com