Mailing List Archive

Deadlock when interrupting interpreter initialisation with ptrace?
Hi there

I hope you don't mind me sharing my experience with testing the
austinp variant of Austin with Python >=2.7,<3.11.

austinp is a variant of Austin
(https://github.com/P403n1x87/austin) for Linux that uses ptrace to
seize and interrupt/continue threads to capture native stack traces
using libunwind. During testing, I have discovered that there is a good
chance of causing what looks like a deadlock in Python if the seizing
and interrupting of threads happen very early after spawning a Python
subprocess from austinp. This seems to coincide with the
initialisation of the interpreter, when modules are being loaded. To
avoid interfering so destructively with Python, I have added a sleep
of about 0.5s after the fork to prevent sampling during this
initialisation phase, which has helped significantly.
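
For readers unfamiliar with the mechanism described above, here is a minimal sketch of a fork/exec followed by a start-up delay and one seize/interrupt/continue cycle on Linux. It is not Austin's actual code: the function names spawn_and_seize and interrupt_and_resume are hypothetical, only the main thread of the child is handled, and the point where a native stack would be unwound with libunwind is marked with a comment only.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork/exec argv, wait startup_ms milliseconds so the
 * child can get through its initialisation, then seize it with ptrace.
 * Returns the child's pid, or -1 with errno set. */
pid_t spawn_and_seize(char *const argv[], long startup_ms) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* The start-up delay (about 0.5 s in the message above). */
    struct timespec delay = { startup_ms / 1000, (startup_ms % 1000) * 1000000L };
    nanosleep(&delay, NULL);

    /* PTRACE_SEIZE attaches without stopping the tracee. */
    if (ptrace(PTRACE_SEIZE, pid, NULL, NULL) == -1) {
        int saved = errno;
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        errno = saved;
        return -1;
    }
    return pid;
}

/* Hypothetical helper: stop a seized thread, then let it run again.
 * Returns 0 on success, -1 on failure. */
int interrupt_and_resume(pid_t tid) {
    int status;

    if (ptrace(PTRACE_INTERRUPT, tid, NULL, NULL) == -1)
        return -1;
    /* Block until the tracee reports a stop; __WALL also covers threads. */
    if (waitpid(tid, &status, __WALL) == -1)
        return -1;
    if (!WIFSTOPPED(status))
        return -1; /* the thread exited or was killed instead of stopping */

    /* A profiler would unwind the native stack here (e.g. with libunwind). */

    /* Passing 0 as the signal drops any pending signal-delivery-stop; a real
     * tool would have to re-inject such signals. */
    if (ptrace(PTRACE_CONT, tid, NULL, NULL) == -1)
        return -1;
    return 0;
}
```

A caller would run spawn_and_seize once and then call interrupt_and_resume at every sampling tick. Whether tracing a child is permitted depends on the system's ptrace restrictions.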

However, I think this poses one question: is this behaviour from
Python to be expected or is it perhaps an indication of a potential
bug? Whilst I find it conceivable that something like this could
happen, given the locking that happens around imports, is it
acceptable that the pausing and resuming of the execution of a thread
lead to a potential deadlock?

Cheers,
Gabriele
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EWE5IK53IAME7ODZOGCQGSSP4YBE37YX/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Deadlock when interrupting interpreter initialisation with ptrace?
Hi Gabriele,

If all you are doing is pausing and resuming threads, there should be
no reason why this would interfere any more during interpreter
initialization than at any other time. The only thing I can think of is
that locking is much more common at this stage.

The other thing that could be at play is that ptrace sends SIGSTOP on
PTRACE_ATTACH, but that signal cannot be caught by the interpreter (or
any other process), so no signal handler should be at play either.

Do you know what is involved in the deadlock (as in, what the threads
are waiting on)?

Answering your questions directly:

> However, I think this poses one question: is this behaviour from
> Python to be expected or is it perhaps an indication of a potential
> bug?

It is neither expected nor unexpected, because it is not something we
support. It is not something we explicitly forbid either; it is just
that nothing in the design or the test suite ensures that this will
work.

> is it acceptable that the pausing and resuming of the execution of a
> thread lead to a potential deadlock?

It depends on whether this is something that we can control in a
reasonable way or whether it is outside our control. It may be a bug in
our code, in which case we can try to fix it, but without a more
concrete pointer that is going to be complicated, especially given that
this is more likely to be outside our control.

We would probably reject any proposal to add complexity to support this
use case, but we would likely be happy to make small changes if
something small that we do is preventing it.

Cheers from cloudy London,
Pablo Galindo Salgado


On Mon, 6 Jun 2022 at 15:38, Gabriele <phoenix1987@gmail.com> wrote:

> Hi there
>
> I hope you don't mind me sharing my experience with testing the
> austinp variant of Austin with Python >=2.7,<3.11.
>
> The austinp variant is a variant of Austin
> (https://github.com/P403n1x87/austin) for Linux that uses ptrace to
> seize and interrupt/continue threads to capture native stack traces
> using libunwind. During testing, I have discovered that there are good
> chances of causing what looks like a deadlock in Python if the seizing
> and interrupting of threads happen very early when spawning a Python
> subprocess from austinp. This seems to coincide with the
> initialisation of the interpreter when modules are being loaded. To
> avoid interfering so destructively with Python, I have added a sleep
> of about 0.5s on fork to prevent sampling during this initialisation
> phase, which has helped significantly.
>
> However, I think this poses one question: is this behaviour from
> Python to be expected or is it perhaps an indication of a potential
> bug? Whilst I find it conceivable that something like this could
> happen, given the locking that happens around imports, is it
> acceptable that the pausing and resuming of the execution of a thread
> lead to a potential deadlock?
>
> Cheers,
> Gabriele
Re: Deadlock when interrupting interpreter initialisation with ptrace?
On Mon, Jun 6, 2022 at 4:35 PM Gabriele <phoenix1987@gmail.com> wrote:
> The austinp variant is a variant of Austin
> (https://github.com/P403n1x87/austin) for Linux that uses ptrace to
> seize and interrupt/continue threads to capture native stack traces
> using libunwind. During testing, I have discovered that there are good
> chances of causing what looks like a deadlock in Python if the seizing
> and interrupting of threads happen very early when spawning a Python
> subprocess from austinp.

Do you have a backtrace of the Python main thread when the hang
happens? How do you spawn a new process? With the Python subprocess
module?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3OUS6BAV3B7A4BQBCLHLKUQMJM3PF646/
Re: Deadlock when interrupting interpreter initialisation with ptrace?
> Do you know what is involved in the deadlock (as in, what the threads are waiting on)?

I've found it hard to give an answer to this question. Because austinp
is already tracing the interpreter, I cannot use, e.g., gdb to dump a
backtrace. The event is also quite rare and it seems to happen before
austinp has the chance to capture any samples. With the new support
for 3.11 I might be able to see whether I come across the same issue
with the latest beta. I was hoping that the description of the issue
might ring a bell for anybody more familiar than me with all the
locking going on during imports. The logs from austinp seem to suggest
that the thread fails to resume after being interrupted, so something
for me to explore is whether attempting to resume the thread more
times before giving up is the actual solution in this case.

> How do you spawn a new process?

I should have clarified that this is just a plain fork/exec from C:
https://github.com/P403n1x87/austin/blob/e3d79ddc9f9737a791362e6962b5cac25a4e3dc2/src/py_proc.c#L972-L1010

Cheers,
Gabriele

On Mon, 6 Jun 2022 at 16:30, Victor Stinner <vstinner@python.org> wrote:
>
> On Mon, Jun 6, 2022 at 4:35 PM Gabriele <phoenix1987@gmail.com> wrote:
> > The austinp variant is a variant of Austin
> > (https://github.com/P403n1x87/austin) for Linux that uses ptrace to
> > seize and interrupt/continue threads to capture native stack traces
> > using libunwind. During testing, I have discovered that there are good
> > chances of causing what looks like a deadlock in Python if the seizing
> > and interrupting of threads happen very early when spawning a Python
> > subprocess from austinp.
>
> Do you have a backtrace of the Python main thread when the hang
> happens? How do you spawn a new process? With the Python subprocess
> module?
>
> Victor
> --
> Night gathers, and now my watch begins. It shall not end until my death.



--
"Egli è scritto in lingua matematica, e i caratteri son triangoli,
cerchi, ed altre figure
geometriche, senza i quali mezzi è impossibile a intenderne umanamente parola;
senza questi è un aggirarsi vanamente per un oscuro laberinto."

-- G. Galilei, Il saggiatore.
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DHRBK2WNXQJNKKQDHTGNSKRQ32MO365H/
Re: Deadlock when interrupting interpreter initialisation with ptrace?
> On 6 Jun 2022, at 17:52, Gabriele <phoenix1987@gmail.com> wrote:
>
> I've found it hard to give an answer to this question. Because austinp
> is already tracing the interpreter, I cannot use, e.g., gdb to dump a
> backtrace.


Don't you have the backtrace from libunwind that you could save from austinp itself?

Barry
Re: Deadlock when interrupting interpreter initialisation with ptrace?
> Don't you have the backtrace from libunwind that you could save from austinp itself?

Unfortunately not, as the "deadlock" happens before any samples have a
chance to be collected. Upon further investigation, it seems that
trying to resume a thread over and over when ptrace fails takes quite
"some" time (in fact, more than I'd have hoped). Playing with a larger
wait timeout (100 ms; the largest wait I've seen so far on my
machine is 4 ms, which is still an eternity compared to a sensible
sampling interval of 10 ms) seems to "cure" the problem, which I've
only seen during interpreter initialisation. So perhaps Python itself
is off the hook!
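
The idea of retrying the resume until a wait timeout expires can be sketched as follows. This is only an illustration under stated assumptions, not austinp's implementation: the resume operation is passed in as a callback (in a tracer it would wrap something like a PTRACE_CONT call), and the names resume_fn, resume_with_timeout and flaky_resume, as well as the 50 us pause between attempts, are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: returns 0 when the thread was resumed,
 * -1 when the attempt failed and may be retried. */
typedef int (*resume_fn)(void *ctx);

/* Microseconds elapsed since *start on the monotonic clock. */
static long elapsed_us(const struct timespec *start) {
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000000L
         + (now.tv_nsec - start->tv_nsec) / 1000L;
}

/* Keep calling resume(ctx) until it succeeds or timeout_us microseconds
 * have passed. Returns 0 on success, -1 with errno = ETIMEDOUT otherwise. */
int resume_with_timeout(resume_fn resume, void *ctx, long timeout_us) {
    const struct timespec pause = { 0, 50 * 1000 }; /* 50 us, arbitrary */
    struct timespec start;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        if (resume(ctx) == 0)
            return 0;
        nanosleep(&pause, NULL); /* back off briefly before retrying */
    } while (elapsed_us(&start) < timeout_us);

    errno = ETIMEDOUT;
    return -1;
}

/* Demonstration callback: fails until the counter behind ctx reaches zero. */
int flaky_resume(void *ctx) {
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return -1;
    }
    return 0;
}
```

With a counter of 3 and a generous timeout, resume_with_timeout(flaky_resume, &n, 1000000) succeeds on the fourth attempt. With a timeout too short for the number of failures, it gives up and reports ETIMEDOUT, which mirrors the case where the tracer stops waiting too early.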

On Mon, 6 Jun 2022 at 19:20, Barry Scott <barry@barrys-emacs.org> wrote:
>
>
>
> On 6 Jun 2022, at 17:52, Gabriele <phoenix1987@gmail.com> wrote:
>
> I've found it hard to give an answer to this question. Because austinp
> is already tracing the interpreter, I cannot use, e.g., gdb to dump a
> backtrace.
>
>
> Don't you have the backtrace from libunwind that you could save from austinp itself?
>
> Barry
>


Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Y2GHXXQGJ6CNHHIO6DVERKWKDRUCOQIR/