Mailing List Archive

Sub-interpreters: importing numpy causes hang
Hi all!

I am new to the list and arriving with a concrete problem that I'd
like to fix myself.

I am embedding Python (3.6) into my C++ application and I would like
to run Python scripts isolated from each other using sub-interpreters.
I am not using threads; everything is supposed to run in the
application's main thread.

I noticed that if I create a sub-interpreter, switch to it, and execute
code that imports numpy (1.13), my application hangs.

ntdll.dll!NtWaitForSingleObject() Unknown
KernelBase.dll!WaitForSingleObjectEx() Unknown
> python36.dll!_PyCOND_WAIT_MS(_PyCOND_T * cv=0x00000000748a67a0, _RTL_CRITICAL_SECTION * cs=0x00000000748a6778, unsigned long ms=5) Line 245 C
[Inline Frame] python36.dll!PyCOND_TIMEDWAIT(_PyCOND_T *) Line 275 C
python36.dll!take_gil(_ts * tstate=0x0000023251cbc260) Line 224 C
python36.dll!PyEval_RestoreThread(_ts * tstate=0x0000023251cbc260) Line 370 C
python36.dll!PyGILState_Ensure() Line 855 C
umath.cp36-win_amd64.pyd!00007ff8c6306ab2() Unknown
umath.cp36-win_amd64.pyd!00007ff8c630723c() Unknown
umath.cp36-win_amd64.pyd!00007ff8c6303a1d() Unknown
umath.cp36-win_amd64.pyd!00007ff8c63077c0() Unknown
umath.cp36-win_amd64.pyd!00007ff8c62ff926() Unknown
[Inline Frame] python36.dll!_PyObject_FastCallDict(_object *) Line 2316 C
[Inline Frame] python36.dll!_PyObject_FastCallKeywords(_object *) Line 2480 C
python36.dll!call_function(_object * * * pp_stack=0x00000048be5f5e40, __int64 oparg, _object * kwnames) Line 4822 C

Numpy's extension umath calls PyGILState_Ensure(), which in turn calls
PyEval_RestoreThread on the (auto) threadstate of the main
interpreter. And that's wrong.
We are already holding the GIL with the threadstate of our current
sub-interpreter, so there's no need to switch.
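
For reference, this is roughly what my application does - a minimal
sketch with error handling omitted; the import below is what ends up in
the stack trace above:

#include <Python.h>

int main()
{
    Py_Initialize();                            // main interpreter, GIL held
    PyThreadState *main_ts = PyThreadState_Get();

    PyThreadState *sub = Py_NewInterpreter();   // create and switch to a sub-interpreter
    PyRun_SimpleString("import numpy\n");       // hangs inside PyGILState_Ensure()

    Py_EndInterpreter(sub);                     // tear down the sub-interpreter
    PyThreadState_Swap(main_ts);                // switch back to the main interpreter
    Py_Finalize();
    return 0;
}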

I know that the GIL API is not fully compatible with sub-interpreters,
as issues #10915 and #15751 illustrate.

But since I need to support calls to PyGILState_Ensure - numpy is the
best example - I am trying to improve the situation here:
https://github.com/stephanreiter/cpython/commit/d9d3451b038af2820f500843b6a88f57270e1597

That change may be naive, but it does the trick for my use case. If it
is totally wrong, I don't mind pursuing another avenue.

Essentially, I'd like to ask for some guidance on how to tackle this
problem while keeping the current GIL API unchanged (to avoid breaking
modules).

I am also wondering how I can test any changes I am proposing. Is
there a test suite for interpreters, for example?

Thank you very much,
Stephan
Re: Sub-interpreters: importing numpy causes hang
On Tue, 22 Jan 2019 15:32:22 +0100
Stephan Reiter <stephan.reiter@gmail.com> wrote:
>
> Numpy's extension umath calls PyGILState_Ensure(), which in turn calls
> PyEval_RestoreThread on the (auto) threadstate of the main
> interpreter. And that's wrong.
> We are already holding the GIL with the threadstate of our current
> sub-interpreter, so there's no need to switch.
>
> I know that the GIL API is not fully compatible with sub-interpreters,
> as issues #10915 and #15751 illustrate.

That's a pity.

Note that there is a patch on https://bugs.python.org/issue10915 that
could probably solve the issue if it had been applied some years ago ;-)

(Yes, it needs C extension authors to use the new API, but Numpy is a
well-maintained library and would probably have accepted a patch for
that; so, probably, would Cython.)

> Essentially, I'd like to ask for some guidance on how to tackle this
> problem while keeping the current GIL API unchanged (to avoid breaking
> modules).

I'm not aware of any solution which does not require designing a new
API, unfortunately.

> I am also wondering how I can test any changes I am proposing. Is
> there a test suite for interpreters, for example?

You'll find a couple of them in test_embed.py, test_capi.py and
test_threading.py.

Regards

Antoine.


Re: Sub-interpreters: importing numpy causes hang
There are currently numerous incompatibilities between numpy and
subinterpreters, and no concrete plan for fixing them. The numpy team does
not consider subinterpreters to be a supported configuration, and can't
help you with any issues you run into. I know the concept of
subinterpreters is really appealing, but unfortunately the CPython
implementation is not really mature or widely supported... are you
absolutely certain you need to use subinterpreters for your application?

Re: Sub-interpreters: importing numpy causes hang
Thanks for the answers so far. I appreciate them!

Nathaniel, I'd like to allow Python plugins in my application. A
plugin should be allowed to bring its own modules along (i.e.
plugin-specific subdir is in sys.path when the plugin is active) and
hence some isolation of them will be needed, so that they can use
different versions of a given module. That's my main motivation for
using subinterpreters.
I thought about running plugins out-of-process - a separate process
for every plugin - and allowing them to communicate with my application
via RPC. But that makes it more complex to implement the API my
application will offer and will slow things down due to the need to
copy data.
Maybe you have another idea for me? :)

Henry, Antoine, thanks for your input; I'll check out the tests and
see what I can learn from issue 10915.

Stephan

Re: Sub-interpreters: importing numpy causes hang
On Tue, Jan 22, 2019 at 6:33 PM Stephan Reiter <stephan.reiter@gmail.com> wrote:
>
> Thanks for the answers so far. I appreciate them!
>
> Nathaniel, I'd like to allow Python plugins in my application. A
> plugin should be allowed to bring its own modules along (i.e.
> plugin-specific subdir is in sys.path when the plugin is active) and
> hence some isolation of them will be needed, so that they can use
> different versions of a given module. That's my main motivation for
> using subinterpreters.
> I thought about running plugins out-of-process - a separate process
> for every plugin - and allowing them to communicate with my application
> via RPC. But that makes it more complex to implement the API my
> application will offer and will slow things down due to the need to
> copy data.
> Maybe you have another idea for me? :)

Not really, sorry! I believe that most applications that support
Python plugins (like Blender, GIMP, LibreOffice, etc.) do it by using
a single shared environment for all plugins. This is also how every
application written in Python works, so at the ecosystem level there's
a lot of pressure on module authors to make it possible to assemble
them into a single coherent environment.

-n

--
Nathaniel J. Smith -- https://vorpus.org
Re: Sub-interpreters: importing numpy causes hang
On 1/23/19 3:33 AM, Stephan Reiter wrote:
> Thanks for the answers so far. I appreciate them!
>
> Nathaniel, I'd like to allow Python plugins in my application. A
> plugin should be allowed to bring its own modules along (i.e.
> plugin-specific subdir is in sys.path when the plugin is active) and
> hence some isolation of them will be needed, so that they can use
> different versions of a given module. That's my main motivation for
> using subinterpreters.
> I thought about running plugins out-of-process - a separate process
> for every plugin - and allowing them to communicate with my application
> via RPC. But that makes it more complex to implement the API my
> application will offer and will slow things down due to the need to
> copy data.
> Maybe you have another idea for me? :)

Try to make the plugins work together. Look into using pip/PyPI for your
plugins. Try to make it so each package ("plugin") would have only one
module/package, and dependencies would be other packages that can be
installed individually and shared. And keep in mind you can set up your
own package index, or distribute/install individual package files.

If that's not possible, and you want things to work now, go with subprocess.

If you want to help make subinterpreters work better, there are several
people scratching at the problem from different angles. Most/all would
welcome help, but don't expect any short-term benefits.
(FWIW, my own effort is currently blocked on PEP 580, and I hope to move
forward after a Council is elected.)


Re: Sub-interpreters: importing numpy causes hang
Hi!

Well, the plugins would be created by third parties and I'd like them
to be able to bundle modules with their plugins.
I am afraid of modules that have the same name but are actually
different, or of different plugins using different versions of the same
module. If plugins share an interpreter, the module with a given name
that is imported first sticks around forever, for all plugins.

I am thinking about this design:
- Plugins don't maintain state in their Python world. They expose
functions, my application calls them.
- Every time I call into them, they are presented with a clean global
namespace. After the call, the namespace (dict) is thrown away. That
releases any objects the plugin code has created. (A rough sketch of
this follows below.)
- So, then I could also actively unload modules they loaded. But I do
know that this is problematic in particular for modules that use
native code.
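
Roughly what I have in mind for the per-call namespace, as a sketch
against the C API (run_plugin_call is just an illustrative name, not an
existing function):

// Sketch: give the plugin a fresh global namespace for one call, then
// throw it away. No error reporting; any exception is left set for the
// caller to handle.
static int run_plugin_call(const char *code)
{
    PyObject *globals = PyDict_New();             // fresh, empty globals
    if (!globals)
        return -1;
    PyDict_SetItemString(globals, "__builtins__", PyEval_GetBuiltins());

    PyObject *result = PyRun_String(code, Py_file_input, globals, globals);
    Py_XDECREF(result);

    Py_DECREF(globals);                           // drop the namespace again
    return result ? 0 : -1;
}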

I am interested in both a short-term and a long-term solution.
Actually, making subinterpreters work better is pretty sexy ...
because it's hard. :-)

Stephan

Re: Sub-interpreters: importing numpy causes hang
Hi Stephan,

On Tue, Jan 22, 2019 at 9:25 AM Stephan Reiter <stephan.reiter@gmail.com> wrote:
> I am new to the list and arriving with a concrete problem that I'd
> like to fix myself.

That is great! Statements like that are a good way to get folks
interested in your success. :)

> I am embedding Python (3.6) into my C++ application and I would like
> to run Python scripts isolated from each other using sub-interpreters.
> I am not using threads; everything is supposed to run in the
> application's main thread.

FYI, running multiple interpreters in the same (e.g. main) thread
isn't as well thought out as running them in separate threads. There
may be assumptions in the runtime that could cause crashes or
inconsistencies, so be vigilant. Is there a reason not
to run the subinterpreters in separate threads?

Regarding isolation, keep in mind that there are some limitations. At
an intrinsic level subinterpreters are never truly isolated since they
run in the same process. This matters if you have concerns about
security (which you should always consider) and stability (if a
subinterpreter crashes then your whole process crashes). You can find
that complete isolation via subprocess & multiprocessing.

On top of intrinsic isolation, currently subinterpreters have gaps in
isolation that need fixing. For instance, they share a lot of
module-global state, as well as builtin types and singletons. So data
can leak between subinterpreters unexpectedly.
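
For example, any static C-level state in an extension is process-wide,
so every interpreter that imports the module sees the same data. A
contrived sketch (a hypothetical module, not code from any real
project):

#include <Python.h>

static int counter = 0;   /* one copy per process, shared by all (sub)interpreters */

static PyObject *bump(PyObject *self, PyObject *args)
{
    return PyLong_FromLong(++counter);
}

static PyMethodDef methods[] = {
    {"bump", bump, METH_NOARGS, "Increment and return the shared counter."},
    {NULL, NULL, 0, NULL},
};

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT, "sharedstate", NULL, -1, methods,
};

PyMODINIT_FUNC PyInit_sharedstate(void)
{
    return PyModule_Create(&moduledef);
}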

Finally, at the Python level subinterpreters don't have a good way to
pass data around. (I'm working on that. [1]) Naturally at the C
level you can keep pointers to objects and share data that way. Just
keep in mind that doing so relies on the GIL (in an
interpreter-per-thread scenario, which you're avoiding). In a world
where subinterpreters don't share the GIL [2] (and you're running one
interpreter per thread) you'll end up with refcounting races, leading
to crashes. Just keep that in mind if you decide to switch to
one-subinterpreter-per-thread.

On Tue, Jan 22, 2019 at 8:09 PM Stephan Reiter <stephan.reiter@gmail.com> wrote:
> Nathaniel, I'd like to allow Python plugins in my application. A
> plugin should be allowed to bring its own modules along (i.e.
> plugin-specific subdir is in sys.path when the plugin is active) and
> hence some isolation of them will be needed, so that they can use
> different versions of a given module. That's my main motivation for
> using subinterpreters.

That's an interesting approach. Using subinterpreters would indeed
give you isolation between the sets of imported modules.

As you noticed, you'll run into some problems when extension modules
are involved. There aren't any great workarounds yet.
Subinterpreters are tied pretty tightly to the core runtime so it's
hard to attack the problem from the outside. Furthermore,
subinterpreters aren't widely used yet so folks haven't been very
motivated to fix the runtime. (FWIW, that is changing.)

> I thought about running plugins out-of-process - a separate process
> for every plugin - and allowing them to communicate with my application
> via RPC. But that makes it more complex to implement the API my
> application will offer and will slow things down due to the need to
> copy data.

Yep. It might be worth it though. Note that running
plugins/extensions in separate processes is a fairly common approach
for a variety of solid technical reasons (e.g. security, stability).
FWIW, there are some tools available (or soon to be) for sharing data
more efficiently (e.g. shared memory in multiprocessing, PEP 574).

> Maybe you have another idea for me? :)

* single proc -- keep using subinterpreters
  + dlmopen or the Windows equivalent (I hesitate to suggest this
    hack, but it might help somewhat with extension modules)
  + help fix the problems with subinterpreters :)
* single proc -- no subinterpreters
  + import hook to put plugins in their own namespace (tricky with
    extension modules)
  + extend importlib to do the same
  + swap sys.modules in and out around plugin use (see the sketch
    below)
* multi-proc -- one process per plugin
  + subprocess
  + multiprocessing
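
For the sys.modules swap, something along these lines is what I have in
mind (a rough sketch with no error handling; note that dropping entries
from sys.modules can trigger module teardown, so this has sharp edges):

#include <Python.h>

/* Sketch: snapshot and restore sys.modules around one plugin call. */
static void call_plugin_isolated(void (*run_plugin)(void))
{
    PyObject *modules = PyImport_GetModuleDict();   /* borrowed ref to the live sys.modules */
    PyObject *saved = PyDict_Copy(modules);         /* snapshot before the plugin runs */

    run_plugin();                                   /* plugin may import whatever it likes */

    PyDict_Clear(modules);                          /* drop whatever the plugin imported... */
    PyDict_Update(modules, saved);                  /* ...and restore the snapshot */
    Py_DECREF(saved);
}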

On Wed, Jan 23, 2019 at 8:48 AM Stephan Reiter <stephan.reiter@gmail.com> wrote:
> Well, the plugins would be created by third parties and I'd like them
> to be able to bundle modules with their plugins.
> I am afraid of modules that have the same name but are actually
> different, or of different plugins using different versions of the same
> module. If plugins share an interpreter, the module with a given name
> that is imported first sticks around forever, for all plugins.
>
> I am thinking about this design:
> - Plugins don't maintain state in their Python world. They expose
> functions, my application calls them.
> - Every time I call into them, they are presented with a clean global
> namespace. After the call, the namespace (dict) is thrown away. That
> releases any objects the plugin code has created.
> - So, then I could also actively unload modules they loaded. But I do
> know that this is problematic in particular for modules that use
> native code.
>
> I am interested in both a short-term and a long-term solution.
> Actually, making subinterpreters work better is pretty sexy ...
> because it's hard. :-)

Petr noted that a number of people are working on getting
subinterpreters to a good place. That includes me. [1][2] :) We'd
welcome any help!

-eric


[1] https://www.python.org/dev/peps/pep-0554/
[2] https://github.com/ericsnowcurrently/multi-core-python
Re: Sub-interpreters: importing numpy causes hang
You all do make me feel very welcome in this community! Thank you very much! :-)

And thank you for all the thought and time you put into your message,
Eric. I do appreciate in particular all the alternatives you
presented; you provide a good picture of my options.
Not ruling out any of them, I'll stick with (single process + multiple
subinterpreters + plugins can't keep state in Python + all my Python
calls are performed on the main thread) for the time being. That's
quite a limited environment, which I hope I can make work in the long
run. And I think the concept of subinterpreters is nice and I'd like
to spend some time on the challenge of improving the situation.

So, I updated my changes and have the following on top of 3.6.1 at the moment:
https://github.com/stephanreiter/cpython/commit/c1afa0c8cdfab862f409f1c7ff02b189f5191cbe

I did what Henry suggested and ran the Python test suite. On Windows,
with my changes, I get the following output:

357 tests OK.

2 tests failed:
test_re test_subprocess

46 tests skipped:
test_bz2 test_crypt test_curses test_dbm_gnu test_dbm_ndbm
test_devpoll test_epoll test_fcntl test_fork1 test_gdb test_grp
test_idle test_ioctl test_kqueue test_lzma test_nis test_openpty
test_ossaudiodev test_pipes test_poll test_posix test_pty test_pwd
test_readline test_resource test_smtpnet test_socketserver
test_spwd test_sqlite test_ssl test_syslog test_tcl
test_threadsignals test_timeout test_tix test_tk test_ttk_guionly
test_ttk_textonly test_turtle test_urllib2net test_urllibnet
test_wait3 test_wait4 test_winsound test_xmlrpc_net test_zipfile64

Total duration: 6 min 20 sec
Tests result: FAILURE

I dropped my changes and ran the test suite again using vanilla Python
and got the same result.
So, it seems that the change doesn't break anything that is tested,
but that probably doesn't mean a lot.

Tomorrow, I'll investigate the following situation if I find time:

If we create a fresh OS thread and make it call PyGILState_Ensure, it
won't have a PyThreadState saved under autoTLSkey. That means it will
create one using the main interpreter. I, as the developer embedding
Python into my application and using multiple interpreters, have no
control here. Maybe I know that under current conditions a certain
other interpreter should be used.

I'll try to provoke this situation and then introduce a callback from
Python into my application that will allow me to specify which
interpreter should be used, e.g. code as follows:

PyInterpreter *pickAnInterpreter() {
    return activePlugin ? activePlugin->interpreter : nullptr;  // nullptr maps to main interpreter
}

PyGILState_SetNewThreadInterpreterSelectionCallback(&pickAnInterpreter);

Maybe rubbish. But I think it's a valuable experiment that will give me
a better understanding.

Stephan

Re: Sub-interpreters: importing numpy causes hang
If your primary concern is module clashes between plugins, maybe you
can hack around that:

1) if the plugins are providing copies of any other modules, then you
can simply require them to put them in their own namespace — that is,
a plug-in is a single package, with however many submodules it may
need.

2) if plugins might require third party packages that need to be
isolated, then maybe you could use an import hook that
re-names/isolates the modules each plugin loads, so they are kept
separate.

I haven’t thought through how to do any of this, but in principle, you
can have the same module loaded twice if it has a different name.


Not that sub interpreters aren’t cool and useful, but you can probably
handle module clashes in a simpler way.

-CHB



Sent from my iPhone

Re: Sub-interpreters: importing numpy causes hang
On Thu, 24 Jan 2019 at 05:45, Stephan Reiter <stephan.reiter@gmail.com> wrote:
> If we create a fresh OS thread and make it call PyGILState_Ensure, it
> won't have a PyThreadState saved under autoTLSkey. That means it will
> create one using the main interpreter. I, as the developer embedding
> Python into my application and using multiple interpreters, have no
> control here. Maybe I know that under current conditions a certain
> other interpreter should be used.
>
> I'll try to provoke this situation and then introduce a callback from
> Python into my application that will allow me to specify which
> interpreter should be used, e.g. code as follows:
>
> PyInterpreter *pickAnInterpreter() {
>     return activePlugin ? activePlugin->interpreter : nullptr;  // nullptr maps to main interpreter
> }
>
> PyGILState_SetNewThreadInterpreterSelectionCallback(&pickAnInterpreter);
>
> Maybe rubbish. But I think it's a valuable experiment that will give me
> a better understanding.

That actually sounds like a pretty plausible approach to me, at least
for cases where the embedding application maintains some other state
that lets it know which interpreter a new thread should be associated
with. The best aspect of it is that it would let the embedding
application decide how to handle registration of previously unknown
threads with the Python runtime *without* requiring that all existing
extension modules switch to a new thread registration API first.

I'll pass the concept along to Graham Dumpleton (author of the
mod_wsgi module for Apache httpd) to see if an interface like this
might be enough to resolve some of the major compatibility issues
mod_wsgi currently encounters with subinterpreters.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Re: Sub-interpreters: importing numpy causes hang
Cool. Thanks, Nick!

I did experiments based on this idea (
https://github.com/stephanreiter/cpython/commit/3bca91c26ac81e517b4aa22302be1741b3315622)
and haven't rejected it yet. :-)

Together with the other fix (
https://github.com/stephanreiter/cpython/commit/c1afa0c8cdfab862f409f1c7ff02b189f5191cbe),
numpy at least is happy in my Python-hosting app.

I will pursue the idea of swapping sys.modules in a single interpreter now
because that wouldn't require patching Python and I might get the mileage
I need out of this approach.

Still interested in improving sub-interpreters, though. I just need to
balance short- and long-term solutions. :-)

Stephan


Re: Sub-interpreters: importing numpy causes hang [ In reply to ]
On Sun, Jan 27, 2019, 06:34 Stephan Reiter <stephan.reiter@gmail.com> wrote:

> Cool. Thanks, Nick!
>
> I did experiments based on this idea (
> https://github.com/stephanreiter/cpython/commit/3bca91c26ac81e517b4aa22302be1741b3315622)
> and haven't rejected it yet. :-)
>
> Together with the other fix (
> https://github.com/stephanreiter/cpython/commit/c1afa0c8cdfab862f409f1c7ff02b189f5191cbe),
> numpy at least is happy in my Python-hosting app.
>

So again, just to make sure you're aware, even if it looks like it's
working right now, there are definitely many subtle ways that numpy will
break when used in a subinterpreter, and this configuration is not supported
by the numpy devs. If you discover later that there's some strange crash,
or even that you've been getting incorrect results for months without
noticing, then the numpy devs will be sympathetic but will probably close
your bugs without further investigation.

-n
Re: Sub-interpreters: importing numpy causes hang [ In reply to ]
On Thu, Jan 24, 2019 at 1:25 PM Chris Barker - NOAA Federal via Python-Dev <python-dev@python.org> wrote:

> If your primary concern is module clashes between plugins, maybe you
> can hack around that:
>
> 1) if the plugins are providing copies of any other modules, then you
> can simply require them to put them in their own namespace — that is,
> a plug-in is a single package, with however many sub modules as it may
> need.
>
> 2) if plugins might require third party packages that need to be
> isolated, then maybe you could use an import hook that
> re-names/isolates the modules each plugin loads, so they are kept
> separate.
>
> I haven’t thought through how to do any of this, but in principle, you
> can have the same module loaded twice if it has a different name.
>

This is dangerous for extension modules. C has a single global symbol
space, unrelated to Python module names, that cannot be isolated without
intentionally building and linking each desired extension module
statically, configured not to export its own symbols (no-export-dynamic).
Non-trivial.

Suggesting importing the same extension module multiple times under
different Python sys.modules names is a recipe for disaster. Most
extension module code is not written with that in mind. So while *some*
things happen to "work", many others blow up in unexpected, hard-to-debug
ways.
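
A contrived example of the kind of state involved (the module name here is
made up purely for illustration): a minimal extension that keeps its state in
a C static. The process only ever holds one copy of that static, no matter
how many sys.modules entries end up pointing at the loaded library.

  #include <Python.h>

  static long counter;   // one copy per process, not per sys.modules entry

  static PyObject *bump(PyObject *self, PyObject *unused) {
      return PyLong_FromLong(++counter);
  }

  static PyMethodDef methods[] = {
      {"bump", bump, METH_NOARGS, "Increment the process-wide counter."},
      {NULL, NULL, 0, NULL}
  };

  static struct PyModuleDef moduledef = {
      PyModuleDef_HEAD_INIT, "clashdemo", NULL, -1, methods
  };

  PyMODINIT_FUNC PyInit_clashdemo(void) {
      return PyModule_Create(&moduledef);
  }

Even if an import hook managed to register this under two different names,
both names would be backed by the same shared library and the same counter.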

> Not that sub interpreters aren't cool and useful, but you can probably
> handle module clashes in a simpler way.
>

They're a cool and useful theory... but I really do not recommend using
them for code that imports other libraries and expects them to be isolated.
CPython doesn't offer multiple isolated runtimes in a process today.

-gps
Re: Sub-interpreters: importing numpy causes hang [ In reply to ]
On Mon, 28 Jan 2019 at 00:32, Stephan Reiter <stephan.reiter@gmail.com> wrote:
>
> Cool. Thanks, Nick!
>
> I did experiments based on this idea (https://github.com/stephanreiter/cpython/commit/3bca91c26ac81e517b4aa22302be1741b3315622) and haven't rejected it yet. :-)

After talking to Graham about this, I unfortunately realised that the
reason the callback approach appears to work for you is that your
application is single-threaded, so you can readily map any invocation
of the callback to the desired interpreter. Multi-threaded applications
won't have that luxury - they need to be able to set the callback
target on a per-thread basis.
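
To make that concrete (everything below is hypothetical - no such callback
exists in CPython, and the behaviour sketched in the comments is assumed
rather than real): a per-thread variant would amount to the embedder recording
the intended interpreter in thread-local storage before a thread calls into
Python, which only helps for threads the embedder itself starts.

  #include <Python.h>

  // Hypothetical per-thread selection target; nothing like this exists today.
  // (Imagined as registered via something like Stephan's
  // PyGILState_SetNewThreadInterpreterSelectionCallback(&pick_interpreter).)
  static thread_local PyInterpreterState *target_interp = nullptr;

  static PyInterpreterState *pick_interpreter() {
      return target_interp;   // nullptr would mean "use the main interpreter"
  }

  // A worker thread started by the embedder can set its own target before
  // touching Python; a thread created behind the embedder's back by some
  // library cannot, which is why an explicit PyGILState_EnsureEx(interp)
  // style API (discussed below) scales better.
  void embedder_worker(PyInterpreterState *interp, void (*what)()) {
      target_interp = interp;
      PyGILState_STATE s = PyGILState_Ensure();   // would consult the callback
      what();
      PyGILState_Release(s);
  }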

Graham actually described a plausible approach for doing that several
years back: https://bugs.python.org/issue10915#msg126387

We have much better subinterpreter testing support now, so if this is
an area that you're interested in, one potential place to start would
be to get Antoine's patch back to a point where it applies and
compiles again.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Re: Sub-interpreters: importing numpy causes hang [ In reply to ]
Reading through that post, I think I have everything covered except for this part:
- The third and final scenario, and the one where the extended GIL
state functions for Ensure is still required, is where code doesn't
have the GIL as yet and wants to make a call into sub interpreter
rather than the main interpreter, where it already has a pointer to
the sub interpreter and nothing more. In this case the new
PyGILState_EnsureEx() function is used, with the sub interpreter being
passed as argument.

If I understand it correctly, it means the following in practice:
Whenever I or a third-party library start a new thread, we need to
query what interpreter we are running at the moment (in the thread
that is starting the new thread) and pass that information on to the
new thread so that it can initialize the GIL for itself.

Pseudo code ahead:
void do_in_thread(func_t *what) {
  PyThreadState *state = PyThreadState_Get(); // or a new PyInterpreterState_Current()
  PyInterpreterState *interpreter = state->interp;
  std::thread t([what, interpreter] {
    auto s = PyGILState_EnsureEx(interpreter);
    what();
    PyGILState_Release(s); // could also release before what() because TLS was
                           // updated and the next PyGILState_Ensure() will work
  });
  t.detach(); // or keep the handle and join later; a joinable thread must not
              // simply be destroyed
}

Did I get that right?

Stephan

On Mon, 28 Jan 2019 at 09:27, Nick Coghlan <ncoghlan@gmail.com> wrote:
>
> On Mon, 28 Jan 2019 at 00:32, Stephan Reiter <stephan.reiter@gmail.com> wrote:
> >
> > Cool. Thanks, Nick!
> >
> > I did experiments based on this idea (https://github.com/stephanreiter/cpython/commit/3bca91c26ac81e517b4aa22302be1741b3315622) and haven't rejected it yet. :-)
>
> After talking to Graham about this, I unfortunately realised that the
> reason the callback approach appears to work for you is that your
> application is single-threaded, so you can readily map any invocation
> of the callback to the desired interpreter. Multi-threaded applications
> won't have that luxury - they need to be able to set the callback
> target on a per-thread basis.
>
> Graham actually described a plausible approach for doing that several
> years back: https://bugs.python.org/issue10915#msg126387
>
> We have much better subinterpreter testing support now, so if this is
> an area that you're interested in, one potential place to start would
> be to get Antoine's patch back to a point where it applies and
> compiles again.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Re: Sub-interpreters: importing numpy causes hang [ In reply to ]
On Mon, 28 Jan 2019 at 19:36, Stephan Reiter <stephan.reiter@gmail.com> wrote:
>
> Reading through that post, I think I have everything covered but this here:
> - The third and final scenario, and the one where the extended GIL
> state functions for Ensure is still required, is where code doesn't
> have the GIL as yet and wants to make a call into sub interpreter
> rather than the main interpreter, where it already has a pointer to
> the sub interpreter and nothing more. In this case the new
> PyGILState_EnsureEx() function is used, with the sub interpreter being
> passed as argument.
>
> If I understand it correctly, it means the following in practice:
> Whenever I or a third-party library start a new thread, we need to
> query what interpreter we are running at the moment (in the thread
> that is starting the new thread) and pass that information on to the
> new thread so that it can initialize the GIL for itself.
>
> Pseudo code ahead:
> void do_in_thread(func_t *what) {
>   PyThreadState *state = PyThreadState_Get(); // or a new PyInterpreterState_Current()
>   PyInterpreterState *interpreter = state->interp;
>   std::thread t([what, interpreter] {
>     auto s = PyGILState_EnsureEx(interpreter);
>     what();
>     PyGILState_Release(s); // could also release before what() because TLS was
>                            // updated and the next PyGILState_Ensure() will work
>   });
>   t.detach(); // or keep the handle and join later; a joinable thread must not
>               // simply be destroyed
> }
>
> Did I get that right?

Yeah, I think that's the essence of it, although the other case that
can come up is when the parent thread just created a new
subinterpreter (that only changes how it acquires the pointer though -
the challenge of getting a child thread to make proper use of that
pointer remains the same).
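
Spelling that other case out as a sketch (PyGILState_EnsureEx is still the
hypothetical function from the patch discussed above, its exact signature is
assumed, and clean-up of the subinterpreter is omitted):

  #include <Python.h>
  #include <thread>

  // Assumes the caller already holds the GIL (e.g. with the main interpreter's
  // thread state). The parent created the subinterpreter itself, so it has the
  // PyInterpreterState* in hand without calling PyThreadState_Get().
  void run_in_fresh_subinterpreter(void (*what)()) {
      PyThreadState *sub = Py_NewInterpreter();    // also makes it current
      PyInterpreterState *interp = sub->interp;    // pointer handed to the child

      std::thread t([what, interp] {
          auto s = PyGILState_EnsureEx(interp);    // hypothetical API
          what();
          PyGILState_Release(s);
      });

      PyThreadState *saved = PyEval_SaveThread();  // drop the GIL so the child can take it
      t.join();
      PyEval_RestoreThread(saved);
      // Py_EndInterpreter(sub) and switching back to the main thread state
      // are omitted for brevity.
  }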

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia