Mailing List Archive

Importing a submodule doesn't always set an attribute on its parent
Hello,

I came across what seems like either a bug in the import system or a gap in its documentation, so I'd like to run it by folks here to see if I should submit a bug report. If there's somewhere else more appropriate to discuss this, please let me know.

If you import A.B, then remove A from sys.modules and import A.B again, the newly-loaded version of A will not contain an attribute referring to B. Using "collections.abc" as an example submodule from the standard library:

>>> import sys
>>> import collections.abc
>>> del sys.modules['collections']
>>> import collections.abc
>>> collections.abc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'collections' has no attribute 'abc'

This behavior seems quite counter-intuitive to me: why should the fact that B is already loaded prevent adding a reference to it to A? It also goes against the general principle that "import FOO" makes the expression "FOO" well-defined; for example PLR 5.7 states that "'import XXX.YYY.ZZZ' should expose 'XXX.YYY.ZZZ' as a usable expression". Finally, it violates the "invariant" stated in PLR 5.4.2 that if 'A' and 'A.B' both appear in sys.modules, then A.B must be defined and refer to sys.modules['A.B'].

On the other hand, PLR 5.4.2 also states that "when a submodule is loaded using any mechanism... a binding is placed in the parent module's namespace to the submodule object", which is consistent with the behavior above, since the second import of A.B does not actually "load" B (only retrieve it from the sys.modules cache). So perhaps Python is working as intended here, and there is an unwritten assumption that if you unload a module from the cache, you must also unload all of its submodules. If so, I think this needs to be added to the documentation (which currently places no restrictions on how you can modify sys.modules, as far as I can tell).
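The same sequence can be reproduced as a script with a throwaway package, which avoids depending on how any particular standard library package initializes itself. This is only a sketch: the names `demo_parent` and `child` are hypothetical and exist just for the illustration. It checks the three things discussed above: both names are cached, the submodule is not loaded a second time, and the fresh parent has no attribute for it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk. "demo_parent" and "child" are
# hypothetical names used only for this illustration.
root = Path(tempfile.mkdtemp())
(root / "demo_parent").mkdir()
(root / "demo_parent" / "__init__.py").write_text("")
(root / "demo_parent" / "child.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_parent.child
first_child = sys.modules["demo_parent.child"]

# Uncache only the parent, then import the submodule again.
del sys.modules["demo_parent"]
import demo_parent.child

# Both names are cached, and the submodule object was reused, not reloaded.
both_cached = "demo_parent" in sys.modules and "demo_parent.child" in sys.modules
child_reused = sys.modules["demo_parent.child"] is first_child
# The freshly loaded parent never received a "child" attribute.
parent_has_child = hasattr(demo_parent, "child")

print(both_cached, child_reused, parent_has_child)
```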

This may be an obscure corner case that is unlikely to come up in practice (I imagine few people need to modify sys.modules), but it did actually cause a bug in a project I work on, where it is necessary to uncache certain modules so that they can be reloaded. I was able to fix the bug some other way, but I think it would still be worthwhile to either make the import behavior more consistent (so that 'import A.B' always sets the B attribute of A) or add a warning in the documentation about this case. I'd appreciate any thoughts on this!

Thanks,
Daniel
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VIPXZRK3OJNSVNSZSAJ7CO6QFC2RX27W/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Importing a submodule doesn't always set an attribute on its parent
On Fri, Apr 8, 2022 at 4:38 PM dfremont--- via Python-Dev <
python-dev@python.org> wrote:

> Hello,
>
> I came across what seems like either a bug in the import system or a gap
> in its documentation, so I'd like to run it by folks here to see if I
> should submit a bug report. If there's somewhere else more appropriate to
> discuss this, please let me know.
>
> If you import A.B, then remove A from sys.modules and import A.B again,
> the newly-loaded version of A will not contain an attribute referring to B.
> Using "collections.abc" as an example submodule from the standard library:
>
> >>> import sys
> >>> import collections.abc
> >>> del sys.modules['collections']
> >>> import collections.abc
> >>> collections.abc
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: module 'collections' has no attribute 'abc'
>
> This behavior seems quite counter-intuitive to me: why should the fact
> that B is already loaded prevent adding a reference to it to A?


Because `"collections.abc" in sys.modules` is true. The import system
expects that if you already imported a module then everything that needed
to happen, happened. Basically you cheated by not doing a thorough cleaning
of sys.modules: you did not delete all the submodules as well.
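A "thorough cleaning" in this sense could look like the sketch below. `purge_package` is a hypothetical helper, not part of the standard library, and `purge_demo` is a throwaway package created only for the demonstration. Removing the parent together with every cached submodule forces the next import to load the submodule again, which is when the import system sets the attribute on the parent.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def purge_package(name: str) -> list[str]:
    """Remove `name` and every cached submodule of it from sys.modules.

    Hypothetical helper for illustration; returns the removed keys.
    """
    prefix = name + "."
    doomed = [key for key in list(sys.modules)
              if key == name or key.startswith(prefix)]
    for key in doomed:
        del sys.modules[key]
    return doomed


# Throwaway package used only for this demonstration.
root = Path(tempfile.mkdtemp())
(root / "purge_demo").mkdir()
(root / "purge_demo" / "__init__.py").write_text("")
(root / "purge_demo" / "child.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import purge_demo.child
removed = purge_package("purge_demo")
import purge_demo.child

# The submodule was loaded again, so the parent attribute was set.
rebound = purge_demo.child is sys.modules["purge_demo.child"]
print(sorted(removed), rebound)
```

The cost is the one raised later in the thread: every submodule is executed again, whether or not it needed reloading.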


> It also goes against the general principle that "import FOO" makes the
> expression "FOO" well-defined;


You're dealing with the import system; you never got to have a well-defined
statement to begin with.


> for example PLR 5.7 states that "'import XXX.YYY.ZZZ' should expose
> 'XXX.YYY.ZZZ' as a usable expression".


And it did. But then you went behind the curtain and moved stuff around.


> Finally, it violates the "invariant" stated in PLR 5.4.2 that if 'A' and
> 'A.B' both appear in sys.modules, then A.B must be defined and refer to
> sys.modules['A.B'].
>

That isn't an invariant that holds when you delete things outside of the
import system; that statement is what the import system *does*, not what
the import system guarantees to always be true.


>
> On the other hand, PLR 5.4.2 also states that "when a submodule is loaded
> using any mechanism... a binding is placed in the parent module's namespace
> to the submodule object", which is consistent with the behavior above,
> since the second import of A.B does not actually "load" B (only retrieve it
> from the sys.modules cache). So perhaps Python is working as intended here,
> and there is an unwritten assumption that if you unload a module from the
> cache, you must also unload all of its submodules. If so, I think this
> needs to be added to the documentation (which currently places no
> restrictions on how you can modify sys.modules, as far as I can tell).
>
> This may be an obscure corner case that is unlikely to come up in practice
> (I imagine few people need to modify sys.modules), but it did actually
> cause a bug in a project I work on, where it is necessary to uncache
> certain modules so that they can be reloaded. I was able to fix the bug
> some other way, but I think it would still be worthwhile to either make the
> import behavior more consistent (so that 'import A.B' always sets the B
> attribute of A) or add a warning in the documentation about this case. I'd
> appreciate any thoughts on this!
>

Feel free to propose some language to update the docs, but changing this
behaviour may very well have unintended consequences, so I would rather not
try to change it.
Re: Importing a submodule doesn't always set an attribute on its parent
On 4/8/2022 7:56 PM, Brett Cannon wrote:
>
> On Fri, Apr 8, 2022 at 4:38 PM dfremont--- via Python-Dev
> <python-dev@python.org> wrote:

> If you import A.B, then remove A from sys.modules and import A.B
> again, the newly-loaded version of A will not contain an attribute
> referring to B.
...
> for example PLR 5.7 states that "'import XXX.YYY.ZZZ' should expose
> 'XXX.YYY.ZZZ' as a usable expression".
>
> And it did. But then you went behind the curtain and moved stuff around.
>
> Finally, it violates the "invariant" stated in PLR 5.4.2 that if 'A'
> and 'A.B' both appear in sys.modules, then A.B must be defined and
> refer to sys.modules['A.B'].
>
> That isn't an invariant that holds when you delete things outside of the
> import system; that statement is what the import system /does/, not what
> the import system guarantees to always be true.
...
> Feel free to propose some language to update the docs,

Perhaps something intentionally vague like

"Manual deletion of entries from sys.modules may invalidate statements
above, even after re-imports."

or

"Manual deletion of entries from sys.modules may result in surprising
behavior, even after re-imports."

--
Terry Jan Reedy

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7CMQMWJJMM7RUDWUQXL3MW64KL4VW3P6/
Re: Importing a submodule doesn't always set an attribute on its parent
2022-04-09 04:24 UTC, Terry Reedy <tjreedy@udel.edu> wrote:
> Perhaps something intentionally vague like
>
> "Manual deletion of entries from sys.modules may invalidate statements
> above, even after re-imports."
>
> or
>
> "Manual deletion of entries from sys.modules may result in surprising
> behavior, even after re-imports."

Not only deletion, but also random assignments:

>>> import sys
>>> import collections.abc
>>> sys.modules['collections'] = 1
>>> import collections.abc
>>> collections.abc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'int' object has no attribute 'abc'
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IRXLI6XANNQTOGSBQGOFX25UJD6J4SGJ/
Re: Importing a submodule doesn't always set an attribute on its parent
Thanks, Brett. I understand why the behavior happens, I just don't understand the decision to implement imports this way. Since there's no warning in the documentation that removing items from sys.modules can break the fact that "import X.Y" defines "X.Y" (note that the "behind the curtain" stuff happens *before* the second import, so it's still the case that the second import does not define "X.Y" as implied by the docs), and there's also no warning that submodules must be removed at the same time as their parent, I would expect my example code to work.

I don't see any downside to having "import X.Y" always set the Y attribute of X (instead of only setting it if 'X.Y' is not already in sys.modules), but if you think it's a bad idea, here's a suggestion for a paragraph to add at the end of PLR 5.4.2:

"Note that the binding to the submodule object in the parent module's namespace is only added when the submodule is actually *loaded*. If the submodule is already present in `sys.modules` when it is imported (through any of the mechanisms above), then it will not be loaded again and no binding will be added to the parent module."
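Until the behavior or the documentation changes, user code can approximate "always set the attribute" on its own. The sketch below is one possible workaround, not how the import system itself works: `import_and_bind` is a hypothetical helper that imports each prefix of a dotted name and re-attaches every submodule to its parent, whether or not the submodule was already cached. `bind_demo` is a throwaway package for the demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_and_bind(dotted_name: str):
    """Import `dotted_name` and bind each submodule on its parent.

    Hypothetical helper. Like importlib.import_module, it returns the
    leaf module rather than the top-level package.
    """
    parts = dotted_name.split(".")
    parent = importlib.import_module(parts[0])
    for i in range(1, len(parts)):
        child = importlib.import_module(".".join(parts[: i + 1]))
        # Set the attribute even if the child came from the cache.
        setattr(parent, parts[i], child)
        parent = child
    return parent


# Throwaway package used only for this demonstration.
root = Path(tempfile.mkdtemp())
(root / "bind_demo").mkdir()
(root / "bind_demo" / "__init__.py").write_text("")
(root / "bind_demo" / "child.py").write_text("")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import bind_demo.child
first_child = sys.modules["bind_demo.child"]

del sys.modules["bind_demo"]          # uncache only the parent
leaf = import_and_bind("bind_demo.child")

new_parent = sys.modules["bind_demo"]
print(leaf is first_child, new_parent.child is leaf)
```

Importing the top-level name explicitly matters here: `importlib.import_module("bind_demo.child")` on its own returns the cached submodule without re-importing the missing parent.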

If removing a module but not its submodules from sys.modules is considered "cheating" and could potentially break other parts of the import system, that should also be documented, e.g. by adding the sentence "If you delete a key for a module in `sys.modules`, you must also delete the keys for all submodules of that module." at the end of the 3rd paragraph of PLR 5.3.1. However, I would much rather not impose this restriction, since it seems unnecessarily restrictive (indeed, my code violates it but works fine, and changing it to transitively remove all submodules would necessitate reloading many modules which do not actually need to be reloaded).

(Terry, thanks for your suggestion. My concern about adding such a vague warning is that to me, it reads as saying that all bets are off if you modify sys.modules by hand, which means it would never be safe to do so, i.e., the behavior might change arbitrarily in a future Python version. But in my opinion there are legitimate cases where it is necessary to ensure a module will be reloaded the next time it is imported, and the documented way to do that is to remove entries from sys.modules.)

Daniel
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/N763W6AGD6NQ4IXVWMNGDL4DBN3LXBJ7/
Re: Importing a submodule doesn't always set an attribute on its parent
> Not only deletion, but also random assignments:
>
> >>> import sys
> >>> import collections.abc
> >>> sys.modules['collections'] = 1
> >>> import collections.abc
> >>> collections.abc
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'int' object has no attribute 'abc'


Sure, but I hope people expect that kind of monkey patching to break things.

-CHB


--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Importing a submodule doesn't always set an attribute on its parent
On 4/9/2022 5:09 AM, Arfrever Frehtes Taifersar Arahesis wrote:
> 2022-04-09 04:24 UTC, Terry Reedy <tjreedy@udel.edu> wrote:
>> Perhaps something intentionally vague like
>>
>> "Manual deletion of entries from sys.modules may invalidate statements
>> above, even after re-imports."
>>
>> or
>>
>> "Manual deletion of entries from sys.modules may result in surprising
>> behavior, even after re-imports."
>
> Not only deletion, but also random assignments:

Ok. Change "Manual deletion of entries from sys.modules" to "Direct
manipulation of sys.modules"

Terry Jan Reedy
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/BKGWDKLV3WJCN4YFQ5LFKOKSKQAZDCNJ/
Re: Importing a submodule doesn't always set an attribute on its parent
On 4/9/2022 4:28 PM, Terry Reedy wrote:
> On 4/9/2022 5:09 AM, Arfrever Frehtes Taifersar Arahesis wrote:
>>
>> Not only deletion, but also random assignments:
>
> Ok.  Change "Manual deletion of entries from sys.modules" to "Direct
> manipulation of sys.modules"

I'm not sure it's worth the hassle to document this. There are no doubt
hundreds of ways to break things. We can't enumerate all of them.

Eric


Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WGP7SLEIEIDPJEOU3ANSRPGHWKLMM7QE/
Re: Importing a submodule doesn't always set an attribute on its parent
On Sun, 10 Apr 2022, 8:44 am Eric V. Smith, <eric@trueblade.com> wrote:

> On 4/9/2022 4:28 PM, Terry Reedy wrote:
> > On 4/9/2022 5:09 AM, Arfrever Frehtes Taifersar Arahesis wrote:
> >>
> >> Not only deletion, but also random assignments:
> >
> > Ok. Change "Manual deletion of entries from sys.modules" to "Direct
> > manipulation of sys.modules"
>
> I'm not sure it's worth the hassle to document this. There are no doubt
> hundreds of ways to break things. We can't enumerate all of them.
>


I thought we already explicitly said that if you directly alter the import
system's data stores then it's possible to break things in non-obvious ways.

If we don't, then it seems reasonable to make it explicit that the runtime
*isn't* implicitly enforcing the invariants the import system expects to
see, it is instead assuming that all code manipulating the data stores is
cooperating to ensure those invariants hold.
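Since the runtime does not enforce the parent/submodule invariant, code that manipulates `sys.modules` can check it for itself. The following is a minimal sketch under that assumption: `find_unbound_submodules` is a hypothetical helper, and the `inv_demo` modules are created in memory purely for illustration. It reads each parent's `__dict__` directly so that the check does not trigger a module-level `__getattr__`.

```python
import sys
import types

_MISSING = object()


def find_unbound_submodules() -> list[str]:
    """List cached 'A.B' names whose cached parent A has no attribute B
    referring to sys.modules['A.B']. Hypothetical helper for illustration.
    """
    broken = []
    for name, module in list(sys.modules.items()):
        parent_name, _, child_name = name.rpartition(".")
        if not parent_name or parent_name not in sys.modules:
            continue
        parent = sys.modules[parent_name]
        # Entries need not be real modules, so fall back to an empty namespace.
        namespace = getattr(parent, "__dict__", {})
        if namespace.get(child_name, _MISSING) is not module:
            broken.append(name)
    return broken


# Simulate the inconsistent state by inserting modules by hand.
parent = types.ModuleType("inv_demo")
child = types.ModuleType("inv_demo.child")
sys.modules["inv_demo"] = parent
sys.modules["inv_demo.child"] = child
before = "inv_demo.child" in find_unbound_submodules()

parent.child = child                  # restore the binding manually
after = "inv_demo.child" in find_unbound_submodules()

del sys.modules["inv_demo"], sys.modules["inv_demo.child"]
print(before, after)
```

In a real process the result may also list pairs that some library left unbound on purpose, so it is better treated as a diagnostic than as a hard assertion.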

As the OP noted, in the context of programming languages in general, rather
than Python in particular, it's a relatively unusual design choice to allow
developers the ability to inadvertently (or intentionally!) corrupt the
internal consistency of the import system's runtime state to the point
where legal code won't run anymore (even though it's aligned with Python's
generally permissive philosophy that allows even destructive actions like
clearing the builtin namespace).

Cheers,
Nick.


> Eric
>
>
Re: Importing a submodule doesn't always set an attribute on its parent
On Sat, Apr 9, 2022 at 1:53 PM dfremont--- via Python-Dev <
python-dev@python.org> wrote:

> Thanks, Brett. I understand why the behavior happens, I just don't
> understand the decision to implement imports this way. Since there's no
> warning in the documentation that removing items from sys.modules can break
> the fact that "import X.Y" defines "X.Y" (note that the "behind the
> curtain" stuff happens *before* the second import, so it's still the case
> that the second import does not define "X.Y" as implied by the docs), and
> there's also no warning that submodules must be removed at the same time as
> their parent, I would expect my example code to work.
>
> I don't see any downside to having "import X.Y" always set the Y attribute
> of X (instead of only setting it if 'X.Y' is not already in sys.modules),
> but if you think it's a bad idea, here's a suggestion for a paragraph to
> add at the end of PLR 5.4.2:
>
> "Note that the binding to the submodule object in the parent module's
> namespace is only added when the submodule is actually *loaded*. If the
> submodule is already present in `sys.modules` when it is imported (through
> any of the mechanisms above), then it will not be loaded again and no
> binding will be added to the parent module."
>

I don't want the import docs to be that detailed. As others have suggested,
something more about "directly mutating the contents of `sys.modules` may
have unexpected side-effects" is better.


>
> If removing a module but not its submodules from sys.modules is considered
> "cheating" and could potentially break other parts of the import system,
> that should also be documented, e.g. by adding the sentence "If you delete
> a key for a module in `sys.modules`, you must also delete the keys for all
> submodules of that module." at the end of the 3rd paragraph of PLR 5.3.1.
> However, I would much rather not impose this restriction, since it seems
> unnecessarily restrictive (indeed, my code violates it but works fine, and
> changing it to transitively remove all submodules would necessitate
> reloading many modules which do not actually need to be reloaded).
>
> (Terry, thanks for your suggestion. My concern about adding such a vague
> warning is that to me, it reads as saying that all bets are off if you
> modify sys.modules by hand, which means it would never be safe to do so,
> i.e., the behavior might change arbitrarily in a future Python version.


That's correct, and that's the reason Terry suggested that wording. If we
were to do the import system over again, sys.modules would either be hidden
from direct access, be attached to the code implementing the importer, or
have a leading underscore. So we don't want to strengthen the definition at
all; the best we are comfortable with is to put up a warning that you don't want
to do stuff with sys.modules unless you know what you're doing.


> But in my opinion there are legitimate cases where it is necessary to
> ensure a module will be reloaded the next time it is imported, and the
> documented way to do that is to remove entries from sys.modules.)
>

The Python import system is simply not designed around the idea of undoing
an import (the fact that imports have side-effects guarantees it will never
be 100% successful). Plus even using something like importlib.reload()
won't necessarily get you what you want since any object stored as an
attribute somewhere will not get updated.
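The limitation of importlib.reload() can be seen in a small sketch. `reload_demo` and `Widget` are hypothetical names for a throwaway module written to a temporary directory. Reloading re-executes the module's code in the same module object, so the module attribute is rebound to a new class while references saved elsewhere keep pointing at the old one.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway module used only for this demonstration.
root = Path(tempfile.mkdtemp())
(root / "reload_demo.py").write_text("class Widget:\n    pass\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import reload_demo

saved_class = reload_demo.Widget      # a reference stored "somewhere else"
saved_instance = saved_class()

importlib.reload(reload_demo)

# The module object is reused, but its contents were rebuilt.
same_module_object = sys.modules["reload_demo"] is reload_demo
class_replaced = reload_demo.Widget is not saved_class
# Objects created before the reload still belong to the old class.
instance_is_stale = not isinstance(saved_instance, reload_demo.Widget)

print(same_module_object, class_replaced, instance_is_stale)
```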
Re: Importing a submodule doesn't always set an attribute on its parent
Brett Cannon wrote:
> So we don't want to strengthen the definition at
> all; the best we are comfortable with is to put up a warning that you don't want
> to do stuff with sys.modules unless you know what you're doing.

OK, thanks for the clarification. Having read through the source of importlib one too many times, I guess I will declare that I "know what [I'm] doing" for now and keep on mutating sys.modules, since the alternative (intercepting all imports) seems more painful to me. If my code breaks in a future Python version I'll only blame myself :)

Best,
Daniel
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HZEPOAI3YME4GD2M6RPWG2KG4OTSB5KX/