Mailing List Archive

datetime module refactoring: folder vs parallel private modules
Hi all,

I was hoping to get some feedback on a proposed refactoring of the
datetime module that should dramatically improve import performance.

The datetime module is implemented more or less in full both in pure
Python and in C. The way this is currently achieved
<https://github.com/python/cpython/blob/5a2bac7fe0e7a2b67fd57c7a9176a50feed0d7a0/Lib/datetime.py#L7>
is that the pure Python implementation is defined in datetime.py and
the C implementation lives in _datetime. /After/ the full Python
version is defined, the C version is star-imported, so any symbols
defined in both versions are taken from the C version. If the C version
is used, any private symbols used only in the pure Python implementation
are manually deleted (see the end of the file
<https://github.com/python/cpython/blob/5a2bac7fe0e7a2b67fd57c7a9176a50feed0d7a0/Lib/datetime.py#L2503-L2522>).
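
To make the pattern concrete, here is a minimal sketch of the
define-then-override style described above. The module names
(`_fastshapes`, `shapes`) and everything in them are hypothetical
stand-ins invented for illustration, not the real _datetime or
datetime code; the module source is run through exec() only so that
the example is self-contained.

```python
import sys
import types

# Hypothetical stand-in for a C accelerator; this is NOT the real _datetime.
fast = types.ModuleType("_fastshapes")
fast.BACKEND = "c"
fast.area = lambda w, h: w * h
sys.modules["_fastshapes"] = fast

# Source of a hypothetical module written in the define-then-override style.
SHAPES_SOURCE = '''
BACKEND = "python"

def _check_positive(value):
    # Private helper needed only by the pure Python implementation.
    if value <= 0:
        raise ValueError("dimensions must be positive")
    return value

def area(w, h):
    return _check_positive(w) * _check_positive(h)

try:
    from _fastshapes import *      # rebinds every public name defined above
except ImportError:
    pass
else:
    del _check_positive            # private leftovers are removed by hand
'''

# Execute the source as the body of a fresh module object.
shapes = types.ModuleType("shapes")
exec(SHAPES_SOURCE, shapes.__dict__)
```

Note that the pure Python definitions are always executed, even though
they are thrown away as soon as the accelerator turns out to be
available.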

This adds a lot of unnecessary overhead, both to define a bunch of
unused classes and functions and to import modules that are required for
the pure Python implementation but not for the C implementation. In the
issue he created about this <https://bugs.python.org/issue40799>, Victor
Stinner demonstrated that moving the pure Python implementation to its
own module would speed up the import of datetime by a factor of 4.
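
One way to measure this kind of overhead yourself is sketched below. It
assumes CPython 3.7 or later, where the `-X importtime` option is
available, and it is not necessarily how the numbers in the issue were
obtained. The helper name `cumulative_import_time_us` is made up for
this example.

```python
import subprocess
import sys


def cumulative_import_time_us(module_name):
    """Return the cumulative import time, in microseconds, that a fresh
    interpreter reports for `module_name` under -X importtime."""
    completed = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes lines of the form
    # "import time: <self us> | <cumulative us> | <module>" to stderr.
    prefix = "import time:"
    for line in completed.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) == 3 and fields[2].strip() == module_name:
            return int(fields[1])
    raise LookupError(f"{module_name} not found in -X importtime output")


elapsed = cumulative_import_time_us("datetime")
```

A single run is noisy, so any comparison between two layouts should
repeat the measurement several times.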

I think that we should indeed move the pure Python implementation into
its own module, despite the fact that this is almost guaranteed to break
some people either relying on implementation details or doing something
funky with the import system — I don't think it should break anyone
relying on the guaranteed public interface. The issue at hand is that we
have two options available for the refactoring: either move the pure
Python implementation to its own private top-level module (single file)
such as `_pydatetime`, or make `datetime` a folder with an `__init__.py`
and move the pure Python implementation to `datetime._pydatetime` or
something of that nature.

The decimal and zoneinfo modules both have this same issue; the decimal
module uses the first strategy with _pydecimal and decimal, while the
zoneinfo module uses a folder with a zoneinfo._zoneinfo submodule.
Assuming we go forward with this, we need to decide which strategy to
adopt for datetime.

In favor of using a datetime/ folder, I'd say it's cleaner to put the
pure Python implementation of datetime under the datetime namespace, and
also it gives us more freedom to play with the module's structure in the
future, since we could have lazily-imported sub-components, or we could
implement some logic common to both implementations in Python and import
it from a `datetime._common` module without requiring the C version to
import the entire Python version, similar to the way zoneinfo has the
zoneinfo._common
<https://github.com/python/cpython/blob/master/Lib/zoneinfo/_common.py>
module.
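
As a rough illustration of the "lazily-imported sub-components" idea,
the sketch below builds a throwaway package in a temporary directory
and uses a module-level __getattr__ (PEP 562, Python 3.7+) to import a
submodule only on first use. The package name `lazydemo`, the
`_parser` submodule, and the `parse` function are all hypothetical;
nothing here reflects an actual or proposed datetime layout.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk (hypothetical names throughout).
root = Path(tempfile.mkdtemp())
pkg = root / "lazydemo"
pkg.mkdir()

(pkg / "__init__.py").write_text(textwrap.dedent('''
    import importlib

    _LAZY = {"parse": "._parser"}   # public name -> submodule defining it

    def __getattr__(name):
        # PEP 562: called only when normal module attribute lookup fails.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name], __name__)
            value = getattr(module, name)
            globals()[name] = value   # cache so later lookups are direct
            return value
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))

(pkg / "_parser.py").write_text(textwrap.dedent('''
    def parse(text):
        return tuple(int(part) for part in text.split("-"))
'''))

sys.path.insert(0, str(root))
importlib.invalidate_caches()
import lazydemo

loaded_before = "lazydemo._parser" in sys.modules   # submodule not imported yet
result = lazydemo.parse("2020-07-20")               # triggers the lazy import
loaded_after = "lazydemo._parser" in sys.modules
```

This only works when the top-level name is a package or a module that
defines __getattr__, which is part of the extra freedom (and extra
import machinery) that the folder layout brings.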

The downside of the folder method is that it complicates the way
datetime is imported — /especially/ if we add additional structure to
the module, or add any logic into the __init__.py. Two single-file
modules side by side, one imported by the other, don't change anything
about how the datetime module is imported, and are much less likely to
break anything.

Anyone have thoughts or strong preferences here? Anyone have use cases
where one approach or the other is likely to cause a bunch of undue
hardship? I'd like to avoid moving this more than once.

Best,
Paul

P.S. Victor's PR moving this code to _pydatetime
<https://github.com/python/cpython/pull/20472> is currently done in such
a way that the ability to backport changes from post-refactoring to
pre-refactoring branches is preserved; I have not checked but I /think/
we should be able to do the same thing with the other strategy as well.
Re: datetime module refactoring: folder vs parallel private modules
On 20.07.2020 20:58, Paul Ganssle wrote:
>
> Hi all,
>
> I was hoping to get some feedback on a proposed refactoring of the datetime module that should dramatically improve import performance.
>
> The datetime module is implemented more or less in full both in pure Python and in C; the way that this is currently achieved
> <https://github.com/python/cpython/blob/5a2bac7fe0e7a2b67fd57c7a9176a50feed0d7a0/Lib/datetime.py#L7> is that the pure Python
> implementation is defined in datetime.py, and the C implementation is in _datetime, and /after/ the full Python version is defined, the C
> version is star-imported and thus any symbols defined in both versions are taken from the C version; if the C version is used, any private
> symbols used only in the pure Python implementation are manually deleted (see the end of the file
> <https://github.com/python/cpython/blob/5a2bac7fe0e7a2b67fd57c7a9176a50feed0d7a0/Lib/datetime.py#L2503-L2522>).
>
> This adds a lot of unnecessary overhead, both to define a bunch of unused classes and functions and to import modules that are required
> for the pure Python implementation but not for the C implementation. In the issue he created about this
> <https://bugs.python.org/issue40799>, Victor Stinner demonstrated that moving the pure Python implementation to its own module would speed
> up the import of datetime by a factor of 4.
>
> I think that we should indeed move the pure Python implementation into its own module, despite the fact that this is almost guaranteed to
> break some people either relying on implementation details or doing something funky with the import system — I don't think it should break
> anyone relying on the guaranteed public interface. The issue at hand is that we have two options available for the refactoring: either
> move the pure Python implementation to its own private top-level module (single file) such as `_pydatetime`, or make `datetime` a folder
> with an `__init__.py` and move the pure Python implementation to `datetime._pydatetime` or something of that nature.
>

What's the problem with

try:
    from _datetime import *
except ImportError:
    ...  # everything else: the pure Python implementation

?

Though

try:
    from _datetime import *
except ImportError:
    from _pydatetime import *

Would be more maintainable I guess.
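
A minimal sketch of how such a fallback behaves, assuming two
hypothetical stand-in modules `_fastimpl` and `_pyimpl` (not the real
_datetime and _pydatetime). The facade source is run through exec() so
the example is self-contained, and putting None in sys.modules is used
to simulate a missing accelerator.

```python
import sys
import types

# Source of a hypothetical facade module using the try/except fallback.
FACADE_SOURCE = """
try:
    from _fastimpl import *
except ImportError:
    from _pyimpl import *
"""


def make_module(name, **attrs):
    """Create an in-memory module with the given attributes."""
    module = types.ModuleType(name)
    module.__dict__.update(attrs)
    return module


def load_facade():
    """Execute the facade source as the body of a fresh module object."""
    facade = types.ModuleType("facade")
    exec(FACADE_SOURCE, facade.__dict__)
    return facade


sys.modules["_pyimpl"] = make_module("_pyimpl", BACKEND="python")

# Case 1: the accelerator is importable, so its names win.
sys.modules["_fastimpl"] = make_module("_fastimpl", BACKEND="c")
with_accelerator = load_facade().BACKEND

# Case 2: a None entry in sys.modules makes the import raise ImportError,
# so the facade falls back to the pure Python module.
sys.modules["_fastimpl"] = None
without_accelerator = load_facade().BACKEND
```

With this shape the pure Python module is only imported when the
accelerator is missing. Keep in mind that a star-import skips names
starting with an underscore unless the module lists them in __all__.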


The same goes for `pickle` then.


> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CCI7PDAL6G67XVVRKPP2FAYJ5YZYHTK3/
> Code of Conduct: http://python.org/psf/codeofconduct/
--
Regards,
Ivan
Re: datetime module refactoring: folder vs parallel private modules
I would go with Ivan's second suggestion (_pydatetime.py). The Zen of
Python mentions "flat is better than nested" and a package seems overkill
here (I'm not sure why you chose a package for zoneinfo, but it looks like
it has a little more internal structure than a datetime package would have.)

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Re: datetime module refactoring: folder vs parallel private modules
On Mon, 20 Jul 2020 19:49:38 -0700
Guido van Rossum <guido@python.org> wrote:
> I would go with Ivan's second suggestion (_pydatetime.py). The Zen of
> Python mentions "flat is better than nested" and a package seems overkill
> here (I'm not sure why you chose a package for zoneinfo, but it looks like
> it has a little more internal structure than a datetime package would have.)

Just my two cents, but I agree with Guido.

Best regards

Antoine.

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YRV2FT7HUKB4K67OT2SNZAO335REFPTL/