PEP-0467: Minor API improvements for binary sequences
Looking for final comments before submitting for SC approval. It would be nice to finally get this resolved. :)

Full text follows.

---------------------------------------------------------------------------

PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
Status: Deferred
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Mar-2014
Python-Version: 3.9
Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01 2021-04-13


Abstract
========

During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes five small adjustments to the APIs of the ``bytes`` and
``bytearray`` types to make it easier to operate entirely in the binary domain:

* Discourage passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators

And one built-in:

* ``bchr``

PEP Deferral
============

This PEP has been deferred until Python 3.9 at the earliest, as the open
questions aren't currently expected to be resolved in time for the Python 3.8
feature addition deadline in May 2019 (if you're keen to see these changes
implemented and are willing to drive that resolution process, contact the PEP
authors).

Proposals
=========

Discourage use of current "zero-initialised sequence" behaviour
---------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to update the documentation in Python 3.10 to discourage
relying on that input-type-dependent behaviour, and to recommend the new, more
explicit ``bytes.fromsize(n)`` or ``bytearray.fromsize(n)`` spelling instead
(see next section).

However, the current handling of numeric inputs in the default constructors
would remain in place indefinitely to avoid introducing a compatibility break.

No other changes are proposed to the existing constructors.


Addition of explicit "count and byte initialised sequence" constructors
-----------------------------------------------------------------------

To replace the now discouraged behaviour, this PEP proposes the addition of an
explicit ``fromsize`` alternative constructor as a class method on both
``bytes`` and ``bytearray`` whose first argument is the count, and whose
second argument is the fill byte to use (defaults to ``\x00``)::

    >>> bytes.fromsize(3)
    b'\x00\x00\x00'
    >>> bytearray.fromsize(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.fromsize(5, b'\x0a')
    b'\x0a\x0a\x0a\x0a\x0a'
    >>> bytearray.fromsize(5, b'\x0a')
    bytearray(b'\x0a\x0a\x0a\x0a\x0a')

``fromsize`` will behave just as the current constructors behave when passed a
single integer, while allowing for non-zero fill values when needed.

Similar to ``str.center``, ``str.ljust``, and ``str.rjust``, both parameters
would be positional-only with no externally visible name.
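
As a rough illustration only, the proposed behaviour can be approximated today
with a helper function. The free function ``fromsize`` below is a hypothetical
stand-in (the proposal is a class method), and the ``ValueError`` for a fill
value that is not exactly one byte long is an assumption, since the text above
does not say how such input would be handled:

```python
def fromsize(cls, count, fill=b"\x00", /):
    """Hypothetical stand-in for the proposed ``cls.fromsize(count, fill)``."""
    if count < 0:
        # Mirrors bytes(-1), which raises ValueError today.
        raise ValueError("negative count")
    if len(fill) != 1:
        # Assumption: the fill value must be exactly one byte long.
        raise ValueError("fill must be a single byte")
    # Sequence repetition builds the filled buffer; cls() converts it to
    # the requested type (bytes or bytearray).
    return cls(bytes(fill) * count)


zeros = fromsize(bytes, 3)
newlines = fromsize(bytearray, 5, b"\x0a")
```

The trailing ``/`` makes both parameters positional-only, matching the
paragraph above.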


Addition of "bchr" function and explicit "single byte" constructors
-------------------------------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes
the addition of a ``bchr`` function and an explicit ``fromord`` alternative
constructor as a class method on both ``bytes`` and ``bytearray``::

    >>> bchr(ord("A"))
    b'A'
    >>> bchr(ord(b"A"))
    b'A'
    >>> bytes.fromord(65)
    b'A'
    >>> bytearray.fromord(65)
    bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.fromord(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: integer must be in range(0, 256)

    >>> bytes.fromord(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

While this does create some duplication, there are valid reasons for it:

* the ``bchr`` builtin is to recreate the ``ord``/``chr``/``unichr`` trio from
  Python 2 under a different naming scheme (however, see the Open Questions
  section below)
* the class method is mainly for the ``bytearray.fromord`` case, with
  ``bytes.fromord`` added for consistency

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bchr`` is the primary inverse operation for binary data, while ``chr``
is the inverse operation for text data, and that ``bytes.fromord`` and
``bytearray.fromord`` also exist.

Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like ``map``.
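
A minimal sketch of the ``map`` point, assuming a pure-Python stand-in for the
proposed builtin. The ``bchr`` defined here is hypothetical and simply wraps
the existing ``bytes([x])`` spelling, so its error messages may differ from
those shown above:

```python
def bchr(i, /):
    """Hypothetical pure-Python stand-in for the proposed ``bchr`` builtin."""
    # bytes([i]) already rejects values outside range(256) with ValueError
    # and non-integers such as floats with TypeError.
    return bytes([i])


# Iterating over bytes yields integers; mapping bchr over them gives
# length 1 bytes objects without needing a lambda.
single_bytes = list(map(bchr, b"ABC"))
```

Without a named callable, the same call would need something like
``map(lambda x: bytes([x]), data)``.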


Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
which will always return ``bytes``::

    >>> b'abc'.getbyte(0)
    b'a'

If an index is asked for that doesn't exist, ``IndexError`` is raised::

    >>> b'abc'.getbyte(9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range
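
For comparison with what is possible today, here is a sketch of a hypothetical
free function ``getbyte`` (not the proposed method itself) that shows the
intended semantics, next to the slicing idiom, which silently returns an empty
result for a missing index:

```python
def getbyte(data, index, /):
    """Hypothetical stand-in for the proposed ``data.getbyte(index)``."""
    # data[index] yields an integer and raises IndexError when the index
    # is out of range; wrapping it in a list gives a length 1 bytes object.
    return bytes([data[index]])


first = getbyte(b"abc", 0)
from_bytearray = getbyte(bytearray(b"abc"), 2)

# The slicing idiom never raises, so a bad index goes unnoticed.
sliced_out_of_range = b"abc"[9:10]
```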


Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised
``iterbytes`` method that produces length 1 ``bytes`` objects rather than
integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer
        ...

For example::

    >>> tuple(b"ABC".iterbytes())
    (b'A', b'B', b'C')
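
The iteration behaviour can be approximated in current Python with a
generator. The function name ``iterbytes`` below is a hypothetical stand-in
for the proposed method, and this unoptimised sketch says nothing about how
the real implementation would work:

```python
def iterbytes(data, /):
    """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
    for i in range(len(data)):
        # A length 1 slice keeps the value in the binary domain; bytes()
        # normalises bytearray slices to immutable bytes objects.
        yield bytes(data[i:i + 1])


as_ints = tuple(b"ABC")              # plain iteration yields integers
as_bytes = tuple(iterbytes(b"ABC"))  # the sketch yields bytes objects
```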


Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------

Zero-initialised sequences can be created via sequence repetition::

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable ``bytes`` type then inherited that feature
when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.fromord(x)`` proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity.
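
The ambiguity is easy to reproduce with the existing constructors. This short
example uses only current behaviour; the variable names are illustrative:

```python
x = b"A"[0]  # indexing bytes yields the integer 65

# Passing the integer straight back to the constructor does not round-trip:
# it is treated as a size, producing 65 zero bytes.
zero_filled = bytes(x)

# Wrapping the integer in a list gives the single byte that a reader might
# have expected from bytes(x).
single = bytes([x])
```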


Why use positional-only parameters?
-----------------------------------

This is for consistency with the other methods on the affected types, and to
avoid having to devise sensible names for them.


Open Issue: memoryview
======================

Updating ``memoryview`` with these new methods is outside the scope of this PEP.


References
==========

.. [1] Initial March 2014 discussion thread on python-ideas
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [2] Guido's initial feedback in that thread
(https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
(http://bugs.python.org/issue20895)
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
(http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev
(https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [6] June 2016 discussion thread on python-dev
(https://mail.python.org/pipermail/python-dev/2016-June/144875.html)


Copyright
=========

This document has been placed in the public domain.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/L4MNK4Y3OM6E4TVJSEW6T552HGJJNLVA/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP-0467: Minor API improvements for binary sequences
Thanks for this PEP! Most of these proposals would make for useful
improvements to the language. I have a few pieces of feedback below.

On Tue, 13 Apr 2021 at 14:14, Ethan Furman (<ethan@stoneleaf.us>)
wrote:

> This PEP has been deferred until Python 3.9 at the earliest, as the open
>
This should be 3.10 at least (and even that is pushing it by now).


> While this does create some duplication, there are valid reasons for it:
>
> * the ``bchr`` builtin is to recreate the ``ord``/``chr``/``unichr`` trio
>   from Python 2 under a different naming scheme (however, see the Open
>   Questions section below)
> * the class method is mainly for the ``bytearray.fromord`` case, with
>   ``bytes.fromord`` added for consistency

I don't see an "Open questions" section in this email (only an "Open
issues" section talking about memoryview).

I don't find the argument for a builtin very persuasive. Why is it
important to recreate the Python 2 trio? `bchr` is a more obscure name than
`bytes.fromord`. `bytes.fromord` is already short and doesn't require an
import, so we don't gain that much from the separate builtin.
Re: PEP-0467: Minor API improvements for binary sequences
On 4/13/21 3:01 PM, Jelle Zijlstra wrote:

> Thanks for this PEP! Most of these proposals would make for useful improvements to the language. I have a few pieces of
> feedback below.
>
> On Tue, 13 Apr 2021 at 14:14, Ethan Furman wrote:
>
>> This PEP has been deferred until Python 3.9 at the earliest, as the open
>
> This should be 3.10 at least (and even that is pushing it by now).

Ah, thanks -- fixed (and fingers crossed for 3.10 -- most of the code/tests are already written).

>> While this does create some duplication, there are valid reasons for it:
>>
>> * the ``bchr`` builtin is to recreate the ``ord``/``chr``/``unichr`` trio from
>>   Python 2 under a different naming scheme (however, see the Open Questions
>>   section below)
>> * the class method is mainly for the ``bytearray.fromord`` case, with
>>   ``bytes.fromord`` added for consistency
>
> I don't see an "Open questions" section in this email (only an "Open issues" section talking about memoryview).

Fixed (removed reference to Open questions).

> I don't find the argument for a builtin very persuasive. Why is it important to recreate the Python 2 trio? `bchr` is a
> more obscure name than `bytes.fromord`. `bytes.fromord` is already short and doesn't require an import, so we don't gain
> that much from the separate builtin.

`chr` and `ord` are builtins, so `bchr` fits right in. `bytes.fromord` is there to mirror `bytearray.fromord` and
facilitate duck-typing. What you are doing will affect which one you reach for. For me at least, reading code that
contains `bytes.fromord` puts too much emphasis on the type and method, whilst `bchr` has it just right. :-)

--
~Ethan~
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DX7GTNWYO36QQVNSN3BT3Z6QPG7SRXYA/
Re: PEP-0467: Minor API improvements for binary sequences
Thanks for picking this back up!

The deferral section can go away now that you're actively working on it
again, and +1 from me on the resolution of the previously open questions
(although I wouldn't be particularly upset if the SC considered bchr
redundant, given that "bchr = bytes.fromord" is a trivial alias in cases
where the shorter spelling is easier to read).

Cheers,
Nick.