Mailing List Archive

PEP 467 feedback from the Steering Council
Hello Nick, Ethan,

The Python Steering Council reviewed PEP 467 -- Minor API improvements for binary sequences at our 2021-07-26 meeting.

Thank you for your work on this PEP. We’re generally very much in favor of adding to Python 3.11 the features and APIs described in the PEP. We have some requests for changes that we’d like you to consider.

* The Python-Version in the PEP needs to target Python 3.11 of course.

* We think it would be better if bytes.fromsize()’s second argument was a keyword-enabled or keyword-only argument. We understand the rationale given in the PEP for not doing so, but ultimately we think the readability of (at least allowing) a keyword argument to be more compelling. Some possible options include `fill`, `value`, or `byte`.

* We all really dislike the word “ord” as in `bytes.fromord()`. We understand the symmetry of this choice, but we also feel like we have an opportunity to make it more understandable, so we recommend `bytes.fromint()` and `bytearray.fromint()`.

* We think the `bchr()` built-in is not necessary. Given the `.fromint()` methods, it’s better not to duplicate the functionality, and everything has a cost. A built-in that exists only for the symmetry described in the PEP is just a little extra complexity for little value.

Let us know what you think about making these changes. We aren’t making acceptance contingent on these changes, but we do think they make the PEP and the new APIs better.

-Barry (on behalf of the Python Steering Council)
Re: PEP 467 feedback from the Steering Council
On Fri, 30 Jul 2021, 8:47 am Barry Warsaw, <barry@python.org> wrote:

>
> Hello Nick, Ethan,
>
> The Python Steering Council reviewed PEP 467 -- Minor API improvements for
> binary sequences at our 2021-07-26 meeting.
>
> Thank you for your work on this PEP. We’re generally very much in favor of
> adding to Python 3.11 the features and APIs described in the PEP.


Thank you!


>
> Let us know what you think about making these changes. We aren’t making
> acceptance contingent on these changes, but we do think they make the PEP
> and the new APIs better.
>


Those changes all sound reasonable to me, so if Ethan is also amenable, I
think we should incorporate them.

Cheers,
Nick.


> -Barry (on behalf of the Python Steering Council)
>
>
Re: PEP 467 feedback from the Steering Council
Thanks Nick. Ethan, what do you think?

-Barry

Re: PEP 467 feedback from the Steering Council
On 7/29/21 3:46 PM, Barry Warsaw wrote:

> We’re generally very much in favor of adding to Python 3.11 the features and APIs described
> in the PEP. We have some requests for changes that we’d like you to consider.
>
> * The Python-Version in the PEP needs to target Python 3.11 of course.

Done.

> * We think it would be better if bytes.fromsize()’s second argument was a keyword-enabled or keyword-only argument.
> We understand the rationale given in the PEP for not doing so, but ultimately we think the readability of (at least
> allowing) a keyword argument to be more compelling. Some possible options include `fill`, `value`, or `byte`.

Done, went with "fill" as an optional keyword argument.
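
As a rough illustration of the keyword form, the sketch below emulates the proposed constructor in current Python. `fromsize` here is a hypothetical stand-alone helper, not an existing `bytes` method, and treating `fill` as an integer in range(256) is an assumption; the PEP text is the authority on the real signature.

```python
def fromsize(size, fill=0):
    """Hypothetical stand-in for the proposed bytes.fromsize(size, fill=...)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    # bytes([fill]) validates that fill is in range(256).
    return bytes([fill]) * size


zeros = fromsize(4)              # same result as bytes(4)
padded = fromsize(4, fill=0xFF)  # the keyword names the second argument
```

With the keyword spelled out, a reader does not have to remember which positional argument is the length and which is the fill value.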

> * We all really dislike the word “ord” as in `bytes.fromord()`. We understand the symmetry of this choice, but we
> also feel like we have an opportunity to make it more understandable, so we recommend `bytes.fromint()` and
> `bytearray.fromint()`.

Done.

> * We think the `bchr()` built-in is not necessary. Given the `.fromint()` methods, it’s better not to duplicate the
> functionality, and everything has a cost. A built-in that exists only for the symmetry described in the PEP is just a
> little extra complexity for little value.

I would rather keep `bchr` and lose the `.fromint()` methods.

To get bytes:

some_var = bchr(65)
vs
some_var = bytes.fromint(65)

and for bytearrays

some_var = bytearray(bchr(65))
vs
some_var = bytearray.fromint(65)
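
Neither spelling exists in current Python releases, so for comparison here is a sketch of what both would presumably return, written with today's constructors. The helper name `byte_from_int` is hypothetical and is only a stand-in for the proposed `bchr(value)` or `bytes.fromint(value)`.

```python
def byte_from_int(value):
    """Hypothetical equivalent of the proposed bchr(value) / bytes.fromint(value)."""
    # bytes() over a one-element iterable yields a length-1 bytes object
    # and raises ValueError if value is outside range(256).
    return bytes([value])


as_bytes = byte_from_int(65)                  # a single byte, the ASCII letter "A"
as_bytearray = bytearray(byte_from_int(65))   # mutable copy of that single byte
```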


Let me know if I should drop `.fromint()`.

--
~Ethan~
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FUGXFVXV3SXPOX66KMAWKAJXIERB3JF4/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
Thanks for responding Ethan.

> On Aug 3, 2021, at 10:48, Ethan Furman <ethan@stoneleaf.us> wrote:
>
> I would rather keep `bchr` and lose the `.fromint()` methods.
>
> To get bytes:
>
> some_var = bchr(65)
> vs
> some_var = bytes.fromint(65)
>
> and for bytearrays
>
> some_var = bytearray(bchr(65))
> vs
> some_var = bytearray.fromint(65)

Can you provide some rationale for why you prefer bchr() over .fromint()?

Cheers,
-Barry
Re: PEP 467 feedback from the Steering Council
On 8/3/21 1:19 PM, Barry Warsaw wrote:

> Can you provide some rationale for why you prefer bchr() over .fromint()?

- `bchr` directly corresponds with `chr`

- `str` has no `fromint`

- `bytearray(bchr(int))` is roughly the same as `bytearray.fromint(int)`, but
`bchr(int)` for a bytes object is much nicer than `bytes.fromint(int)`

- possible confusion between `fromsize` and `fromint` (I know it would get me
from time to time)

--
~Ethan~
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5PDMVAVBELIPG23TOCTLEDA6DWYVYBAC/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:
> I would rather keep `bchr` and lose the `.fromint()` methods.

I would prefer to only have a bytes.byte(65) method, no bchr()
built-in function. I would prefer to keep builtins namespace as small
as possible.

bytes.byte() name is similar to bytes.getbyte(). I cannot find "int"
in the name of other bytes methods.

> some_var = bytearray(bchr(65))
> vs
> some_var = bytearray.fromint(65)

bytearray(bchr(65)) sounds less efficient.
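
One way to check that intuition is to time both shapes with `timeit`. This is only a sketch: since neither `bchr()` nor `bytearray.fromint()` exists today, `bytes([65])` and `bytearray([65])` are used as stand-ins (an assumption), and the numbers will vary by interpreter and machine.

```python
import timeit


def via_temporary_bytes():
    # Stand-in for bytearray(bchr(65)): build a bytes object, then copy it.
    return bytearray(bytes([65]))


def direct():
    # Stand-in for bytearray.fromint(65): no intermediate bytes object.
    return bytearray([65])


if __name__ == "__main__":
    for func in (via_temporary_bytes, direct):
        seconds = timeit.timeit(func, number=100_000)
        print(f"{func.__name__}: {seconds:.4f}s per 100,000 calls")
```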

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KBVVBJL2PHI55Y26Z4FMSCJPER242LFA/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
On Aug 4, 2021, at 07:31, Victor Stinner <vstinner@python.org> wrote:
>
> On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:
>> I would rather keep `bchr` and lose the `.fromint()` methods.
>
> I would prefer to only have a bytes.byte(65) method, no bchr()
> built-in function. I would prefer to keep builtins namespace as small
> as possible.

The Steering Council is also pretty adamantly against adding a new bchr() built-in.

> bytes.byte() name is similar to bytes.getbyte(). I cannot find "int"
> in the name of other bytes methods.

.byte() seems fine to me too. I’m not a fan of smushedwords but .fromint() seemed better than .fromord().

-Barry
Re: PEP 467 feedback from the Steering Council
I see in the PEP:

"the bchr builtin is to recreate the ord/chr/unichr trio from Python 2
under a different naming scheme"

Why recreate that trio? Shouldn't we be moving away from the
bytes-is-a-string concept here?

A byte is not a character -- why would the function that creates a byte
from an integer value be called bchr()? (short for "byte character",
presumably)

There are fewer and fewer people having to translate their code (or their
brains) from py2 to py3.

bytes.fromint() is just fine.

-CHB

BTW -- I really love the rest of the PEP -- it's been too awkward to work
with bytes for too long.





--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: PEP 467 feedback from the Steering Council
Christopher Barker writes:

> A byte is not a character

While I am -0.5 on bchr for many of the reasons already cited in the
thread (and would be -1 if the method names proposed for the feature
were a bit more aesthetic), I don't think this argument is valid.
Bytes that could otherwise be arbitrary (aka "magic numbers") are
*often* chosen because they correspond to the ASCII repertoire. And
the `strings` utility is still useful for C programmers, even if not so
much for others.

It's true that bytes are still bytes, characters are still characters,
and it's a very good thing from my point of view that Python 3 gave us
a consistent separation -- the only thing I ever explicitly use bytes
for is passwords for zipfiles, and the implicit handling of bytes
otherwise just works for me :-). But it turns out it was a mistake
to make it so hard for consenting adults to treat bytes as characters
in certain contexts (for example, PEP 461 -- note: I opposed that PEP
and I was wrong -- should have been part of Python 3.0).

Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZASN3G7VZBZSAYY6CH5GUDDENOZ6NJKR/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
On Fri, 6 Aug 2021 01:37:48 +0900
"Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
> Christopher Barker writes:
>
> > A byte is not a character
>
> While I am -0.5 on bchr for many of the reasons already cited in the
> thread (and would be -1 if the method names proposed for the feature
> were a bit more aesthetic), I don't think this argument is valid.
> Bytes that could otherwise be arbitrary (aka "magic numbers") are
> *often* chosen because they correspond to the ASCII repertoire. And
> the `strings` utility is still useful for C programmers, even if not so
> much for others.

In what context is `bchr()` useful?

Regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MLJRB4YJBSWPCC75UVFHHXDKMOQVEMZT/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
Antoine Pitrou writes:

> In what context is `bchr()` useful?

As a builtin, not my problem, I'm not the proponent. As a facility
with *some* spelling, it's convenient in contexts where chr() is, but
much less so (eg, coding ROT13 ;-). I know I've used this translation
in mail hacking, but I don't recall whether the code was Python or
Lisp.
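
For concreteness, here is a sketch of ROT13 over bytes in current Python. The function names are made up for illustration. The first version uses `bytes.maketrans`, which avoids any int-to-byte conversion; the second builds the result one integer at a time, which is where a `bchr`-like facility would come into play.

```python
import string

# Translation table mapping each ASCII letter to the letter 13 places on.
_LOWER = string.ascii_lowercase.encode("ascii")
_UPPER = string.ascii_uppercase.encode("ascii")
_ROT13 = bytes.maketrans(
    _LOWER + _UPPER,
    _LOWER[13:] + _LOWER[:13] + _UPPER[13:] + _UPPER[:13],
)


def rot13(data):
    """Apply ROT13 to the ASCII letters in a bytes object."""
    return data.translate(_ROT13)


def rot13_per_byte(data):
    """Same result, built one byte at a time from integer values."""
    out = bytearray()
    for value in data:  # iterating over bytes yields ints
        if 97 <= value <= 122:      # a-z
            value = (value - 97 + 13) % 26 + 97
        elif 65 <= value <= 90:     # A-Z
            value = (value - 65 + 13) % 26 + 65
        out += bytes([value])       # the int -> single byte step under discussion
    return bytes(out)
```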

Regards,
Steve

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VBZBMCPQ6IR3SZIEALQQFXGLUMXY7RUT/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
On Fri, Aug 6, 2021 at 12:23 PM Stephen J. Turnbull
<turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
> As a builtin, not my problem, I'm not the proponent. As a facility
> with *some* spelling, it's convenient in contexts where chr() is, but
> much less so (eg, coding ROT13 ;-). I know I've used this translation
> in mail hacking, but I don't recall whether the code was Python or
> Lisp.

Stephen does not advocate bchr() as a built-in or library function,
but he just gave a great reason why it should not be a built-in: it's
hard to find compelling and common use cases.

A built-in should be one or more of:

- extremely useful in daily coding, like len() and list() etc.
- foundational, like next(), classmethod() and iter() etc.
- hard to create in Python code, like breakpoint(), compile() etc.

super() is an example that fits all of those groups.

bchr() (or whatever name it might have) fits none.

Cheers,

Luciano






--
Luciano Ramalho
| Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
| Technical Principal at ThoughtWorks
| Twitter: @ramalhoorg
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/W2OLE2D4JP5KXR2U4I6W5RDQZH322LEO/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
I recommend removing the "discouragement" from writing "bytes(10)". That is merely stylistic. As long as we support the API, it is valid Python. In the contexts where it is currently used, it tends to be clear about what it is doing: buffer = bytearray(bufsize). That doesn't need to be discouraged.
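
A small sketch of that buffer idiom, using an in-memory stream so it is self-contained (the variable names and the 8-byte size are arbitrary choices for illustration):

```python
import io

bufsize = 8
buffer = bytearray(bufsize)          # zero-filled, mutable, fixed size

stream = io.BytesIO(b"hello")
count = stream.readinto(buffer)      # fills the buffer in place, returns bytes read

payload = bytes(buffer[:count])      # only the part that was actually written
```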

Also, I concur with the SC comment that the singular of bytearray() or bytes() is byte(), not bchr(). Practically what people want here is an efficient literal that is easier to write than: b'\x1F'. I don't think bchr() meets that need. Neither bchr(0x1f) nor bytearray.fromint(0x1f) is fast (neither is a literal), nor is either easier to read or type.

The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
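
Both views can be seen on the same object in current Python; a short sketch:

```python
data = b"abc"

# String-like view: text-style methods and a text-style repr.
upper = data.upper()
text_repr = repr(data)

# Array-of-integers view: indexing and iteration give ints.
first = data[0]
values = list(data)

# Slicing, unlike indexing, stays in the bytes domain.
first_as_bytes = data[0:1]
```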

This is disappointing because often the only reasonable way to manipulate binary data is with bytearrays. A user could switch to array.array() or a numpy.array, but that is unnecessarily inconvenient given that we already have a nice builtin type that meets the need (for images, crypto hashes, compression, bloom filters, or anything where a C programmer would use an array of unsigned chars).
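
As one example of the array-of-numbers use, here is a minimal sketch of a fixed-size bit set of the kind a bloom filter might sit on top of. The `BitSet` class is hypothetical and written only for illustration; it does no bounds checking beyond what bytearray indexing provides.

```python
class BitSet:
    """Fixed-size bit set backed by a bytearray (one bit per member)."""

    def __init__(self, nbits):
        self._bits = bytearray((nbits + 7) // 8)  # zero-filled storage

    def add(self, index):
        # Set bit (index % 8) of byte (index // 8).
        self._bits[index >> 3] |= 1 << (index & 7)

    def __contains__(self, index):
        return bool(self._bits[index >> 3] & (1 << (index & 7)))


seen = BitSet(64)
seen.add(10)
seen.add(63)
```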

Given that bytes/bytearray is already an uncomfortable hybrid of string and list APIs for binary data, I don't think the competing views and APIs will be disentangled by adding methods that duplicate functionality that already exists. Instead, I recommend that the PEP focus on one or two cases where methods could be added that simplify any common tasks that are currently awkward. For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OKILIXKK7F6BHDRTFRGUFXXUDNNZW3BL/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:

> The history of bytes/bytearray is a dual-purpose view. It can be used in
> a string-like way to emulate Python 2 string handling (hence all the usual
> string methods and a repr that displays in a string-like fashion). It can
> also be used as an array of numbers, 0 to 255 (hence the list methods and
> having an iterator of ints). ISTM that the authors of this PEP reject or
> want to discourage the latter use cases.
>

I didn't read it that way, but if so, please no, I'd rather see the former
use cases discouraged. ISTM that the Py2 string handling is still needed
for working with mixed binary / text data -- but that should be a pretty
specialized use case. Spelling the way to create a byte as byte() sure makes
more sense in any other context.


> ... anything where a C programmer would use an array of unsigned chars).
>

or any programmer would use an array of unsigned 8-bit integers :-) numpy
spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`.
My point is that for anyone not an "old time" C programmer, or even a
Python2 programmer, the "character is an unsigned 8 bit int" concept is
alien and confusing, not a helpful mnemonic.


> For example, creating a single byte with bytes([0x1f]) isn't pleasant,
> obvious, or fast.
>

no, though bytes([31]) isn't horrible ;-) (despite coding for over four
decades, I'm still not comfortable with hex notation)

I say it's not horrible because bytes is a Sequence of bytes (or integer
values between 0 and 255), so initializing it with an iterable seems pretty
reasonable; that's how we initialize most (all?) other sequences after all.
It is also compatible with array.array and numpy arrays.
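
A short sketch of that consistency, using only the standard library (numpy is left out so the example stays self-contained):

```python
import array

values = [31, 32, 33]

as_list = list(values)
as_tuple = tuple(values)
as_bytes = bytes(values)
as_bytearray = bytearray(values)
as_array = array.array("B", values)   # "B" is the unsigned char typecode
```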

-CHB


--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: PEP 467 feedback from the Steering Council
Barry Warsaw wrote:
> On Aug 4, 2021, at 07:31, Victor Stinner vstinner@python.org wrote:
> > On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman ethan@stoneleaf.us wrote:
> > I would rather keep `bchr` and lose the `.fromint()` methods.
> > I would prefer to only have a bytes.byte(65) method, no bchr()
> > built-in function. I would prefer to keep builtins namespace as small
> > as possible.
> > The Steering Council is also pretty adamantly against adding a new bchr() built-in.

FYI the PEP still mentions `bchr`.

-Brett

> > bytes.byte() name is similar to bytes.getbyte(). I cannot find "int"
> > in the name of other bytes methods.
> > .byte() seems fine to me too. I’m not a fan of smushedwords but .fromint() seemed better than .fromord().
> -Barry
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5XGSSSTSDGG3QSEQSC2IUVNEAKWKLQXS/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council
On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <pythonchb@gmail.com>
wrote:

> On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
>
>> The history of bytes/bytearray is a dual-purpose view. It can be used in
>> a string-like way to emulate Python 2 string handling (hence all the usual
>> string methods and a repr that displays in a string-like fashion). It can
>> also be used as an array of numbers, 0 to 255 (hence the list methods and
>> having an iterator of ints). ISTM that the authors of this PEP reject or
>> want to discourage the latter use cases.
>>
>
> I didn't read it that way, but if so, please no, I'd rather see the former
> use cases discouraged. ISTM that the Py2 string handling is still needed
> for working with mixed binary / text data -- but that should be a pretty
> specialized use case. spelling the way to create a byte, byte() sure makes
> more sense in any other context.
>
>
>> ... anything where a C programmer would use an array of unsigned chars).
>>
>
> or any programmer would use an array of unsigned 8bit integers :-) numpy
> spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`.
> My point is that for anyone not an "old time" C programmer, or even a
> Python2 programmer, the "character is an unsigned 8 bit int" concept is
> alien and confusing, not a helpful mnemonic.
>
>
>> For example, creating a single byte with bytes([0x1f]) isn't pleasant,
>> obvious, or fast.
>>
>
> no, though bytes([31]) isn't horrible ;-) (despite coding for over four
> decades, I'm still not comfortable with hex notation)
>
> I say it's not horrible, because bytes is a Sequence of bytes (or integer
> values between 0 and 255), initializing it with an iterable seems pretty
> reasonable, that's how we initialize most (all?) other sequences after all.
> And compatible with array.array and numpy arrays.
>

I consider bytes([31]) notation to be horrible API design because a simple
easy to make typo of omitting the [] or using () and forgetting the
tupleizing comma turns it into a different valid call with an entirely
different meaning. bytes([31]) vs bytes((31)) vs bytes(31).
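
The three spellings behave as follows in current Python; a minimal sketch:

```python
single = bytes([31])        # one byte with value 31
also_single = bytes((31,))  # a one-element tuple works the same way

# (31) is just the integer 31, so both of these are the size constructor:
no_comma = bytes((31))
no_brackets = bytes(31)
```

The last two calls raise no error; they silently produce 31 zero bytes, which is what makes the typo easy to miss.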

It's also ugly to anyone who thinks about what bytecode is generated and
executed in order to do it. An entire new list object with a single
element referring to a tiny int is created and destroyed just to create a
b'\037' object? An optimizer pass to fix that up at the bytecode level
isn't easy, as it can only be done when it can prove that `bytes` has not
been reassigned to something other than the builtin. That is near impossible
in a lot of code. bytes.fromint(31) isn't much better in the bytecode regard,
but at least a temporary list is not being created.
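
To look at the bytecode in question, the standard `dis` module can be used. This is a sketch for inspection only; the exact instruction names and order differ between CPython versions, so no particular listing is claimed here. The function name `make_byte` is arbitrary.

```python
import dis


def make_byte():
    return bytes([31])


# Collect the opcode names so they can be inspected programmatically;
# the exact instruction sequence varies between CPython versions.
opnames = [ins.opname for ins in dis.get_instructions(make_byte)]

if __name__ == "__main__":
    dis.dis(make_byte)
```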

As much as I think that bytes(size: int) was a bad idea to have as an API -
bytearray(size: int) is fine and useful as it is mutable - that ship sailed
and getting rid of it would break some odd code. It doesn't have much use,
so adding fromsize(size: int) methods doesn't sound very compelling, as it
just adds yet another way to do the same thing. We should just live with
that specific wart.

`bchr` as a builtin... I'm with the others on saying no to any new builtin
that isn't expected to see frequent use. bchr won't see frequent use.

`bytes.fromint` seems fine. Others are proposing `bytes.byte` for that. I
don't *like* to argue over names (the last stage of anything) but I do need
to point out how that sounds when read aloud. It falls victim to API stuttering.
"bytes dot byte" or "bytes byte" doesn't convey much to a reader in English
as the difference is a subtle "s". "bytes dot from int" or "bytes from
int" is quite clear. (Avoiding stuttering in API design was popularized by
golang - it's a good thing to strive for in any language.) It's times like
this that I wish Python had chosen consistent camelCase, CapWords, or
snake_case in all API names, as conjoinedwords aren't great. But they are
sadly consistent with our past sins.

One thing is never mentioned in the PEP. If you expect a primary use of
fromint (aka the bchr builtin that isn't going to happen) to be calls on
constant values, why are we adding name lookups and function calls
to this? Why not address the elephant in the room and allow for decimal
values to be written as an escape sequence within bytes literals?

b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255
with no leading zero should be accepted when parsing such an escape. (Do
not bother adding the same feature for codepoints in unicode strs; leave
that to later if someone shows actual demand.) This can't address the
bytearray need, but that's been true of bytearray for ages; a common way to
create them is via a copy from transient bytes objects. bytearray(b'\d31')
isn't much different than bytearray.fromint(31), with one less name lookup.

Why not add a \d escape? Introducing a new escape is fraught with peril, as
existing \d's within b'' literals in code could change meaning: a backwards
compatibility failure. But it is one that is easy to check for with a
DeprecationWarning for a few releases... The new literal parsing could be
enabled per-file with a __future__ import.
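
To make the proposed semantics concrete, here is a sketch that expands such escapes at run time from a raw bytes string. It is not a parser change, only an illustration: the helper name `expand_decimal_escapes` is hypothetical, and the rule of reading at most three digits after `\d` is an assumption layered on the "0-255, no leading zero" proposal above. Raw literals are used because `\d` is not a recognized escape in today's bytes literals.

```python
import re

# Matches a backslash, "d", then a decimal number with no leading zero.
_DECIMAL_ESCAPE = re.compile(rb"\\d(0|[1-9][0-9]{0,2})")


def expand_decimal_escapes(raw):
    """Expand hypothetical \\dNNN escapes in a raw bytes string."""
    def replace(match):
        value = int(match.group(1))
        if value > 255:
            raise ValueError(f"decimal escape out of range: {value}")
        return bytes([value])
    return _DECIMAL_ESCAPE.sub(replace, raw)


unit_separator = expand_decimal_escapes(rb"\d31")   # same byte as b'\x1f' or b'\037'
mixed = expand_decimal_escapes(rb"a\d65b")          # escape embedded in other data
```

Note that `rb"\d31"` itself is four bytes long today (backslash, "d", "3", "1"), which is the existing meaning that a new escape would change.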

-gps


Re: PEP 467 feedback from the Steering Council
Hm, I don’t think the major use for bchr() will be with a constant.

On Sun, Aug 22, 2021 at 14:48 Gregory P. Smith <greg@krypto.org> wrote:

>
> On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <pythonchb@gmail.com>
> wrote:
>
>> On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
>>
>>> The history of bytes/bytearray is a dual-purpose view. It can be used
>>> in a string-like way to emulate Python 2 string handling (hence all the
>>> usual string methods and a repr that displays in a string-like fashion).
>>> It can also be used as an array of numbers, 0 to 255 (hence the list
>>> methods and having an iterator of ints). ISTM that the authors of this PEP
>>> reject or want to discourage the latter use cases.
>>>
>>
>> I didn't read it that way, but if so, please no, I"d rather see the
>> former use cases discouraged. ISTM that the Py2 string handling is still
>> needed for working with mixed binary / text data -- but that should be a
>> pretty specialized use case. spelling the way to create a byte, byte() sure
>> makes more sense in any other context.
>>
>>
>>> ... anything where a C programmer would an array of unsigned chars).
>>>
>>
>> or any programmer would use an array of unsigned 8bit integers :-) numpy
>> spells it: `np.uint8`, and the the type in the C99 stdint.h is `uint8_t`.
>> My point is that for anyone not an "old time" C programmer, or even a
>> Python2 programmer, the "character is an unsigned 8 bit int" concept is
>> alien and confusing, not a helpful mnemonic.
>>
>>
>>> For example, creating a single byte with bytes([0x1f]) isn't pleasant,
>>> obvious, or fast.
>>>
>>
>> no, though bytes([31]) isn't horrible ;-) (despite coding for over four
>> decades, I'm still not comfortable with hex notation)
>>
>> I say it's not horrible, because bytes is a Sequence of bytes (or integer
>> values between 0 and 255), initializing it with an iterable seems pretty
>> reasonable, that's how we initialize most (all?) other sequences after all.
>> And compatible with array.array and numpy arrays.
>>
>
> I consider bytes([31]) notation to be horrible API design because a simple
> easy to make typo of omitting the [] or using () and forgetting the
> tupleizing comma turns it into a different valid call with an entirely
> different meaning. bytes([31]) vs bytes((31)) vs bytes(31).
>
> It's also ugly to anyone who thinks about what bytecode is generated and
> executed in order to do it. an entire new list object with a single
> element referring to a tiny int is created and destroyed just to create a
> b'\037' object? An optimizer pass to fix that up at the bytecode level
> isn't easy as it can only be done when it can prove that `bytes` has not
> been reassigned to something other than the builtin. Near impossible in a
> lot of code. bytes.fromint(31) isn't much better in the bytecode regard,
> but at least a temporary list is not being created.
>
> As much as I think that bytes(size: int) was a bad idea to have as an API
> - bytearray(size: int) is fine and useful as it is mutable - that ship
> sailed and getting rid of it would break some odd code. It doesn't have
> much use, so adding fromsize(size: int) methods don't sound very compelling
> as it just adds yet another way to do the same thing. we should just live
> with that specific wart.
>
> `bchr` as a builtin... I'm with the others on saying no to any new builtin
> that isn't expected to see frequent use. bchr won't see frequent use.
>
> `bytes.fromint` seems fine. others are proposing `bytes.byte` for that.
> I don't *like* to argue over names (the last stage of anything) but I do
> need to point out how that sounds to read. It falls victim to API
> stuttering. "bytes dot byte" or "bytes byte" doesn't convey much to a
> reader in English as the difference is a subtle "s". "bytes dot from int"
> or "bytes from int" is quite clear. (avoiding stuttering in API design was
> popularized by golang - it's a good thing to strive for in any language)
> It's times like this that i wish Python had chosen consistent camelCase,
> CapWords, or snake_case in all API names as conjoinedwords aren't great.
> But they are sadly consistent with our past sins.
>
> One thing never mentioned in the PEP. If you expect a primary use of the
> fromint (aka bchr builtin that isn't going to happen) to be called on
> constant values often. Why are we adding name lookups and function calls
> to this? Why not address the elephant in the room and allow for decimal
> values to be written as an escape sequence within bytes literals?
>
> b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255
> with no leading zero should be accepted when parsing such an escape. (Do
> not bother adding the same feature for codepoints in unicode strs; leave
> that to later if someone shows actual demand). This can't address the
> bytearray need, but that's been true of bytearray for ages: a common way to
> create them is via a copy from transient bytes objects. bytearray(b'\d31')
> isn't much different from bytearray.fromint(31), with one less name lookup.
>
> Why not add a \d escape? Introducing a new escape is fraught with peril as
> existing \d's within b'' literals in code could change meaning: a backwards
> compatibility failure. But it is one that is easy to check for with a
> DeprecationWarning for a few releases... The new literal parsing could be
> enabled per-file with a __future__ import.
>
> -gps
>
>
>> -CHB
>>
>>
>> --
>> Christopher Barker, PhD (Chris)
>>
>> Python Language Consulting
>> - Teaching
>> - Scientific Software Development
>> - Desktop GUI and Web Development
>> - wxPython, numpy, scipy, Cython
--
--Guido (mobile)
Re: PEP 467 feedback from the Steering Council [ In reply to ]
On Sun, 22 Aug 2021 16:08:56 -0700
Guido van Rossum <guido@python.org> wrote:
> Hm, I don’t think the major use for bchr() will be with a constant.

What would be the major use for bchr()? I don't think I've ever
regretted its absence.

Regards

Antoine.


>
> On Sun, Aug 22, 2021 at 14:48 Gregory P. Smith <greg@krypto.org> wrote:
>
> >
> > On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <pythonchb@gmail.com>
> > wrote:
> >
> >> On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
> >>
> >>> The history of bytes/bytearray is a dual-purpose view. It can be used
> >>> in a string-like way to emulate Python 2 string handling (hence all the
> >>> usual string methods and a repr that displays in a string-like fashion).
> >>> It can also be used as an array of numbers, 0 to 255 (hence the list
> >>> methods and having an iterator of ints). ISTM that the authors of this PEP
> >>> reject or want to discourage the latter use cases.
> >>>
> >>
> >> I didn't read it that way, but if so, please no, I'd rather see the
> >> former use cases discouraged. ISTM that the Py2 string handling is still
> >> needed for working with mixed binary / text data -- but that should be a
> >> pretty specialized use case. Spelling the way to create a byte as byte()
> >> sure makes more sense in any other context.
> >>
> >>
> >>> ... anything where a C programmer would use an array of unsigned chars).
> >>>
> >>
> >> or any programmer would use an array of unsigned 8-bit integers :-) numpy
> >> spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`.
> >> My point is that for anyone not an "old time" C programmer, or even a
> >> Python2 programmer, the "character is an unsigned 8 bit int" concept is
> >> alien and confusing, not a helpful mnemonic.
> >>
> >>
> >>> For example, creating a single byte with bytes([0x1f]) isn't pleasant,
> >>> obvious, or fast.
> >>>
> >>
> >> no, though bytes([31]) isn't horrible ;-) (despite coding for over four
> >> decades, I'm still not comfortable with hex notation)
> >>
> >> I say it's not horrible, because bytes is a Sequence of bytes (or integer
> >> values between 0 and 255), initializing it with an iterable seems pretty
> >> reasonable, that's how we initialize most (all?) other sequences after all.
> >> And compatible with array.array and numpy arrays.
> >>
> >
> > I consider bytes([31]) notation to be horrible API design because a simple
> > easy to make typo of omitting the [] or using () and forgetting the
> > tupleizing comma turns it into a different valid call with an entirely
> > different meaning. bytes([31]) vs bytes((31)) vs bytes(31).
> >
> > It's also ugly to anyone who thinks about what bytecode is generated and
> > executed in order to do it. An entire new list object with a single
> > element referring to a tiny int is created and destroyed just to create a
> > b'\037' object? An optimizer pass to fix that up at the bytecode level
> > isn't easy as it can only be done when it can prove that `bytes` has not
> > been reassigned to something other than the builtin. Near impossible in a
> > lot of code. bytes.fromint(31) isn't much better in the bytecode regard,
> > but at least a temporary list is not being created.
> >
> > As much as I think that bytes(size: int) was a bad idea to have as an API
> > - bytearray(size: int) is fine and useful as it is mutable - that ship
> > sailed and getting rid of it would break some odd code. It doesn't have
> > much use, so adding fromsize(size: int) methods doesn't sound very compelling
> > as it just adds yet another way to do the same thing. We should just live
> > with that specific wart.
> >
> > `bchr` as a builtin... I'm with the others on saying no to any new builtin
> > that isn't expected to see frequent use. bchr won't see frequent use.
> >
> > `bytes.fromint` seems fine. Others are proposing `bytes.byte` for that.
> > I don't *like* to argue over names (the last stage of anything) but I do
> > need to point out how that sounds to read. It falls victim to API
> > stuttering. "bytes dot byte" or "bytes byte" doesn't convey much to a
> > reader in English as the difference is a subtle "s". "bytes dot from int"
> > or "bytes from int" is quite clear. (Avoiding stuttering in API design was
> > popularized by golang - it's a good thing to strive for in any language.)
> > It's times like this that I wish Python had chosen consistent camelCase,
> > CapWords, or snake_case in all API names as conjoinedwords aren't great.
> > But they are sadly consistent with our past sins.
> >
> > One thing never mentioned in the PEP: if you expect a primary use of
> > fromint (aka the bchr builtin that isn't going to happen) to be calls on
> > constant values, why are we adding name lookups and function calls
> > to this? Why not address the elephant in the room and allow for decimal
> > values to be written as an escape sequence within bytes literals?
> >
> > b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255
> > with no leading zero should be accepted when parsing such an escape. (Do
> > not bother adding the same feature for codepoints in unicode strs; leave
> > that to later if someone shows actual demand). This can't address the
> > bytearray need, but that's been true of bytearray for ages: a common way to
> > create them is via a copy from transient bytes objects. bytearray(b'\d31')
> > isn't much different from bytearray.fromint(31), with one less name lookup.
> >
> > Why not add a \d escape? Introducing a new escape is fraught with peril as
> > existing \d's within b'' literals in code could change meaning: a backwards
> > compatibility failure. But it is one that is easy to check for with a
> > DeprecationWarning for a few releases... The new literal parsing could be
> > enabled per-file with a __future__ import.
> >
> > -gps
> >
> >
> >> -CHB
> >>
> >>
> >> --
> >> Christopher Barker, PhD (Chris)
> >>
> >> Python Language Consulting
> >> - Teaching
> >> - Scientific Software Development
> >> - Desktop GUI and Web Development
> >> - wxPython, numpy, scipy, Cython
> >
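
As an illustration of the typo hazard described in the quoted message, here is a minimal sketch of how the three spellings behave with the existing bytes constructor. The variable names are only for illustration.

```python
# A list (or any iterable) of ints yields one byte per element.
from_list = bytes([31])

# (31) is just the int 31 in parentheses, not a tuple,
# so this is the same call as bytes(31).
from_parens = bytes((31))

# An int argument is treated as a size: 31 zero bytes.
from_size = bytes(31)

# The trailing comma is what makes a one-element tuple.
from_tuple = bytes((31,))

print(from_list, len(from_parens), len(from_size), from_tuple)
```

All four calls are valid, which is why dropping the brackets or the comma goes unnoticed until the data is wrong.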



_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AA2DIRNNCSGVYDHZG2LMP3OOQRRL3IAN/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: PEP 467 feedback from the Steering Council [ In reply to ]
I’m finally getting back around to this thread. I’d like to see some resolution to the bchr/fromint question, since it seems like that’s the last thing holding up approval of the PEP. And the PEP has other useful additions that I’d like to see in Python 3.11.

On Aug 22, 2021, at 16:08, Guido van Rossum <guido@python.org> wrote:
>
> Hm, I don’t think the major use for bchr() will be with a constant.

Perhaps. I think Greg’s idea has merit anyway, but it doesn’t *have* to be tied to PEP 467.

I think Nick is on board with bytes.fromint() and no bchr(), and my sense of the sentiment here is that this would be an acceptable resolution for most folks. Ethan, can you reconsider?

Cheers,
-Barry
Re: PEP 467 feedback from the Steering Council [ In reply to ]
On Tue, Sep 07, 2021 at 08:09:33PM -0700, Barry Warsaw wrote:

> I think Nick is on board with bytes.fromint() and no bchr(), and my
> sense of the sentiment here is that this would be an acceptable
> resolution for most folks. Ethan, can you reconsider?

I haven't been completely keeping up with the entire thread, so
apologies if this has already been covered. I assume that the idea is
that bytes.fromint should return a single byte, equivalent to chr()
returning a single character.

To me, it sounds like it should be the opposite of int.from_bytes.

>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'

If that's not the API being suggested, that's going to be confusing.

How about bytes.bchr()?

bytes.bchr(n) --> a single byte

bytes.from_int(n, byteorder) --> one or more bytes

Personally, I think I would use the one or more bytes version more than
the single bchr version, so if we only had one, I vote for that.
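
For comparison, a conversion in that direction already exists today as the instance method int.to_bytes. The sketch below is only an illustration of the round trip using that existing method, not the API proposed in the PEP; the names text, n, and length are made up for the example.

```python
text = b'Hello world'
n = int.from_bytes(text, 'little')

# int.to_bytes needs an explicit length here, so compute the
# smallest number of bytes that can hold n.
length = (n.bit_length() + 7) // 8
round_tripped = n.to_bytes(length, 'little')

# A value in range(256) fits in one byte, where byte order is irrelevant.
single = (31).to_bytes(1, 'little')

print(round_tripped, single)
```

Note that the length cannot be recovered from n when the original bytes end (in little-endian order) with zero bytes, which is one reason the existing method asks for it.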


--
Steve
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/4BZ6MTOZM23UXMIUCMHVH3QXCN2ICOJ2/
Re: PEP 467 feedback from the Steering Council [ In reply to ]
On Wed, Sep 8, 2021 at 7:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
> >>> bytes.from_int(121404708502361365413651784, 'little')
> # should return b'Hello world'

Really? I don't know anyone serializing strings as a "bigint" number.
Have you already seen such a code pattern in the wild? Usually, bytes are
serialized as... bytes, no? Sometimes, bytes are serialized as base64
or hexadecimal to go through an ASCII ("7-bit") bytestream. But I
don't recall any file format serializing bytes as a single large
decimal number.

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/65FBFRCWV4V5SAP44EQG3XKHW6Q7C3QL/
Re: PEP 467 feedback from the Steering Council [ In reply to ]
On Wed, Sep 8, 2021 at 10:42 PM Victor Stinner <vstinner@python.org> wrote:
>
> On Wed, Sep 8, 2021 at 7:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
> > >>> bytes.from_int(121404708502361365413651784, 'little')
> > # should return b'Hello world'
>
> Really? I don't know anyone serializing strings as a "bigint" number.
> Have you already seen such a code pattern in the wild? Usually, bytes are
> serialized as... bytes, no? Sometimes, bytes are serialized as base64
> or hexadecimal to go through an ASCII ("7-bit") bytestream. But I
> don't recall any file format serializing bytes as a single large
> decimal number.
>

I've seen it, in various places. There are certain protocols in which
the distinction between a number and a byte sequence is immaterial
(for instance, the FOURCC identifier in an IFF family file such as a
.wav - the signature 'WAVE' is identically considered to be the number
0x57415645). Being able to convert between the numeric and character
forms of the same identifier is convenient.
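
A minimal sketch of that FOURCC equivalence, using only the existing int.from_bytes and int.to_bytes methods. The big-endian byte order is what makes 'WAVE' line up with 0x57415645; the variable names are illustrative.

```python
fourcc = b'WAVE'

# Read the four characters as one big-endian unsigned integer.
as_number = int.from_bytes(fourcc, 'big')

# Convert the number back to its four-character form.
back = as_number.to_bytes(4, 'big')

print(hex(as_number), back)
```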

ChrisA
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/G7AA7NYUUIAEYEZSETNMQBFAEAC4XYTP/
Re: PEP 467 feedback from the Steering Council [ In reply to ]
On 9/7/21 10:39 PM, Steven D'Aprano wrote:
> On Tue, Sep 07, 2021 at 08:09:33PM -0700, Barry Warsaw wrote:
>
>> I think Nick is on board with bytes.fromint() and no bchr(), and my
>> sense of the sentiment here is that this would be an acceptable
>> resolution for most folks. Ethan, can you reconsider?
>
> I haven't been completely keeping up with the entire thread, so
> apologies if this has already been covered. I assume that the idea is
> that bytes.fromint should return a single byte, equivalent to chr()
> returning a single character.
>
> To me, it sounds like it should be the opposite of int.from_bytes.
>
> >>> int.from_bytes(b'Hello world', 'little')
> 121404708502361365413651784
> >>> bytes.from_int(121404708502361365413651784, 'little')
> # should return b'Hello world'

That certainly makes sense to me. At this point, the only reason that would not work is an arbitrary limit of 255 on
the input, and the only reason that limit is there is to have `bchr` be the inverse of `ord`. Since `bchr` isn't going
to happen, I see no reason to have the 255 limit. `byteorder` can default to None with a requirement of being set when
the integer is over 255.
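
A rough sketch of those semantics, written as a plain function because bytes.fromint() does not exist. The name from_int, its signature, the choice of ValueError, and the rule of using the smallest length that fits are all assumptions made for illustration, not anything specified in the PEP.

```python
def from_int(value, byteorder=None):
    """Hypothetical stand-in for the bytes.fromint() behaviour described above."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value <= 255:
        # A single byte: byte order is irrelevant, so it may be omitted.
        return bytes([value])
    if byteorder is None:
        raise ValueError("byteorder is required for values over 255")
    # Assumption: use the smallest number of bytes that can hold the value.
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, byteorder)


print(from_int(31))
print(from_int(0x57415645, 'big'))
```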

--
~Ethan~
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/44O4B2YOQGHCYUARRGVZ6WL6GRD4PZ5J/
Re: PEP 467 feedback from the Steering Council [ In reply to ]
On 2021-09-08 13:37, Victor Stinner wrote:
> On Wed, Sep 8, 2021 at 7:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
>> >>> bytes.from_int(121404708502361365413651784, 'little')
>> # should return b'Hello world'
>
> Really? I don't know anyone serializing strings as a "bigint" number.
> Have you already seen such a code pattern in the wild? Usually, bytes are
> serialized as... bytes, no? Sometimes, bytes are serialized as base64
> or hexadecimal to go through an ASCII ("7-bit") bytestream. But I
> don't recall any file format serializing bytes as a single large
> decimal number.
>
Well, we already have int.from_bytes. What's that used for?

Adding the opposite conversion does make sense to me. If the byteorder
can be omitted when the number is in 0..255, then it seems like a
reasonable solution to me.
_______________________________________________
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/MZGIU5ECYSAPVA47475ZEI4QQCQQJCYA/
