Mailing List Archive

Re: Python-Dev Digest, Vol 232, Issue 10
> I can't see any, but then I couldn't see the security consequences of
> predictable string hashes until they were pointed out to me. So it would
> be really good to have some security experts comment on whether this is
> safe or not.

I can't either. I can point out that the complexity attack via hash
collisions is not possible on None specifically because it is a singleton,
so:
1. There may only be one None in a dict at most,
2. For composite types that depend on None and other things, like say a
tuple of string and an optional int, they become as resistant to attack as
the same types without None, in this case string and int.
3. Python only explicitly added this security (hash randomization) feature
to string and bytes hashes, and if your composite key type depends on those
types, it will still be protected regardless of what None does


> Because this entire discussion is motivated by the OP who wants
> consistent set order across multiple runs of his Python application.
> That's what he needs; having None hash to a constant value is just a
> means to that end.

Not entirely. I do explain in my doc why there is a foundational reason
why, once the choice was made to have None represent the `None` of
Optional[T], it entered the family of types that you would expect to have
deterministic behavior. And as the hashing function is intrinsic to types
in Python, it is included in that.


> Even if we grant None a constant hash, that still does not guarantee
> consistent set order across runs. At best, we might get such consistent
> order as an undocumented and changeable implementation detail, until we
> change the implementation of hashing, or of sets, or of something
> seemingly unrelated like address randomisation.

ASLR will not cause any trouble so long as I keep objects with identity
based hashing out of my sets. Or at least, any sets I later iterate on and
run imperative code with side-effects under the loop.
The possibility of disabling ASLR is not a reason to dismiss this change.
No one guarantees a user of Python is in a position to make infosec related
decisions on the computer system they are working on.

Regarding the possibilities of hashes changing for the worse (in this
regard) - sure. Anything is possible.

Regarding sets - the set implementation is deterministic, and always has
been (for decades).
Yes, in theory it is possible that a new version or implementation of
Python will make sets non-deterministic - shuffle themselves in the
background independently from their history of operations, or what have
you, and then my change loses all of its value.
Like I said, anything is possible. But I think these last two points are
essentially FUD.

I made my proposal because I believe the FUD scenarios are strongly
unlikely. (and even then, at worst we end up with a "practically useless"
behavior on None, that can also be freely reverted along with such other
changes anyway)



On Tue, Nov 29, 2022 at 5:34 AM <python-dev-request@python.org> wrote:

> Send Python-Dev mailing list submissions to
> python-dev@python.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> or, via email, send a message with subject or body 'help' to
> python-dev-request@python.org
>
> You can reach the person managing the list at
> python-dev-owner@python.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Python-Dev digest..."Today's Topics:
>
> 1. Re: A proposal to modify `None` so that it hashes to a constant
> (Steven D'Aprano)
> 2. Re: A proposal to modify `None` so that it hashes to a constant
> (Oscar Benjamin)
> 3. Re: A proposal to modify `None` so that it hashes to a constant
> (Chris Angelico)
> 4. Re: A proposal to modify `None` so that it hashes to a constant
> (Steven D'Aprano)
>
>
>
> ---------- Forwarded message ----------
> From: "Steven D'Aprano" <steve@pearwood.info>
> To: python-dev@python.org
> Cc:
> Bcc:
> Date: Mon, 28 Nov 2022 15:35:44 +1100
> Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to
> a constant
> On Tue, Nov 29, 2022 at 01:34:54PM +1300, Greg Ewing wrote:
>
> > I got the impression that there were some internal language reasons
> > to want stable dicts, e.g. so that the class dict passed to __prepare__
> > preserves the order in which names are assigned in the class body. Are
> > there any such use cases for stable sets?
>
> Some people wanted order preserving kwargs, I think for web frameworks.
> There was even a discussion for a while about using OrderedDict for
> kwargs and leaving dicts unordered.
>
> For me, the biggest advantage of preserving input order in dicts is that
> it is now easier to write doctests using the dict's repr. It would be
> nice to be able to do the same for sets, but not nice enough to justify
> making them bigger or slower.
>
>
> --
> Steve
>
>
>
> ---------- Forwarded message ----------
> From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
> To:
> Cc: python-dev@python.org
> Bcc:
> Date: Tue, 29 Nov 2022 02:07:34 +0000
> Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to
> a constant
> On Tue, 29 Nov 2022 at 01:33, Steven D'Aprano <steve@pearwood.info> wrote:
> >
> > On Mon, Nov 28, 2022 at 11:13:34PM +0000, Oscar Benjamin wrote:
> > > On Mon, 28 Nov 2022 at 22:56, Brett Cannon <brett@python.org> wrote:
> >
> > As I understand it, we could make sets ordered, but only at the cost of
> > space (much more memory) or time (slower) or both.
> >
> > I am sure that Guido is correct that **if** somebody comes up with a
> > fast, efficient ordered set implementation that doesn't perform worse
> > than the current implementation, we will happily swap to giving sets a
> > predictable order, as we did with dicts. (Practicality beats purity --
> > even if sets are *philosophically* unordered, preserving input order is
> > too useful to give up unless we gain something in return.)
>
> Let's split this into two separate questions:
>
> 1. Is it *innately* good that set order is non-deterministic?
> 2. Are there some other reasons why it is good to choose a model that
> implies *non-deterministic* set order?
>
> The answer to 1. is emphatically NO. In fact the question itself is
> badly posed: why are we even asking about "set order" rather than the
> benefits of determinism in general? If I want my code to be
> deterministic then that's just something that I want regardless of
> whether sets, dicts, floats etc are involved.
>
> As for point 2. the fact that sets are currently non-deterministic is
> actually a relatively new thing in Python. Before hash-randomisation
> set and dict order *was* deterministic but with an arbitrary order.
> That was only changed because of a supposed security issue with hash
> collisions. Prior to that it was well understood that determinism was
> beneficial (honestly I don't understand why I have to state this point
> explicitly: determinism is almost always best in our context).
>
> Please everyone don't confuse arbitrary order, implementation defined
> order and non-deterministic order. There is no reason why sets in
> Python need to have a *non-deterministic* order or at least why there
> shouldn't be a way to control that. There is no performance penalty in
> making the order *deterministic*. (If you think that there might be a
> performance penalty then you haven't understood the suggestion!)
>
> > > It would be useful to have a straight-forward way to sort a set into a
> > > deterministic ordering but no such feature exists after the Py3K
> > > changes (sorted used to do this in Python 2.x).
> >
> > `sorted()` works fine on homogeneous sets. It is only heterogeneous sets
> > that are a problem, and in practice, that usually means None mixed in
> > with some other type.
>
> That is of course precisely the context for this thread!
>
> --
> Oscar
>
>
>
> ---------- Forwarded message ----------
> From: Chris Angelico <rosuav@gmail.com>
> To: python-dev@python.org
> Cc:
> Bcc:
> Date: Tue, 29 Nov 2022 13:20:30 +1100
> Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to
> a constant
> On Tue, 29 Nov 2022 at 13:12, Oscar Benjamin <oscar.j.benjamin@gmail.com>
> wrote:
> > As for point 2. the fact that sets are currently non-deterministic is
> > actually a relatively new thing in Python. Before hash-randomisation
> > set and dict order *was* deterministic but with an arbitrary order.
> > That was only changed because of a supposed security issue with hash
> > collisions. Prior to that it was well understood that determinism was
> > beneficial (honestly I don't understand why I have to state this point
> > explicitly: determinism is almost always best in our context).
>
> To clarify: The hash collision attack is a very real one, but specific
> to dictionaries of string keys, since there are quite a few ways for
> an attacker to send a string that gets automatically parsed into such
> a dictionary (eg web app frameworks where the request parameters are
> made available as a dictionary). But since that attack surface is *so*
> specific, randomization of non-string hashes is unimportant.
>
> ChrisA
>
>
>
> ---------- Forwarded message ----------
> From: "Steven D'Aprano" <steve@pearwood.info>
> To: python-dev@python.org
> Cc:
> Bcc:
> Date: Mon, 28 Nov 2022 17:19:22 +1100
> Subject: [Python-Dev] Re: A proposal to modify `None` so that it hashes to
> a constant
> On Tue, Nov 29, 2022 at 02:07:34AM +0000, Oscar Benjamin wrote:
>
> > Let's split this into two separate questions:
>
> Let's not. Your first question about non-deterministic set order being
> "innately good" is a straw man: as we've already discussed, set order is
> not non-deterministic (except in the informal sense of "hard to
> predict") and I don't think anyone is arguing in favour of keeping set
> order unpredictable even if there are faster, more compact, simpler
> implementations which preserve order.
>
> Talking about determinism is just muddying the waters. Sets are
> deterministic: their order is determinied by the implementation, the set
> history, and potentially any environmental sources of entropy used in
> address randomisation. Deterministic does not mean predictable.
>
> If we want to get this discussion onto a more useful path, we should
> start with a security question:
>
> The hash of None changes because of address randomisation. Address
> randomisation is enabled as a security measure. Are there any security
> consequences of giving None a constant hash?
>
> I can't see any, but then I couldn't see the security consequences of
> predictable string hashes until they were pointed out to me. So it would
> be really good to have some security experts comment on whether this is
> safe or not.
>
>
> > why are we even asking about "set order" rather than the
> > benefits of determinism in general?
>
> Because this entire discussion is motivated by the OP who wants
> consistent set order across multiple runs of his Python application.
> That's what he needs; having None hash to a constant value is just a
> means to that end.
>
> His sets contain objects whose hashes depend on `Optional[int]`, which
> means sometimes they include None, and when he runs under an interpreter
> built with address randomisation, the hash of None can change.
>
> Even if we grant None a constant hash, that still does not guarantee
> consistent set order across runs. At best, we might get such consistent
> order as an undocumented and changeable implementation detail, until we
> change the implementation of hashing, or of sets, or of something
> seemingly unrelated like address randomisation.
>
>
> --
> Steve
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
>