Re: A proposal to modify `None` so that it hashes to a constant
> the language makes no guarantee about hash consistency between executions

Because it's futile in the general case. Even if objects were to get a serial `id` and hash by it, for example, any change in the number of objects created across all of Python (including its builtin modules and various libraries unrelated to the user code) would make these hashes move.

So it's not like it's even possible to require this generally for all objects.

None of that makes deterministic structural hashing any less useful in practice, though.

Besides, do other languages require it?
Is it required for the language to behave in a manner that makes sense?

Or maybe you think it's by pure accident that such an overwhelming majority of languages and software libraries implement/use deterministic hashing functions for primitive types or aggregates that consist of such types?
I can't figure out if you think it's actually a bad property for the language to have, or are really just arguing that it's bad for the sake of it.

> set order is not guaranteed
Maybe not. In practice it has fully deterministic behavior, always has across all versions of Python since its inception. I don't care about what the order is, only that it's deterministic, and it is.
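
To make that claim concrete for one narrow case, here is a minimal sketch, assuming CPython and a set that holds only ints (whose hashes depend on neither the hash seed nor memory addresses). It runs the same one-liner in several fresh interpreter processes and collects the printed iteration orders. `SNIPPET` and `run_once` are names made up for this example.

```python
import subprocess
import sys

# A set of ints only: every element has a hash that is stable across runs.
SNIPPET = "print(list({10, 3, 7, 42, 1000, -5}))"


def run_once():
    """Run SNIPPET in a fresh interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Collect the distinct orders seen over three separate processes.
outputs = {run_once() for _ in range(3)}
print(outputs)
```

If the iteration order is reproducible, `outputs` contains a single entry. This says nothing about sets whose elements use seed-dependent or identity-based hashes.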

Rejecting my change because someone could technically get away with breaking this, after 30+ years, seems highly suspect.

Imagine I came in with a 0.5% perf improvement. You would reject it, citing that Python's requirements do not mandate that sets have good performance, and also that, since they don't, someone else might come in with a change that slows down sets by an arbitrary amount, so there's no reason to believe my change will help at all.

Yes, if we tried really hard, we could always make the language worse. That's a pretty awful reason to reject the change though.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CDEDUAMJENE5TMSEMEP4PM3JXF6WBXQP/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: A proposal to modify `None` so that it hashes to a constant
On Thu, 1 Dec 2022 at 17:26, Yoni Lavi <yoni.lavi.p@gmail.com> wrote:
>
> > the language makes no guarantee about hash consistency between
> > executions
>
> because it's futile in the general case, even if objects were to get a serial `id` and hash by it for example, any change in the number of objects created across all of Python (including its builtin modules and various libraries unrelated to the user code) would make these hashes move.
>

For the record, Jython DOES use sequential numbers for ids. And it
doesn't reuse them even if the objects are disposed of.

>>> id(None)
2
>>> lists = [[] for _ in range(10)]
>>> [id(l) for l in lists]
[3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> lists = None
>>> lists = [[] for _ in range(10)]
>>> [id(l) for l in lists]
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22]

IIRC an id is not assigned to an object until one is requested.

> So it's not like it's even possible to require this generally for all objects.

Well, I mean, in theory you could require that objects whose hash
isn't otherwise defined get given the hash of zero. That doesn't
violate any of the actual rules of hashes, but it does make those
hashes quite suboptimal :)

It's interesting how id() and hash() have opposite requirements (id
must return a unique number among concurrently-existing objects, hash
must return the same number among comparing-equal objects), yet a hash
can be built on an id.
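
A minimal sketch of the "hash of zero" idea, assuming nothing beyond standard Python semantics. `ConstantHash` is a hypothetical class invented for this illustration. Every instance hashes to 0, which breaks no rule about hashes, yet sets and dicts still tell the instances apart through equality (identity, by default). The cost is that every lookup has to work through collisions.

```python
class ConstantHash:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, label):
        self.label = label

    def __hash__(self):
        # Legal: objects that compare equal (here, only identical ones)
        # always get equal hashes. Unequal objects simply collide.
        return 0


items = [ConstantHash(i) for i in range(100)]
lookup = set(items)

# Membership still works because the default __eq__ compares by identity.
all_found = all(item in lookup for item in items)
distinct_count = len(lookup)
hash_values = {hash(item) for item in items}

print(all_found, distinct_count, hash_values)
```

The set stays correct, but every probe collides, so operations on it degrade towards linear time as it grows.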

> Besides, do other languages require it?
> Is it required for the language to behave in a manner that makes sense?
>
> Or maybe you think it's by pure accident that such an overwhelming majority of languages and software libraries implement/use deterministic hashing functions for primitive types or aggregates that consist of such types?
> I can't figure out if you think it's actually a bad property for the language to have, or really just arguing that it's bad for the sake of it.

Determinism is usually the easiest option. True randomness takes a lot
of effort compared to a deterministic PRNG, hence web servers having
true entropy but old game consoles relying on PRNGs.

Whether determinism is fundamentally good or fundamentally bad depends
heavily on context. Question: Is the metaquestion "is determinism
good" deterministic (ie can it be answered entirely from predictable
facts), or is it itself entropic? I believe the former, but I'm
curious if anyone disagrees!

ChrisA
Re: A proposal to modify `None` so that it hashes to a constant
> Whether determinism is fundamentally good or fundamentally bad depends
> heavily on context.

Agreed 100%. Unfortunately in Python, you cannot choose your hashing function depending on context.

Also, once you've decided to violate determinism somewhere, it's gone. There is no way, in the general case, to bring it back.

That's why it's important not to violate it willy-nilly in a manner that cannot even be prevented by users who _want_ their programs to exhibit deterministic behavior.
Re: A proposal to modify `None` so that it hashes to a constant
On Thu, 1 Dec 2022 at 06:56, Chris Angelico <rosuav@gmail.com> wrote:
>
> On Thu, 1 Dec 2022 at 17:26, Yoni Lavi <yoni.lavi.p@gmail.com> wrote:
> >
> > So it's not like it's even possible to require this generally for all objects.
>
> Well, I mean, in theory you could require that objects whose hash
> isn't otherwise defined get given the hash of zero. That doesn't
> violate any of the actual rules of hashes, but it does make those
> hashes quite suboptimal :)
>
> It's interesting how id() and hash() have opposite requirements (id
> must return a unique number among concurrently-existing objects, hash
> must return the same number among comparing-equal objects), yet a hash
> can be built on an id.

This also demonstrates a significant reason why None is special: it's
a singleton that only compares equal to itself. The reason for using
id for hash in other cases is to make different instances have
different hashes but there is only ever one instance of None. A
singleton class can have a hash function that matches identity based
equality without using id: any constant hash function will do.
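
As a rough illustration of that last point, here is a sketch of a user-defined singleton with a constant hash. `Missing` and the constant `0x5EED` are invented for this example and are not anything CPython uses. Because only one instance ever exists, identity-based equality and a constant hash agree with each other trivially.

```python
class Missing:
    """Hypothetical singleton sentinel, standing in for a None-like value."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object, so identity and equality coincide.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __hash__(self):
        # Arbitrary constant chosen for this sketch.
        return 0x5EED

    def __repr__(self):
        return "MISSING"


MISSING = Missing()

# A tuple built from the sentinel and other address-independent values
# no longer has any memory address feeding into its hash.
key = (1, MISSING, 2.5)
print(hash(MISSING), hash(key) == hash((1, Missing(), 2.5)))
```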

--
Oscar
Re: A proposal to modify `None` so that it hashes to a constant
Wild suggestion:
    Make None.__hash__ writable.
E.g.
    None.__hash__ = lambda : 0 # Currently raises AttributeError: 'NoneType' object attribute '__hash__' is read-only
Best wishes
Rob Cliffe

On 01/12/2022 11:02, Oscar Benjamin wrote:
> On Thu, 1 Dec 2022 at 06:56, Chris Angelico <rosuav@gmail.com> wrote:
>> On Thu, 1 Dec 2022 at 17:26, Yoni Lavi <yoni.lavi.p@gmail.com> wrote:
>>> So it's not like it's even possible to require this generally for all objects.
>> Well, I mean, in theory you could require that objects whose hash
>> isn't otherwise defined get given the hash of zero. That doesn't
>> violate any of the actual rules of hashes, but it does make those
>> hashes quite suboptimal :)
>>
>> It's interesting how id() and hash() have opposite requirements (id
>> must return a unique number among concurrently-existing objects, hash
>> must return the same number among comparing-equal objects), yet a hash
>> can be built on an id.
> This also demonstrates a significant reason why None is special: it's
> a singleton that only compares equal to itself. The reason for using
> id for hash in other cases is to make different instances have
> different hashes but there is only ever one instance of None. A
> singleton class can have a hash function that matches identity based
> equality without using id: any constant hash function will do.
>
> --
> Oscar

Re: A proposal to modify `None` so that it hashes to a constant
On 11/30/2022 8:48 PM, Rob Cliffe via Python-Dev wrote:
> Thank you for this very clear analysis, Oscar.
> It seems to me that this strengthens the OP's case.  I am curious as to
> whether others agree.

I do.

> On 30/11/2022 13:35, Oscar Benjamin wrote:
>> On Tue, 29 Nov 2022 at 23:46, Steven D'Aprano <steve@pearwood.info>
>> wrote:
>>> On Tue, Nov 29, 2022 at 08:51:09PM -0000, Yoni Lavi wrote:
>>>
>>>> It does make your argument invalid though, since it's based on this
>>>> assumption that I was asking for a requirement on iteration order
>>>> (e.g. like dict's iteration order = insertion order guarantee), which
>>>> is not the case.
>>> Yoni, I think this answer is disingenuous.
>> I don't think it is disingenuous. There are just a lot of people
>> talking past each other and not quite understanding what each person
>> means because there is confusion about even the intended meaning of
>> terms like "deterministic". I will expand here with enough detail that
>> we should hopefully be able to avoid misunderstanding each other.
>>
>> There are probably other places where you could find mentions of this
>> in the docs but I just took a quick look in the Python 3.5 docs
>> (whose wording predates hash randomisation) to find this mention of
>> dictionary iteration order:
>> https://docs.python.org/3.5/library/stdtypes.html#dictionary-view-objects
>>
>> What it says is
>> """
>> Keys and values are iterated over in an arbitrary order which is
>> non-random, varies across Python implementations, and depends on the
>> dictionary’s history of insertions and deletions.
>> """
>> The key point is the use of the term "non-random" which here is
>> intended to mean that although no particular ordering is guaranteed
>> you can expect to rerun the same program and get the same result
>> deterministically. A different version or implementation of Python
>> might give a different order but rerunning the same program twice
>> without changing anything should give the same result even if that
>> result depends in some way on the iteration order of some
>> dictionaries. I can't immediately find a similar statement about sets
>> but in practice the same behaviour applied to sets as well. Note
>> carefully that it is this *narrow* form of determinism that Yoni is
>> interested in.
>>
>> Of course there are some caveats to this and the obvious one is that
>> this statement does not apply if there are some objects that use
>> identity based hashing so this is not deterministic:
>>
>> class A:
>>      def __init__(self, data):
>>          self.data = data
>>      def __repr__(self):
>>          return 'A(%s)' % self.data
>>
>> a1 = A(1)
>> a2 = A(2)
>>
>> for a in {a1, a2}:
>>      print(a)
>>
>> Running this gives:
>>
>> $ python3.5 t.py
>> A(2)
>> A(1)
>> $ python3.5 t.py
>> A(1)
>> A(2)
>>
>> On the other hand if all of the hashes themselves are deterministic
>> then the program as a whole will be as well so this is deterministic:
>>
>> class A:
>>      def __init__(self, data):
>>          self.data = data
>>      def __repr__(self):
>>          return 'A(%s)' % self.data
>>      def __hash__(self):
>>          return hash(self.data)
>>      def __eq__(self, other):
>>          return self.data == other.data
>>
>> a1 = A(1)
>> a2 = A(2)
>>
>> for a in {a1, a2}:
>>      print(a)
>>
>> $ python3.5 t.py
>> A(1)
>> A(2)
>> $ python3.5 t.py
>> A(1)
>> A(2)
>>
>> So we have two classes of hashable objects:
>>
>> 1. Those with deterministic hash
>> 2. Those with non-deterministic hash
>>
>> A program that avoids depending on the iteration order of sets or
>> dicts containing objects with non-deterministic hash could be
>> deterministic. It is not the case that the program would depend on the
>> iteration order for its *correctness* but just that the behaviour of
>> the program is *reproducible* which is useful in various ways e.g.:
>>
>> - You could say to someone else "run this code with CPython 3.5 and
>> you should be able to reproduce exactly what I see when I run the
>> program". It is common practice e.g. in scientific programming to
>> record things like random seeds so that someone else can precisely
>> reproduce the results shown in a paper or some other work and this in
>> general requires that it is at least possible to make everything
>> deterministic.
>>
>> - When debugging it is useful to be able to reproduce an error
>> condition precisely. Debugging non-deterministic failures can be
>> extremely difficult. In the same way that you might want to reproduce
>> correctly functioning code it is also very useful to be able to
>> reproduce bugs.
>>
>> I can list more examples but really it shouldn't be necessary to
>> justify from first principles why determinism in programming is
>> usually a good thing. There can be reasons sometimes why determinism
>> is undesired or cannot or should not be guaranteed. It should not be
>> controversial though to say that all things being equal determinism is
>> usually a desirable feature and should be preferred by default. I
>> don't think that the 3.5 docs I quoted above used the words
>> "non-random" casually: it was an intended feature and people were
>> aware that breaking that behaviour would be problematic in many
>> situations.
>>
>> Of course in Python 3.3 this determinism was broken with the
>> introduction of hash randomisation for strings. It was considered that
>> for security purposes it would be better to have some internal
>> non-deterministic behaviour to guard against attackers. Specifically
>> the hashes of three types (str, bytes and datetime) were made
>> non-deterministic between subsequent CPython processes. The effect was
>> not only to change purely internal state though but also the
>> observable iteration order of dicts and sets which became
>> non-deterministic from one run of CPython to another. It was
>> anticipated at the time that this might be problematic in some
>> situations (it certainly was!) and so an environment variable
>> PYTHONHASHSEED was introduced in order to restore determinism for
>> cases that needed it.
>>
>> So now if you want to have reproducible behaviour with Python 3.3+ you
>> also need to fix PYTHONHASHSEED as well as avoiding the use of other
>> types of non-deterministically hashable objects in sets and dicts.
>> This is something that I have personally used mainly for the
>> reproducibility of rare bugs e.g. "to reproduce this you should run
>> the following Python code using commit abc123 under CPython 3.6 and
>> with PYTHONHASHSEED=1234".
>>
>> Subsequently in Python 3.7 dict iteration order was changed to make it
>> always deterministic by having it not depend on the hash values at
>> all, with the order depending on the order of insertions into the dict
>> instead. This introduced a *stronger* guarantee of determinism: now
>> the ordering could be expected to be reproducible even with different
>> versions and implementations of Python. For many this seemed to have
>> resolved the problems of undefined, implementation-defined etc
>> ordering. However this only applied to dicts and not sets and as of
>> Python 3.12 any issues about deterministic ordering still remain
>> wherever sets are used.
>>
>> The introduction of hash randomisation means that since Python 3.3
>> there are now three classes of hashable objects:
>>
>> 1. Objects with deterministic hash (int etc)
>> 2. Objects with non-deterministic hash that can be controlled with
>> PYTHONHASHSEED (bytes, str and datetime).
>> 3. Objects with non-deterministic hash that cannot be controlled
>> (id-based hashing).
>>
>> The question in this thread and others is which of these three classes
>> None should belong to. Although None is just one particular value it
>> is a very commonly used value and its non-deterministic hash can creep
>> through to affect the hash of larger data structures that use
>> recursive hash calls (e.g. a tuple of tuples of ... that somewhere
>> contains None). Also certain situations such as a type hint like
>> Optional[T] as referred to by the OP necessarily use None:
>>
>> $ python -c 'from typing import Optional; print(hash(Optional[int]))'
>> -7631587549837930667
>> $ python -c 'from typing import Optional; print(hash(Optional[int]))'
>> -6488475102642892491
>>
>> Somehow this affects frozen dataclasses but I haven't used those
>> myself so I won't demonstrate how to reproduce the problem with them.
>>
>> Here is a survey of types from builtins:
>>
>> NoneType bool bytearray bytes complex dict ellipsis enumerate filter
>> float frozenset int list map memoryview object range reversed set
>> slice str tuple type zip
>>
>> We can divide these into the three classes described above plus the
>> non-hashable types (I haven't checked this in the code, but just
>> experimenting with calling hash):
>>
>> 1. Deterministic hash:
>> bool, complex, float, frozenset, int, range, tuple
>>
>> 2. Hash depends on hash seed:
>> str, bytes
>>
>> 3. Hash depends on id:
>> NoneType, ellipsis, enumerate, filter, memoryview, object, reversed, zip
>>
>> 4. Non-hashable:
>> bytearray, dict, list, set, slice
>>
>> The question here is whether None belongs in class 3 or class 1. To me
>> it seems clear that there is no advantage in having None be in class 3
>> except perhaps to save a few simple lines of code:
>> https://github.com/python/cpython/pull/99541/files
>>
>> There is however a worthwhile advantage in having None be in class 1.
>> If None had a deterministic hash then tuples, frozensets etc
>> consisting of objects with None as well as other objects with
>> deterministic hash could all have deterministic hash. The behaviour of
>> iteration order for sets would then be deterministic in the *narrow*
>> sense that is referred to by the Python 3.5 docs above.
>>
>> Some have argued that the fact that some types have a seed dependent
>> hash implies that None should not have a deterministic hash but this
>> does not follow. It was known at the time that str and bytes were
>> moved from class 1 to class 2 that it would be problematic to do so
>> which is precisely why PYTHONHASHSEED was introduced. However
>> PYTHONHASHSEED does not help here because None is not even in class 2
>> but rather class 3.
>>
>> If I was going to label any claim made by anyone in these threads as
>> disingenuous then it would be the claim that None is somehow not
>> "special". Firstly many types have deterministic hash so it isn't
>> really that much of a special property. Secondly None is *clearly* a
>> special value that is used everywhere! When I open the CPython
>> interpreter there are already thousands of references to None before I
>> have even done anything:
>>
>>    >>> import sys
>>    >>> sys.getrefcount(None)
>>    4429
>>
>> The motivation for this and other threads is to bring determinism in
>> the *narrow* sense. Others (including me) have made references to
>> other kinds of determinism that have derailed the threads by
>> misunderstanding exactly what Yoni is referring to. The *stronger*
>> sense of determinism would be useful if possible but it is not the
>> intended topic of these threads.
>>
>> --
>> Oscar

--
Terry Jan Reedy
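
The difference between the first two classes in the quoted analysis (deterministic hash versus seed-dependent hash) can be checked with a small sketch. It assumes CPython and launches child interpreters with an explicit `PYTHONHASHSEED`. `child_hash` is a helper name made up for this example, and the seeds 1 and 2 are arbitrary.

```python
import os
import subprocess
import sys


def child_hash(expr, seed):
    """Evaluate hash(expr) in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


# Class 1: an int hashes the same whatever the seed is.
int_hashes = {child_hash("12345", seed) for seed in (1, 2)}

# Class 2: a str hashes the same only when the seed is pinned.
str_same_seed = {child_hash("'spam'", 1) for _ in range(2)}
str_diff_seed = {child_hash("'spam'", seed) for seed in (1, 2)}

print(int_hashes, len(str_same_seed), len(str_diff_seed))
```

Objects in the third class (identity-based hashing) cannot be pinned this way, because the seed does not affect them.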

Re: A proposal to modify `None` so that it hashes to a constant
On Mon, Nov 28, 2022 at 6:45 PM Steven D'Aprano <steve@pearwood.info> wrote:
> On Tue, Nov 29, 2022 at 01:34:54PM +1300, Greg Ewing wrote:
> > I got the impression that there were some internal language reasons
> > to want stable dicts, e.g. so that the class dict passed to __prepare__
> > preserves the order in which names are assigned in the class body. Are
> > there any such use cases for stable sets?
>
> Some people wanted order preserving kwargs, I think for web frameworks.
> There was even a discussion for a while about using OrderedDict for
> kwargs and leaving dicts unordered.

See https://peps.python.org/pep-0468/ (kwargs) and
https://peps.python.org/pep-0520/ (class definition body). I
re-implemented OrderedDict in C for this purpose. Literally right
after I had finished that, Inada-san showed up with his compact dict
implementation. Many of us were at the first core sprint at the time
and there was a lot of excitement about compact dict. It was merged
right away (for 3.6) and there was quick agreement that we could
depend on dict insertion ordering internally (for a variety of use
cases, IIRC). Thus, suddenly both my PEPs were effectively
implemented, so we marked them as approved and moved on.

FWIW, making the insertion ordering an official part of the language
happened relatively soon afterward, though for 3.7, not 3.6. [1] I'm
pretty sure there's a python-dev thread about that. The stdtypes docs
were updated [2] soon after, and we finally got around to updating the
language [3] a couple years later.

-eric


[1] https://docs.python.org/3/whatsnew/3.7.html#summary-release-highlights
[2] https://bugs.python.org/issue33609
[3] https://bugs.python.org/issue39879
Re: A proposal to modify `None` so that it hashes to a constant
There are a number of core devs who have taken strong positions against this change, citing various reasons ranging from "the addition of a function that returns a constant will cause bloat in the interpreter / needs to be tested / etc" to "what you really mean to ask for is set iteration stability, and we don't want that" to "identity based hashing is the default correct choice of a hashing function to use in any situation, unless we are forced by the requirements not to (even if it's disadvantageous compared to other choices)" to just straight appeals to authority ("rhettinger closed the issue on github so he must have done it for a good reason").

I'm not sure if they actually believe what they say in all of these cases. To me, it sounds more like "please go away" than an honest argument on technical merit, but it matters little.
I don't think anything can be changed with further technical discussion.

---

I do have another suggestion that I think merits a discussion. Maybe it will fare better. This change has a bit broader scope.

What if we were to subtract some statically allocated “anchor” address from the pointer in _Py_HashPointerRaw and the id function?

It’s arguably a security fix, since these operations currently leak the ASLR offset, and after that they won’t. It also makes the hashes of statically allocated PyObjects with defaulted tp_hash stable per build of Python, which I think is a good thing for reasons we’ve already discussed at great length.

There is a downside to this suggestion that it adds one integer subtraction to each of these functions.

If this tiny perf cost is a concern, we could even disable this countermeasure if Python can determine it was guaranteed to load to a static memory location.

At least two core devs responded with "don't care" / “it works on my machine” because they happen to have ASLR disabled. The current situation ties together two completely separate concerns, and adds a non-portable aspect to the behavior of the runtime - you can write a program that behaves deterministically on system A and then see non-deterministic behavior on system B. I don’t think I should have to explain why this is bad.

Regarding language requirements, nothing changes.

It is a per-interpreter specific change, since not all id and hash implementations depend on the object’s memory location (also since some runtime environments, like JVM, cannot be attacked with out of bounds memory accesses from inside the program, so an ASLR offset leak might not be deemed a risk there). At most, it is an advisory that those who do should act similarly, and even that is tenuous at best.

WDYT?

P.S. the other way to implement the security fix is to add a randomly chosen 64-bit secret (and then you wouldn’t know what part of the “offset” is due to ASLR and what’s due to the secret). And at least then, it becomes non-deterministic on all systems, as opposed to just some of them.
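
The anchor idea above can be simulated in a few lines of pure Python. This is only a sketch under assumptions. It models pointers as 64-bit integers, uses a rotate-right-by-4 step modelled on how CPython's pointer hash is understood to work, and invents the load addresses, the object offset, and the `anchored_hash` helper purely for illustration. It is not the actual CPython code or a proposed patch.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit pointers


def hash_pointer_raw(ptr):
    """Rotate a 64-bit value right by 4 bits (a model, not the real C code)."""
    return ((ptr >> 4) | (ptr << 60)) & MASK


def anchored_hash(ptr, anchor):
    """Hypothetical variant: hash the offset from a static anchor address."""
    return hash_pointer_raw((ptr - anchor) & MASK)


# Made-up distance between a statically allocated object and the anchor.
OFFSET_OF_OBJECT = 0x2A40

# Two simulated runs with different, made-up ASLR load addresses.
bases = [0x55D0_0000_0000, 0x7F3A_1000_0000]

raw = [hash_pointer_raw(base + OFFSET_OF_OBJECT) for base in bases]
anchored = [anchored_hash(base + OFFSET_OF_OBJECT, base) for base in bases]

print(raw[0] != raw[1], anchored[0] == anchored[1])
```

In this model the raw pointer hash changes with the load address, while the anchored one depends only on the object's offset within the binary, which is the "stable per build" property described above.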
Re: A proposal to modify `None` so that it hashes to a constant
On Sat, 3 Dec 2022 at 10:57, Yoni Lavi <yoni.lavi.p@gmail.com> wrote:

> There's a number of Core devs that have taken strong positions against
> this change, citing various reasons ranging from "the addition of a
> function that returns a constant will cause bloat in the interpreter /
> needs to be tested / etc" to "what you really mean to ask for is set
> iteration stability, and we don't want that" to "identity based hashing is
> the default correct choice of a hashing function to use in any situation,
> unless we are forced by the requirements not to (even if it's
> disadvantageous compared to other choices)" to just straight appeals to
> authority ("rhettinger closed the issue on github so he must have done it
> for a good reason").
>

My position isn't an "appeal to authority", it's more a case of
acknowledging that core devs have a collective decision making role that is
normally exercised by consensus decisions with all but one or two of the
devs abstaining. In this case, so far only Raymond has not abstained, so
his view stands. Multiple core devs have contributed to these threads, and
likely more have read without commenting - but none of them have been
sufficiently convinced to state an opposing view on the tracker (which is
what matters here).


> I'm not sure if they actually believe what they say in all of these cases.
> To me, it sounds more like "please go away" than an honest argument on
> technical merit, but it matters little.
> I don't think anything can be changed with further technical discussion.
>

It's absolutely *not* "please go away" in my case. It's more "I see your
point, but I don't personally care enough to make an issue of the decision
on your behalf". Although I will admit that I have now reached the point
where I sort of wish this whole discussion *would* "go away" - whether by
you accepting that no-one's going to reverse the decision, or by another
core dev supporting the change on the tracker, rather than just posting
here and on Discourse. But I understand the temptation to just continue the
discussion without taking a formal stance (after all, I'm doing it myself).


> I do have another suggestion that I think merits a discussion. Maybe it
> will fare better. This change has a bit broader scope.
>

I think this is over-complicating things. I think the key merit of your
original proposal was its simplicity. Proposing more complicated ways of
getting the result you want is (IMO) unlikely to succeed, and is only
likely to cause people to become even more entrenched in their positions.
Can you give any explanation of why this proposal is better than your
original one, *apart* from "it's not been rejected yet"?

Seriously. No matter what your proposal, you need core dev support on the
tracker. IMO you stand more chance with your original proposal, in spite of
Raymond's rejection. But at this point, endlessly posting your views
everywhere isn't helping. Give people time to think and consider, and maybe
someone will decide to support the change. There's no urgency - 3.12 is a
year away, so that's the soonest this might be available.

Paul
Re: A proposal to modify `None` so that it hashes to a constant
> I think this is over-complicating things. I think the key merit of your
> original proposal was its simplicity. Proposing more complicated ways of
> getting the result you want is (IMO) unlikely to succeed, and is only
> likely to cause people to become even more entrenched in their positions.
> Can you give any explanation of why this proposal is better than your
> original one, *apart* from "it's not been rejected yet"?

It does sidestep the "There is nothing special about None" argument which IS the reason cited for rejecting the previous change.
OTOH, you are still right, in the sense that I know that no matter what exactly I propose, there WILL be a way to view it negatively. (BTW: did you see the 2nd flavor of my proposal in the P.S., where I *don't* get what I want?)

I'm having trouble staying away from the discussion - I tried once and failed. Other people continue to talk about it, then I join again.

But it might be best if I leave the discussion or at least my position as the person who's proposing this change. Forget about the PR/issue and start over. Perhaps someone else needs to push for it. A core dev, so their opinion on the matter will count for something. Until then, it's not a productive use of our collective time to discuss it any further.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LGBQZGL6RQCWLFCP3C2ZFDTFQSR33LU3/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: A proposal to modify `None` so that it hashes to a constant [ In reply to ]
On Sat, 3 Dec 2022 at 14:46, Yoni Lavi <yoni.lavi.p@gmail.com> wrote:

> > I think this is over-complicating things. I think the key merit of your
> > original proposal was its simplicity. Proposing more complicated ways of
> > getting the result you want is (IMO) unlikely to succeed, and is only
> > likely to cause people to become even more entrenched in their positions.
> > Can you give any explanation of why this proposal is better than your
> > original one, *apart* from "it's not been rejected yet"?
>
> It does sidestep the "There is nothing special about None" argument which
> IS the reason cited for rejecting the previous change.
>

OK, but it also reframes the problem as being about security, not about
reproducibility, which feels like trying to make the issue more "scary" in
the hope that people might care more.


> OTOH, you are still right, in the sense that I know that no matter what
> exactly I propose, there WILL be a way to view it negatively.


That's true of every proposal, and while I sympathise with your
disillusionment, I don't think looking at it that way will help your case.


> (BTW: did you see the 2nd flavor of my proposal in the P.S. where I
> *don't* get what I want)?
>

If by that you mean the "other way to implement the security fix", I saw it
but discounted it. I think the security aspect is a red herring. I
originally mentioned it as a "we should check" suggestion, but people who
know have all (as far as I can see) said there's no security risk here, so
I think we should drop the security related arguments.

> I'm having trouble staying away from the discussion - I tried once and
> failed. Other people continue to talk about it, then I join again.
>

Me too :-) But I suggest seeing other people talking about it as a good
thing. Don't try to have the last word (something I find myself doing way
too often), but instead think that if people are talking, they are trying
to form their own views - and repeating yours won't add anything new, but
might distract people who were gradually coming to see your way of thinking.

Oscar Benjamin made a good argument recently in one of the threads. People
*are* helping make your point, you don't need to interject with "look, see,
this is what I meant!" every time. You've sown the seed, let it grow.


> But it might be best if I leave the discussion or at least my position as
> the person who's proposing this change.


Don't feel so tied to "owning" the proposal. You started a conversation,
it's got people thinking, wait and see what consensus forms. I still think
you pushed *way* too hard initially, and came across as not being willing
to listen to counter-arguments, but there's no reason you can't back off a
little and learn from that experience. Many proposals generate huge
discussion threads, but eventually fail because everyone digs their heels
in (this is what I think Chris Angelico perceives as "making proposals run
the gauntlet"). Letting ideas grow and develop, being willing to give
people time to decide, has a very different feel to it, but it requires the
proposer to let the debate happen, and *guide* it, rather than controlling
it. In my experience, not many new proposers know how to do this (or they
don't care enough to try).

This is a tiny change, almost insignificant in terms of benefits and
consequences. But it's a chance to learn the dynamics of making a proposal,
and understanding how to influence the direction of a discussion. Even if
the proposal ultimately fails, learning more about that is good. (Sorry,
re-reading that it sounds rather patronising. I don't mean it to, but I do
know that I see lots of people commenting that there's a lot more
"politics" and "people skills" to open source than they realised, and this
is a really good example of that in action, so I couldn't resist pointing
it out).


> Forget about the PR/issue and start over. Perhaps someone else needs to
> push for it. A core dev, so their opinion on the matter will count for
> something.


I really hope that the take-away here *isn't* that only the opinions of
core devs matter. It's true that only core devs can make changes, but if we
ever reach a point where community opinions aren't important, we've failed
badly. What *is* true, is that Python is big enough that *every* change has
to be considered in terms of the overall value to the whole community. Core
devs have to make those sorts of judgements a lot, so their views have
weight. With a community member, it's often not clear whether they are a
new contributor with a "great idea" but no experience in handling big open
source projects, or someone who's highly experienced in a different
community that we just don't know about. So "show that you can argue your
point and persuade people - or at least learn how to do so" is a challenge
non-core devs need to navigate to be heard. But it's not impossible to do
so, and it's not a failure if you struggle learning how to get your points
across.

Also, people are people, and we all have bad days when we're simply grumpy,
argumentative, or even downright obstructive :-) Not all disagreements are
rational.


> Until then, it's not a productive use of our collective time to discuss it
> any further.
>

Almost everyone here is doing this as a hobby or for fun. So "productive"
isn't really the measure. As long as people are *enjoying* the debate, it's
worthwhile.

Paul
Re: A proposal to modify `None` so that it hashes to a constant [ In reply to ]
On 29/11/2022 00:51, Guido van Rossum wrote:
> To stir up some more fire, I would personally be fine with sets having
> the same ordering guarantees as dicts, *IF* it can be done without
> performance degradations. So far nobody has come up with a way to ensure
> that. "Sets weren't meant to be deterministic" sounds like a remnant of
> the old philosophy, where we said the same about dicts -- until they
> became deterministic without slowing down, and then everybody loved it.

Hi all, hi Guido,

Just as a data point, PyPy's sets have used the same stable
ordering as dicts since our introduction of insertion-ordered dicts.
We have a number of other set optimizations so it's not an
apple-to-apple comparison, but still.
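
For readers who need a stable order today, one common workaround is to use a dict as an ordered set. This is only a minimal sketch, assuming Python 3.7 or later, where dict insertion order is part of the language; the `items` list is made up for illustration.

```python
# Hypothetical sample data containing a duplicate and a None entry.
items = ["pear", None, "apple", "pear", 3]

# dict.fromkeys() keeps the first occurrence of each key, in insertion order,
# so the result does not depend on any element's hash value.
ordered_unique = list(dict.fromkeys(items))

print(ordered_unique)
```

A plain `set(items)` holds the same elements, but its iteration order is an implementation detail that depends on the hash values.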

Cheers,

Carl Friedrich

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/52OGDTARUZJMV5ETMTRWHFS2GGVERNE3/
Re: A proposal to modify `None` so that it hashes to a constant [ In reply to ]
On Thu, Dec 01, 2022 at 10:18:49PM +0000, Rob Cliffe via Python-Dev wrote:

> Wild suggestion:
>     Make None.__hash__ writable.
> E.g.
>     None.__hash__ = lambda : 0 # Currently raises AttributeError:
> 'NoneType' object attribute '__hash__' is read-only

You would have to write to `type(None).__hash__` because of the way
dunders work.
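
A short sketch of what "the way dunders work" means here: `hash()` looks up `__hash__` on the type, not on the instance. The `Sentinel` class is a hypothetical stand-in for `NoneType`, which cannot be modified this way at all.

```python
class Sentinel:
    """Hypothetical stand-in for NoneType; ordinary classes are writable."""

    def __hash__(self):
        return 1


s = Sentinel()

# Setting __hash__ on the instance is allowed for an ordinary class,
# but hash() ignores it because special methods are looked up on the type.
s.__hash__ = lambda: 42
before = hash(s)

# Setting it on the class is what actually changes the result.
Sentinel.__hash__ = lambda self: 42
after = hash(s)

print(before, after)

# The built-in NoneType rejects both kinds of assignment.
try:
    None.__hash__ = lambda: 0
except AttributeError as exc:
    print("instance assignment failed:", exc)

try:
    type(None).__hash__ = lambda self: 0
except TypeError as exc:
    print("type assignment failed:", exc)
```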

Now imagine that you have twenty different libraries or functions or
classes, each setting the `__hash__` method to a different function. Chaos.

You can simulate that chaos with this:

```
import random

class ChangingHash:
    def __repr__(self):
        return "MyNone"
    def __hash__(self):
        # Simulate the effect of many different callers changing
        # the hash value returned at unpredictable times.
        return random.randint(1, 9)

MyNone = ChangingHash()

data = {MyNone: 100}
print(MyNone in data)  # 8 in 9 chance of printing False
data[MyNone] = 200
print(data)  # 8 in 9 chance of {MyNone: 100, MyNone: 200}
print(MyNone in data)  # now 7 in 9 chance of printing False
```


--
Steve
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/ZEQFHMIQJIO5AWYTLSW7PKPZE2RZMJMY/
Re: A proposal to modify `None` so that it hashes to a constant [ In reply to ]
On Mon, 5 Dec 2022 at 05:11, Rob Cliffe via Python-Dev
<python-dev@python.org> wrote:
>
> Wild suggestion:
> Make None.__hash__ writable.
> E.g.
> None.__hash__ = lambda : 0 # Currently raises AttributeError:
> 'NoneType' object attribute '__hash__' is read-only

Hashes have to be stable. If you change the hash of None after it's
been inserted into a dictionary, you'll get all kinds of entertaining
problems.

>>> class X:
...     def __init__(self): self.hash = 0
...     def __hash__(self): return self.hash
...
>>> x = X()
>>> d = {x: "This is x"}
>>> x.hash = 1
>>> for key in d: print(key, key in d)
...
<__main__.X object at 0x7f2d07c6f1c0> False

ChrisA
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/OVVIKTG7CBN6BII4OBGIXWQJJXYCEO3I/
Re: A proposal to modify `None` so that it hashes to a constant [ In reply to ]
You are absolutely right, of course.  It was a wild idea, and a bad one.
I find myself moving towards supporting the OP.  I can't see anything
terrible about the hash of None always being 0, or perhaps better some
other arbitrary constant.
Rob
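
One way to observe the behaviour under discussion is to compute the same hash in two fresh interpreter processes. This is only a sketch: `SNIPPET` and `hash_in_fresh_interpreter` are names made up for illustration, and whether the two values differ depends on the interpreter build and platform. On builds where `hash(None)` is derived from the object's address, address-space randomization can change it between runs.

```python
import os
import subprocess
import sys

# Code executed in each child interpreter: hash a tuple that contains None.
SNIPPET = "print(hash((1, 2, None)))"


def hash_in_fresh_interpreter() -> int:
    """Run SNIPPET in a new process and return the hash it printed."""
    # PYTHONHASHSEED=0 switches off str/bytes hash randomization, so any
    # remaining run-to-run variation has to come from somewhere else.
    env = dict(os.environ, PYTHONHASHSEED="0")
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


first = hash_in_fresh_interpreter()
second = hash_in_fresh_interpreter()
print(first, second, "equal" if first == second else "different")
```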

On 04/12/2022 03:20, Steven D'Aprano wrote:
> On Thu, Dec 01, 2022 at 10:18:49PM +0000, Rob Cliffe via Python-Dev wrote:
>
>> Wild suggestion:
>>     Make None.__hash__ writable.
>> E.g.
>>     None.__hash__ = lambda : 0 # Currently raises AttributeError:
>> 'NoneType' object attribute '__hash__' is read-only
> You would have to write to `type(None).__hash__` because of the way
> dunders work.
>
> Now imagine that you have twenty different libraries or functions or
> classes, each setting the `__hash__` method to a different function. Chaos.
>
> You can simulate that chaos with this:
>
> ```
> import random
>
> class ChangingHash:
>     def __repr__(self):
>         return "MyNone"
>     def __hash__(self):
>         # Simulate the effect of many different callers changing
>         # the hash value returned at unpredictable times.
>         return random.randint(1, 9)
>
> MyNone = ChangingHash()
>
> data = {MyNone: 100}
> print(MyNone in data)  # 8 in 9 chance of printing False
> data[MyNone] = 200
> print(data)  # 8 in 9 chance of {MyNone: 100, MyNone: 200}
> print(MyNone in data)  # now 7 in 9 chance of printing False
> ```
>
>

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NKXS4JKYMOTAIMS7D5YY5FSQPBPWZHPA/
Re: A proposal to modify `None` so that it hashes to a constant [ In reply to ]
You're right of course.  Oh well, it *was* a wild idea.
Rob Cliffe

On 04/12/2022 18:16, Chris Angelico wrote:
> On Mon, 5 Dec 2022 at 05:11, Rob Cliffe via Python-Dev
> <python-dev@python.org> wrote:
>> Wild suggestion:
>> Make None.__hash__ writable.
>> E.g.
>> None.__hash__ = lambda : 0 # Currently raises AttributeError:
>> 'NoneType' object attribute '__hash__' is read-only
> Hashes have to be stable. If you change the hash of None after it's
> been inserted into a dictionary, you'll get all kinds of entertaining
> problems.
>
> >>> class X:
> ...     def __init__(self): self.hash = 0
> ...     def __hash__(self): return self.hash
> ...
> >>> x = X()
> >>> d = {x: "This is x"}
> >>> x.hash = 1
> >>> for key in d: print(key, key in d)
> ...
> <__main__.X object at 0x7f2d07c6f1c0> False
>
> ChrisA

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/L7JDJFPJKYNHIA7QVYGC74SYAVODM2IX/
