Mailing List Archive

Make HAMT available to python script
Hi,

HAMT is a very useful immutable mapping type. Currently CPython use it internally to implement contextvar. Considering immutable data structure is very useful I hope we can make it available to python script(maybe via collections module).

Immutable data structures are fundamental parts of our project. Currently we use a full-featured python immutable data library called pyrsistent. Pyrsistent is very powerful, however the map type in it is implemented in python script not in C. It becomes slow when the data set is relatively large.

On the other hand, CPython now already has an immutable mapping type in it. I think maybe it’s a good idea to make it public? Many projects can benefit from it I believe.

Here is a talk given by the author of javascript immutable-js library explain why immutable data structures are powerful: https://www.youtube.com/watch?v=I7IdS-PbEgI

Pyristent: https://github.com/tobgu/pyrsistent

What do you think?

Cheers,
Kai
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44WBFS2ZMZFNMDFMRPROB7G34JQ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
Hi,

In 2019, Yury Selivanov, who added HAMT and contextvars to Python,
wrote PEP 603 "Adding a frozenmap type to collections":
https://peps.python.org/pep-0603/

Sadly, the PEP was stuck in discussions:

* https://discuss.python.org/t/pep-603-adding-a-frozenmap-type-to-collections/2318
* https://discuss.python.org/t/pep-603-frozenmap-vs-my-frozendict/2473

I don't think that differences between dict and frozenmap was a
blocker issue. In fact, if I recall correctly, there was no blocker
issue.

Anyway, Yury is also maintaining the "immutables" 3rd party module
which implements frozenmap:
https://pypi.org/project/immutables/

Victor

On Fri, Apr 1, 2022 at 11:37 AM zhang kai <kylerzhang11@gmail.com> wrote:
>
> Hi,
>
> HAMT is a very useful immutable mapping type. Currently CPython use it internally to implement contextvar. Considering immutable data structure is very useful I hope we can make it available to python script(maybe via collections module).
>
> Immutable data structures are fundamental parts of our project. Currently we use a full-featured python immutable data library called pyrsistent. Pyrsistent is very powerful, however the map type in it is implemented in python script not in C. It becomes slow when the data set is relatively large.
>
> On the other hand, CPython now already has an immutable mapping type in it. I think maybe it’s a good idea to make it public? Many projects can benefit from it I believe.
>
> Here is a talk given by the author of javascript immutable-js library explain why immutable data structures are powerful: https://www.youtube.com/watch?v=I7IdS-PbEgI
>
> Pyristent: https://github.com/tobgu/pyrsistent
>
> What do you think?
>
> Cheers,
> Kai
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44WBFS2ZMZFNMDFMRPROB7G34JQ/
> Code of Conduct: http://python.org/psf/codeofconduct/



--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/PQ42TTASV3ZOBJUTDHARTVTPY5EPIW57/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
You may want to check PEP 603, which more or less proposes this (the author
of the pep is the author of the HAMT code)
check https://peps.python.org/pep-0603/

Alternatively, there is already a pypi package with this code:

https://pypi.org/project/immutables/

Regards from cloudy London,
Pablo

On Fri, 1 Apr 2022, 10:34 zhang kai, <kylerzhang11@gmail.com> wrote:

> Hi,
>
> HAMT is a very useful immutable mapping type. Currently CPython use it
> internally to implement contextvar. Considering immutable data structure is
> very useful I hope we can make it available to python script(maybe via
> collections module).
>
> Immutable data structures are fundamental parts of our project. Currently
> we use a full-featured python immutable data library called pyrsistent.
> Pyrsistent is very powerful, however the map type in it is implemented in
> python script not in C. It becomes slow when the data set is relatively
> large.
>
> On the other hand, CPython now already has an immutable mapping type in
> it. I think maybe it’s a good idea to make it public? Many projects can
> benefit from it I believe.
>
> Here is a talk given by the author of javascript immutable-js library
> explain why immutable data structures are powerful:
> https://www.youtube.com/watch?v=I7IdS-PbEgI
>
> Pyristent: https://github.com/tobgu/pyrsistent
>
> What do you think?
>
> Cheers,
> Kai
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44WBFS2ZMZFNMDFMRPROB7G34JQ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Make HAMT available to python script [ In reply to ]
Thanks Victor and Pablo. I will check the discussion of PEP 603. It's a
little weird to use the immutables library when it's code in already in
CPython but I'm glad it's an option.

Kai

On Fri, Apr 1, 2022 at 6:14 PM Pablo Galindo Salgado <pablogsal@gmail.com>
wrote:

> You may want to check PEP 603, which more or less proposes this (the
> author of the pep is the author of the HAMT code)
> check https://peps.python.org/pep-0603/
>
> Alternatively, there is already a pypi package with this code:
>
> https://pypi.org/project/immutables/
>
> Regards from cloudy London,
> Pablo
>
> On Fri, 1 Apr 2022, 10:34 zhang kai, <kylerzhang11@gmail.com> wrote:
>
>> Hi,
>>
>> HAMT is a very useful immutable mapping type. Currently CPython use it
>> internally to implement contextvar. Considering immutable data structure is
>> very useful I hope we can make it available to python script(maybe via
>> collections module).
>>
>> Immutable data structures are fundamental parts of our project. Currently
>> we use a full-featured python immutable data library called pyrsistent.
>> Pyrsistent is very powerful, however the map type in it is implemented in
>> python script not in C. It becomes slow when the data set is relatively
>> large.
>>
>> On the other hand, CPython now already has an immutable mapping type in
>> it. I think maybe it’s a good idea to make it public? Many projects can
>> benefit from it I believe.
>>
>> Here is a talk given by the author of javascript immutable-js library
>> explain why immutable data structures are powerful:
>> https://www.youtube.com/watch?v=I7IdS-PbEgI
>>
>> Pyristent: https://github.com/tobgu/pyrsistent
>>
>> What do you think?
>>
>> Cheers,
>> Kai
>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-leave@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/2WYPX44WBFS2ZMZFNMDFMRPROB7G34JQ/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>
Re: Make HAMT available to python script [ In reply to ]
On 4/1/2022 11:48 AM, zhang kai wrote:
> Thanks Victor and Pablo. I will check the discussion of PEP 603. It's a
> little weird to use the immutables library when it's code in already in
> CPython but I'm glad it's an option.

The main difference is that 'immutables' offers you a stable/versioned
interface to use it, while the one that's in CPython is an internal
implementation detail. If one day we find a better design, we can just
switch to it, while 'immutables' probably can't. If we've exposed as a
public interface in the core runtime, it's much more complicated.

(For what it's worth, the major thing that held up contextvars in the
first place was justifying why it needed a new data structure that
wasn't already in the core runtime. So we didn't adopt it lightly, and
making sure we kept the freedom to change it was an important compromise.)

There are plenty of other things in this same category, and because we
want to keep things as stable as possible while also improving
performance and reliability, we have to keep pretty tight limits on what
we promise will remain stable. Most of our discussions are about finding
this balance ;)

Cheers,
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/KDJLDBZ5FIMUCVBKGEWCA2T4NGTZB4CW/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
Hi, Steve. Thanks for your detailed explanation. Indeed I already saw the
API discussion in PEP 603. It's much easier to make the decision in a
third-party library. I think we will be fine with the immutables library.

On Fri, Apr 1, 2022 at 7:05 PM Steve Dower <steve.dower@python.org> wrote:

> On 4/1/2022 11:48 AM, zhang kai wrote:
> > Thanks Victor and Pablo. I will check the discussion of PEP 603. It's a
> > little weird to use the immutables library when it's code in already in
> > CPython but I'm glad it's an option.
>
> The main difference is that 'immutables' offers you a stable/versioned
> interface to use it, while the one that's in CPython is an internal
> implementation detail. If one day we find a better design, we can just
> switch to it, while 'immutables' probably can't. If we've exposed as a
> public interface in the core runtime, it's much more complicated.
>
> (For what it's worth, the major thing that held up contextvars in the
> first place was justifying why it needed a new data structure that
> wasn't already in the core runtime. So we didn't adopt it lightly, and
> making sure we kept the freedom to change it was an important compromise.)
>
> There are plenty of other things in this same category, and because we
> want to keep things as stable as possible while also improving
> performance and reliability, we have to keep pretty tight limits on what
> we promise will remain stable. Most of our discussions are about finding
> this balance ;)
>
> Cheers,
> Steve
>
Re: Make HAMT available to python script [ In reply to ]
On Fri, Apr 1, 2022 at 4:06 AM Steve Dower <steve.dower@python.org> wrote:

> The main difference is that 'immutables' offers you a stable/versioned
> interface to use it, while the one that's in CPython is an internal
> implementation detail. If one day we find a better design, we can just
> switch to it, while 'immutables' probably can't. If we've exposed as a
> public interface in the core runtime, it's much more complicated.
>

I don't understand the issue here:

If we expose a "frozendict" as built in python object then only the API of
that object needs to remain stable, not the implementation.

And it seems that's an API that is already clearly defined.

+ 1 from me -- just the other day I was wishing it was there.

-CHB

--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Make HAMT available to python script [ In reply to ]
On Sat, 2 Apr 2022 at 02:30, Christopher Barker <pythonchb@gmail.com> wrote:
>
> On Fri, Apr 1, 2022 at 4:06 AM Steve Dower <steve.dower@python.org> wrote:
>>
>> The main difference is that 'immutables' offers you a stable/versioned
>> interface to use it, while the one that's in CPython is an internal
>> implementation detail. If one day we find a better design, we can just
>> switch to it, while 'immutables' probably can't. If we've exposed as a
>> public interface in the core runtime, it's much more complicated.
>
>
> I don't understand the issue here:
>
> If we expose a "frozendict" as built in python object then only the API of that object needs to remain stable, not the implementation.
>
> And it seems that's an API that is already clearly defined.
>
> + 1 from me -- just the other day I was wishing it was there.

There would presumably need to be be a C API as well, and that would
probably expose more of the implementation unless handled carefully.
Without seeing an actual implementation, it's hard to know for sure.

Paul
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/Y5RV73EOBXMQ252KWIZQYLUNCE3DW3OJ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
On 02Apr2022 0925, Paul Moore wrote:
> On Sat, 2 Apr 2022 at 02:30, Christopher Barker <pythonchb@gmail.com> wrote:
>>
>> On Fri, Apr 1, 2022 at 4:06 AM Steve Dower <steve.dower@python.org> wrote:
>>>
>>> The main difference is that 'immutables' offers you a stable/versioned
>>> interface to use it, while the one that's in CPython is an internal
>>> implementation detail. If one day we find a better design, we can just
>>> switch to it, while 'immutables' probably can't. If we've exposed as a
>>> public interface in the core runtime, it's much more complicated.
>>
>>
>> I don't understand the issue here:
>>
>> If we expose a "frozendict" as built in python object then only the API of that object needs to remain stable, not the implementation.
>>
>> And it seems that's an API that is already clearly defined.
>>
>> + 1 from me -- just the other day I was wishing it was there.
>
> There would presumably need to be be a C API as well, and that would
> probably expose more of the implementation unless handled carefully.
> Without seeing an actual implementation, it's hard to know for sure.

Yes, and the request in this thread was "expose the HAMT", not "create a
new data type that uses it under the covers".

My opposition to the new data type was that it *wasn't* frozendict,
primarily because we decided that dicts had to preserve insertion order.
So it was proposed as "frozenmap", but was more like a dict than map().
It's also slightly slower for lookups (as well as technically not O(1),
though close enough for moderate N), and the primary benefit is reduced
memory usage when you mutate it (but it's frozen...? ;) ).

All of that kind of adds up to make it a distraction that's likely worse
than a dict subclass with __setitem__ overridden (if that's the
behaviour you want). It's a fantastic data structure for when you need
it, but it's one that deserves to be researched and understood before
simply dropping it into your code. An optional package is a perfectly
good place for it.

Cheers,
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VFSXMPGG7Q3YOAZKKNHLLW5TPHJPU3OS/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
>
> All of that kind of adds up to make it a distraction that's likely worse
> than a dict subclass with __setitem__ overridden (if that's the
> behaviour you want). It's a fantastic data structure for when you need
> it, but it's one that deserves to be researched and understood before
> simply dropping it into your code. An optional package is a perfectly
> good place for it.
>

I agree with this. I just want to explain a little bit about why we want to
mutate an immutable map.

Immutable data structures have some nice features. Because of its immutable
nature, immutable data structures are inherently thread-safe so much safer
to use in a multi-thread environment. And it can be much faster to find the
differences between two immutable data structures because we can replace !=
with is not , therefore quickly rule out all unchanged data.

The immutable nature is actually very powerful when you want to change a
map(especially with nested maps in it) often, then need to find the
differences in other places. It sometimes reminds me of git. With immutable
data, we can keep every snapshot of data history in memory. Reduced memory
usage is also important.

I think immutable map and frozendict are actually two different data types
because they have different intentions.

The author of immutable.js gives a great talk about how immutable data
structure is implemented and why it’s powerful. React.js actually relies a
lot on immutable data structures. Here is the youtube link to the talk:
<https://www.youtube.com/watch?v=I7IdS-PbEgI&t=1005s>
https://www.youtube.com/watch?v=I7IdS-PbEgI&t=1005s
There is also a CppCon talk about immutable data structure:
<https://www.youtube.com/watch?v=sPhpelUfu8Q&t=412s>
https://www.youtube.com/watch?v=sPhpelUfu8Q&t=412s
Re: Make HAMT available to python script [ In reply to ]
On Sat, Apr 2, 2022 at 6:30 AM Steve Dower <steve.dower@python.org> wrote:

> >> + 1 from me -- just the other day I was wishing it was there.
> >
> > There would presumably need to be be a C API as well, and that would
> > probably expose more of the implementation unless handled carefully.
> > Without seeing an actual implementation, it's hard to know for sure.
>

Sure -- as i understand it, there is increasing effort to define the
various levels and stabilities of the C API -- it doesn't seem an
intractable problem to define a new type as having an unstable API for a
period of time.

And as HAMT is there, it's gotten at least a little use. And if we want to
hammer out the details, giving it wider use would be a way to do that.


> Yes, and the request in this thread was "expose the HAMT", not "create a
> new data type that uses it under the covers".
>

OK, sure, but we were also pointed to PEP 603 -- which seems to have
stalled.

I'm supportive of the general idea -- even if a few details need to be
resolved.

My opposition to the new data type was that it *wasn't* frozendict,
> primarily because we decided that dicts had to preserve insertion order.
> So it was proposed as "frozenmap", but was more like a dict than map().


That's the challenge of naming anything ;-)

For my part, yes, I'd rather an immutable Mapping that is as much like dict
as possible.

I just went and read the whole thread in discuss, and this one quote
reflects my take as well:

"""
I’m not sure whether the specific implementation and API of this
frozenmap is a good one, but some kind of immutable, hashable mapping
seems to be commonly requested and commonly re-invented.
"""

Then there's Raymond's point:

"But the PEP is about something different – it is algorithm focused."

So yes, I get what the issue is now.

Then there is the end of the thread:

"I think the hold-up is Yury having time to continue pushing the PEP
forward."

So it's stalled out, rather than being rejected due to any intractable
issues.

For my part:

I think there is substantial demand for an immutable Mapping type.

And having it in the stdlib would be a real benefit -- as it happens, last
week I wanted it not for my own code, but for a little prototype of a
"anonymous names tuple" as discussed on python-ideas -- no idea if that is
going anywhere, but there have got to be multiple potential uses in the
stdlib.

Another thing to keep in mind is that for most uses, the exact
performance characteristics really don't matter much. For me, that points
to an implementation that mimics the dict API (i.e. preserves order). If
someone needs an implementation that has specific performance
characteristics that are different, and doesn't need key order
preservation, then THAT can be a special case. Which, IIUC, is what the
current HAMT in cPython is about.

Anyway, it would be nice for someone to take up the mantle of getting an
immutable Mapping into the stdlib.

-CHB

--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Make HAMT available to python script [ In reply to ]
On Sat, Apr 02, 2022 at 09:28:55AM -0700, Christopher Barker wrote:

> Anyway, it would be nice for someone to take up the mantle of getting an
> immutable Mapping into the stdlib.

Here is a "FrozenMapping" class that could be added to collections. I
have tested it ~~extensively~~ for nearly two minutes and haven't been
able to break it, so it must be good *wink*


```
from collections.abc import Mapping
from types import MappingProxyType

class FrozenMapping(Mapping):
__slots__ = ('_proxy', '_hash')

def __new__(cls, *args, **kwargs):
a = dict(*args, **kwargs)
obj = super().__new__(cls)
obj._proxy = MappingProxyType(a)
obj._hash = None
return obj

def __getitem__(self, key):
return self._proxy[key]

def __iter__(self):
yield from self._proxy

def __len__(self):
return len(self._proxy)

def copy(self):
return self # We're immutable.

def __reversed__(self):
return reversed(self._proxy)

def __or__(self, other):
return type(self)(self._proxy | other)

def __ror__(self, other):
return type(self)(other | self._proxy)

def __hash__(self):
if self._hash is None:
self._hash = hash(frozenset(self.items()))
return self._hash

def __repr__(self):
items = ', '.join(['%r: %r' % t for t in self.items()])
return '%s({%s})' % (type(self).__name__, items)

```

I haven't considered pickling, or deep-copying. I don't think there is
any way to get access to the underlying dict and modify it, except
perhaps via ctypes, so I think it is as immutable as it is possible to
get from Python code. Feel free to take that as a challenge to break it.

Does anyone want to champion adding this to collections? I guess it will
need at least a short PEP to justify that there are use-cases for frozen
mappings.


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VDMJVEFP7FK3M25ICJGWG7XJXPPUN6OQ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
On Sun, 3 Apr 2022, 12:58 pm Steven D'Aprano, <steve@pearwood.info> wrote:

>
> I haven't considered pickling, or deep-copying. I don't think there is
> any way to get access to the underlying dict and modify it, except
> perhaps via ctypes, so I think it is as immutable as it is possible to
> get from Python code. Feel free to take that as a challenge to break it.
>


MappingProxyType does expose the underlying mapping to the other operand
during type coercion for binary operators (and trying to prevent that
causes enough problems that we decided the problem wasn't worth fixing):
https://bugs.python.org/issue43838


For a lot of use cases, MappingProxyType is already a "good enough"
immutable dict, but it does have a definite discoverability problem.

Cheers,
Nick.



>
>
Re: Make HAMT available to python script [ In reply to ]
PEP 603 adds a frozenmap type to collections, implemented as HAMT.

For a read-only *dict*, I proposed PEP 416 "Add a frozendict builtin
type" in 2012. It was rejected.
https://peps.python.org/pep-0416/

The outcome of this PEP was the addition of the types.MappingProxy
type (previously, it already existed but it was somehow hidden:
type(int.__dict__)).

Victor
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/IQMBELHLSWSEYQI7BEPTJTYP47BWTBUV/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Make HAMT available to python script [ In reply to ]
Thanks for the reminder Victor.

For a read-only *dict*, I proposed PEP 416 "Add a frozendict builtin
> type" in 2012. It was rejected.
> https://peps.python.org/pep-0416/


Note that that was ten years ago -- so maybe worth revisiting.

The outcome of this PEP was the addition of the types.MappingProxy
> type (previously, it already existed but it was somehow hidden:
> type(int.__dict__)).
>

It's interesting to read the justification for the rejection, among other
things:

"According to Raymond Hettinger, use of frozendict is low."

Then it goes on to provide examples. And of course, the MappingProxy type
is used for class dicts, and exposed for other code to use.

However, what strikes me is that the list of reasons we don't need an
immutable dict is jsu that -- it isn't strictly necessary -- which is not
the same as it not being useful.

And the MappingProxy is heavier weight and can't be hashed (i.e. used as a
key in a dict or put in a set)

In the last ten years, we have seen this brought up over and over again --
there really are quite a few use cases.

I would love to see PEP 416 reconsidered.

(note: it would need a touch of editing to cover the fact that dicts are
now ordered)

-CHB











>
> Victor
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/IQMBELHLSWSEYQI7BEPTJTYP47BWTBUV/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython