Mailing List Archive

Having Sorted Containers in stdlib?
Hi All,

This is a modest proposal to consider having sorted containers (http://www.grantjenks.com/docs/sortedcontainers/ <http://www.grantjenks.com/docs/sortedcontainers/>) in standard library. I know that usually adding stuff to standard library requires some strong arguments, so I will try my best to give my reasons here:

1) Some mainstream language support them out of box: C++ for example have set/map which are sorted by the key order, and Java has TreeMap which is internally a Red-black tree. I understand languages might target different audiences, but I think Python’s standard library is quite extensive compared to peers. Consider we even have a sqlite driver in the stdlib, I do not think it is outrageous to have sorted containers.
2) These containers are not really easy to implement correctly, and usually is out of the scope of day-to-day projects. Especially considering we have a large audience of non-hardcore programmers in Python community. They may have the need to use these structures, but they do not necessarily have the skill/knowledge to implement it.
3) Granted, people can just pip install this library, but that is one extra step and less fraction is better for user experience.
4) These structures are very useful in competitive programming, I know at least in Leetcode this library is installed for Python.
5) The said library is of high implementation quality.

I might be stupid here as I am not the maintainer of this library and I might be not even in a position to submit it to Python as part of stdlib, but here are some of my personal thoughts and would love to hear your opinion!

Thanks!
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, 9 Nov 2021 at 16:01, Bob Fang <boyufang@bytedance.com> wrote:

> This is a modest proposal to consider having sorted containers (
> http://www.grantjenks.com/docs/sortedcontainers/
> <https://t.mailpgn.com/l/?u=5264d942-a595-49ac-a64a-986ce181d7fb&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3Da3e560c9-4417-427b-b976-19746738d8e5%26amp%3Bfl%3Dhttp%253A%252F%252Fwww.grantjenks.com%252Fdocs%252Fsortedcontainers%252F>)
> in standard library. I know that usually adding stuff to standard library
> requires some strong arguments, so I will try my best to give my reasons
> here:
>

As of 3.7. dicts are sorted[1], but I'm unsure if the order can be
overridden. Indeed:

>>> Boys = {'Tim': 18,'Charlie':12,'Robert':25}

Girls = {'Tiffany':22}


>>> print(cmp(Girls, Boys))

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'cmp' is not defined

>>> Girls.__gt__(Boys)

NotImplemented


-- H
--
OpenPGP: https://hasan.d8u.us/openpgp.asc
<https://t.mailpgn.com/l/?u=b8770bd0-a9ef-411d-8c87-1636e6d9b77a&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D5a3f5883-7b16-4b11-80ad-be68170836a1%26amp%3Bfl%3Dhttps%253A%252F%252Fhasan.d8u.us%252Fopenpgp.asc>
If you wish to request my time, please do so using
*bit.ly/hd1AppointmentRequest
<https://t.mailpgn.com/l/?u=8467f9a6-914d-4d6d-8c00-dd503e667146&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D20c36f1d-4104-40d6-8c5e-b8bee61caaa4%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>*
.
Si vous voudrais faire connnaisance, allez a *bit.ly/hd1AppointmentRequest
<https://t.mailpgn.com/l/?u=be22c143-79b8-4a2b-b7f9-cb7a95a8e7e4&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D143cb038-fda7-4317-aef6-37d17da9dfde%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>*
.

<https://t.mailpgn.com/l/?u=b065018c-6b50-41c8-9881-508f9f2f30fa&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D9c5843eb-43ed-470a-83fc-1ca514c43810%26amp%3Bfl%3Dhttps%253A%252F%252Fsks-keyservers.net%252Fpks%252Flookup%253Fop%253Dget%2526amp%253Bsearch%253D0xFEBAD7FFD041BBA1>Sent
from my mobile device
Envoye de mon portable
1. https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/
<https://t.mailpgn.com/l/?u=e7d71732-691a-421f-b43a-a9cdb3c2b9d0&fl=https%3A%2F%2Fsoftwaremaniacs.org%2Fblog%2F2020%2F02%2F05%2Fdicts-ordered%2F>
Re: Having Sorted Containers in stdlib? [ In reply to ]
But that’s in insertion order, not in key order right? I think we need data structure that are key-ordered.

> On 10 Nov 2021, at 00:23, Hasan Diwan <hasan.diwan@gmail.com> wrote:
>
>
> On Tue, 9 Nov 2021 at 16:01, Bob Fang <boyufang@bytedance.com <mailto:boyufang@bytedance.com>> wrote:
> This is a modest proposal to consider having sorted containers (http://www.grantjenks.com/docs/sortedcontainers/ <https://t.mailpgn.com/l/?u=5264d942-a595-49ac-a64a-986ce181d7fb&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3Da3e560c9-4417-427b-b976-19746738d8e5%26amp%3Bfl%3Dhttp%253A%252F%252Fwww.grantjenks.com%252Fdocs%252Fsortedcontainers%252F>) in standard library. I know that usually adding stuff to standard library requires some strong arguments, so I will try my best to give my reasons here:
>
> As of 3.7. dicts are sorted[1], but I'm unsure if the order can be overridden. Indeed:
>
> >>> Boys = {'Tim': 18,'Charlie':12,'Robert':25}
> Girls = {'Tiffany':22}
>
> >>> print(cmp(Girls, Boys))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'cmp' is not defined
> >>> Girls.__gt__(Boys)
> NotImplemented
>
> -- H
> --
> OpenPGP: https://hasan.d8u.us/openpgp.asc <https://t.mailpgn.com/l/?u=b8770bd0-a9ef-411d-8c87-1636e6d9b77a&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D5a3f5883-7b16-4b11-80ad-be68170836a1%26amp%3Bfl%3Dhttps%253A%252F%252Fhasan.d8u.us%252Fopenpgp.asc>
> If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest <https://t.mailpgn.com/l/?u=8467f9a6-914d-4d6d-8c00-dd503e667146&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D20c36f1d-4104-40d6-8c5e-b8bee61caaa4%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>.
> Si vous voudrais faire connnaisance, allez a bit.ly/hd1AppointmentRequest <https://t.mailpgn.com/l/?u=be22c143-79b8-4a2b-b7f9-cb7a95a8e7e4&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D143cb038-fda7-4317-aef6-37d17da9dfde%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>.
>
> <https://t.mailpgn.com/l/?u=b065018c-6b50-41c8-9881-508f9f2f30fa&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D9c5843eb-43ed-470a-83fc-1ca514c43810%26amp%3Bfl%3Dhttps%253A%252F%252Fsks-keyservers.net%252Fpks%252Flookup%253Fop%253Dget%2526amp%253Bsearch%253D0xFEBAD7FFD041BBA1>Sent from my mobile device
> Envoye de mon portable
> 1. https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/ <https://t.mailpgn.com/l/?u=e7d71732-691a-421f-b43a-a9cdb3c2b9d0&fl=https%3A%2F%2Fsoftwaremaniacs.org%2Fblog%2F2020%2F02%2F05%2Fdicts-ordered%2F>_______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RNXTH3JRUZ5WH33AKHVOSZF34FGNMS6S/
> Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, 9 Nov 2021, 17:05 Bob Fang, <boyufang@bytedance.com> wrote:

> But that’s in insertion order, not in key order right? I think we need
> data structure that are key-ordered.
>
According to the tests, it seems to be key-ordered. It also appears that
the ordering cannot be changed (yet?). -- H
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 09, 2021 at 04:23:50PM -0800, Hasan Diwan wrote:

> As of 3.7. dicts are sorted[1], but I'm unsure if the order can be
> overridden.


Dicts are not sorted, they are kept in insertion order.

>>> d = {3: 'a', 4: 'b', 1: 'c', 2: 'd'}
>>> d
{3: 'a', 4: 'b', 1: 'c', 2: 'd'}

See the docs:

https://docs.python.org/3/library/stdtypes.html#dict

although you have to scroll almost all the way to then end, just before
the dict view objects, to see it documented.

Sorting dicts has been discussed on the Python-Ideas mailing list, it is
too hard and expensive to justify for the limited use-cases for it. If
you want to sort a dict, you are best to sort the dict's keys, then
create a new dict. Or possibly use a dedicated Sorted Mapping type like
a red-black tree or similar.


Hasan wrote:

> >>> print(cmp(Girls, Boys))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'cmp' is not defined

The 'cmp' built-in was removed in Python 3.0.

In any case, dicts don't support order comparisons.


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AV5RVB37YR7TZ3ZURI4ZIHCS5Y2XV67Y/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
Hi Bob and welcome,

Before we could even consider adding the sortedcontainers library to the
standard library, we would need to hear from the maintainer(s) of the
library that they agree to the move and would be able to continue
maintaining the library under our release schedule and backwards
compatibility guarantees.

Otherwise, you would need to find a core developer willing to
re-implement the containers and maintain them.

I don't want to discourage you, but even if the maintainer is
willing, we night not decide to add it. Every new feature, class and
function adds to the weight of learning Python, and the cost of
maintenance. We must balance that against the benefit, and only add
features where the benefits are greater than the costs.

Our decision making is usually very conservative, because we have strong
requirements for backwards-compatibility. Once we add something to the
stdlib, we can't easily change our mind and remove it again. So we
follow the Zen of Python:

>>> import this
The Zen of Python, by Tim Peters
[...]
Now is better than never.
Although never is often better than *right* now.


Have you looked at the Python PEPs? If you are serious about pushing
this proposal, your first step should be to read PEP 1 and then browse
through the collection of successful and unsuccessful PEPs:

https://www.python.org/dev/peps/pep-0001/

https://www.python.org/dev/peps/


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VCM5PUEHCGRIYTPPRAZZUSEEJ3KW2DTN/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <steve@pearwood.info> wrote:

> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
> too hard and expensive to justify for the limited use-cases for it. If
> you want to sort a dict, you are best to sort the dict's keys, then
> create a new dict. Or possibly use a dedicated Sorted Mapping type like
> a red-black tree or similar.
>

There are several implementations of key-order sorted dicts available in
Python, and more than a few of them are well tested. I am the maintainer
of one of them: https://pypi.org/project/treap/ which I ported from Java.
I also did a performance comparison among several of them a few years back.

Red-black trees are popular, perhaps especially among Java developers, but
I've never understood why. They aren't the fastest. Among traditional
tree-based key-sorted dicts, treaps are frequently fastest.

However, SortedDict is not uncommonly faster than treaps - I believe this
is because SortedDict is very good at maximizing locality of reference.
Traditional trees are almost always going to do an entire cache line hit
for every node in large trees, even if those nodes are "next to each other"
when sorted/traversed. SortedDicts put keys that are nearly equal, in
nearly the same part of memory - so multiple values can be retrieved with a
single cache line hit.

Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
sometimes even O(n^2 * logn). This is not good. A traditional tree should
give O(nlogn) algorithms under similar circumstances, and although my gut
is telling me SortedDict is similar the presentation linked below suggests
a different (though still better than sorting in a loop) runtime for
SortedDict.

It's true that key-sorted dicts are not all that common. Their most common
use is probably for implementing finite caches - evictions can benefit from
ordered keys. However, we already have functools.lru_cache.

Here's my most recent performance comparison:
https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf

Here's a PyCon 2016 presentation about SortedContainers, that includes
SortedDict:
https://www.youtube.com/watch?v=7z2Ki44Vs4E

I think it would make sense to include at least one key-sorted dict in
CPython, and feel that SortedDict would not be a bad way to go. Neither
would treap.
Re: Having Sorted Containers in stdlib? [ In reply to ]
Maybe a stupid question:

What are use cases for sorted dicts?

I don’t think I’ve ever needed one.

Also, I can’t quite tell from the discussion If a “sorted dict” implements
something new, or is an internal data structure that gives better
performance for particular use cases. I.e. is a sorted dict a Mapping?

-CHB



On Tue, Nov 9, 2021 at 9:45 PM Dan Stromberg <drsalists@gmail.com> wrote:

>
> On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <steve@pearwood.info>
> wrote:
>
>> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
>> too hard and expensive to justify for the limited use-cases for it. If
>> you want to sort a dict, you are best to sort the dict's keys, then
>> create a new dict. Or possibly use a dedicated Sorted Mapping type like
>> a red-black tree or similar.
>>
>
> There are several implementations of key-order sorted dicts available in
> Python, and more than a few of them are well tested. I am the maintainer
> of one of them: https://pypi.org/project/treap/ which I ported from
> Java. I also did a performance comparison among several of them a few
> years back.
>
> Red-black trees are popular, perhaps especially among Java developers, but
> I've never understood why. They aren't the fastest. Among traditional
> tree-based key-sorted dicts, treaps are frequently fastest.
>
> However, SortedDict is not uncommonly faster than treaps - I believe this
> is because SortedDict is very good at maximizing locality of reference.
> Traditional trees are almost always going to do an entire cache line hit
> for every node in large trees, even if those nodes are "next to each other"
> when sorted/traversed. SortedDicts put keys that are nearly equal, in
> nearly the same part of memory - so multiple values can be retrieved with a
> single cache line hit.
>
> Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
> sometimes even O(n^2 * logn). This is not good. A traditional tree should
> give O(nlogn) algorithms under similar circumstances, and although my gut
> is telling me SortedDict is similar the presentation linked below suggests
> a different (though still better than sorting in a loop) runtime for
> SortedDict.
>
> It's true that key-sorted dicts are not all that common. Their most
> common use is probably for implementing finite caches - evictions can
> benefit from ordered keys. However, we already have functools.lru_cache.
>
> Here's my most recent performance comparison:
> https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf
>
> Here's a PyCon 2016 presentation about SortedContainers, that includes
> SortedDict:
> https://www.youtube.com/watch?v=7z2Ki44Vs4E
>
> I think it would make sense to include at least one key-sorted dict in
> CPython, and feel that SortedDict would not be a bad way to go. Neither
> would treap.
>
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/VNK4VPGA4LJH7RMR4LCCACEG2WNDBBFO/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Having Sorted Containers in stdlib? [ In reply to ]
A quick Google for "treap python github" yielded
https://github.com/TheAlgorithms/Python/blob/master/data_structures/binary_tree/treap.py
.

On Tue, 9 Nov 2021 at 21:49, Dan Stromberg <drsalists@gmail.com> wrote:

>
> On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <steve@pearwood.info>
> wrote:
>
>> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
>> too hard and expensive to justify for the limited use-cases for it. If
>> you want to sort a dict, you are best to sort the dict's keys, then
>> create a new dict. Or possibly use a dedicated Sorted Mapping type like
>> a red-black tree or similar.
>>
>
> There are several implementations of key-order sorted dicts available in
> Python, and more than a few of them are well tested. I am the maintainer
> of one of them: https://pypi.org/project/treap/ which I ported from
> Java. I also did a performance comparison among several of them a few
> years back.
>
> Red-black trees are popular, perhaps especially among Java developers, but
> I've never understood why. They aren't the fastest. Among traditional
> tree-based key-sorted dicts, treaps are frequently fastest.
>
> However, SortedDict is not uncommonly faster than treaps - I believe this
> is because SortedDict is very good at maximizing locality of reference.
> Traditional trees are almost always going to do an entire cache line hit
> for every node in large trees, even if those nodes are "next to each other"
> when sorted/traversed. SortedDicts put keys that are nearly equal, in
> nearly the same part of memory - so multiple values can be retrieved with a
> single cache line hit.
>
> Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
> sometimes even O(n^2 * logn). This is not good. A traditional tree should
> give O(nlogn) algorithms under similar circumstances, and although my gut
> is telling me SortedDict is similar the presentation linked below suggests
> a different (though still better than sorting in a loop) runtime for
> SortedDict.
>
> It's true that key-sorted dicts are not all that common. Their most
> common use is probably for implementing finite caches - evictions can
> benefit from ordered keys. However, we already have functools.lru_cache.
>
> Here's my most recent performance comparison:
> https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf
>
> Here's a PyCon 2016 presentation about SortedContainers, that includes
> SortedDict:
> https://www.youtube.com/watch?v=7z2Ki44Vs4E
>
> I think it would make sense to include at least one key-sorted dict in
> CPython, and feel that SortedDict would not be a bad way to go. Neither
> would treap.
>
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/VNK4VPGA4LJH7RMR4LCCACEG2WNDBBFO/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


--
OpenPGP: https://hasan.d8u.us/openpgp.asc
If you wish to request my time, please do so using
*bit.ly/hd1AppointmentRequest
<http://bit.ly/hd1AppointmentRequest>*.
Si vous voudrais faire connnaisance, allez a *bit.ly/hd1AppointmentRequest
<http://bit.ly/hd1AppointmentRequest>*.

<https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1>Sent
from my mobile device
Envoye de mon portable
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, Nov 10, 2021 at 5:03 PM Christopher Barker <pythonchb@gmail.com> wrote:
>
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

Me neither, tbh.

> Also, I can’t quite tell from the discussion If a “sorted dict” implements something new, or is an internal data structure that gives better performance for particular use cases. I.e. is a sorted dict a Mapping?
>

Nothing's technically new. You could make an inefficient sorted dict like this:

class SortedDict(dict):
def __iter__(self): return iter(sorted(self.keys()))

So in that sense, yes, it's better performance for the use-case of
needing the keys to be in order. It's not something I have ever really
needed - on the rare occasions when I need something like that, it's
usually no harder to maintain a separate sorted list or something.

IMO this is a great tool to have on PyPI. Someone can start the naive
way, run into major performance problems, and then go "huh, maybe
someone else has already solved this problem". Doesn't need to be in
the stdlib for that.

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SL2UXHDTIUQQ6ARNLPBXZCEHVSVONRJH/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 09, 2021 at 10:01:35PM -0800, Christopher Barker wrote:
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

Good question :-)


> Also, I can’t quite tell from the discussion If a “sorted dict” implements
> something new, or is an internal data structure that gives better
> performance for particular use cases. I.e. is a sorted dict a Mapping?

All dicts are mappings :-) I would expect any kind of sorted dict to
support the full mutable mapping interface.

Some mappings are not necessarily implemented as hash tables, as dict
is. Some variety of tree is a common choice.

I expect that a sorted dict (however it is implemented) would be a
mapping that preserves sorted order on insertions and deletions, rather
than insertion order. (Or some arbitrary order.)

I haven't used one myself, but I would expect an API where you create
a Sorted Dict (of whatever library or implementation you prefer), set an
optional key function and direction (ascending or descending), then as
you insert keys, the mapping keeps the keys in sort order.

Note that this is different from *sorting* a dict, which (if it were
supported) has to be done explicitly. Between sorts, the dict would be
capable of getting out of sorted order. The idea of a *sorted* dict is
that the sort order is an invariant, rather than something that can come
and go as you insert and delete items.


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JCFQYOSD5NQ3GFGLRVW7QWISXQL5LHS5/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, Nov 10, 2021 at 05:11:33PM +1100, Chris Angelico wrote:

> Nothing's technically new. You could make an inefficient sorted dict like this:
>
> class SortedDict(dict):
> def __iter__(self): return iter(sorted(self.keys()))

You would need more than that. You would want to ensure that the dict's
repr shows it is sorted order. You would need popitem to pop items in
the appropriate order. There may be other APIs that sorted dicts provide
than merely sorting the keys on iteration.

You don't actually need to implement this to see how repeatedly sorting
the keys would give terrible performance for anything above small sets
of data.

Here's Java's standard SortedMap:

https://docs.oracle.com/javase/7/docs/api/java/util/SortedMap.html

That should give you some idea of the interface a sorted dict would
likely provide.


> IMO this is a great tool to have on PyPI. Someone can start the naive
> way, run into major performance problems, and then go "huh, maybe
> someone else has already solved this problem". Doesn't need to be in
> the stdlib for that.

You may have missed that this thread has already discussed two mature,
good quality sorted dict implementations that have been on PyPI for
years:

https://pypi.org/project/sortedcontainers/

https://pypi.org/project/treap-python/


Here are a few more:

https://pypi.org/project/ruamel.ordereddict/

https://github.com/surenkov/PySortedDict

https://pypi.org/project/sorteddict/


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HPEXFUZR6Z6AHW4MHABWXT7UKCM6ELYZ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, Nov 10, 2021 at 5:45 PM Steven D'Aprano <steve@pearwood.info> wrote:
>
> On Wed, Nov 10, 2021 at 05:11:33PM +1100, Chris Angelico wrote:
>
> > Nothing's technically new. You could make an inefficient sorted dict like this:
> >
> > class SortedDict(dict):
> > def __iter__(self): return iter(sorted(self.keys()))
>
> You would need more than that. You would want to ensure that the dict's
> repr shows it is sorted order. You would need popitem to pop items in
> the appropriate order. There may be other APIs that sorted dicts provide
> than merely sorting the keys on iteration.

Sure. That was a massively oversimplified one, to point out that the
question of whether this is "truly new" is fairly meaningless. A thing
can be extremely useful without actually being *new* per se.

> You don't actually need to implement this to see how repeatedly sorting
> the keys would give terrible performance for anything above small sets
> of data.

And yes. That is exactly the problem.

> > IMO this is a great tool to have on PyPI. Someone can start the naive
> > way, run into major performance problems, and then go "huh, maybe
> > someone else has already solved this problem". Doesn't need to be in
> > the stdlib for that.
>
> You may have missed that this thread has already discussed two mature,
> good quality sorted dict implementations that have been on PyPI for
> years:
>
> https://pypi.org/project/sortedcontainers/
>
> https://pypi.org/project/treap-python/
>
>
> Here are a few more:
>
> https://pypi.org/project/ruamel.ordereddict/
>
> https://github.com/surenkov/PySortedDict
>
> https://pypi.org/project/sorteddict/
>

I know it's on PyPI, in multiple forms. I'm saying that it's a great
tool to keep there, not in the stdlib.

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QKFJJBYDAGHVFWSLSUSYS4WY7ECJ3ONQ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, 2021-11-10 at 17:16 +1100, Steven D'Aprano wrote:
> On Tue, Nov 09, 2021 at 10:01:35PM -0800, Christopher Barker wrote:
> > Maybe a stupid question:
> >
> > What are use cases for sorted dicts?
> >
> > I don’t think I’ve ever needed one.
>
> Good question :-)

It could be handy for deterministic iteration of its values, for
example to  allow serialized values to be easily compared, or to
generate and verify a signatures. It's edge case material; I wouldn't
think it's strong enough use case for inclusion in stdlib.
Re: Having Sorted Containers in stdlib? [ In reply to ]
10.11.21 01:47, Bob Fang ????:
> 1) Some mainstream language support them out of box: C++ for example
> have set/map which are sorted by the key order, and Java has TreeMap
> which is internally a Red-black tree.

>From my C++ and Java experience, hashtable-based containers are much
more useful than tree-based containers. They are more compact and fast.
In most cases the only reason of using sorted container is supporting
deterministic iteration order, but often it is enough to sort data only
for output. It is easy to do with the sorted() builtin in Python.

We need some dict implementatin in Python, it is essential part of the
language used to implement modules, classes and objects. And
hashtable-based implementation is good for this. There is no such need
of tree-based implementation. It is not needed in Python core, and is
not needed for most users. So it is good to keep it in third-party
libraries. Maintaining it in the stdlib has a cost and the benefit/cost
value would be low.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LG6GNFGRI4OSBEWF6KTIWLI65UXWMTIM/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
Hi Steve and all,

Thanks!

> Before we could even consider adding the sortedcontainers library to the
> standard library, we would need to hear from the maintainer(s) of the
> library that they agree to the move and would be able to continue
> maintaining the library under our release schedule and backwards
> compatibility guarantees.

Totally agree. And as others mentioned in this thread, I agree the use cases for Sorted Containers are weak to make it worthwhile to be added to stdlib. I was more thinking about the completeness of the stdlib when I was proposing, but I do agree there is a cost/benefit analysis need to be done and it is great to see all the responses.
As other pointed out this is mainly useful to implement some kind of cache, but since we already have lru_cache a lower level sorted dict may not be needed.

Thanks for all the responses.

Best,
Bob

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TCCZZYGREV77OG6JBE3YWYE7NXOVZL4G/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
IMO if someone is motivated to get a new container type in Python, a
PEP is required.

A PEP has been written for the new removeprefix() and removesuffix()
methods which are way simpler ;-)

I expect many questions on corner cases for a sorted container type.

Having a reference implementation, even written in Python, would be helpful.

Victor

On Wed, Nov 10, 2021 at 1:00 AM Bob Fang <boyufang@bytedance.com> wrote:
>
> Hi All,
>
> This is a modest proposal to consider having sorted containers (http://www.grantjenks.com/docs/sortedcontainers/) in standard library. I know that usually adding stuff to standard library requires some strong arguments, so I will try my best to give my reasons here:
>
> 1) Some mainstream language support them out of box: C++ for example have set/map which are sorted by the key order, and Java has TreeMap which is internally a Red-black tree. I understand languages might target different audiences, but I think Python’s standard library is quite extensive compared to peers. Consider we even have a sqlite driver in the stdlib, I do not think it is outrageous to have sorted containers.
> 2) These containers are not really easy to implement correctly, and usually is out of the scope of day-to-day projects. Especially considering we have a large audience of non-hardcore programmers in Python community. They may have the need to use these structures, but they do not necessarily have the skill/knowledge to implement it.
> 3) Granted, people can just pip install this library, but that is one extra step and less fraction is better for user experience.
> 4) These structures are very useful in competitive programming, I know at least in Leetcode this library is installed for Python.
> 5) The said library is of high implementation quality.
>
> I might be stupid here as I am not the maintainer of this library and I might be not even in a position to submit it to Python as part of stdlib, but here are some of my personal thoughts and would love to hear your opinion!
>
> Thanks!
> Bob
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YB2JD477TKPB2HTXDW6ZXUBD6NFFFHHJ/
> Code of Conduct: http://python.org/psf/codeofconduct/



--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/X5AXOUUHN7HBTMIQZQTR5LCNSQJGY27B/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
[Christopher Barker <pythonchb@gmail.com>]
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

For example, for some mappings with totally ordered keys, it can be
useful to ask for the value associated with a key that's not actually
there, because "close to the key is good enough".

A SortedDict supports several ways of doing that efficiently,
depending on what - exactly - an app means by "good enough". For
example, it's easy to find _the_ closest key. Or the k closest keys.
Or the two single keys <= and >=. Or ... Pick one; take some form of
average of their values; use a domain-specific interpolation function
on their keys & values; ...

To make it concrete, suppose you're running a COVID-19 app where
reports of new cases are coming in around the clock. One SortedDict
may be used to map latitude to the number of confirmed cases reported.
If a researcher asks for the number at a specific latitude, but no
exact match is found, fine, you can still efficiently show them the
results across the smallest known interval containing the latitude of
interest. Or if they want the results broken into buckets of latitudes
in 15-degree (N-degree - any N they ask for) intervals, also easy:
enumerating all entries in a given key range also goes fast. Or ...

A sorted list supports all sorts of membership queries efficiently.
Under the covers, a SortedDict _is_ a SortedList, plus a regular
Python dict to associate values with the list's elements.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FVDQZ4FJMSPYYTKNDRGLUXMOE226YIDK/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 9, 2021 at 11:05 PM Paul Bryan <pbryan@anode.ca> wrote:

> On Tue, Nov 09, 2021 at 10:01:35PM -0800, Christopher Barker wrote:
>
> What are use cases for sorted dicts?
>
>
Good question :-)

It could be handy for deterministic iteration of its values, for example to
> allow serialized values to be easily compared, or to generate and verify a
> signatures.
>

Yup -- I've done that. But for that, it's fine to sort when you are doing
the comparing / serialization. What I haven't been able to imagine is a use
case for keeping a
Mapping sorted constantly as you insert / remove items. For that you'd need
a use case where you were making sorted queries of various sorts. I
noticed, for instance, that the Java implementation posted earlier had
methods like "all the items after this one" (can't remember how that was
spelled).

I'm sure there are use cases for this, I'm probably lacking imagination :-)

-CHB


--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Having Sorted Containers in stdlib? [ In reply to ]
[Bob Fang <boyufang@bytedance.com>]
> This is a modest proposal to consider having sorted containers
> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.

+1 from me, but if and only if Grant Jenks (its author) wants that too.

It's first-rate code in all respects, including that it's a fine
example _of_ Python programming (it's not written in C - in Python).

People often say "well, put a thing on PyPI first, and see whether
people really want it!".

SortedContainers has been on PyPI for years straight now, and still
gets hundreds of thousands of downloads from there every day:

https://pypistats.org/packages/sortedcontainers

So it's already widely adopted in the community. What else would it
need to prove?

If it still doesn't qualify, "well, put a thing on PyPI first" is just
obstructionism ;-)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SE6YOVJQI6OEWNM5SAQL3X5VPEFGVZAA/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
I agree with Tim. Subject, of course, to the same caveat Tim mentions: does
the creator want this?

I haven't used the library much, but it's obviously top quality, and adding
pure-Python code is less burden than C implementations.

On Wed, Nov 10, 2021, 10:19 PM Tim Peters <tim.peters@gmail.com> wrote:

> [Bob Fang <boyufang@bytedance.com>]
> > This is a modest proposal to consider having sorted containers
> > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
>
> +1 from me, but if and only if Grant Jenks (its author) wants that too.
>
> It's first-rate code in all respects, including that it's a fine
> example _of_ Python programming (it's not written in C - in Python).
>
> People often say "well, put a thing on PyPI first, and see whether
> people really want it!".
>
> SortedContainers has been on PyPI for years straight now, and still
> gets hundreds of thousands of downloads from there every day:
>
> https://pypistats.org/packages/sortedcontainers
>
> So it's already widely adopted in the community. What else would it
> need to prove?
>
> If it still doesn't qualify, "well, put a thing on PyPI first" is just
> obstructionism ;-)
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/SE6YOVJQI6OEWNM5SAQL3X5VPEFGVZAA/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, 10 Nov 2021 21:12:17 -0600
Tim Peters <tim.peters@gmail.com> wrote:
> [Bob Fang <boyufang@bytedance.com>]
> > This is a modest proposal to consider having sorted containers
> > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
>
> +1 from me, but if and only if Grant Jenks (its author) wants that too.
>
> It's first-rate code in all respects, including that it's a fine
> example _of_ Python programming (it's not written in C - in Python).

Agreed with Tim. This is a perfect example of some basic and perennial
facility that would fit very well in the stdlib.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WJ4ANAMAYPVRHJNXMIASNAAWAZ45GPSZ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
>
> On Wed, 10 Nov 2021 21:12:17 -0600
> Tim Peters <tim.peters@gmail.com> wrote:
> > [Bob Fang <boyufang@bytedance.com>]
> > > This is a modest proposal to consider having sorted containers
> > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
> >
> > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> >
> > It's first-rate code in all respects, including that it's a fine
> > example _of_ Python programming (it's not written in C - in Python).
>
> Agreed with Tim. This is a perfect example of some basic and perennial
> facility that would fit very well in the stdlib.

I agree as well. Is anyone interested enough to ask the library author
if he supports doing this? That seems to be the main unanswered
question here.

But if anyone wants to argue the "the stdlib should be shrinking, not
growing" position, I suggest they do so *before* someone reaches out
to the module author. No point in us making the suggestion and then
being forced to withdraw it.

Paul
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.

So I suppose we can take a poll on this? If majority of us agree then I am happy to reach out to the author since I am the one who started this thread.

Thanks,
Bob

> On 11 Nov 2021, at 12:28, Paul Moore <p.f.moore@gmail.com> wrote:
>
> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org <mailto:antoine@python.org>> wrote:
>>
>> On Wed, 10 Nov 2021 21:12:17 -0600
>> Tim Peters <tim.peters@gmail.com> wrote:
>>> [Bob Fang <boyufang@bytedance.com>]
>>>> This is a modest proposal to consider having sorted containers
>>>> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
>>>
>>> +1 from me, but if and only if Grant Jenks (its author) wants that too.
>>>
>>> It's first-rate code in all respects, including that it's a fine
>>> example _of_ Python programming (it's not written in C - in Python).
>>
>> Agreed with Tim. This is a perfect example of some basic and perennial
>> facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>
> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org <mailto:python-dev@python.org>
> To unsubscribe send an email to python-dev-leave@python.org <mailto:python-dev-leave@python.org>
> https://mail.python.org/mailman3/lists/python-dev.python.org/ <https://mail.python.org/mailman3/lists/python-dev.python.org/>
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/ <https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/>
> Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
Re: Having Sorted Containers in stdlib? [ In reply to ]
I'll join Christopher Barker and Chris Angelico in voicing scepticism --
personally, I feel like I'm yet to be persuaded that the use case is strong
enough. sortedcontainers is a wonderful package, and I would completely
support a prominent link to the library in the Python documentation. But
the use case seems too niche and specific, in my opinion, for the library
to be added to the standard library. I see the standard library as a
collection of basic building blocks that can be used to create a multitude
of other things. sortedcontainers feel like a specialised solution to a
small collection of problems, rather than a general solution to a large
collection of problems. PyPI feels like the right place for this library.

Best,
Alex

On Thu, Nov 11, 2021 at 12:33 PM Paul Moore <p.f.moore@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.peters@gmail.com> wrote:
> > > [Bob Fang <boyufang@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim. This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>
> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

1 2  View All