Mailing List Archive

Having Sorted Containers in stdlib?
Hi All,

This is a modest proposal to consider having sorted containers (http://www.grantjenks.com/docs/sortedcontainers/ <http://www.grantjenks.com/docs/sortedcontainers/>) in standard library. I know that usually adding stuff to standard library requires some strong arguments, so I will try my best to give my reasons here:

1) Some mainstream language support them out of box: C++ for example have set/map which are sorted by the key order, and Java has TreeMap which is internally a Red-black tree. I understand languages might target different audiences, but I think Python’s standard library is quite extensive compared to peers. Consider we even have a sqlite driver in the stdlib, I do not think it is outrageous to have sorted containers.
2) These containers are not really easy to implement correctly, and usually is out of the scope of day-to-day projects. Especially considering we have a large audience of non-hardcore programmers in Python community. They may have the need to use these structures, but they do not necessarily have the skill/knowledge to implement it.
3) Granted, people can just pip install this library, but that is one extra step and less fraction is better for user experience.
4) These structures are very useful in competitive programming, I know at least in Leetcode this library is installed for Python.
5) The said library is of high implementation quality.

I might be stupid here as I am not the maintainer of this library and I might be not even in a position to submit it to Python as part of stdlib, but here are some of my personal thoughts and would love to hear your opinion!

Thanks!
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, 9 Nov 2021 at 16:01, Bob Fang <boyufang@bytedance.com> wrote:

> This is a modest proposal to consider having sorted containers (
> http://www.grantjenks.com/docs/sortedcontainers/
> <https://t.mailpgn.com/l/?u=5264d942-a595-49ac-a64a-986ce181d7fb&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3Da3e560c9-4417-427b-b976-19746738d8e5%26amp%3Bfl%3Dhttp%253A%252F%252Fwww.grantjenks.com%252Fdocs%252Fsortedcontainers%252F>)
> in standard library. I know that usually adding stuff to standard library
> requires some strong arguments, so I will try my best to give my reasons
> here:
>

As of 3.7. dicts are sorted[1], but I'm unsure if the order can be
overridden. Indeed:

>>> Boys = {'Tim': 18,'Charlie':12,'Robert':25}

Girls = {'Tiffany':22}


>>> print(cmp(Girls, Boys))

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'cmp' is not defined

>>> Girls.__gt__(Boys)

NotImplemented


-- H
--
OpenPGP: https://hasan.d8u.us/openpgp.asc
<https://t.mailpgn.com/l/?u=b8770bd0-a9ef-411d-8c87-1636e6d9b77a&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D5a3f5883-7b16-4b11-80ad-be68170836a1%26amp%3Bfl%3Dhttps%253A%252F%252Fhasan.d8u.us%252Fopenpgp.asc>
If you wish to request my time, please do so using
*bit.ly/hd1AppointmentRequest
<https://t.mailpgn.com/l/?u=8467f9a6-914d-4d6d-8c00-dd503e667146&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D20c36f1d-4104-40d6-8c5e-b8bee61caaa4%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>*
.
Si vous voudrais faire connnaisance, allez a *bit.ly/hd1AppointmentRequest
<https://t.mailpgn.com/l/?u=be22c143-79b8-4a2b-b7f9-cb7a95a8e7e4&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D143cb038-fda7-4317-aef6-37d17da9dfde%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>*
.

<https://t.mailpgn.com/l/?u=b065018c-6b50-41c8-9881-508f9f2f30fa&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D9c5843eb-43ed-470a-83fc-1ca514c43810%26amp%3Bfl%3Dhttps%253A%252F%252Fsks-keyservers.net%252Fpks%252Flookup%253Fop%253Dget%2526amp%253Bsearch%253D0xFEBAD7FFD041BBA1>Sent
from my mobile device
Envoye de mon portable
1. https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/
<https://t.mailpgn.com/l/?u=e7d71732-691a-421f-b43a-a9cdb3c2b9d0&fl=https%3A%2F%2Fsoftwaremaniacs.org%2Fblog%2F2020%2F02%2F05%2Fdicts-ordered%2F>
Re: Having Sorted Containers in stdlib? [ In reply to ]
But that’s in insertion order, not in key order right? I think we need data structure that are key-ordered.

> On 10 Nov 2021, at 00:23, Hasan Diwan <hasan.diwan@gmail.com> wrote:
>
>
> On Tue, 9 Nov 2021 at 16:01, Bob Fang <boyufang@bytedance.com <mailto:boyufang@bytedance.com>> wrote:
> This is a modest proposal to consider having sorted containers (http://www.grantjenks.com/docs/sortedcontainers/ <https://t.mailpgn.com/l/?u=5264d942-a595-49ac-a64a-986ce181d7fb&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3Da3e560c9-4417-427b-b976-19746738d8e5%26amp%3Bfl%3Dhttp%253A%252F%252Fwww.grantjenks.com%252Fdocs%252Fsortedcontainers%252F>) in standard library. I know that usually adding stuff to standard library requires some strong arguments, so I will try my best to give my reasons here:
>
> As of 3.7. dicts are sorted[1], but I'm unsure if the order can be overridden. Indeed:
>
> >>> Boys = {'Tim': 18,'Charlie':12,'Robert':25}
> Girls = {'Tiffany':22}
>
> >>> print(cmp(Girls, Boys))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'cmp' is not defined
> >>> Girls.__gt__(Boys)
> NotImplemented
>
> -- H
> --
> OpenPGP: https://hasan.d8u.us/openpgp.asc <https://t.mailpgn.com/l/?u=b8770bd0-a9ef-411d-8c87-1636e6d9b77a&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D5a3f5883-7b16-4b11-80ad-be68170836a1%26amp%3Bfl%3Dhttps%253A%252F%252Fhasan.d8u.us%252Fopenpgp.asc>
> If you wish to request my time, please do so using bit.ly/hd1AppointmentRequest <https://t.mailpgn.com/l/?u=8467f9a6-914d-4d6d-8c00-dd503e667146&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D20c36f1d-4104-40d6-8c5e-b8bee61caaa4%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>.
> Si vous voudrais faire connnaisance, allez a bit.ly/hd1AppointmentRequest <https://t.mailpgn.com/l/?u=be22c143-79b8-4a2b-b7f9-cb7a95a8e7e4&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D143cb038-fda7-4317-aef6-37d17da9dfde%26amp%3Bfl%3Dhttp%253A%252F%252Fbit.ly%252Fhd1AppointmentRequest>.
>
> <https://t.mailpgn.com/l/?u=b065018c-6b50-41c8-9881-508f9f2f30fa&fl=https%3A%2F%2Ft.mailpgn.com%2Fl%2F%3Fu%3D9c5843eb-43ed-470a-83fc-1ca514c43810%26amp%3Bfl%3Dhttps%253A%252F%252Fsks-keyservers.net%252Fpks%252Flookup%253Fop%253Dget%2526amp%253Bsearch%253D0xFEBAD7FFD041BBA1>Sent from my mobile device
> Envoye de mon portable
> 1. https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/ <https://t.mailpgn.com/l/?u=e7d71732-691a-421f-b43a-a9cdb3c2b9d0&fl=https%3A%2F%2Fsoftwaremaniacs.org%2Fblog%2F2020%2F02%2F05%2Fdicts-ordered%2F>_______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RNXTH3JRUZ5WH33AKHVOSZF34FGNMS6S/
> Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, 9 Nov 2021, 17:05 Bob Fang, <boyufang@bytedance.com> wrote:

> But that’s in insertion order, not in key order right? I think we need
> data structure that are key-ordered.
>
According to the tests, it seems to be key-ordered. It also appears that
the ordering cannot be changed (yet?). -- H
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 09, 2021 at 04:23:50PM -0800, Hasan Diwan wrote:

> As of 3.7. dicts are sorted[1], but I'm unsure if the order can be
> overridden.


Dicts are not sorted, they are kept in insertion order.

>>> d = {3: 'a', 4: 'b', 1: 'c', 2: 'd'}
>>> d
{3: 'a', 4: 'b', 1: 'c', 2: 'd'}

See the docs:

https://docs.python.org/3/library/stdtypes.html#dict

although you have to scroll almost all the way to then end, just before
the dict view objects, to see it documented.

Sorting dicts has been discussed on the Python-Ideas mailing list, it is
too hard and expensive to justify for the limited use-cases for it. If
you want to sort a dict, you are best to sort the dict's keys, then
create a new dict. Or possibly use a dedicated Sorted Mapping type like
a red-black tree or similar.


Hasan wrote:

> >>> print(cmp(Girls, Boys))
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> NameError: name 'cmp' is not defined

The 'cmp' built-in was removed in Python 3.0.

In any case, dicts don't support order comparisons.


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/AV5RVB37YR7TZ3ZURI4ZIHCS5Y2XV67Y/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
Hi Bob and welcome,

Before we could even consider adding the sortedcontainers library to the
standard library, we would need to hear from the maintainer(s) of the
library that they agree to the move and would be able to continue
maintaining the library under our release schedule and backwards
compatibility guarantees.

Otherwise, you would need to find a core developer willing to
re-implement the containers and maintain them.

I don't want to discourage you, but even if the maintainer is
willing, we night not decide to add it. Every new feature, class and
function adds to the weight of learning Python, and the cost of
maintenance. We must balance that against the benefit, and only add
features where the benefits are greater than the costs.

Our decision making is usually very conservative, because we have strong
requirements for backwards-compatibility. Once we add something to the
stdlib, we can't easily change our mind and remove it again. So we
follow the Zen of Python:

>>> import this
The Zen of Python, by Tim Peters
[...]
Now is better than never.
Although never is often better than *right* now.


Have you looked at the Python PEPs? If you are serious about pushing
this proposal, your first step should be to read PEP 1 and then browse
through the collection of successful and unsuccessful PEPs:

https://www.python.org/dev/peps/pep-0001/

https://www.python.org/dev/peps/


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VCM5PUEHCGRIYTPPRAZZUSEEJ3KW2DTN/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <steve@pearwood.info> wrote:

> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
> too hard and expensive to justify for the limited use-cases for it. If
> you want to sort a dict, you are best to sort the dict's keys, then
> create a new dict. Or possibly use a dedicated Sorted Mapping type like
> a red-black tree or similar.
>

There are several implementations of key-order sorted dicts available in
Python, and more than a few of them are well tested. I am the maintainer
of one of them: https://pypi.org/project/treap/ which I ported from Java.
I also did a performance comparison among several of them a few years back.

Red-black trees are popular, perhaps especially among Java developers, but
I've never understood why. They aren't the fastest. Among traditional
tree-based key-sorted dicts, treaps are frequently fastest.

However, SortedDict is not uncommonly faster than treaps - I believe this
is because SortedDict is very good at maximizing locality of reference.
Traditional trees are almost always going to do an entire cache line hit
for every node in large trees, even if those nodes are "next to each other"
when sorted/traversed. SortedDicts put keys that are nearly equal, in
nearly the same part of memory - so multiple values can be retrieved with a
single cache line hit.

Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
sometimes even O(n^2 * logn). This is not good. A traditional tree should
give O(nlogn) algorithms under similar circumstances, and although my gut
is telling me SortedDict is similar the presentation linked below suggests
a different (though still better than sorting in a loop) runtime for
SortedDict.

It's true that key-sorted dicts are not all that common. Their most common
use is probably for implementing finite caches - evictions can benefit from
ordered keys. However, we already have functools.lru_cache.

Here's my most recent performance comparison:
https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf

Here's a PyCon 2016 presentation about SortedContainers, that includes
SortedDict:
https://www.youtube.com/watch?v=7z2Ki44Vs4E

I think it would make sense to include at least one key-sorted dict in
CPython, and feel that SortedDict would not be a bad way to go. Neither
would treap.
Re: Having Sorted Containers in stdlib? [ In reply to ]
Maybe a stupid question:

What are use cases for sorted dicts?

I don’t think I’ve ever needed one.

Also, I can’t quite tell from the discussion If a “sorted dict” implements
something new, or is an internal data structure that gives better
performance for particular use cases. I.e. is a sorted dict a Mapping?

-CHB



On Tue, Nov 9, 2021 at 9:45 PM Dan Stromberg <drsalists@gmail.com> wrote:

>
> On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <steve@pearwood.info>
> wrote:
>
>> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
>> too hard and expensive to justify for the limited use-cases for it. If
>> you want to sort a dict, you are best to sort the dict's keys, then
>> create a new dict. Or possibly use a dedicated Sorted Mapping type like
>> a red-black tree or similar.
>>
>
> There are several implementations of key-order sorted dicts available in
> Python, and more than a few of them are well tested. I am the maintainer
> of one of them: https://pypi.org/project/treap/ which I ported from
> Java. I also did a performance comparison among several of them a few
> years back.
>
> Red-black trees are popular, perhaps especially among Java developers, but
> I've never understood why. They aren't the fastest. Among traditional
> tree-based key-sorted dicts, treaps are frequently fastest.
>
> However, SortedDict is not uncommonly faster than treaps - I believe this
> is because SortedDict is very good at maximizing locality of reference.
> Traditional trees are almost always going to do an entire cache line hit
> for every node in large trees, even if those nodes are "next to each other"
> when sorted/traversed. SortedDicts put keys that are nearly equal, in
> nearly the same part of memory - so multiple values can be retrieved with a
> single cache line hit.
>
> Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
> sometimes even O(n^2 * logn). This is not good. A traditional tree should
> give O(nlogn) algorithms under similar circumstances, and although my gut
> is telling me SortedDict is similar the presentation linked below suggests
> a different (though still better than sorting in a loop) runtime for
> SortedDict.
>
> It's true that key-sorted dicts are not all that common. Their most
> common use is probably for implementing finite caches - evictions can
> benefit from ordered keys. However, we already have functools.lru_cache.
>
> Here's my most recent performance comparison:
> https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf
>
> Here's a PyCon 2016 presentation about SortedContainers, that includes
> SortedDict:
> https://www.youtube.com/watch?v=7z2Ki44Vs4E
>
> I think it would make sense to include at least one key-sorted dict in
> CPython, and feel that SortedDict would not be a bad way to go. Neither
> would treap.
>
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/VNK4VPGA4LJH7RMR4LCCACEG2WNDBBFO/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Having Sorted Containers in stdlib? [ In reply to ]
A quick Google for "treap python github" yielded
https://github.com/TheAlgorithms/Python/blob/master/data_structures/binary_tree/treap.py
.

On Tue, 9 Nov 2021 at 21:49, Dan Stromberg <drsalists@gmail.com> wrote:

>
> On Tue, Nov 9, 2021 at 9:00 PM Steven D'Aprano <steve@pearwood.info>
> wrote:
>
>> Sorting dicts has been discussed on the Python-Ideas mailing list, it is
>> too hard and expensive to justify for the limited use-cases for it. If
>> you want to sort a dict, you are best to sort the dict's keys, then
>> create a new dict. Or possibly use a dedicated Sorted Mapping type like
>> a red-black tree or similar.
>>
>
> There are several implementations of key-order sorted dicts available in
> Python, and more than a few of them are well tested. I am the maintainer
> of one of them: https://pypi.org/project/treap/ which I ported from
> Java. I also did a performance comparison among several of them a few
> years back.
>
> Red-black trees are popular, perhaps especially among Java developers, but
> I've never understood why. They aren't the fastest. Among traditional
> tree-based key-sorted dicts, treaps are frequently fastest.
>
> However, SortedDict is not uncommonly faster than treaps - I believe this
> is because SortedDict is very good at maximizing locality of reference.
> Traditional trees are almost always going to do an entire cache line hit
> for every node in large trees, even if those nodes are "next to each other"
> when sorted/traversed. SortedDicts put keys that are nearly equal, in
> nearly the same part of memory - so multiple values can be retrieved with a
> single cache line hit.
>
> Sorting a dict's keys inside a loop tends to give O(n^2) algorithms, and
> sometimes even O(n^2 * logn). This is not good. A traditional tree should
> give O(nlogn) algorithms under similar circumstances, and although my gut
> is telling me SortedDict is similar the presentation linked below suggests
> a different (though still better than sorting in a loop) runtime for
> SortedDict.
>
> It's true that key-sorted dicts are not all that common. Their most
> common use is probably for implementing finite caches - evictions can
> benefit from ordered keys. However, we already have functools.lru_cache.
>
> Here's my most recent performance comparison:
> https://stromberg.dnsalias.org/~strombrg/sorted-dictionary-comparison/Datastructure%20comparison.pdf
>
> Here's a PyCon 2016 presentation about SortedContainers, that includes
> SortedDict:
> https://www.youtube.com/watch?v=7z2Ki44Vs4E
>
> I think it would make sense to include at least one key-sorted dict in
> CPython, and feel that SortedDict would not be a bad way to go. Neither
> would treap.
>
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/VNK4VPGA4LJH7RMR4LCCACEG2WNDBBFO/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


--
OpenPGP: https://hasan.d8u.us/openpgp.asc
If you wish to request my time, please do so using
*bit.ly/hd1AppointmentRequest
<http://bit.ly/hd1AppointmentRequest>*.
Si vous voudrais faire connnaisance, allez a *bit.ly/hd1AppointmentRequest
<http://bit.ly/hd1AppointmentRequest>*.

<https://sks-keyservers.net/pks/lookup?op=get&search=0xFEBAD7FFD041BBA1>Sent
from my mobile device
Envoye de mon portable
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, Nov 10, 2021 at 5:03 PM Christopher Barker <pythonchb@gmail.com> wrote:
>
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

Me neither, tbh.

> Also, I can’t quite tell from the discussion If a “sorted dict” implements something new, or is an internal data structure that gives better performance for particular use cases. I.e. is a sorted dict a Mapping?
>

Nothing's technically new. You could make an inefficient sorted dict like this:

class SortedDict(dict):
def __iter__(self): return iter(sorted(self.keys()))

So in that sense, yes, it's better performance for the use-case of
needing the keys to be in order. It's not something I have ever really
needed - on the rare occasions when I need something like that, it's
usually no harder to maintain a separate sorted list or something.

IMO this is a great tool to have on PyPI. Someone can start the naive
way, run into major performance problems, and then go "huh, maybe
someone else has already solved this problem". Doesn't need to be in
the stdlib for that.

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SL2UXHDTIUQQ6ARNLPBXZCEHVSVONRJH/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 09, 2021 at 10:01:35PM -0800, Christopher Barker wrote:
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

Good question :-)


> Also, I can’t quite tell from the discussion If a “sorted dict” implements
> something new, or is an internal data structure that gives better
> performance for particular use cases. I.e. is a sorted dict a Mapping?

All dicts are mappings :-) I would expect any kind of sorted dict to
support the full mutable mapping interface.

Some mappings are not necessarily implemented as hash tables, as dict
is. Some variety of tree is a common choice.

I expect that a sorted dict (however it is implemented) would be a
mapping that preserves sorted order on insertions and deletions, rather
than insertion order. (Or some arbitrary order.)

I haven't used one myself, but I would expect an API where you create
a Sorted Dict (of whatever library or implementation you prefer), set an
optional key function and direction (ascending or descending), then as
you insert keys, the mapping keeps the keys in sort order.

Note that this is different from *sorting* a dict, which (if it were
supported) has to be done explicitly. Between sorts, the dict would be
capable of getting out of sorted order. The idea of a *sorted* dict is
that the sort order is an invariant, rather than something that can come
and go as you insert and delete items.


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/JCFQYOSD5NQ3GFGLRVW7QWISXQL5LHS5/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, Nov 10, 2021 at 05:11:33PM +1100, Chris Angelico wrote:

> Nothing's technically new. You could make an inefficient sorted dict like this:
>
> class SortedDict(dict):
> def __iter__(self): return iter(sorted(self.keys()))

You would need more than that. You would want to ensure that the dict's
repr shows it is sorted order. You would need popitem to pop items in
the appropriate order. There may be other APIs that sorted dicts provide
than merely sorting the keys on iteration.

You don't actually need to implement this to see how repeatedly sorting
the keys would give terrible performance for anything above small sets
of data.

Here's Java's standard SortedMap:

https://docs.oracle.com/javase/7/docs/api/java/util/SortedMap.html

That should give you some idea of the interface a sorted dict would
likely provide.


> IMO this is a great tool to have on PyPI. Someone can start the naive
> way, run into major performance problems, and then go "huh, maybe
> someone else has already solved this problem". Doesn't need to be in
> the stdlib for that.

You may have missed that this thread has already discussed two mature,
good quality sorted dict implementations that have been on PyPI for
years:

https://pypi.org/project/sortedcontainers/

https://pypi.org/project/treap-python/


Here are a few more:

https://pypi.org/project/ruamel.ordereddict/

https://github.com/surenkov/PySortedDict

https://pypi.org/project/sorteddict/


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HPEXFUZR6Z6AHW4MHABWXT7UKCM6ELYZ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, Nov 10, 2021 at 5:45 PM Steven D'Aprano <steve@pearwood.info> wrote:
>
> On Wed, Nov 10, 2021 at 05:11:33PM +1100, Chris Angelico wrote:
>
> > Nothing's technically new. You could make an inefficient sorted dict like this:
> >
> > class SortedDict(dict):
> > def __iter__(self): return iter(sorted(self.keys()))
>
> You would need more than that. You would want to ensure that the dict's
> repr shows it is sorted order. You would need popitem to pop items in
> the appropriate order. There may be other APIs that sorted dicts provide
> than merely sorting the keys on iteration.

Sure. That was a massively oversimplified one, to point out that the
question of whether this is "truly new" is fairly meaningless. A thing
can be extremely useful without actually being *new* per se.

> You don't actually need to implement this to see how repeatedly sorting
> the keys would give terrible performance for anything above small sets
> of data.

And yes. That is exactly the problem.

> > IMO this is a great tool to have on PyPI. Someone can start the naive
> > way, run into major performance problems, and then go "huh, maybe
> > someone else has already solved this problem". Doesn't need to be in
> > the stdlib for that.
>
> You may have missed that this thread has already discussed two mature,
> good quality sorted dict implementations that have been on PyPI for
> years:
>
> https://pypi.org/project/sortedcontainers/
>
> https://pypi.org/project/treap-python/
>
>
> Here are a few more:
>
> https://pypi.org/project/ruamel.ordereddict/
>
> https://github.com/surenkov/PySortedDict
>
> https://pypi.org/project/sorteddict/
>

I know it's on PyPI, in multiple forms. I'm saying that it's a great
tool to keep there, not in the stdlib.

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/QKFJJBYDAGHVFWSLSUSYS4WY7ECJ3ONQ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, 2021-11-10 at 17:16 +1100, Steven D'Aprano wrote:
> On Tue, Nov 09, 2021 at 10:01:35PM -0800, Christopher Barker wrote:
> > Maybe a stupid question:
> >
> > What are use cases for sorted dicts?
> >
> > I don’t think I’ve ever needed one.
>
> Good question :-)

It could be handy for deterministic iteration of its values, for
example to  allow serialized values to be easily compared, or to
generate and verify a signatures. It's edge case material; I wouldn't
think it's strong enough use case for inclusion in stdlib.
Re: Having Sorted Containers in stdlib? [ In reply to ]
10.11.21 01:47, Bob Fang ????:
> 1) Some mainstream language support them out of box: C++ for example
> have set/map which are sorted by the key order, and Java has TreeMap
> which is internally a Red-black tree.

>From my C++ and Java experience, hashtable-based containers are much
more useful than tree-based containers. They are more compact and fast.
In most cases the only reason of using sorted container is supporting
deterministic iteration order, but often it is enough to sort data only
for output. It is easy to do with the sorted() builtin in Python.

We need some dict implementatin in Python, it is essential part of the
language used to implement modules, classes and objects. And
hashtable-based implementation is good for this. There is no such need
of tree-based implementation. It is not needed in Python core, and is
not needed for most users. So it is good to keep it in third-party
libraries. Maintaining it in the stdlib has a cost and the benefit/cost
value would be low.

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LG6GNFGRI4OSBEWF6KTIWLI65UXWMTIM/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
Hi Steve and all,

Thanks!

> Before we could even consider adding the sortedcontainers library to the
> standard library, we would need to hear from the maintainer(s) of the
> library that they agree to the move and would be able to continue
> maintaining the library under our release schedule and backwards
> compatibility guarantees.

Totally agree. And as others mentioned in this thread, I agree the use cases for Sorted Containers are weak to make it worthwhile to be added to stdlib. I was more thinking about the completeness of the stdlib when I was proposing, but I do agree there is a cost/benefit analysis need to be done and it is great to see all the responses.
As other pointed out this is mainly useful to implement some kind of cache, but since we already have lru_cache a lower level sorted dict may not be needed.

Thanks for all the responses.

Best,
Bob

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/TCCZZYGREV77OG6JBE3YWYE7NXOVZL4G/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
IMO if someone is motivated to get a new container type in Python, a
PEP is required.

A PEP has been written for the new removeprefix() and removesuffix()
methods which are way simpler ;-)

I expect many questions on corner cases for a sorted container type.

Having a reference implementation, even written in Python, would be helpful.

Victor

On Wed, Nov 10, 2021 at 1:00 AM Bob Fang <boyufang@bytedance.com> wrote:
>
> Hi All,
>
> This is a modest proposal to consider having sorted containers (http://www.grantjenks.com/docs/sortedcontainers/) in standard library. I know that usually adding stuff to standard library requires some strong arguments, so I will try my best to give my reasons here:
>
> 1) Some mainstream language support them out of box: C++ for example have set/map which are sorted by the key order, and Java has TreeMap which is internally a Red-black tree. I understand languages might target different audiences, but I think Python’s standard library is quite extensive compared to peers. Consider we even have a sqlite driver in the stdlib, I do not think it is outrageous to have sorted containers.
> 2) These containers are not really easy to implement correctly, and usually is out of the scope of day-to-day projects. Especially considering we have a large audience of non-hardcore programmers in Python community. They may have the need to use these structures, but they do not necessarily have the skill/knowledge to implement it.
> 3) Granted, people can just pip install this library, but that is one extra step and less fraction is better for user experience.
> 4) These structures are very useful in competitive programming, I know at least in Leetcode this library is installed for Python.
> 5) The said library is of high implementation quality.
>
> I might be stupid here as I am not the maintainer of this library and I might be not even in a position to submit it to Python as part of stdlib, but here are some of my personal thoughts and would love to hear your opinion!
>
> Thanks!
> Bob
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/YB2JD477TKPB2HTXDW6ZXUBD6NFFFHHJ/
> Code of Conduct: http://python.org/psf/codeofconduct/



--
Night gathers, and now my watch begins. It shall not end until my death.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/X5AXOUUHN7HBTMIQZQTR5LCNSQJGY27B/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
[Christopher Barker <pythonchb@gmail.com>]
> Maybe a stupid question:
>
> What are use cases for sorted dicts?
>
> I don’t think I’ve ever needed one.

For example, for some mappings with totally ordered keys, it can be
useful to ask for the value associated with a key that's not actually
there, because "close to the key is good enough".

A SortedDict supports several ways of doing that efficiently,
depending on what - exactly - an app means by "good enough". For
example, it's easy to find _the_ closest key. Or the k closest keys.
Or the two single keys <= and >=. Or ... Pick one; take some form of
average of their values; use a domain-specific interpolation function
on their keys & values; ...

To make it concrete, suppose you're running a COVID-19 app where
reports of new cases are coming in around the clock. One SortedDict
may be used to map latitude to the number of confirmed cases reported.
If a researcher asks for the number at a specific latitude, but no
exact match is found, fine, you can still efficiently show them the
results across the smallest known interval containing the latitude of
interest. Or if they want the results broken into buckets of latitudes
in 15-degree (N-degree - any N they ask for) intervals, also easy:
enumerating all entries in a given key range also goes fast. Or ...

A sorted list supports all sorts of membership queries efficiently.
Under the covers, a SortedDict _is_ a SortedList, plus a regular
Python dict to associate values with the list's elements.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FVDQZ4FJMSPYYTKNDRGLUXMOE226YIDK/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Tue, Nov 9, 2021 at 11:05 PM Paul Bryan <pbryan@anode.ca> wrote:

> On Tue, Nov 09, 2021 at 10:01:35PM -0800, Christopher Barker wrote:
>
> What are use cases for sorted dicts?
>
>
Good question :-)

It could be handy for deterministic iteration of its values, for example to
> allow serialized values to be easily compared, or to generate and verify a
> signatures.
>

Yup -- I've done that. But for that, it's fine to sort when you are doing
the comparing / serialization. What I haven't been able to imagine is a use
case for keeping a
Mapping sorted constantly as you insert / remove items. For that you'd need
a use case where you were making sorted queries of various sorts. I
noticed, for instance, that the Java implementation posted earlier had
methods like "all the items after this one" (can't remember how that was
spelled).

I'm sure there are use cases for this, I'm probably lacking imagination :-)

-CHB


--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Having Sorted Containers in stdlib? [ In reply to ]
[Bob Fang <boyufang@bytedance.com>]
> This is a modest proposal to consider having sorted containers
> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.

+1 from me, but if and only if Grant Jenks (its author) wants that too.

It's first-rate code in all respects, including that it's a fine
example _of_ Python programming (it's not written in C - in Python).

People often say "well, put a thing on PyPI first, and see whether
people really want it!".

SortedContainers has been on PyPI for years straight now, and still
gets hundreds of thousands of downloads from there every day:

https://pypistats.org/packages/sortedcontainers

So it's already widely adopted in the community. What else would it
need to prove?

If it still doesn't qualify, "well, put a thing on PyPI first" is just
obstructionism ;-)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/SE6YOVJQI6OEWNM5SAQL3X5VPEFGVZAA/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
I agree with Tim. Subject, of course, to the same caveat Tim mentions: does
the creator want this?

I haven't used the library much, but it's obviously top quality, and adding
pure-Python code is less burden than C implementations.

On Wed, Nov 10, 2021, 10:19 PM Tim Peters <tim.peters@gmail.com> wrote:

> [Bob Fang <boyufang@bytedance.com>]
> > This is a modest proposal to consider having sorted containers
> > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
>
> +1 from me, but if and only if Grant Jenks (its author) wants that too.
>
> It's first-rate code in all respects, including that it's a fine
> example _of_ Python programming (it's not written in C - in Python).
>
> People often say "well, put a thing on PyPI first, and see whether
> people really want it!".
>
> SortedContainers has been on PyPI for years straight now, and still
> gets hundreds of thousands of downloads from there every day:
>
> https://pypistats.org/packages/sortedcontainers
>
> So it's already widely adopted in the community. What else would it
> need to prove?
>
> If it still doesn't qualify, "well, put a thing on PyPI first" is just
> obstructionism ;-)
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/SE6YOVJQI6OEWNM5SAQL3X5VPEFGVZAA/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Wed, 10 Nov 2021 21:12:17 -0600
Tim Peters <tim.peters@gmail.com> wrote:
> [Bob Fang <boyufang@bytedance.com>]
> > This is a modest proposal to consider having sorted containers
> > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
>
> +1 from me, but if and only if Grant Jenks (its author) wants that too.
>
> It's first-rate code in all respects, including that it's a fine
> example _of_ Python programming (it's not written in C - in Python).

Agreed with Tim. This is a perfect example of some basic and perennial
facility that would fit very well in the stdlib.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WJ4ANAMAYPVRHJNXMIASNAAWAZ45GPSZ/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
>
> On Wed, 10 Nov 2021 21:12:17 -0600
> Tim Peters <tim.peters@gmail.com> wrote:
> > [Bob Fang <boyufang@bytedance.com>]
> > > This is a modest proposal to consider having sorted containers
> > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
> >
> > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> >
> > It's first-rate code in all respects, including that it's a fine
> > example _of_ Python programming (it's not written in C - in Python).
>
> Agreed with Tim. This is a perfect example of some basic and perennial
> facility that would fit very well in the stdlib.

I agree as well. Is anyone interested enough to ask the library author
if he supports doing this? That seems to be the main unanswered
question here.

But if anyone wants to argue the "the stdlib should be shrinking, not
growing" position, I suggest they do so *before* someone reaches out
to the module author. No point in us making the suggestion and then
being forced to withdraw it.

Paul
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.

So I suppose we can take a poll on this? If majority of us agree then I am happy to reach out to the author since I am the one who started this thread.

Thanks,
Bob

> On 11 Nov 2021, at 12:28, Paul Moore <p.f.moore@gmail.com> wrote:
>
> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org <mailto:antoine@python.org>> wrote:
>>
>> On Wed, 10 Nov 2021 21:12:17 -0600
>> Tim Peters <tim.peters@gmail.com> wrote:
>>> [Bob Fang <boyufang@bytedance.com>]
>>>> This is a modest proposal to consider having sorted containers
>>>> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
>>>
>>> +1 from me, but if and only if Grant Jenks (its author) wants that too.
>>>
>>> It's first-rate code in all respects, including that it's a fine
>>> example _of_ Python programming (it's not written in C - in Python).
>>
>> Agreed with Tim. This is a perfect example of some basic and perennial
>> facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>
> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org <mailto:python-dev@python.org>
> To unsubscribe send an email to python-dev-leave@python.org <mailto:python-dev-leave@python.org>
> https://mail.python.org/mailman3/lists/python-dev.python.org/ <https://mail.python.org/mailman3/lists/python-dev.python.org/>
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/ <https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/>
> Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
Re: Having Sorted Containers in stdlib? [ In reply to ]
I'll join Christopher Barker and Chris Angelico in voicing scepticism --
personally, I feel like I'm yet to be persuaded that the use case is strong
enough. sortedcontainers is a wonderful package, and I would completely
support a prominent link to the library in the Python documentation. But
the use case seems too niche and specific, in my opinion, for the library
to be added to the standard library. I see the standard library as a
collection of basic building blocks that can be used to create a multitude
of other things. sortedcontainers feel like a specialised solution to a
small collection of problems, rather than a general solution to a large
collection of problems. PyPI feels like the right place for this library.

Best,
Alex

On Thu, Nov 11, 2021 at 12:33 PM Paul Moore <p.f.moore@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.peters@gmail.com> wrote:
> > > [Bob Fang <boyufang@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim. This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>
> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
Earlier in the thread, we were pointed to multiple implementations.

Is this particular one clearly the “best”[*]?

If so, then sure.

-CHB

[*] best meaning “most appropriate for the stdlib”. A couple folks have
already pointed to the quality of the code. But my understanding is that
different algorithms are more or less appropriate for different use cases.
So is this one fairly “universal”?



On Thu, Nov 11, 2021 at 4:31 AM Paul Moore <p.f.moore@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.peters@gmail.com> wrote:
> > > [Bob Fang <boyufang@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim. This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>
> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Having Sorted Containers in stdlib? [ In reply to ]
Serhiy Storchaka wrote:
> From my C++ and Java experience, hashtable-based containers are much
> more useful than tree-based containers. They are more compact and fast.
> In most cases the only reason of using sorted container is supporting
> deterministic iteration order, but often it is enough to sort data only
> for output.

The concern is not constant speed factor but asymptotic complexity.
Inserting or deleting in a tree-based sorted collection is O(log n),
but lists are O(n): the bisect is O(log n) but moving trailing list elements
is O(n). It's memmove fast but dominates as n gets large.

There are many algorithms that are hard to implement as fast as they
can be without a tree. Something like rolling median would be a good
test case, e.g. "for a list N containing numbers, produce a list M of
length len(N) - 1 where M[i] == statistics.median(N[:i+1])". I think the
best you can do with the standard library and without writing a
balanced binary tree from scratch is O(n²) (if you can beat this I'd like
to know how!). With a tree-based sorted list it's O(n log n).

Dicts and sets cannot maintain an arbitrary order, except that
OrderedDict.move_to_end() very occasionally turns out to be all you
need (and even then, I find it tricky to reason about and apply
correctly in an algorithm: it requires you to maintain an invariant,
where SortedList maintains the sorted invariant for you).

> There is no such need
> of tree-based implementation. It is not needed in Python core, and is
> not needed for most users. So it is good to keep it in third-party
> libraries. Maintaining it in the stdlib has a cost and the benefit/cost
> value would be low.

The situation where this bites my firm the most is in interviews. This is
exactly the situation where we're asking candidates to demonstrate
their CS knowledge and produce an algorithm that has low asymptotic
complexity.

We use various services including HackerRank, which provide a
sandboxed Python environment that cannot use PyPI. There are quite
a few services in this kind of Educational+Challenge space including
some that execute Python in the browser using Brython etc.

My team believes that candidates with a Java background find the
challenge easier than candidates with a Python background, because
they just use NavigableSet and move on. I find that kind of embarrassing.
It requires us to apologise for Python's deficiency and fudge how we
score the question, which makes it hard to know we're providing a level
playing field.

(I would argue that we should not ask those kinds of questions or not
use these sandboxed interview environments but institutional friction
makes that magnitude of change very difficult.)

Putting something in the standard library means that it will, in due
course, appear in HackerRank and other services we use, and we can
expect candidates to be familiar with it and incorporate it in their
solutions.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VTOYBWO5OWXID4OIH5FWCKWSWRDXKSTB/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, Nov 11, 2021 at 4:30 AM Paul Moore <p.f.moore@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.peters@gmail.com> wrote:
> > > [Bob Fang <boyufang@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim. This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>

A couple 2 cents and IMHO.

-----
The tldr is not so much "don't put it in the standard lib", but that the
stdlib would be better having particular implementations instead of a
one-size-fits-all e.g. SortedList.

Should the stdlib have e.g. SortedList? Probably not, because the use cases
of such data types are too niche to a one-size-fits-all implementation, and
there are too many implementations with too many of their own settings.
Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
because those are somewhat fundamental data structures and implementing,
testing, and verifying them is very much non-trivial. While niche, having
them at your immediate disposal is very powerful.
-----

Last year, for fun, after wishing there was a SortedSet in the standard
lib, I ended up implementing a Red-Black Tree and BTree based sorted
dictionary/set[1]. After then trying to use them for my use case[2], I
found that, in order to fully and truly exploit their benefits, the basic
Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
would let me, e.g. binary search to a particular spot and then iterate, or
give me a range between two points, etc. Such apis tend to be pretty
specialized to exploit their under data structure (which isn't to say an
abstract API can't be created, see Java's NavigableMap, but that level of
abstraction deserves careful consideration).

In the end, it was a fun exercise, but in practice a dictionary and
sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
wasn't worth the effort.

Anyways, I came to two particular, IMHO, conclusions:
1. Sorted collections are very niche. I *could* have used one, but probably
would have spent as much time fiddling with them as using a dict/sorted().
2. If you're in a niche use case, especially for performance, then
abstractions aren't really helpful. Using e.g. a BTree and knowing the
particulars of the implementation are going to be more helpful than having
an abstract SortedList and working out how to use its special APIs for
particular actions.

This basically echos Christopher Baker's sentiment of "whats the use case"
and "why doesn't a sorted() call at the end suffice" to some extent. IME,
Python's sort algorithm is just really good, so it's hard to beat simply
calling sorted() when you're done processing in both runtime performance
and effort-spent-optimizing. I struggle to think of or remember a
particular case where dropping in a sorted collection would be a clear and
obvious win. Maybe a more memory constrained case where sorted() creating a
new list is too much? IDK.

[1] As an aside, RB didn't perform nearly as well as sortedcontainers for
the majority of cases, but the naive BTree was about on par. IIRC,
sortedcollections is basically a specialized BTree.
[2] A wish that came about because I had sorted sets of exact/prefix
strings of 1,000 to 10,000 elements and needed them to interact in various
ways and wanted to preserve their sortedness.

-----

re: "Is this particular one clearly the “best”[*]?"

Performance wise, it's probably about the best insofar as a BTree of
depth=2 and high fanout (10,000 or something? I forget what
sortedcontainers defaults to) is. In my limited experience, it's docs and
claims about its performance were pretty accurate[3].

API wise, it has its share of idiosyncrasies like anything else. e.g.
SortedList claims to implement MutableSequence, but actually raises errors
when calling e.g. append(), which is a bit misleading / wrong to some
extent (to be fair, given MutableSequence's api contract, it's not like
there's particularly clear/right/good choice one can make here).

(disclaimer: i've minimal experience with sortedcontainers; when I looked
at it, it was really only for comparing my own toy BTree/RBT
behavior/performance with a known implementation)

[3] The exception being a claim that its (supposedly) better cache locality
is a significant factor. I think the underlying BTree algorithm is the
dominant factor. But someone who actually knows how to measure cache
locality would be a better judge of that than me, who doesn't know how.


> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 11:01:32 -0800
Richard Levasseur <richardlev@gmail.com> wrote:
>
> In the end, it was a fun exercise, but in practice a dictionary and
> sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
> wasn't worth the effort.
>
> Anyways, I came to two particular, IMHO, conclusions:
> 1. Sorted collections are very niche. I *could* have used one, but probably
> would have spent as much time fiddling with them as using a dict/sorted().

So the fact that sorted collections did not help for one single use
case leads you to conclude that "sorted collections are very niche"?
I don't find this very convincing.

Unfortunately, PyPI doesn't seem to offer a way to query the reverse
dependencies of a package, otherwise we could know how many packages
depend on the "sortedcontainers" package.

[...]

> 2. If you're in a niche use case, especially for performance, then
> abstractions aren't really helpful. Using e.g. a BTree and knowing the
> particulars of the implementation are going to be more helpful than having
> an abstract SortedList and working out how to use its special APIs for
> particular actions.

You don't have "an abstract SortedList", you have a concrete SortedList
with a well-defined implementation with stable performance
characteristics. So, again, this seems to be a red herring.

> This basically echos Christopher Baker's sentiment of "whats the use case"
> and "why doesn't a sorted() call at the end suffice" to some extent.

The use case is obviously when you have to *maintain* sorting at all
times. Yes, there are such situations. Some rather high-profile
Python packages, such as dask/distributed, rely on this for rather
elaborate algorithmic work (dask/distributed is a distributed task
scheduler, you may also think of it as a replacement for Spark).


I won't go over your performance considerations, but I'll just mention
that the "sortedcontainers" documentation has not one but *several*
sections exploring performance characteristics in several dimensions.
It is extremely rare for Python packages to provide such a detailed
insight on the matter (even the built-in `dict` is much less thoroughly
characterised).
http://www.grantjenks.com/docs/sortedcontainers/#user-guide

Regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EHZGBXSIXZE6ZDBGJPESDQQG4PCNOMLW/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
sortedcontainers looks great!
I very much appreciate it if such well-done library have type hints
(embedded or at least in form of types-sortedcontainers pypi package).

From my experience of multidict library support, it is not a super-hard
challenge but something that needs attention and time resource from the
library maintainers or third-party volunteers.

On Thu, Nov 11, 2021 at 9:06 PM Richard Levasseur <richardlev@gmail.com>
wrote:

>
>
> On Thu, Nov 11, 2021 at 4:30 AM Paul Moore <p.f.moore@gmail.com> wrote:
>
>> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
>> >
>> > On Wed, 10 Nov 2021 21:12:17 -0600
>> > Tim Peters <tim.peters@gmail.com> wrote:
>> > > [Bob Fang <boyufang@bytedance.com>]
>> > > > This is a modest proposal to consider having sorted containers
>> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
>> library.
>> > >
>> > > +1 from me, but if and only if Grant Jenks (its author) wants that
>> too.
>> > >
>> > > It's first-rate code in all respects, including that it's a fine
>> > > example _of_ Python programming (it's not written in C - in Python).
>> >
>> > Agreed with Tim. This is a perfect example of some basic and perennial
>> > facility that would fit very well in the stdlib.
>>
>> I agree as well. Is anyone interested enough to ask the library author
>> if he supports doing this? That seems to be the main unanswered
>> question here.
>>
>> But if anyone wants to argue the "the stdlib should be shrinking, not
>> growing" position, I suggest they do so *before* someone reaches out
>> to the module author. No point in us making the suggestion and then
>> being forced to withdraw it.
>>
>
> A couple 2 cents and IMHO.
>
> -----
> The tldr is not so much "don't put it in the standard lib", but that the
> stdlib would be better having particular implementations instead of a
> one-size-fits-all e.g. SortedList.
>
> Should the stdlib have e.g. SortedList? Probably not, because the use
> cases of such data types are too niche to a one-size-fits-all
> implementation, and there are too many implementations with too many of
> their own settings.
> Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably
> so, because those are somewhat fundamental data structures and
> implementing, testing, and verifying them is very much non-trivial. While
> niche, having them at your immediate disposal is very powerful.
> -----
>
> Last year, for fun, after wishing there was a SortedSet in the standard
> lib, I ended up implementing a Red-Black Tree and BTree based sorted
> dictionary/set[1]. After then trying to use them for my use case[2], I
> found that, in order to fully and truly exploit their benefits, the basic
> Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
> would let me, e.g. binary search to a particular spot and then iterate, or
> give me a range between two points, etc. Such apis tend to be pretty
> specialized to exploit their under data structure (which isn't to say an
> abstract API can't be created, see Java's NavigableMap, but that level of
> abstraction deserves careful consideration).
>
> In the end, it was a fun exercise, but in practice a dictionary and
> sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
> wasn't worth the effort.
>
> Anyways, I came to two particular, IMHO, conclusions:
> 1. Sorted collections are very niche. I *could* have used one, but
> probably would have spent as much time fiddling with them as using a
> dict/sorted().
> 2. If you're in a niche use case, especially for performance, then
> abstractions aren't really helpful. Using e.g. a BTree and knowing the
> particulars of the implementation are going to be more helpful than having
> an abstract SortedList and working out how to use its special APIs for
> particular actions.
>
> This basically echos Christopher Baker's sentiment of "whats the use case"
> and "why doesn't a sorted() call at the end suffice" to some extent. IME,
> Python's sort algorithm is just really good, so it's hard to beat simply
> calling sorted() when you're done processing in both runtime performance
> and effort-spent-optimizing. I struggle to think of or remember a
> particular case where dropping in a sorted collection would be a clear and
> obvious win. Maybe a more memory constrained case where sorted() creating a
> new list is too much? IDK.
>
> [1] As an aside, RB didn't perform nearly as well as sortedcontainers for
> the majority of cases, but the naive BTree was about on par. IIRC,
> sortedcollections is basically a specialized BTree.
> [2] A wish that came about because I had sorted sets of exact/prefix
> strings of 1,000 to 10,000 elements and needed them to interact in various
> ways and wanted to preserve their sortedness.
>
> -----
>
> re: "Is this particular one clearly the “best”[*]?"
>
> Performance wise, it's probably about the best insofar as a BTree of
> depth=2 and high fanout (10,000 or something? I forget what
> sortedcontainers defaults to) is. In my limited experience, it's docs and
> claims about its performance were pretty accurate[3].
>
> API wise, it has its share of idiosyncrasies like anything else. e.g.
> SortedList claims to implement MutableSequence, but actually raises errors
> when calling e.g. append(), which is a bit misleading / wrong to some
> extent (to be fair, given MutableSequence's api contract, it's not like
> there's particularly clear/right/good choice one can make here).
>
> (disclaimer: i've minimal experience with sortedcontainers; when I looked
> at it, it was really only for comparing my own toy BTree/RBT
> behavior/performance with a known implementation)
>
> [3] The exception being a claim that its (supposedly) better cache
> locality is a significant factor. I think the underlying BTree algorithm is
> the dominant factor. But someone who actually knows how to measure cache
> locality would be a better judge of that than me, who doesn't know how.
>
>
>> Paul
>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-leave@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/HVZCFPKI7TY6WVLFK43KLVVDTEPIXFFQ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


--
Thanks,
Andrew Svetlov
Re: Having Sorted Containers in stdlib? [ In reply to ]
Just want to mention that we do have this nice website to look at, although I am not sure how up to date it is:

https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1

> On 11 Nov 2021, at 19:20, Antoine Pitrou <antoine@python.org> wrote:
>
> Unfortunately, PyPI doesn't seem to offer a way to query the reverse
> dependencies of a package, otherwise we could know how many packages
> depend on the "sortedcontainers" package.
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 20:58:29 +0000
Bob Fang <bob.fang.london@gmail.com> wrote:
> Just want to mention that we do have this nice website to look at, although I am not sure how up to date it is:
>
> https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1

Wow, thank you, that is very nice!

Best regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2YSXRGLMRFWGACMATQFIKX3PFWPP5P3W/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 12:39:50 +0000
Bob Fang <bob.fang.london@gmail.com> wrote:
> > But if anyone wants to argue the "the stdlib should be shrinking, not
> > growing" position, I suggest they do so *before* someone reaches out
> > to the module author. No point in us making the suggestion and then
> > being forced to withdraw it.
>
> So I suppose we can take a poll on this? If majority of us agree then I am happy to reach out to the author since I am the one who started this thread.

Decisions are not made through voting, so that would be informative,
but little more :-)

I think it's better that you contact the author upfront, asking for a
provisional agreement about stdlib inclusion. If he's not interested,
then no need to pursue it further. If he's interested, then you can
start working on a more formal proposal (for example a PEP).

Regards

Antoine.



>
> Thanks,
> Bob
>
> > On 11 Nov 2021, at 12:28, Paul Moore <p.f.moore@gmail.com> wrote:
> >
> > On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org <mailto:antoine@python.org>> wrote:
> >>
> >> On Wed, 10 Nov 2021 21:12:17 -0600
> >> Tim Peters <tim.peters@gmail.com> wrote:
> >>> [Bob Fang <boyufang@bytedance.com>]
> >>>> This is a modest proposal to consider having sorted containers
> >>>> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
> >>>
> >>> +1 from me, but if and only if Grant Jenks (its author) wants that too.
> >>>
> >>> It's first-rate code in all respects, including that it's a fine
> >>> example _of_ Python programming (it's not written in C - in Python).
> >>
> >> Agreed with Tim. This is a perfect example of some basic and perennial
> >> facility that would fit very well in the stdlib.
> >
> > I agree as well. Is anyone interested enough to ask the library author
> > if he supports doing this? That seems to be the main unanswered
> > question here.
> >
> > But if anyone wants to argue the "the stdlib should be shrinking, not
> > growing" position, I suggest they do so *before* someone reaches out
> > to the module author. No point in us making the suggestion and then
> > being forced to withdraw it.
> >
> > Paul
> > _______________________________________________
> > Python-Dev mailing list -- python-dev@python.org <mailto:python-dev@python.org>
> > To unsubscribe send an email to python-dev-leave@python.org <mailto:python-dev-leave@python.org>
> > https://mail.python.org/mailman3/lists/python-dev.python.org/ <https://mail.python.org/mailman3/lists/python-dev.python.org/>
> > Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/ <https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/>
> > Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
>



_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3UDYPFVMENHMLAZETGDVSUBAO2BY3C6W/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
[Christopher Barker <pythonchb@gmail.com>]
> Earlier in the thread, we were pointed to multiple implementations.
>
> Is this particular one clearly the “best”[*]?
>
> If so, then sure.
>
> -CHB
>
> [*] best meaning “most appropriate for the stdlib”. A couple folks have
> already pointed to the quality of the code. But my understanding is that
> different algorithms are more or less appropriate for different use
> cases. So is this one fairly “universal”?

As noted earlier, SortedContainers is downloaded from PyPI hundreds of
thousands of times every day, and that's been so for a long time..
Best I can tell, no similar package is in the same universe as that.

The author identified over a dozen potential competitors here[1], but
most didn't look like serious projects to him. He ran extensive timing
tests against the rest, reported there too. HIs package usually wins,
and is at worst competitive. It usually wins on memory burden too.

Some of these things are people intrigued by "a data structure" they
read about and implement for fun and self-education. That sometimes
shines through in data-structure-specific APIs. While the
SortedContainers extensive docs explain how things are implemented (if
you look in the right spots for that), the API is remarkably agnostic
(& useful)). For example, this is a method of SortedList:

irange(minimum=None, maximum=None, inclusive=(True, True), reverse=False)

which returns an iterator over a contiguous sorted range of values
(neither, either, or both of whose endpoints may be included,
depending on what you pass for `inclusive` This "makes sense"
regardless of how the structure may be implemented. Of course
SortedSet and SortedDict support `irange()` too.

[1] http://www.grantjenks.com/docs/sortedcontainers/performance.html
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FD3DSXZUBDUC3PFQH2SS47NIZ36TTOL6/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, Nov 11, 2021 at 11:22 AM Antoine Pitrou <antoine@python.org> wrote:

> On Thu, 11 Nov 2021 11:01:32 -0800
> Richard Levasseur <richardlev@gmail.com> wrote:
> >
> > In the end, it was a fun exercise, but in practice a dictionary and
> > sorted() got me 90% of the way there and sufficed. Optimizing that last
> 10%
> > wasn't worth the effort.
> >
> > Anyways, I came to two particular, IMHO, conclusions:
> > 1. Sorted collections are very niche. I *could* have used one, but
> probably
> > would have spent as much time fiddling with them as using a
> dict/sorted().
>
> So the fact that sorted collections did not help for one single use
> case leads you to conclude that "sorted collections are very niche"?
> I don't find this very convincing.
>

It was just an example demonstrating that, despite all the apparent "things
need to be kept sorted"-ness of the problem, the strict requirement that
data be fully sorted at every step of the code wasn't actually present.
That sorting at the end is quite sufficient. It just had the mere
appearance of being a requirement. This example is not unlike the exchange
Paul Bryan and Christopher Baker had upthread where Paul gave an example,
and Christopher pointed out sorting at the end was sufficient.

And no, that I didn't choose[1] to use a sorted collection for that
particular case didn't make me conclude they were very niche. Reflecting on
my experience coding and reviewing code over, eh, 20 years? Or w/e. lead me
to realize that very rarely have I seen, needed, or recommended using a
sorted collection. In my experience, the order of elements in a collection
mattering is the exception, not the rule. Hence, I conclude they're niche.

[1] As my example described, a sorted collection would have been able to
realize some optimal performance characteristic, but eking out that last
bit wasn't worth the effort.


>
> Unfortunately, PyPI doesn't seem to offer a way to query the reverse
> dependencies of a package, otherwise we could know how many packages
> depend on the "sortedcontainers" package.
>
>
Querying my employer's code base, the number of hits for sortedcontainers
isn't particularly impressive. It's not so low to be dismissed, but it's
also not so high as to be anywhere near unequivacable. Forgive me for not
sharing the actual numbers; I'm just not really sure on what sort of code
base statistics and numbers I'm permitted to share under what circumstances
etc.


> [...]
>
> > 2. If you're in a niche use case, especially for performance, then
> > abstractions aren't really helpful. Using e.g. a BTree and knowing the
> > particulars of the implementation are going to be more helpful than
> having
> > an abstract SortedList and working out how to use its special APIs for
> > particular actions.
>
> You don't have "an abstract SortedList", you have a concrete SortedList
> with a well-defined implementation with stable performance
> characteristics. So, again, this seems to be a red herring.
>

My point is that there are many ways to implement a collection that
maintains its elements in sorted order that has well defined performance.
One can pick one and accept the pros/cons of the particular underlying
algorithm. But it seems misleading to call that *the* SortedList
implementation. Hence why I say "abstract SortedList".

To emphasize this point a bit, let's presume for the moment Python chose to
have a data type named SortedList, and it was, effectively, the
sortedcontainers implementation. What does one do with the "load factor"
setting that sortedcontainers.SortedList has? This arg is, essentially, a
BTree implementation detail leaking out. IIRC, it controls the fan out of
the underlying BTree (because one size doesn't always fit all), which is
one of the biggest determiners of the overall performance. If such a
parameter is exposed, then it's no longer really a generic "sorted list",
it's a "sorted list as implemented using a btree". So might as well call it
e.g. BTreeList or some such instead.

If such a parameter *isn't* exposed, then a one-size-fits-all value must be
chosen[2] -- I'm skeptical that's really tenable because, if a sorted
collection is needed, it's probably for performance reasons, and you need
knobs like that to better fit the algorithm to your data and access
patterns.

And I guess I have to state upfront that my claim of "it's probably for
performance reasons" is just what my experience leads me to believe, so
it'll just have to be taken with the dosage of salt that best fits, too.

[2] As an aside, that a list's growth factor isn't controllable works
against things like sortedcontainers. Which isn't to say `list` should
expose it, but that if, e.g. ArrayList(growth_factor) was a thing, then a
more optimal implementation of e.g. BTree would be possible.


>
> > This basically echos Christopher Baker's sentiment of "whats the use
> case"
> > and "why doesn't a sorted() call at the end suffice" to some extent.
>
> The use case is obviously when you have to *maintain* sorting at all
> times. Yes, there are such situations. Some rather high-profile
> Python packages, such as dask/distributed, rely on this for rather
> elaborate algorithmic work (dask/distributed is a distributed task
> scheduler, you may also think of it as a replacement for Spark).
>
>
Are you able to elaborate wrt their needs to use a sorted collection?

Something that would be very compelling (to me, at least) is demonstrating
the utility of sorted collections where their performance characteristics
isn't the only/sole/primary/dominant motivator for using one. Performance
is important, but it's also kind of fuzzy and easy to argue about without a
clear answer/conclusion.

I hope we can all agree that there do exist situations where a sorted
collection is, at the least, helpful. That they aren't inherently useless.

I think the main thing being questioned is when/if they are, in
effect, *necessary
throughout*.
That when this is true: * The data must, at all times, without exception,
be and stay sorted from its inception to its conclusion
Then this is also true: * The case is not niche

Does "dask/distributed" meet that criteria? If so, why? I have no
experience, knowledge, or opinion about "dask/distributed". Maybe it *is* a
compelling example, or maybe not. IDK.


> I won't go over your performance considerations, but I'll just mention
> that the "sortedcontainers" documentation has not one but *several*
> sections exploring performance characteristics in several dimensions.
> It is extremely rare for Python packages to provide such a detailed
> insight on the matter (even the built-in `dict` is much less thoroughly
> characterised).
> http://www.grantjenks.com/docs/sortedcontainers/#user-guide


Yes, his performance analysis really is top notch.


>
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/EHZGBXSIXZE6ZDBGJPESDQQG4PCNOMLW/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:

> Should the stdlib have e.g. SortedList? Probably not, because the use cases
> of such data types are too niche to a one-size-fits-all implementation, and
> there are too many implementations with too many of their own settings.
> Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
> because those are somewhat fundamental data structures and implementing,
> testing, and verifying them is very much non-trivial. While niche, having
> them at your immediate disposal is very powerful.

By that reasoning, we shouldn't have dicts, for exactly the same reason:
for anyone wanting an associative array, there are so many implementation
variants to choose from:

- hash table with linear probing
- hash table with chaining
- AVL tree
- red-black tree
- judy array
- splay tree
- treap
- scapegoat tree

and many more, and most of them can be tuned.

If Python didn't already have dict, your argument against SortedList
would equally apply to it: they are "fundamental data structures and
implementing, testing, and verifying them is very much non-trivial".

So if your argument is correct, that would imply that standardizing on
one single dict implementation, one that isn't even tunable by the
caller, was a mistake. We should have provided a dozen different hash
tables, trees and treaps.

But... we didn't, and I don't think that Python is a worse language
because we only have one associative array implementation in the stdlib.

Whenever I need an associative array, I don't lose sleep over whether I
could get 2% better performance for hits, at a cost of 17% worse
performance for misses, by swapping over to some other implementation. I
just reach for dict, knowing that it will almost always be good enough.


> Last year, for fun, after wishing there was a SortedSet in the standard
> lib, I ended up implementing a Red-Black Tree and BTree based sorted
> dictionary/set[1]. After then trying to use them for my use case[2], I
> found that, in order to fully and truly exploit their benefits, the basic
> Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
> would let me, e.g. binary search to a particular spot and then iterate, or
> give me a range between two points, etc.

I believe that sortedcontainers.SortedSet provides that functionality
via the irange() method.

http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontainers.SortedList.irange


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DWGJAXOVAY64XU7FBQI6D3D5NNFKZEHW/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
I think Richard's point was two-fold: People usually don't want or need
this kind of thing /except/ when they have some very specific
performance requirements, in which case they probably want to also be
very specific about the kind of container they are using rather than
using an abstract "sorted container".

It is true that people sensitive to performance may really care about
the way dict is implemented, but there are a great many uses for
associative arrays in general. I knew about sortedcontainers and I also
don't remember ever seeing a situation where I needed one or recommended
its use. Maybe it would be useful in leetcode-style interview questions
(which may by itself be a good reason to include it in the standard
library, since a lot of those interviewers will let you use
`collections.deque` or `collections.OrderedDict` without implementing
any sort of linked listed, but they won't let you import a module that
provides a trie or something)?

I'm fairly neutral on this proposal. I do think the standard library is
a good place for these sorts of fundamental data structures (even
abstract ones), but I do agree with Richard that they are pretty niche.
In almost any situation where you'd want / need something like this, I
feel like adding a PyPI dependency would not be a big deal.

One thing that I do think might be valuable is if the plan were to
re-implement these as C extensions. Maintaining and distributing C
extensions on PyPI is in many ways more difficult than bundling them
directly into CPython. That said, doing so adds maintenance burden and
implementation cost (and it's a different proposition than merely
adopting an existing library), so I'm probably -0.0 on the whole thing —
the absence of these types of containers in the standard library is not
an obvious lack, and getting them from PyPI in the situations where you
actually need them doesn't seem like a big deal, particularly when it's
a pure Python impementation.

On 11/11/21 21:43, Steven D'Aprano wrote:
> On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:
>
>> Should the stdlib have e.g. SortedList? Probably not, because the use cases
>> of such data types are too niche to a one-size-fits-all implementation, and
>> there are too many implementations with too many of their own settings.
>> Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
>> because those are somewhat fundamental data structures and implementing,
>> testing, and verifying them is very much non-trivial. While niche, having
>> them at your immediate disposal is very powerful.
> By that reasoning, we shouldn't have dicts, for exactly the same reason:
> for anyone wanting an associative array, there are so many implementation
> variants to choose from:
>
> - hash table with linear probing
> - hash table with chaining
> - AVL tree
> - red-black tree
> - judy array
> - splay tree
> - treap
> - scapegoat tree
>
> and many more, and most of them can be tuned.
>
> If Python didn't already have dict, your argument against SortedList
> would equally apply to it: they are "fundamental data structures and
> implementing, testing, and verifying them is very much non-trivial".
>
> So if your argument is correct, that would imply that standardizing on
> one single dict implementation, one that isn't even tunable by the
> caller, was a mistake. We should have provided a dozen different hash
> tables, trees and treaps.
>
> But... we didn't, and I don't think that Python is a worse language
> because we only have one associative array implementation in the stdlib.
>
> Whenever I need an associative array, I don't lose sleep over whether I
> could get 2% better performance for hits, at a cost of 17% worse
> performance for misses, by swapping over to some other implementation. I
> just reach for dict, knowing that it will almost always be good enough.
>
>
>> Last year, for fun, after wishing there was a SortedSet in the standard
>> lib, I ended up implementing a Red-Black Tree and BTree based sorted
>> dictionary/set[1]. After then trying to use them for my use case[2], I
>> found that, in order to fully and truly exploit their benefits, the basic
>> Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
>> would let me, e.g. binary search to a particular spot and then iterate, or
>> give me a range between two points, etc.
> I believe that sortedcontainers.SortedSet provides that functionality
> via the irange() method.
>
> http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontainers.SortedList.irange
>
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:

> I knew about sortedcontainers and I also don't remember ever seeing a
> situation where I needed one or recommended its use.

We have a very odd situation where apparently sortedcontainers is one
of the most well-known, popular, most heavily downloaded libraries on
PyPI. According to here:

https://hugovk.github.io/top-pypi-packages/

sortedcontainers is the 290th most popular package on PyPI, ahead of
such luminaries as pylint, black, selenium, mypy, django and nose.

And yet, nobody(?) admits to either using it or knowing what it could be
used for. How very curious :-/


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7JVLJE4G6ZWM6PKXTMSZL7MVAIUWGZRH/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Sat, Nov 13, 2021 at 3:20 AM Steven D'Aprano <steve@pearwood.info> wrote:
>
> On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:
>
> > I knew about sortedcontainers and I also don't remember ever seeing a
> > situation where I needed one or recommended its use.
>
> We have a very odd situation where apparently sortedcontainers is one
> of the most well-known, popular, most heavily downloaded libraries on
> PyPI. According to here:
>
> https://hugovk.github.io/top-pypi-packages/
>
> sortedcontainers is the 290th most popular package on PyPI, ahead of
> such luminaries as pylint, black, selenium, mypy, django and nose.
>
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/
>

I think that proves the value of download counts: very little. The
highest on the list is a thing called botocore, which I've never heard
of. What is it? It's a dependency of a number of Amazon web services.
My guess is that it's a dependency of popular tools - or maybe of
automatically-installed tools, even - while not itself being well
known. Actually, quite a few of the most popular packages on that list
are Amazon-related, so quite possibly they're all being installed as a
set in response to one single *actual* dependency.

(Download counts DO most likely indicate that something is heavily
used, though, so if people are trying to plan out a benchmark suite,
then that might be more useful. But I'm not really surprised that a
heavily-downloaded package isn't itself well known.)

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LF622NRJBFBNHKZR2B7P6X4D2GE4WSDX/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On 12.11.2021 17:10, Steven D'Aprano wrote:
> On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:
>
>> I knew about sortedcontainers and I also don't remember ever seeing a
>> situation where I needed one or recommended its use.
>
> We have a very odd situation where apparently sortedcontainers is one
> of the most well-known, popular, most heavily downloaded libraries on
> PyPI. According to here:
>
> https://hugovk.github.io/top-pypi-packages/
>
> sortedcontainers is the 290th most popular package on PyPI, ahead of
> such luminaries as pylint, black, selenium, mypy, django and nose.
>
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/

Those download stats can be misleading. Packages are often pulled in
as a dependency of other packages which are popular or used a lot
in CI/CD setups.

Perhaps there's a reverse dependency graph we could use to find out
why the package is downloaded this often. I remember having seen
a project which does this, but have lost the URL.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 12 2021)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FJT4SSWISWWRFB5FGRPXSLDLDZVMUIXF/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/

trio (which IMHO is a somewhat high profile uses it):

https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27 <https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27>

And so if I am not mistaken any project that depends on trio will automatically need to have sortedcontainers as its dependency? I am not saying this alone will explain why sortedcontainers have large number of downloads, but with one or two projects like this it will push the number up maybe?

Best,
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
> On 12 Nov 2021, at 16:32, Marc-Andre Lemburg <mal@egenix.com> wrote:
>
> Perhaps there's a reverse dependency graph we could use to find out
> why the package is downloaded this often. I remember having seen
> a project which does this, but have lost the URL.


I believe this is the URL we are looking for:

https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1 <https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1>


Best,
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/

trio (which IMHO is a somewhat high profile uses it):

https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27 <https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27>

And so if I am not mistaken any project that depends on trio will automatically need to have sortedcontainers as its dependency? I am not saying this alone will explain why sortedcontainers have large number of downloads, but with one or two projects like this it will push the number up maybe?

Best,
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
Yeah, a datapoint I didn't see mentioned is searching on public github
repos for import sortedcontainers (which includes from sortedcontainers
import).

Obviously it's just one datapoint but it shows a very small count compared
to other packages mentioned when looking at many of the stats available:

https://github.com/search?q=import+sortedcontainers+language%3APython&type=code
https://github.com/search?q=import+pylint+language%3APython&type=code
https://github.com/search?q=import+black+language%3APython&type=code
https://github.com/search?q=import+mypy+language%3APython&type=code
https://github.com/search?q=import+django+language%3APython&type=code

Damian


On Fri, Nov 12, 2021 at 11:39 AM Marc-Andre Lemburg <mal@egenix.com> wrote:

> On 12.11.2021 17:10, Steven D'Aprano wrote:
> > On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:
> >
> >> I knew about sortedcontainers and I also don't remember ever seeing a
> >> situation where I needed one or recommended its use.
> >
> > We have a very odd situation where apparently sortedcontainers is one
> > of the most well-known, popular, most heavily downloaded libraries on
> > PyPI. According to here:
> >
> > https://hugovk.github.io/top-pypi-packages/
> >
> > sortedcontainers is the 290th most popular package on PyPI, ahead of
> > such luminaries as pylint, black, selenium, mypy, django and nose.
> >
> > And yet, nobody(?) admits to either using it or knowing what it could be
> > used for. How very curious :-/
>
> Those download stats can be misleading. Packages are often pulled in
> as a dependency of other packages which are popular or used a lot
> in CI/CD setups.
>
> Perhaps there's a reverse dependency graph we could use to find out
> why the package is downloaded this often. I remember having seen
> a project which does this, but have lost the URL.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Experts (#1, Nov 12 2021)
> >>> Python Projects, Coaching and Support ... https://www.egenix.com/
> >>> Python Product Development ... https://consulting.egenix.com/
> ________________________________________________________________________
>
> ::: We implement business ideas - efficiently in both time and costs :::
>
> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
> Registered at Amtsgericht Duesseldorf: HRB 46611
> https://www.egenix.com/company/contact/
> https://www.malemburg.com/
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/FJT4SSWISWWRFB5FGRPXSLDLDZVMUIXF/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On 11/12/2021 11:31 AM, Chris Angelico wrote:
>
> I think that proves the value of download counts: very little. The
> highest on the list is a thing called botocore, which I've never heard
> of. What is it? It's a dependency of a number of Amazon web services.
> My guess is that it's a dependency of popular tools - or maybe of
> automatically-installed tools, even - while not itself being well
> known. Actually, quite a few of the most popular packages on that list
> are Amazon-related, so quite possibly they're all being installed as a
> set in response to one single *actual* dependency.

botocore and boto3 are the Amazon-provided low- and high-level
interfaces, respectively, to AWS. They're extremely import and well
known in that environment.

Eric

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/V4GSZINECFIAZ4MGDZNM3L354LJDSX2Y/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On 12.11.2021 17:46, Bob Fang wrote:
>
>
>> On 12 Nov 2021, at 16:32, Marc-Andre Lemburg <mal@egenix.com
>> <mailto:mal@egenix.com>> wrote:
>>
>> Perhaps there's a reverse dependency graph we could use to find out
>> why the package is downloaded this often. I remember having seen
>> a project which does this, but have lost the URL.
>
>
> I believe this is the URL we are looking for:?
>
> https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1

Yes, thanks, that was the one.

The listing doesn't seem to solve the puzzle either, though. There
are a few high profile projects using the package, but those are not
high up on the download ranking, so don't explain the high download
counts of the package.

BTW: This reverse dependency ranking on the website gives a good
idea of how important a package is for the Python package eco system:

https://www.wheelodex.org/rdepends-leaders/

(sort of like the Page rank for Python packages)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 12 2021)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/732T3MNBB2PMFP7TVXSDDMW6KNI5CEQS/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
I started writing up a SortedDict use case I have, but it's very
elaborate and I expect it would just end with endless pointless
argument about other approaches I _could_ take. But I already know all
those ;-)

So let's look at something conceptually "dead easy" instead: priority
queues. They're a basic building block in algorithms from many fields.

In the distributed Python, `heapq` is the only thing that comes close
to being reasonably efficient and scalable for that.

However:

- It only supports min-heaps. It's not that max-heaps aren't also
useful (indeed, _under the covers_ CPython also implements max-heaps
for its own internal use). It's more that the original API simply
doesn't generalize cleanly.

- So there's a body of word-of-mouth "tricks" for making a min-heap
"act like" a max-heap. For integer or float priorities, it suffices to
use the negation of "the real" priority. For other kinds of values, it
gets trickier. In the end, you can wrap each item in a class that
implements `x.__lt__(y)` by doing `y.value < x.value`. But few people
appear to be aware of that, while nobody wants to endure the bother
;-)

- It's often the case that "the priority" needs to be computed from an
item's value. `heapq` has no direct support for that. Instead there's
another body of word-of-mouth tricks akin to the old
decorate-sort-undecorate approach sorting used. Rather than store the
values in the heap, you store `(priority, value)` tuples. But then
various unpleasant things can happen if two priorities are tied: in
that case tuple comparison falls back to comparing the values. But,
e.g., value comparison may be very expensive, or the values may not
support __lt__ at all. So instead you store `(priority,
unique_integer, value)` triples. And hope that you don't really need a
max-heap, lest it get even more obscure.

Using SortedContainers, none of those are issues. It's all
straightforward and obvious. If you have a list L always sorted by
priority, extracting from a "min queue" just requires L.pop(0), or
from a "max queue" L.pop(-1) (or L.pop() - -1 is the default, same as
for the Python list.pop()). No tricks are needed. Fine too if
sometimes you want to extract the min, while at other times the max.
Or, e.g., peek at the 5 highest-priority values and pick the one that
best fits resources available at the time.

In cases of computed priorities, the SortedKeyList constructor allows
specifying a function to be used to compute the keys used to determine
the list's ordering. Again straightforward and obvious.

from sortedcontainers import SortedKeyList
L = SortedKeyList([.(i, j) for i in range(1, 5)
for j in range(1, 5)],
key=lambda t: t[0]/t[1])

>>> [t for t in L]
[.(1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 4), (1, 1),
(2, 2), (3, 3), (4, 4), (4, 3), (3, 2), (2, 1), (4, 2),
(3, 1), (4, 1)]

That said, in most of my code I end up _removing_ uses of
SortedContainers, in favor of faster ways of getting the job done. The
package isn't the last resort for me, it's the first. I can write code
in straightforward ways that scale well from the start. If I need more
speed later, fine, I can puzzle out faster ways to get the task done.
If you're proud of that you _start_ by plotting ways to minimize the
number of sorts you do, you're coding in C++, not Python ;-)

So I suggest people are staring at the wrong end when asking for use
cases that can't be done without the package. Those are necessarily
non-trivial. Having a sorted collection is "more than good enough" for
many tasks. As above, e.g., there's no reason to imagine that Grant
Jenks had priority queues in mind at all, yet the flavors of sorted
lists he implemented are very easy to use for that task.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XXAXVV3KJ4ORQ7K6ELSS4R7S5725ASRE/
Code of Conduct: http://python.org/psf/codeofconduct/