Mailing List Archive

1 2  View All
Re: Having Sorted Containers in stdlib? [ In reply to ]
Earlier in the thread, we were pointed to multiple implementations.

Is this particular one clearly the “best”[*]?

If so, then sure.

-CHB

[*] best meaning “most appropriate for the stdlib”. A couple folks have
already pointed to the quality of the code. But my understanding is that
different algorithms are more or less appropriate for different use cases.
So is this one fairly “universal”?



On Thu, Nov 11, 2021 at 4:31 AM Paul Moore <p.f.moore@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.peters@gmail.com> wrote:
> > > [Bob Fang <boyufang@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim. This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>
> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
Christopher Barker, PhD (Chris)

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
Re: Having Sorted Containers in stdlib? [ In reply to ]
Serhiy Storchaka wrote:
> From my C++ and Java experience, hashtable-based containers are much
> more useful than tree-based containers. They are more compact and fast.
> In most cases the only reason of using sorted container is supporting
> deterministic iteration order, but often it is enough to sort data only
> for output.

The concern is not constant speed factor but asymptotic complexity.
Inserting or deleting in a tree-based sorted collection is O(log n),
but lists are O(n): the bisect is O(log n) but moving trailing list elements
is O(n). It's memmove fast but dominates as n gets large.

There are many algorithms that are hard to implement as fast as they
can be without a tree. Something like rolling median would be a good
test case, e.g. "for a list N containing numbers, produce a list M of
length len(N) - 1 where M[i] == statistics.median(N[:i+1])". I think the
best you can do with the standard library and without writing a
balanced binary tree from scratch is O(n²) (if you can beat this I'd like
to know how!). With a tree-based sorted list it's O(n log n).

Dicts and sets cannot maintain an arbitrary order, except that
OrderedDict.move_to_end() very occasionally turns out to be all you
need (and even then, I find it tricky to reason about and apply
correctly in an algorithm: it requires you to maintain an invariant,
where SortedList maintains the sorted invariant for you).

> There is no such need
> of tree-based implementation. It is not needed in Python core, and is
> not needed for most users. So it is good to keep it in third-party
> libraries. Maintaining it in the stdlib has a cost and the benefit/cost
> value would be low.

The situation where this bites my firm the most is in interviews. This is
exactly the situation where we're asking candidates to demonstrate
their CS knowledge and produce an algorithm that has low asymptotic
complexity.

We use various services including HackerRank, which provide a
sandboxed Python environment that cannot use PyPI. There are quite
a few services in this kind of Educational+Challenge space including
some that execute Python in the browser using Brython etc.

My team believes that candidates with a Java background find the
challenge easier than candidates with a Python background, because
they just use NavigableSet and move on. I find that kind of embarrassing.
It requires us to apologise for Python's deficiency and fudge how we
score the question, which makes it hard to know we're providing a level
playing field.

(I would argue that we should not ask those kinds of questions or not
use these sandboxed interview environments but institutional friction
makes that magnitude of change very difficult.)

Putting something in the standard library means that it will, in due
course, appear in HackerRank and other services we use, and we can
expect candidates to be familiar with it and incorporate it in their
solutions.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/VTOYBWO5OWXID4OIH5FWCKWSWRDXKSTB/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, Nov 11, 2021 at 4:30 AM Paul Moore <p.f.moore@gmail.com> wrote:

> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> >
> > On Wed, 10 Nov 2021 21:12:17 -0600
> > Tim Peters <tim.peters@gmail.com> wrote:
> > > [Bob Fang <boyufang@bytedance.com>]
> > > > This is a modest proposal to consider having sorted containers
> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
> library.
> > >
> > > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> > >
> > > It's first-rate code in all respects, including that it's a fine
> > > example _of_ Python programming (it's not written in C - in Python).
> >
> > Agreed with Tim. This is a perfect example of some basic and perennial
> > facility that would fit very well in the stdlib.
>
> I agree as well. Is anyone interested enough to ask the library author
> if he supports doing this? That seems to be the main unanswered
> question here.
>
> But if anyone wants to argue the "the stdlib should be shrinking, not
> growing" position, I suggest they do so *before* someone reaches out
> to the module author. No point in us making the suggestion and then
> being forced to withdraw it.
>

A couple 2 cents and IMHO.

-----
The tldr is not so much "don't put it in the standard lib", but that the
stdlib would be better having particular implementations instead of a
one-size-fits-all e.g. SortedList.

Should the stdlib have e.g. SortedList? Probably not, because the use cases
of such data types are too niche to a one-size-fits-all implementation, and
there are too many implementations with too many of their own settings.
Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
because those are somewhat fundamental data structures and implementing,
testing, and verifying them is very much non-trivial. While niche, having
them at your immediate disposal is very powerful.
-----

Last year, for fun, after wishing there was a SortedSet in the standard
lib, I ended up implementing a Red-Black Tree and BTree based sorted
dictionary/set[1]. After then trying to use them for my use case[2], I
found that, in order to fully and truly exploit their benefits, the basic
Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
would let me, e.g. binary search to a particular spot and then iterate, or
give me a range between two points, etc. Such apis tend to be pretty
specialized to exploit their under data structure (which isn't to say an
abstract API can't be created, see Java's NavigableMap, but that level of
abstraction deserves careful consideration).

In the end, it was a fun exercise, but in practice a dictionary and
sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
wasn't worth the effort.

Anyways, I came to two particular, IMHO, conclusions:
1. Sorted collections are very niche. I *could* have used one, but probably
would have spent as much time fiddling with them as using a dict/sorted().
2. If you're in a niche use case, especially for performance, then
abstractions aren't really helpful. Using e.g. a BTree and knowing the
particulars of the implementation are going to be more helpful than having
an abstract SortedList and working out how to use its special APIs for
particular actions.

This basically echos Christopher Baker's sentiment of "whats the use case"
and "why doesn't a sorted() call at the end suffice" to some extent. IME,
Python's sort algorithm is just really good, so it's hard to beat simply
calling sorted() when you're done processing in both runtime performance
and effort-spent-optimizing. I struggle to think of or remember a
particular case where dropping in a sorted collection would be a clear and
obvious win. Maybe a more memory constrained case where sorted() creating a
new list is too much? IDK.

[1] As an aside, RB didn't perform nearly as well as sortedcontainers for
the majority of cases, but the naive BTree was about on par. IIRC,
sortedcollections is basically a specialized BTree.
[2] A wish that came about because I had sorted sets of exact/prefix
strings of 1,000 to 10,000 elements and needed them to interact in various
ways and wanted to preserve their sortedness.

-----

re: "Is this particular one clearly the “best”[*]?"

Performance wise, it's probably about the best insofar as a BTree of
depth=2 and high fanout (10,000 or something? I forget what
sortedcontainers defaults to) is. In my limited experience, it's docs and
claims about its performance were pretty accurate[3].

API wise, it has its share of idiosyncrasies like anything else. e.g.
SortedList claims to implement MutableSequence, but actually raises errors
when calling e.g. append(), which is a bit misleading / wrong to some
extent (to be fair, given MutableSequence's api contract, it's not like
there's particularly clear/right/good choice one can make here).

(disclaimer: i've minimal experience with sortedcontainers; when I looked
at it, it was really only for comparing my own toy BTree/RBT
behavior/performance with a known implementation)

[3] The exception being a claim that its (supposedly) better cache locality
is a significant factor. I think the underlying BTree algorithm is the
dominant factor. But someone who actually knows how to measure cache
locality would be a better judge of that than me, who doesn't know how.


> Paul
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 11:01:32 -0800
Richard Levasseur <richardlev@gmail.com> wrote:
>
> In the end, it was a fun exercise, but in practice a dictionary and
> sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
> wasn't worth the effort.
>
> Anyways, I came to two particular, IMHO, conclusions:
> 1. Sorted collections are very niche. I *could* have used one, but probably
> would have spent as much time fiddling with them as using a dict/sorted().

So the fact that sorted collections did not help for one single use
case leads you to conclude that "sorted collections are very niche"?
I don't find this very convincing.

Unfortunately, PyPI doesn't seem to offer a way to query the reverse
dependencies of a package, otherwise we could know how many packages
depend on the "sortedcontainers" package.

[...]

> 2. If you're in a niche use case, especially for performance, then
> abstractions aren't really helpful. Using e.g. a BTree and knowing the
> particulars of the implementation are going to be more helpful than having
> an abstract SortedList and working out how to use its special APIs for
> particular actions.

You don't have "an abstract SortedList", you have a concrete SortedList
with a well-defined implementation with stable performance
characteristics. So, again, this seems to be a red herring.

> This basically echos Christopher Baker's sentiment of "whats the use case"
> and "why doesn't a sorted() call at the end suffice" to some extent.

The use case is obviously when you have to *maintain* sorting at all
times. Yes, there are such situations. Some rather high-profile
Python packages, such as dask/distributed, rely on this for rather
elaborate algorithmic work (dask/distributed is a distributed task
scheduler, you may also think of it as a replacement for Spark).


I won't go over your performance considerations, but I'll just mention
that the "sortedcontainers" documentation has not one but *several*
sections exploring performance characteristics in several dimensions.
It is extremely rare for Python packages to provide such a detailed
insight on the matter (even the built-in `dict` is much less thoroughly
characterised).
http://www.grantjenks.com/docs/sortedcontainers/#user-guide

Regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/EHZGBXSIXZE6ZDBGJPESDQQG4PCNOMLW/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
sortedcontainers looks great!
I very much appreciate it if such well-done library have type hints
(embedded or at least in form of types-sortedcontainers pypi package).

From my experience of multidict library support, it is not a super-hard
challenge but something that needs attention and time resource from the
library maintainers or third-party volunteers.

On Thu, Nov 11, 2021 at 9:06 PM Richard Levasseur <richardlev@gmail.com>
wrote:

>
>
> On Thu, Nov 11, 2021 at 4:30 AM Paul Moore <p.f.moore@gmail.com> wrote:
>
>> On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
>> >
>> > On Wed, 10 Nov 2021 21:12:17 -0600
>> > Tim Peters <tim.peters@gmail.com> wrote:
>> > > [Bob Fang <boyufang@bytedance.com>]
>> > > > This is a modest proposal to consider having sorted containers
>> > > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard
>> library.
>> > >
>> > > +1 from me, but if and only if Grant Jenks (its author) wants that
>> too.
>> > >
>> > > It's first-rate code in all respects, including that it's a fine
>> > > example _of_ Python programming (it's not written in C - in Python).
>> >
>> > Agreed with Tim. This is a perfect example of some basic and perennial
>> > facility that would fit very well in the stdlib.
>>
>> I agree as well. Is anyone interested enough to ask the library author
>> if he supports doing this? That seems to be the main unanswered
>> question here.
>>
>> But if anyone wants to argue the "the stdlib should be shrinking, not
>> growing" position, I suggest they do so *before* someone reaches out
>> to the module author. No point in us making the suggestion and then
>> being forced to withdraw it.
>>
>
> A couple 2 cents and IMHO.
>
> -----
> The tldr is not so much "don't put it in the standard lib", but that the
> stdlib would be better having particular implementations instead of a
> one-size-fits-all e.g. SortedList.
>
> Should the stdlib have e.g. SortedList? Probably not, because the use
> cases of such data types are too niche to a one-size-fits-all
> implementation, and there are too many implementations with too many of
> their own settings.
> Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably
> so, because those are somewhat fundamental data structures and
> implementing, testing, and verifying them is very much non-trivial. While
> niche, having them at your immediate disposal is very powerful.
> -----
>
> Last year, for fun, after wishing there was a SortedSet in the standard
> lib, I ended up implementing a Red-Black Tree and BTree based sorted
> dictionary/set[1]. After then trying to use them for my use case[2], I
> found that, in order to fully and truly exploit their benefits, the basic
> Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
> would let me, e.g. binary search to a particular spot and then iterate, or
> give me a range between two points, etc. Such apis tend to be pretty
> specialized to exploit their under data structure (which isn't to say an
> abstract API can't be created, see Java's NavigableMap, but that level of
> abstraction deserves careful consideration).
>
> In the end, it was a fun exercise, but in practice a dictionary and
> sorted() got me 90% of the way there and sufficed. Optimizing that last 10%
> wasn't worth the effort.
>
> Anyways, I came to two particular, IMHO, conclusions:
> 1. Sorted collections are very niche. I *could* have used one, but
> probably would have spent as much time fiddling with them as using a
> dict/sorted().
> 2. If you're in a niche use case, especially for performance, then
> abstractions aren't really helpful. Using e.g. a BTree and knowing the
> particulars of the implementation are going to be more helpful than having
> an abstract SortedList and working out how to use its special APIs for
> particular actions.
>
> This basically echos Christopher Baker's sentiment of "whats the use case"
> and "why doesn't a sorted() call at the end suffice" to some extent. IME,
> Python's sort algorithm is just really good, so it's hard to beat simply
> calling sorted() when you're done processing in both runtime performance
> and effort-spent-optimizing. I struggle to think of or remember a
> particular case where dropping in a sorted collection would be a clear and
> obvious win. Maybe a more memory constrained case where sorted() creating a
> new list is too much? IDK.
>
> [1] As an aside, RB didn't perform nearly as well as sortedcontainers for
> the majority of cases, but the naive BTree was about on par. IIRC,
> sortedcollections is basically a specialized BTree.
> [2] A wish that came about because I had sorted sets of exact/prefix
> strings of 1,000 to 10,000 elements and needed them to interact in various
> ways and wanted to preserve their sortedness.
>
> -----
>
> re: "Is this particular one clearly the “best”[*]?"
>
> Performance wise, it's probably about the best insofar as a BTree of
> depth=2 and high fanout (10,000 or something? I forget what
> sortedcontainers defaults to) is. In my limited experience, it's docs and
> claims about its performance were pretty accurate[3].
>
> API wise, it has its share of idiosyncrasies like anything else. e.g.
> SortedList claims to implement MutableSequence, but actually raises errors
> when calling e.g. append(), which is a bit misleading / wrong to some
> extent (to be fair, given MutableSequence's api contract, it's not like
> there's particularly clear/right/good choice one can make here).
>
> (disclaimer: i've minimal experience with sortedcontainers; when I looked
> at it, it was really only for comparing my own toy BTree/RBT
> behavior/performance with a known implementation)
>
> [3] The exception being a claim that its (supposedly) better cache
> locality is a significant factor. I think the underlying BTree algorithm is
> the dominant factor. But someone who actually knows how to measure cache
> locality would be a better judge of that than me, who doesn't know how.
>
>
>> Paul
>> _______________________________________________
>> Python-Dev mailing list -- python-dev@python.org
>> To unsubscribe send an email to python-dev-leave@python.org
>> https://mail.python.org/mailman3/lists/python-dev.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/HVZCFPKI7TY6WVLFK43KLVVDTEPIXFFQ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>


--
Thanks,
Andrew Svetlov
Re: Having Sorted Containers in stdlib? [ In reply to ]
Just want to mention that we do have this nice website to look at, although I am not sure how up to date it is:

https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1

> On 11 Nov 2021, at 19:20, Antoine Pitrou <antoine@python.org> wrote:
>
> Unfortunately, PyPI doesn't seem to offer a way to query the reverse
> dependencies of a package, otherwise we could know how many packages
> depend on the "sortedcontainers" package.
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 20:58:29 +0000
Bob Fang <bob.fang.london@gmail.com> wrote:
> Just want to mention that we do have this nice website to look at, although I am not sure how up to date it is:
>
> https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1

Wow, thank you, that is very nice!

Best regards

Antoine.


_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/2YSXRGLMRFWGACMATQFIKX3PFWPP5P3W/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, 11 Nov 2021 12:39:50 +0000
Bob Fang <bob.fang.london@gmail.com> wrote:
> > But if anyone wants to argue the "the stdlib should be shrinking, not
> > growing" position, I suggest they do so *before* someone reaches out
> > to the module author. No point in us making the suggestion and then
> > being forced to withdraw it.
>
> So I suppose we can take a poll on this? If majority of us agree then I am happy to reach out to the author since I am the one who started this thread.

Decisions are not made through voting, so that would be informative,
but little more :-)

I think it's better that you contact the author upfront, asking for a
provisional agreement about stdlib inclusion. If he's not interested,
then no need to pursue it further. If he's interested, then you can
start working on a more formal proposal (for example a PEP).

Regards

Antoine.



>
> Thanks,
> Bob
>
> > On 11 Nov 2021, at 12:28, Paul Moore <p.f.moore@gmail.com> wrote:
> >
> > On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org <mailto:antoine@python.org>> wrote:
> >>
> >> On Wed, 10 Nov 2021 21:12:17 -0600
> >> Tim Peters <tim.peters@gmail.com> wrote:
> >>> [Bob Fang <boyufang@bytedance.com>]
> >>>> This is a modest proposal to consider having sorted containers
> >>>> (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
> >>>
> >>> +1 from me, but if and only if Grant Jenks (its author) wants that too.
> >>>
> >>> It's first-rate code in all respects, including that it's a fine
> >>> example _of_ Python programming (it's not written in C - in Python).
> >>
> >> Agreed with Tim. This is a perfect example of some basic and perennial
> >> facility that would fit very well in the stdlib.
> >
> > I agree as well. Is anyone interested enough to ask the library author
> > if he supports doing this? That seems to be the main unanswered
> > question here.
> >
> > But if anyone wants to argue the "the stdlib should be shrinking, not
> > growing" position, I suggest they do so *before* someone reaches out
> > to the module author. No point in us making the suggestion and then
> > being forced to withdraw it.
> >
> > Paul
> > _______________________________________________
> > Python-Dev mailing list -- python-dev@python.org <mailto:python-dev@python.org>
> > To unsubscribe send an email to python-dev-leave@python.org <mailto:python-dev-leave@python.org>
> > https://mail.python.org/mailman3/lists/python-dev.python.org/ <https://mail.python.org/mailman3/lists/python-dev.python.org/>
> > Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/ <https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/>
> > Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
>



_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/3UDYPFVMENHMLAZETGDVSUBAO2BY3C6W/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
[Christopher Barker <pythonchb@gmail.com>]
> Earlier in the thread, we were pointed to multiple implementations.
>
> Is this particular one clearly the “best”[*]?
>
> If so, then sure.
>
> -CHB
>
> [*] best meaning “most appropriate for the stdlib”. A couple folks have
> already pointed to the quality of the code. But my understanding is that
> different algorithms are more or less appropriate for different use
> cases. So is this one fairly “universal”?

As noted earlier, SortedContainers is downloaded from PyPI hundreds of
thousands of times every day, and that's been so for a long time..
Best I can tell, no similar package is in the same universe as that.

The author identified over a dozen potential competitors here[1], but
most didn't look like serious projects to him. He ran extensive timing
tests against the rest, reported there too. HIs package usually wins,
and is at worst competitive. It usually wins on memory burden too.

Some of these things are people intrigued by "a data structure" they
read about and implement for fun and self-education. That sometimes
shines through in data-structure-specific APIs. While the
SortedContainers extensive docs explain how things are implemented (if
you look in the right spots for that), the API is remarkably agnostic
(& useful)). For example, this is a method of SortedList:

irange(minimum=None, maximum=None, inclusive=(True, True), reverse=False)

which returns an iterator over a contiguous sorted range of values
(neither, either, or both of whose endpoints may be included,
depending on what you pass for `inclusive` This "makes sense"
regardless of how the structure may be implemented. Of course
SortedSet and SortedDict support `irange()` too.

[1] http://www.grantjenks.com/docs/sortedcontainers/performance.html
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FD3DSXZUBDUC3PFQH2SS47NIZ36TTOL6/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, Nov 11, 2021 at 11:22 AM Antoine Pitrou <antoine@python.org> wrote:

> On Thu, 11 Nov 2021 11:01:32 -0800
> Richard Levasseur <richardlev@gmail.com> wrote:
> >
> > In the end, it was a fun exercise, but in practice a dictionary and
> > sorted() got me 90% of the way there and sufficed. Optimizing that last
> 10%
> > wasn't worth the effort.
> >
> > Anyways, I came to two particular, IMHO, conclusions:
> > 1. Sorted collections are very niche. I *could* have used one, but
> probably
> > would have spent as much time fiddling with them as using a
> dict/sorted().
>
> So the fact that sorted collections did not help for one single use
> case leads you to conclude that "sorted collections are very niche"?
> I don't find this very convincing.
>

It was just an example demonstrating that, despite all the apparent "things
need to be kept sorted"-ness of the problem, the strict requirement that
data be fully sorted at every step of the code wasn't actually present.
That sorting at the end is quite sufficient. It just had the mere
appearance of being a requirement. This example is not unlike the exchange
Paul Bryan and Christopher Baker had upthread where Paul gave an example,
and Christopher pointed out sorting at the end was sufficient.

And no, that I didn't choose[1] to use a sorted collection for that
particular case didn't make me conclude they were very niche. Reflecting on
my experience coding and reviewing code over, eh, 20 years? Or w/e. lead me
to realize that very rarely have I seen, needed, or recommended using a
sorted collection. In my experience, the order of elements in a collection
mattering is the exception, not the rule. Hence, I conclude they're niche.

[1] As my example described, a sorted collection would have been able to
realize some optimal performance characteristic, but eking out that last
bit wasn't worth the effort.


>
> Unfortunately, PyPI doesn't seem to offer a way to query the reverse
> dependencies of a package, otherwise we could know how many packages
> depend on the "sortedcontainers" package.
>
>
Querying my employer's code base, the number of hits for sortedcontainers
isn't particularly impressive. It's not so low to be dismissed, but it's
also not so high as to be anywhere near unequivacable. Forgive me for not
sharing the actual numbers; I'm just not really sure on what sort of code
base statistics and numbers I'm permitted to share under what circumstances
etc.


> [...]
>
> > 2. If you're in a niche use case, especially for performance, then
> > abstractions aren't really helpful. Using e.g. a BTree and knowing the
> > particulars of the implementation are going to be more helpful than
> having
> > an abstract SortedList and working out how to use its special APIs for
> > particular actions.
>
> You don't have "an abstract SortedList", you have a concrete SortedList
> with a well-defined implementation with stable performance
> characteristics. So, again, this seems to be a red herring.
>

My point is that there are many ways to implement a collection that
maintains its elements in sorted order that has well defined performance.
One can pick one and accept the pros/cons of the particular underlying
algorithm. But it seems misleading to call that *the* SortedList
implementation. Hence why I say "abstract SortedList".

To emphasize this point a bit, let's presume for the moment Python chose to
have a data type named SortedList, and it was, effectively, the
sortedcontainers implementation. What does one do with the "load factor"
setting that sortedcontainers.SortedList has? This arg is, essentially, a
BTree implementation detail leaking out. IIRC, it controls the fan out of
the underlying BTree (because one size doesn't always fit all), which is
one of the biggest determiners of the overall performance. If such a
parameter is exposed, then it's no longer really a generic "sorted list",
it's a "sorted list as implemented using a btree". So might as well call it
e.g. BTreeList or some such instead.

If such a parameter *isn't* exposed, then a one-size-fits-all value must be
chosen[2] -- I'm skeptical that's really tenable because, if a sorted
collection is needed, it's probably for performance reasons, and you need
knobs like that to better fit the algorithm to your data and access
patterns.

And I guess I have to state upfront that my claim of "it's probably for
performance reasons" is just what my experience leads me to believe, so
it'll just have to be taken with the dosage of salt that best fits, too.

[2] As an aside, that a list's growth factor isn't controllable works
against things like sortedcontainers. Which isn't to say `list` should
expose it, but that if, e.g. ArrayList(growth_factor) was a thing, then a
more optimal implementation of e.g. BTree would be possible.


>
> > This basically echos Christopher Baker's sentiment of "whats the use
> case"
> > and "why doesn't a sorted() call at the end suffice" to some extent.
>
> The use case is obviously when you have to *maintain* sorting at all
> times. Yes, there are such situations. Some rather high-profile
> Python packages, such as dask/distributed, rely on this for rather
> elaborate algorithmic work (dask/distributed is a distributed task
> scheduler, you may also think of it as a replacement for Spark).
>
>
Are you able to elaborate wrt their needs to use a sorted collection?

Something that would be very compelling (to me, at least) is demonstrating
the utility of sorted collections where their performance characteristics
isn't the only/sole/primary/dominant motivator for using one. Performance
is important, but it's also kind of fuzzy and easy to argue about without a
clear answer/conclusion.

I hope we can all agree that there do exist situations where a sorted
collection is, at the least, helpful. That they aren't inherently useless.

I think the main thing being questioned is when/if they are, in
effect, *necessary
throughout*.
That when this is true: * The data must, at all times, without exception,
be and stay sorted from its inception to its conclusion
Then this is also true: * The case is not niche

Does "dask/distributed" meet that criteria? If so, why? I have no
experience, knowledge, or opinion about "dask/distributed". Maybe it *is* a
compelling example, or maybe not. IDK.


> I won't go over your performance considerations, but I'll just mention
> that the "sortedcontainers" documentation has not one but *several*
> sections exploring performance characteristics in several dimensions.
> It is extremely rare for Python packages to provide such a detailed
> insight on the matter (even the built-in `dict` is much less thoroughly
> characterised).
> http://www.grantjenks.com/docs/sortedcontainers/#user-guide


Yes, his performance analysis really is top notch.


>
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/EHZGBXSIXZE6ZDBGJPESDQQG4PCNOMLW/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:

> Should the stdlib have e.g. SortedList? Probably not, because the use cases
> of such data types are too niche to a one-size-fits-all implementation, and
> there are too many implementations with too many of their own settings.
> Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
> because those are somewhat fundamental data structures and implementing,
> testing, and verifying them is very much non-trivial. While niche, having
> them at your immediate disposal is very powerful.

By that reasoning, we shouldn't have dicts, for exactly the same reason:
for anyone wanting an associative array, there are so many implementation
variants to choose from:

- hash table with linear probing
- hash table with chaining
- AVL tree
- red-black tree
- judy array
- splay tree
- treap
- scapegoat tree

and many more, and most of them can be tuned.

If Python didn't already have dict, your argument against SortedList
would equally apply to it: they are "fundamental data structures and
implementing, testing, and verifying them is very much non-trivial".

So if your argument is correct, that would imply that standardizing on
one single dict implementation, one that isn't even tunable by the
caller, was a mistake. We should have provided a dozen different hash
tables, trees and treaps.

But... we didn't, and I don't think that Python is a worse language
because we only have one associative array implementation in the stdlib.

Whenever I need an associative array, I don't lose sleep over whether I
could get 2% better performance for hits, at a cost of 17% worse
performance for misses, by swapping over to some other implementation. I
just reach for dict, knowing that it will almost always be good enough.


> Last year, for fun, after wishing there was a SortedSet in the standard
> lib, I ended up implementing a Red-Black Tree and BTree based sorted
> dictionary/set[1]. After then trying to use them for my use case[2], I
> found that, in order to fully and truly exploit their benefits, the basic
> Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
> would let me, e.g. binary search to a particular spot and then iterate, or
> give me a range between two points, etc.

I believe that sortedcontainers.SortedSet provides that functionality
via the irange() method.

http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontainers.SortedList.irange


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/DWGJAXOVAY64XU7FBQI6D3D5NNFKZEHW/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
I think Richard's point was two-fold: People usually don't want or need
this kind of thing /except/ when they have some very specific
performance requirements, in which case they probably want to also be
very specific about the kind of container they are using rather than
using an abstract "sorted container".

It is true that people sensitive to performance may really care about
the way dict is implemented, but there are a great many uses for
associative arrays in general. I knew about sortedcontainers and I also
don't remember ever seeing a situation where I needed one or recommended
its use. Maybe it would be useful in leetcode-style interview questions
(which may by itself be a good reason to include it in the standard
library, since a lot of those interviewers will let you use
`collections.deque` or `collections.OrderedDict` without implementing
any sort of linked listed, but they won't let you import a module that
provides a trie or something)?

I'm fairly neutral on this proposal. I do think the standard library is
a good place for these sorts of fundamental data structures (even
abstract ones), but I do agree with Richard that they are pretty niche.
In almost any situation where you'd want / need something like this, I
feel like adding a PyPI dependency would not be a big deal.

One thing that I do think might be valuable is if the plan were to
re-implement these as C extensions. Maintaining and distributing C
extensions on PyPI is in many ways more difficult than bundling them
directly into CPython. That said, doing so adds maintenance burden and
implementation cost (and it's a different proposition than merely
adopting an existing library), so I'm probably -0.0 on the whole thing —
the absence of these types of containers in the standard library is not
an obvious lack, and getting them from PyPI in the situations where you
actually need them doesn't seem like a big deal, particularly when it's
a pure Python impementation.

On 11/11/21 21:43, Steven D'Aprano wrote:
> On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:
>
>> Should the stdlib have e.g. SortedList? Probably not, because the use cases
>> of such data types are too niche to a one-size-fits-all implementation, and
>> there are too many implementations with too many of their own settings.
>> Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so,
>> because those are somewhat fundamental data structures and implementing,
>> testing, and verifying them is very much non-trivial. While niche, having
>> them at your immediate disposal is very powerful.
> By that reasoning, we shouldn't have dicts, for exactly the same reason:
> for anyone wanting an associative array, there are so many implementation
> variants to choose from:
>
> - hash table with linear probing
> - hash table with chaining
> - AVL tree
> - red-black tree
> - judy array
> - splay tree
> - treap
> - scapegoat tree
>
> and many more, and most of them can be tuned.
>
> If Python didn't already have dict, your argument against SortedList
> would equally apply to it: they are "fundamental data structures and
> implementing, testing, and verifying them is very much non-trivial".
>
> So if your argument is correct, that would imply that standardizing on
> one single dict implementation, one that isn't even tunable by the
> caller, was a mistake. We should have provided a dozen different hash
> tables, trees and treaps.
>
> But... we didn't, and I don't think that Python is a worse language
> because we only have one associative array implementation in the stdlib.
>
> Whenever I need an associative array, I don't lose sleep over whether I
> could get 2% better performance for hits, at a cost of 17% worse
> performance for misses, by swapping over to some other implementation. I
> just reach for dict, knowing that it will almost always be good enough.
>
>
>> Last year, for fun, after wishing there was a SortedSet in the standard
>> lib, I ended up implementing a Red-Black Tree and BTree based sorted
>> dictionary/set[1]. After then trying to use them for my use case[2], I
>> found that, in order to fully and truly exploit their benefits, the basic
>> Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that
>> would let me, e.g. binary search to a particular spot and then iterate, or
>> give me a range between two points, etc.
> I believe that sortedcontainers.SortedSet provides that functionality
> via the irange() method.
>
> http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontainers.SortedList.irange
>
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:

> I knew about sortedcontainers and I also don't remember ever seeing a
> situation where I needed one or recommended its use.

We have a very odd situation where apparently sortedcontainers is one
of the most well-known, popular, most heavily downloaded libraries on
PyPI. According to here:

https://hugovk.github.io/top-pypi-packages/

sortedcontainers is the 290th most popular package on PyPI, ahead of
such luminaries as pylint, black, selenium, mypy, django and nose.

And yet, nobody(?) admits to either using it or knowing what it could be
used for. How very curious :-/


--
Steve
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/7JVLJE4G6ZWM6PKXTMSZL7MVAIUWGZRH/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On Sat, Nov 13, 2021 at 3:20 AM Steven D'Aprano <steve@pearwood.info> wrote:
>
> On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:
>
> > I knew about sortedcontainers and I also don't remember ever seeing a
> > situation where I needed one or recommended its use.
>
> We have a very odd situation where apparently sortedcontainers is one
> of the most well-known, popular, most heavily downloaded libraries on
> PyPI. According to here:
>
> https://hugovk.github.io/top-pypi-packages/
>
> sortedcontainers is the 290th most popular package on PyPI, ahead of
> such luminaries as pylint, black, selenium, mypy, django and nose.
>
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/
>

I think that proves the value of download counts: very little. The
highest on the list is a thing called botocore, which I've never heard
of. What is it? It's a dependency of a number of Amazon web services.
My guess is that it's a dependency of popular tools - or maybe of
automatically-installed tools, even - while not itself being well
known. Actually, quite a few of the most popular packages on that list
are Amazon-related, so quite possibly they're all being installed as a
set in response to one single *actual* dependency.

(Download counts DO most likely indicate that something is heavily
used, though, so if people are trying to plan out a benchmark suite,
then that might be more useful. But I'm not really surprised that a
heavily-downloaded package isn't itself well known.)

ChrisA
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/LF622NRJBFBNHKZR2B7P6X4D2GE4WSDX/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On 12.11.2021 17:10, Steven D'Aprano wrote:
> On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:
>
>> I knew about sortedcontainers and I also don't remember ever seeing a
>> situation where I needed one or recommended its use.
>
> We have a very odd situation where apparently sortedcontainers is one
> of the most well-known, popular, most heavily downloaded libraries on
> PyPI. According to here:
>
> https://hugovk.github.io/top-pypi-packages/
>
> sortedcontainers is the 290th most popular package on PyPI, ahead of
> such luminaries as pylint, black, selenium, mypy, django and nose.
>
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/

Those download stats can be misleading. Packages are often pulled in
as a dependency of other packages which are popular or used a lot
in CI/CD setups.

Perhaps there's a reverse dependency graph we could use to find out
why the package is downloaded this often. I remember having seen
a project which does this, but have lost the URL.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 12 2021)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/FJT4SSWISWWRFB5FGRPXSLDLDZVMUIXF/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/

trio (which IMHO is a somewhat high profile uses it):

https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27 <https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27>

And so if I am not mistaken any project that depends on trio will automatically need to have sortedcontainers as its dependency? I am not saying this alone will explain why sortedcontainers have large number of downloads, but with one or two projects like this it will push the number up maybe?

Best,
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
> On 12 Nov 2021, at 16:32, Marc-Andre Lemburg <mal@egenix.com> wrote:
>
> Perhaps there's a reverse dependency graph we could use to find out
> why the package is downloaded this often. I remember having seen
> a project which does this, but have lost the URL.


I believe this is the URL we are looking for:

https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1 <https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1>


Best,
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
> And yet, nobody(?) admits to either using it or knowing what it could be
> used for. How very curious :-/

trio (which IMHO is a somewhat high profile uses it):

https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27 <https://github.com/python-trio/trio/blob/master/trio/_core/_run.py#L27>

And so if I am not mistaken any project that depends on trio will automatically need to have sortedcontainers as its dependency? I am not saying this alone will explain why sortedcontainers have large number of downloads, but with one or two projects like this it will push the number up maybe?

Best,
Bob
Re: Having Sorted Containers in stdlib? [ In reply to ]
Yeah, a datapoint I didn't see mentioned is searching on public github
repos for import sortedcontainers (which includes from sortedcontainers
import).

Obviously it's just one datapoint but it shows a very small count compared
to other packages mentioned when looking at many of the stats available:

https://github.com/search?q=import+sortedcontainers+language%3APython&type=code
https://github.com/search?q=import+pylint+language%3APython&type=code
https://github.com/search?q=import+black+language%3APython&type=code
https://github.com/search?q=import+mypy+language%3APython&type=code
https://github.com/search?q=import+django+language%3APython&type=code

Damian


On Fri, Nov 12, 2021 at 11:39 AM Marc-Andre Lemburg <mal@egenix.com> wrote:

> On 12.11.2021 17:10, Steven D'Aprano wrote:
> > On Fri, Nov 12, 2021 at 10:07:13AM -0500, Paul Ganssle wrote:
> >
> >> I knew about sortedcontainers and I also don't remember ever seeing a
> >> situation where I needed one or recommended its use.
> >
> > We have a very odd situation where apparently sortedcontainers is one
> > of the most well-known, popular, most heavily downloaded libraries on
> > PyPI. According to here:
> >
> > https://hugovk.github.io/top-pypi-packages/
> >
> > sortedcontainers is the 290th most popular package on PyPI, ahead of
> > such luminaries as pylint, black, selenium, mypy, django and nose.
> >
> > And yet, nobody(?) admits to either using it or knowing what it could be
> > used for. How very curious :-/
>
> Those download stats can be misleading. Packages are often pulled in
> as a dependency of other packages which are popular or used a lot
> in CI/CD setups.
>
> Perhaps there's a reverse dependency graph we could use to find out
> why the package is downloaded this often. I remember having seen
> a project which does this, but have lost the URL.
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> Professional Python Services directly from the Experts (#1, Nov 12 2021)
> >>> Python Projects, Coaching and Support ... https://www.egenix.com/
> >>> Python Product Development ... https://consulting.egenix.com/
> ________________________________________________________________________
>
> ::: We implement business ideas - efficiently in both time and costs :::
>
> eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
> D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
> Registered at Amtsgericht Duesseldorf: HRB 46611
> https://www.egenix.com/company/contact/
> https://www.malemburg.com/
>
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-leave@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/FJT4SSWISWWRFB5FGRPXSLDLDZVMUIXF/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
Re: Having Sorted Containers in stdlib? [ In reply to ]
On 11/12/2021 11:31 AM, Chris Angelico wrote:
>
> I think that proves the value of download counts: very little. The
> highest on the list is a thing called botocore, which I've never heard
> of. What is it? It's a dependency of a number of Amazon web services.
> My guess is that it's a dependency of popular tools - or maybe of
> automatically-installed tools, even - while not itself being well
> known. Actually, quite a few of the most popular packages on that list
> are Amazon-related, so quite possibly they're all being installed as a
> set in response to one single *actual* dependency.

botocore and boto3 are the Amazon-provided low- and high-level
interfaces, respectively, to AWS. They're extremely import and well
known in that environment.

Eric

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/V4GSZINECFIAZ4MGDZNM3L354LJDSX2Y/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
On 12.11.2021 17:46, Bob Fang wrote:
>
>
>> On 12 Nov 2021, at 16:32, Marc-Andre Lemburg <mal@egenix.com
>> <mailto:mal@egenix.com>> wrote:
>>
>> Perhaps there's a reverse dependency graph we could use to find out
>> why the package is downloaded this often. I remember having seen
>> a project which does this, but have lost the URL.
>
>
> I believe this is the URL we are looking for:?
>
> https://www.wheelodex.org/projects/sortedcontainers/rdepends/?page=1

Yes, thanks, that was the one.

The listing doesn't seem to solve the puzzle either, though. There
are a few high profile projects using the package, but those are not
high up on the download ranking, so don't explain the high download
counts of the package.

BTW: This reverse dependency ranking on the website gives a good
idea of how important a package is for the Python package eco system:

https://www.wheelodex.org/rdepends-leaders/

(sort of like the Page rank for Python packages)

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts (#1, Nov 12 2021)
>>> Python Projects, Coaching and Support ... https://www.egenix.com/
>>> Python Product Development ... https://consulting.egenix.com/
________________________________________________________________________

::: We implement business ideas - efficiently in both time and costs :::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
https://www.egenix.com/company/contact/
https://www.malemburg.com/

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/732T3MNBB2PMFP7TVXSDDMW6KNI5CEQS/
Code of Conduct: http://python.org/psf/codeofconduct/
Re: Having Sorted Containers in stdlib? [ In reply to ]
I started writing up a SortedDict use case I have, but it's very
elaborate and I expect it would just end with endless pointless
argument about other approaches I _could_ take. But I already know all
those ;-)

So let's look at something conceptually "dead easy" instead: priority
queues. They're a basic building block in algorithms from many fields.

In the distributed Python, `heapq` is the only thing that comes close
to being reasonably efficient and scalable for that.

However:

- It only supports min-heaps. It's not that max-heaps aren't also
useful (indeed, _under the covers_ CPython also implements max-heaps
for its own internal use). It's more that the original API simply
doesn't generalize cleanly.

- So there's a body of word-of-mouth "tricks" for making a min-heap
"act like" a max-heap. For integer or float priorities, it suffices to
use the negation of "the real" priority. For other kinds of values, it
gets trickier. In the end, you can wrap each item in a class that
implements `x.__lt__(y)` by doing `y.value < x.value`. But few people
appear to be aware of that, while nobody wants to endure the bother
;-)

- It's often the case that "the priority" needs to be computed from an
item's value. `heapq` has no direct support for that. Instead there's
another body of word-of-mouth tricks akin to the old
decorate-sort-undecorate approach sorting used. Rather than store the
values in the heap, you store `(priority, value)` tuples. But then
various unpleasant things can happen if two priorities are tied: in
that case tuple comparison falls back to comparing the values. But,
e.g., value comparison may be very expensive, or the values may not
support __lt__ at all. So instead you store `(priority,
unique_integer, value)` triples. And hope that you don't really need a
max-heap, lest it get even more obscure.

Using SortedContainers, none of those are issues. It's all
straightforward and obvious. If you have a list L always sorted by
priority, extracting from a "min queue" just requires L.pop(0), or
from a "max queue" L.pop(-1) (or L.pop() - -1 is the default, same as
for the Python list.pop()). No tricks are needed. Fine too if
sometimes you want to extract the min, while at other times the max.
Or, e.g., peek at the 5 highest-priority values and pick the one that
best fits resources available at the time.

In cases of computed priorities, the SortedKeyList constructor allows
specifying a function to be used to compute the keys used to determine
the list's ordering. Again straightforward and obvious.

from sortedcontainers import SortedKeyList
L = SortedKeyList([.(i, j) for i in range(1, 5)
for j in range(1, 5)],
key=lambda t: t[0]/t[1])

>>> [t for t in L]
[.(1, 4), (1, 3), (1, 2), (2, 4), (2, 3), (3, 4), (1, 1),
(2, 2), (3, 3), (4, 4), (4, 3), (3, 2), (2, 1), (4, 2),
(3, 1), (4, 1)]

That said, in most of my code I end up _removing_ uses of
SortedContainers, in favor of faster ways of getting the job done. The
package isn't the last resort for me, it's the first. I can write code
in straightforward ways that scale well from the start. If I need more
speed later, fine, I can puzzle out faster ways to get the task done.
If you're proud of that you _start_ by plotting ways to minimize the
number of sorts you do, you're coding in C++, not Python ;-)

So I suggest people are staring at the wrong end when asking for use
cases that can't be done without the package. Those are necessarily
non-trivial. Having a sorted collection is "more than good enough" for
many tasks. As above, e.g., there's no reason to imagine that Grant
Jenks had priority queues in mind at all, yet the flavors of sorted
lists he implemented are very easy to use for that task.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/XXAXVV3KJ4ORQ7K6ELSS4R7S5725ASRE/
Code of Conduct: http://python.org/psf/codeofconduct/

1 2  View All