Mailing List Archive

use sort;
use sort;

With the C<sort> pragma you can control the behaviour of the builtin
C<sort()> function

There are currently two options:

use sort 'defaults'; # default sort algorithm, which before v5.8.0 was unstable
use sort 'stable'; # alternative sort algorithm, guaranteed to be stable

Prior to 5.28.0 there were two other options:

use sort '_mergesort';
use sort '_qsort'; # or '_quicksort'

If you try and specify either of these in 5.28+ it will croak.

The pragma dates from the time when there were two sort implementations to
choose from: one was not a *stable sort*, but you could then also choose to
use it with a "tie breaker" sorting function which would make it stable (but
slightly slower). A stable sort is one that maintains the relative order of
values which compare as equal.

Both options result in the same behaviour - a stable mergesort. The only
options that changed behaviour were the '_qsort' option, which selected
quicksort instead of mergesort, and '_qsort' with 'stable', which added a
"tie breaker" to force the quicksort to be stable (by default, quicksort is
unstable).

The default sort has been stable since v5.8.0, and given this consistent
behaviour for almost two decades, everyone has come to assume
stability. Moreover, both the ability to remove stability and the
alternative sort ('_qsort') were removed in v5.28.0, so there isn't actually
anything you *can* influence - you can no longer change the algorithm, and
for perl's mergesort you could never turn off enforced stability to trade
this for a small speed gain.

In the context of C<sort>, the PSC feels that Perl has come to guarantee
that the default sort will be stable - the balance will never tip so far
that we think it worth *defaulting* to unstable. Hence the need to
C<use sort 'stable';> to guarantee stability is obsolete.

More generally, the historical purpose of C<use sort> was one way of
upgrading one type of behaviour. We wondered why we have a special case for
sort. Why can't one simply import a new sort function, the same way that
you could import some other function to do $other?

Thirdly, if you know that you care that much about performance of your
sorting, and that for your use case and your data, it was worth
investigating alternatives, possible to identify an alternative from our
default that was better, and the cost of switching was worth it, then you
know more than we do. Likely whatever choices we can give are not as good as
implementing your own. (Where you can do things like remove the need for
`reverse sort` to sort in reverse, which requires extra work with stability,
or optimise for a particular comparison function, or even use radix sort
because your keys and sorting criteria are suitable for it.)

We are not averse to *changing* the sort algorithm, but we don't see the
benefit in offering the choice of two general purpose implementations.

So as a general take away for the direction of the language, PSC

* think it important that we favour simple sensible defaults over more knobs
you can twiddle. The latter only give you an illusion of power.
* general purpose ways to extend/replace functionality are better than
special cases

The specific take away for sort is

* Document the pragma as "discouraged"
* Document that stability will remain the default
* Document that we do not foresee going back to offering multiple
implementations
* No longer set the *two* hints bits that it uses, or get passed down through
the optree into the calls in pp_sort.c
* We can't drop the pragma, as there are ~60 CPAN distributions with
use sort 'stable' and a handful that mention defaults.
Re: use sort; [ In reply to ]
On Wed, Jun 9, 2021 at 3:03 AM Nicholas Clark <nick@ccl4.org> wrote:

> More generally, the historical purpose of C<use sort> was one way of
> upgrading one type of behaviour. We wondered why we have a special case for
> sort. Why can't one simply import a new sort function, the same way that
> you could import some other function to do $other?
>

Having a rough idea of how sort is parsed, I'm not sure introducing more
keywords with that behavior is wise...

(not that it sounds like you are suggesting anything specific here)

-Dan