Mailing List Archive

Elevator pitch, deprecating $a/$b sort globals by using sub{}
Hi list


Karl Williamson's remarks about $a/$b as a quirk got me thinking. With
signatures as a language feature it's feasible to make sort not only
take a block or a subroutine name, but also a sub. That means $a and $b
for sort can be deprecated in the long run, and eventually turned off
with a feature flag if we so choose.


So basically,

    sort { $a->{x} <=> $b->{x} } @list

could be written as:

    sort sub($a,$b){ $a->{x} <=> $b->{x} } @list

which is currently a syntax error.


Yes, it is longer, but less of a wart. It lessens the cognitive load,
avoids the $a/b global variables and makes the language more consistent.
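
For comparison, here is a minimal sketch of what already works today: the
BLOCK form with $a/$b, and a named comparator with a ($$) prototype, which
receives the two items in @_ instead. The sample data and the name by_x are
made up for illustration.

```perl
use strict;
use warnings;

my @list = ( { x => 3 }, { x => 1 }, { x => 2 } );

# Today's idiom: the block sees the two items as the package variables $a and $b.
my @by_block = sort { $a->{x} <=> $b->{x} } @list;

# A named comparator with a ($$) prototype gets the two items in @_ instead.
sub by_x($$) {
    my ($left, $right) = @_;
    return $left->{x} <=> $right->{x};
}
my @by_proto = sort by_x @list;

print join(',', map { $_->{x} } @by_block), "\n";   # 1,2,3
print join(',', map { $_->{x} } @by_proto), "\n";   # 1,2,3
```

The second form avoids $a/$b but needs a named sub declared ahead of the
sort call, which is part of what the proposal tries to shorten.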


I do understand that this does not come for free, someone has to build
this, and build it so it's as efficient as current $a/b usage when
applicable. But maybe someone feels it is worth enough to have a second
look at it.


An argument against it, is that it adds yet another way to use sort,
which already has three modes of operation. I think it is worth it, but
that is certainly something to be considered, it lessens the cognitive
load on one side and adds to it on another.


So what does p5p think? RFC worthy? a stupid idea? Re-submit later when
signatures are in more common use?


HTH,

M4


P.S. That means 'sort \&x, @list;' probably should be valid too, which
points to a syntax problem, why does this one have a comma, but the
'sub($a,$b){}' version does not? There is some room for bikeshedding
here. However, allowing this does not really add anything, as 'sort x
@list;' already achieves the same thing, and easier at that. So allowing
it would either be for consistency, or an artifact of implementation.
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
* Martijn Lievaart <m@rtij.nl> [2021-07-04 17:40:09 +0200]:

> Hi list
>
>
> Karl Williamson's remarks about $a/$b as a quirk got me thinking. With
> signatures as a language feature it's feasible to make sort not only take a
> block or a subroutine name, but also a sub. That means $a and $b for sort can
> be deprecated in the long run, and eventually turned off with a feature flag
> if we so choose.
>
>
> So basically,
>
>     sort { $a->{x} <=> $b->{x} } @list
>
> could be written as:
>
>     sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
>
> which is currently a syntax error.
>
>
> Yes, it is longer, but less of a wart. It lessens the cognitive load, avoids
> the $a/b global variables and makes the language more consistent.

Does this boil down to removing C<sort>'s support for a BLOCK? It's
in this BLOCK that $a/$b become significant.

Because I had to educate myself on the deep usage details of C<sort>
before replying, it seems that it does indeed support one to define
a subroutine - but by an undecorated name of an existing subroutine,

sort mysub @list;

It's probably safe to assume that signatures and anything available
via C<sub> is available there. However, it also seems there is a
lack of support for either providing a subroutine reference or an
inline anonymous sub.

Still both (while related) seem to fall outside of the purpose of
the BLOCK support - which may be an incremental step towards what
providing a subroutine name allows.

To summarize, from this pre-RFC, I get the following potential
suggestions:

* eliminate BLOCK from support
* warn when $a/$b are being referenced in a subroutine (via SUBNAME)
* support a SUBREF (e.g., C<sort \&subref @list>)
* support inline anonymous sub (w/prototypes, signatures)

There are surely more, e.g., warn on $a/$b when used with anything
other than BLOCK. Another would be the rather pointless ability to
use an anonymous subroutine inline without the ability to capture
it for future use.

> I do understand that this does not come for free, someone has to build this,
> and build it so it's as efficient as current $a/b usage when applicable. But
> maybe someone feels it is worth enough to have a second look at it.
>
>
> An argument against it, is that it adds yet another way to use sort, which
> already has three modes of operation. I think it is worth it, but that is
> certainly something to be considered, it lessens the cognitive load on one
> side and adds to it on another.
>
>
> So what does p5p think? RFC worthy? a stupid idea? Re-submit later when
> signatures are in more common use?

I am not p5p, but I personally believe there is something here if it can
be refined more. I'm also not speaking from a place of any great expertise
other than the truth that, yes, I can empathize that the $a/$b thing is
a "wart" - i.e., inconsistent with other things related to scoping. Also,
sometimes I've just wanted to use $a/$b and got that "warning" that they're
special or something...

Hope that helps also.

Cheers,
Brett

>
>
> HTH,
>
> M4
>
>
> P.S. That means 'sort \&x, @list;' probably should be valid too, which
> points to a syntax problem, why does this one have a comma, but the
> 'sub($a,$b){}' version does not? There is some room for bikeshedding here.
> However, allowing this does not really add anything, as 'sort x @list;'
> already achieves the same thing, and easier at that. So allowing it would
> either be for consistency, or an artifact of implementation.

yes; see above. And also, as I've mentioned above, I think there is something
here. It might just be more "support CODEREF/inline anonymous subs" than
it is "get rid of $a/$b".

>

--
--
oodler@cpan.org
oodler577@sdf-eu.org
SDF-EU Public Access UNIX System - http://sdfeu.org
irc.perl.org #openmp #pdl #native
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Sun, Jul 04, 2021 at 05:40:09PM +0200, Martijn Lievaart wrote:
> Hi list
>
>
> Karl Williamson's remarks about $a/$b as a quirk got me thinking. With
> signatures as a language feature it's feasible to make sort not only take a
> block or a subroutine name, but also a sub. That means $a and $b for sort can
> be deprecated in the long run, and eventually turned off with a feature flag
> if we so choose.
>
>
> So basically,
>
>     sort { $a->{x} <=> $b->{x} } @list
>
> could be written as:
>
>     sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
>
> which is currently a syntax error.

That's always a good start when proposing new syntax :-)

> Yes, it is longer, but less of a wart. It lessens the cognitive load, avoids
> the $a/b global variables and makes the language more consistent.

The first thing that I thought was that this makes sort special, compared
with the other builtins that take blocks (map, grep)

But those take one implicit argument, for which $_ works just fine.

(And I think that most others that might get proposed, such as first and any
would also use $_)

Our problem here is that sort needs two implicit arguments. *Is* it the only
operator that wants >1 implicit argument?

Because to me it feels like we're now creating a *different* set of special
case syntax for sort, to eliminate the current special-case syntax ($a, $b).

What else might re-use this syntax?


The second thing that I thought was that as sort can currently take a
subroutine with a *prototype* of $$ to eliminate $a, $b:

$ perl -lwe 'sub foo($$) { 1/$_[0] <=> 1/$_[1] }; print for sort foo (.5, .25, 3, 42, 1, -1, -.1)'
-0.1
-1
42
3
1
0.5
0.25

then really to be consistent we should then say that also a signature of
arity exactly 2 should behave the same.

Which we currently can't do.

$ ./perl -Ilib -lwe 'use feature "signatures"; sub foo ($q, $r) { 1/$q <=> 1/$r }; print for sort foo (.5, .25, 3, 42, 1, -1, -.1)'
The signatures feature is experimental at -e line 1.
Too few arguments for subroutine 'main::foo' (got 0; expected 2) at -e line 1.


and signatures are compiled to regular perl OPs at the start of the
subroutine body, so we don't (currently) have a way to introspect the
signature of a subroutine to reason about its arity.
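
As a hedged illustration of the prototype route: the built-in prototype()
function does report a ($$) prototype, and a :prototype($$) attribute can be
combined with a signature, so that sort passes the pair in @_ and the
signature unpacks it. This is a sketch assuming Perl 5.22 or later (where the
attribute is written before the signature); the names by_inverse and
by_inverse_sig are invented for the example.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# With signatures enabled, a prototype has to be given as an attribute.
sub by_inverse :prototype($$) {
    return 1 / $_[0] <=> 1 / $_[1];
}

# Assumption: Perl 5.22+ ordering, attribute first, then the signature.
sub by_inverse_sig :prototype($$) ($q, $r) {
    return 1 / $q <=> 1 / $r;
}

my @nums  = (.5, .25, 3, 42, 1, -1, -.1);
my @plain = sort by_inverse @nums;
my @sig   = sort by_inverse_sig @nums;

print prototype(\&by_inverse_sig), "\n";   # $$
print "@plain\n";                          # -0.1 -1 42 3 1 0.5 0.25
print "@sig\n";                            # -0.1 -1 42 3 1 0.5 0.25
```

This still relies on the prototype, not on the signature's arity, so it does
not answer the introspection question; it only shows that the two can coexist.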


> An argument against it, is that it adds yet another way to use sort, which
> already has three modes of operation. I think it is worth it, but that is
> certainly something to be considered, it lessens the cognitive load on one
> side and adds to it on another.

That sums up what I'm currently thinking. "Swings and roundabouts"

To me it seems more that it adds another way of doing something, making it
differently easy, rather than making something previously hard easy.

But that's as much about the syntax you're suggesting as the problem you're
trying to solve. It doesn't seem that the proposed syntax can be re-used for
any other task.

> P.S. That means 'sort \&x, @list;' probably should be valid too, which
> points to a syntax problem, why does this one have a comma, but the
> 'sub($a,$b){}' version does not? There is some room for bikeshedding here.
> However, allowing this does not really add anything, as 'sort x @list;'
> already achieves the same thing, and easier at that. So allowing it would
> either be for consistency, or an artifact of implementation.


I am not an expert on the corners of the parser, but given that I can write:

perl -lwe 'sub foo($$) { 1/$_[0] <=> 1/$_[1] }; my $bar = \&foo; print for sort $bar (.5, .25, 3, 42, 1, -1, -.1)'
-0.1
-1
42
3
1
0.5
0.25


my impression was that the syntax that follows sort is already complex and
hides various special cases and heuristics.


I *think* that your new suggestion is only "not currently valid syntax"
because you omit the comma. If I write:

$ ./perl -Ilib -lwe 'use feature "signatures"; print for sort sub ($q, $r) { 1/$q <=> 1/$r }, (.5, .25, 3, 42, 1, -1, -.1)'

The signatures feature is experimental at -e line 1.
-0.1
-1
0.25
0.5
1
3
42
CODE(0x55c991505c68)


at which point, it looks like this plan only works because a comma isn't
there.

Put a comma in, and the code means something else.

That feels like a risky design.
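
The difference the comma makes can be seen without any new syntax. In this
small sketch (variable names are illustrative), the first call sorts a list
whose first element happens to be a code reference, using the default string
comparison; the second passes the comparator through a scalar, which sort
already accepts.

```perl
use strict;
use warnings;

my @nums = (10, 9, 100);

# With a comma, the anonymous sub is just another list element, and the
# default string comparison is used; the sub itself is never called.
my @with_comma = sort sub { $_[0] <=> $_[1] }, @nums;

# Without a comma, a scalar holding a code reference acts as the comparator.
my $by_num = sub { $a <=> $b };
my @without_comma = sort $by_num @nums;

print scalar(@with_comma), "\n";   # 4
print "@without_comma\n";          # 9 10 100
```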


Nicholas Clark
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
Den 2021-07-04 kl. 17:40 skrev Martijn Lievaart:
> Karl Williamson's remarks about $a/$b as a quirk got me thinking. With
> signatures as a language feature it's feasible to make sort not only
> take a block or a subroutine name, but also a sub. That means $a and $b
> for sort can be deprecated in the long run, and eventually turned off
> with a feature flag if we so choose.

This is a difficult question. I think that Perl is used in two ways:

1. For building small scripts.
2. For building large programs.

This duality means that the design of the language has to be carefully
balanced. "Small" Perl favors implicitness, "big" Perl favors
strictness. In some cases, they're mutually incompatible, and it is
important that neither concern be favored to the detriment of the other,
lest a significant part of Perl's users become disillusioned.

In the past, the strict variant of Perl has been the answer to this
divide. Keeping in mind desires among users of "big" Perl for more
strictness, I think one of the following solutions should be considered:

1. Implicit $a/$b should be disabled when 'use strict' is active.
2. Implicit $a/$b should be disabled by a new pragma, 'use stricter'
= 'use strict' plus any number of further strictures.

The drawback of (2) is that it divides the Perl language even further,
but the benefit is that much more strictness, such as that suggested
here, could be added to Perl without any controversy. It would satisfy
"big" Perl while not upsetting "small" Perl.

(My personal opinion is that implicit $a/$b is too useful to remove from
strict Perl.)
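
For context on option 1: $a and $b are currently exempt from the "vars"
stricture, which is why sort blocks compile under 'use strict' at all. A
small sketch of that exemption follows; the string evals are only there so
the compile-time error can be caught and reported.

```perl
use strict;
use warnings;

# An undeclared $c is rejected under strict, so the eval returns undef...
my $c_ok = eval q{ use strict; $c = 1; 1 };

# ...but $a and $b are exempt, because sort relies on them.
my $ab_ok = eval q{ use strict; $a = 1; $b = 2; 1 };

print defined($c_ok)  ? "compiled" : "rejected", "\n";   # rejected
print defined($ab_ok) ? "compiled" : "rejected", "\n";   # compiled
```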
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Sun, Jul 4, 2021 at 1:02 PM Nicholas Clark <nick@ccl4.org> wrote:

>
> > Yes, it is longer, but less of a wart. It lessens the cognitive load,
> avoids
> > the $a/b global variables and makes the language more consistent.
>
> The first thing that I thought was that this makes sort special, compared
> with the other builtins that take blocks (map, grep)
>
> But those take one implicit argument, for which $_ works just fine.
>

FWIW, I think being able to specify a lexical to use instead of $_ for map
and grep blocks would also be really cool and an alternative to the failed
"lexical $_" experiment for avoiding the global state (e.g. you could call
functions that mess with $_ inside the map or grep without breaking your
code). So perhaps there is a way to solve these problems with
similar syntax.

-Dan
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
*To me it seems more that it adds another way of doing something, making it
differently easy, rather than making something previously hard easy.*

Well said! I say that is the crucial test that should be applied to all
proposed extensions.

Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Sun, Jul 4, 2021 at 1:09 PM John Ankarström <john@ankarstrom.se> wrote:

> Den 2021-07-04 kl. 17:40 skrev Martijn Lievaart:
> > Karl Williamson's remarks about $a/$b as a quirk got me thinking. With
> > signatures as a language feature it's feasible to make sort not only
> > take a block or a subroutine name, but also a sub. That means $a and $b
> > for sort can be deprecated in the long run, and eventually turned off
> > with a feature flag if we so choose.
>
> This is a difficult question. I think that Perl is used in two ways:
>
> 1. For building small scripts.
> 2. For building large programs.
>
> This duality means that the design of the language has to be carefully
> balanced. "Small" Perl favors implicitness, "big" Perl favors
> strictness. In some cases, they're mutually incompatible, and it is
> important that neither concern be favored to the detriment of the other,
> lest a significant part of Perl's users become disillusioned.
>
> In the past, the strict variant of Perl has been the answer to this
> divide. Keeping in mind desires among users of "big" Perl for more
> strictness, I think one of the following solutions should be considered:
>
> 1. Implicit $a/$b should be disabled when 'use strict' is active.
> 2. Implicit $a/$b should be disabled by a new pragma, 'use stricter'
> = 'use strict' plus any number of further strictures.
>
> The drawback of (2) is that it divides the Perl language even further,
> but the benefit is that much more strictness, such as that suggested
> here, could be added to Perl without any controversy. It would satisfy
> "big" Perl while not upsetting "small" Perl.
>
> (My personal opinion is that implicit $a/$b is too useful to remove from
> strict Perl.)
>
>
Unfortunately $a and $b are in far too much use in programs that use strict
for 1 to be at all feasible. 2 is the only option (and it would likely
instead be a specific feature, which could be implicitly enabled by other
constructs).

-Dan
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On 2021-07-04 8:40 a.m., Martijn Lievaart wrote:
> Karl Williamson's remarks about $a/$b as a quirk got me thinking. With signatures
> as a language feature it's feasible to make sort not only take a block or a
> subroutine name, but also a sub. That means $a and $b for sort can be deprecated
> in the long run, and eventually turned off with a feature flag if we so choose.
>
> So basically,
>
>     sort { $a->{x} <=> $b->{x} } @list
>
> could be written as:
>
>     sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
>
> which is currently a syntax error.

If we're going to change something I would propose a minimal change is that in
contexts like this make $_ a collection type whose elements are the operands.
For example, make it an arrayref.

Then we can have:

sort { $_->[0] <=> $_->[1] } @list

Or:

sort { my_sort(@{$_}) } @list

Which is then most similar to the other list operators using $_ already:

map { my_map($_) } @list

Over a decade ago when I implemented sort methods for my Set::Relation module I
used $_ like this because there didn't seem any way to emulate the
topicalization of $a and $b like I could for $_ in my map/grep/etc methods.
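
A rough pure-Perl emulation of this idea is possible today. The helper name
sort_pairwise is made up for this sketch, and it simply wraps the existing
$a/$b mechanism rather than replacing it:

```perl
use strict;
use warnings;

# Hypothetical helper: calls the block with $_ set to [ left, right ].
sub sort_pairwise(&@) {
    my ($cmp, @items) = @_;
    return sort { local $_ = [ $a, $b ]; $cmp->() } @items;
}

my @sorted = sort_pairwise { $_->[0] <=> $_->[1] } 10, 9, 100;
print "@sorted\n";   # 9 10 100
```

Building an array reference for every comparison has a cost, so this shows
the shape of the interface only, not how a built-in version would perform.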

-- Darren Duncan
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
* Martijn Lievaart <m@rtij.nl> [2021-07-04 17:40:09 +0200]:

> Hi list
>
>
> Karl Williamson's remarks about $a/$b as a quirk got me thinking. With
> signatures as a language feature it's feasible to make sort not only take a
> block or a subroutine name, but also a sub. That means $a and $b for sort can
> be deprecated in the long run, and eventually turned off with a feature flag
> if we so choose.
>
>
> So basically,
>
>     sort { $a->{x} <=> $b->{x} } @list
>
> could be written as:
>
>     sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
>
> which is currently a syntax error.


I can see a value in allowing inline, anonymous subroutines where
signatures (and prototypes) are allowed; however the suggestion
above seems to be conflating

sort BLOCK LIST

with,

sort SUBNAME LIST

For BLOCK, $a/$b are necessary. However, this may point to the
actual "missing feature" of:

sort CODEREF LIST, which would in practice look like:

i)
my @sorted_list = sort \&mysub @list;

or

ii)
my @sorted_list = sort sub { my ($lhv, $rhv) = @_; ... } @list;

The only potential weakness of (ii) is that there is no way to
capture a reference of the CODEREF defined inline; this would be
helpful if one used a C<state> variable inside of the inline CODEREF.
However, I can't think of a traditional way to capture the
reference to the inline CODEREF without affecting what C<sort> returned,
e.g.,

my ($sorted_list_ref, $coderef) = sort sub { ... } @list;
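
One way around this with today's syntax is to name the comparator first and
pass it through a scalar, which sort already accepts; the reference then
stays available afterwards. In this sketch the closure counts its own calls,
and $calls and $by_num are illustrative names only:

```perl
use strict;
use warnings;

my @list = (5, 3, 8, 1);

# Keep the comparator in a variable so it can be reused or inspected later.
my $calls  = 0;
my $by_num = sub { $calls++; $a <=> $b };

my @sorted = sort $by_num @list;

print "@sorted\n";               # 1 3 5 8
print "comparisons: $calls\n";   # the count depends on the sort algorithm
```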

>
>
> Yes, it is longer, but less of a wart. It lessens the cognitive load, avoids
> the $a/b global variables and makes the language more consistent.
>
>
> I do understand that this does not come for free, someone has to build this,
> and build it so it's as efficient as current $a/b usage when applicable. But
> maybe someone feels it is worth enough to have a second look at it.
>
>
> An argument against it, is that it adds yet another way to use sort, which
> already has three modes of operation. I think it is worth it, but that is
> certainly something to be considered, it lessens the cognitive load on one
> side and adds to it on another.
>

The cognitive load might be alleviated, as others suggested, by hiding the
C<$a <=> $b> needed for a non-default numerical (ascending) sort behind
something other than a BLOCK, so add a new class of specifier:

sort HOW @list

Where C<HOW> is a well defined set of orderings, e.g.:

sort :NUMERICAL: @list

sort :NUMERICAL:DESC: @list

Of course, you could get the worst of all worlds by supporting *SQL:

sort q{ORDER by $_->{a} DESC} @list;

*incidentally, ORDER (at least in MySQL) is numerical,

https://stackoverflow.com/questions/3968135/mysql-alphabetical-order

Cheers,
Brett

>
> So what does p5p think? RFC worthy? a stupid idea? Re-submit later when
> signatures are in more common use?
>
>
> HTH,
>
> M4
>
>
> P.S. That means 'sort \&x, @list;' probably should be valid too, which
> points to a syntax problem, why does this one have a comma, but the
> 'sub($a,$b){}' version does not? There is some room for bikeshedding here.
> However, allowing this does not really add anything, as 'sort x @list;'
> already achieves the same thing, and easier at that. So allowing it would
> either be for consistency, or an artifact of implementation.
>
>

Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Sun, 4 Jul 2021 17:02:00 +0000
Nicholas Clark <nick@ccl4.org> wrote:

> Our problem here is that sort needs two implicit arguments. *Is* it
> the only operator that wants >1 implicit argument?

`reduce`

as in eg.

my $item = reduce { $a->{$b} } $href, @innerkeys;

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
Op 04-07-2021 om 19:02 schreef Nicholas Clark:

[snip]

> I *think* that your new suggestion is only "not currently valid syntax"
> because you omit the comma. If I write:
>
> $ ./perl -Ilib -lwe 'use feature "signatures"; print for sort sub ($q, $r) { 1/$q <=> 1/$r }, (.5, .25, 3, 42, 1, -1, -.1)'
>
> The signatures feature is experimental at -e line 1.
> -0.1
> -1
> 0.25
> 0.5
> 1
> 3
> 42
> CODE(0x55c991505c68)
>
>
> at which point, it looks like this plan only works because a comma isn't
> there.
>
> Put a comma in, and the code means something else.
>
> That feels like a risky design.


That certainly seems like a very good argument against it. Not that the
others are bad arguments, but this is certainly a (the?) nail in the coffin.

M4
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
* Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> [2021-07-05 20:29:49 +0100]:

> On Sun, 4 Jul 2021 17:02:00 +0000
> Nicholas Clark <nick@ccl4.org> wrote:
>
> > Our problem here is that sort needs two implicit arguments. *Is* it
> > the only operator that wants >1 implicit argument?
>
> `reduce`
>
> as in eg.
>
> my $item = reduce { $a->{$b} } $href, @innerkeys;

What is the meaning of this? I get the words you wrote, but I don't
understand what you're implying by the hypothetical code example.
That looks a lot more like a 2-arg C<map>, not a "reduce".

Do you mean something more like,

my $max = reduce { $a >= $b } @values; # not claiming semantic correctness

my $max = reduce { $a > $b } @values; # perhaps a 'stable' version

As far as I know a "reduce" results in a single value from many -
or fewer than the original number...not to lead this off the rails,
I just don't understand the point of 'reduce' above.

Brett

>
> --
> Paul "LeoNerd" Evans
>
> leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
> http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Mon, Jul 5, 2021 at 4:59 PM Oodler 577 via perl5-porters <
perl5-porters@perl.org> wrote:

> * Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> [2021-07-05 20:29:49
> +0100]:
>
> > On Sun, 4 Jul 2021 17:02:00 +0000
> > Nicholas Clark <nick@ccl4.org> wrote:
> >
> > > Our problem here is that sort needs two implicit arguments. *Is* it
> > > the only operator that wants >1 implicit argument?
> >
> > `reduce`
> >
> > as in eg.
> >
> > my $item = reduce { $a->{$b} } $href, @innerkeys;
>
> What is the meaning of this? I get the words you wrote, but I don't
> understand what you're implying by the hypothetical code example.
> That looks a lot more like a 2-arg C<map>, not a "reduce".
>
> Do you mean something more like,
>
> my $max = reduce { $a >= $b } @values; # not claiming semantic
> correctness
>
> my $max = reduce { $a > $b } @values; # perhaps a 'stable' version
>
> As far as I know a "reduce" results in a single value from many -
> or fewer than the original number...not to lead this off the rails,
> I just don't understand the point of 'reduce' above.
>

It is an arbitrary mechanism to get a single value from many, yes. In
Paul's example, this single value is a deeply-nested hash key. The function
is available from List::Util, and can be used to implement (less efficient
versions of) many of the other functions in that module.
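
A short sketch of both uses, with made-up data: the nested lookup from
Paul's example, and a maximum, where the block has to return the winning
value and not the result of the comparison.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Walk a nested hash: $a is the running value, $b is the next key.
my $href      = { outer => { middle => { inner => 'found' } } };
my @innerkeys = qw(outer middle inner);
my $item      = reduce { $a->{$b} } $href, @innerkeys;

# A maximum: return the larger of the two values, not a boolean.
my @values = (3, 42, 7);
my $max    = reduce { $a > $b ? $a : $b } @values;

print "$item\n";   # found
print "$max\n";    # 42
```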

-Dan
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Mon, Jul 05, 2021 at 03:32:46PM +0000, Oodler 577 via perl5-porters wrote:
> Of course, you could get the worst of all worlds by supporting *SQL:
>
> sort q{ORDER by $_->{a} DESC} @list;
>
> *incidentally, ORDER (at least in MySQL) is numerical,
>
> https://stackoverflow.com/questions/3968135/mysql-alphabetical-order

No, MySQL sorts based on the type of the column (or value) involved,
which won't work for perl.

Every answer I see at that link is sorting alphabetically (some in
reverse.)

Tony
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
Nicholas Clark writes:

> On Sun, Jul 04, 2021 at 05:40:09PM +0200, Martijn Lievaart wrote:
>
> >     sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
>
> The first thing that I thought was that this makes sort special,
> compared with the other builtins that take blocks (map, grep)
>
> But those take one implicit argument, for which $_ works just fine.
>
> Our problem here is that sort needs two implicit arguments.

Does it?

I mean, yes, obviously at the moment a sort routine requires two
arguments. But how common is it for those not to be duplicates of each
other? That is, in:

sort { f($a) <=> g($b) } @list

f() and g() are overwhelmingly identical operations. And if either of
them are involved, it's irksome to have to duplicate the operation on
both sides:

sort { lc "$a->{somefield}->@*" cmp lc "$b->{somefeild}->@*" } @list

[Ooops — typo in the second hash key!]

If we're looking at re-doing sort, it might be worth considering whether
this repetition can be avoided, rather than just looking at different ways
of specifying it.

Sort needs to know:

1. How to transform an item into a key to use for comparison.
2. Whether to compare with <=> or cmp.
3. Whether the output should be ascending or descending.

So instead of $a and $b, a sort routine could be specified with $_ and
an indication of whether to use <=> or cmp.

One possible design (to show the concept; I'm not suggesting
specifically this):

asort { lc "$_->{somefield}->@*" } @list

reverse nsort { $_->{age} } @list

That is:

1. A single (optional) block takes $_ and returns the sort key.
2. Comparison operator specified with the function name — cos this
seems less messy than trying to force it into an argument somewhere.
3. Ascending by default; reverse is available for occasions where
descending is required.

And it gets rid of $a and $b.
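
To show the concept in today's Perl, here is a hedged sketch of such
helpers written as ordinary subs. asort and nsort are the hypothetical names
from above, _sort_keys is an invented internal helper, and $a/$b still appear
inside the implementation; only the caller is spared from writing them.

```perl
use strict;
use warnings;

# Invented helper: run the key block once per item, with $_ set to the item.
sub _sort_keys {
    my ($key, $items) = @_;
    my @keys;
    push @keys, scalar $key->() for @$items;   # "for" aliases $_ to each item
    return @keys;
}

# Hypothetical nsort: compare the extracted keys numerically.
sub nsort(&@) {
    my ($key, @items) = @_;
    my @keys = _sort_keys($key, \@items);
    return @items[ sort { $keys[$a] <=> $keys[$b] } 0 .. $#items ];
}

# Hypothetical asort: compare the extracted keys as strings.
sub asort(&@) {
    my ($key, @items) = @_;
    my @keys = _sort_keys($key, \@items);
    return @items[ sort { $keys[$a] cmp $keys[$b] } 0 .. $#items ];
}

my @people  = ( { name => 'Bo', age => 30 }, { name => 'al', age => 4 } );
my @by_name = asort { lc $_->{name} } @people;
my @by_age  = reverse nsort { $_->{age} } @people;

print join(' ', map { $_->{name} } @by_name), "\n";   # al Bo
print join(' ', map { $_->{name} } @by_age),  "\n";   # Bo al
```

Each key is computed once per item and not once per comparison, which is
the same trade-off as the well-known Schwartzian transform.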

Smylers
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On 6/7/21 11:48, Smylers wrote:
> Nicholas Clark writes:
>
>> On Sun, Jul 04, 2021 at 05:40:09PM +0200, Martijn Lievaart wrote:
>>
>>>     sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
>>
>> The first thing that I thought was that this makes sort special,
>> compared with the other builtins that take blocks (map, grep)
>>
>> But those take one implicit argument, for which $_ works just fine.
>>
>> Our problem here is that sort needs two implicit arguments.
>
> Does it?
>
> I mean, yes, obviously at the moment a sort routine requires two
> arguments. But how common is it for those not to be duplicates of each
> other? That is, in:
>
> sort { f($a) <=> g($b) } @list
>
> f() and g() are overwhelmingly identical operations. And if either of
> them are involved, it's irksome to have to duplicate the operation on
> both sides:
>
> sort { lc "$a->{somefield}->@*" cmp lc "$b->{somefeild}->@*" } @list
>
> [Ooops — typo in the second hash key!]
>
> If we're looking at re-doing sort, it might be worth considering whether
> this repetition can be avoided, rather than just looking at different ways
> of specifying it.
>
> Sort needs to know:
>
> 1. How to transform an item into a key to use for comparison.
> 2. Whether to compare with <=> or cmp.
> 3. Whether the output should be ascending or descending.
>
> So instead of $a and $b, a sort routine could be specified with $_ and
> an indication of whether to use <=> or cmp.
>
> One possible design (to show the concept; I'm not suggesting
> specifically this):
>
> asort { lc "$_->{somefield}->@*" } @list
>
> reverse nsort { $_->{age} } @list
>
> That is:
>
> 1. A single (optional) block takes $_ and returns the sort key.
> 2. Comparison operator specified with the function name — cos this
> seems less messy than trying to force it into an argument somewhere.
> 3. Ascending by default; reverse is available for occasions where
> descending is required.
>
> And it gets rid of $a and $b.

In general, when sorting, you have two customizable steps:

1) key extraction
2) key comparison

Reducing key comparison to just numeric or string comparison is not
flexible enough (or at least not user-friendly enough). For a start you
want to support having multiple keys, every one with its own comparison
operation and direction.

BTW, regarding direction, sorting and reversing is not equivalent to
sorting in the opposite direction. So if you provide "numsort" you also
need "descnumsort".

Finally, check Sort::Key family of modules. They support almost
everything related to key based sorting but at the expense of providing
a myriad of functions in order to cover all the different key type and
direction combinations.
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On Tue, Jul 6, 2021 at 5:49 AM Smylers <Smylers@stripey.com> wrote:

> Nicholas Clark writes:
>
> > On Sun, Jul 04, 2021 at 05:40:09PM +0200, Martijn Lievaart wrote:
> >
> > > sort sub($a,$b){ $a->{x} <=> $b->{x} } @list
> >
> > The first thing that I thought was that this makes sort special,
> > compared with the other builtins that take blocks (map, grep)
> >
> > But those take one implicit argument, for which $_ works just fine.
> >
> > Our problem here is that sort needs two implicit arguments.
>
> Does it?
>
> I mean, yes, obviously at the moment a sort routine requires two
> arguments. But how common is it for those not to be duplicates of each
> other? That is, in:
>
> sort { f($a) <=> g($b) } @list
>
> f() and g() are overwhelmingly identical operations. And if either of
> them are involved, it's irksome to have to duplicate the operation on
> both sides:
>
> sort { lc "$a->{somefield}->@*" cmp lc "$b->{somefeild}->@*" } @list
>
> [Ooops — typo in the second hash key!]
>
> If we're looking at re-doing sort, it might be worth considering whether
> this repetition can be avoided, rather than just at different ways of
> specifying it.
>
> Sort needs to know:
>
> 1. How to transform an item into a key to use for comparison.
> 2. Whether to compare with <=> or cmp.
> 3. Whether the output should be ascending or descending.
>
> So instead of $a and $b, a sort routine could be specified with $_ and
> an indication of whether to use <=> or cmp.
>
> One possible design (to show the concept; I'm not suggesting
> specifically this):
>
> asort { lc "$_->{somefield}->@*" } @list
>
> reverse nsort { $_->{age} } @list
>
> That is:
>
> 1. A single (optional) block takes $_ and returns the sort key.
> 2. Comparison operator specified with the function name — cos this
> seems less messy than trying to force it into an argument somewhere.
> 3. Ascending by default; reverse is available for occasions where
> descending is required.
>
> And it gets rid of $a and $b.


See sort_by and nsort_by in List::UtilsBy for prior art. The main weakness
of this approach is that it does not allow you to chain comparisons, a
common way to specify how to compare entries which are considered
equivalent by the first comparison. Also that you must have separate
functions for different comparison operators and directions (though reverse
could be optimized for a core version of this operator).

# not possible with *sort_by
my @sorted = sort { foo($a) <=> foo($b) || bar($b) cmp bar($a) } @stuff;

I think *sort_by are still worthwhile ops to core eventually, even if just
by putting them in List::Util, but this really should be kept to a new
thread.

-Dan
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
Salvador Fandiño writes:

> Reducing key comparison to just numeric or string comparison is not
> flexible enough (or at least not user-friendly enough).

True, though other objects can override <=> or cmp, which provides the
opportunity for any custom sorting you want.
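
As a small illustration of that point, here is a sketch in which a class overloads <=>, so that a plain numeric comparison orders its objects by custom rules. The class name Local::Version and its dotted-number ordering are invented for the example.

```perl
use strict;
use warnings;

package Local::Version {
    use overload
        '<=>' => \&compare,
        '""'  => sub { join '.', @{ $_[0]{parts} } };

    sub new {
        my ($class, $string) = @_;
        return bless { parts => [ split /\./, $string ] }, $class;
    }

    # Compare component by component, numerically; missing parts count as 0.
    sub compare {
        my ($x, $y, $swapped) = @_;
        ($x, $y) = ($y, $x) if $swapped;
        my @x = @{ $x->{parts} };
        my @y = @{ $y->{parts} };
        while (@x || @y) {
            my $cmp = (shift(@x) // 0) <=> (shift(@y) // 0);
            return $cmp if $cmp;
        }
        return 0;
    }
}

my @versions = map { Local::Version->new($_) } qw(1.10 1.9 1.2.1 1.2);
my @sorted   = sort { $a <=> $b } @versions;
print "@sorted\n";
```

A key-based nsort could get the same effect if its block returned such an object as the key.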

> For a start you want to support having multiple keys, every one with
> its own comparison operation and direction.

Good point. But it still seems a shame to have to specify each key
twice, once for $a and once for $b.

> BTW, regarding direction, sorting and reversing is not equivalent to
> sorting in the opposite direction.

Oh, I thought that Perl special-cased C<reverse sort> so that it *is*
equivalent to sorting in the opposite direction. Have you an example
where it isn't?

> So if you provide "numsort" you also need "descnumsort".

I explicitly suggested using reverse to avoid a proliferation of
functions.

> Finally, check Sort::Key family of modules. They support almost
> everything related to key based sorting but at the expense of
> providing a myriad of functions in order to cover all the different
> key type and direction combinations.

Yeah, that's too many functions. But, looking specifically at the main
Sort::Key module: https://metacpan.org/pod/Sort::Key

• The various r*sort functions should be handle-able by reverse.
• In-place sorting is a different matter to that being discussed in this
thread, so the *sort_inplace functions aren't relevant here.
• Numerically sorting specifically as signed or unsigned integers is a
niche requirement that doesn't fit well with core Perl, which mostly
just deals with numbers but nothing more specific, so the *i*sort and
*u*sort functions are probably not relevant to this thread either.

That leaves:

• keysort, which is equivalent to what I called asort above
• nkeysort, equivalent to what I called nsort above
• nsort, which is just nkeysort without a block; given that Perl's
existing sort function manages to have an optional block, a core
nsort presumably could as well, meaning that nkeysort and nsort could
be just one function

So that's two functions. Which isn't one, but nor is it the myriad that
Sort::Key provides, while still covering many of its scenarios. It may
be that that's *enough* common cases that it's still worth doing.

As you point out above, it doesn't cover the Sort::Key::Maker
functionality of sorting by, say, descending age and then alphabetical
names where age is equal. With core sort that currently involves
something like:

sort { $b->age <=> $a->age || $a->name cmp $b->name } @list

Or with that module:

use Sort::Key::Maker
sort_year_groups => sub { $_->age, $_->name }, qw<int string>;
sort_year_groups @list;

Providing that with a built-in sort function and not repeating $a and $b
would require something *equivalent to*:

ksort [.-n => sub { $_->age }, a => sub { $_->name }], @list;

or:

ksort sub :numeric:desc { $_->age }, sub { $_->name }, \@list;

That is, a list of pairs of type/direction indicators and
keys, followed by the list of things to sort.
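
As a rough illustration of the first form, here is a sketch of it as an ordinary function, which sidesteps the syntax question rather than answering it. The name ksort and the spec strings ('n' numeric, 'a' alphabetic, a leading '-' for descending) are assumptions made for this example; nothing like them exists in core.

```perl
use strict;
use warnings;

# Hypothetical ksort: an arrayref of (type => key-extractor) pairs, then the
# items. Each extractor sees the item in $_ and is called once per item.
sub ksort {
    my ($spec, @items) = @_;
    my @spec = @$spec;
    my @rules;
    while (my ($type, $extract) = splice @spec, 0, 2) {
        my ($desc, $kind) = $type =~ /\A(-?)([na])\z/
            or die "ksort: unknown key type '$type'";
        push @rules, {
            extract => $extract,
            numeric => $kind eq 'n',
            desc    => $desc eq '-',
        };
    }

    # Decorate: [ item, key1, key2, ... ]
    my @decorated;
    for my $item (@items) {
        my @keys;
        for my $rule (@rules) {
            local $_ = $item;
            push @keys, scalar $rule->{extract}->();
        }
        push @decorated, [ $item, @keys ];
    }

    # Compare key by key until one of them differs.
    my @sorted = sort {
        my $result = 0;
        for my $i (0 .. $#rules) {
            my ($x, $y) = ($a->[$i + 1], $b->[$i + 1]);
            ($x, $y) = ($y, $x) if $rules[$i]{desc};
            $result = $rules[$i]{numeric} ? $x <=> $y : $x cmp $y;
            last if $result;
        }
        $result;
    } @decorated;

    return map { $_->[0] } @sorted;
}

my @list = (
    { name => 'Dee', age => 30 },
    { name => 'Abe', age => 42 },
    { name => 'Cal', age => 30 },
    { name => 'Bea', age => 42 },
);

# Descending age, then ascending name where age is equal.
my @sorted = ksort([ '-n' => sub { $_->{age} }, 'a' => sub { $_->{name} } ], @list);
print join(' ', map { "$_->{name}=$_->{age}" } @sorted), "\n";
```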

But there isn't really a syntax for that which fits in with any other
core function:

• It needs two separate lists: the spec list, then the list of items. I
don't think there's any other core function that handles multiple
lists.

• It needs (potentially) multiple blocks. I've put explicit sub-s in the
example above. Core functions taking a block don't require the sub
prefix, but only take one block, in a known place (and directly as an
arg, not nested inside an array).

• It needs a mini-language or options syntax for specifying the
comparison type and direction. There are core functions with something
similar, but none that directly seem to apply here:

» pack/unpack and (s)printf take a (non-optional) initial spec string
that's specific to the function; but a single string does it,
without need for blocks.

» open takes an optional mode string. That can itself have optional
layers applied to it. open's API should probably not be copied by
anything, ever.

» seek's final argument is the WHENCE indicator, which to be specified
symbolically needs importing from a separate module. Having to
import constants from a module isn't a big advantage over using a
function or function factory from a module.

» //, s///, and tr/// have suffix option letters, which only work
because of the specific quote-like syntax of those operators. split
also uses some of these, but again only because the // syntax
enables it. There isn't a place to put suffix options on most
functions.

Perl doesn't have a general ‘here are some options for this function’
syntax which can be applied here.

So any new sort routine which avoids the need for $a and $b by only
specifying one of them (with $_) and still has full flexibility for
multiple keys would still need some new syntax which isn't (currently)
used by any other function.

It would be nice to have that. And, if it were sufficiently flexible, it
would avoid the need for separate asort and nsort functions; they'd just
be minimal cases of flexible-key-sort.

But it does make Nicholas's comment from elsewhere in this thread
relevant:

> ... to me it feels like we're now creating a *different* set of special
> case syntax for sort, to eliminate the current special-case syntax ($a, $b).
> What else might re-use this syntax?

Smylers
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On 6/7/21 22:32, Smylers wrote:
> Salvador Fandiño writes:
>
>> Reducing key comparison to just numeric or string comparison is not
>> flexible enough (or at least not user-friendly enough).
>
> True, though other objects can override <=> or cmp, which provides the
> opportunity for any custom sorting you want.
>
>> For a start you want to support having multiple keys, every one with
>> its own comparison operation and direction.
>
> Good point. But it still seems a shame to have to specify each key
> twice, once for $a and once for $b.
>
>> BTW, regarding direction, sorting and reversing is not equivalent to
>> sorting in the opposite direction.
>
> Oh, I thought that Perl special-cased C<reverse sort> so that it *is*
> equivalent to sorting in the opposite direction. Have you an example
> where it isn't?

Yes, for instance:

use Data::Dumper qw(Dumper);
@a=(1.0, 1.1, 2, 1.3, 2.4, 2.1);

print(Dumper [sort { int($b) <=> int($a) } @a]);
print(Dumper [reverse sort { int($a) <=> int($b) } @a]);


Which outputs:

$VAR1 = [
2,
'2.4',
'2.1',
1,
'1.1',
'1.3'
];
$VAR1 = [
'2.1',
'2.4',
2,
'1.3',
'1.1',
1
];


The difference is that elements with the same key are reversed. Or in
other words, "reverse sort {...}" is not a stable sorting algorithm.
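
For what it's worth, a stable descending sort can still be built from an ascending one by reversing the input as well as the output. This is only a sketch, and it assumes the underlying sort is stable, as the core mergesort is in practice:

```perl
use strict;
use warnings;

my @values = (1.0, 1.1, 2, 1.3, 2.4, 2.1);

# Descending comparator: ties keep their original relative order.
my @desc = sort { int($b) <=> int($a) } @values;

# Ascending sort, then reverse: ties end up in reversed order.
my @asc          = sort { int($a) <=> int($b) } @values;
my @asc_reversed = reverse @asc;

# Reversing the input first cancels that out again.
my @asc_of_reversed = sort { int($a) <=> int($b) } reverse @values;
my @same_as_desc    = reverse @asc_of_reversed;

print "@desc\n";
print "@asc_reversed\n";
print "@same_as_desc\n";
```

So a "reverse" modifier could be made to work, but only if the implementation does more than reverse the output.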


>> So if you provide "numsort" you also need "descnumsort".
>
> I explicitly suggested using reverse to avoid a proliferation of
> functions.
>
>> Finally, check Sort::Key family of modules. They support almost
>> everything related to key based sorting but at the expense of
>> providing a myriad of functions in order to cover all the different
>> key type and direction combinations.
>
> Yeah, that's too many functions. But, looking specifically at the main
> Sort::Key module: https://metacpan.org/pod/Sort::Key
>
> • The various r*sort functions should be handle-able by reverse.
> • In-place sorting is a different matter to that being discussed in this
> thread, so the *sort_inplace functions aren't relevant here.
> • Numerically sorting specifically as signed or unsigned integers is a
> niche requirement that doesn't fit well with core Perl, which mostly
> just deals with numbers but nothing more specific, so the *i*sort and
> *u*sort functions are probably not relevant to this thread either.

Yes, the main reason Sort::Key supports different numeric types is to be
as fast as possible.


> That leaves:
>
> • keysort, which is equivalent to what I called asort above
> • nkeysort, equivalent to what I called nsort above
> • nsort, which is just nkeysort without a block; given that Perl's
> existing sort function manages to have an optional block, a core
> nsort presumably could as well, meaning that nkeysort and nsort could
> be just one function
>
> So that's two functions. Which isn't one, but nor is it the myriad that
> Sort::Key provides, while still covering many of its scenarios. It may
> be that that's *enough* common cases that it's still worth doing.
>
> As you point out above, it doesn't cover the Sort::Key::Maker
> functionality of sorting by, say, descending age and then alphabetical
> names where age is equal. With core sort that currently involves
> something like:
>
> sort { $b->age <=> $a->age || $a->name cmp $b->name } @list
>
> Or with that module:
>
> use Sort::Key::Maker
> sort_year_groups => sub { $_->age, $_->name }, qw<int string>;
> sort_year_groups @list;
>
> Providing that with a built-in sort function and not repeating $a and $b
> would require something *equivalent to*:
>
> ksort [.-n => sub { $_->age }, a => sub { $_->name }], @list;
>
> or:
>
> ksort sub :numeric:desc { $_->age }, sub { $_->name }, \@list;
>

> That is, a list of pairs of type/direction indicators and
> keys, followed by the list of things to sort.
>
> But there isn't really a syntax for that which fits in with any other
> core function:
>
> • It needs two separate lists: the spec list, then the list of items. I
> don't think there's any other core function that handles multiple
> lists.
>
> • It needs (potentially) multiple blocks. I've put explicit sub-s in the
> example above. Core functions taking a block don't require the sub
> prefix, but only take one block, in a known place (and directly as an
> arg, not nested inside an array).
>
> • It needs a mini-language or options syntax for specifying the
> comparison type and direction. There are core functions with something
> similar, but none that directly seem to apply here:
>
> » pack/unpack and (s)printf take a (non-optional) initial spec string
> that's specific to the function; but a single string does it,
> without need for blocks.
>
> » open takes an optional mode string. That can itself have optional
> layers applied to it. open's API should probably not be copied by
> anything, ever.
>
> » seek's final argument is the WHENCE indicator, which to be specified
> symbolically needs importing from a separate module. Having to
> import constants from a module isn't a big advantage over using a
> function or function factory from a module.
>
> » //, s///, and tr/// have suffix option letters, which only work
> because of the specific quote-like syntax of those operators. split
> also uses some of these, but again only because the // syntax
> enables it. There isn't a place to put suffix options on most
> functions.
>
> Perl doesn't have a general ‘here are some options for this function’
> syntax which can be applied here.
>
> So any new sort routine which avoids the need for $a and $b by only
> specifying one of them (with $_) and still has full flexibility for
> multiple keys would still need some new syntax which isn't (currently)
> used by any other function.

Well, actually it doesn't!

You can just chain key sorts. For instance:

@sorted = numsort { $_->{foo} }
          descnumsort { $_->{bar} }
          sort { $_->{doz} }
          @data;

Now it is just(!) a matter of having the interpreter recognize such a
construction and optimize it into just one efficient sorting operation.
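
The chaining works because each later (outer) sort only reorders what the earlier ones left tied, provided the sort is stable. Here is a sketch with today's core sort and $a/$b, assuming the mergesort's stability, showing that the chain and a single compound comparator agree:

```perl
use strict;
use warnings;

my @data = (
    { foo => 2, bar => 1, doz => 'x' },
    { foo => 1, bar => 1, doz => 'b' },
    { foo => 1, bar => 3, doz => 'z' },
    { foo => 1, bar => 1, doz => 'a' },
    { foo => 2, bar => 1, doz => 'c' },
);

# Chained: least significant key first, most significant key last.
my @by_doz  = sort { $a->{doz} cmp $b->{doz} } @data;
my @by_bar  = sort { $b->{bar} <=> $a->{bar} } @by_doz;    # descending
my @chained = sort { $a->{foo} <=> $b->{foo} } @by_bar;

# Single pass with a compound comparator.
my @single = sort {
       $a->{foo} <=> $b->{foo}
    || $b->{bar} <=> $a->{bar}
    || $a->{doz} cmp $b->{doz}
} @data;

print join('', map { $_->{doz} } @chained), "\n";
print join('', map { $_->{doz} } @single), "\n";
```

The chained version does three full sorts, which is why recognizing and fusing the construction would matter.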

Note that I am not saying that this is the best solution, just that it
is possible. Well, actually I find the code above quite ugly and would
prefer something like this:

@sorted = sort :numeric { $_->{foo} }
          sort :numeric :desc { $_->{bar} }
          sort { $_->{doz} }
          @data;

A drawback of chaining several sort calls is that you cannot reuse code
from one key extraction block to the next. For instance, something
pretty common is using a regular expression to extract several keys, as in:

multikeysort { (split /,/, $_)[3,4] } @data;
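
A minimal sketch of what such a function could look like in pure Perl follows. The name multikeysort is hypothetical here, and for simplicity every key is compared as a string; a real version would need per-key types and directions.

```perl
use strict;
use warnings;

# Hypothetical multikeysort: the block sees the item in $_ and returns a
# list of keys, all extracted in one go and compared left to right with cmp.
sub multikeysort(&@) {
    my ($extract, @items) = @_;
    my @decorated = map { [ $_, $extract->() ] } @items;
    return map { $_->[0] } sort {
        my $result = 0;
        for my $i (1 .. $#$a) {
            $result = $a->[$i] cmp $b->[$i];
            last if $result;
        }
        $result;
    } @decorated;
}

my @data = (
    'r1,x,y,pear,red',
    'r2,x,y,apple,red',
    'r3,x,y,pear,green',
    'r4,x,y,apple,green',
);

# One split per record yields both keys (fields 3 and 4).
my @sorted = multikeysort { (split /,/, $_)[3, 4] } @data;
print "$_\n" for @sorted;
```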
Re: Elevator pitch, deprecating $a/$b sort globals by using sub{}
On 2021-07-06 1:32 p.m., Smylers wrote:
...
> That is, a list of pairs of type/direction indicators and
> keys, followed by the list of things to sort.
>
> But there isn't really a syntax for that which fits in with any other
> core function:
>
> • It needs two separate lists: the spec list, then the list of items. I
> don't think there's any other core function that handles multiple
> lists.
>
> • It needs (potentially) multiple blocks. I've put explicit sub-s in the
> example above. Core functions taking a block don't require the sub
> prefix, but only take one block, in a known place (and directly as an
> arg, not nested inside an array).
>
> • It needs a mini-language or options syntax for specifying the
> comparison type and direction. There are core functions with something
> similar, but none that directly seem to apply here:
...
>> ... to me it feels like we're now creating a *different* set of special
>> case syntax for sort, to eliminate the current special-case syntax ($a, $b).
>> What else might re-use this syntax?

It seems to me there is a MUCH simpler and more generic solution here. Once one
even starts thinking about a special mini-language we should just skip that and
just use Perl.

Here is a complete pure-Perl implementation of my proposal:

sub newsort {
    my ($ord_func, $key_func, $list) = @_;
    return [
        map  { $_->[1] }
        sort { $ord_func->($a->[0], $b->[0]) }
        map  { [$key_func->($_), $_] }
        @{$list}
    ];
}

You use it like this:

$sorted = newsort( sub { $_[0] cmp $_[1] }, sub { $_[0] }, $unsorted );

That sample call just emulates the behavior of normal built-in sort, and doing
any other kind of sort is an exercise left for the reader.

The key feature here is that all the special logic is in the form of ordinary
sub references, aka regular Perl code, which can have as complicated logic as
the users want, no special sub-language.

This provides the desired algorithmic efficiency such that the logic to produce
a total sort key is just done once per input element and cached.
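
To illustrate the caching, here is a sketch that counts how often the key logic runs. It repeats newsort so that it is self-contained; the counters and the sample word list are made up for the example.

```perl
use strict;
use warnings;

# Same newsort as above, repeated so the example runs on its own.
sub newsort {
    my ($ord_func, $key_func, $list) = @_;
    return [
        map  { $_->[1] }
        sort { $ord_func->($a->[0], $b->[0]) }
        map  { [ $key_func->($_), $_ ] }
        @{$list}
    ];
}

my @words = qw(pear Fig apple Date cherry banana);

# Case-insensitive sort via newsort: the key is derived once per element.
my $key_calls = 0;
my $sorted = newsort(
    sub { $_[0] cmp $_[1] },
    sub { $key_calls++; lc $_[0] },
    \@words,
);

# The usual inline form derives two keys for every comparison.
my $inline_calls = 0;
my @inline = sort {
    $inline_calls += 2;
    lc($a) cmp lc($b);
} @words;

print "@$sorted\n";
print "newsort key derivations: $key_calls, inline: $inline_calls\n";
```

The exact inline count depends on the sort implementation and the input order, but it cannot be lower than two per comparison.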

Hypothetically, if the $key_func always produced keys in some kind of
standardized format, such as an arrayref of elements to compare pairwise (for
example [$_->{name}, $_->{age}]), then the user-provided $ord_func could be
skipped entirely, because a generic system-defined sort function that works with
the standardized format would be used implicitly. But that would assume a
language where every array element's type could be appropriately detected and
the right comparison automatically chosen: for arrayrefs the simple per-element
pairwise comparison, for text cmp, and so on.
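
A sketch of what that generic ordering might look like in today's Perl is below. The names generic_cmp and keyed_sort are invented, and the type detection is only a guess via looks_like_number from Scalar::Util, which is exactly the weak spot: a string such as "10" is indistinguishable from a number.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical built-in ordering for standardized keys: arrayrefs are
# compared element by element, numbers with <=>, anything else with cmp.
sub generic_cmp {
    my ($x, $y) = @_;
    if (ref($x) eq 'ARRAY' && ref($y) eq 'ARRAY') {
        my $shared = @$x < @$y ? @$x : @$y;
        for my $i (0 .. $shared - 1) {
            my $result = generic_cmp($x->[$i], $y->[$i]);
            return $result if $result;
        }
        return @$x <=> @$y;    # on a tie the shorter key sorts first
    }
    return $x <=> $y if looks_like_number($x) && looks_like_number($y);
    return $x cmp $y;
}

# newsort without the $ord_func argument.
sub keyed_sort {
    my ($key_func, $list) = @_;
    return [
        map  { $_->[1] }
        sort { generic_cmp($a->[0], $b->[0]) }
        map  { [ $key_func->($_), $_ ] }
        @{$list}
    ];
}

my @people = (
    { name => 'Bea', age => 9 },
    { name => 'Abe', age => 10 },
    { name => 'Abe', age => 9 },
);

my $sorted = keyed_sort(sub { [ $_[0]{name}, $_[0]{age} ] }, \@people);
print join(' ', map { "$_->{name}=$_->{age}" } @$sorted), "\n";
```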

Further to the prior paragraph, if that would work, then this infrastructure
could also be reused for implementing an always-sorted collection similar to a
SQL database table on possibly multiple columns. The only aspect that is
user-defined in either case is the $key_func that derives from each collection
element a normalized format, and the actual ordering logic is built-in only.
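
As a very rough sketch of that idea, assuming a single string key and invented names throughout, a collection that keeps itself ordered could look like this:

```perl
use strict;
use warnings;

# Hypothetical always-sorted collection: only $key_func is user-supplied;
# the ordering (plain string comparison of the keys here) is built in.
sub make_sorted_collection {
    my ($key_func) = @_;
    my @entries;    # [ key, item ] pairs, kept ordered by key

    my $insert = sub {
        my ($item) = @_;
        my $key = $key_func->($item);
        my ($low, $high) = (0, scalar @entries);
        while ($low < $high) {    # binary search for the insertion point
            my $mid = int(($low + $high) / 2);
            if (($entries[$mid][0] cmp $key) <= 0) { $low = $mid + 1 }
            else                                   { $high = $mid }
        }
        splice @entries, $low, 0, [ $key, $item ];
        return;
    };
    my $items = sub { map { $_->[1] } @entries };

    return ($insert, $items);
}

my ($insert, $items) = make_sorted_collection(sub { lc $_[0]{name} });
$insert->({ name => $_ }) for qw(pear Apple fig banana);

my @names = map { $_->{name} } $items->();
print "@names\n";
```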

-- Darren Duncan