Mailing List Archive

Pre-RFC: s/.../.../gg Really globally substitute
Note: this idea came out of a code golf discussion, but I feel it has
merit in normal, verbose code too, so I'm posting it here. Programming
Perl contains the following excerpt:

---
When a global substitution just isn’t global enough

Occasionally, you can’t just use a /g to get all the changes to occur,
either because the substitutions overlap or have to happen right to
left, or because you need the length of $` to change between matches.
You can usually do what you want by calling s/// repeatedly. However,
you want the loop to stop when the s/// finally fails, so you have to
put it into the conditional, which leaves nothing to do in the main
part of the loop. So we just write a 1, which is a rather boring thing
to do, but bored is the best you can hope for sometimes. Here are some
examples that use a few more of those odd regex beasties that keep
popping up:

# put commas in the right places in an integer
1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/;

# expand tabs to 8-column spacing
1 while s/\t+/" " x (length($&)*8 - length($`)%8)/e;

# remove (nested (even deeply nested (like this))) remarks
1 while s/\([^()]*\)//g;

# remove duplicate words (and triplicate (and quadruplicate...))
1 while s/\b(\w+) \1\b/$1/gi;
---
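
For anyone who wants to run the idiom, here is a minimal sketch of two of
the excerpt's examples wrapped in subs. The names commify and
strip_remarks are hypothetical helpers for illustration only; they are
not from the book.

```perl
use strict;
use warnings;

# Hypothetical helper: insert thousands separators with the "1 while" idiom.
sub commify {
    my ($n) = @_;
    # Each pass inserts one comma, effectively working right to left;
    # the loop ends when the substitution no longer matches.
    1 while $n =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/;
    return $n;
}

# Hypothetical helper: strip nested parenthesised remarks, innermost first.
sub strip_remarks {
    my ($text) = @_;
    # A single /g pass cannot see the outer pair until the inner one is gone,
    # so the whole substitution is repeated until it fails.
    1 while $text =~ s/\([^()]*\)//g;
    return $text;
}

print commify(1234567), "\n";    # prints 1,234,567
print strip_remarks("keep (drop (and this) too) this"), "\n";
```

Note that strip_remarks leaves the surrounding whitespace alone, so doubled
spaces remain where a remark was removed.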

I feel like we could replace this pattern with a dedicated flag that
is clearer and potentially faster, /gg, e.g.

s/(\d)(\d\d\d)(?!\d)/$1,$2/gg;

One thing I like about this proposal is it would allow us to use /r
without needing to add extra variables and copies, e.g.

say $cost =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/ggr;
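
For comparison, a rough sketch of what the same thing takes today. The
/gg flag does not exist in Perl; repeat_subst below is a hypothetical
helper that approximates the proposed /ggr by re-applying an /r
substitution until the string stops changing. That stopping rule is an
assumption and is not identical to "until s/// fails": a substitution
that matches but leaves the string unchanged ends this loop, whereas it
would spin forever under 1 while.

```perl
use strict;
use warnings;

my $cost = '1234567';

# Today: an explicit copy, then the loop, then the print.
my $formatted = $cost;
1 while $formatted =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/;
print "$formatted\n";

# Hypothetical helper approximating the proposed /ggr: apply $step (a sub
# that returns the /r result for its argument) until a fixed point is reached.
sub repeat_subst {
    my ($str, $step) = @_;
    while (1) {
        my $next = $step->($str);
        return $str if $next eq $str;    # nothing changed, so stop
        $str = $next;
    }
}

# $cost itself is never modified because /r works on a copy.
print repeat_subst($cost, sub { $_[0] =~ s/(\d)(\d\d\d)(?!\d)/$1,$2/r }), "\n";
```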

Doubling up a flag to mean "the same, but more" makes sense when you
consider /aa or /xx, or I guess /ee. Unlike /ee, though, I don't think
we should target an explicit number of iterations: it should be just
once, or as many times as possible. Obviously this construct has the
potential to loop infinitely, but so does the postfix while.
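
To make the infinite-loop risk concrete: a replacement that re-creates
its own match, such as s/a/aa/, never fails, so 1 while never ends. One
way to guard against that today is a pass limit. The sketch below is
only an illustration; subst_with_limit and its $max parameter are
hypothetical and not part of the proposal.

```perl
use strict;
use warnings;

# Hypothetical helper: repeat a substitution on a private copy of $str,
# but die once $max successful passes have been made.
sub subst_with_limit {
    my ($str, $max, $step) = @_;
    my $passes = 0;
    for ($str) {                  # alias $_ to our copy for the callback
        while ($step->()) {       # $step runs s/// against $_
            die "substitution did not settle after $max passes\n"
                if ++$passes >= $max;
        }
    }
    return $str;
}

# Settles after two passes.
my $ok = subst_with_limit('1234567', 100, sub { s/(\d)(\d\d\d)(?!\d)/$1,$2/ });
print "ok: $ok\n";

# Every pass creates a new match, so the limit is what stops it.
my $bad = eval { subst_with_limit('a', 10, sub { s/a/aa/ }) };
my $err = $@;
print "runaway caught: $err" unless defined $bad;
```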

Things to consider:
- Should we return the total number of replacements when not using /r?
- Should it apply to m// too or just s/// like /e?
- Should it go breadth or depth first?
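
On the first and third questions, a small sketch of what can be measured
with the existing idiom. s///g returns the number of replacements made
(or false when nothing matched), so a running total is easy to keep.
Treating "repeat a /g pass" versus "repeat a single replacement" as the
breadth/depth distinction is only one possible reading of the question,
and the variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'a (b (c) d) (e) f';

# Repeat a /g pass: each pass removes every innermost remark it can find.
my ($total, $passes) = (0, 0);
my $n;
while ($n = ($text =~ s/\([^()]*\)//g)) {
    $total += $n;     # per-pass count returned by s///g
    $passes++;
}
print "with /g: $total replacements in $passes passes\n";

# Repeat a single replacement: each pass removes only the leftmost match.
my $single = 'a (b (c) d) (e) f';
my $single_passes = 0;
$single_passes++ while $single =~ s/\([^()]*\)//;
print "without /g: $single_passes replacements, one per pass\n";
```

For this input both loops end with the same string; only the number of
passes differs.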

Thoughts?
Re: Pre-RFC: s/.../.../gg Really globally substitute
Hi there,

On Wed, 29 Jun 2022, James Raspass wrote:

> ...
> I feel like we could replace this pattern with a dedicated flag that
> is clearer and potentially faster, /gg, e.g.
>
> s/(\d)(\d\d\d)(?!\d)/$1,$2/gg;
> ...
> Thoughts?

I don't see any pressing need.

It would just be asking for problems.

I prefer it the way it is.

There are more important things to be getting on with.

--

73,
Ged.
Re: Pre-RFC: s/.../.../gg Really globally substitute
This feels pretty corner-casey to me. I've been writing Perl for 15+
years now and I haven't run into this yet.

It's an interesting idea, for sure, but I don't know if it's worth the
time/effort it would take to implement. Right now, I think we
have bigger fish to fry.

- Scott

Re: Pre-RFC: s/.../.../gg Really globally substitute
On Wed, 29 Jun 2022 14:25:37 +0100, James Raspass <jraspass@gmail.com> wrote:

> Note: this idea came out of a code golf discussion, but I feel it has
> merit in normal, verbose code too, so I'm posting it here. Programming
> Perl contains the following excerpt:

I like it. I don't know if this is the best way to implement it,
but I've wanted this occasionally, as it reads much cleaner than
the `1 while` idiom.

I agree, however, with the people who see no value in it, in that
it is not on my most-wanted feature list. I also foresee problems when
explaining s///egg (the combination with /e).



--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.35 porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
Re: Pre-RFC: s/.../.../gg Really globally substitute
On Wed, Jun 29, 2022 at 9:50 AM G.W. Haywood via perl5-porters <
perl5-porters@perl.org> wrote:

> Hi there,
>
> On Wed, 29 Jun 2022, James Raspass wrote:
>
> > ...
> > I feel like we could replace this pattern with a dedicated flag that
> > is clearer and potentially faster, /gg, e.g.
> >
> > s/(\d)(\d\d\d)(?!\d)/$1,$2/gg;
> > ...
> > Thoughts?
>
> I don't see any pressing need.
>
> It would just be asking for problems.
>
> I prefer it the way it is.
>
> There are more important things to be getting on with.
>

I have had a couple of instances where this would be useful. But I have to
agree, it is not worth a feature when the use case is rare and one can
simply loop. Additionally, I would be concerned that such a feature would
commonly lead to infinite looping.

-Dan