Mailing List Archive: trim - wrapping up the discussion

Re: trim - wrapping up the discussion

Apr 1, 2021, 5:03 AM

Post #26 of 37 (749 views)

On Wed, 31 Mar 2021 21:12:24 +0100 neilb@neilb.org wrote:

> What do you think we should do?
>
> 1. Two separate but similarly named functions. Names TBD.
> 2. One trim() function that returns the trimmed string
> 3. One trim() function that edits in place
> 4. Leave it to CPAN
> 5. Given we've missed the 5.34 boat, perform a wider review of text
> processing gaps in Perl, possibly resulting in a broader proposal,
> which might change how we think about trim.

My vote: 2.

1. - also good, maybe better, but keep the "trim" name for one of the functions.
3. - also good. By the way, how is "chomp" implemented? I think "trim" should
be implemented and used in a similar way.
5. - definitely a good idea for the future, but let's not miss this chance to
have "trim". The list of all "PHP" string functions distributed
earlier was really inspirational. Perl needs to keep its string-processing
strengths and even move towards more NLP.
4. - No. This leaves the status quo, and it is my least preferred option.

Regards,
Vlado

Re: trim - wrapping up the discussion

doughera at lafayette

Apr 1, 2021, 6:27 AM

Post #27 of 37 (749 views)

On Wed, Mar 31, 2021 at 09:12:24PM +0100, neilb@neilb.org wrote:

> What do you think we should do?
>
> 1. Two separate but similarly named functions. Names TBD.
> 2. One trim() function that returns the trimmed string
> 3. One trim() function that edits in place
> 4. Leave it to CPAN
> 5. Given we've missed the 5.34 boat, perform a wider review of text processing gaps in Perl, possibly resulting in a broader proposal, which might change how we think about trim.
>

I vote #3. I imagine I would typically use it in the while(<>) loop idiom.

--
Andy Dougherty doughera@lafayette.edu

Re: trim - wrapping up the discussion

andrew at afresh1

Apr 1, 2021, 10:09 AM

Post #28 of 37 (749 views)

On Thu, Apr 01, 2021 at 04:36:14AM -0500, B. Estrade wrote:
>
>
> On 3/31/21 10:26 PM, Ben Bullock wrote:
> > On Thu, 1 Apr 2021 at 09:16, shmem <gm@qwurx.de> wrote:
> >
> > > 4. Leave it to CPAN
> >
> > This. Or not even that. Have a perlfaq entry about trim.
> >
> >
> > https://perldoc.perl.org/perlfaq4#How-do-I-strip-blank-space-from-the-beginning/end-of-a-string?
> >
> > has been there for 24 years:
> >
> > https://github.com/Perl/perl5/blame/blead/cpan/perlfaq/lib/perlfaq4.pod
> >
> >
> >>On 3/31/21 6:58 PM, shmem wrote:
> >>
> <snip>
> >>> 4. Leave it to CPAN
> >> This. Or not even that. Have a perlfaq entry about trim.
> >>
> >> $ perldoc -q trim
> >> No documentation for perl FAQ keyword 'trim' found
>
> You missed the crucial element of that comment ^^.
>
> Brett

This problem is easy to solve (if someone better at words wants to
submit their own PR, I will withdraw mine).
(I didn't actually read the FAQ entry to know if it needed improvement
beyond the title)

https://github.com/Perl/perl5/pull/18675

l8rZ,
--
andrew - http://afresh1.com

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs, and the Universe
trying to produce bigger and better idiots. So far, the Universe is
winning." -- Rich Cook

Re: trim - wrapping up the discussion

nick at ccl4

Apr 3, 2021, 8:07 AM

Post #29 of 37 (749 views)

On Wed, Mar 31, 2021 at 09:12:24PM +0100, neilb@neilb.org wrote:

I realise, in starting to write this message, that as I agree with this:

> 5. Given we've missed the 5.34 boat, perform a wider review of text processing gaps in Perl, possibly resulting in a broader proposal, which might change how we think about trim.

technically that might render all the other answers below moot. However, as
a chunk of decision making from me and others is "how do I think I would use
this?" the counter is that having something in blead and trying it out might
answer more than a *lot* of abstract thinking e-mail.

In the general case I'd also be concerned that there isn't an obvious way to
backport this such that CPAN code can depend on "core-or-polyfill". But as
this one is a two-line implementation (for the efficient version) I don't
think that anyone will put in extra work to optionally use a core trim in
code published to CPAN.

I see trim as something to improve future programs, and address a small wart
in the language. If you need portability you'll still be "just inlining it"
for a good few years to come. But if your company codebase runs on something
modern, you're golden.

> What do you think we should do?

> 4. Leave it to CPAN

This is the "no change" option. I don't like this.

trim is a FAQ. Roughly, I see every "Frequently Asked Question" as
indication that there's something wrong with the language - ie the existence
of a FAQ implies a deficiency, that it might be good to fix.

This particular FAQ *starts*

    =head2 How do I strip blank space from the beginning/end of a string?

    (contributed by brian d foy)

    A substitution can do this for you. For a single line, you want to
    replace all the leading or trailing whitespace with nothing. You
    can do that with a pair of substitutions:

        s/^\s+//;
        s/\s+$//;

    You can also write that as a single substitution, although it turns
    out the combined statement is slower than the separate ones. That
    might not matter to you, though:

        s/^\s+|\s+$//g;

(and continues with explaining the details here. To be clear - I think that
brian's answer is appropriate and on-target for the current situation)
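
For illustration, the two FAQ idioms could be wrapped up as subs roughly like this. It is only a sketch: the names trim_two_pass and trim_one_pass are made up here and are not part of the FAQ or of any proposal, and both work on a copy rather than on $_.

```perl
use strict;
use warnings;

# Hypothetical helper: the FAQ's pair of anchored substitutions,
# applied to a copy so the caller's string is left alone.
sub trim_two_pass {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}

# Hypothetical helper: the FAQ's single alternation with /g.
sub trim_one_pass {
    my ($str) = @_;
    $str =~ s/^\s+|\s+$//g;
    return $str;
}

# Both forms should agree on every input; only their speed differs.
for my $input ("  hello world \n", "\tfoo", "bar", "   ") {
    printf "[%s] [%s]\n", trim_two_pass($input), trim_one_pass($input);
}
```

Either sub gives the same result; which one is faster on a given perl is something to measure rather than assume.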

The bit that always grates when I read this is that there is no single
*right* answer - there's the *faster* answer, and the 1-liner answer
(may or may not be viewed as prettier - again, no single right answer)

I'd like to be able to say

Use C<trim> and be done.

But if you need to support legacy versions...

(I'm aware that there's a joke that "for every problem there's an answer
which is simple, obvious and wrong" - I don't think that trap applies here)

> 1. Two separate but similarly named functions. Names TBD.

Neil comments in an e-mail:

The built-in will (a) do what you meant, (b) be faster, and (c) be easier
to remember.

I don't like the idea of *two* versions. That's just even more things to
remember, without much functionality win - it's not going to make future
Perl easier to read for future newbies. Whilst technically we could have
infinite built in functions (all easy to search *if you already know the
name* and optimised for their task), I'd prefer that we add a few, well
chosen new builtins that combine to give maximum improvements, rather than
the same number of builtins that are mostly just variants of each other.

Someone else (Chris Prather?) took the quote

Perfection is achieved, not when there is nothing more to add, but when
there is nothing left to take away.

and observed that Antoine de Saint-Exupéry was not famous for publishing
empty books. ie that this wasn't an argument that the only perfection was
zero.

To my mind, having two versions of "almost the same thing" can be improved
by choosing just one instead.

> 2. One trim() function that returns the trimmed string
> 3. One trim() function that edits in place

The choice between these two seems to come down to personal preference
between how one sees it being used. People are envisaging how they would
use it, and people are looking at what happens on CPAN.

I'm not convinced that the CPAN "results" showing more use "in place" are
actually useful - there wasn't enough context. (And getting context from
grep and analysing it is going to be extremely hard).

If one looks up the FAQ and then simply inlines it, that code is *in place*.
The FAQ doesn't mention s///r (and I think that it is correct not to), as
that would limit the code to newer Perl versions (5.14.0 or later, I think)
And I think that anyone "rolling their own" *for a module on CPAN* likely
will also be coding for the lowest version pre-req they can reasonably
target, to get the widest reach.

You can (mostly) implement one in terms of the other. The one thing that you
*can't* easily do is a trim analogy to `chomp @list` but I'm unaware of
ever *seeing* code that had a list in an array and wanted to (just) do
this to it. Most/all code I see with chomp is doing it in a loop, on the
loop iterator, and chomp is the first step in a processing pipeline
*expressed as statements*.

ie I don't think that `trim @list` is ever going to be useful. Even less
useful than `study` or `reset`*

However most code I've encountered actually doing trim-like actions has
been doing it on return results from functions. Which are RVALUEs, and will
be assigned.

The "returns the trimmed string" version is also far cleaner in actual
functional code such as map.
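
As a rough pure-Perl sketch of "implement one in terms of the other" (the sub names are hypothetical, and this says nothing about how the PR does it in C):

```perl
use strict;
use warnings;

# Hypothetical "returning" flavour: works on a copy, hands back the result.
sub trim_copy {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}

# Hypothetical "in place" flavour built on top of it. Elements of @_
# are aliases to the caller's variables, so assigning to $_[0]
# modifies the caller's scalar.
sub trim_in_place {
    $_[0] = trim_copy($_[0]);
    return;
}

# The other direction: the returning flavour built from the in-place one.
sub trim_copy_via_in_place {
    my ($str) = @_;
    trim_in_place($str);
    return $str;
}

my $line = "  key = value \n";
trim_in_place($line);                    # $line is now "key = value"

# The returning flavour drops straight into map on function results.
my @fields = map { trim_copy($_) } split /,/, " a , b ,c ";
print join("|", @fields), "\n";          # prints a|b|c
```

The in-place wrapper costs an extra copy and assignment, which is the overhead the optimisation discussion is about.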

You can implement either with the other. Tony C also notes that:

The existing PR does optimize on:

$y = trim $x;

where $y and $x, however complex they are as expressions, turn out to
be the same SV, but the code doesn't try to eliminate any other
duplication. eg. if $x and $y are lexicals, two padsv ops are produced.

So internally it can be (nearly) as performant as the "in place" version for
the "in place" case. The reverse does not hold in the general case: there's
no way that perl's peephole optimiser is good enough to spot all the cases
where there's an assignment first, and then generate a different optree that
eliminates that initial assignment.

Hence I'm preferring *just* the "returns the trimmed string" version.

Nicholas Clark

* They are useful. You can set a breakpoint on the OP in gdb, hack your
testcase to call one of these, and likely the only time your breakpoint is
hit is when the Perl code execution has reached the point you were
interested in. No, they are no longer useful for their intended purpose.

Re: trim - wrapping up the discussion

darren at darrenduncan

Apr 3, 2021, 1:54 PM

Post #30 of 37 (749 views)

On 2021-04-03 8:07 a.m., Nicholas Clark wrote:
> In the general case I'd also be concerned that there isn't an obvious way to
> backport this such that CPAN code can depend on "core-or-polyfill". But as
> this one is a two-line implementation (for the efficient version) I don't
> think that anyone will put in extra work to optionally use a core trim in
> code published to CPAN.

There is a way that is obvious to me, which I had previously shared, which is to
implement trim/etc solely as an addition to a dual-life module such as
Scalar::Util and include that version with the latest Perl.

Unless we decide to implement this as a modifier to regular expression syntax,
it's just a plain old routine and there's no reason it has to be any more core
than a dual-life module.

-- Darren Duncan

Re: trim - wrapping up the discussion

matthew.persico at gmail

Apr 11, 2021, 5:24 PM

Post #31 of 37 (747 views)

One of the things that drives me nuts is the number of times I write

@foo = sort..map..grep...@source

and somewhere in there I do

{ chomp }

and it takes 30 minutes of head scratching to remember

{ chomp; $_}

I don't care if trim() ends up in core or Scalar::Util but please, please,
PLEASE, make the damn thing RETURN the trimmed arg. Honestly, I am about
thisclose to putting in a PR to Scalar::Util for a function called
rchomp() - chomp that ALSO returns the chomped item.

Thank you.
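
To make the pitfall concrete, here is a small sketch. The rchomp() sub is a hypothetical pure-Perl stand-in for the function described above; nothing by that name is assumed to exist in Scalar::Util.

```perl
use strict;
use warnings;

my @source = ("b\n", "a\n", "c\n");

# chomp returns the number of characters removed, so this collects counts.
# map aliases $_ to each element, so @copy1 itself gets chomped as well.
my @copy1  = @source;
my @counts = map { chomp } @copy1;             # (1, 1, 1)

# Ending the block with $_ yields the chomped strings instead.
my @copy2 = @source;
my @lines = sort map { chomp; $_ } @copy2;     # ("a", "b", "c")

# Hypothetical rchomp(): chomps a copy and returns it, so it can sit
# in a map/sort/grep chain without touching the source array.
sub rchomp {
    my ($str) = @_;
    chomp $str;
    return $str;
}

my @sorted = sort map { rchomp($_) } @source;  # @source keeps its newlines

print "@counts\n";
print "@lines\n";
print "@sorted\n";
```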

--
Matthew O. Persico

Re: trim - wrapping up the discussion

brett at cpanel

Apr 11, 2021, 6:41 PM

Post #32 of 37 (747 views)

On 4/11/21 7:24 PM, Matthew Persico wrote:
> One of the things that drives me nuts is the number of times I write
>
> @foo = sort..map..grep...@source
>
> and somewhere in there I do
>
> { chomp }
>
> and it takes 30 minutes of head scratching to remember
>
> { chomp; $_}

I think this is a case of it being astonishingly consistent.

    map {
        chomp;
    } @source;

Is behaving as it would with:

    while (<$fh>) {
        chomp;
        # .. do something with $_
    }

So it seems the argument of "least astonishment" is necessary, but not
sufficient to cover all definitions of "astonishment".

It's a pickle, for sure.

Brett


Re: trim - wrapping up the discussion

leonerd at leonerd

Apr 12, 2021, 6:15 AM

Post #33 of 37 (747 views)

On Sun, 11 Apr 2021 20:24:52 -0400
Matthew Persico <matthew.persico@gmail.com> wrote:

> One of the things that drives me nuts is the number of times I write
>
> @foo = sort..map..grep...@source
>
> and somewhere in there I do
>
> { chomp }
>
> and it takes 30 minutes of head scratching to remember
>
> { chomp; $_}
>
> I don't care if trim() ends up in core or Scalar::Util but please,
> please, PLEASE, make the damn thing RETURN the trimmed arg. Honestly,
> I am about thisclose to putting in a PR to Scalar::Util for a
> function called rchomp() - chomp that ALSO returns the chomped item.

Good idea.

If I remember, I'll pop that in when I'm next doing a release.

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/

Re: trim - wrapping up the discussion

brett at cpanel

Apr 12, 2021, 6:24 AM

Post #34 of 37 (747 views)

On 4/12/21 8:15 AM, Paul "LeoNerd" Evans wrote:
> On Sun, 11 Apr 2021 20:24:52 -0400
> Matthew Persico <matthew.persico@gmail.com> wrote:
>
>> One of the things that drives me nuts is the number of times I write
>>
>> @foo = sort..map..grep...@source
>>
>> and somewhere in there I do
>>
>> { chomp }
>>
>> and it takes 30 minutes of head scratching to remember
>>
>> { chomp; $_}
>>
>> I don't care if trim() ends up in core or Scalar::Util but please,
>> please, PLEASE, make the damn thing RETURN the trimmed arg. Honestly,
>> I am about thisclose to putting in a PR to Scalar::Util for a
>> function called rchomp() - chomp that ALSO returns the chomped item.
>
> Good idea.
>
> If I remember, I'll pop that in when I'm next doing a release.
>

Pop what in where?

Brett

Re: trim - wrapping up the discussion

gm at qwurx

Apr 12, 2021, 7:26 AM

Post #35 of 37 (747 views)

From the keyboard of Matthew Persico [11.04.21,20:24]:

> One of the things that drives me nuts is the number of times I write
>
> @foo = sort..map..grep...@source
>
> and somewhere in there I do
>
> { chomp }
>
> and it takes 30 minutes of head scratching to remember
>
> { chomp; $_}
>
> I don't care if trim() ends up in core or Scalar::Util but please, please,
> PLEASE, make the damn thing RETURN the trimmed arg. Honestly, I am about
> thisclose to putting in a PR to Scalar::Util for a function called
> rchomp() - chomp that ALSO returns the chomped item.

Well, that's not a compelling reason not to implement it context-aware.
If you can't remember how chomp works, it's not perl's fault or an error
of implementation. You are telling us how /your/ brain works, not how
the language element should behave in a coherent way, given this is perl.

OTOH it feels rather silly to me to have to write

    while (<>) {
        $_ = trim $_;
        ...
    }

which feels like being back in the days of some unsuccessful BASIC with its
x=x+1, so I'd rather have trim() operate in place in void context and return
its trimmed arguments otherwise.
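
A context-aware trim of that kind can be approximated in pure Perl with wantarray, at least for a single explicit argument. This is only a sketch under that assumption: ctx_trim is a hypothetical name, and a real builtin would also need to handle a default $_ and lists.

```perl
use strict;
use warnings;

# Hypothetical context-aware trim: edits its argument in place in void
# context, otherwise returns a trimmed copy and leaves the argument alone.
sub ctx_trim {
    if (!defined wantarray) {            # void context
        $_[0] =~ s/^\s+//;               # $_[0] aliases the caller's scalar
        $_[0] =~ s/\s+$//;
        return;
    }
    my ($str) = @_;                      # scalar or list context: use a copy
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}

my $line = "  in place  ";
ctx_trim($line);                          # void context: $line is modified
print "[$line]\n";

my $orig = "  copy  ";
my $new  = ctx_trim($orig);               # scalar context: $orig untouched
print "[$orig] [$new]\n";
```

One wrinkle with this design: the last statement of a sub takes its context from that sub's caller, so a call that looks like it is in void context may not be, and the argument would then silently stay untrimmed.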

What is more, I'd really like to have the perl context system expanded,
making BOOLEAN a first-class context like VOID, SCALAR and LIST, so trim
in BOOLEAN context would work in place and return TRUE iff its arg was
trimmed, e.g.

    while (<>) {
        if (trim) {
            warn "superfluous surrounding whitespace at line $.\n";
        }
        ... # do something with trimmed $_
    }

The BOOLEAN context might be useful elsewhere. wantarray() could return
-1 in BOOLEAN context, or some such.

After all perl is a "linguistic" computer language; there could be more
useful contexts besides VOID SCALAR LIST and BOOLEAN, e.g. WITH¹ or REF.
Apologies if this has been discussed long ago and far away e.g. in the
Apocalypses leading to perl6/raku.

This opens the question: is context awareness a good thing to have and
likely to be expanded, or is it too hard to remember, too confusing
and thus likely to be phased out in the long run?

¹) I think of any builtin operating on $_ by default being in the WITH
context, so a builtin "with EXPR" would automagically set $_ to the
result of EXPR during its scope, even for builtins. EXPR might also be
a BLOCK being run when accessing $_, i.e. inside a map() or grep().

best regards,
0--gg-

--
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re: trim - wrapping up the discussion

gm at qwurx

Apr 12, 2021, 7:38 AM

Post #36 of 37 (747 views)

From the keyboard of B. Estrade [11.04.21,20:41]:

> I think this is a case of it being astonishingly consistent.

No. Not at all.

> map {
> chomp;
> } @source;
>
> Is behaving as it would with:
>
> while (<$fh>) {
> chomp;
> # .. do something with $_

here ---^^^^^^^^^^^^^^^^^^^^

> }

Inside the while() you actually /do/ something with $_, whilst in the
map you do not. And chomp in the map actually returns something, which
is the total number of characters removed.

If you don't do something with $_ in the while loop, the effect of
chomp is NIL. In the map, you also have to do something, at least state
$_. An empty map block is NIL also:

    perl -E 'say map {;} (1,2,3)'

So, the behavior is consistent, anything else would be astonishing.


0--gg-

--
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re: trim - wrapping up the discussion

brett at cpanel

Apr 12, 2021, 7:55 AM

Post #37 of 37 (747 views)

On 4/12/21 9:38 AM, shmem wrote:
> From the keyboard of B. Estrade [11.04.21,20:41]:
>
>> I think this is a case of it being astonishingly consistent.
>
> No. Not at all.
>
>> map {
>> chomp;
>> } @source;
>>
>> Is behaving as it would with:
>>
>> while (<$fh>) {
>> chomp;
>> # .. do something with $_
>
> here ---^^^^^^^^^^^^^^^^^^^^
>
>> }
>
> Inside the while() you actually /do/ something with $_, whilst in the
> map you do not. And chomp in the map actually returns something, which
> is the total number of characters removed.
>
> If you don't do something with $_ in the while loop, the effect of
> chomp is NIL. In the map, you also have to do something, at least state
> $_. An empty map block is NIL also:
>
>     perl -E 'say map {;} (1,2,3)'
>
> So, the behavior is consistent, anything else would be astonishing.

We agree.

Sorry if I was not clear. That's what I was saying.

"astonishingly consistent" = "the behavior is consistent, but I see how
if used in map it could be surprising".

It's also notable that chomp does return something already:

perldoc -f chomp
...
It returns the total number of characters removed from
all its arguments.

Also, I am not sure what was meant in the earlier response regarding:

>> I don't care if trim() ends up in core or Scalar::Util but please,
>> please, PLEASE, make the damn thing RETURN the trimmed arg. Honestly,
>> I am about thisclose to putting in a PR to Scalar::Util for a
>> function called rchomp() - chomp that ALSO returns the chomped item.
>
> Good idea.
>
> If I remember, I'll pop that in when I'm next doing a release.

A clarification on this would be appreciated.

Brett
