Mailing List Archive

Revisiting trim
As a reminder, after a lot of discussion, there was broad agreement that we should add the capability for trimming a string, and we should keep it simple: trim whitespace from both ends of the string, with no parameterisation. Here "whitespace" means [:space:], so consistent Unicode semantics, whatever the internal encoding of the string.

But there was disagreement on whether it should trim in place, or return the trimmed string. Maybe we should have two functions: trim() to edit in place, and trimmed() to return the trimmed string. Rik, Sawyer, and Neil were all fans of the in-place edit.

After lots of discussion on github and p5p, the steering council also solicited input from people with a lot of experience training people and/or writing books to explain Perl.

The end result of all of that is:

• We think we should add a single function to Perl.
• It should return the trimmed string.
• It should be called "trimmed", not "trim".

This is not a mandate, but it is our recommendation.

What convinced us on the naming issue was an example from Tom Christiansen of a large company that had a similar internal discussion, and found that people were confused over how they expected it to work, until they changed the name from "trim" to "trimmed". This also fits in with one of Larry's original desires, that Perl be a readable language. The function returns a trimmed version of its input:

$x = trimmed $y; # "x is a trimmed version of y"

We think that "trim" would have been the right name for editing in place:

trim $x;
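
For readers comparing the two shapes, here is a minimal pure-Perl sketch of the difference. It is not the proposed core implementation: the subs `trimmed` and `trim_in_place` are hypothetical stand-ins, and the `/u` flag (Perl 5.14 or later) is an assumption made to approximate "consistent Unicode semantics" for `[:space:]`.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed function: returns a trimmed
# copy and leaves the caller's variable untouched.
sub trimmed {
    my ($str) = @_;
    $str =~ s/\A[[:space:]]+//u;   # /u: Unicode rules whatever the encoding
    $str =~ s/[[:space:]]+\z//u;
    return $str;
}

# Hypothetical in-place variant: edits the caller's variable through
# the @_ alias and returns nothing.
sub trim_in_place {
    $_[0] =~ s/\A[[:space:]]+//u;
    $_[0] =~ s/[[:space:]]+\z//u;
    return;
}

my $y = "  hello world \n";
my $x = trimmed($y);       # $x is trimmed; $y still has its whitespace
trim_in_place($y);         # now $y itself has been trimmed
```

The returning form composes inside larger expressions, while the in-place form needs a variable to act on.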

There are only two distributions on CPAN that define a function called "trimmed", so we suspect that this will clash with less code in the wild. There are plenty of distributions that define a trim(), but not all of them perform the function we're talking about here.

It may be tempting to claim that if we called it "trim" rather than "trimmed", then it "would just work" for the majority of people with an existing trim function (whether imported from CPAN or home-brewed). But whatever a new builtin is called, existing code would not use it by default, so everyone would have to make edits to enable it, no matter what name we give it.

There's a related topic of namespaces for functions like this. We're deliberately not addressing that here, as we suspect it will fall out of the broader review of text/string processing capabilities, which we want completed before 5.36 (i.e. before trimmed would be included in a stable release anyway).

We're now interested to hear what p5p thinks of this.

Neil, Rik, Nick
Re: Revisiting trim [ In reply to ]
On 5/26/21 4:20 PM, Neil Bowers wrote:
> As a reminder, after a lot of discussion, there was broad agreement
> that we should add the capability for trimming a string, and we should
> keep it simple: trim whitespace from both ends of the string, with no
> parameterisation. Here "whitespace" means [:space:], so consistent
> Unicode semantics, whatever the internal encoding of the string.
>
> What convinced us on the naming issue was an example from Tom
> Christiansen of a large company that had a similar internal
> discussion, and found that people were confused over how they expected
> it to work, until they changed the name from "trim" to "trimmed". This
> also fits in with one of Larry's original desires, that Perl be a
> readable language. The function returns a trimmed version of its input:
>
> $x = trimmed $y; # "x is a trimmed version of y"
>
> We think that "trim" would have been the right name for editing in place:
>
> trim $x;
>
what about trimmed using context? in a void context it trims in place.
in scalar (or non-void) it returns the trimmed string and leaves the
input unchanged. only one new function and we use context everywhere so
it is familiar.

uri
Re: Revisiting trim [ In reply to ]
On Wednesday, May 26th, 2021 at 21:55, Uri Guttman <uri@stemsystems.com> wrote:

> what about trimmed using context? in a void context it trims in place. in scalar (or non-void) it returns the trimmed string and leaves the input unchanged. only one new function and we use context everywhere so it is familiar.

While I like the idea, we would need to make other methods to have similar behavior:
- lc / uc / ucfirst
- chomp / chop
and probably others.

My 5 cents
Re: Revisiting trim [ In reply to ]
On 5/26/21 2:13 PM, Alberto Simões wrote:
>
> On Wednesday, May 26th, 2021 at 21:55, Uri Guttman
> <uri@stemsystems.com> wrote:
>
>>
>> what about trimmed using context? in a void context it trims in
>> place. in scalar (or non-void) it returns the trimmed string and
>> leaves the input unchanged. only one new function and we use context
>> everywhere so it is familiar.
>>
>>
> While I like the idea, we would need to make other methods to have
> similar behavior:
>    - lc / uc / ucfirst
>    - chomp / chop
> and probably others.
>
> My 5 cents

Using context for trim() has already been discussed several times and
shot down.

- Scott
Re: Revisiting trim [ In reply to ]
On 5/26/21 5:13 PM, Alberto Simões wrote:
>
> On Wednesday, May 26th, 2021 at 21:55, Uri Guttman
> <uri@stemsystems.com> wrote:
>
>>
>> what about trimmed using context? in a void context it trims in
>> place. in scalar (or non-void) it returns the trimmed string and
>> leaves the input unchanged. only one new function and we use context
>> everywhere so it is familiar.
>>
>>
> While I like the idea, we would need to make other methods to have
> similar behavior:
>    - lc / uc / ucfirst
>    - chomp / chop
> and probably others.
>
>

chop/chomp already have return values so they can't be changed. the
lc/uc ones could have void context to work.

another issue is that void context needs an lvalue as an argument so it
can modify in place.
the pass results versions (in non-void context) can take any expression.
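
A rough sketch of the context-sensitive idea, for illustration only: `trim_ctx` is a hypothetical name, and the sketch relies on `wantarray` returning undef in void context. It is written in plain Perl and says nothing about how a builtin would be implemented.

```perl
use strict;
use warnings;

# Hypothetical context-sensitive trim.
sub trim_ctx {
    if (defined wantarray) {
        # Scalar or list context: work on a copy and return it.
        my ($str) = @_;
        $str =~ s/\A\s+|\s+\z//g;
        return $str;
    }
    # Void context: edit the caller's variable through the @_ alias,
    # so the argument must be something modifiable.
    $_[0] =~ s/\A\s+|\s+\z//g;
    return;
}

my $name = "  uri  ";
my $copy = trim_ctx($name);   # non-void: $name is left alone
my $before = $name;           # still "  uri  " at this point
trim_ctx($name);              # void: $name is trimmed in place
```

The same call does two different things depending on where it appears, which is the behaviour being debated here.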

uri
Re: Revisiting trim [ In reply to ]
------- Original Message -------
On Wednesday, May 26, 2021 3:20 PM, Neil Bowers <neilb@neilb.org> wrote:

> As a reminder, after a lot of discussion, there was broad agreement that we should add the capability for trimming a string, and we should keep it simple: trim whitespace from both ends of the string, with no parameterisation. Here "whitespace" means [:space:], so consistent Unicode semantics, whatever the internal encoding of the string.
>
> But there was disagreement on whether it should trim in place, or return the trimmed string. Maybe we should have two functions: trim() to edit in place, and trimmed() to return the trimmed string. Rik, Sawyer, and Neil were all fans of the in-place edit.
>
> After lots of discussion on github and p5p, the steering council also solicited input from people with a lot of experience training people and/or writing books to explain Perl.
>
> The end result of all of that is:
>
> - We think we should add a single function to Perl.
> - It should return the trimmed string.
> - It should be called "trimmed", not "trim".
>
> This is not a mandate, but it is our recommendation.

"trimmed" is fair, though I suspect people are going to want:

* trim also
* someone will ask for "chomped" and/or "tromp"

Please don't implement this in perl core, though. Please. Seriously.

> What convinced us on the naming issue was an example from Tom Christiansen of a large company that had a similar internal discussion, and found that people were confused over how they expected it to work, until they changed the name from "trim" to "trimmed". This also fits in with one of Larry's original desires, that Perl be a readable language. The function returns a trimmed version of its input:
>
> $x = trimmed $y; # "x is a trimmed version of y"
>
> We think that "trim" would have been the right name for editing in place:
>
> trim $x;
>
> There are only two distributions on CPAN that define a function called "trimmed", so we suspect that this will clash with less code in the wild. There are plenty of distributions that define a trim(), but not all of them perform the function we're talking about here.
>
> It may be tempting to claim that if we called it "trim" rather than "trimmed", then it "would just work" for the majority of people with an existing trim function (whether imported from CPAN or home-brewed). But whatever a new builtin is called, existing code would not use it by default, so everyone would have to make edits to enable it, no matter what name we give it.
>
> There's a related topic of namespaces for functions like this. We're deliberately not addressing that here, as we suspect it will fall out of the broader review of text/string processing capabilities, which we want completed before 5.36 (i.e. before trimmed would be included in a stable release anyway).

I think waiting on this discussion/decision is a terrible mistake. All you need is a dual-life module that exports stuff via the familiar "use feature" syntax (or better under Experimental::*). Shoving things in perl core should be the absolute last stop for proven functionality that needs to be fast or for fundamental capabilities of the perl runtime needed to support useful and interesting features that are themselves implemented at much higher levels.

Cheers,
Brett

> We're now interested to hear what p5p thinks of this.
>
> Neil, Rik, Nick
Re: Revisiting trim [ In reply to ]
On Thu, 27 May 2021 at 09:57, mah.kitteh via perl5-porters
<perl5-porters@perl.org> wrote:
> I think waiting on this discussion/decision is a terrible mistake. All you need is a ...

Given the problems the world faces I would not say "terrible" but I
agree it is a mistake. I would suggest a different "all you need is a"
than you I think (although maybe what you mean by Experimental::*
would cover my view), but I think sorting out where things should go
is a far more important decision than "should we add trim(med) to the
core". For me if it's put in the right place where it can't cause
language conflict and is part of building an orderly and viable future
then I have no issue with it being in core. The *where* is the problem
for me.

Anyway, as for the proposal, if a trim like function is going to be
added to the standard keyword set, I think doing it as trimmed() with
the semantics outlined in this thread is at least a touch more
palatable than trim() which will definitely cause trouble all over.

But I really really appeal to those in charge these days to address
this issue of where new functional (not control) keywords go and how
it can be done in a forwards and backwards compatible way (meaning use
feature is out). I feel really strongly that a proper decision on that
subject will make all the rest of the debates on other functions much
less controversial. Eg, so we have trimmed(), when (and where) do
ltrimmed and rtrimmed get added? Do we just endlessly accrete new
keywords into the main part of the language? It just seems to create
so much unnecessary acrimony. Figure out a clean way to resolve the
forwards/backwards compatibility issue (which is pretty easy with well
chosen namespaces) and IMO almost all of the acrimony will go away.
Why should anyone care if a new speciality function gets added to a
fenced off namespace?

cheers,
Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: Revisiting trim [ In reply to ]
On Wed, 26 May 2021 22:20:22 +0200, Neil Bowers <neilb@neilb.org> wrote:

> $x = trimmed $y; # "x is a trimmed version of y"
>
> We're now interested to hear what p5p thinks of this.

Sounds perfect to me. Making the functionality as accessible as possible is the goal, and that is a perfectly good name and limits the scope of changes to the minimum possible.

--
With regards,
Christian Walde
Re: Revisiting trim [ In reply to ]
On 5/26/21 1:20 PM, Neil Bowers wrote:
>
> * We think we should add a single function to Perl.
> * It should return the trimmed string.
> * It should be called "trimmed", not "trim".
>
> This is not a mandate, but it is our recommendation.

As the original author of this I appreciate the clear and concise
response. Thank you PSC for continuing to meet and discuss these issues
and provide guidance.

Personally I'd prefer *trim()* instead of *trimmed()* just for
consistency with other languages:

* PHP = trim() <https://www.php.net/manual/en/function.trim.php>
* Javascript = trim()
<https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim>
* Raku = trim() <https://docs.raku.org/routine/trim>
* Go = trim()
<https://www.geeksforgeeks.org/how-to-trim-a-string-in-golang/>
* Vimscript = trim()
<https://github.com/vim/vim/commit/295ac5ab5e840af6051bed5ec9d9acc3c73445de>
* PowerShell = trim()
<https://devblogs.microsoft.com/scripting/trim-your-strings-with-powershell/>
* VBA = trim() <https://trumpexcel.com/vba-trim/>
* C# = trim()
<https://www.c-sharpcorner.com/uploadfile/mahesh/trim-string-in-C-Sharp/>
* String::Util = trim()
<https://metacpan.org/pod/String::Util#trim($string),-ltrim($string),-rtrim($string)>
* Text::Trim = trim() <https://metacpan.org/pod/Text::Trim>
* Lisp = string_trim() <http://clhs.lisp.se/Body/f_stg_tr.htm>
* Python = strip()
<https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip>
* Ruby = strip()
<https://ruby-doc.org/core-2.7.1/String.html#method-i-strip>

If we go with *trimmed()* we'll definitely be an outlier. Since the PSC
has agreed this is a valuable feature, and should be included in core
(bike shedding is done), the only thing left to debate before we have a
final implementation is the name.

I'd like to begin work in earnest next Monday on this feature. Can we
debate the best name here for a couple days so I can begin work on the
final feature? I have a large rebase on the PR to do, and some other
tweaking.

- Scott
Re: Revisiting trim [ In reply to ]
--
oodler@cpan.org
https://github.com/oodler577
#pdl #p5p #p7-dev #native @ irc.perl.org

------- Original Message -------
On Thursday, May 27, 2021 11:36 AM, Scott Baker <scott@perturb.org> wrote:

> On 5/26/21 1:20 PM, Neil Bowers wrote:
>
>> - We think we should add a single function to Perl.
>> - It should return the trimmed string.
>> - It should be called "trimmed", not "trim".
>>
>> This is not a mandate, but it is our recommendation.
>
> As the original author of this I appreciate the clear and concise response. Thank you PSC for continuing to meet and discuss these issues and provide guidance.
>
> Personally I'd prefer trim() instead of trimmed() just for consistency with other languages:
>
> - PHP = [trim()](https://www.php.net/manual/en/function.trim.php)
>
> - Javascript = [trim()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim)
>
> - Raku = [trim()](https://docs.raku.org/routine/trim)
>
> - Go = [trim()](https://www.geeksforgeeks.org/how-to-trim-a-string-in-golang/)
>
> - Vimscript = [trim()](https://github.com/vim/vim/commit/295ac5ab5e840af6051bed5ec9d9acc3c73445de)
>
> - PowerShell = [trim()](https://devblogs.microsoft.com/scripting/trim-your-strings-with-powershell/)
>
> - VBA = [trim()](https://trumpexcel.com/vba-trim/)
>
> - C# = [trim()](https://www.c-sharpcorner.com/uploadfile/mahesh/trim-string-in-C-Sharp/)
>
> - String::Util = [trim()](https://metacpan.org/pod/String::Util#trim($string),-ltrim($string),-rtrim($string))
>
> - Text::Trim = [trim()](https://metacpan.org/pod/Text::Trim)
>
> - Lisp = [string_trim()](http://clhs.lisp.se/Body/f_stg_tr.htm)
>
> - Python = [strip()](https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip)
>
> - Ruby = [strip()](https://ruby-doc.org/core-2.7.1/String.html#method-i-strip)
>
> If we go with trimmed() we'll definitely be an outlier. Since the PSC has agreed this is a valuable feature, and should be included in core (bike shedding is done), the only thing left to debate before we have a final implementation is the name.

a. being an outlier is an indicator that you're in the lead, generally; PHP, e.g., has a whole set of functions that are interfaces to PCRE... though I forget what the "P" in that means...

b. "be included in core (bike shedding is done)" - WAY too early to state this, and it's therefore ambiguous what the first part of this statement even refers to or means

> I'd like to begin work in earnest next Monday on this feature. Can we debate the best name here for a couple days so I can begin work on the final feature? I have a large rebase on the PR to do, and some other tweaking.

This is quite presumptuous. There has been no conversation on where to place this. It's very concerning to me that there has also been very little discussion about "where" to place this "single" (yeah right) core feature. At this point, and mainly due to the pressure and rush being applied to this, my general concern as I said last night is not necessarily "trim" as the POC is currently implemented; but what comes after "trim" and how it's handled - string related or not. So what's the rush? No rush exists other than the proof of concept work facing potential bit rot. That's not really perl's problem.

Brett

> - Scott
Re: Revisiting trim [ In reply to ]
On Thu, 27 May 2021 16:44:42 +0000
"mah.kitteh via perl5-porters" <perl5-porters@perl.org> wrote:


> This is quite presumptuous. There has been no conversation on where to place this. It's very concerning to me that there has also been very little discussion about "where" to place this "single" (yeah right) core feature. At this point, and mainly due to the pressure and rush being applied to this, my general concern as I said last night is not necessarily "trim" as the POC is currently implemented; but what comes after "trim" and how it's handled - string related or not. So what's the rush? No rush exists other than the proof of concept work facing potential bit rot. That's not really perl's problem.

There was a lot of conversation. There are literally *hundreds* of posts
about trim on p5p and github. The discussion has been going for almost a
year now.

https://github.com/Perl/perl5/issues/17952
https://github.com/Perl/perl5/pull/17999

In chronological order:

https://www.nntp.perl.org/group/perl.perl5.porters/2020/07/msg258058.html
https://www.nntp.perl.org/group/perl.perl5.porters/2020/11/msg258544.html
https://www.nntp.perl.org/group/perl.perl5.porters/2021/02/msg259118.html
https://www.nntp.perl.org/group/perl.perl5.porters/2021/03/msg259427.html
https://www.nntp.perl.org/group/perl.perl5.porters/2021/03/msg259615.html
Re: Revisiting trim [ In reply to ]
------- Original Message -------
On Thursday, May 27, 2021 11:59 AM, Tomasz Konojacki <me@xenu.pl> wrote:

> On Thu, 27 May 2021 16:44:42 +0000
> "mah.kitteh via perl5-porters" perl5-porters@perl.org wrote:
>
> > This is quite presumptuous. There has been no conversation on where to place this. It's very concerning to me that there has also been very little discussion about "where" to place this "single" (yeah right) core feature. At this point, and mainly due to the pressure and rush being applied to this, my general concern as I said last night is not necessarily "trim" as the POC is currently implemented; but what comes after "trim" and how it's handled - string related or not. So what's the rush? No rush exists other than the proof of concept work facing potential bit rot. That's not really perl's problem.
>
> There was a lot of conversation. There are literally hundreds of posts
> about trim on p5p and github. The discussion has been going for almost a
> year now.
>
> https://github.com/Perl/perl5/issues/17952
> https://github.com/Perl/perl5/pull/17999
>
> In chronological order:
>
> https://www.nntp.perl.org/group/perl.perl5.porters/2020/07/msg258058.html
> https://www.nntp.perl.org/group/perl.perl5.porters/2020/11/msg258544.html
> https://www.nntp.perl.org/group/perl.perl5.porters/2021/02/msg259118.html
> https://www.nntp.perl.org/group/perl.perl5.porters/2021/03/msg259427.html
> https://www.nntp.perl.org/group/perl.perl5.porters/2021/03/msg259615.html

I am aware of those, but I appreciate you taking the time to provide the links.

What I can't seem to find is the conversation on why it needs to be implemented at such a low level. If I understood this particular piece with some clarity then I'd be happy to never post about "trim" again.

Cheers,
Brett
Re: Revisiting trim [ In reply to ]
On Thu, 27 May 2021 17:09:34 +0000
"mah.kitteh via perl5-porters" <perl5-porters@perl.org> wrote:

> What I can't seem to find is the conversation on why it needs to be
> implemented at such a low level. If I understood this particular
> piece with some clarity then I'd be happy to never post about "trim"
> again.

It doesn't. It'd be great if core perl had a _much_ lighter-weight
mechanism for doing all of this. If maybe someone could write a ~10-line
C function to attach to e.g. `string::trim` in a way that doesn't
require a _huge_ disturbance to keywords and parsing and opcodes and
everything else, then we could use that same mechanism to apply a huge
number more core functions under namespaces like string::, math::,
scalar:: and so on, and have a lot more useful utility functions
around, without all this heavyweight stuff.

That mechanism doesn't exist.

Yet.

Want to write it? ;)

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Revisiting trim [ In reply to ]
------- Original Message -------
On Thursday, May 27, 2021 12:13 PM, Paul "LeoNerd" Evans <leonerd@leonerd.org.uk> wrote:

> On Thu, 27 May 2021 17:09:34 +0000
> "mah.kitteh via perl5-porters" perl5-porters@perl.org wrote:
>
> > What I can't seem to find is the conversation on why it needs to be
> > implemented at such a low level. If I understood this particular
> > piece with some clarity then I'd be happy to never post about "trim"
> > again.
>
> It doesn't. It'd be great if core perl had a much lighter-weight
> mechanism for doing all of this. If maybe someone could write a ~10-line
> C function to attach to e.g. `string::trim` in a way that doesn't
> require a huge disturbance to keywords and parsing and opcodes and
> everything else, then we could use that same mechanism to apply a huge
> number more core functions under namespaces like string::, math::,
> scalar:: and so on, and have a lot more useful utility functions
> around, without all this heavyweight stuff.

Thank you, I can get behind this.

>
> That mechanism doesn't exist.
>
> Yet.
>
> Want to write it? ;)

No.

Cheers,
Brett

>
> --
>
> Paul "LeoNerd" Evans
>
> leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
> http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Revisiting trim [ In reply to ]
I'm opposed to extending core, unless everything in for instance List::Util
is going to start being in there too. What's wrong with including some kind
of standard String::Util that could have all these things that can easily
be written with a single s/// ?

trim
rtrim
ltrim
concat
extend
...

and everything would operate in-place in void context and return copies
otherwise?
Re: Revisiting trim [ In reply to ]
On Thu, 27 May 2021, 19:10 mah.kitteh via perl5-porters, <
perl5-porters@perl.org> wrote:

> ------- Original Message -------
> On Thursday, May 27, 2021 11:59 AM, Tomasz Konojacki <me@xenu.pl> wrote:
>
> > On Thu, 27 May 2021 16:44:42 +0000
> > "mah.kitteh via perl5-porters" perl5-porters@perl.org wrote:
> >
> > > This is quite presumptuous. There has been no conversation on where to
> > > place this. It's very concerning to me that there has also been very little
> > > discussion about "where" to place this "single" (yeah right) core feature.
> > > At this point, and mainly due to the pressure and rush being applied to
> > > this, my general concern as I said last night is not necessarily "trim" as
> > > the POC is currently implemented; but what comes after "trim" and how it's
> > > handled - string related or not. So what's the rush? No rush exists other
> > > than the proof of concept work facing potential bit rot. That's not really
> > > perl's problem.
> >
> > There was a lot of conversation. There are literally hundreds of posts
> > about trim on p5p and github. The discussion has been going for almost a
> > year now.
> >
> > https://github.com/Perl/perl5/issues/17952
> > https://github.com/Perl/perl5/pull/17999
> >
> > In chronological order:
> >
> > https://www.nntp.perl.org/group/perl.perl5.porters/2020/07/msg258058.html
> > https://www.nntp.perl.org/group/perl.perl5.porters/2020/11/msg258544.html
> > https://www.nntp.perl.org/group/perl.perl5.porters/2021/02/msg259118.html
> > https://www.nntp.perl.org/group/perl.perl5.porters/2021/03/msg259427.html
> > https://www.nntp.perl.org/group/perl.perl5.porters/2021/03/msg259615.html
>
> I am aware of those, but I appreciate you taking the time to provide the
> links.
>
> What I can't seem to find is the conversation on why it needs to be
> implemented at such a low level. If I understood this particular piece with
> some clarity then I'd be happy to never post about "trim" again.
>

If you mean "why does this warrant C level implementation" then there are a
couple of answers, the simplest one being that the particular type of regex
engine we use doesn't deal with this type of pattern well. A more complex
version would be it is not a DFA and does not know how to match utf8
backwards and it is non trivial to teach it to do so. And people tend to
write the worst possible regexen to do it anyway. The end result is that
trimming strings can be a surprisingly expensive task if not done artfully,
and the code to do it is pretty cryptic so having a function really helps
performance and code clarity.

Having said that, making the function return a result and not do inplace
edit is a massive speed penalty and will likely mean that those using
custom xs already to do this (my workplace) won't migrate. At least for us
the point is to do it quickly, not to do it in a more self explanatory way.

Anyway, I just wanted to point out that doing trim properly in perl with
its bifocal strings and taking account of utf8 and unicode whitespace rules
is not quite as trivial as it might sound.
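
For anyone who wants to measure this on their own data, a small harness along these lines may help. It is only a sketch: the three regex idioms are commonly seen ways of writing a trim, the sample input is invented, and no claim is made here about which variant wins on a given perl build. `Benchmark` ships with perl.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample: padding on both ends around 200 words.
my $input = (" " x 20) . ("word " x 200) . (" " x 20);

my %variants = (
    # Capture everything between the padding and substitute it back.
    capture   => sub { (my $s = $input) =~ s/^\s*(.*?)\s*$/$1/s; $s },
    # One substitution with an alternation, applied globally.
    alternate => sub { (my $s = $input) =~ s/^\s+|\s+$//g; $s },
    # Two separate anchored substitutions.
    two_pass  => sub { my $s = $input; $s =~ s/^\s+//; $s =~ s/\s+$//; $s },
);

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, \%variants);
```

Note that `\s` here follows the default matching rules, so this does not address the Unicode and string-encoding points raised above.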

Yves



>
Re: Revisiting trim [ In reply to ]
On Thu, 27 May 2021 21:13:35 +0200
demerphq <demerphq@gmail.com> wrote:

> Having said that, making the function return a result and not do
> inplace edit is a massive speed penalty and will likely mean that
> those using custom xs already to do this (my workplace) won't
> migrate. At least for us the point is to do it quickly, not to do it
> in a more self explanatory way.

The implementation already detects if target SV == source SV, and edits
in-place if that is the case.

$str = trim $str;

will be an inplace edit.

Don't conflate "the user must write `trim $str` as a mutating keyword"
with "the implementation will mutate an existing SV inplace".

--
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/
Re: Revisiting trim [ In reply to ]
On Fri, 28 May 2021 at 01:36, Scott Baker <scott@perturb.org> wrote:

> Personally I'd prefer *trim()* instead of *trimmed()* just for
> consistency with other languages:
>
> - PHP = trim() <https://www.php.net/manual/en/function.trim.php>
> - Javascript = trim()
> <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim>
>
In Javascript it's actually a method on a string so " string ".trim()

https://www.w3schools.com/jsref/jsref_trim_string.asp


>
> - Raku = trim() <https://docs.raku.org/routine/trim>
> - Go = trim()
> <https://www.geeksforgeeks.org/how-to-trim-a-string-in-golang/>
>
There is strings.Trim but it is not equivalent. The equivalent to your
Perl proposal is strings.TrimSpace:

https://golang.org/pkg/strings/#TrimSpace

>
> - Vimscript = trim()
> <https://github.com/vim/vim/commit/295ac5ab5e840af6051bed5ec9d9acc3c73445de>
> - PowerShell = trim()
> <https://devblogs.microsoft.com/scripting/trim-your-strings-with-powershell/>
> - VBA = trim() <https://trumpexcel.com/vba-trim/>
>
It's actually called Trim (with a capital letter) in VBA. There is also
LTrim and RTrim.


>
> - C# = trim()
> <https://www.c-sharpcorner.com/uploadfile/mahesh/trim-string-in-C-Sharp/>
> - String::Util = trim()
> <https://metacpan.org/pod/String::Util#trim($string),-ltrim($string),-rtrim($string)>
> - Text::Trim = trim() <https://metacpan.org/pod/Text::Trim>
> - Lisp = string_trim() <http://clhs.lisp.se/Body/f_stg_tr.htm>
>
>
Lisp doesn't usually use underscores to separate words and it doesn't use
final brackets for arguments to functions either so this cannot be right.


>
> - Python = strip()
> <https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip>
> - Ruby = strip()
> <https://ruby-doc.org/core-2.7.1/String.html#method-i-strip>
>
This does scotch the argument that "trim" is the most familiar form for
other programming languages.


>
>
> If we go with *trimmed()* we'll definitely be an outlier.
>
Perl is already an outlier, who else uses "next" and "last" instead of
"break" and "continue", or uses -> for members rather than .?

> Since the PSC has agreed this is a valuable feature, and should be
> included in core (bike shedding is done), the only thing left to debate
> before we have a final implementation is the name.
>
To avoid over-lengthy discussions, in my opinion the person who does the
implementation should basically have the right to choose the name at the
experimental stage. Then, if the name causes a problem in practice, it can
be changed. But a lot of these discussions on this mailing list have
involved imaginary people with imagined problems.

> I'd like to begin work in earnest next Monday on this feature. Can we
> debate the best name here for a couple days so I can begin work on the
> final feature? I have a large rebase on the PR to do, and some other
> tweaking.
>
The very best possible name for this function is "trim". Or whatever you
want to call it.
Re: Revisiting trim [ In reply to ]
On Thu, 27 May 2021 at 22:17, Paul "LeoNerd" Evans
<leonerd@leonerd.org.uk> wrote:
>
> On Thu, 27 May 2021 21:13:35 +0200
> demerphq <demerphq@gmail.com> wrote:
>
> > Having said that, making the function return a result and not do
> > inplace edit is a massive speed penalty and will likely mean that
> > those using custom xs already to do this (my workplace) won't
> > migrate. At least for us the point is to do it quickly, not to do it
> > in a more self explanatory way.
>
> The implementation already detects if target SV == source SV, and edits
> in-place if that is the case.
>
> $str = trim $str;
>
> will be an inplace edit.
>
> Don't conflate "the user must write `trim $str` as a mutating keyword"
> with "the implementation will mutate an existing SV inplace".

Ah, so that means this implementation is hairier than it would need
to be if the argument was modified in place without this type of
detection. It also explains one of your other comments that didn't make
sense to me.

Thanks,

Yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: Revisiting trim [ In reply to ]
Hi.
As a long-time perl *applications* programmer, I'd like to contribute a couple of things
and ask a question.

1) maybe 50% of the usage of perl I've had over the last 30 years (and probably 95% of the
CPU time used with it over that same time) has consisted of processing text (historically
Terabytes of it, and still Gigabytes of it every day) in more or less complex ways,
something which perl has always been particularly good at. "Good" being understood as "you
can do anything with it" and "fast".

2) if there would be a trim() (or trimmed()) function directly in the base language, it
would be welcome, not only for its functionality itself, but as a way to avoid those
ever-recurring nagging comments from non-perl people about how "unnecessarily
complicated/clumsy" this looks in perl, as compared to all these "more modern"
languages where it is built-in. (So, see this at least in part as a little drop in the
general bucket of avoiding things which could discourage new potential perl aficionados).

3) many many times when processing textual data, it is convenient and/or necessary to
strip *trailing* spaces, /without/ stripping *leading* spaces. Trailing spaces are
generally not significant and mostly use up disk/memory space unnecessarily.
But leading spaces often fulfill some need for alignment or syntax, and should not always
be stripped. Thus, if a single trimmed() function was provided, which always trims both
sides, it would in my view be insufficient, make its usage quite conditional, and even
sometimes make the deciphering of code (written by someone else) more difficult.
(Like : did they *know* that it trims both sides ? or was that a typo ?). And it would
still leave the "trim only trailing spaces" functionality to be expressed differently,
which sounds a bit awkward, even if it quite fits the basic TIMTOWTDI perl philosophy.
In other words, I would strongly favor either 3 functions (trimmed, rtrimmed, ltrimmed) or
trimmed($subject{,"L(eft)"|"B(oth)"|"R(ight)"}), with the default being Both.
(which kind of suggests 1|0|-1 instead as the 2nd optional argument, a bit like substr() and
co. where "-1" tends to mean "start from the end backwards", no ?)
(And maybe ltrimmed and rtrimmed can just be internal "aliases" to trimmed)

4) due to the expectations of vintage perl programmers as regards perl's
text-processing prowess (see above), *if* such function(s) were to be provided, one would
expect it/them to be at least as fast as the best ("unnecessarily complicated/clumsy
looking") regex achieving the same thing.
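
The interface suggested in point 3 could be sketched in pure Perl as follows. This is only an illustration: the names ltrimmed and rtrimmed and the 'L'/'B'/'R' argument are the hypothetical ones floated above, not anything that exists in the PR.

```perl
use strict;
use warnings;

# Hypothetical names and calling convention, for illustration only.
sub ltrimmed { my ($s) = @_; $s =~ s/^\s+//; return $s }
sub rtrimmed { my ($s) = @_; $s =~ s/\s+$//; return $s }

# trimmed($subject [, 'L' | 'B' | 'R']), defaulting to both sides.
sub trimmed {
    my ($s, $side) = @_;
    $side = 'B' unless defined $side;
    $s = ltrimmed($s) if $side eq 'L' or $side eq 'B';
    $s = rtrimmed($s) if $side eq 'R' or $side eq 'B';
    return $s;
}

my $line       = "    indented value   ";
my $right_only = trimmed($line, 'R');   # leading spaces are kept
my $both       = trimmed($line);        # both ends are stripped
```

With this shape, ltrimmed and rtrimmed are thin wrappers, and a reader can see from the call which side the author meant to strip.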

And finally, the question : several times in this discussion I have read that, left to
their own devices currently (meaning with regexp), naive perl programmers do it "in the
worst way possible".
Now which way is that ?
I admit that for 30+ years, I have been doing this without much thinking about it (once I
got over my initial wonder 30 years ago at there not being a trim() function) :

my $line = <>; # e.g.
my $stripped_line = $line; # keep the original as is, work on a copy
$stripped_line =~ s/^\s+//; $stripped_line =~ /\s+$//; # or only one of those, depends

Is /that/ the worst possible way ? or if not *the* worst, was there a better way all along
? (*)

(I should probably add that in 30 years, I have probably not written a single perl program
where some form of the above trimming did not happen).

(*) if yes, knowing this from the beginning would probably have helped avoiding the
current climate crisis

On 28.05.2021 09:26, demerphq wrote:
> On Thu, 27 May 2021 at 22:17, Paul "LeoNerd" Evans
> <leonerd@leonerd.org.uk> wrote:
>>
>> On Thu, 27 May 2021 21:13:35 +0200
>> demerphq <demerphq@gmail.com> wrote:
>>
>>> Having said that, making the function return a result and not do
>>> inplace edit is a massive speed penalty and will likely mean that
>>> those using custom xs already to do this (my workplace) won't
>>> migrate. At least for us the point is to do it quickly, not to do it
>>> in a more self explanatory way.
>>
>> The implementation already detects if target SV == source SV, and edits
>> in-place if that is the case.
>>
>> $str = trim $str;
>>
>> will be an inplace edit.
>>
>> Don't conflate "the user must write `trim $str` as a mutating keyword"
>> with "the implementation will mutate an existing SV inplace".
>
> Ah, so that would mean this implementation is hairier than it would need
> to be if the argument were modified in place without this type of
> detection. It also explains one of your other comments that didn't make
> sense to me.
>
> Thanks,
>
> Yves
>
>
Re: Revisiting trim [ In reply to ]
André Warnier (tomcat/perl) <aw@ice-sa.com> wrote:

> $stripped_line =~ s/^\s+//; $stripped_line =~ /\s+$//; # or only one of those, depends

> Is /that/ the worst possible way ? or if not *the* worst, was there a better way all along ? (*)

That's a very reasonable way of doing it which may very well be the
best way (though you dropped an "s" on the second "s///").

They were probably referring to a tendency of many programmers to
obsess with trimming the left and right with a single s/// operation,
which will result in a hairy, unreadable solution that won't perform
any better than just doing it in two steps.
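
For comparison, a minimal sketch of the two styles side by side. The single-substitution form shown is just one common variant, chosen here for illustration:

```perl
use strict;
use warnings;

# Two steps: one anchored substitution per end of the string.
my $two_step = "  some text \t";
$two_step =~ s/^\s+//;    # strip leading whitespace
$two_step =~ s/\s+$//;    # strip trailing whitespace

# One step: needs an alternation and /g to catch both ends.
my $one_step = "  some text \t";
$one_step =~ s/^\s+|\s+$//g;
```

Both leave the same string behind. Anyone who wants to check the speed claim on their own data can time the two with the core Benchmark module.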

I've no strong feelings on the "trim" discussion, but I think you
argue well that the "rtrim" case is pretty common.

I think tchrist probably has a point about the clarity of "trimmed",
but I suspect if it'd been up to Larry Wall, he'd have gone with the
shortest form. For some reason "trim", "trim('R')" and "trim('L')"
seem perlish to me (though I gather "parameterization" is supposed to
be off the table at this point, so an R/L argument would be
controversial, too).

I see that in Raku, the routines are called "trim", "trim-leading" and
"trim-trailing". (None of these trim in-place, to do that you'd use
this idiom: "$line.=trim;").

My apologies if it seems like we're re-opening old discussions at this
point, but it's a problem in these debates that there's no easy way to
review what's already been talked to death.
Re: Revisiting trim [ In reply to ]
On Fri, May 28, 2021 at 12:26 PM Joseph Brenner <doomvox@gmail.com> wrote:

> André Warnier (tomcat/perl) <aw@ice-sa.com> wrote:
>
> > $stripped_line =~ s/^\s+//; $stripped_line =~ /\s+$//; # or only one of
> those, depends
>
> > Is /that/ the worst possible way ? or if not *the* worst, was there a
> better way all along ? (*)
>
> That's a very reasonable way of doing it which may very well be the
> best way (though you dropped an "s" on the second "s///").
>
> They were probably referring to a tendency of many programmers to
> obsess with trimming the left and right with a single s/// operation,
> which will result in a hairy, unreadable solution that won't perform
> any better than just doing it in two steps.
>
> I've no strong feelings on the "trim" discussion, but I think you
> argue well that the "rtrim" case is pretty common.
>
> I think tchrist probably has a point about the clarity of "trimmed",
> but I suspect if it'd been up to Larry Wall, he'd have gone with the
> shortest form. For some reason "trim", "trim('R')" and "trim('L')"
> seem perlish to me (though I gather "parameterization" is supposed to
> be off the table at this point, so an R/L argument would be
> controversial, too).
>
> I see that in Raku, the routines are called "trim", "trim-leading" and
> "trim-trailing". (None of these trim in-place, to do that you'd use
> this idiom: "$line.=trim;").
>
> My apologies if it seems like we're re-opening old discussions at this
> point, but it's a problem in these debates that there's no easy way to
> review what's already been talked to death.
>

My two cents on the parameterized trims:

1) trim-right and trim-left are certainly reasonable use cases, *however*
they are not as common a need across CPAN and general code.

2) The Perlish way is to add an option rather than similar functions with
slightly different names.

3) Such an option or additional functions can be added later; even possibly
during the two-year experimental window of the feature.

-Dan
Re: Revisiting trim [ In reply to ]
------- Original Message -------
On Friday, May 28, 2021 11:25 AM, Joseph Brenner <doomvox@gmail.com> wrote:

> André Warnier (tomcat/perl) aw@ice-sa.com wrote:
>
> > $stripped_line =~ s/^\s+//; $stripped_line =~ /\s+$//; # or only one of those, depends
>
> > Is /that/ the worst possible way ? or if not the worst, was there a better way all along ? (*)
>
> That's a very reasonable way of doing it which may very well be the
> best way (though you dropped an "s" on the second "s///").
>
> They were probably referring to a tendency of many programmers to
> obsess with trimming the left and right with a single s/// operation,
> which will result in a hairy, unreadable solution that won't perform
> any better than just doing it in two steps.

This is a good and generally applicable point for a lot of things; it goes to the heart of "premature optimization is the root of all evil*"...

* except for 3% of the time when it's a trivial optimization

>
> I've no strong feelings on the "trim" discussion, but I think you
> argue well that the "rtrim" case is pretty common.
>
> I think tchrist probably has a point about the clarity of "trimmed",
> but I suspect if it'd been up to Larry Wall, he'd have gone with the
> shortest form. For some reason "trim", "trim('R')" and "trim('L')"
> seem perlish to me (though I gather "parameterization" is supposed to
> be off the table at this point, so an R/L argument would be
> controversial, too).

The length of a function's name should be inversely related to its frequency of use; this is the "Huffman encoding" aspect of "WWLD" (what would Larry do?).

Related to this discussion, and something that might not have been brought up, just for more context and information:

* 2 chars - uc [used primarily to normalize input from what I've seen]
* 7 chars - ucfirst [pretty sure I have *never* used this on purpose]
* 2 chars - lc [used same way generally as uc]

It's worth noting that they return the affected value and are non-destructive. But since 'trim' has been most often couched in terms of 'chomp', that is what's defining that whole part of the discussion.
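
A short sketch of that difference, using only existing builtins (the variable names are arbitrary):

```perl
use strict;
use warnings;

my $name  = "perl";
my $upper = uc $name;        # returns a new value; $name is unchanged

my $line    = "text\n";
my $removed = chomp $line;   # edits $line in place; returns the count of characters removed
```

So "uc" reads like "trimmed" would, while "chomp" is the model for an in-place "trim".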

>
> I see that in Raku, the routines are called "trim", "trim-leading" and
> "trim-trailing". (None of these trim in-place, to do that you'd use
> this idiom: "$line.=trim;").
>
> My apologies if it seems like we're re-opening old discussions at this
> point, but it's a problem in these debates that there's no easy way to
> review what's already been talked to death.

This horse is not dead. For me the most important aspect, as I've stated, is the precedent this can set (for good or ill) regarding but not limited to:

* a coherent and consistent strategy for DWIM string functions (which has been recognized by the PRC, tyvm)
* the question of *where* to put things (core vs CPAN/dual-life, namespaces, etc)
* and refining how "features" or "experiments" are handled wrt, among other things, backward and future compatibilities (also seems to have been recognized by the PRC; again tyvm)

So this is not about 'trim'; it truly is about what comes after. And since we have this opportunity now to take a step back, it's worth discussing. The issue of trim being efficacious is a part of this discussion; but not the "real" discussion IMO.

Cheers,
Brett
Re: Revisiting trim [ In reply to ]
> On May 27, 2021, at 1:57 AM, demerphq <demerphq@gmail.com> wrote:
> Do we just endlessly accrete new keywords into the main part of the language?

Yes. And we appreciate how much more useful the language gets over time because of it.

--
Aaron Priven, aaron@priven.com, www.priven.com/aaron
Re: Revisiting trim [ In reply to ]
On 28.05.2021 18:31, Dan Book wrote:
> My two cents on the parameterized trims:
>
> 1) trim-right and trim-left are certainly reasonable use cases, *however* they are not as
> common a need across CPAN and general code.

That's one way of looking at it.

I understand that you need a criterion to estimate the usefulness and/or appeal of a
proposed new keyword/function for the language. But maybe counting how often it appears in a
(even large) set of code does not always tell the whole story ?

Another way would be to wonder at how often such code might be *executed*.

As a trivial and circumstantial example if I may :
Earlier this week I exported an SQL Server table of 157 million rows at 25 columns per
row, initially as a 14 GB CSV file. For reasons I shall not get into here, all the columns
came out as fixed-length, values right-appended with spaces. The ultimate goal was to
convert this to JSON, so to avoid a lot of unnecessary volume (JSON is already a lot more
verbose than CSV), I chose to individually right-trim every column in every CSV line
first. The program thus ran "s/\s+$//" 157 M x 25 = 3,925,000,000 times.

However, the "s/\s+$//" expression appears only once in the source of the program.
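
The program itself is not shown here, so the following is only a minimal sketch of what per-column right-trimming might look like. It assumes a plain comma-separated line with no quoted or embedded commas, and the helper name rtrim_fields is made up:

```perl
use strict;
use warnings;

# Assumes simple comma-separated data with no quoted or embedded commas;
# real CSV would want a proper parser such as Text::CSV from CPAN.
sub rtrim_fields {
    my ($csv_line) = @_;
    chomp $csv_line;
    my @fields = split /,/, $csv_line, -1;   # -1 keeps trailing empty fields
    s/\s+$// for @fields;                    # one substitution per column
    return join ',', @fields;
}

my $out = rtrim_fields("42   ,Smith     ,  NY  \n");
```

The substitution is written once but runs once per column per row, and leading spaces (as in the last field) are left alone.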

P.S.
Understand that I am certainly not complaining about the efficiency of perl and
"s/\s+$//". They both did their job perfectly, and pretty fast too (close to the time it
took to just read that file with "wc -l", and much less time than it took to export the
CSV file in the first place).
But if a dedicated rtrimmed() function, in addition to being slightly more elegant, would
happen also to be 25% faster than the regex above, I wouldn't say no to it. I might even
write a perl program to look into all our data-intensive programs and flag all its
potential uses.
