RFC: White space within braced regex constructs
The question is should we allow space within the braces of things like
\b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?

Space is already allowed within Unicode property definitions

\p{ foo = bar }

is perfectly legal without /x. This was because the Unicode standard
required it. The space is only valid adjacent to the braces and the
equals sign.

I believe this is the only case where it is legal, however. You can't
say \x{ df } or \b{ wb }, for example, even under /x.

It has long been planned to bring Perl to parity with other languages so
as to be able to omit the lower bound in a curly quantifier: a{,3} would
have a lower bound of 0. We are now in a position to do that. We could
choose to allow white space within this construct 1) never; 2) always;
3) with /x.

I don't really know what is the right decision.
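
For illustration only, the three options can be modelled outside Perl. The
following is a minimal Python sketch, not Perl's regex compiler:
parse_quantifier and its allow_space parameter are hypothetical names, and
treating {} and {,} as non-quantifiers is an assumption of the sketch. An
omitted lower bound defaults to 0, as proposed for a{,3}.

```python
import re
from typing import Optional, Tuple

# Hypothetical model of the proposal; this is not Perl's regex compiler.
# A bound is a run of digits. With a comma present, the lower bound may be
# empty (meaning 0) and the upper bound may be empty (meaning unbounded).
_STRICT = re.compile(r"\{(\d*)(?:(,)(\d*))?\}")
_SPACED = re.compile(r"\{[ \t]*(\d*)[ \t]*(?:(,)[ \t]*(\d*)[ \t]*)?\}")


def parse_quantifier(text: str, allow_space: bool) -> Optional[Tuple[int, Optional[int]]]:
    """Return (min, max) for a braced quantifier, or None if it is not one.

    max is None when the upper bound is omitted, as in {2,}.
    """
    m = (_SPACED if allow_space else _STRICT).fullmatch(text)
    if m is None:
        return None
    low, comma, high = m.group(1), m.group(2), m.group(3)
    if comma is None:
        # {n} means exactly n repetitions; an empty {} is not a quantifier.
        if low == "":
            return None
        return int(low), int(low)
    if low == "" and high == "":
        # Assumption: {,} carries no bounds, so it is not a quantifier here.
        return None
    return (int(low) if low else 0), (int(high) if high else None)
```

Under option 1 a caller would always pass allow_space=False, under option 2
always True, and under option 3 it would pass whether /x is in effect. In
this model, "{2, 3}" is only recognized as a quantifier when allow_space is
True, and "{1,1 0}" is never recognized.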
Re: RFC: White space within braced regex constructs
Karl Williamson writes:

> The question is should we allow space within the braces of things like
> \b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
>
> ... omit the lower bound in a curly quantifier, a{,3} would have a
> lower bound of 0. We are now in a position to do that. We could
> choose to allow white space within this construct 1) never; 2)
> always; 3) with /x.
>
> I don't really know what is the right decision.

What's the disadvantage to always allowing whitespace?

Triggering on /x seems pointless. Outside of braces, whitespace
characters are normally literal, so /x changes their interpretation from
one valid meaning to a different one.

But spaces inside {m,n} are currently an error (“Unescaped left brace in
regex is illegal here in regex”). Nobody is currently using them. If
we're going to start skipping spaces in there, it seems unnecessarily
petty to throw an error unless the user enables /x. If somebody writes
{2, 3}, we unambiguously know what they mean.

Is there a significant efficiency or complexity of implementation
disadvantage to allowing whitespace in there?

The main disadvantage I can think of is backwards compatibility:
somebody adding spaces inside braces will find their code doesn't run on
older versions of perl. Or somebody uses spaces in an example, which
another user can't get to work. But that's also true for any
improvements to Perl; for years, many people avoided C<say>, to remain
compatible with pre-v5.10 perls.

Smylers
Re: RFC: White space within braced regex constructs
Karl Williamson <public@khwilliamson.com> wrote:
:The question is should we allow space within the braces of things like
:\b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
:
:Space is already allowed within Unicode property definitions
:
: \p{ foo = bar }
:
:is perfectly legal without /x. This was because the Unicode standard
:required it. The space is only valid adjacent to the braces and the
:equals sign.
:
:I believe this is the only case where it is legal, however. You can't
:say \x{ df } or \b{ wb }, for example, even under /x.
:
:It has long been planned to bring Perl to parity with other languages so
:as to be able to omit the lower bound in a curly quantifier, a{,3} would
:have a lower bound of 0. We are now in a position to do that. We could
:choose to allow white space within this construct 1) never; 2) always;
:3) with /x.
:
:I don't really know what is the right decision.

I feel we should absolutely allow whitespace next to the punctuation in
\x{df} and {1,10} under /x. I think the value of it absent /x is a lot
lower, enough so that if there are any backcompat concerns we probably
shouldn't change it. So I'd go for (3).

I don't think we should allow whitespace within the numbers in either case
(\x{d f}, {1,1 0}).

It is a shame, though, that the error message you get is about
"unescaped left brace" - if the scenario Smylers suggests arises,
where someone on an older perl tries to use a regexp suggested by
someone used to the newer semantics, the error message will trigger
exactly the wrong attempt to "fix" the problem.

(Still better than silently wrong, as it would be for perl < 5.22).

Hugo
Re: RFC: White space within braced regex constructs
On Fri, 16 Oct 2020 11:26:32 +0100, hv@crypt.org wrote:

> Karl Williamson <public@khwilliamson.com> wrote:
> :The question is should we allow space within the braces of things like
> :\b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
> :
> :Space is already allowed within Unicode property definitions
> :
> : \p{ foo = bar }
> :
> :is perfectly legal without /x. This was because the Unicode standard
> :required it. The space is only valid adjacent to the braces and the
> :equals sign.
> :
> :I believe this is the only case where it is legal, however. You can't
> :say \x{ df } or \b{ wb }, for example, even under /x.
> :
> :It has long been planned to bring Perl to parity with other languages so
> :as to be able to omit the lower bound in a curly quantifier, a{,3} would
> :have a lower bound of 0. We are now in a position to do that. We could
> :choose to allow white space within this construct 1) never; 2) always;
> :3) with /x.
> :
> :I don't really know what is the right decision.
>
> I feel we should absolutely allow whitespace next to the punctuation
> in \x{df} and {1,10} under /x. I think the value of it absent /x is a
> lot lower, enough so that if there are any backcompat concerns we
> probably shouldn't change it. So I'd go for (3).

/me would also prefer {3}

Personally I see a big difference between \x{...} and {x,y}, and see the
implementation of whitespace in the two as separate issues.

> I don't think we should allow whitespace within the numbers in either
> case (\x{d f}, {1,1 0}).

\x{d f} and {1,1 0}: NO
\x{ df } and {1, 10}: YES

both only under /x
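
A minimal sketch of that YES/NO split for \x{...}, in Python for
illustration only. braced_hex_value and x_flag are hypothetical names, and
the model is an assumption, not Perl internals: blanks may pad the hex
digits when x_flag is set, but may never split them.

```python
import re
from typing import Optional

# Hypothetical helper modelling the YES/NO cases above; not Perl source code.
_PLAIN = re.compile(r"[0-9A-Fa-f]+")
# Blanks and tabs may surround the digits, but not appear between them.
_PADDED = re.compile(r"[ \t]*([0-9A-Fa-f]+)[ \t]*")


def braced_hex_value(body: str, x_flag: bool) -> Optional[int]:
    """Return the code point for the text between the braces of \\x{...}.

    Returns None when the body is not accepted under the given setting.
    """
    if x_flag:
        m = _PADDED.fullmatch(body)
        return int(m.group(1), 16) if m else None
    return int(body, 16) if _PLAIN.fullmatch(body) else None
```

In this model " df " is accepted only with x_flag set, and "d f" is rejected
either way.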

> It is a shame, though, that the error message you get is about
> "unescaped left brace" - if the scenario Smylers suggests arises,
> where someone on an older perl tries to use a regexp suggested by
> someone used to the newer semantics, the error message will trigger
> exactly the wrong attempt to "fix" the problem.
>
> (Still better than silently wrong, as it would be for perl < 5.22).
>
> Hugo

--
H.Merijn Brand https://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.33 porting perl5 on HP-UX, AIX, and Linux
https://useplaintext.email https://www.test-smoke.org
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Re: RFC: White space within braced regex constructs
On Thu, Oct 15, 2020, at 11:20 PM, Karl Williamson wrote:
> The question is should we allow space within the braces of things like
> \b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?

I think we should pick one and stick to it. I would pick "always allow the whitespace", because I think it's easy to read and matches up to the behavior of an interpolated @{...} for example.

No backward compatibility concerns have yet come to my mind. (I am not worried about "but code on new perls won't necessarily run on old perls", because that is not a backward compatibility concern.)

If we go the opposite direction and say that /x is needed for that whitespace, I think that will be fine, too. It's just not what I'd do.

--
rjbs
Re: RFC: White space within braced regex constructs
On 2020-10-17 1:53 p.m., Ricardo Signes wrote:
> On Thu, Oct 15, 2020, at 11:20 PM, Karl Williamson wrote:
>> The question is should we allow space within the braces of things like
>> \b{} {m,n} quantifiers, etc, regardless of the /x modifier setting?
>
> I think we should pick one and stick to it.  I would pick "always allow the
> whitespace", because I think it's easy to read and matches up to the behavior of
> an interpolated @{...} for example.

I also vote for whitespace being allowed unconditionally (no matter whether /x
is present or not) inside curly brace constructs. Then /x unambiguously only
affects the meaning of things outside the curly brace constructs. This is a
much cleaner and more predictable or easy to use language design. -- Darren Duncan
Re: RFC: White space within braced regex constructs
On Sat, Oct 17, 2020 at 05:43:56PM -0700, Darren Duncan wrote:
> I also vote for whitespace being allowed unconditionally (no matter whether
> /x is present or not) inside curly brace constructs. Then /x unambiguously
> only affects the meaning of things outside the curly brace constructs. This
> is a much cleaner and more predictable or easy to use language design. --
> Darren Duncan

+1

--
You live and learn (although usually you just live).
Re: RFC: White space within braced regex constructs
On 10/19/20 4:02 AM, Dave Mitchell wrote:
> On Sat, Oct 17, 2020 at 05:43:56PM -0700, Darren Duncan wrote:
>> I also vote for whitespace being allowed unconditionally (no matter whether
>> /x is present or not) inside curly brace constructs. Then /x unambiguously
>> only affects the meaning of things outside the curly brace constructs. This
>> is a much cleaner and more predictable or easy to use language design. --
>> Darren Duncan
>
> +1
>

The effective rule would be that any tokens within braces may be
preceded or followed by white space. Should it be just horizontal white
space? I think so.

A complication is that certain braced constructs can occur in double
quoted strings, such as \x{fb00}. Would they follow the same rules?
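
The rule about tokens and horizontal white space can be sketched as a small
tokenizer. This is a Python model for illustration, not Perl code:
tokens_in_braces is a hypothetical name, and the token grammar (',' and '='
as punctuation, everything else a word) is an assumption. Whether the same
routine would also serve double quoted strings is exactly the open question.

```python
import re
from typing import List, Optional

# Hypothetical model of the rule; the token grammar here is an assumption.
# A token is one punctuation character (',' or '=') or a run of characters
# that are neither white space nor punctuation.
_TOKEN = re.compile(r"[,=]|[^\s,=]+")
# Only horizontal white space (blank, tab) may be skipped between tokens.
_HSPACE = re.compile(r"[ \t]*")


def tokens_in_braces(body: str) -> Optional[List[str]]:
    """Split a brace body into tokens, skipping horizontal white space.

    Returns None if anything else (such as a newline) appears between tokens.
    """
    tokens: List[str] = []
    pos = 0
    while True:
        # _HSPACE can match the empty string, so this match never fails.
        pos = _HSPACE.match(body, pos).end()
        if pos == len(body):
            return tokens
        m = _TOKEN.match(body, pos)
        if m is None:
            return None
        tokens.append(m.group())
        pos = m.end()
```

A construct that expects a single token, such as \x{...}, would then reject
"d f" because it splits into two tokens, while " df " yields just one.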
Re: RFC: White space within braced regex constructs
On 2020-10-20 5:41 p.m., Karl Williamson wrote:
> On 10/19/20 4:02 AM, Dave Mitchell wrote:
>> On Sat, Oct 17, 2020 at 05:43:56PM -0700, Darren Duncan wrote:
>>> I also vote for whitespace being allowed unconditionally (no matter whether
>>> /x is present or not) inside curly brace constructs.  Then /x unambiguously
>>> only affects the meaning of things outside the curly brace constructs.  This
>>> is a much cleaner and more predictable or easy to use language design. --
>>> Darren Duncan
>>
>> +1
>>
>
> The effective rule  would be that any tokens within braces may be preceded or
> followed by white space.  Should it be just horizontal white space? I think so
>
> A complication is that certain braced constructs can occur in double quoted
> strings, such as \x{fb00}.  Would they follow the same rules?

I wouldn't have a problem with that personally. And consistency is good:
what looks alike should behave alike. -- Darren Duncan
Re: RFC: White space within braced regex constructs
On 10/21/20 2:41 AM, Karl Williamson wrote:
> On 10/19/20 4:02 AM, Dave Mitchell wrote:
>> On Sat, Oct 17, 2020 at 05:43:56PM -0700, Darren Duncan wrote:
>>> I also vote for whitespace being allowed unconditionally (no matter whether
>>> /x is present or not) inside curly brace constructs.  Then /x unambiguously
>>> only affects the meaning of things outside the curly brace constructs.  This
>>> is a much cleaner and more predictable or easy to use language design. --
>>> Darren Duncan
>>
>> +1
>>

My first choice would have been (3), requiring "/x" for the spacing, but
allowing spacing within braces without "/x" also makes sense, so I would
+1 that option instead, as more lenient but still reasonable.


>
> The effective rule  would be that any tokens within braces may be
> preceded or followed by white space.  Should it be just horizontal
> white space? I think so


I think this is sane.


>
> A complication is that certain braced constructs can occur in double
> quoted strings, such as \x{fb00}.  Would they follow the same rules?
>

In what way is this a complication?
Re: RFC: White space within braced regex constructs
On 2020-11-01 11:19 a.m., Sawyer X wrote:
> On 10/21/20 2:41 AM, Karl Williamson wrote:
>> A complication is that certain braced constructs can occur in double quoted
>> strings, such as \x{fb00}.  Would they follow the same rules?
>
> In what way is this a complication?

It would be more accurate to call this, not a complication per se, but an
acknowledgement that the effects of the change go beyond regular expressions and
into regular string literals as well. -- Darren Duncan