Mailing List Archive

What the heck does /o do in a regex and can we clean up the documentation
Reading through the perlre documentation
<https://perldoc.perl.org/perlre#Other-Modifiers> I see:

o - pretend to optimize your code, but actually introduce bugs

While this is clever it should probably be more clear in the official
documentation? What exactly does /o do? I'm guessing it was a failed
attempt to optimize certain things? If I can get clarification on what
it does I will update the documentation to reflect that.

- Scottchiefbaker
Re: What the heck does /o do in a regex and can we clean up the documentation
Hi there,

On Fri, 16 Feb 2024, Scott Baker wrote:

> Reading through the perlre documentation
> <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
>
> o - pretend to optimize your code, but actually introduce bugs
>
> While this is clever it should probably be more clear in the official
> documentation? What exactly does /o do? I'm guessing it was a failed attempt
> to optimize certain things? If I can get clarification on what it does I will
> update the documentation to reflect that.

That's kinda scary. Any idea when that was written? Did you find out
anything about these alleged bugs? I'm guessing (hoping) that they're
more along the lines of "your code might not do what you think it does"
than "your interpreter might not do what it's supposed to do" - but it
really isn't very clear about that.

For decades I've been under the impression that

(1) what's on page 193 of the third edition of the Camel Book, and

(2) the extract below from perlfaq6

were all I needed to know about it:

[quote]
What is "/o" really for?
(contributed by brian d foy)

The "/o" option for regular expressions (documented in perlop and
perlreref) tells Perl to compile the regular expression only once. This
is only useful when the pattern contains a variable. ...
...
...
In versions 5.6 and later, Perl won't recompile the regular expression
if the variable hasn't changed, so you probably don't need the "/o"
option. It doesn't hurt, but it doesn't help either. If you want any
version of Perl to compile the regular expression only once even if the
variable changes (thus, only using its initial value), you still need
the "/o".
[/quote]

Now I feel not only a need to look over all my code to find out where I
used the /o modifier, but also dumb that I missed that in perlre.

--

73,
Ged.
Re: What the heck does /o do in a regex and can we clean up the documentation
On Fri, Feb 16, 2024 at 1:07 PM G.W. Haywood <perl5porters@jubileegroup.co.uk> wrote:

> Hi there,
>
> On Fri, 16 Feb 2024, Scott Baker wrote:
>
> > Reading through the perlre documentation
> > <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
> >
> > o - pretend to optimize your code, but actually introduce bugs
> >
> > While this is clever it should probably be more clear in the official
> > documentation? What exactly does /o do? I'm guessing it was a failed attempt
> > to optimize certain things? If I can get clarification on what it does I will
> > update the documentation to reflect that.
>
> That's kinda scary. Any idea when that was written? Did you find out
> anything about these alleged bugs? I'm guessing (hoping) that they're
> more along the lines of "your code might not do what you think it does"
> than "your interpreter might not do what it's supposed to do" - but it
> really isn't very clear about that.
>
> For decades I've been under the impression that
>
> (1) what's on page 193 of the third edition of the Camel Book, and
>
> (2) the extract below from perlfaq6
>
> were all I needed to know about it:
>
> [quote]
> What is "/o" really for?
> (contributed by brian d foy)
>
> The "/o" option for regular expressions (documented in perlop and
> perlreref) tells Perl to compile the regular expression only once. This
> is only useful when the pattern contains a variable. ...
> ...
> ...
> In versions 5.6 and later, Perl won't recompile the regular expression
> if the variable hasn't changed, so you probably don't need the "/o"
> option. It doesn't hurt, but it doesn't help either. If you want any
> version of Perl to compile the regular expression only once even if the
> variable changes (thus, only using its initial value), you still need
> the "/o".
> [/quote]
>
> Now I feel not only a need to look over all my code to find out where I
> used the /o modifier, but also dumb that I missed that in perlre.
>

This is pretty much it. The reason it says "pretend to optimize your code"
is that most people assume it performs an optimization which Perl already
does automatically, and don't realize it actually causes the regex to
become static even if interpolated variables change (e.g. on repeated
iterations of a loop or repeated calls to a subroutine). If you want such
an optimization without the logical incoherence of /o, regex refs exist
now:

    my $compile_this_once = qr/foo$bar/;
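A minimal sketch of the qr// approach above, assuming the pattern only needs to be built once at a point of your choosing; the sample strings are made up for illustration. The object captures the value of $bar when the qr// line runs, so reassigning $bar later has no effect until the object is deliberately rebuilt.

```perl
use strict;
use warnings;

my $bar = 'bar';

# Built once, here: the current value of $bar is baked into the object.
my $compile_this_once = qr/foo$bar/;

my @lines = ('foobar', 'foobaz', 'xfoobar');
my @hits  = grep { $_ =~ $compile_this_once } @lines;

# Reassigning $bar has no effect on the already-built qr// object.
$bar = 'baz';
my @still = grep { $_ =~ $compile_this_once } @lines;

# To pick up the new value, rebuild the object explicitly.
$compile_this_once = qr/foo$bar/;
my @rebuilt = grep { $_ =~ $compile_this_once } @lines;

print "before:  @hits\nafter:   @still\nrebuilt: @rebuilt\n";
```

Unlike m//o, the point where the pattern gets fixed is visible in the code: it is wherever the qr// line sits.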

-Dan
Re: What the heck does /o do in a regex and can we clean up the documentation
The /o modifier used to be the way you would tell perl the regex only
needed to be compiled "once", but it's essentially obsolete because
perl does a better job of detecting when a regex needs to be
recompiled than it used to-- and further you can use qr{} quoting to
control where compilation happens. There's a thorough discussion of
it in "perlop".

This dismissive joke in the docs there can indeed be confusing (and
classifying it as a "substitution-specific modifier" is essentially
wrong).

I think the easiest fix is just to delete that line from perlre--
perlop already discusses /o, calling it "largely obsolete".

On 2/16/24, Scott Baker <scott@perturb.org> wrote:
> Reading through the perlre documentation
> <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
>
> o - pretend to optimize your code, but actually introduce bugs
>
> While this is clever it should probably be more clear in the official
> documentation? What exactly does /o do? I'm guessing it was a failed
> attempt to optimize certain things? If I can get clarification on what
> it does I will update the documentation to reflect that.
>
> - Scottchiefbaker
Re: What the heck does /o do in a regex and can we clean up the documentation
> That's kinda scary. Any idea when that was written? Did you find out
> anything about these alleged bugs?

I think it's scarier than it needs to be, because I'm pretty sure the
"bugs" they're referring to are that if you've got variable interpolation
into a regex that's marked with "/o", changing the variable won't get
reflected in the regex-- in other words, "/o" does exactly what it's
supposed to, but it's easy to get confused about what it does during
later maintenance.

And you pretty much don't ever need it for any reason. We're stuck
talking about it somewhere because you may see it in older code (or
newer code written by older people).
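The trap described above is easy to reproduce. In this sketch, words_matching is a hypothetical helper: because /o compiles its pattern the first time that match runs, the second call keeps looking for "cat" even though it asks for "dog".

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that exactly equal $want.
sub words_matching {
    my ($want, @words) = @_;
    # /o: $want is interpolated and compiled on the first call only;
    # every later call silently reuses that first pattern.
    return grep { /\A$want\z/o } @words;
}

my @first  = words_matching('cat', qw(cat dog));   # pattern becomes \Acat\z
my @second = words_matching('dog', qw(cat dog));   # still \Acat\z, not \Adog\z

print "first: @first\nsecond: @second\n";
```

Deleting the /o makes the second call return "dog", which is almost always what the caller meant.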

On 2/16/24, G.W. Haywood <perl5porters@jubileegroup.co.uk> wrote:
> Hi there,
>
> On Fri, 16 Feb 2024, Scott Baker wrote:
>
>> Reading through the perlre documentation
>> <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
>>
>> o - pretend to optimize your code, but actually introduce bugs
>>
>> While this is clever it should probably be more clear in the official
>> documentation? What exactly does /o do? I'm guessing it was a failed attempt
>> to optimize certain things? If I can get clarification on what it does I will
>> update the documentation to reflect that.
>
> That's kinda scary. Any idea when that was written? Did you find out
> anything about these alleged bugs? I'm guessing (hoping) that they're
> more along the lines of "your code might not do what you think it does"
> than "your interpreter might not do what it's supposed to do" - but it
> really isn't very clear about that.
>
> For decades I've been under the impression that
>
> (1) what's on page 193 of the third edition of the Camel Book, and
>
> (2) the extract below from perlfaq6
>
> were all I needed to know about it:
>
> [quote]
> What is "/o" really for?
> (contributed by brian d foy)
>
> The "/o" option for regular expressions (documented in perlop and
> perlreref) tells Perl to compile the regular expression only once. This
> is only useful when the pattern contains a variable. ...
> ...
> ...
> In versions 5.6 and later, Perl won't recompile the regular expression
> if the variable hasn't changed, so you probably don't need the "/o"
> option. It doesn't hurt, but it doesn't help either. If you want any
> version of Perl to compile the regular expression only once even if the
> variable changes (thus, only using its initial value), you still need
> the "/o".
> [/quote]
>
> Now I feel not only a need to look over all my code to find out where I
> used the /o modifier, but also dumb that I missed that in perlre.
>
> --
>
> 73,
> Ged.
Re: What the heck does /o do in a regex and can we clean up the documentation
On 2/16/24 11:58, Scott Baker wrote:
>
> Reading through the perlre documentation
> <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
>
> o - pretend to optimize your code, but actually introduce bugs
>
> While this is clever it should probably be more clear in the official
> documentation? What exactly does /o do? I'm guessing it was a failed
> attempt to optimize certain things? If I can get clarification on what
> it does I will update the documentation to reflect that.
>
> - Scottchiefbaker

I, for one, enjoy finding this sort of thing in documentation :-) To
me it says "don't use it, it was an earlier attempt at optimization
preserved for backward compatibility." If that's true, then I'd prefer
not to change it. ...maybe add a link where the user can learn more
about it.

Other great quirky documentation that got neutered to be more corporate
and soulless was sfdisk:
Old version:

-f or --force
    Do what I say, even if it is stupid.

-I file
    After destroying your filesystems with an unfortunate sfdisk command,
    you would have been able to restore the old situation if only you had
    preserved it using the -O flag.

New version:

-f, --force
    Disable all consistency checking.

(-I option removed)
This can later be restored by:

    sfdisk /dev/sda < sda.dump
Re: What the heck does /o do in a regex and can we clean up the documentation
On Fri, 16 Feb 2024 at 19:22, Dan Book <grinnz@gmail.com> wrote:

> If you want such an optimization without the logical incoherence of /o,
> regex refs exist now: my $compile_this_once = qr/foo$bar/;
>

While you are correct in a literal sense, I would be wary of advising use
of qr// as a replacement for /o. Better to just recommend that they omit
the /o. In some cases using qr// will be dramatically slower than simply
omitting the /o. Or at least provide some caveats: using qr// will be
faster if you can ensure that the qr// is executed only once, but omitting
the /o will be faster if you can't.
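One way to check this caveat on your own perl is to time the two approaches with the core Benchmark module. This sketch (the word list and sub names are made up) compares a plain match with the /o dropped against building a fresh qr// object on every call. It deliberately makes no claim about which wins; that depends on the perl version and the pattern, so measure rather than assume.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = map { "word$_" } 1 .. 200;

# Option A: no /o; per perlop, perl skips recompiling while $want is unchanged.
sub count_plain {
    my ($want) = @_;
    return scalar grep { /\A$want/ } @words;
}

# Option B: the qr// op runs on every call, i.e. it is NOT executed only once.
sub count_qr_each_call {
    my ($want) = @_;
    my $re = qr/\A$want/;
    return scalar grep { $_ =~ $re } @words;
}

# Both give the same answer; only the cost can differ.
my $n = count_plain('word1');
die "results differ" unless $n == count_qr_each_call('word1');

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    plain       => sub { count_plain('word1') },
    qr_per_call => sub { count_qr_each_call('word1') },
});
```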

Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"
Re: What the heck does /o do in a regex and can we clean up the documentation
On Sat, Feb 17, 2024 at 1:31 AM demerphq <demerphq@gmail.com> wrote:

> On Fri, 16 Feb 2024 at 19:22, Dan Book <grinnz@gmail.com> wrote:
>
>> If you want such an optimization without the logical incoherence of /o,
>> regex refs exist now: my $compile_this_once = qr/foo$bar/;
>>
>
> While you are correct in a literal sense, I would be wary of advising use
> of qr// as a replacement for /o. Better to just recommend that they omit
> the /o. In some cases using qr// will be dramatically slower than simply
> omitting the /o. Or at least provide some caveats: using qr// will be
> faster if you can ensure that the qr// is executed only once, but omitting
> the /o will be faster if you can't.
>

It's more than a performance decision between these three options, as they
result in different behavior: removing the /o would cause the regex to
change if interpolated variables change on subsequent calls, and using qr//
instead depends on where the qr// is instantiated (in a much more intuitive
way, IMO). And yes, I was referring to instantiating the qr// only once as a
replacement for m//o compiling only once.
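A sketch of the three alternatives side by side, assuming a file-scoped $want that changes between calls (all names here are hypothetical). Each variant reports 1 or 0 for a match, so the difference in behavior is visible without timing anything.

```perl
use strict;
use warnings;

my $want       = 'apple';
my $built_once = qr/\A$want\z/;    # instantiated here, while $want is 'apple'

sub check {
    my ($str) = @_;
    return (
        with_o     => ($str =~ /\A$want\z/o) ? 1 : 0,  # compiled on the first call only
        without_o  => ($str =~ /\A$want\z/)  ? 1 : 0,  # follows the current $want
        qr_outside => ($str =~ $built_once)  ? 1 : 0,  # fixed where the qr// ran
    );
}

my %first = check('apple');     # all three look for 'apple'
$want = 'banana';
my %second = check('banana');   # only without_o now looks for 'banana'

for my $k (sort keys %first) {
    print "$k: first=$first{$k} second=$second{$k}\n";
}
```

Both m//o and the single qr// stay on "apple"; the difference is that the qr// version shows exactly where that happens.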

-Dan
Re: What the heck does /o do in a regex and can we clean up the documentation
I think some people in this discussion may need to look at the
discussion in perlop:

"Perl will not recompile the pattern unless an interpolated variable
that it contains changes."

Myself, if I knew this I'd forgotten about it: I thought the main
reason "/o" was obsolete was that you could use qr{} judiciously to
control where compilation happens, but it turns out "/o" is now
nearly completely useless. In the vast majority of cases, you could
just delete it anywhere it's used without any performance penalty,
and (most likely) without changing the overall behavior of the code.
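A sketch of the common case described above, assuming a hypothetical $prefix that is assigned once at startup and never reassigned: the /o and plain versions give identical results, so the /o can simply be deleted.

```perl
use strict;
use warnings;

# Hypothetical setting, assigned once at startup and never changed afterwards.
my $prefix = 'ERROR';

sub flag_o     { my ($line) = @_; return $line =~ /^\Q$prefix\E:/o ? 1 : 0 }
sub flag_plain { my ($line) = @_; return $line =~ /^\Q$prefix\E:/  ? 1 : 0 }

my @log     = ('ERROR: disk full', 'INFO: started', 'ERROR: retry', 'WARN: slow');
my @with_o  = map { flag_o($_) }     @log;
my @without = map { flag_plain($_) } @log;

# Because $prefix never changes, deleting the /o changes nothing here.
print "with /o:    @with_o\nwithout /o: @without\n";
```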


On 2/16/24, Dan Book <grinnz@gmail.com> wrote:
> On Sat, Feb 17, 2024 at 1:31 AM demerphq <demerphq@gmail.com> wrote:
>
>> On Fri, 16 Feb 2024 at 19:22, Dan Book <grinnz@gmail.com> wrote:
>>
>>> If you want such an optimization without the logical incoherence of /o,
>>> regex refs exist now: my $compile_this_once = qr/foo$bar/;
>>>
>>
>> While you are correct in a literal sense, I would be wary of advising use
>> of qr// as a replacement for /o. Better to just recommend that they omit
>> the /o. In some cases using qr// will be dramatically slower than simply
>> omitting the /o. Or at least provide some caveats: using qr// will be
>> faster if you can ensure that the qr// is executed only once, but omitting
>> the /o will be faster if you can't.
>>
>
> It's more than a performance decision between these three options as they
> would result in different functionality; removing the /o would cause the
> regex to change if interpolated variables change on subsequent calls, and
> using qr// instead depends on where the qr// is instantiated (in a much
> more intuitive way, IMO). And yes I was referring to instantiating the qr//
> only once as a replacement for the m//o only compiling once.
>
> -Dan
Re: What the heck does /o do in a regex and can we clean up the documentation
Scott Baker <scott@perturb.org> writes:

> Reading through the perlre documentation I see:
>
> o - pretend to optimize your code, but actually introduce bugs
>

That summary provides no information for newcomers who are genuinely
trying to understand what the /o modifier does.

It should be rewritten as something that describes how it modifies the
regular expression in question, such as "Restrict the interpolation to
be performed only once".

Other references to the /o modifier that I could find:

In perlcheat: https://perldoc.perl.org/perlcheat

/o compile pat once

In perlop: https://perldoc.perl.org/perlop#qr/STRING/msixpodualn

o Compile pattern only once.

In perlreref: https://perldoc.perl.org/perlreref#OPERATORS

o compile pattern Once

I suppose we could just use the same phrase in perlre too.

OTOH, other than these short phrases, I could not find any longer
paragraphs that describe the /o modifier in more detail, or how the
compilation of a pattern happens and how it might happen multiple times
for the same RE (hence the "once" modifier).

I guess there are books that already cover such detail, but it would be
nice if there were a few paragraphs in perlre describing the 'o' flag in
a gist, like how the same modifier is documented in Ruby's Regexp
class.

=> https://docs.ruby-lang.org/en/master/Regexp.html#class-Regexp-label-Interpolation+Mode

--
Kang-min Liu
Re: What the heck does /o do in a regex and can we clean up the documentation
On Fri, 16 Feb 2024 10:26:20 -0800 Joseph Brenner <doomvox@gmail.com> wrote:

> The /o modifier used to be the way you would tell perl the regex only
> needed to be compiled "once", but it's essentially obsolete because
> perl does a better job of detecting when a regex needs to be
> recompiled than it used to-- and further you can use qr{} quoting to
> control where compilation happens. There's a thorough discussion of
> it in "perlop".
>
> This dismissive joke in the docs there can indeed be confusing (and
> classifying it as a "substitution-specific modifier" is essentially
> wrong).
>
> I think the easiest fix is just to delete that line from perlre--
> perlop already discusses /o, calling it "largely obsolete".
>
> On 2/16/24, Scott Baker <scott@perturb.org> wrote:
> > Reading through the perlre documentation
> > <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
> >
> > o - pretend to optimize your code, but actually introduce bugs
> >
> > While this is clever it should probably be more clear in the official
> > documentation? What exactly does /o do? I'm guessing it was a failed
> > attempt to optimize certain things? If I can get clarification on what
> > it does I will update the documentation to reflect that.
> >
> > - Scottchiefbaker

Have you read the posts below? I found them useful.

https://blogs.perl.org/users/tom_wyant/2022/08/match-anything-quickly.html
https://blogs.perl.org/users/tom_wyant/2022/09/match-anything-quickly----revision-1.html
https://blogs.perl.org/users/aristotle/2022/09/reinterpolate.html

I prefer to divide big expressions into a few qr objects. I know that a
(?(DEFINE)(?<FOO>...)) block is better, but perldoc says somewhere that
(?&FOO) is not optimised right now. But according to the posts above and
my own observations, interpolating qr objects also takes more time because
of the stringification of qr objects (which, I suppose, are not
re-interpolated when unchanged, but a check must still be made that they
have not changed).
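To see the stringification mentioned above, print a qr// object or interpolate one into another pattern. A sketch with made-up names ($year, $sep, $word); the (?^flags:...) wrapper shown in the comments is the format used by perl 5.14 and later, and older perls print a different prefix.

```perl
use strict;
use warnings;

my $year = qr/\d{4}/;
my $sep  = qr/[-.]/;
my $word = qr/perl/i;

# A qr// object stringifies to its source wrapped in a (?^flags:...) group.
my $year_str = "$year";    # (?^:\d{4})
my $word_str = "$word";    # (?^i:perl), so the /i flag travels with it

# Interpolating qr// objects into a bigger pattern goes through those strings;
# perl then compiles the concatenated text as a new pattern.
my $date     = qr/$year$sep$year/;
my $date_str = "$date";    # (?^:(?^:\d{4})(?^:[-.])(?^:\d{4}))

print "$year_str\n$word_str\n$date_str\n";
print "matched\n" if '2023-2024' =~ $date;
```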

I write code like this:

use v5.36;    # enables strict, warnings, 'state' and subroutine signatures

state $foo = qr{...};
state $bar = qr{...};

sub func ($par) {
    $par =~ m{$foo$bar}o;
}

Actually, I think that when interpolating, /o should be avoided only
when the interpolated variable comes from the user somehow (because
otherwise the variable is practically immutable). So, in my opinion, /o
should be on by default, with a different modifier to explicitly
interpolate on each match. But historically a different route was chosen.

I think that the /o modifier is useful even now, when Perl makes some
optimisations. But it should be properly documented, so that users can
make informed decisions.

--
Ivan Vorontsov <ivrntsv@yandex.ru>
Re: What the heck does /o do in a regex and can we clean up the documentation
On Sun, 18 Feb 2024, 14:29 Ivan Vorontsov, <ivrntsv@yandex.ru> wrote:

> On Fri, 16 Feb 2024 10:26:20 -0800 Joseph Brenner <doomvox@gmail.com> wrote:
>
> > The /o modifier used to be the way you would tell perl the regex only
> > needed to be compiled "once", but it's essentially obsolete because
> > perl does a better job of detecting when a regex needs to be
> > recompiled than it used to-- and further you can use qr{} quoting to
> > control where compilation happens. There's a thorough discussion of
> > it in "perlop".
> >
> > This dismissive joke in the docs there can indeed be confusing (and
> > classifying it as a "substitution-specific modifier" is essentially
> > wrong).
> >
> > I think the easiest fix is just to delete that line from perlre--
> > perlop already discusses /o, calling it "largely obsolete".
> >
> > On 2/16/24, Scott Baker <scott@perturb.org> wrote:
> > > Reading through the perlre documentation
> > > <https://perldoc.perl.org/perlre#Other-Modifiers> I see:
> > >
> > > o - pretend to optimize your code, but actually introduce bugs
> > >
> > > While this is clever it should probably be more clear in the official
> > > documentation? What exactly does /o do? I'm guessing it was a failed
> > > attempt to optimize certain things? If I can get clarification on what
> > > it does I will update the documentation to reflect that.
> > >
> > > - Scottchiefbaker
>
> Have you read posts below? I found them useful.
>
> https://blogs.perl.org/users/tom_wyant/2022/08/match-anything-quickly.html
> https://blogs.perl.org/users/tom_wyant/2022/09/match-anything-quickly----revision-1.html
> https://blogs.perl.org/users/aristotle/2022/09/reinterpolate.html
>
> I prefer to divide big expressions into few qr objects. I know that
> (?(DEFINE)(?<FOO>...)) block is better, but perldoc says somewhere that
> (?&FOO) is not optimised right now.


Can you give more context? Using define and named recursion is optimized
*more* than any alternatives.

> But interpolating qr objects
> according to posts above and my observations also takes more time because
> of stringification of qr objects (which, I suppose, are not interpolated
> when not changed, but check must be made that they have not changed).
>
> I write code like this:
>
> state $foo = qr{...};
> state $bar = qr{...};
>
> sub func ($par) {
> $par =~ m{$foo$bar}o;
> }
>
> Actually I think that /o modifier when interpolate must not be used
> only when interpolated variable is taken from user somehow (because
> other way variable practically immutable). So, in my opinion, /o should
> be on by default with different modifier to explicitly interpolate on
> each match. But historically different route was chosen.
>
> I think that /o modifier is useful even now, when Perl makes some
> optimisations. But it should be properly documented, so that user can
> make informed decisions.
>
> --
> Ivan Vorontsov <ivrntsv@yandex.ru>
>
Re: What the heck does /o do in a regex and can we clean up the documentation
> > I prefer to devide big expressions into few qr objects. I know that
> > (?(DEFINE)(?<FOO>...)) block is better, but perldoc says somewhere that
> > (?&FOO) is not optimised right now.
>
>
> Can you give more context? Using define and named recursion is optimized
> *more* than any alternatives.
>

perlre.pod, in the (?(condition)yes-pattern|no-pattern) section, says:

...

Here's a summary of the possible predicates:

...

(DEFINE)

In this case, the yes-pattern is never directly executed, and no
no-pattern is allowed. Similar in spirit to (?{0}) but more efficient.
See below for details. Full syntax: (?(DEFINE)definitions...)

...

A special form is the (DEFINE) predicate, which never executes its
yes-pattern directly, and does not allow a no-pattern. This allows one
to define subpatterns which will be executed only by the recursion
mechanism. This way, you can define a set of regular expression rules
that can be bundled into any pattern you choose.

It is recommended that for this usage you put the DEFINE block at the
end of the pattern, and that you name any subpatterns defined within it.

Also, it's worth noting that patterns defined this way probably will
not be as efficient, as the optimizer is not very clever about handling
them.

...
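For comparison, here is a sketch of one small grammar written both ways discussed in this thread: once with a (?(DEFINE)...) block at the end of the pattern, as perlre recommends, and once by interpolating separate qr objects. The date grammar and all names are made up for illustration. Both forms accept the same strings here; any speed difference between them would need to be measured, not assumed.

```perl
use strict;
use warnings;

# Form 1: named subpatterns in a DEFINE block, called with (?&NAME).
my $date_define = qr{
    \A (?&YEAR) - (?&MONTH) - (?&DAY) \z
    (?(DEFINE)
        (?<YEAR>  \d{4} )
        (?<MONTH> 0[1-9] | 1[0-2] )
        (?<DAY>   0[1-9] | [12]\d | 3[01] )
    )
}x;

# Form 2: separate qr objects interpolated into one pattern.
my $year  = qr/\d{4}/;
my $month = qr/0[1-9]|1[0-2]/;
my $day   = qr/0[1-9]|[12]\d|3[01]/;
my $date_interp = qr/\A$year-$month-$day\z/;

my @inputs   = qw(2024-02-16 2024-13-01 1999-12-31 24-02-16);
my @ok_def   = grep { $_ =~ $date_define } @inputs;
my @ok_inter = grep { $_ =~ $date_interp } @inputs;

print "DEFINE:       @ok_def\ninterpolated: @ok_inter\n";
```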

--
Ivan Vorontsov <ivrntsv@yandex.ru>