Mailing List Archive: [RFC PATCH] introduce the RXapif_NPAR flag

Sep 21, 2023, 4:32 AM

Post #1 of 12

Hello,

Alexey is trying to rewrite some code from Go to Perl and he ran
into a problem: it is not possible to know the start/end offsets
of the named captures.

This change makes it possible to find out the logical number(s)
of a named capture buffer, which in turn allows using @- / @+.

Example:

# like %- but reports the paren numbers
tie my %h, "Tie::Hash::NamedCapture", all=>1, npar=>1;

#01234567
'12ab45xy' =~ /^(\d+)(?<N>[a-z]+)(\d+)(?<N>[a-z]+)$/ or die;

$pn = $h{N};

print "pn=@$pn\n";

print "start=@-[@$pn] end=@+[@$pn]\n";

output:

pn=2 4
start=2 6 end=4 8
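For comparison, the same two-step lookup (name to buffer number, then buffer number to offsets) can be sketched with Python's `re` module. There, `Pattern.groupindex` maps group names to numbers and `Match.start` / `Match.end` play the role of @- / @+. This is an analogy, not the proposed Perl interface. Python's `re` does not allow a group name to be reused, so the two groups are named N1 and N2 here.

```python
import re

m = re.match(r"^(\d+)(?P<N1>[a-z]+)(\d+)(?P<N2>[a-z]+)$", "12ab45xy")

# Step 1: map each name to its group number (what npar=>1 would report).
numbers = [m.re.groupindex[name] for name in ("N1", "N2")]

# Step 2: index the per-group offsets with those numbers (the @- / @+ step).
starts = [m.start(n) for n in numbers]
ends = [m.end(n) for n in numbers]

print(f"pn={' '.join(map(str, numbers))}")
print(f"start={' '.join(map(str, starts))} end={' '.join(map(str, ends))}")
```

It prints the same two lines as the Perl example: `pn=2 4` and `start=2 6 end=4 8`.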

Do you think something like the patch below makes sense?

Perhaps it would be better if RXapif_NPAR made "ret" a dualvar
in Perl_reg_named_buff_fetch?

Thanks,

Oleg.
---
regexec.c | 8 ++++++--
regexp.h | 2 ++
universal.c | 8 +++++---
3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/regexec.c b/regexec.c
index de0b7c4619..acf2a116d6 100644
--- a/regexec.c
+++ b/regexec.c
@@ -11978,8 +11978,12 @@ Perl_reg_named_buff_fetch(pTHX_ REGEXP * const r, SV * const namesv,
if ((I32)(rx->nparens) >= nums[i]
&& RXp_OFFS_VALID(rx,nums[i]))
{
- ret = newSVpvs("");
- Perl_reg_numbered_buff_fetch_flags(aTHX_ r, nums[i], ret, REG_FETCH_ABSOLUTE);
+ if (flags & RXapif_NPAR) {
+ ret = newSViv(nums[i]);
+ } else {
+ ret = newSVpvs("");
+ Perl_reg_numbered_buff_fetch_flags(aTHX_ r, nums[i], ret, REG_FETCH_ABSOLUTE);
+ }
if (!retarray)
return ret;
} else {
diff --git a/regexp.h b/regexp.h
index 8272487212..94389a7aab 100644
--- a/regexp.h
+++ b/regexp.h
@@ -351,6 +351,8 @@ typedef struct regexp_engine {
# define RXapif_REGNAMES 0x0800
# define RXapif_REGNAMES_COUNT 0x1000

+# define RXapif_NPAR 0x02000 /* %- %+ */
+
/*
=for apidoc Am|REGEXP *|SvRX|SV *sv

diff --git a/universal.c b/universal.c
index 1e039d1936..75838d4846 100644
--- a/universal.c
+++ b/universal.c
@@ -1195,17 +1195,19 @@ XS(XS_NamedCapture_TIEHASH)
croak_xs_usage(cv, "package, ...");
{
const char * package = (const char *)SvPV_nolen(ST(0));
- UV flag = RXapif_ONE;
+ UV all = RXapif_ONE, npar = 0;
mark += 2;
while(mark < sp) {
STRLEN len;
const char *p = SvPV_const(*mark, len);
if(memEQs(p, len, "all"))
- flag = SvTRUE(mark[1]) ? RXapif_ALL : RXapif_ONE;
+ all = SvTRUE(mark[1]) ? RXapif_ALL : RXapif_ONE;
+ else if(memEQs(p, len, "npar"))
+ npar = SvTRUE(mark[1]) ? RXapif_NPAR : 0;
mark += 2;
}
ST(0) = newSV_type_mortal(SVt_IV);
- sv_setuv(newSVrv(ST(0), package), flag);
+ sv_setuv(newSVrv(ST(0), package), all | npar);
}
XSRETURN(1);
}
--
2.25.1.362.g51ebf55
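The behaviour the patch proposes can be modelled in a few lines of Python. This is a sketch for reviewers, not a replacement for the C code. The function names mirror the patch, but the model itself is hypothetical. RXapif_ONE and RXapif_ALL are given the values they are believed to have in regexp.h, which is an assumption; only RXapif_NPAR's 0x2000 comes from the patch. A plain dict of offsets stands in for the regexp struct.

```python
from enum import IntFlag

class RXapif(IntFlag):
    ONE = 0x0100   # assumed value; %+ behaviour (first matched buffer)
    ALL = 0x0200   # assumed value; %- behaviour (array of all buffers)
    NPAR = 0x2000  # value proposed in the regexp.h hunk of the patch

def tiehash_flags(**opts):
    """Model of the patched XS_NamedCapture_TIEHASH option parsing."""
    flags = RXapif.ALL if opts.get("all") else RXapif.ONE
    if opts.get("npar"):
        flags |= RXapif.NPAR
    return flags

def named_buff_fetch(nums, offsets, text, flags):
    """Model of the patched Perl_reg_named_buff_fetch loop.

    nums: buffer numbers that share one name (e.g. [2, 4] for N).
    offsets: {number: (start, end)} for buffers that matched; a stand-in
    for the nparens / RXp_OFFS_VALID checks.
    """
    retarray = [] if flags & RXapif.ALL else None
    for num in nums:
        if num in offsets:
            if flags & RXapif.NPAR:
                ret = num  # patched path: report the buffer number itself
            else:
                start, end = offsets[num]
                ret = text[start:end]  # original path: copy the captured text
            if retarray is None:
                return ret  # %+ style: the first matched buffer wins
        else:
            ret = None  # an unmatched buffer becomes undef in %- style
        if retarray is not None:
            retarray.append(ret)
    return retarray  # None here models the C function returning NULL

text = "12ab45xy"
offsets = {1: (0, 2), 2: (2, 4), 3: (4, 6), 4: (6, 8)}  # groups 1-4 in "12ab45xy"
print(named_buff_fetch([2, 4], offsets, text, tiehash_flags(all=1, npar=1)))
```

With all=>1 and npar=>1 the model returns `[2, 4]` for N, matching `pn=2 4`; without npar it returns the captured strings, as %- does today.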

Re: [RFC PATCH] introduce the RXapif_NPAR flag

Nov 21, 2023, 9:20 AM

Post #2 of 12

No one even bothered to reply...

It's a pity. OK, I am sure Alexey will find a workaround, but I do
remember the days when Perl developers were more attentive to the
needs of Perl users ;)

Re: [RFC PATCH] introduce the RXapif_NPAR flag

rwp.primary at gmail

Nov 21, 2023, 9:58 AM

Post #3 of 12

I (personally) think there's a greater chance of visibility and persistence
if this is opened as a GitHub issue and/or pull request (and hopefully as a
discussion, in the ideal future, too).

On Tue, Nov 21, 2023, 6:21 PM Oleg Nesterov <oleg@redhat.com> wrote:

> No one even bothered to reply...
>
> It's a pity. OK, I am sure Alexey will find a workaround, but I do
> remember the days when Perl developers were more attentive to the
> needs of Perl users ;)

Re: [RFC PATCH] introduce the RXapif_NPAR flag

agladkov at redhat

Nov 21, 2023, 10:30 AM

Post #4 of 12

On Tue, Nov 21, 2023 at 06:20:12PM +0100, Oleg Nesterov wrote:
> No one even bothered to reply...
>
> It's a pity. OK, I am sure Alexey will find a workaround, but I do
> remember the days when Perl developers were more attentive to the
> needs of Perl users ;)

Thanks for your patch, but I actually found a workaround. My solution is
to use Python :)

It looks like this:

import re

# Python's re rejects a repeated group name, so the two groups are N1 and N2.
m = re.match(r'^(\d+)(?P<N1>[a-z]+)(\d+)(?P<N2>[a-z]+)$', '12ab45xy')
print(m.group('N1'), m.start('N1'), m.end('N1'))
print(m.group('N2'), m.start('N2'), m.end('N2'))

output:

ab 2 4
xy 6 8

--
Rgrds, agladkov

Re: [RFC PATCH] introduce the RXapif_NPAR flag

Nov 21, 2023, 10:39 AM

Post #5 of 12

On 11/21, Elvin Aslanov wrote:
>
> I (personally) think there's a greater chance of visibility and persistence
> if this is opened as a GitHub issue

OK, thanks. I'll try to open an issue and copy-and-paste my email.

> and/or pull request

Well, no... I don't feel this patch is ready for inclusion. I am not familiar
with the Perl internals; I made this "POC" patch just to provoke discussion,
because I think it always makes sense to at least try to find a solution
when you report a problem or a feature request. And after 2 months I have
forgotten everything I learned when I grepped the sources; I can't even recall
what else I had in mind.

Thanks Elvin,

Oleg.

Re: [RFC PATCH] introduce the RXapif_NPAR flag

Nov 21, 2023, 10:41 AM

Post #6 of 12

On 11/21, Alexey Gladkov wrote:
>
> On Tue, Nov 21, 2023 at 06:20:12PM +0100, Oleg Nesterov wrote:
> > No one even bothered to reply...
> >
> > It's a pity. OK, I am sure Alexey will find a workaround, but I do
> > remember the days when Perl developers were more attentive to the
> > needs of Perl users ;)
>
> Thanks for your patch, but I actually found the workaround. My solution is
> to use python :)

Ah, OK ;)

Then I guess I don't need to open an issue and we can forget this thread.

Oleg.

Re: [RFC PATCH] introduce the RXapif_NPAR flag

demerphq at gmail

Nov 21, 2023, 12:34 PM

Post #7 of 12

On Tue, 21 Nov 2023 at 19:42, Oleg Nesterov <oleg@redhat.com> wrote:

> On 11/21, Alexey Gladkov wrote:
> >
> > On Tue, Nov 21, 2023 at 06:20:12PM +0100, Oleg Nesterov wrote:
> > > No one even bothered to reply...
> > >
> > > It's a pity. OK, I am sure Alexey will find a workaround, but I do
> > > remember the days when Perl developers were more attentive to the
> > > needs of Perl users ;)
> >
> > Thanks for your patch, but I actually found the workaround. My solution is
> > to use python :)
>
> Ah, OK ;)
>
> Then I guess I don't need to open an issue and we can forget this thread.
>

FWIW, I am probably the person who would pick up your patch, but I simply
didn't see your mail. Sorry it got overlooked; it was definitely not for
lack of interest or willingness to support this use case. I just don't have
a lot of time lately.

We can probably do something to facilitate this. However, I am not sure how
it would work with the recent changes related to branch reset: we do not
expose any way to extract physical capture buffers by number, only logical
capture buffers, and as of 5.38 named captures DWIM, such that patterns like

/(?|(?<a>A)|(?<b>B))/

will DWIM. Older perls would have mapped both 'a' and 'b' to 1. As of 5.38,
only one of the two names would match, but whichever it was would be
exposed via the numerical API as $1. Maybe we need to expose @^+ and @^- or
something like that.
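To make the branch-reset problem concrete, here is a small hypothetical model in Python. Python's `re` has no `(?|...)`, so the pattern below numbers its groups physically (1 for a, 2 for b). The `LOGICAL_OF_PHYSICAL` table is an assumed stand-in for what branch reset does, with both alternatives sharing logical buffer 1 as described above. Looking up 'a' through its logical number returns the offsets of whatever filled buffer 1, even when 'a' itself did not match; only a physical lookup tells the two apart.

```python
import re

# Python's re has no branch reset, so this numbers the groups physically:
# group 1 is "a" and group 2 is "b".
PATTERN = re.compile(r"(?P<a>A)|(?P<b>B)")

# Hypothetical model of /(?|(?<a>A)|(?<b>B))/: both alternatives share
# logical buffer 1, so $1, @- and @+ see only one buffer.
LOGICAL_OF_PHYSICAL = {1: 1, 2: 1}

def logical_offsets(m):
    """Model of @- / @+: logical buffer -> (start, end) of the branch that matched."""
    offsets = {}
    for phys, logical in LOGICAL_OF_PHYSICAL.items():
        if m.start(phys) != -1:
            offsets[logical] = m.span(phys)
    return offsets

def offsets_via_logical_number(m, name):
    """What a name -> logical-number lookup followed by @- / @+ would give."""
    logical = LOGICAL_OF_PHYSICAL[m.re.groupindex[name]]
    return logical_offsets(m).get(logical)

def offsets_via_physical_number(m, name):
    """Offsets of the named buffer itself, or None if it did not match."""
    start, end = m.span(name)
    return None if start == -1 else (start, end)

m = PATTERN.search("xB")
print(offsets_via_logical_number(m, "a"))   # misleading: reports b's offsets
print(offsets_via_physical_number(m, "a"))  # None: 'a' did not match
print(offsets_via_physical_number(m, "b"))
```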

I created a ticket for this functionality and credited you, Oleg, for the
idea.

https://github.com/Perl/perl5/issues/21656

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Re: [RFC PATCH] introduce the RXapif_NPAR flag

Nov 21, 2023, 12:55 PM

Post #8 of 12

On Thu, Sep 21, 2023 at 4:33 AM Oleg Nesterov <oleg@redhat.com> wrote:

> Alexey is trying to rewrite some code from Go to Perl and he ran
> into a problem: it is not possible to know the start/end offsets
> of the named captures.
>

Isn't this information available in @- and @+? I recall (and see `perldoc
perlvar`) that for each $n where $-[$n] is defined, you can use "substr $_,
$-[$n], $+[$n] - $-[$n]" to extract the individual strings.
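The substr idiom above can be checked with a Python analogue, where `Match.start(n)` / `Match.end(n)` stand in for $-[$n] / $+[$n]. The helper `perl_substr` is hypothetical and covers only the non-negative OFFSET, LENGTH form of Perl's substr; it is there to show why the length argument is $+[$n] - $-[$n].

```python
import re

def perl_substr(s, offset, length):
    """Subset of Perl's substr(EXPR, OFFSET, LENGTH) for non-negative arguments."""
    return s[offset:offset + length]

text = "12ab45xy"
m = re.match(r"^(\d+)([a-z]+)(\d+)([a-z]+)$", text)

# Group 0 is the whole match, like $-[0] / $+[0] in Perl.
for n in range(len(m.groups()) + 1):
    start, end = m.start(n), m.end(n)  # stand-ins for $-[$n] and $+[$n]
    if start == -1:
        continue  # group did not participate, like an undefined $-[$n]
    # substr takes (offset, length), hence $+[$n] - $-[$n] as the length.
    piece = perl_substr(text, start, end - start)
    assert piece == m.group(n)
    print(n, start, end, piece)
```

This works for numbered groups; the open question in this thread is which $n belongs to a given named group.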

Re: [RFC PATCH] introduce the RXapif_NPAR flag

demerphq at gmail

Nov 21, 2023, 1:20 PM

Post #9 of 12

On Tue, 21 Nov 2023 at 21:55, Karen Etheridge <perl@froods.org> wrote:

>
>
> On Thu, Sep 21, 2023 at 4:33 AM Oleg Nesterov <oleg@redhat.com> wrote:
>
>> Alexey is trying to rewrite some code from Go to Perl and he ran
>> into a problem: it is not possible to know the start/end offsets
>> of the named captures.
>>
>
> Isn't this information available in @- and @+? I recall (and see `perldoc
> perlvar`) that for each $n where $-[$n] is defined, you can use "substr $_,
> $-[$n], $+[$n] - $-[$n]" to extract the individual strings.
>

For some reason they wanted to know what index in @- and @+ they should
look in for named capture "a" in a pattern like:

/(?<a>A)(?<b>B)/

They didn't explain why they can't just use $-{a} to access the content of
the capture buffer for the named capture 'a'.

In https://github.com/Perl/perl5/issues/21656 I explain why this isn't
quite as simple as it seems, assuming you care about branch reset (which I
do).

FWIW, branch reset is a great example of "be careful what you implement:
just because you can doesn't mean you should, as the implications can be
hard to predict in the long run". Sigh.

Nevertheless we are where we are, and given where we are we probably should
provide further introspection.

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Re: [RFC PATCH] introduce the RXapif_NPAR flag

Nov 21, 2023, 2:56 PM

Post #10 of 12

Thank you all for your replies.

Let me repeat, I forgot everything about this issue after 2 months. Plus I am
already half asleep, so most probably I don't understand you. IOW, I am sure I will
regret this email tomorrow ;)

On 11/21, demerphq wrote:
>
> For some reason they wanted to know what index in @- and @+ they should
> look in for named capture "a" in a pattern like:
>
> /(?<a>A)(?<b>B)/
>
> They didn't explain why they can't just use $-{a} to access the content of
> the capture buffer for the named capture 'a'.

IIRC, the problem is NOT that you can't access the CONTENT of the named capture.

IIRC, the problem is that you can't know the start/end offsets of the matched
named capture.

Yes, we have $-[n] and $+[n]. So which "n" can I use for a named capture?

Sorry for my poor English, and sorry again if I (quite possibly) misunderstood
you.

Oleg.

Re: [RFC PATCH] introduce the RXapif_NPAR flag

demerphq at gmail

Nov 22, 2023, 1:24 AM

Post #11 of 12

On Tue, 21 Nov 2023 at 23:57, Oleg Nesterov <oleg@redhat.com> wrote:

> Thank you all for your replies.
>
> Let me repeat, I forgot everything about this issue after 2 months. Plus I am
> already half asleep, so most probably I don't understand you. IOW, I am sure I will
> regret this email tomorrow ;)

I think your post is fine. :-)

>
> IIRC, the problem is NOT that you can't access the CONTENT of the named capture.
>
> IIRC, the problem is that you can't know the start/end offsets of the matched
> named capture.
>
> Yes, we have $-[n] and $+[n]. So which "n" can I use for a named capture?
>

As of 5.38, unfortunately, there are patterns where there is no 'n' that
answers this question, which is why I created a ticket regardless.

But I am curious, why is it so important to know the offsets? I totally
think we should provide a way to access this data, but I would love to know
why it is so important to you specifically. Historically it was useful to
access the offsets, as it was the only way to access capture data via an
indexed data structure (evaling $1 or whatever isn't pretty), but why are the
offsets useful if you can just access the named capture buffer by name? Why
do you want to go through the indirection of @+ and @-?

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"

Re: [RFC PATCH] introduce the RXapif_NPAR flag

agladkov at redhat

Nov 22, 2023, 2:40 AM

Post #12 of 12

On Wed, Nov 22, 2023 at 10:24:42AM +0100, demerphq wrote:
> As of 5.38, unfortunately, there are patterns where there is no 'n' that
> answers this question, which is why I created a ticket regardless.
>
> But I am curious, why is it so important to know the offsets? I totally
> think we should provide a way to access this data, but I would love to know
> why it is so important to you specifically. Historically it was useful to
> access the offsets, as it was the only way to access capture data via an
> indexed data structure (evaling $1 or whatever isn't pretty), but why are the
> offsets useful if you can just access the named capture buffer by name? Why
> do you want to go through the indirection of @+ and @-?

I have a utility that uses regular expressions to tokenize parts of a
string and adds special text (defined by the user) around named groups.
The entire resulting string is then sent to the user. I'm simplifying
things a lot. In reality, the opening and closing markers must not come out
in the wrong order when one named group is nested inside another.

/123(?<a>A(?<b>B))C/ => 123<a>A<b>B</b></a>C

I need not only the content itself, but also the position of the start and
end of the group. In fact, content is not needed if the position is known.

Perl has start and end positions for numbered groups (@- and @+), but there
is no way to find out the number of a named group.
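The nested-tag transformation described above can be sketched in Python, which exposes per-group offsets through `Match.span`. This is a minimal sketch under stated assumptions. `wrap_named_groups` is a hypothetical name, not part of the utility. The `<name>` / `</name>` markers stand in for the user-defined text. Named groups are assumed to nest properly; lookarounds can produce overlapping spans, which this does not handle. Python spells Perl's `(?<a>...)` as `(?P<a>...)`.

```python
import re

def wrap_named_groups(pattern, text):
    """Return text with <name>...</name> inserted around each named group
    that took part in the first match of pattern (text unchanged if no match).
    """
    m = re.search(pattern, text)
    if m is None:
        return text
    # Sort by start; at equal starts the longer (outer) group opens first,
    # and at identical spans the lower-numbered (outer) group opens first.
    spans = sorted(
        (m.start(num), -m.end(num), num, name)
        for name, num in m.re.groupindex.items()
        if m.start(num) != -1  # skip groups that did not participate
    )
    out, pos, open_stack = [], 0, []  # open_stack holds (end, name), innermost last

    def close_until(limit):
        # Close every open group that ends at or before limit, innermost first.
        nonlocal pos
        while open_stack and open_stack[-1][0] <= limit:
            end, name = open_stack.pop()
            out.append(text[pos:end])
            out.append(f"</{name}>")
            pos = end

    for start, neg_end, _num, name in spans:
        close_until(start)
        out.append(text[pos:start])
        out.append(f"<{name}>")
        pos = start
        open_stack.append((-neg_end, name))
    close_until(len(text))
    out.append(text[pos:])
    return "".join(out)

print(wrap_named_groups(r"123(?P<a>A(?P<b>B))C", "123ABC"))
```

For the example above it prints `123<a>A<b>B</b></a>C`. A stack is used so that an inner group is always closed before the group that contains it, which is the ordering constraint described in the post.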

--
Rgrds, agladkov