Mailing List Archive

Regular expressions store capture call and undef operator
Hello,

I'm implementing a C compiler using Perl regular expressions with mid-pattern
code execution. Part of it relies on the captured named groups being available
to the code in (?{ }), but currently, when a subroutine call returns, it
discards the captures made inside it. For now I can embed the pattern directly
instead of issuing a subroutine call, but it would be nice to avoid the
duplication.
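To make the problem concrete, here is a minimal Perl sketch of today's behavior, assuming perl 5.18 or later (where code blocks in literal patterns close over lexicals). @seen is a hypothetical name used only to record what each (?{ }) block observes. With the body inlined, the code block should see $+{text}; with the body in a DEFINE block reached through (?&sub), it should see undef, because perlre documents that captures made inside a recursion are not accessible after it returns.

```perl
use strict;
use warnings;

# Hypothetical log of what each (?{ }) block sees in $+{text}.
my @seen;

# Body embedded inline (the current workaround): 'text' is set at the top
# level, so the code block after it can read the capture.
my $inline_ok = "sometext" =~ /\Asome(?<text>text)(?{ push @seen, $+{text} })\z/;

# Same body defined once and reached through (?&sub): captures made inside
# the call are restored when it returns, so the code block sees undef.
my $call_ok = "sometext" =~ /(?(DEFINE)(?<sub>some(?<text>text)))\A(?&sub)(?{ push @seen, $+{text} })\z/;

# Print what the two code blocks observed, marking undef explicitly.
print join(", ", map { defined($_) ? $_ : "(undef)" } @seen), "\n";
```

Both patterns match; only the first one leaves 'text' visible to the code that follows it.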

The other issue is that I'm matching out of order so that I can feed in-order
information to the compiler backend, and for that I need to create what I call
'facets': copies of the same pattern without the code calls, used to fill in
the match when backtracking. The same motivation applies here: I want to avoid
code duplication.

I'm proposing non-destructive subroutine calls with the syntax

(?&&sub)

which would behave exactly as if the subroutine body were embedded as text
inside the pattern.

So if we have:

(?<sub>sometext)

(?&&sub) would act as an alias for the above, so that:

(?&&sub)\g{sub}

will work and match sometext twice.
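For comparison, this is what current perl does with the backreference case. In this sketch ($top_level and $called are hypothetical names), a group matched at the top level can be backreferenced, while the same group reached only through (?&sub) is left unset after the call, and a backreference to an unset group fails in Perl. That is the gap (?&&sub) is meant to close.

```perl
use strict;
use warnings;

my $subject = "sometextsometext";

# 'sub' matched at the top level: \g{sub} repeats what it captured.
my $top_level = qr/\A(?<sub>sometext)\g{sub}\z/;

# 'sub' only defined, then reached through (?&sub): the call does not leave
# 'sub' set, so the backreference \g{sub} cannot match.
my $called = qr/(?(DEFINE)(?<sub>sometext))\A(?&sub)\g{sub}\z/;

my $top_ok    = $subject =~ $top_level ? 1 : 0;
my $called_ok = $subject =~ $called    ? 1 : 0;
print "top_level=$top_ok called=$called_ok\n";
```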

The same behavior would also apply recursively to any named group defined
inside the subroutine, but not to destructive (i.e. normal) subroutine calls
made inside it:

(?<sub>some(?<text>text))

(?&&sub)\g{text}

will match sometexttext

But if we have:

(?<sub>some(?&text))

the capture 'text' would not exist in the caller, just as it doesn't today.

For the second part of this proposal, I suggest an (*UNDEF:name) verb, used
like this:

(?<sub>some(?<text>text)(?(<facet>)|(?{someperlsub($+{text})})))
(?<facet>)(?#disable code calls)(?&sub)(*UNDEF:facet)(?#enable code calls back)(?&sub)

which would run someperlsub only once.
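Until something like (*UNDEF:name) exists, the facet switch can perhaps be approximated with existing features: a (?(?{ CODE })yes|no) conditional that reads a flag set with local inside (?{ }), so the change is undone on backtracking and at the end of the match. This is a sketch under stated assumptions, not the proposed verb: $facet and @calls are hypothetical names, and push @calls stands in for someperlsub($+{text}).

```perl
use strict;
use warnings;

# Hypothetical flag standing in for the proposed facet group. It must be a
# package variable because local() does not work on lexicals; local() inside
# (?{ }) is undone on backtracking and when the match finishes.
our $facet = 0;

# Hypothetical log of code calls; stands in for someperlsub($+{text}).
my @calls;

# Inside sub, the conditional takes the empty yes-branch while the flag is
# true and runs the code call otherwise. The first (?&sub) is the facet pass
# with code calls disabled; the second call re-enables them.
my $re = qr/
    (?(DEFINE)
        (?<sub> some (?<text>text) (?(?{ $facet })|(?{ push @calls, $+{text} })) )
    )
    (?{ local $facet = 1 }) (?&sub)
    (?{ local $facet = 0 }) (?&sub)
/x;

my $matched = "sometextsometext" =~ $re;
print "matched=", ($matched ? 1 : 0), " calls=", scalar(@calls), "\n";
```

The code call should run once, for the second pass only. Side effects such as push are not undone on backtracking, which is one reason a real (*UNDEF:name) verb with engine support could be cleaner.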

The benefit of this syntax is easier parsing of complex structures (such as
the C programming language) with plain regular expressions.

One potential issue, at least with the first part of this proposal, is
increased memory usage, but I believe a correct implementation could avoid it
completely.

Thanks so much in advance,

Alexander Nikolov
Re: Regular expressions store capture call and undef operator
I've recently found that the opposite may also be needed: declaring a capture
group that behaves like a subroutine, i.e. one whose matches are all discarded
when it returns.

Thanks so much for your time

On Tue, Sep 21, 2021 at 10:21 AM sasho648 <sasho648@gmail.com> wrote: