Hello,
I'm implementing a C compiler with Perl regular expressions and mid-pattern
code execution. Part of how it works relies on the captured named groups
being available at the point of the code in (?{ }). Currently, however, a
subroutine call destroys the capture context when it returns.
For now I can embed the pattern directly instead of issuing a subroutine
call, but it would be nice to avoid the duplication.
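
To make the problem concrete, here is a minimal sketch of the behavior as I understand it from perlre. The input string and the variable names are just for illustration; the (?(DEFINE)...) wrapper is only there so the group can be called without being matched in place.

```perl
use strict;
use warnings;

my $input = 'sometext';

# Body embedded directly: the named capture 'text' survives the match.
my $inline_text;
if ($input =~ /(?<sub>some(?<text>text))/) {
    $inline_text = $+{text};
}

# Same body reached through a subroutine call: captures set during the
# call are reset when it returns, so 'text' is undefined in the caller.
my $called_text;
if ($input =~ /(?(DEFINE)(?<sub>some(?<text>text)))(?&sub)/) {
    $called_text = $+{text};
}

printf "inline: %s\n", $inline_text // '(undef)';
printf "called: %s\n", $called_text // '(undef)';
```

Both patterns match the input, but only the embedded form leaves $+{text} set afterwards.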
The other issue is that I'm matching out of order so that I can feed
in-order information to the compiler backend. For that I need to create what
I call 'facets': copies of the same pattern but without code calls, used to
fill in the match when backtracking. The same motivation applies here: I
want to avoid code duplication.
I'm proposing non-destructive subroutine calls with the syntax
(?&&sub)
which will behave the same as if the subroutine body were embedded as
text inside the pattern.
So if we have:
(?<sub>sometext)
then (?&&sub) will be an alias for the above, and so:
(?&&sub)\g{sub}
will work and match sometext twice.
The same behavior will also apply recursively to any named group defined
inside the subroutine, but it won't apply to destructive (aka normal)
subroutine calls. Given:
(?<sub>some(?<text>text))
the pattern
(?&&sub)\g{text}
will match sometexttext
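
The intended result can be approximated today by interpolating a qr// object, which is the "embed the pattern directly" workaround without repeating the text by hand. This is only a sketch of the semantics I would expect from (?&&sub), not of how it would be implemented; the variable name $sub is just for illustration.

```perl
use strict;
use warnings;

# The subroutine body, written once and interpolated where needed.
my $sub = qr/(?<sub>some(?<text>text))/;

# Interpolation pastes the body into the outer pattern, so the named
# groups 'sub' and 'text' belong to the caller and \g{text} can see them.
my ($sub_capture, $text_capture);
if ('sometexttext' =~ /\A$sub\g{text}\z/) {
    $sub_capture  = $+{sub};
    $text_capture = $+{text};
}

printf "sub:  %s\n", $sub_capture  // '(undef)';
printf "text: %s\n", $text_capture // '(undef)';
```

Interpolation does not help with recursive grammars, though, which is why a call syntax that keeps the captures would still be useful.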
But if we have something like:
(?<sub>some(?&text))
the capture 'text' (as is currently the case) won't exist in the caller.
For the second part of this proposal I suggest an (*UNDEF:name) verb, used
like this:
(?<sub>some(?<text>text)(?(<facet>)|(?{someperlsub($+{text})})))
(?<facet>)(?#disable code calls)(?&sub)(*UNDEF:facet)(?#enable code calls back)(?&sub)
which will call someperlsub only a single time.
The benefit of this syntax is easier parsing of complex structures (like
the C programming language) with plain regular expressions.
A potential issue, at least with the first part of this proposal, is
increased memory use, but I feel that if implemented correctly this
could be avoided completely.
Thanks so much in advance,
Alexander Nikolov