Mailing List Archive

Long non-greedy matches
Although I see that 1n added Gurusamy's patch of July 4th to regexec.c,
it still seems to fail for very long matches on HP-UX 9.05.

Is the "real fix" to change the embedded 32767's to the maximum value
of an I32 - 1?

Jeff
Re: Long non-greedy matches
On Wed, 01 Nov 1995 15:03:00 PST, Jeff Okamoto wrote:
>Although I see that 1n added Gurusamy's patch of July 4th to regexec.c,
>it still seems to fail for very long matches on HP-UX 9.05.

Could you post a test case? I haven't seen it fail on SunOS yet.

>
>Is the "real fix" to change the embedded 32767's to the maximum value
>of an I32 - 1?
>

That just might work, but I maintained the 32767 because explicit match
counts ({n,m} style) are currently implemented with short int's rather than
long's (making n and m no larger than 65535 :-( ), and I didn't know if that
might interact badly with the dependency on 32767 as the ceiling for "max".

Unless there is a good reason for _this_ rather small limit, a better
overall solution would be to change the short's to long's and make the
hardcoded 32767 into LONG_MAX.
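
To make the limit concrete, here is a minimal C sketch. It is not code from
regcomp.c or regexec.c: the helper names store_count16, store_count_long and
fits_in_ushort are hypothetical, chosen only for illustration. It shows what
happens to a {n,m} count kept in an unsigned short compared with one kept in
a long.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: does a count survive storage in an unsigned short? */
static int fits_in_ushort(long count)
{
    return count >= 0 && count <= (long)USHRT_MAX;
}

/* Hypothetical helper: keep a {n,m} count in an unsigned short, as the
 * thread describes the current code doing. Conversion to an unsigned
 * type is well defined in C and wraps modulo USHRT_MAX + 1. */
static unsigned short store_count16(long count)
{
    assert(count >= 0);
    return (unsigned short)count;
}

/* Hypothetical helper: the proposed widening. The value is kept as is,
 * so the only remaining ceiling is LONG_MAX. */
static long store_count_long(long count)
{
    assert(count >= 0);
    return count;
}
```

Assuming the usual 16-bit short, store_count16(70000L) silently yields 4464,
while store_count_long(70000L) keeps 70000.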

If Larry will comment on this, I'll see if I can come up with a patch.

- Sarathy.
gsar@engin.umich.edu
Re: Long non-greedy matches
: On Wed, 01 Nov 1995 15:03:00 PST, Jeff Okamoto wrote:
: >Although I see that 1n added Gurusamy's patch of July 4th to regexec.c,
: >it still seems to fail for very long matches on HP-UX 9.05.
:
: Could you post a test case? I haven't seen it fail on SunOS yet.
:
: >
: >Is the "real fix" to change the embedded 32767's to the maximum value
: >of an I32 - 1?
: >
:
: That just might work, but I maintained the 32767 because explicit match
: counts ({n,m} style) are currently implemented with short int's rather than
: long's (making n and m no larger than 65535 :-( ), and I didn't know if that
: might interact badly with the dependency on 32767 as the ceiling for "max".
:
: Unless there is a good reason for _this_ rather small limit, a better
: overall solution would be to change the short's to long's and make the
: hardcoded 32767 into LONG_MAX.
:
: If Larry will comment on this, I'll see if I can come up with a patch.

I can pretty much promise you that if you start on this, you'll be
tempted to rewrite the whole dang thing. :-)

Which would be okay by me.

But it would certainly be possible to patch up the current code to
use longer arguments. It's just you'll find code in there that looks
like this:

#ifdef REGALIGN
#ifndef lint
    if (!((long)ret & 1))
        *ret++ = 127;
#endif
#endif
    ptr = ret;
    *ptr++ = op;
    *ptr++ = '\0'; /* Null "next" pointer. */
    *ptr++ = '\0';
#ifdef REGALIGN
    *(unsigned short *)(ret+3) = arg;
#else
    ret[3] = arg >> 8; ret[4] = arg & 0377;
#endif
    ptr += 2;

Some of this is legacy code of Henry's, and some of it is my attempts to
speed up his code. The whole thing needs to be rewritten to use
a parse tree of structs rather than one long string. (Henry originally
had the constraint of needing to free one of these using free(), but
that hasn't been true in Perl for a long time.)
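
As a rough illustration of the "parse tree of structs" idea, here is a
minimal C sketch. Everything in it is an assumption made for the example:
the struct rx_node layout and the rx_new and rx_free names are hypothetical
and do not come from the Perl source. The point is only that counts can be
plain longs and "next" a real pointer, at the cost of needing a tree walk
rather than a single free() to release the program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout for a struct-based regex program. */
struct rx_node {
    int op;                 /* opcode for this node */
    long min, max;          /* repeat counts, no 16-bit ceiling */
    struct rx_node *next;   /* real pointer instead of a packed offset */
    struct rx_node *child;  /* operand subtree, or NULL */
};

/* Allocate one node; returns NULL if malloc fails. */
static struct rx_node *rx_new(int op, long min, long max)
{
    struct rx_node *n;

    assert(min >= 0 && min <= max);
    n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->op = op;
    n->min = min;
    n->max = max;
    n->next = NULL;
    n->child = NULL;
    return n;
}

/* Release a whole tree: iterate along "next", recurse into "child". */
static void rx_free(struct rx_node *n)
{
    while (n != NULL) {
        struct rx_node *next = n->next;
        rx_free(n->child);
        free(n);
        n = next;
    }
}
```

A caller would build nodes with rx_new, link them through next and child,
and release the whole program with one rx_free call on the root.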

Larry
Re: Long non-greedy matches
On Wed, 01 Nov 1995 18:26:56 PST, Larry Wall wrote:
>
>I can pretty much promise you that if you start on this, you'll be
>tempted to rewrite the whole dang thing. :-)
>
>Which would be okay by me.
>

I'll save that for a day when I feel radical and it's raining :-)

>
>Some of this is legacy code of Henry's, and some of it is my attempts to
>speed up his code. The whole thing needs to be rewritten to use
>a parse tree of structs rather than one long string. (Henry originally
>had the constraint of needing to free one of these using free(), but
>that hasn't been true in Perl for a long time.)
>

Yes, I did notice the flat structure and the dependency on type sizes. It
looks like it can't be made any faster (well, except maybe for the two-pass
compile and the extra nodes for backslashed chars), and it works (mostly),
so I'll see what I can do about the size limits for a start.
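
One possible way to raise the size limit inside the flat string, without
relying on the width or alignment of short, is to store the argument byte by
byte. The sketch below is an assumption, not the eventual patch: put_arg16
mirrors the two-byte "ret[3] = arg >> 8; ret[4] = arg & 0377;" store quoted
earlier in the thread, and put_arg32 is a hypothetical four-byte variant.
All four function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Write arg as two bytes, high byte first. Bits above the low 16 are lost. */
static void put_arg16(unsigned char *p, unsigned long arg)
{
    assert(p != NULL);
    p[0] = (unsigned char)((arg >> 8) & 0377);
    p[1] = (unsigned char)(arg & 0377);
}

/* Read back a two-byte argument written by put_arg16. */
static unsigned long get_arg16(const unsigned char *p)
{
    assert(p != NULL);
    return ((unsigned long)p[0] << 8) | (unsigned long)p[1];
}

/* Hypothetical widening: write arg as four bytes, high byte first. */
static void put_arg32(unsigned char *p, unsigned long arg)
{
    int i;

    assert(p != NULL);
    for (i = 0; i < 4; i++)
        p[i] = (unsigned char)((arg >> (8 * (3 - i))) & 0377);
}

/* Read back a four-byte argument written by put_arg32. */
static unsigned long get_arg32(const unsigned char *p)
{
    unsigned long v = 0;
    int i;

    assert(p != NULL);
    for (i = 0; i < 4; i++)
        v = (v << 8) | (unsigned long)p[i];
    return v;
}
```

With these helpers, a count of 70000 comes back as 4464 through the two-byte
pair and as 70000 through the four-byte pair. Widening the field this way
would also change every node offset computed from the old two-byte size.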

- Sarathy.
gsar@engin.umich.edu