
perl5 regexp bug
Hullo Folk!

Perhaps I am truly taxing perl's regular expression
capabilities.

I have discovered a bug (or at least a regexp limitation)
in perl 5.001 and perl 5.001m.

Perl dies with a segmentation fault under the isolated
conditions I detail below.

The current setup:

I have compiled perl5.001m with gcc's -g flag on a 68k machine
running NeXTstep 3.2.

Try this:

On a Unix machine, make a text file (larger than 4K)
that contains no ampersands or semicolons:

grasshopper> ls -al /dev > /tmp/capture;

Add one ampersand (&) to the beginning of this file,
/tmp/capture, with your favorite editor.
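If no suitable /dev listing is at hand, a comparable input can be
generated directly. This is only a sketch: it assumes that any text
larger than 4K with a single leading ampersand and no semicolons
behaves like the edited listing, which has not been verified here.
The filler lines and the name make_capture are made up for illustration.

```perl
#!/usr/bin/perl
use strict;

# Build a test input: one leading '&', followed by more than 4K of
# filler that contains no further ampersands and no semicolons.
# (make_capture and the filler format are hypothetical.)
sub make_capture {
    my ($min_bytes) = @_;
    my $text = '&';
    my $n = 0;
    while (length($text) <= $min_bytes) {
        # Roughly imitates a line of "ls -al /dev" output.
        $text .= sprintf("crw-rw-rw-  1 root  wheel  %d, %d tty%d\n",
                         $n % 16, $n, $n);
        $n++;
    }
    return $text;
}

my $capture = make_capture(4096);
print $capture;
```

Saved under some name such as mkcapture.pl (hypothetical), it would be
run as "perl mkcapture.pl > /tmp/capture".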

Then try this Perl script:

---cut here---
#!/b/penrose/albini/perl
#
# iso.pl (an isolation of an html parsing bug)

require 5.001;

@document = <>;

$document = join(' ', @document);

$document =~ s/&([^&;]|\w|\s)*;//g;

printf STDERR "Made it!!!\n\n";

exit;

---cut here---
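A note on the pattern itself: neither \w nor \s can ever match '&' or
';', so the alternation ([^&;]|\w|\s) accepts exactly the same
characters as the single class [^&;]. The sketch below compares the
original pattern with that simplified form on a small sample. Whether
the simplified form also sidesteps the crash in 5.001m is an
assumption, not something tested here. The helper strip_entities and
the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;

# Pattern as used in iso.pl.
my $original   = '&([^&;]|\w|\s)*;';

# Hypothetical simplification: \w and \s never match '&' or ';',
# so the three-way alternation collapses to the class [^&;].
my $simplified = '&[^&;]*;';

# Remove every match of $pattern from $text and return the result.
sub strip_entities {
    my ($pattern, $text) = @_;
    $text =~ s/$pattern//g;
    return $text;
}

# Small sample with three terminated entities and one bare ampersand.
my $sample = "a &amp; b &lt;c&gt; d & e";

my $out_original   = strip_entities($original,   $sample);
my $out_simplified = strip_entities($simplified, $sample);

print $out_original eq $out_simplified
    ? "same result\n"
    : "different result\n";
```

The simplified pattern repeats a plain character class rather than a
parenthesized group, which may place less strain on the matcher for
long inputs.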

Unfortunately, with the above file, /tmp/capture,
as input, the script never makes it to the end.

Here are the results:

grasshopper> cd ~/lord/perl/perl5.001m
grasshopper> gdb perl
(gdb) run -d ~/perl/traverse/DDT/iso.pl < /tmp/capture
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /homes/penrose/lord/perl/perl5.001m/perl -d
~/perl/traverse/DDT/iso.pl < /tmp/capture

Loading DB routines from $RCSfile: perl5db.pl,v $$Revision: 4.1
$$Date: 92/08/07 18:24:07 $
Emacs support available.

Enter h for help.

main::(/b/penrose/perl/traverse/DDT/iso.pl:3):
3: require 5.001;
DB<1> r
Program generated(1): Memory access exception on address 0x3f7ffd8
(invalid address).
Reading in symbols for regexec.c...done.
0x4242e in regmatch (prog=0x1c0f3d "\t") at regexec.c:570
570 {
(gdb)


If you remove the ampersand from /tmp/capture, the script should run
to completion without any problems.

I haven't had the time to figure out regmatch() and why this
memory fault occurs.



Christopher Penrose
penrose@ucsd.edu
http://www-crca.ucsd.edu/TajMahal/after.html