Mailing List Archive

Is quotewords really this slow???
With an input data file of records read into an array of lines, I use
quotewords to break up the record into its component fields, some
of which are quoted strings. The records are not long - 8 fields only.

I'm finding that it takes between 0.3 and 0.4 seconds *PER RECORD*!!!! This
seems so slow that I'm wondering if there is something wrong...

Here is a sample of the loop values before and after the calls.

$line[0] = $_;

@t = times;
printf ("Start_ %3d %6.2f %6.2f <p>",$i,$t[0],$t[1]);

@record = &Text::ParseWords::quotewords('\s+', 0, @line);

@t = times;
printf ("Parsed %3d %6.2f %6.2f <p>",$i,$t[0],$t[1]);

----
Start_ 0 2.95 0.85

Parsed 0 3.55 0.88
----
Start_ 1 3.57 0.88

Parsed 1 3.90 0.90
----
Start_ 2 3.92 0.90

Parsed 2 4.28 0.90
----
Start_ 3 4.30 0.90

Parsed 3 4.65 0.90
----

and so forth for several hundred lines...

The response pushes me well over the limit for human factors on this
application, so I need suggestions on how to get about a factor of
10 improvement in this section of code... (I can hear the moans of
people saying, riight.... yer jokin :-)

Suggestions???
Re: Is quotewords really this slow??? [ In reply to ]
Perhaps someone with nothing to do would like to write ParseWords.xs?
Alternately, someone with something to do that they don't want to do would do.

Larry
Re: Is quotewords really this slow??? [ In reply to ]
Just FYI, I tried it with splits that are specific to the actual record
setup:

# $line[0] = $_;
@t = times;
printf ("Start_ %3d %6.2f %6.2f <p>\n",$i,$t[0],$t[1]);

@a = split(/"/, $_);
@b = split(/\s+/, $a[0]);
@record = (@b, $a[1]);

# @record = &Text::ParseWords::quotewords('\s+', 0, @line);
@t = times;
printf ("Parsed %3d %6.2f %6.2f <p>\n",$i,$t[0],$t[1]);


And this gives responses rather faster:

--
Start_ 0 2.82 0.52

Parsed 0 2.82 0.52
--
Start_ 1 2.83 0.52

Parsed 1 2.85 0.52
--
Start_ 2 2.85 0.52

Parsed 2 2.85 0.52
--
Start_ 3 2.85 0.52

Parsed 3 2.87 0.52
--
Start_ 4 2.88 0.52

Parsed 4 2.90 0.52
--
Start_ 5 2.92 0.52

Parsed 5 2.92 0.52
--

This brings performance within the range of usability on my particular
applications - particularly since it will be on a much faster machine...
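
For anyone wanting to repeat the comparison, here is a minimal sketch that checks the split shortcut against quotewords on one record and then times both with the core Benchmark module. The sample record and the name split_record are made up for illustration, and the shortcut assumes the layout described above: bare whitespace-separated fields followed by a single double-quoted field at the end.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);
use Benchmark qw(timethese);

# Hypothetical record: seven bare fields, then one double-quoted field.
my $line = 'id42 1995 09 07 ok 3 17 "a quoted description"';

# The split shortcut from the message above, wrapped in a sub.
# It is only valid when the single quoted field comes last.
sub split_record {
    my ($rec) = @_;
    my @a = split /"/, $rec;      # (text before the quote, quoted text)
    my @b = split /\s+/, $a[0];   # the bare fields
    return (@b, $a[1]);
}

my @fast = split_record($line);
my @ref  = quotewords('\s+', 0, $line);

# Both approaches should yield the same eight fields for this layout.
print "fields agree\n" if join("\0", @fast) eq join("\0", @ref);

# Time both; the iteration count is arbitrary.
timethese(5000, {
    quotewords => sub { my @r = quotewords('\s+', 0, $line) },
    split      => sub { my @r = split_record($line) },
});
```

The numbers will depend on the machine and the perl version, so treat them as a relative comparison only.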

If someone does write a fast ParseWords, let me know. Also, consider a
version that works slightly differently - one that parses by line.
The input would be an array of lines of text and it would parse
each individual input line into an array and then return an array
of the line arrays. The existing version looks useful for compilers, but
the above is more useful for scarfing up record oriented data files...

An even smarter version might allow you to specify that one record
consists of n lines from the source array...
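
A rough sketch of the per-line interface described above, built on the existing quotewords. The name parse_records and its argument order are hypothetical. It groups every n lines into one record and returns one array ref per record. Joining the lines of a record with a space assumes the delimiter pattern matches a space.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: treat every $n consecutive lines as one record
# and return a list of array refs, one per record.
sub parse_records {
    my ($delim, $keep, $n, @lines) = @_;
    chomp @lines;                 # works on our copy, not the caller's array
    my @records;
    while (my @chunk = splice(@lines, 0, $n)) {
        # Join with a space so fields on adjacent lines stay separate.
        push @records, [ quotewords($delim, $keep, join(' ', @chunk)) ];
    }
    return @records;
}

my @data = (
    qq{alice 30 "New York"\n},
    qq{bob 25 "San Francisco"\n},
);

# One line per record here; pass 2 to fold both lines into one record.
my @records = parse_records('\s+', 0, 1, @data);
print join('|', @$_), "\n" for @records;
```

Current releases of Text::ParseWords also export nested_quotewords, which returns one array ref per input line and covers the one-line-per-record case directly.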
Re: Is quotewords really this slow??? [ In reply to ]
On Thu, 7 Sep 1995, Larry Wall wrote:

> Perhaps someone with nothing to do would like to write ParseWords.xs?
> Alternately, someone with something to do that they don't want to do would do.

Hmmm. This is actually _very_ similar to some of the parsing that I've
been inventing to cope with POD. I'll see if I can whip something up.

> Larry

--
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
Re: Is quotewords really this slow??? [ In reply to ]
Try this. It's a quick hack, and I'm not positive it works properly and
is faster, but it seems both. If this is faster and works, I can clean it
up a bit.

(Part of the reason I'm not sure of the speed is that this bit of code segfaults
under perl5.001m. Sigh.

use Text::ParseWords;
$text = "testasdfa sdas'asfdjnaskdnfas'aslnfdasnd asl nflkasdn lasnd";
for($i=0;$i<200;$i++) {
quotewords("\\s+",1,$text);
}

)
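
To check whether a replacement "works properly", one option is to run it beside the stock routine on a few inputs and list the ones where the results differ. A minimal sketch follows; the helper name find_mismatches and the sample strings are made up, and the candidate is passed in as a code ref so any implementation can be tested this way.

```perl
use strict;
use warnings;
use Text::ParseWords ();

# Hypothetical helper: run a candidate parser and the stock quotewords
# on the same inputs and return the inputs where the results differ.
sub find_mismatches {
    my ($candidate, $delim, $keep, @samples) = @_;
    my @bad;
    for my $text (@samples) {
        my @want = Text::ParseWords::quotewords($delim, $keep, $text);
        my @got  = $candidate->($delim, $keep, $text);
        # Same number of fields, then field-by-field equality
        # (undef is treated as an empty string).
        my $same = @want == @got;
        for my $i (0 .. $#want) {
            last unless $same;
            my $w = defined $want[$i] ? $want[$i] : '';
            my $g = defined $got[$i]  ? $got[$i]  : '';
            $same = $w eq $g;
        }
        push @bad, $text unless $same;
    }
    return @bad;
}

my @samples = (
    q{one two three},
    q{a "b c" d},
    q{say 'hello there' now},
);

# Sanity check: the stock routine trivially agrees with itself.
my @bad = find_mismatches(\&Text::ParseWords::quotewords, '\s+', 0, @samples);
print scalar(@bad), " mismatches\n";
```

Passing a reference to the replacement sub instead of the stock one gives a quick regression check before any timing is done.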


-------------------------
#!/usr/bin/perl

use Carp;

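sub quotewords {
    my($delim,$keep,@lines) = @_;
    my(@q,@q2,$q,$begin,$end);

    local($_)=join("",@lines);

    # Double up literal % so it can be used as an escape marker below.
    s/%/%0%0/g;

    # Hide backslash-escaped characters as %1<ord>%1 so an escaped
    # quote or delimiter is not treated as a real one.
    s/\\(.)/"%1".ord($1)."%1"/ge;

    # Pass 1: cut the text into plain strings and [quote, body] pairs.
    while(length($_)) {
        if( m/([`'"])/g) {
            $q = $1;
            $begin = pos;
            if( ! m/$q/g) {
                croak "unmatched quote";
            }
            $end = pos;
            push(@q,substr($_,0,$begin-1));
            push(@q,[$q,substr($_,$begin,$end-$begin-1)]);
            $_ = substr($_,$end);
        } else {
            push(@q,$_);
            last;
        }
    }

    # Pass 2: split the plain strings on the delimiter; quoted pieces
    # are appended to the current word untouched.
    @q2 = ("");
    for (@q) {
        if(ref($_)) {
            $q2[-1].= $_->[0] if $keep;
            $q2[-1].= $_->[1];
            $q2[-1].= $_->[0] if $keep;
        } else {
            while( m/$delim/g ) {
                $q2[-1].= $`;
                push(@q2,"");
                $_ = $';
                pos = 0;
            }
            $q2[-1].=$_;
        }
    }

    # Undo the two encodings, escapes first.
    for (@q2) {
        s/%1(\d+)%1/"\\".chr($1)/ge;
        s/%0%0/%/g;
    }

    @q2;
}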


--
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)