Mailing List Archive

SEGV ($_ and arithmetical "for")
Sorry about the vagueness, but I've written a script which is throwing
phase-of-moon SEGVs at me. The only thing the crashes seem to have in
common is that I'm using an arithmetical for (C-style for) and
manipulating $_ within it. This is on Linux with perl 5.001l.

I'm sorry I can't give a reproducible example, but the problem comes and
goes with trivial changes, e.g. adding print statements, running on a
different machine, etc. Can anyone shed any light?

This routine (sometimes) threw a SEGV at me while it read the global $_
rather than taking $source as a parameter. It would go round the loop
3500 times on the first call, then crash 311 iterations into the second
call.

sub addcounts { # size, threshold, source
    my ( $size, $threshold, $source ) = @_;
    print STDERR "size=$size threshold=$threshold\n";
    my %count = ();
    my $end = length($source) - $size;
    warn "end=$end size=$size length=", length($source), "\n";
    for ($i = 0; $i < $end; $i++) {
        my $str = substr($_, $i, $size);    # <<<SEGV HERE
        ##fix using this: my $str = substr($source, $i, $size);
        $count{$str}++ unless $str =~ /[\0\n]/;
    }
    while (($k, $v) = each %count) {
        push @counts, [$k, $v * (length($k) - 1)] if $v >= $threshold;
    }
    print STDERR ": ", scalar(keys %count), " words, ",
        scalar(@counts), " over threshold\n";
}
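For comparison, a minimal sketch of addcounts() that reads only its
$source argument and keeps its loop counter and results in lexicals.
It assumes that returning the weighted counts is acceptable in place of
pushing onto the global @counts, drops the STDERR diagnostics, and uses
`<=` so the last $size-character window is counted (the loop above stops
one window short). Under `use strict`, the undeclared globals $i, $k, $v
and @counts in the routine above would be compile-time errors.

```perl
use strict;
use warnings;

# Sketch (assumption: results are returned rather than pushed onto a
# global @counts). Counts every $size-character window of $source and
# keeps those seen at least $threshold times, weighted by length - 1.
sub addcounts {
    my ($size, $threshold, $source) = @_;
    my %count;
    # Last valid start offset for a $size-character window.
    my $end = length($source) - $size;
    for (my $i = 0; $i <= $end; $i++) {
        my $str = substr($source, $i, $size);
        # Skip windows containing NUL or newline, as the original did.
        $count{$str}++ unless $str =~ /[\0\n]/;
    }
    my @counts;
    # Sorted keys give a deterministic order (each() does not).
    for my $k (sort keys %count) {
        my $v = $count{$k};
        push @counts, [$k, $v * (length($k) - 1)] if $v >= $threshold;
    }
    return @counts;
}
```

For example, `addcounts(2, 2, "abab")` sees the windows "ab", "ba", "ab"
and keeps only "ab", with weight 2 * (2 - 1).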

Having decided that this was too vague to report, I have now run into
another one. I'm counting the occurrences of REs in a string, and this
statement, later in the program, is SEGVing:

for $ch (@counts[0 .. ($#counts > 10 ? 10 : $#counts)]) {
    $ch->[1] = ( s/\Q$ch->[0]/$ch->[0]/g ) * (length($ch->[0]) - 1);

This loop is within another arithmetical "for": for($slot=255; $slot>0; $slot--)

I feel sure there must be a better way of counting the matches of an RE
in a string. I don't want to pull all the matches into a vector, since
this could be too large for comfort when I run this for real.
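On counting matches without building a list: a match with //g in scalar
context advances pos() one match at a time, so looping on it counts
occurrences without ever holding all the matches at once (unlike
`() = /.../g` or `for (/.../g)`). A sketch with a hypothetical helper
count_literal(); like the s///g above, it counts non-overlapping
matches. It guards against an empty needle because in Perl an empty
pattern reuses the last successfully matched pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: count non-overlapping occurrences of the literal
# string $needle in $text, one match at a time.
sub count_literal {
    my ($text, $needle) = @_;
    # An empty pattern would silently reuse the last successful regex.
    return 0 unless length $needle;
    my $n = 0;
    # \Q...\E quotes metacharacters; scalar //g advances pos($text).
    $n++ while $text =~ /\Q$needle\E/g;
    return $n;
}
```

In the loop above this might become
`$ch->[1] = count_literal($_, $ch->[0]) * (length($ch->[0]) - 1);`,
which also avoids rewriting $_ on every pass.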

Thank you for reading this far :-)

Ian
Re: SEGV ($_ and arithmetical "for")
Larry Wall <lwall@netlabs.com> wrote:

> The following has been submitted to the Perl buglist on your behalf. The
> bug number is not yet assigned, but will probably be NETaa14717 or so.
>
> Sincerely,
> Larry's perlbug program
>
> From: ian@pipex.net (Ian Phillipps)
>
> Headline: SEGV ($_ and arithmetical "for")
> Severity: 1

I have now become convinced that this problem lies with the handling of
'sort', and only manifests itself if the number of elements to be sorted
is of the order of a few thousand or more.

Further, it appears that a different problem (incorrect sort order rather
than a core dump) results from using an in-line block in the sort command.

I had occasion (as one does) to count words in the Bible (Project
Gutenberg edition 10b). Applying this:
for (/\b([a-zA-Z]\w*)\b/g) { $w{$_}++ }
to the text of the Bible resulted in a 13561-entry %w.

Doing this:
@w = sort { $w{$b} <=> $w{a} } keys %w;
produced an @w which was incorrectly sorted, the first few words being:

the shall and that And to of in he unto his I Dread sudden
stacte heaviness interpreted graveclothes rending slander
muttered until baketh interpreter Passing

Doing this:
sub wb { $w{$b} <=> $w{$a} };
@ww = sort wb keys %w;
produced a correctly sorted list, which seems to exonerate qsort
itself:
the and of to And that in shall he unto I his a for they be is
him LORD not them with it all thou was
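A sketch of the same word count and frequency sort, for comparison with
both forms above. Two changes are assumptions rather than part of the
report: the count uses scalar-context //g instead of `for (/.../g)`,
which builds a list of every word in the text first, and the comparator
breaks ties alphabetically so the order is fully determined. Note also
that, as quoted, the in-line block reads `$w{a}`, which looks up the
literal key "a" rather than `$w{$a}`; the sketch uses `$a` and `$b`
throughout. The function names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count words one match at a time.
sub word_counts {
    my ($text) = @_;
    my %w;
    $w{$1}++ while $text =~ /\b([a-zA-Z]\w*)\b/g;
    return \%w;
}

# Hypothetical helper: keys ordered by descending count, then by word.
sub by_count_desc {
    my ($w) = @_;
    # Both $a and $b are used; the cmp tie-break is an addition.
    return sort { $w->{$b} <=> $w->{$a} or $a cmp $b } keys %$w;
}
```

For example, on "the cat and the dog and the end" the order is "the",
"and", then "cat", "dog", "end".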

Perl 5.001l; Linux 1.2.11, GCC 2.58 (a.out) shared libraries
/lib/libc.so.4.6.27. Default build options taken throughout.

Hmm... repeating this by loading the word counts from a stored file
(rather than counting them from the original text) gives a sort that
works. :-(

Ian
Do you think that the phrase in the mis-sorted list "muttered until
baketh interpreter" is trying to tell us something? :-)
Re: SEGV ($_ and arithmetical "for")
On Sat, 28 Oct 1995, Ian Phillipps wrote:

> ...
> Ian
> Do you think that the phrase in the mis-sorted list "muttered until
> baketh interpreter" is trying to tell us something? :-)

Dunno, but I'd suggest someone should use it in their .sig file. :-)

"muttered until baketh interpreter"
-- The Bible, as told by Perl5

On a more practical note, can you try moving the entire thing to a
different machine, or plugging in a different qsort()? It seems unlikely
that the system qsort would be broken, but if it were...

--
Kenneth Albanowski (kjahds@kjahds.com, CIS: 70705,126)
Re: SEGV ($_ and arithmetical "for")
Ian Phillipps writes:
>
> I have now become convinced that this problem lies with the handling of
> 'sort', and only manifests itself if the number of elements to be sorted
> is of the order of a few thousand or more.
>
> Further, it appears that a problem - of incorrect sort order, not a core
> dump - results from using an in-line block in the sort command.
>

I mumbled about this long before the bug database was available, so the
database may have missed it. On the other hand, it may be there, at
severity 7 or 8 ;-). What I say below is probably not related to the
above problem, but maybe...

The Perl stack may be relocated if it needs to grow, so after any "long"
operation (such as a sub call) you will see SPAGAIN, which refetches the
value of sp. Sort operates on the stack itself, so the builtin sort just
pre-extends the stack by 20 (in case a signal comes in) and runs qsort
on the stack.

A sort with a code block, however, cannot do this, so it switches to a
separate sortstack and calls the subroutine from there, sorting the
values on the previous stack.

Now note that a sort inside a sort subroutine will cause great havoc.

Note also that a simple operation, like popping a value off the stack,
may result in a subroutine call if the value is tied. Since the
addition is implemented as

    pop
    pop
    add
    push

without SPAGAIN in between (for speed), just adding something to a
tied value may result in a SEGV. Larry answered this by noting that tie
methods may run on a different stack too, but this may fail if you
operate on tied values inside the tie subroutine.
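A Perl-level sketch of the tied case: the hypothetical class
CountingScalar records each FETCH, and a plain `$x + 2` on a tied $x is
enough to run it. It does not reproduce the stack problem described
above; it only shows that an ordinary arithmetic operand can turn into a
method call in the middle of the add.

```perl
use strict;
use warnings;

# Hypothetical tied-scalar class that counts how often FETCH runs.
package CountingScalar;

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

sub FETCH {
    my ($self) = @_;
    $self->{fetches}++;    # Perl code runs whenever the value is read
    return $self->{value};
}

sub STORE {
    my ($self, $value) = @_;
    $self->{value} = $value;
}

package main;

# tie returns the underlying object, so we can inspect the counter.
my $obj = tie my $x, 'CountingScalar', 40;
my $sum = $x + 2;    # reading $x for the addition calls FETCH
```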

My proposal was (of course!) to have a linked list of stacks, simply
switching to the next one in the scenarios above. As you can see, it has
a very low priority.

If anyone is interested, look for SWITCHSTACK in the source.

Ilya