Mailing List Archive

Bug in AIX signal()? Run my test for me?
I have a problem with configuring a $SIG{CHLD} handler function in AIX
3.2.5 that I believe stems from a bug in signal(). I rewrote this test
program in C using signal() and ran into similar problems. When I
changed to sigset(), it worked properly.

/*
I'm crossposting this to the perl5-porters mailing list, but I don't
read it. If you respond to this there, could you cc your comments to
me? Oh, and sorry about the length, but there's a lot to explain.
*/

Could someone running perl 5 in a non-AIX environment (preferably a
SysVish system, since it may be a bug in AIX but it's poor design in
SysV anyway) run this test script for me? I want to make sure it's a
problem with AIX and not a problem with my program.

The goal of the test is to fork off a controlled number of children. It
should fork 20 children total, but no more than 10 should run at one
time. And every time a child dies, a new one should take its place.
The logic is pretty simple if you read the code.

---BEGIN FORKCNT.PL---
#!/vendr/local/bin/perl

$proc=0; # total child procs
$proccnt=0; # total current procs
$procmax=10; # max simultaneous procs
$proctot=20; # max procs

sub chld {
    $SIG{'CHLD'} = \&chld; # because SysV signal() is broken
    wait;
    $proccnt--;
    print "Caught SIGCHLD... proccnt = $proccnt\n";
    # yes, I know you shouldn't put a print statement in
    # a signal handler with non-reentrant libs. But I
    # needed it for debugging and it hasn't caused any problems.
}

$SIG{'CHLD'} = \&chld;

while ($proc < $proctot) {

    if ($proccnt < $procmax) {

        $proc++;
        $proccnt++;

        $pid = fork;
        # test for failure first: undef == 0 is true in perl
        if (!(defined $pid)) {
            die "fork failed... exiting";
        }
        elsif ($pid == 0) {
            print "Child $proc sez hi\n";
            sleep $proc;
            exit;
        }

    }
    else {
        print "waiting... proc = $proc, proccnt = $proccnt\n";
        sleep 1;
    }
}

print "exiting... proccnt = $proccnt, proc = $proc\n";
exit;

---END FORKCNT.PL---

And here are the results of running it on our AIX system. You'll love
this, it's really screwy.

/home/stagda $ forkcnt.pl
Child 1 sez hi
Child 2 sez hi
Child 3 sez hi
Child 4 sez hi
Child 5 sez hi
Child 6 sez hi
Child 7 sez hi
Child 8 sez hi
Child 9 sez hi
Child 10 sez hi
waiting... proc = 10, proccnt = 10
Caught SIGCHLD... proccnt = 9
Caught SIGCHLD... proccnt = 8
Child 11 sez hi
Child 12 sez hi
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
^C/home/stagda $

See what's happening? It apparently catches the first signal okay,
resets $SIG{CHLD}, catches the second signal, and then fails to catch
any more. A ps shows that the rest of the children did die correctly.
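
For anyone rerunning the test, here is a sketch of a variant that does
not assume one SIGCHLD per dead child. It is not a diagnosis of the AIX
behaviour above, only a way to rule out one variable: pending signals of
the same type are not queued, so a single delivery can stand for several
exits. The handler therefore loops over a non-blocking waitpid(). The
names reap_children and run_pool are made up for this sketch, and the
short select() calls stand in for real work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);          # for WNOHANG

$| = 1;                             # keep children from re-flushing our output

my $reaped = 0;                     # written only by the handler

# Reap every child that has already exited, not just one.
sub reap_children {
    local ($!, $?);                 # don't clobber the main program's status
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
    $SIG{CHLD} = \&reap_children;   # re-arm last, for SysV-style signal()
}

# Hypothetical helper: run $total children, at most $max at a time.
# Returns the number of children reaped.
sub run_pool {
    my ($total, $max) = @_;
    my $started = 0;                # written only by the main loop
    $reaped = 0;
    local $SIG{CHLD} = \&reap_children;

    while ($reaped < $total) {
        if ($started < $total && $started - $reaped < $max) {
            $started++;
            my $pid = fork;
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                select(undef, undef, undef, 0.1);   # stand-in for real work
                exit 0;
            }
        }
        else {
            # brief pause; an arriving signal may cut it short
            select(undef, undef, undef, 0.05);
        }
    }
    return $reaped;
}

print "reaped ", run_pool(20, 10), " children\n";
```

The running count is computed as $started - $reaped rather than kept in
one shared counter, so the main loop and the handler never write to the
same variable.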

<FLAME>
If this is ANOTHER AIX bug, I think smoke will come out of my ears. I
already wasted over a week tracking down a kernel bug with mmap(), and
you can imagine how hard it is for some schmoe developer to convince IBM
there is a kernel bug (at least I got the satisfaction of an
acknowledgement from them, eventually). I don't mind the peculiar AIX
ways of doing some things. I *do* mind finding serious bugs in basic OS
services.
</FLAME>

If this *is* an AIX bug, there is still a workaround, which should maybe
be considered in future releases of perl as a fix for both AIX and other
SysV systems. sigset() is called exactly like signal(). The difference
is that the signal handler remains installed after a signal is caught,
so there is no need to reinstall it inside the handler (a SysV-specific
portability problem that perl shouldn't have). In that respect it is
like BSD signal(). Unlike BSD signal(), it still doesn't restart
interrupted system calls, but neither does the existing method.

This is something that could be done with conditional compilation. Test
for SysV (except R2, but who uses that anymore?), and substitute
sigset() for signal(). This would provide more BSD-like semantics,
enhance portability and reliability, and fix my stupid bug. :}
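
The proposal above is about perl's own C source, which a script cannot
change. As a rough script-level analogue, the sketch below installs a
handler through sigaction() from the POSIX module when the Config module
reports that the platform has it, and otherwise falls back to %SIG with
a handler that re-arms itself. The helper name install_persistent is
made up for this sketch, and using SIGUSR1 to exercise it is only an
assumption chosen to keep the demo self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;
use POSIX qw(:signal_h);

# Map signal names to numbers from perl's build-time configuration.
my %signo;
@signo{ split ' ', $Config{sig_name} } = split ' ', $Config{sig_num};

# Hypothetical helper: install $handler for signal $name so that it stays
# installed across deliveries. Returns the mechanism that was used.
sub install_persistent {
    my ($name, $handler) = @_;
    defined $signo{$name} or die "unknown signal $name";

    if ($Config{d_sigaction}) {
        # sigaction() handlers are not reset when a signal is delivered
        my $action = POSIX::SigAction->new($handler, POSIX::SigSet->new, 0);
        $action->safe(1);           # ask for deferred ("safe") delivery
        defined POSIX::sigaction($signo{$name}, $action)
            or die "sigaction failed: $!";
        return 'sigaction';
    }

    # Fallback: plain %SIG, with the handler re-arming itself each time
    my $wrapper;
    $wrapper = sub { $handler->(@_); $SIG{$name} = $wrapper; };
    $SIG{$name} = $wrapper;
    return 'signal';
}

my $caught = 0;
my $how = install_persistent('USR1', sub { $caught++ });

for my $i (1 .. 3) {
    kill 'USR1', $$;                # signal ourselves
}

print "installed via $how, caught $caught of 3 signals\n";
```

The feature test happens at run time here, where the proposal would do
it at compile time, but the effect is the same: the script no longer
depends on what the underlying signal() does after a delivery.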

--
* David Stagner david_stagner@ncs.com
* National Computer Systems vox 319 354 9200 ext 6884
* Operations Division fax 319 339 6555
I disclaim my employer and I'm sure they'd disclaim me too.

Bug in AIX signal()? Run my test for me?
Would you consider rewriting this program in C? That way if it is a
bug in AIX you have quite substantial proof to show to IBM *). It is a
somewhat unsound tactic to start fixing/retrofitting Perl too much if
only one vendor is acting badly. If it is only one vendor, hints/foo.sh
should be enough. If it is _many_ vendors (such as all SysVish boxes),
then, perhaps, maybe, a more general fix may be wise.

Apropos: did we get any consensus on resetting signal handlers in
SysVish places? It was discussed a while back but the thread died. Andy?
Dave's example just reminded me of that. I think that _is_ a place
where Perl could do some regularising: either turn the handler off by
default or turn it on -- but not depend on the underlying system
doing whatever it pleases.

++jhi;

*) Har-har, substantial proof to a vendor is an oxymoron :-) I have
found POSIX 1003.1 violations and segmentation faults in Digital, HP-UX
and AIX, and none of them got fixed fast (as compared to free software
such as Perl/gcc/...)