I have a problem configuring a $SIG{CHLD} handler function under AIX
3.2.5 that I believe stems from a bug in signal(). I rewrote the test
program below in C using signal() and ran into similar problems. When I
changed to sigset(), it worked properly.
/*
I'm crossposting this to the perl5-porters mailing list, but I don't
read it. If you respond to this there, could you cc your comments to
me? Oh, and sorry about the length, but there's a lot to explain.
*/
Could someone running perl 5 in a non-AIX environment (preferably a
SysVish system, since it may be a bug in AIX but it's poor design in
SysV anyway) run this test script for me? I want to make sure it's a
problem with AIX and not a problem with my program.
The goal of the test is to fork off a controlled number of children. It
should fork 20 children total, but no more than 10 should run at one
time. And every time a child dies, a new one should take its place.
The logic is pretty simple if you read the code.
---BEGIN FORKCNT.PL---
#!/vendr/local/bin/perl

$proc=0;        # total child procs
$proccnt=0;     # total current procs
$procmax=10;    # max simultaneous procs
$proctot=20;    # max procs

sub chld {
    $SIG{'CHLD'} = \&chld;  # because SysV signal() is broken
    wait;
    $proccnt--;
    print "Caught SIGCHLD... proccnt = $proccnt\n";
    # yes, I know you shouldn't put a print statement in
    # a signal handler with non-reentrant libs. But I
    # needed it for debugging and it hasn't caused any problems.
}

$SIG{'CHLD'} = \&chld;

while ($proc < $proctot) {
    if ($proccnt < $procmax) {
        $proc++;
        $proccnt++;
        if (($pid = fork) == 0) {
            print "Child $proc sez hi\n";
            sleep $proc;
            exit;
        }
        elsif (!(defined $pid)) {
            die "fork failed... exiting";
        }
    }
    else {
        print "waiting... proc = $proc, proccnt = $proccnt\n";
        sleep 1;
    }
}

print "exiting... proccnt = $proccnt, proc = $proc\n";
exit;
---END FORKCNT.PL---
And here are the results of running it on our AIX system. You'll love
this, it's really screwy.
/home/stagda $ forkcnt.pl
Child 1 sez hi
Child 2 sez hi
Child 3 sez hi
Child 4 sez hi
Child 5 sez hi
Child 6 sez hi
Child 7 sez hi
Child 8 sez hi
Child 9 sez hi
Child 10 sez hi
waiting... proc = 10, proccnt = 10
Caught SIGCHLD... proccnt = 9
Caught SIGCHLD... proccnt = 8
Child 11 sez hi
Child 12 sez hi
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
waiting... proc = 12, proccnt = 10
^C/home/stagda $
See what's happening? It apparently catches the first signal okay,
resets $SIG{CHLD}, catches the second signal, and then fails to catch
any more. A ps shows that the rest of the children did die correctly.
<FLAME>
If this is ANOTHER AIX bug, I think smoke will come out of my ears. I
already wasted over a week tracking down a kernel bug with mmap(), and
you can imagine how hard it is for some schmoe developer to convince IBM
there is a kernel bug (at least I got the satisfaction of an
acknowledgement from them, eventually). I don't mind the peculiar AIX
ways of doing some things. I *do* mind finding serious bugs in basic OS
services.
</FLAME>
If this *is* an AIX bug, there is still a workaround, which should maybe
be considered in future releases of perl as a fix for both AIX and other
SysV systems. sigset() is called the same way as signal(). The
difference is that the signal handler remains installed after a signal
is caught (eliminating the need to reinstall the handler inside it,
which leads to SysV-specific portability problems that perl shouldn't
have). In this respect it behaves like BSD signal(). Unlike BSD
signal(), it still doesn't restart interrupted system calls, but neither
does the existing method.
This is something that could be done with conditional compilation. Test
for SysV (except R2, but who uses that anymore?), and substitute
sigset() for signal(). This would provide more BSD-like semantics,
enhance portability and reliability, and fix my stupid bug. :}
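To make the idea concrete, here is a rough C sketch of the substitution.
It is only an illustration under assumptions, not anything taken from
the perl source: HAS_SIGSET is a made-up configure-time macro, and
install_handler(), count_signal() and caught are hypothetical names. The
_XOPEN_SOURCE define is there because some systems only declare sigset()
when it is requested.

```c
#define _XOPEN_SOURCE 500   /* assumption: some systems need this to declare sigset() */
#include <signal.h>

typedef void (*sig_handler_t)(int);

/*
 * Install a handler for signo.
 * HAS_SIGSET is a hypothetical macro that a configure step would define
 * on SysV-style systems that provide sigset().
 */
sig_handler_t install_handler(int signo, sig_handler_t handler)
{
#ifdef HAS_SIGSET
    /* sigset() leaves the handler installed after each delivery */
    return sigset(signo, handler);
#else
    /* fall back to whatever semantics the platform's signal() has */
    return signal(signo, handler);
#endif
}

/* Example handler: only counts deliveries, which is safe in a handler */
volatile sig_atomic_t caught = 0;

void count_signal(int signo)
{
    (void)signo;
    caught++;
}
```

Built with -DHAS_SIGSET, a call such as install_handler(SIGCHLD,
count_signal) should keep the handler in place across deliveries, so the
handler would not need to reinstall itself. Built without it, the
behavior is whatever the local signal() provides.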
--
* David Stagner david_stagner@ncs.com
* National Computer Systems vox 319 354 9200 ext 6884
* Operations Division fax 319 339 6555
I disclaim my employer and I'm sure they'd disclaim me too.