Mailing List Archive

FIXED: Re: strange behavior in heartbeat children
Hi there,

) Most of what I do regarding signal handling is normal for daemon
) processes. The only thing I can do that might be funny would be that
) I ignore SIGCHLD. This could be the problem. You could try and
) change it before it starts the scripts, and see if that fixes the
) problem.

This is exactly the problem. The ignored SIGCHLD will be inherited by all
children of heartbeat, and god knows what will happen to them if they
rely on the default disposition. I really believe that every special setup
heartbeat does to itself should be undone before forking unrelated children,
because they may also be affected by the changes in a bad way.
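
To make the failure concrete, here is a minimal sketch (not heartbeat code)
of what a program sees when it runs with SIGCHLD ignored. It assumes
Linux / POSIX.1-2001 semantics, where the children of such a process are
reaped automatically. The helper names reap_one_child and
reap_with_disposition are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately, then try to collect it.
 * Returns 0 if waitpid() collected the child, otherwise the errno
 * value of the failing call.
 */
int reap_one_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return errno;
    if (pid == 0)
        _exit(0);                 /* Child: nothing to do */

    if (waitpid(pid, &status, 0) < 0)
        return errno;             /* e.g. ECHILD when SIGCHLD is ignored */
    return 0;
}

/*
 * Run reap_one_child() with SIGCHLD set to the given disposition
 * (SIG_DFL or SIG_IGN), then restore whatever was set before.
 */
int reap_with_disposition(void (*disp)(int))
{
    void (*old)(int) = signal(SIGCHLD, disp);
    int rc = reap_one_child();

    signal(SIGCHLD, old);
    return rc;
}
```

Under those assumptions, reap_with_disposition(SIG_DFL) should return 0,
while reap_with_disposition(SIG_IGN) should return ECHILD. A script or
daemon that treats that error as a failed command will misreport its
own children.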

Luis Claudio and I solved this problem by calling signal(SIGCHLD, SIG_DFL)
just before forking in a few places... now even httpd startup messages are
ok. Of course, drbd's datadisk is also happy now. :) The altered functions
were req_our_resources and notify_world.
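
The fix can be sketched in isolation as follows. This is an illustration
rather than the heartbeat source: child_ignores_sigchld is a hypothetical
helper that mimics a daemon ignoring SIGCHLD, forks, and reports which
disposition the child would hand on to a script it is about to exec, with
and without the reset.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the forked child still ignores SIGCHLD at the point
 * where it would exec, 0 if it has the default disposition, and -1
 * on error. A non-zero reset_first applies the fix in the child.
 */
int child_ignores_sigchld(int reset_first)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    /* Daemon-style setup: the parent ignores SIGCHLD. */
    void (*old)(int) = signal(SIGCHLD, SIG_IGN);

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        signal(SIGCHLD, old);
        return -1;
    }
    if (pid == 0) {                       /* Child */
        struct sigaction cur;
        char answer;

        close(fds[0]);
        if (reset_first)
            signal(SIGCHLD, SIG_DFL);     /* undo the daemon's setup */

        /* Query the current disposition without changing it. */
        if (sigaction(SIGCHLD, NULL, &cur) < 0)
            _exit(1);
        answer = (cur.sa_handler == SIG_IGN) ? 1 : 0;
        if (write(fds[1], &answer, 1) != 1)
            _exit(1);
        _exit(0);                         /* a real child would exec here */
    }

    /* Parent: read the child's report, then wait for it to go away. */
    char answer = 0;
    close(fds[1]);
    ssize_t n = read(fds[0], &answer, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);                /* blocks until the child is gone */
    signal(SIGCHLD, old);
    return (n == 1) ? answer : -1;
}
```

child_ignores_sigchld(0) should report 1 (the ignore setting leaks into the
child), and child_ignores_sigchld(1) should report 0. Resetting inside the
child branch leaves the daemon's own signal setup untouched.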

Patch follows:

---8<---Cut---Here---8<---
--- heartbeat-0.4.7b/heartbeat/heartbeat.c Fri May 12 16:19:49 2000
+++ linux-ha.new/heartbeat/heartbeat.c Wed May 24 15:13:55 2000
@@ -1199,6 +1316,7 @@
         case 0: { /* Child */
                 int j;
                 make_normaltime();
+                signal(SIGCHLD, SIG_DFL);
                 for (j=0; j < msg->nfields; ++j) {
                         char ename[64];
                         sprintf(ename, "HA_%s", msg->names[j]);
@@ -1759,6 +1967,7 @@
                 break;
         }

+        signal(SIGCHLD, SIG_DFL);
         ha_log(LOG_INFO, "Requesting our resources.");
         sprintf(cmd, HALIB "/ResourceManager listkeys %s", curnode->nodename);
---8<---Cut---Here---8<---

Cheers!
Fábio
( Fábio Olivé Leite -* ConectivaLinux *- olive@conectiva.com[.br] )
( PPGC/UFRGS MSc candidate -*- Advisor: Taisy Silva Weber )
( Linux - Distributed Systems - Fault Tolerance - Security - /etc )

FIXED: Re: strange behavior in heartbeat children
Fábio Olivé Leite wrote:
> Luis Claudio and I solved this problem by calling signal(SIGCHLD, SIG_DFL)
> just before forking in a few places... now even httpd startup messages are
> ok. Of course, drbd's datadisk is also happy now. :) The altered functions
> were req_our_resources and notify_world.

You missed one. There's another in the code to give up all our
resources as well...


-- Alan Robertson
alanr@suse.com

FIXED: Re: strange behavior in heartbeat children
Alan Robertson wrote:
>
> You missed one. There's another in the code to give up all our
> resources as well...

And, they are all in CVS...


-- Alan Robertson
alanr@suse.com