Mailing List Archive: signal handling breaks gdb on NetBSD - fixed in CVS

signal handling breaks gdb on NetBSD - fixed in CVS

Jul 13, 2004, 6:56 AM

Post #1 of 13 (1835 views)

Another person at BBN had trouble running gdb on ospfd. gdb would
intermittently lock up.

After running gdb on gdb, I determined that breakpoints seemed to
occur in the sigprocmask14 system call, not at the requested point.
In lib/sigevent.c:quagga_signal_timer, all signals are blocked while
signal handlers are run. (I'm not sure why this is necessary, but I'm
probably missing something.) Even SIGTRAP is blocked, and on NetBSD
2.0_BETA2 this seems to work - even for breakpoints. The sigprocmask
man page says:

The system quietly disallows SIGKILL or SIGSTOP to be blocked.

I have committed a change to quagga_signal_timer to refrain from
blocking SIGTRAP. This should either help people with gdb or be
harmless.

I'm curious if others have had trouble with gdb.

--
Greg Troxel <gdt@ir.bbn.com>

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 14, 2004, 4:17 AM

Post #2 of 13 (1813 views)

On Tue, 13 Jul 2004, Greg Troxel wrote:

> In lib/sigevent.c:quagga_signal_timer, all signals are blocked while
> signal handlers are run. (I'm not sure why this is necessary, but I'm
> probably missing something.)

Because it examines the quagga_sigevent_master_t structure, one
member of which, caught, is set from signal context. Now, we could
possibly drop the blocking altogether actually, if the following two
operations are atomic on all architectures we care about:

(volatile sig_atomic_t) = 0
(volatile sig_atomic_t)++

I'm not sure whether they are, so i blocked signals to be safe. If
those operations are guaranteed atomic, (increment might very well
be, however I'm not sure the store of 0 would be).

If they are not atomic, the other possibility is to only block the
specific signal concerned, and only block the comparison and the
increment. However, that seems like a lot of overhead.

It would be nice though to not run the 'heavy' handler function with
signals blocked.

> Even SIGTRAP is blocked, and on NetBSD
> 2.0_BETA2 this seems to work - even for breakpoints.

At a guess, debuggers need to be able to block SIGTRAP from being
passed to the process being debugged?

> The sigprocmask
> man page says:
>
> The system quietly disallows SIGKILL or SIGSTOP to be blocked.
>
> I have committed a change to quagga_signal_timer to refrain from
> blocking SIGTRAP. This should either help people with gdb or be
> harmless.
>
> I'm curious if others have had trouble with gdb.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
If you sell diamonds, you cannot expect to have many customers.
But a diamond is a diamond even if there are no customers.
-- Swami Prabhupada

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 14, 2004, 4:52 AM

Post #3 of 13 (1817 views)

On Wed, 14 Jul 2004, Paul Jakma wrote:

> If they are not atomic, the other possibility is to only block the specific
> signal concerned, and only block the comparison and the increment. However,
> that seems like a lot of overhead.

Eg, something like the below, compiles-but-not-tested, patch. Though,
we can just strip the signal blocking altogether if someone knows for
sure whether sig_atomic_t can be counted on to be atomic as its
name suggests. :)

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Demand the establishment of the government in its rightful home at Disneyland.

Index: sigevent.c
===================================================================
RCS file: /var/cvsroot/quagga/lib/sigevent.c,v
retrieving revision 1.1
diff -u -r1.1 sigevent.c
--- sigevent.c 19 Jan 2004 21:23:37 -0000 1.1
+++ sigevent.c 14 Jul 2004 11:52:29 -0000
@@ -54,39 +54,50 @@
int
quagga_signal_timer (struct thread *t)
{
- sigset_t newmask, oldmask;
+ sigset_t sset;
struct quagga_sigevent_master_t *sigm;
struct quagga_signal_t *sig;
int i;

sigm = THREAD_ARG (t);
-
- /* block all signals */
- sigfillset (&newmask);
- if ( (sigprocmask (SIG_BLOCK, &newmask, &oldmask)) < 0)
- {
- zlog_err ("quagga_signal_timer: couldnt block signals!");
- sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer,
- &sigmaster, QUAGGA_SIGNAL_TIMER_INTERVAL);
- return -1;
- }
-
+
+ sigemptyset (&sset);
+
for (i = 0; i < sigm->sigc; i++)
{
sig = &(sigm->signals[i]);
+
+ sigaddset (&sset, sig->signal);
+
+ if ((sigprocmask (SIG_BLOCK, &sset, NULL)) < 0)
+ {
+ zlog_err ("quagga_signal_timer: couldnt block signal %d!",
+ sig->signal);
+ sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer,
+ &sigmaster,
+ QUAGGA_SIGNAL_TIMER_INTERVAL);
+ return -1;
+ }
+
+ sigdelset (&sset, sig->signal);
+
if (sig->caught > 0)
{
sig->caught = 0;
- sig->handler();
+
+ if (sigprocmask (SIG_UNBLOCK, &sset, NULL) < 0)
+ zlog_err ("quagga_signal_timer: couldnt unblock signal %d!",
+ sig->signal);
+
+ sig->handler ();
}
+ else if (sigprocmask (SIG_UNBLOCK, &sset, NULL) < 0)
+ zlog_err ("quagga_signal_timer: couldnt unblock signal %d!",
+ sig->signal);
}
-
- sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer, &sigmaster,
- QUAGGA_SIGNAL_TIMER_INTERVAL);
-
- if ( sigprocmask (SIG_UNBLOCK, &oldmask, NULL) < 0 );
- return -1;
-
+
+ sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer, &sigmaster,
+ QUAGGA_SIGNAL_TIMER_INTERVAL);
return 0;
}

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 14, 2004, 6:58 AM

Post #4 of 13 (1811 views)

It would be nice though to not run the 'heavy' handler function with
signals blocked.

I'm not sure it matters - all the handler does is set caught, and the
real handler will run at next timer interval. So it won't speed up
processing.

One possibly useful thing to do is to have a master 'signal handler
runnable' flag that gets set by any signal, and have the select loop
run the signal handler dispatch when that's set on return from
select. That would make signal handling prompt while keeping the
safety properties we have now.

I don't see the need to make the locking more fine-grained, since the
code we avoid deferring isn't the actual handler.

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 14, 2004, 6:59 AM

Post #5 of 13 (1810 views)

I don't see the need to make the locking more fine-grained, since the
code we avoid deferring isn't the actual handler.

Plus, the fewer system calls the better, if it doesn't gain us
anything.

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 14, 2004, 7:06 AM

Post #6 of 13 (1811 views)

On Wed, 14 Jul 2004, Greg Troxel wrote:

> I'm not sure it matters - all the handler does is set caught, and
> the real handler will run at next timer interval. So it won't
> speed up processing.

No i mean the non-sigcontext handler, sig->handler() - not the stub
handler which sigevent sets up and is called in signal context.
sig->handler might _do_ anything (whole reason for existence of
sigevent is that some daemons were doing things like running long
long shut down paths in sigcontext, calling non-sigcontext-safe
library functions).

> One possibly useful thing to do is to have a master 'signal handler
> runnable' flag that gets set by any signal, and have the select
> loop run the signal handler dispatch when that's set on return from
> select. That would make signal handling prompt while keeping the
> safety properties we have now.

Good point.

> I don't see the need to make the locking more fine-grained, since the
> code we avoid deferring isn't the actual handler.

it is, sig->handler() is pointer to god knows what code path.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Are you ever going to do the dishes? Or will you change your major to biology?

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 14, 2004, 10:11 AM

Post #7 of 13 (1815 views)

On Wed, 14 Jul 2004, Paul Jakma wrote:

> Because it examines the quagga_sigevent_master_t structure, one
> member of which, caught, is set from signal context. Now, we could
> possibly drop the blocking altogether actually, if the following
> two operations are atomic on all architectures we care about:
>
> (volatile sig_atomic_t) = 0
> (volatile sig_atomic_t)++

Well, as the name suggests, sig_atomic_t really should be safe to
access on any sane architecture. So i can strip the blocking
altogether.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Nothing increases your golf score like witnesses.

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 15, 2004, 1:45 AM

Post #8 of 13 (1815 views)

On Wed, 14 Jul 2004, Greg Troxel wrote:

> One possibly useful thing to do is to have a master 'signal handler
> runnable' flag that gets set by any signal, and have the select
> loop run the signal handler dispatch when that's set on return from
> select. That would make signal handling prompt while keeping the
> safety properties we have now.

Hmm.. We could create another thread queue i guess, THREAD_SIGNAL or
somesuch. The natural thing would be to have each thread == signal,
could remove the statically allocated sigmaster in that case i think.
but is prompt handling of signals _that_ important? They'll be
handled within 2s at present, presuming no other threads hog run
time.

till then, what about below?

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Their idea of an offer you can't refuse is an offer... and you'd better
not refuse.

Index: ChangeLog
===================================================================
RCS file: /var/cvsroot/quagga/lib/ChangeLog,v
retrieving revision 1.21
diff -u -r1.21 ChangeLog
--- ChangeLog 13 Jul 2004 13:47:25 -0000 1.21
+++ ChangeLog 15 Jul 2004 08:38:55 -0000
@@ -1,3 +1,11 @@
+2004-07-14 Paul Jakma <paul@dishone.st>
+
+ * sigevent.c: (quagga_signal_handler) add a global caught flag.
+ (quagga_signal_timer) dont block signals at all.
+ sig->caught is volatile sig_atomic_t and should be safe to access
+ from signal and normal contexts. The signal blocking is unneeded
+ paranoia. Check global caught flag..
+
2004-07-13 Greg Troxel <gdt@poblano.ir.bbn.com>

* sigevent.c: Don't block SIGTRAP and SIGKILL. Blocking SIGTRAP
Index: sigevent.c
===================================================================
RCS file: /var/cvsroot/quagga/lib/sigevent.c,v
retrieving revision 1.2
diff -u -r1.2 sigevent.c
--- sigevent.c 13 Jul 2004 13:47:25 -0000 1.2
+++ sigevent.c 15 Jul 2004 08:38:55 -0000
@@ -28,9 +28,10 @@
struct thread_master *tm;
struct thread *t;

- struct quagga_signal_t *signals;
+ struct quagga_signal_t *signals;
int sigc;
-
+
+ volatile sig_atomic_t caught;
} sigmaster;

/* Generic signal handler
@@ -49,51 +50,37 @@
if (sig->signal == signo)
sig->caught++;
}
+
+ sigmaster.caught++;
}

int
quagga_signal_timer (struct thread *t)
{
- sigset_t newmask, oldmask;
struct quagga_sigevent_master_t *sigm;
struct quagga_signal_t *sig;
int i;

sigm = THREAD_ARG (t);
-
- /*
- * Block most signals, but be careful not to defer SIGTRAP because
- * doing so breaks gdb, at least on NetBSD 2.0. Avoid asking to
- * block SIGKILL, just because we shouldn't be able to do so.
- */
- sigfillset (&newmask);
- sigdelset (&newmask, SIGTRAP);
- sigdelset (&newmask, SIGKILL);

- if ( (sigprocmask (SIG_BLOCK, &newmask, &oldmask)) < 0)
- {
- zlog_err ("quagga_signal_timer: couldnt block signals!");
- sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer,
- &sigmaster, QUAGGA_SIGNAL_TIMER_INTERVAL);
- return -1;
- }
-
- for (i = 0; i < sigm->sigc; i++)
+ if (sigm->caught > 0)
{
- sig = &(sigm->signals[i]);
- if (sig->caught > 0)
+ sigm->caught = 0;
+
+ for (i = 0; i < sigm->sigc; i++)
{
- sig->caught = 0;
- sig->handler();
+ sig = &(sigm->signals[i]);
+
+ if (sig->caught > 0)
+ {
+ sig->caught = 0;
+ sig->handler ();
+ }
}
}
-
- sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer, &sigmaster,
- QUAGGA_SIGNAL_TIMER_INTERVAL);

- if ( sigprocmask (SIG_UNBLOCK, &oldmask, NULL) < 0 );
- return -1;
-
+ sigm->t = thread_add_timer (sigm->tm, quagga_signal_timer, &sigmaster,
+ QUAGGA_SIGNAL_TIMER_INTERVAL);
return 0;
}

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 15, 2004, 4:21 AM

Post #9 of 13 (1807 views)

without blocking, we need to be careful about ordering, esp. with two
variables, so this could use comments.

I had meant to check the master flag at return from select to get
promptness, rather than avoiding the for loop over memory. IMHO it's
annoying to send a SIGTERM and have it take 2 seconds to exit; there
is no good reason why it shouldn't be prompt. This has actually
caused me grief - I have a script to send sigterms, wait and then
restart everything. Daemons were failing to restart because the old
one hadn't exited yet.

I don't see that it is safe to assume sig_atomic_t can be incremented,
unless POSIX says so, but perhaps all we care about that is that it
ends up non-zero. We could assign 1 to it instead, since we aren't
counting, and that seems safer, not needing a load/store on risc
architectures or a RMW cycle on CISC.

I'd leave the blocking code ifdef 0 for now; we may need to add a
configure test to enable it later.

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 15, 2004, 4:41 AM

Post #10 of 13 (1812 views)

On Thu, 15 Jul 2004, Greg Troxel wrote:

> without blocking, we need to be careful about ordering, esp. with
> two variables,

Yep. That's why the global caught is cleared immediately compare and
not checked again.

> so this could use comments.

Ok.

> I had meant to check the master flag at return from select to get
> promptness, rather than avoiding the for loop over memory. IMHO
> it's annoying to send a SIGTERM and have it take 2 seconds to exit;
> there is no good reason why it shouldn't be prompt. This has
> actually caused me grief - I have a script to send sigterms, wait
> and then restart everything. Daemons were failing to restart
> because the old one hadn't exited yet.

hmmm.. Surely your script is broken?

But yes, I guess we could change it to run the non-signal-context
'signal handlers' on EINTR. We dont need any master flag in that case
btw, EINTR from select() should be enough to tell us a signal occured
;)

> I don't see that it is safe to assume sig_atomic_t can be
> incremented, unless POSIX says so,

It does apparently. I cant find an authoritative reference. However,
I do find many second-hand statements saying it is defined to atomic
by POSIX, and my Stevens APUE book says the same.

But I've not found any authoritative POSIX text or quotes from the
standard to say so.

> but perhaps all we care about that is that it ends up non-zero.
> We could assign 1 to it instead, since we aren't counting, and that
> seems safer, not needing a load/store on risc architectures or a
> RMW cycle on CISC.

Ah, good point.

> I'd leave the blocking code ifdef 0 for now; we may need to add a
> configure test to enable it later.

ok.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
No use getting too involved in life -- you're only here for a limited time.

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 15, 2004, 7:29 AM

Post #11 of 13 (1811 views)

Yep. That's why the global caught is cleared immediately compare and
not checked again.

> so this could use comments.

That's what I meant - the ordering of the operations - test, then
clear and run, is critical, so that there is no chance of a signal
handler not getting run. I'm probably being overly paranoid - I like
to explain why things are safe fo the code reviewer.

hmmm.. Surely your script is broken?

Well, it kills the pids from the pidfiles, waits 2 seconds, and starts
all the daemons. So given that the quagga formal interface spec
doesn't say how long it takes to shut down, this is indeed broken
since really it should wait for the pid files to go away, or until the
processes exit.

But, IMHO it's a reasonable expectation that shutdown at least
commence 'immediately' upen receipt of SIGTERM.

But yes, I guess we could change it to run the non-signal-context
'signal handlers' on EINTR. We dont need any master flag in that case
btw, EINTR from select() should be enough to tell us a signal occured
;)

Good point - that sounds like a good plan.

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 15, 2004, 7:46 AM

Post #12 of 13 (1807 views)

On Thu, 15 Jul 2004, Greg Troxel wrote:

> That's what I meant - the ordering of the operations - test, then
> clear and run, is critical, so that there is no chance of a signal
> handler not getting run.

Indeed.

> I'm probably being overly paranoid - I like to explain why things
> are safe fo the code reviewer.

No, you're not. I had actually initially put the clear of the global
caught /below/ the loop through the array of quagga_signal_t's.

> But, IMHO it's a reasonable expectation that shutdown at least
> commence 'immediately' upen receipt of SIGTERM.

define immediately - the whole point of sigevent is to _not_ do lots
of stuff in signal context. :)

> But yes, I guess we could change it to run the non-signal-context
> 'signal handlers' on EINTR. We dont need any master flag in that case
> btw, EINTR from select() should be enough to tell us a signal occured
> ;)
>
> Good point - that sounds like a good plan.

ah, we'll still need a timer, unless we tell users to be very careful
and use to strace to ensure they only send signals while daemons are
sleeping in select ;) So we can keep the global flag.

i'll have a think about cleanest way to do this.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
your process is not ISO 9000 compliant

Re: signal handling breaks gdb on NetBSD - fixed in CVS [ In reply to ]

Jul 20, 2004, 4:14 AM

Post #13 of 13 (1821 views)

See attached. The result is:

shell 1 shell 2

$ ./test-sig
processed hup $ for H in HUP USR1 USR2 SIGPWR URG ; \
processed hup do kill -${H} `pidof test-sig` ; \
processed hup kill -${H} `pidof test-sig` ; \
processed usr1 kill -${H} `pidof test-sig` ; \
processed usr1 done
processed usr1
processed usr2
processed usr2
processed usr2

Select returns EINTR and the signal handlers are run.

If signals occur outside of select sleep, they'll be spotted at the
beginning of the thread_fetch() loop.

I havnt tested it with either SIGEVENT_SCHEDULE_THREAD and/or
SIGEVENT_BLOCK_SIGNALS defined, but idea is to not need those.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
After years of research, scientists recently reported that there is,
indeed, arroz in Spanish Harlem.