Mailing List Archive: [PATCH] fix de_thread() vs do_coredump() deadlock

[PATCH] fix de_thread() vs do_coredump() deadlock

oleg at tv-sign

Oct 13, 2005, 9:50 AM

Post #1 of 11 (840 views)

de_thread() sends SIGKILL to all sub-threads and
waits them to die in 'D' state. It is possible that
one of the threads already dequeued coredump signal.
When de_thread() unlocks ->sighand->lock that thread
can enter do_coredump()->coredump_wait() and cause a
deadlock.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- 2.6.14-rc4/fs/exec.c~ 2005-09-21 21:08:33.000000000 +0400
+++ 2.6.14-rc4/fs/exec.c 2005-10-14 00:19:19.000000000 +0400
@@ -1468,11 +1468,21 @@ int do_coredump(long signr, int exit_cod
current->fsuid = 0; /* Dump root private */
}
mm->dumpable = 0;
- init_completion(&mm->core_done);
+
+ retval = -EAGAIN;
spin_lock_irq(&current->sighand->siglock);
- current->signal->flags = SIGNAL_GROUP_EXIT;
- current->signal->group_exit_code = exit_code;
+ if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
+ current->signal->flags = SIGNAL_GROUP_EXIT;
+ current->signal->group_exit_code = exit_code;
+ retval = 0;
+ }
spin_unlock_irq(&current->sighand->siglock);
+ if (retval) {
+ up_write(&mm->mmap_sem);
+ goto fail;
+ }
+
+ init_completion(&mm->core_done);
coredump_wait(mm);

/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

roland at redhat

Apr 9, 2006, 6:36 PM

Post #2 of 11 (829 views)

Sorry I have been absent from the discussion for such a long time.
I'm trying to catch up on old issues that I should have reviewed earlier.
This is about Oleg's change, commit 1291cf4163d21f1b4999d697cbf68d38e7151c28.

Your change is sensible on its own terms and clearly it resolves the
deadlock. But thinking about this made me realize that we have a more
general problem with this kind of race condition. The coredump deadlock is
the worst failure mode, but in non-coredump fatal signals we were already
getting the semantics wrong in the same race condition. That is, the fatal
signal should never be dropped on the floor. If the exec wins the race,
then any signals not yet acted on should remain queued. Before your patch
this race scenario would hit the deadlock in the coredump case. Now, the
coredump case matches the non-coredump fatal signal: the signal is dropped
on the floor. This violates POSIX, which says that signals pending for the
process are preserved across exec. (There might even be some way to abuse
this, though it seems like a bit of a stretch off hand.)

What do you think about this patch? I think it's the correct thing to do,
and a bit of a cleanup not to use SIGNAL_GROUP_EXIT for exec. The
do_coredump part of the patch can be cleaned up nicely in concert with some
of the other coredump changes you have posted recently. In de_thread, I
wanted to get rid of taking tasklist_lock, which is no longer needed by
zap_other_threads these days. But I didn't really know the story on the
child_reaper magic there that seems still to need the lock. (Note that for
the ptrace corner cases not to lose signals either, we also need the patch
I posted to revert most of commit 30e0fca6c1d7d26f3f2daa4dd2b12c51dadc778a.)

By reverting the 1291cf4163d21f1b4999d697cbf68d38e7151c28 change, I was
able to reproduce the deadlock. With that patch restored, I then verified
that in some of the races, a fatal signal was dropped on the floor. In the
same tests using the patch below, there was never a deadlock or a lost signal.

Thanks,
Roland

[PATCH] Fix race between exec and fatal signals

When we have a race between an exec and another thread in the same group
dequeuing a fatal signal, the signal loses and the exec goes through. The
problem here is that the signal is lost entirely. In some circumstances
this violates POSIX, which says that signals pending for the process
(i.e. on the group's shared_pending queue) are preserved across exec.
This might even be abusable to some extent, though it's hard to see much
real damage being done.

There is no problem when signals remain queued, as they will be seen after
the exec. The problematic race is when a signal has been dequeued by some
thread and is then dropped on floor when the thread decides to go and die
for the exec instead. I think the correct solution is for the exec to
always lose that race.

This patch adds a SIGNAL_GROUP_EXEC flag to distinguish de_thread from a
true group exit that will be fatal to all threads (SIGNAL_GROUP_EXIT).
When an exec is waiting for threads to die and another thread tries to do a
group exit, it aborts the exec so the exec'ing thread will die with the others.

Signed-off-by: Roland McGrath <roland@redhat.com>

---

fs/exec.c | 48 ++++++++++++++++++++++++++++++++++++++++--------
include/linux/sched.h | 3 ++-
kernel/exit.c | 5 +++--
kernel/fork.c | 3 ++-
kernel/signal.c | 31 ++++++++++++++++++++++++-------
5 files changed, 71 insertions(+), 19 deletions(-)

ced7c831060aadb6f2dbb15f36391e84061f5720
diff --git a/fs/exec.c b/fs/exec.c
index 0291a68..eae6371 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -606,15 +606,16 @@ static int de_thread(struct task_struct
*/
read_lock(&tasklist_lock);
spin_lock_irq(lock);
- if (sig->flags & SIGNAL_GROUP_EXIT) {
+ if (unlikely(sig->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC))) {
/*
* Another group action in progress, just
* return so that the signal is processed.
*/
- spin_unlock_irq(lock);
read_unlock(&tasklist_lock);
+ dying:
+ spin_unlock_irq(lock);
kmem_cache_free(sighand_cachep, newsighand);
- return -EAGAIN;
+ return -ERESTARTNOINTR;
}

/*
@@ -625,7 +626,7 @@ static int de_thread(struct task_struct
if (unlikely(current->group_leader == child_reaper))
child_reaper = current;

- zap_other_threads(current);
+ zap_other_threads(current, SIGNAL_GROUP_EXEC);
read_unlock(&tasklist_lock);

/*
@@ -646,6 +647,17 @@ static int de_thread(struct task_struct
if (hrtimer_cancel(&sig->real_timer))
hrtimer_restart(&sig->real_timer);
spin_lock_irq(lock);
+ if (unlikely(!(sig->flags & SIGNAL_GROUP_EXEC))) {
+ BUG_ON(!(sig->flags & SIGNAL_GROUP_EXIT));
+ /*
+ * Oh, dear. Our exec was cancelled by a group
+ * exit during the window where we released the
+ * lock to update the timer. Since we know we are
+ * all about to die, it doesn't matter that the
+ * timer now points at us.
+ */
+ goto dying;
+ }
}
while (atomic_read(&sig->count) > count) {
sig->group_exit_task = current;
@@ -654,6 +666,14 @@ static int de_thread(struct task_struct
spin_unlock_irq(lock);
schedule();
spin_lock_irq(lock);
+ if (unlikely(!(sig->flags & SIGNAL_GROUP_EXEC))) {
+ BUG_ON(!(sig->flags & SIGNAL_GROUP_EXIT));
+ BUG_ON(sig->group_exit_task == current);
+ /*
+ * Our exec was cancelled by a group exit.
+ */
+ goto dying;
+ }
}
sig->group_exit_task = NULL;
sig->notify_count = 0;
@@ -1480,10 +1500,22 @@ int do_coredump(long signr, int exit_cod

retval = -EAGAIN;
spin_lock_irq(&current->sighand->siglock);
- if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
- current->signal->flags = SIGNAL_GROUP_EXIT;
- current->signal->group_exit_code = exit_code;
- current->signal->group_stop_count = 0;
+ if (likely(!(current->signal->flags & SIGNAL_GROUP_EXIT))) {
+ struct signal_struct *sig = current->signal;
+ /*
+ * If there is an exec in progress, cancel it and wake
+ * up the exec'ing thread to feel our imminent zapping.
+ */
+ if (unlikely(sig->flags & SIGNAL_GROUP_EXEC)) {
+ struct task_struct *execer = sig->group_exit_task;
+ sigaddset(&execer->pending.signal, SIGKILL);
+ sig->group_exit_task = NULL;
+ sig->notify_count = 0;
+ wake_up_process(execer);
+ }
+ sig->flags = SIGNAL_GROUP_EXIT;
+ sig->group_exit_code = exit_code;
+ sig->group_stop_count = 0;
retval = 0;
}
spin_unlock_irq(&current->sighand->siglock);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 541f482..1f5d7ab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -463,6 +463,7 @@ struct signal_struct {
#define SIGNAL_STOP_DEQUEUED 0x00000002 /* stop signal dequeued */
#define SIGNAL_STOP_CONTINUED 0x00000004 /* SIGCONT since WCONTINUED reap */
#define SIGNAL_GROUP_EXIT 0x00000008 /* group exit in progress */
+#define SIGNAL_GROUP_EXEC 0x00000010 /* group-killing exec in progress */

/*
@@ -1102,7 +1103,7 @@ extern void do_notify_parent(struct task
extern void force_sig(int, struct task_struct *);
extern void force_sig_specific(int, struct task_struct *);
extern int send_sig(int, struct task_struct *, int);
-extern void zap_other_threads(struct task_struct *p);
+extern void zap_other_threads(struct task_struct *p, int);
extern int kill_pg(pid_t, int, int);
extern int kill_proc(pid_t, int, int);
extern struct sigqueue *sigqueue_alloc(void);
diff --git a/kernel/exit.c b/kernel/exit.c
index 6c2eeb8..176c641 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -826,7 +826,8 @@ static void exit_notify(struct task_stru
state = EXIT_ZOMBIE;
if (tsk->exit_signal == -1 &&
(likely(tsk->ptrace == 0) ||
- unlikely(tsk->parent->signal->flags & SIGNAL_GROUP_EXIT)))
+ unlikely(tsk->parent->signal->flags
+ & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC))))
state = EXIT_DEAD;
tsk->exit_state = state;

@@ -989,7 +990,7 @@ do_group_exit(int exit_code)
exit_code = sig->group_exit_code;
else {
sig->group_exit_code = exit_code;
- zap_other_threads(current);
+ zap_other_threads(current, SIGNAL_GROUP_EXIT);
}
spin_unlock_irq(&sighand->siglock);
}
diff --git a/kernel/fork.c b/kernel/fork.c
index 3384eb8..8276618 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1167,7 +1167,8 @@ static task_t *copy_process(unsigned lon
* do not create this new thread - the whole thread
* group is supposed to exit anyway.
*/
- if (current->signal->flags & SIGNAL_GROUP_EXIT) {
+ if (current->signal->flags
+ & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC)) {
spin_unlock(&current->sighand->siglock);
write_unlock_irq(&tasklist_lock);
retval = -EAGAIN;
diff --git a/kernel/signal.c b/kernel/signal.c
index 0492c04..b59154b 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -465,7 +465,8 @@ int dequeue_signal(struct task_struct *t
* is to alert stop-signal processing code when another
* processor has come along and cleared the flag.
*/
- if (!(tsk->signal->flags & SIGNAL_GROUP_EXIT))
+ if (!(tsk->signal->flags
+ & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC)))
tsk->signal->flags |= SIGNAL_STOP_DEQUEUED;
}
if ( signr &&
@@ -604,7 +605,7 @@ static void handle_stop_signal(int sig,
{
struct task_struct *t;

- if (p->signal->flags & SIGNAL_GROUP_EXIT)
+ if (p->signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC))
/*
* The process is in the middle of dying already.
*/
@@ -887,7 +888,8 @@ __group_complete_signal(int sig, struct
* Found a killable thread. If the signal will be fatal,
* then start taking the whole group down immediately.
*/
- if (sig_fatal(p, sig) && !(p->signal->flags & SIGNAL_GROUP_EXIT) &&
+ if (sig_fatal(p, sig) &&
+ !(p->signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC)) &&
!sigismember(&t->real_blocked, sig) &&
(sig == SIGKILL || !(t->ptrace & PT_PTRACED))) {
/*
@@ -975,12 +977,26 @@ __group_send_sig_info(int sig, struct si

/*
* Nuke all other threads in the group.
+ * tasklist_lock is not needed here, but p->sighand->siglock must be held.
*/
-void zap_other_threads(struct task_struct *p)
+void zap_other_threads(struct task_struct *p, int flag)
{
struct task_struct *t;

- p->signal->flags = SIGNAL_GROUP_EXIT;
+ if (unlikely(p->signal->flags & SIGNAL_GROUP_EXEC)) {
+ /*
+ * We are cancelling an exec that is in progress, to let
+ * the thread group die instead. We need to wake the
+ * exec'ing thread up from uninterruptible wait.
+ */
+ BUG_ON(flag != SIGNAL_GROUP_EXIT);
+ t = p->signal->group_exit_task;
+ p->signal->group_exit_task = NULL;
+ p->signal->notify_count = 0;
+ wake_up_process(t);
+ }
+
+ p->signal->flags = flag;
p->signal->group_stop_count = 0;

if (thread_group_empty(p))
@@ -1564,7 +1580,8 @@ static void ptrace_stop(int exit_code, i
likely(current->parent != current->real_parent ||
!(current->ptrace & PT_ATTACHED)) &&
(likely(current->parent->signal != current->signal) ||
- !unlikely(current->signal->flags & SIGNAL_GROUP_EXIT))) {
+ !unlikely(current->signal->flags
+ & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC)))) {
do_notify_parent_cldstop(current, CLD_TRAPPED);
read_unlock(&tasklist_lock);
schedule();
@@ -1705,7 +1722,7 @@ static int handle_group_stop(void)
return 0;
}

- if (current->signal->flags & SIGNAL_GROUP_EXIT)
+ if (current->signal->flags & (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC))
/*
* Group stop is so another thread can do a core dump,
* or else we are racing against a death signal.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

oleg at tv-sign

Apr 10, 2006, 8:20 AM

Post #3 of 11 (830 views)

On 04/09, Roland McGrath wrote:
>
> [PATCH] Fix race between exec and fatal signals

I'll try to study this patch carefully tomorrow, but now I have
the feeling it is not right (probably my misunderstanding after
the quick reading).

> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -606,15 +606,16 @@ static int de_thread(struct task_struct
>
> ... [snip] ...
>
> - zap_other_threads(current);
> + zap_other_threads(current, SIGNAL_GROUP_EXEC);
>
> ... [snip] ...
>
> -void zap_other_threads(struct task_struct *p)
> +void zap_other_threads(struct task_struct *p, int flag)
> {
> struct task_struct *t;
>
> - p->signal->flags = SIGNAL_GROUP_EXIT;
> + if (unlikely(p->signal->flags & SIGNAL_GROUP_EXEC)) {
> + /*
> + * We are cancelling an exec that is in progress, to let
> + * the thread group die instead. We need to wake the
> + * exec'ing thread up from uninterruptible wait.
> + */
> + BUG_ON(flag != SIGNAL_GROUP_EXIT);
> + t = p->signal->group_exit_task;
> + p->signal->group_exit_task = NULL;
> + p->signal->notify_count = 0;
> + wake_up_process(t);
> + }
> +
> + p->signal->flags = flag;
> p->signal->group_stop_count = 0;

So, de_thread() sets SIGNAL_GROUP_EXEC and sends SIGKILL to other thereads.

Sub-thread receives the signal, and calls get_signal_to_deliver->do_group_exit.
do_group_exit() calls zap_other_threads(SIGNAL_GROUP_EXIT) because there is no
SIGNAL_GROUP_EXIT set. zap_other_threads() notices SIGNAL_GROUP_EXEC, wakes up
execer, and changes ->signal->flags to SIGNAL_GROUP_EXIT.

de_thread() re-locks sighand, sees !SIGNAL_GROUP_EXEC and goes to 'dying:'.

No?

Another problem. de_thread() sets '->group_exit_task = current' _only_ if
'atomic_read(&sig->count) > count', so wake_up_process(->group_exit_task)
in zap_other_threads() is unsafe.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

oleg at tv-sign

Apr 10, 2006, 12:20 PM

Post #4 of 11 (833 views)

On 04/09, Roland McGrath wrote:
>
> Now, the
> coredump case matches the non-coredump fatal signal: the signal is dropped
> on the floor.

A fatal signal is placed to ->shared_pending in any (non tkill) case, so I
think it is not lost (but may be unnoticed for a while).

sig_kernel_coredump() is different. It could be stealed by one of sub-threads
while another one does de_thread(), that is why it could be lost.

What do you think about something like this untested patch instead?
I am far from sure it is correct, I need a sleep ...

Oleg.

--- fs/exec.c~ 2006-04-10 22:15:06.000000000 +0400
+++ fs/exec.c 2006-04-11 00:56:52.000000000 +0400
@@ -657,6 +657,7 @@ static int de_thread(struct task_struct
}
sig->group_exit_task = NULL;
sig->notify_count = 0;
+ recalc_sigpending();
spin_unlock_irq(lock);

/*
@@ -1478,9 +1479,15 @@ int do_coredump(long signr, int exit_cod
}
mm->dumpable = 0;

- retval = -EAGAIN;
spin_lock_irq(&current->sighand->siglock);
- if (!(current->signal->flags & SIGNAL_GROUP_EXIT)) {
+ if (current->signal->flags & SIGNAL_GROUP_EXIT) {
+ // Re-add the signal. This does not matter
+ // if we are doing do_group_exit().
+ // If it was de_thread(), this signal will be
+ // received again after sys_exec() succeeds.
+ sigaddset(&current->signal->shared_pending.signal, signr);
+ retval = -EAGAIN;
+ } else {
current->signal->flags = SIGNAL_GROUP_EXIT;
current->signal->group_exit_code = exit_code;
current->signal->group_stop_count = 0;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

roland at redhat

Apr 11, 2006, 12:27 AM

Post #5 of 11 (807 views)

> So, de_thread() sets SIGNAL_GROUP_EXEC and sends SIGKILL to other thereads.
>
> Sub-thread receives the signal, and calls get_signal_to_deliver->do_group_exit.
> do_group_exit() calls zap_other_threads(SIGNAL_GROUP_EXIT) because there is no
> SIGNAL_GROUP_EXIT set. zap_other_threads() notices SIGNAL_GROUP_EXEC, wakes up
> execer, and changes ->signal->flags to SIGNAL_GROUP_EXIT.
>
> de_thread() re-locks sighand, sees !SIGNAL_GROUP_EXEC and goes to 'dying:'.

That is what I intend. The exec'ing thread backs out and processes its SIGKILL.
It sounds like you are calling this scenario a problem, but I don't know why.

> Another problem. de_thread() sets '->group_exit_task = current' _only_ if
> 'atomic_read(&sig->count) > count', so wake_up_process(->group_exit_task)
> in zap_other_threads() is unsafe.

Ah, this is indeed a problem when the only other thread is the old leader.
I think it's easy enough just to set it unconditionally and clear it at the
end, after the leader too is gone. i.e., this on top of the previous patch:

--- a/fs/exec.c
+++ b/fs/exec.c
@@ -659,8 +659,8 @@ static int de_thread(struct task_struct
goto dying;
}
}
+ sig->group_exit_task = current;
while (atomic_read(&sig->count) > count) {
- sig->group_exit_task = current;
sig->notify_count = count;
__set_current_state(TASK_UNINTERRUPTIBLE);
spin_unlock_irq(lock);
@@ -675,7 +675,6 @@ static int de_thread(struct task_struct
goto dying;
}
}
- sig->group_exit_task = NULL;
sig->notify_count = 0;
spin_unlock_irq(lock);

@@ -774,6 +773,7 @@ static int de_thread(struct task_struct
* but it's safe to stop telling the group to kill themselves.
*/
sig->flags = 0;
+ sig->group_exit_task = NULL;

no_thread_group:
exit_itimers(sig);

Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

roland at redhat

Apr 11, 2006, 1:01 AM

Post #6 of 11 (824 views)

> A fatal signal is placed to ->shared_pending in any (non tkill) case, so I
> think it is not lost (but may be unnoticed for a while).
>
> sig_kernel_coredump() is different. It could be stealed by one of sub-threads
> while another one does de_thread(), that is why it could be lost.

I am not talking about the case where it's still pending on either queue.
Those are fine as they are. What matters is when it's been dequeued, in
the race window afer releasing the siglock in get_signal_to_deliver.
There are two windows of race here.

The first one is only when ptrace'd, and doesn't even require a race that
takes good timing to create. The window is in ptrace_stop when the siglock
is released, including all the time stopped in TASK_TRACED. Say another
thread does an exec (and de_thread) while we're in TASK_TRACED after
reporting a death signal to the debugger. The SIGKILL wakes us out of
ptrace_stop. Assuming the debugger wasn't racing with a PTRACE_CONT too,
then the signal remains in ->exit_code and (assuming the SIGNAL_GROUP_EXIT
check there reverted, as I mentioned before), we just come out of the
ptrace path with the siglock held as if we'd dequeued the signal without ptrace.

Then comes the second window. With no ptrace, or after ptrace, we've
dequeued the signal and if it's a SIG_DFL fatal signal, we release the
siglock. Here a non-coredump signal just calls do_group_exit. Meanwhile,
a racing exec comes along and sets SIGNAL_GROUP_EXIT (or it already did
earlier while we were in ptrace_stop). In do_group_exit, we will see that
SIGNAL_GROUP_EXIT is set, and just do_exit ourselves with the group_exit_code.
When it's an exec rather than a real exit, we've swallowed the signal.
This is no different than the coredump case. (When do_coredump bails out,
then it joins this very same code path.)

> What do you think about something like this untested patch instead?
> I am far from sure it is correct, I need a sleep ...

For one thing, it only handles the coredump case. Moreover, I think there
are too many problems with any plan to stuff a signal back on the queue.
1. This might be moving a signal from some thread's per-thread queue to the
shared_pending queue. If the signal was going to be fatal to the group,
that is fine before the exec but not after it--the new program shouldn't
get a fatal signal generated for some thread it never had.
2. siginfo_t is lost.
These are just what I could think of off hand. Pushing a signal
back on the queue is a whole can of worms that I really want to avoid.

Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

oleg at tv-sign

Apr 11, 2006, 1:05 AM

Post #7 of 11 (809 views)

On 04/11, Roland McGrath wrote:
>
> > So, de_thread() sets SIGNAL_GROUP_EXEC and sends SIGKILL to other thereads.
> >
> > Sub-thread receives the signal, and calls get_signal_to_deliver->do_group_exit.
> > do_group_exit() calls zap_other_threads(SIGNAL_GROUP_EXIT) because there is no
> > SIGNAL_GROUP_EXIT set. zap_other_threads() notices SIGNAL_GROUP_EXEC, wakes up
> > execer, and changes ->signal->flags to SIGNAL_GROUP_EXIT.
> >
> > de_thread() re-locks sighand, sees !SIGNAL_GROUP_EXEC and goes to 'dying:'.
>
> That is what I intend. The exec'ing thread backs out and processes its SIGKILL.
> It sounds like you are calling this scenario a problem, but I don't know why.

Oh, probably I missed something. But as I can see it, MT exec can never succeed!

Once again. Process starts exec, it has no pending signals. Execer thread sets
SIGNAL_GROUP_EXEC, sends SIGKILL to other threads, and waits them to die.
The first thread which dequeues SIGKILL will change SIGNAL_GROUP_EXEC to
SIGNAL_GROUP_EXIT, thus aborting exec.

Sorry for persistance if I really misunderstand this patch.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

roland at redhat

Apr 11, 2006, 2:10 AM

Post #8 of 11 (824 views)

> Once again. Process starts exec, it has no pending signals. Execer thread sets
> SIGNAL_GROUP_EXEC, sends SIGKILL to other threads, and waits them to die.
> The first thread which dequeues SIGKILL will change SIGNAL_GROUP_EXEC to
> SIGNAL_GROUP_EXIT, thus aborting exec.

You are very right. Sorry I did not manage to understand your objection
the first time around. Sigh, trying to think on too little sleep again, I guess.

> Sorry for persistance if I really misunderstand this patch.

Thanks for the persistence.

What I'd really like to do here is get rid of these fake SIGKILLs. We're
already waking the threads up directly in zap_other_threads. We don't
really need to set SIGKILL pending. By setting group_stop_count we can
make sure recalc_sigpending_tsk keeps signal_pending set, and every thread will
get into handle_group_stop. The (SIGNAL_GROUP_EXIT | SIGNAL_GROUP_EXEC)
check there can just do_exit directly, instead of using the fatal signal path.
I'm inclined to get rid of zap_other_threads, since the part of it common
to both callers (de_thread and do_group_exit) will be just:

for (t = next_thread(p); t != p; t = next_thread(t))
if (!unlikely(t->exit_state))
signal_wake_up(t, 1);

(The exit_signal business in the zap_other_threads loop is already dead
cruft, because ->exit_signal is always -1 in non-leader threads nowadays.)

If this makes sense to you, I'll rework my patch along these lines.
But it seems like it will be easiest to do it after your other coredump and
de_thread changes are all resolved.

Also, I've realized that the same lost signal issue can occur for stop
signals. There it's real hard to see what can be done other than trying to
stuff the signal back on the queue, which I've said is problematical.
I need to think about this more.

Thanks again,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

oleg at tv-sign

Apr 11, 2006, 2:21 AM

Post #9 of 11 (819 views)

On 04/11, Roland McGrath wrote:
>
> > So, de_thread() sets SIGNAL_GROUP_EXEC and sends SIGKILL to other thereads.
> >
> > Sub-thread receives the signal, and calls get_signal_to_deliver->do_group_exit.
> > do_group_exit() calls zap_other_threads(SIGNAL_GROUP_EXIT) because there is no
> > SIGNAL_GROUP_EXIT set. zap_other_threads() notices SIGNAL_GROUP_EXEC, wakes up
> > execer, and changes ->signal->flags to SIGNAL_GROUP_EXIT.
> >
> > de_thread() re-locks sighand, sees !SIGNAL_GROUP_EXEC and goes to 'dying:'.
>
> That is what I intend. The exec'ing thread backs out and processes its SIGKILL.
> It sounds like you are calling this scenario a problem, but I don't know why.

Ok, here is the test:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>

static void *tfunc(void *arg)
{
pause();
return NULL;
}

int main(int argc, const char *argv[])
{
pthread_t thread;

if (!argv[0]) {
printf("--------- SUCCESS ----------\n");
exit(0);
}

if (pthread_create(&thread, NULL, tfunc, NULL)) {
perror ("pthread_create");
exit (1);
}

execl("/proc/self/exe", NULL);

return pause();
}

$ ./test
--------- SUCCESS ----------

With this patch applied I have:

$ ./test
Killed

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

oleg at tv-sign

Apr 11, 2006, 2:21 AM

Post #10 of 11 (820 views)

On 04/11, Roland McGrath wrote:
>
> > A fatal signal is placed to ->shared_pending in any (non tkill) case, so I
> > think it is not lost (but may be unnoticed for a while).
> >
> > sig_kernel_coredump() is different. It could be stealed by one of sub-threads
> > while another one does de_thread(), that is why it could be lost.
>
> I am not talking about the case where it's still pending on either queue.
> Those are fine as they are. What matters is when it's been dequeued, in
> the race window afer releasing the siglock in get_signal_to_deliver.
> There are two windows of race here.
>
> The first one is only when ptrace'd, and doesn't even require a race that
> takes good timing to create. The window is in ptrace_stop when the siglock
> is released, including all the time stopped in TASK_TRACED. Say another
> thread does an exec (and de_thread) while we're in TASK_TRACED after
> reporting a death signal to the debugger. The SIGKILL wakes us out of
> ptrace_stop. Assuming the debugger wasn't racing with a PTRACE_CONT too,
> then the signal remains in ->exit_code and (assuming the SIGNAL_GROUP_EXIT
> check there reverted, as I mentioned before), we just come out of the
> ptrace path with the siglock held as if we'd dequeued the signal without ptrace.

Yes, you are right, thanks.

> Then comes the second window. With no ptrace, or after ptrace, we've
> dequeued the signal and if it's a SIG_DFL fatal signal, we release the
> siglock. Here a non-coredump signal just calls do_group_exit. Meanwhile,
> a racing exec comes along and sets SIGNAL_GROUP_EXIT (or it already did
> earlier while we were in ptrace_stop). In do_group_exit, we will see that
> SIGNAL_GROUP_EXIT is set, and just do_exit ourselves with the group_exit_code.
> When it's an exec rather than a real exit, we've swallowed the signal.
> This is no different than the coredump case. (When do_coredump bails out,
> then it joins this very same code path.)

I still think we are ok with no ptrace. If that (non-coredump) signal was
delivered before de_thread sets SIGNAL_GROUP_EXIT, then this flag is set
by __group_complete_signal(), so de_thread return -EAGAIN. If de_thread()
wins, the signal will be dequeued later from ->shared_pending.

tkill() is different, but I don't see any problem here.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Re: [PATCH] fix de_thread() vs do_coredump() deadlock [ In reply to ]

roland at redhat

Apr 11, 2006, 2:49 AM

Post #11 of 11 (810 views)

> I still think we are ok with no ptrace. If that (non-coredump) signal was
> delivered before de_thread sets SIGNAL_GROUP_EXIT, then this flag is set
> by __group_complete_signal(), so de_thread return -EAGAIN. If de_thread()
> wins, the signal will be dequeued later from ->shared_pending.

There is no guarantee that __group_complete_signal gets to that code path
when the signal is generated. There may be no thread that doesn't have it
blocked nor is already descheduled with pending signals. Then some thread
gets scheduled, or changes it signal mask, and then gets into
get_signal_to_deliver and takes the siglock either before an exec'ing
thread gets the lock, or while the exec'ing thread releases it to wait.
When the dequeuer thread releases the siglock, we have the race window.
(There is another similar case if the signal was handled at generation time
and the handler is reset to SIG_DFL later in a race with another thread
dequeuing the signal and a third doing an exec.)

Thanks,
Roland

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/