Mailing List Archive

[PATCH] prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
From: Lennart Poettering <lennart@poettering.net>
Subject: prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision

Userspace service managers/supervisors need to track their started
services. Many services daemonize by double-forking and get implicitly
re-parented to PID 1. The process manager will no longer be able to
receive the SIGCHLD signals for them.

With this prctl, a service manager can mark itself as a sort of
'sub-init' process, able to stay as the parent process for all processes
created by the started services. All SIGCHLD signals will be delivered
to the service manager.

As a side effect, the relevant parent PID information does not get lost
by a double-fork, which results in a more elaborate process tree and 'ps'
output.

This is orthogonal to PID namespaces. PID namespaces are isolated
from each other, while a service management process usually requires
the services to live in the same namespace, to be able to talk to each
other.

Users of this will be the systemd per-user instance, which provides
init-like functionality for the user's login session, and D-Bus, which
activates bus services on demand. Both will need init-like capabilities
to be able to properly keep track of the services they start.

Signed-off-by: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
---

 include/linux/prctl.h |    3 +++
 include/linux/sched.h |    2 ++
 kernel/exit.c         |    9 ++++++++-
 kernel/fork.c         |    2 ++
 kernel/sys.c          |    7 +++++++
 5 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/include/linux/prctl.h b/include/linux/prctl.h
index a3baeb2..716b7d3 100644
--- a/include/linux/prctl.h
+++ b/include/linux/prctl.h
@@ -102,4 +102,7 @@

 #define PR_MCE_KILL_GET 34

+#define PR_SET_CHILD_REAPER 35
+#define PR_GET_CHILD_REAPER 36
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 20b03bf..2dba23b 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1300,6 +1300,8 @@ struct task_struct {
                                 * execve */
        unsigned in_iowait:1;

+       /* Reparent child processes to this process instead of pid 1. */
+       unsigned child_reaper:1;

        /* Revert to default priority/policy when forking */
        unsigned sched_reset_on_fork:1;
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..61a80a4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -700,7 +700,7 @@ static struct task_struct *find_new_reaper(struct task_struct *father)
        __acquires(&tasklist_lock)
 {
        struct pid_namespace *pid_ns = task_active_pid_ns(father);
-       struct task_struct *thread;
+       struct task_struct *thread, *reaper;

        thread = father;
        while_each_thread(father, thread) {
@@ -711,6 +711,13 @@ static struct task_struct *find_new_reaper(struct task_struct *father)
                return thread;
        }

+       /* find the first ancestor which is marked as child_reaper */
+       for (reaper = father->parent;
+            reaper != &init_task && reaper != pid_ns->child_reaper;
+            reaper = reaper->parent)
+               if (reaper->child_reaper)
+                       return reaper;
+
        if (unlikely(pid_ns->child_reaper == father)) {
                write_unlock_irq(&tasklist_lock);
                if (unlikely(pid_ns == &init_pid_ns))
diff --git a/kernel/fork.c b/kernel/fork.c
index e7ceaca..863c5c7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1326,6 +1326,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
                p->parent_exec_id = current->self_exec_id;
        }

+       p->child_reaper = 0;
+
        spin_lock(&current->sighand->siglock);

        /*
diff --git a/kernel/sys.c b/kernel/sys.c
index a101ba3..9b41498 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1792,6 +1792,13 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
                        else
                                error = PR_MCE_KILL_DEFAULT;
                        break;
+               case PR_SET_CHILD_REAPER:
+                       me->child_reaper = !!arg2;
+                       error = 0;
+                       break;
+               case PR_GET_CHILD_REAPER:
+                       error = put_user(me->child_reaper, (int __user *) arg2);
+                       break;
                default:
                        error = -EINVAL;
                        break;
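
For readers following the kernel/exit.c hunk, the ancestor walk it adds can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct model_task and model_find_reaper() are
hypothetical names, not kernel code, and the thread-group check and the
handling of a dying namespace init in the real find_new_reaper() are left
out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the few task_struct fields used here. */
struct model_task {
	const char *name;
	struct model_task *parent;	/* NULL only for the root of the tree */
	bool child_reaper;		/* set via the proposed prctl */
};

/*
 * Pick the task that adopts father's orphaned children: the nearest
 * ancestor marked child_reaper, or the namespace init if none is marked.
 * The walk starts at father->parent, as in the patch, so father's own
 * flag is not consulted.
 */
static struct model_task *model_find_reaper(struct model_task *father,
					    struct model_task *ns_init)
{
	struct model_task *reaper;

	for (reaper = father->parent;
	     reaper != NULL && reaper != ns_init;
	     reaper = reaper->parent)
		if (reaper->child_reaper)
			return reaper;

	return ns_init;
}
```

With a chain init -> manager (marked) -> launcher, calling
model_find_reaper(&launcher, &init) yields the manager; with the flag
cleared it falls back to init.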


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
Andrew, mind picking this up?

Thanks,
Kay

Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
On Fri, 29 Jul 2011 02:01:44 +0200
Kay Sievers <kay.sievers@vrfy.org> wrote:

> From: Lennart Poettering <lennart@poettering.net>
> Subject: prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
>
> Userspace service managers/supervisors need to track their started
> services. Many services daemonize by double-forking and get implicitly
> re-parented to PID 1. The process manager will no longer be able to
> receive the SIGCHLD signals for them.
>
> With this prctl, a service manager can mark itself as a sort of
> 'sub-init' process, able to stay as the parent process for all processes
> created by the started services. All SIGCHLD signals will be delivered
> to the service manager.
>
> As a side effect, the relevant parent PID information does not get lost
> by a double-fork, which results in a more elaborate process tree and 'ps'
> output.
>
> This is orthogonal to PID namespaces. PID namespaces are isolated
> from each other, while a service management process usually requires
> the services to live in the same namespace, to be able to talk to each
> other.
>
> Users of this will be the systemd per-user instance, which provides
> init-like functionality for the user's login session, and D-Bus, which
> activates bus services on demand. Both will need init-like capabilities
> to be able to properly keep track of the services they start.
>

Interesting patch. I can't immediately see any nasty effects from it..

Did you consider using the existing taskstats capability for this?

The comment block over find_new_reaper() is now incomplete. Please
update it?

prctl(2) has a manpage. What's the plan for updating it?

Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
On Tue, Aug 16, 2011 at 22:10, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Fri, 29 Jul 2011 02:01:44 +0200
> Kay Sievers <kay.sievers@vrfy.org> wrote:
>
>> From: Lennart Poettering <lennart@poettering.net>
>> Subject: prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
>>
>> Userspace service managers/supervisors need to track their started
>> services. Many services daemonize by double-forking and get implicitly
>> re-parented to PID 1. The process manager will no longer be able to
>> receive the SIGCHLD signals for them.
>>
>> With this prctl, a service manager can mark itself as a sort of
>> 'sub-init' process, able to stay as the parent process for all processes
>> created by the started services. All SIGCHLD signals will be delivered
>> to the service manager.
>>
>> As a side effect, the relevant parent PID information does not get lost
>> by a double-fork, which results in a more elaborate process tree and 'ps'
>> output.
>>
>> This is orthogonal to PID namespaces. PID namespaces are isolated
>> from each other, while a service management process usually requires
>> the services to live in the same namespace, to be able to talk to each
>> other.
>>
>> Users of this will be the systemd per-user instance, which provides
>> init-like functionality for the user's login session, and D-Bus, which
>> activates bus services on demand. Both will need init-like capabilities
>> to be able to properly keep track of the services they start.
>>
>
> Interesting patch.  I can't immediately see any nasty effects from it..
>
> Did you consider using the existing taskstats capability for this?

Yes, but as it always is with buffered async interfaces, they are
tricky regarding ordering, races and possible overflows.

SIGCHLD is async too, but it has important differences in this case:
If the service-manager is the reaper, it will do the waitpid() itself,
and before it reaps the child, it can still investigate the existing
task and it will also directly receive the return values from
waitpid(). If we let the pids re-parent to PID 1, then the dead pids
and most of their information is gone before the service manager sees
the taskstats event.

The service-manager needs to handle SIGCHLD and waitpid() anyway for
all the stuff that does not double-fork, so the code is already there
and does all that we need without involving a second interface just
for re-parenting processes.
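
To make the waitpid() argument concrete, here is a minimal userspace sketch
of a manager that marks itself as reaper and then collects a double-forked
daemon directly. Assumptions, stated loudly: this thread proposes the name
PR_SET_CHILD_REAPER (35), whereas current mainline headers expose the
feature as PR_SET_CHILD_SUBREAPER (36, Linux 3.4 and later), so the sketch
uses the latter; become_reaper(), spawn_double_forking_service() and
reap_all() are hypothetical helper names, not part of any existing code.

```c
#include <errno.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SET_CHILD_SUBREAPER
#define PR_SET_CHILD_SUBREAPER 36	/* value used by mainline Linux */
#endif

/* Mark the caller as reaper for orphaned descendants. Returns 0 on success. */
static int become_reaper(void)
{
	/* An inherited SIG_IGN would auto-reap children; keep them waitable. */
	signal(SIGCHLD, SIG_DFL);
	return prctl(PR_SET_CHILD_SUBREAPER, 1, 0, 0, 0);
}

/*
 * Start a "service" that daemonizes by double-forking. The daemon exits
 * with `code` once its original parent is gone. Returns the PID of the
 * short-lived intermediate child, or -1 if the first fork() fails.
 */
static pid_t spawn_double_forking_service(int code)
{
	int fds[2];
	pid_t daemon_pid;
	pid_t child = fork();

	if (child != 0)
		return child;

	/* Intermediate process: fork the daemon, then exit right away. */
	if (pipe(fds) < 0)
		_exit(1);
	daemon_pid = fork();
	if (daemon_pid == 0) {
		char c;

		close(fds[1]);
		/* EOF arrives once the intermediate process has exited. */
		while (read(fds[0], &c, 1) > 0)
			;
		_exit(code);
	}
	_exit(daemon_pid < 0 ? 1 : 0);
}

/*
 * Reap every child, including re-parented ones, until none are left.
 * Returns the number of processes reaped; *orphan_code receives the exit
 * code of the last reaped process that was not direct_child, or -1.
 */
static int reap_all(pid_t direct_child, int *orphan_code)
{
	int reaped = 0;
	int status;
	pid_t pid;

	*orphan_code = -1;
	for (;;) {
		pid = waitpid(-1, &status, 0);
		if (pid < 0) {
			if (errno == EINTR)
				continue;
			break;	/* ECHILD: nothing left to wait for */
		}
		reaped++;
		if (pid != direct_child && WIFEXITED(status))
			*orphan_code = WEXITSTATUS(status);
	}
	return reaped;
}
```

A caller would run become_reaper() once at startup, start services with
spawn_double_forking_service(), and call reap_all() (or an equivalent
SIGCHLD-driven loop) to receive the daemon's exit status itself instead of
losing it to PID 1.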

My very personal favourite is that 'ps afx' looks so nice now. The
tree of the processes of the login session starts to make sense, and we
don't have half of the user processes hanging off PID 1. But that's
surely just cosmetics, and no reason to do that. I just like pretty
things. :)

> The comment block over find_new_reaper() is now incomplete.  Please
> update it?

'... give it to the child reaper process (ie "init") in our pid
space.' still kind of fits, I think?

Would:
'... give it to the child reaper process (ie 'init' or parent marked
as reaper) in our pid space.' sound better?

> prctl(2) has a manpage.  What's the plan for updating it?

I'll send the update to Michael when it hits the repo.

Thanks,
Kay

Re: [PATCH] prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
On Wed, 17 Aug 2011 02:32:39 +0200 Kay Sievers <kay.sievers@vrfy.org> wrote:

> On Tue, Aug 16, 2011 at 22:10, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Fri, 29 Jul 2011 02:01:44 +0200
> > Kay Sievers <kay.sievers@vrfy.org> wrote:
> >
> >> From: Lennart Poettering <lennart@poettering.net>
> >> Subject: prctl: add PR_{SET,GET}_CHILD_REAPER to allow simple process supervision
> >>
> >> Userspace service managers/supervisors need to track their started
> >> services. Many services daemonize by double-forking and get implicitly
> >> re-parented to PID 1. The process manager will no longer be able to
> >> receive the SIGCHLD signals for them.
> >>
> >> With this prctl, a service manager can mark itself as a sort of
> >> 'sub-init' process, able to stay as the parent process for all processes
> >> created by the started services. All SIGCHLD signals will be delivered
> >> to the service manager.
> >>
> >> As a side effect, the relevant parent PID information does not get lost
> >> by a double-fork, which results in a more elaborate process tree and 'ps'
> >> output.
> >>
> >> This is orthogonal to PID namespaces. PID namespaces are isolated
> >> from each other, while a service management process usually requires
> >> the services to live in the same namespace, to be able to talk to each
> >> other.
> >>
> >> Users of this will be the systemd per-user instance, which provides
> >> init-like functionality for the user's login session, and D-Bus, which
> >> activates bus services on demand. Both will need init-like capabilities
> >> to be able to properly keep track of the services they start.
> >>
> >
> > Interesting patch. I can't immediately see any nasty effects from it..
> >
> > Did you consider using the existing taskstats capability for this?
>
> Yes, but as it always is with buffered async interfaces, they are
> tricky regarding ordering, races and possible overflows.
>
> SIGCHLD is async too, but it has important differences in this case:
> If the service-manager is the reaper, it will do the waitpid() itself,
> and before it reaps the child, it can still investigate the existing
> task and it will also directly receive the return values from
> waitpid(). If we let the pids re-parent to PID 1, then the dead pids
> and most of their information is gone before the service manager sees
> the taskstats event.
>
> The service-manager needs to handle SIGCHLD and waitpid() anyway for
> all the stuff that does not double-fork, so the code is already there
> and does all that we need without involving a second interface just
> for re-parenting processes.
>
> My very personal favourite is that 'ps afx' looks so nice now. The
> tree of the processes of the login session starts to make sense, and we
> don't have half of the user processes hanging off PID 1. But that's
> surely just cosmetics, and no reason to do that. I just like pretty
> things. :)

Spose so. I spy suitable changelog enhancements.

Also, other means of notification if they exist. I'm sure they do ;)

> > The comment block over find_new_reaper() is now incomplete. Please
> > update it?
>
> '... give it to the child reaper process (ie "init") in our pid
> space.' still kind of fits, I think?
>
> Would:
> '... give it to the child reaper process (ie 'init' or parent marked
> as reaper) in our pid space.' sound better?

At a minimum. A nice discourse on what that code is doing in there
(and why!) would be better. After all, the comment is supposed to
explain the function.
