Mailing List Archive

[RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
Does anyone have a better idea?
==

Currently, cgroup's seq_file interface supports only single_open().
This patch lets a cftype supply arbitrary seq_operations instead.

For example, per-CPU or per-node statistics can be very large in
general, and such files tend to need their own start/next/stop ops.
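To make the start/next/stop/show protocol concrete, here is a minimal userspace model of a per-node statistics file written against seq_ops. It is a sketch under stated assumptions, not kernel code: struct mini_seq, struct mini_seq_ops, mini_printf(), mini_seq_read(), struct node_stats and stats_of_seq() are hypothetical stand-ins for struct seq_file, struct seq_operations, seq_printf(), seq_read(), a subsystem's data and the cgroup_of_seqfile() accessor added in the patch below. The driver loop only models the simple case where all output fits in one caller-supplied buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct seq_file. */
struct mini_seq {
	char *buf;	/* output buffer supplied by the caller */
	size_t size;	/* capacity of buf */
	size_t count;	/* bytes written so far */
	void *private;	/* per-open state, like seq_file->private */
};

/* Hypothetical stand-in for struct seq_operations. */
struct mini_seq_ops {
	void *(*start)(struct mini_seq *m, long *pos);
	void *(*next)(struct mini_seq *m, void *v, long *pos);
	void (*stop)(struct mini_seq *m, void *v);
	int (*show)(struct mini_seq *m, void *v);
};

/* Append formatted text like seq_printf(); -1 if it does not fit. */
static int mini_printf(struct mini_seq *m, const char *fmt, ...)
{
	size_t room = m->size - m->count;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(m->buf + m->count, room, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= room)
		return -1;
	m->count += (size_t)n;
	return 0;
}

/* Hypothetical per-node data a subsystem might report. */
struct node_stats {
	int nr_nodes;
	unsigned long *pages;	/* pages charged on each node */
};

/* Plays the role of cgroup_of_seqfile(): recover state from ->private. */
static struct node_stats *stats_of_seq(struct mini_seq *m)
{
	return m->private;
}

/* start: the record at *pos, or NULL once past the last node. */
static void *node_start(struct mini_seq *m, long *pos)
{
	struct node_stats *s = stats_of_seq(m);
	return *pos < s->nr_nodes ? &s->pages[*pos] : NULL;
}

/* next: advance the position and return the following record. */
static void *node_next(struct mini_seq *m, void *v, long *pos)
{
	++*pos;
	return node_start(m, pos);
}

/* stop: a kernel implementation would release locks taken in start. */
static void node_stop(struct mini_seq *m, void *v) { (void)m; (void)v; }

/* show: format a single record; no whole-file buffer is built up front. */
static int node_show(struct mini_seq *m, void *v)
{
	struct node_stats *s = stats_of_seq(m);
	unsigned long *p = v;
	return mini_printf(m, "node%ld %lu\n", (long)(p - s->pages), *p);
}

/* The table a cftype's seq_ops field would point at. */
static const struct mini_seq_ops node_seq_ops = {
	.start = node_start, .next = node_next,
	.stop = node_stop, .show = node_show,
};

/* One pass over the iterator. Unlike the real seq_read(), this does not
 * grow the buffer or resume across read() calls. */
static int mini_seq_read(struct mini_seq *m, const struct mini_seq_ops *ops)
{
	long pos = 0;
	int err = 0;
	void *v;

	assert(m->buf != NULL && m->size > 0);
	memset(m->buf, 0, m->size);	/* start from an empty string */
	m->count = 0;
	for (v = ops->start(m, &pos); v != NULL; v = ops->next(m, v, &pos)) {
		err = ops->show(m, v);
		if (err)
			break;
	}
	ops->stop(m, v);
	return err;
}
```

With pages {10, 20, 30} on three nodes, mini_seq_read() fills the buffer with one "nodeN value" line per node. The point of the design is that show() only ever formats one record, which is what makes per-CPU or per-node files with thousands of entries practical.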

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


---
include/linux/cgroup.h | 9 +++++++++
kernel/cgroup.c | 32 +++++++++++++++++++++++++++++---
2 files changed, 38 insertions(+), 3 deletions(-)

Index: mm-2.6.26-rc2-mm1/include/linux/cgroup.h
===================================================================
--- mm-2.6.26-rc2-mm1.orig/include/linux/cgroup.h
+++ mm-2.6.26-rc2-mm1/include/linux/cgroup.h
@@ -232,6 +232,11 @@ struct cftype {
*/
int (*read_seq_string) (struct cgroup *cont, struct cftype *cft,
struct seq_file *m);
+ /*
+ * If this is not NULL, read ops will use this instead of
+ * single_open(). Useful for showing very large data.
+ */
+ struct seq_operations *seq_ops;

ssize_t (*write) (struct cgroup *cgrp, struct cftype *cft,
struct file *file,
@@ -285,6 +290,10 @@ int cgroup_path(const struct cgroup *cgr

int cgroup_task_count(const struct cgroup *cgrp);

+
+struct cgroup *cgroup_of_seqfile(struct seq_file *m);
+struct cftype *cftype_of_seqfile(struct seq_file *m);
+
/* Return true if the cgroup is a descendant of the current cgroup */
int cgroup_is_descendant(const struct cgroup *cgrp);

Index: mm-2.6.26-rc2-mm1/kernel/cgroup.c
===================================================================
--- mm-2.6.26-rc2-mm1.orig/kernel/cgroup.c
+++ mm-2.6.26-rc2-mm1/kernel/cgroup.c
@@ -1540,6 +1540,16 @@ struct cgroup_seqfile_state {
struct cgroup *cgroup;
};

+struct cgroup *cgroup_of_seqfile(struct seq_file *m)
+{
+ return ((struct cgroup_seqfile_state *)m->private)->cgroup;
+}
+
+struct cftype *cftype_of_seqfile(struct seq_file *m)
+{
+ return ((struct cgroup_seqfile_state *)m->private)->cft;
+}
+
static int cgroup_map_add(struct cgroup_map_cb *cb, const char *key, u64 value)
{
struct seq_file *sf = cb->state;
@@ -1563,8 +1573,14 @@ static int cgroup_seqfile_show(struct se
static int cgroup_seqfile_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
+ struct cgroup_seqfile_state *state = seq->private;
+ struct cftype *cft = state->cft;
+
kfree(seq->private);
- return single_release(inode, file);
+ if (!cft->seq_ops)
+ return single_release(inode, file);
+ else
+ return seq_release(inode, file);
}

static struct file_operations cgroup_seqfile_operations = {
@@ -1585,7 +1601,7 @@ static int cgroup_file_open(struct inode
cft = __d_cft(file->f_dentry);
if (!cft)
return -ENODEV;
- if (cft->read_map || cft->read_seq_string) {
+ if (cft->read_map || cft->read_seq_string || cft->seq_ops) {
struct cgroup_seqfile_state *state =
kzalloc(sizeof(*state), GFP_USER);
if (!state)
@@ -1593,7 +1609,17 @@ static int cgroup_file_open(struct inode
state->cft = cft;
state->cgroup = __d_cgrp(file->f_dentry->d_parent);
file->f_op = &cgroup_seqfile_operations;
- err = single_open(file, cgroup_seqfile_show, state);
+
+ if (!cft->seq_ops)
+ err = single_open(file, cgroup_seqfile_show, state);
+ else {
+ err = seq_open(file, cft->seq_ops);
+ if (!err) {
+ struct seq_file *sf;
+ sf = ((struct seq_file *)file->private_data);
+ sf->private = state;
+ }
+ }
if (err < 0)
kfree(state);
} else if (cft->open)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
KAMEZAWA Hiroyuki wrote:
> Does anyone have a better idea?
> ==
>
> Currently, cgroup's seq_file interface supports only single_open().
> This patch lets a cftype supply arbitrary seq_operations instead.

That's great :)

> For example, per-CPU or per-node statistics can be very large in
> general, and such files tend to need their own start/next/stop ops.
>
> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

Acked-by: Pavel Emelyanov <xemul@openvz.org>
Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
On Tue, May 20, 2008 at 2:08 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> Does anyone have a better idea?

As a way of printing plain text files, it seems fine.

My concern is that it means that cgroups no longer has any idea about
the typing of the data being returned, which will make it harder to
integrate with a binary stats API. You'd end up having to have a
separate reporting method for the same data to use it. That's why the
"read_map" function specifically doesn't take a seq_file, but instead
takes a key/value callback abstraction, which currently maps into a
seq_file. For the binary stats API, we can use the same reporting
functions, and just map into the binary API output.

Maybe we can somehow combine the read_map() abstraction with the
seq_file's start/stop/next operations.
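The read_map design described above can be sketched as a small userspace model: one reporting function emits key/value pairs through a callback, and different sinks turn them into text or into a binary form without the reporter changing. This is a sketch under assumptions: struct kv_cb is modelled loosely on the cgroup_map_cb callback that read_map uses (the patch context shows its ->state member), while report_usage(), the "cache"/"rss" keys, and the text and binary sinks are hypothetical illustrations, not an existing binary stats API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BIN_MAX 8	/* capacity of the hypothetical binary sink */

/* Hypothetical key/value callback, loosely modelled on cgroup_map_cb:
 * the reporter calls fill() and never sees the output format. */
struct kv_cb {
	int (*fill)(struct kv_cb *cb, const char *key, uint64_t value);
	void *state;	/* sink-specific output state */
};

/* A subsystem's reporting function, written once for every sink.
 * The keys are illustrative only. */
static int report_usage(struct kv_cb *cb, uint64_t cache, uint64_t rss)
{
	int err = cb->fill(cb, "cache", cache);

	if (!err)
		err = cb->fill(cb, "rss", rss);
	return err;
}

/* Text sink: "key value\n" lines, as a seq_file-backed map prints. */
struct text_sink {
	char *buf;
	size_t size;
	size_t len;
};

static void text_sink_init(struct text_sink *t, char *buf, size_t size)
{
	t->buf = buf;
	t->size = size;
	t->len = 0;
	memset(buf, 0, size);	/* empty string even if nothing is filled */
}

static int text_fill(struct kv_cb *cb, const char *key, uint64_t value)
{
	struct text_sink *t = cb->state;
	size_t room = t->size - t->len;
	int n = snprintf(t->buf + t->len, room, "%s %llu\n", key,
			 (unsigned long long)value);

	if (n < 0 || (size_t)n >= room)
		return -1;
	t->len += (size_t)n;
	return 0;
}

/* Hypothetical binary sink: because the key set and order are fixed,
 * it stores bare values and the reader indexes them by position. */
struct bin_sink {
	uint64_t vals[BIN_MAX];
	int nr;
};

static int bin_fill(struct kv_cb *cb, const char *key, uint64_t value)
{
	struct bin_sink *b = cb->state;

	(void)key;
	assert(b != NULL);
	if (b->nr >= BIN_MAX)
		return -1;
	b->vals[b->nr++] = value;
	return 0;
}
```

Calling report_usage() once with a text_fill callback yields "cache 4096\nrss 8192\n" for those inputs, and once with bin_fill yields the two raw values. Note that bin_fill() ignores the key entirely, which only works because the key set and ordering never change; that is the restriction discussed further down the thread.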

Paul

Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
On Tue, 20 May 2008 11:46:46 -0700
"Paul Menage" <menage@google.com> wrote:

> On Tue, May 20, 2008 at 2:08 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > Does anyone have a better idea?
>
> As a way of printing plain text files, it seems fine.
>
> My concern is that it means that cgroups no longer has any idea about
> the typing of the data being returned, which will make it harder to
> integrate with a binary stats API. You'd end up having to have a
> separate reporting method for the same data to use it. That's why the
> "read_map" function specifically doesn't take a seq_file, but instead
> takes a key/value callback abstraction, which currently maps into a
> seq_file. For the binary stats API, we can use the same reporting
> functions, and just map into the binary API output.
>
With the current interface, my concern is hotplug.

A file-per-node approach requires deleting and adding files on hotplug.
A single file for all nodes using the read_map method cannot be used
either, because the read_map comment in cgroup.h says
==
The key/value pairs (and their ordering) should not change between reboots.
==

And the (*read) method isn't useful ;)

Can we add new stat files dynamically?

Thanks,
-Kame
Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
On Tue, May 20, 2008 at 5:28 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> With the current interface, my concern is hotplug.
>
> A file-per-node approach requires deleting and adding files on hotplug.
> A single file for all nodes using the read_map method cannot be used
> either, because the read_map comment in cgroup.h says
> ==
> The key/value pairs (and their ordering) should not change between reboots.
> ==

OK, so we may need to extend the interface ...

The main reason for that restriction (not allowing the set of keys to
change) was to simplify and speed up userspace parsing and make any
future binary API simpler. But if it's not going to work, we can maybe
make that optional instead.
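The parsing trade-off, and why hotplug breaks it, can be illustrated with two hypothetical userspace lookups over "key value" lines. value_at() and value_of() are illustrative names for this sketch, not an existing library: a positional reader is cheap but only correct while the key set is fixed, whereas a by-name reader tolerates keys appearing or disappearing at the cost of a string comparison per line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Positional lookup: take the value on line number `index` (0-based).
 * Cheap and simple, but only correct while the key set never changes. */
static int value_at(const char *text, int index, unsigned long long *out)
{
	char key[32];
	unsigned long long v;
	int used, i;

	for (i = 0; sscanf(text, "%31s %llu%n", key, &v, &used) == 2; i++) {
		if (i == index) {
			*out = v;
			return 0;
		}
		text += used;	/* skip past the line just parsed */
	}
	return -1;
}

/* Lookup by key name: more work per line, but it keeps working
 * when keys such as "node1" appear or disappear on hotplug. */
static int value_of(const char *text, const char *want,
		    unsigned long long *out)
{
	char key[32];
	unsigned long long v;
	int used;

	assert(want != NULL);
	while (sscanf(text, "%31s %llu%n", key, &v, &used) == 2) {
		if (strcmp(key, want) == 0) {
			*out = v;
			return 0;
		}
		text += used;
	}
	return -1;
}
```

If node1 is hot-removed so the file reads "node0 10\nnode2 30\n", value_of(text, "node1", ...) correctly reports that the key is gone, while value_at(text, 1, ...) silently returns node2's value as if it belonged to the second node. That silent misattribution is what a fixed key set rules out, and what readers would have to guard against if the restriction became optional.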

>
> And the (*read) method isn't useful ;)
>
> Can we add new stat files dynamically?

Yes, there's no reason we can't do that. Right now it's not possible
to remove a control file without deleting the cgroup, but I have a
patch that supports removal.

The question is whether it's better to have one file per CPU/node or
one large complex file.

Paul
Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
On Tue, 20 May 2008 22:06:48 -0700
"Paul Menage" <menage@google.com> wrote:

> >
> > And the (*read) method isn't useful ;)
> >
> > Can we add new stat files dynamically?
>
> Yes, there's no reason we can't do that. Right now it's not possible
> to remove a control file without deleting the cgroup, but I have a
> patch that supports removal.
>
Good news. I'll wait for it.

> The question is whether it's better to have one file per CPU/node or
> one large complex file.
>
To keep the kernel simple, one file per entity (CPU, node, ...) is better.
To keep applications simple, one big file is better.

I think recent interfaces use the one-file-per-entity method, so I vote for
it for this numastat. One concern is the number of CPUs/nodes, which can be
1024 to 4096 depending on the environment.

Opening and closing 4096 files takes a noticeable amount of CPU time.
(That's why the 'ps' command is slow on big systems.)

Thanks,
-Kame

Re: [RFC][PATCH 2/3] memcg:: seq_ops support for cgroup
Hi,

> > With the current interface, my concern is hotplug.
> >
> > A file-per-node approach requires deleting and adding files on hotplug.
> > A single file for all nodes using the read_map method cannot be used
> > either, because the read_map comment in cgroup.h says
> > ==
> > The key/value pairs (and their ordering) should not change between reboots.
> > ==
>
> OK, so we may need to extend the interface ...

I hope for that too!

I'm currently working on dm-ioband, an I/O bandwidth controller, and
making it work under cgroups.
I realized it is quite hard to set a specific value for each block
device, because each machine has a different number of devices and
some of them may be hot-added or hot-removed.

So I hope cgroups will support some way to handle hot-pluggable
resources.

> The main reason for that restriction (not allowing the set of keys to
> change) was to simplify and speed up userspace parsing and make any
> future binary API simpler. But if it's not going to work, we can maybe
> make that optional instead.
> >
> > And the (*read) method isn't useful ;)
> >
> > Can we add new stat files dynamically?
>
> Yes, there's no reason we can't do that. Right now it's not possible
> to remove a control file without deleting the cgroup, but I have a
> patch that supports removal.
>
> The question is whether it's better to have one file per CPU/node or
> one large complex file.
>
> Paul