Mailing List Archive

[PATCH] Fix d_path for lazy unmounts
Hello,

here is a bugfix to d_path. Please apply (after 2.6.20).

First, when d_path() hits a lazily unmounted mount point, it tries to
prepend the name of the lazily unmounted dentry to the path name. It
gets this wrong, and also overwrites the slash that separates the name
from the following pathname component. This is demonstrated by the
attached test case, which prints "getcwd returned d_path-bugsubdir"
with the bug. The correct result would be "getcwd returned
d_path-bug/subdir".

It could be argued that the name of the root dentry should not be part
of the result of d_path in the first place. On the other hand, what the
unconnected namespace was once reachable as may provide some useful
hints to users, and so that seems okay.

Second, it isn't always possible to tell from the __d_path result whether
the specified root and rootmnt (i.e., the chroot) was reached: lazy
unmounts of bind mounts will produce a path that does start with a
non-slash so we can tell from that, but other lazy unmounts will produce
a path that starts with a slash, just like "ordinary" paths.

The attached patch cleans up __d_path() to fix the bug with overlapping
pathname components. It also adds a @fail_deleted argument, which allows
to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock
can then also be moved into __d_path(). The patch also makes sure that
paths will only start with a slash for paths which are connected to the
root and rootmnt.

The @fail_deleted argument could be added to d_path() as well: this would
allow callers to recognize deleted files, without having to resort to the
ambiguous check for the " (deleted)" string at the end of the pathnames.
This is not currently done, but it might be worthwhile.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1739,45 +1739,43 @@ shouldnt_be_hashed:
* @rootmnt: vfsmnt to which the root dentry belongs
* @buffer: buffer to return value in
* @buflen: buffer length
+ * @fail_deleted: what to return for deleted files
*
- * Convert a dentry into an ASCII path name. If the entry has been deleted
- * the string " (deleted)" is appended. Note that this is ambiguous.
+ * Convert a dentry into an ASCII path name. If the entry has been deleted,
+ * then if @fail_deleted is true, ERR_PTR(-ENOENT) is returned. Otherwise,
+ * the the string " (deleted)" is appended. Note that this is ambiguous.
*
- * Returns the buffer or an error code if the path was too long.
- *
- * "buflen" should be positive. Caller holds the dcache_lock.
+ * Returns the buffer or an error code.
*/
-static char * __d_path( struct dentry *dentry, struct vfsmount *vfsmnt,
- struct dentry *root, struct vfsmount *rootmnt,
- char *buffer, int buflen)
+static char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+ struct dentry *root, struct vfsmount *rootmnt,
+ char *buffer, int buflen, int fail_deleted)
{
- char * end = buffer+buflen;
- char * retval;
+ char *end = buffer + buflen - 1;
int namelen;

- *--end = '\0';
+ buffer = end;
+ if (buflen < 2)
+ return ERR_PTR(-ENAMETOOLONG);
+ *end = '\0';
buflen--;
+
+ spin_lock(&dcache_lock);
if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
- buflen -= 10;
- end -= 10;
- if (buflen < 0)
+ if (fail_deleted) {
+ buffer = ERR_PTR(-ENOENT);
+ goto out;
+ }
+ if (buflen < 10)
goto Elong;
- memcpy(end, " (deleted)", 10);
+ buflen -= 10;
+ buffer -= 10;
+ memcpy(buffer, " (deleted)", 10);
}
-
- if (buflen < 1)
- goto Elong;
- /* Get '/' right */
- retval = end-1;
- *retval = '/';
-
- for (;;) {
+ while (dentry != root || vfsmnt != rootmnt) {
struct dentry * parent;

- if (dentry == root && vfsmnt == rootmnt)
- break;
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
- /* Global root? */
spin_lock(&vfsmount_lock);
if (vfsmnt->mnt_parent == vfsmnt) {
spin_unlock(&vfsmount_lock);
@@ -1791,33 +1789,49 @@ static char * __d_path( struct dentry *d
parent = dentry->d_parent;
prefetch(parent);
namelen = dentry->d_name.len;
- buflen -= namelen + 1;
- if (buflen < 0)
+ if (buflen <= namelen)
goto Elong;
- end -= namelen;
- memcpy(end, dentry->d_name.name, namelen);
- *--end = '/';
- retval = end;
+ buflen -= namelen + 1;
+ buffer -= namelen;
+ memcpy(buffer, dentry->d_name.name, namelen);
+ *--buffer = '/';
dentry = parent;
}
+ /* Get '/' right */
+ if (buffer == end)
+ *--buffer = '/';

- return retval;
+out:
+ spin_unlock(&dcache_lock);
+ return buffer;

global_root:
+ /*
+ * We went past the (vfsmount, dentry) we were looking for and have
+ * either hit a root dentry, a lazily unmounted dentry, or an
+ * unconnected dentry. Make sure we won't return a pathname rooted
+ * in '/'.
+ */
namelen = dentry->d_name.len;
- buflen -= namelen;
- if (buflen < 0)
- goto Elong;
- retval -= namelen-1; /* hit the slash */
- memcpy(retval, dentry->d_name.name, namelen);
- return retval;
+ if (namelen == 1 && *dentry->d_name.name == '/') {
+ if (buffer != end)
+ buffer++;
+ } else {
+ if (buflen < namelen)
+ goto Elong;
+ buffer -= namelen;
+ memcpy(buffer, dentry->d_name.name, namelen);
+ }
+ goto out;
+
Elong:
- return ERR_PTR(-ENAMETOOLONG);
+ buffer = ERR_PTR(-ENAMETOOLONG);
+ goto out;
}

/* write full pathname into buffer and return start of pathname */
-char * d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
- char *buf, int buflen)
+char *d_path(struct dentry *dentry, struct vfsmount *vfsmnt, char *buf,
+ int buflen)
{
char *res;
struct vfsmount *rootmnt;
@@ -1827,9 +1841,7 @@ char * d_path(struct dentry *dentry, str
rootmnt = mntget(current->fs->rootmnt);
root = dget(current->fs->root);
read_unlock(&current->fs->lock);
- spin_lock(&dcache_lock);
- res = __d_path(dentry, vfsmnt, root, rootmnt, buf, buflen);
- spin_unlock(&dcache_lock);
+ res = __d_path(dentry, vfsmnt, root, rootmnt, buf, buflen, 0);
dput(root);
mntput(rootmnt);
return res;
@@ -1855,10 +1867,10 @@ char * d_path(struct dentry *dentry, str
*/
asmlinkage long sys_getcwd(char __user *buf, unsigned long size)
{
- int error;
+ int error, len;
struct vfsmount *pwdmnt, *rootmnt;
struct dentry *pwd, *root;
- char *page = (char *) __get_free_page(GFP_USER);
+ char *page = (char *) __get_free_page(GFP_USER), *cwd;

if (!page)
return -ENOMEM;
@@ -1870,29 +1882,18 @@ asmlinkage long sys_getcwd(char __user *
root = dget(current->fs->root);
read_unlock(&current->fs->lock);

- error = -ENOENT;
- /* Has the current directory has been unlinked? */
- spin_lock(&dcache_lock);
- if (pwd->d_parent == pwd || !d_unhashed(pwd)) {
- unsigned long len;
- char * cwd;
-
- cwd = __d_path(pwd, pwdmnt, root, rootmnt, page, PAGE_SIZE);
- spin_unlock(&dcache_lock);
-
- error = PTR_ERR(cwd);
- if (IS_ERR(cwd))
- goto out;
-
- error = -ERANGE;
- len = PAGE_SIZE + page - cwd;
- if (len <= size) {
- error = len;
- if (copy_to_user(buf, cwd, len))
- error = -EFAULT;
- }
- } else
- spin_unlock(&dcache_lock);
+ cwd = __d_path(pwd, pwdmnt, root, rootmnt, page, PAGE_SIZE, 1);
+ error = PTR_ERR(cwd);
+ if (IS_ERR(cwd))
+ goto out;
+
+ error = -ERANGE;
+ len = PAGE_SIZE + page - cwd;
+ if (len <= size) {
+ error = len;
+ if (copy_to_user(buf, cwd, len))
+ error = -EFAULT;
+ }

out:
dput(pwd);
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Friday February 2, agruen@suse.de wrote:
> Hello,
>
> here is a bugfix to d_path. Please apply (after 2.6.20).

Looks good! Just a couple of little comments (to prove that I have
really read it and thought about it :-)

>
> Signed-off-by: Andreas Gruenbacher <agruen@suse.de>

Reviewed-by: NeilBrown <neilb@suse.de>

>
> Index: linux-2.6/fs/dcache.c
> ===================================================================
> --- linux-2.6.orig/fs/dcache.c
> +++ linux-2.6/fs/dcache.c
> @@ -1739,45 +1739,43 @@ shouldnt_be_hashed:
> * @rootmnt: vfsmnt to which the root dentry belongs
> * @buffer: buffer to return value in
> * @buflen: buffer length
> + * @fail_deleted: what to return for deleted files
> *
> - * Convert a dentry into an ASCII path name. If the entry has been deleted
> - * the string " (deleted)" is appended. Note that this is ambiguous.
> + * Convert a dentry into an ASCII path name. If the entry has been deleted,
> + * then if @fail_deleted is true, ERR_PTR(-ENOENT) is returned. Otherwise,
> + * the the string " (deleted)" is appended. Note that this is ambiguous.

The behaviour in the face of a lazy unmount should be clarified in
this comment.

And I cannot help wondering if that behaviour should - in some cases -
be 'fail'.

i.e. if sys_getcwd is called on a directory that is no longer
connected to the root, it isn't clear to me that it should return
without an error.
Without your patch it can return garbage which is clearly wrong.
With you patch it will return a relative path name, which is also
wrong (it isn't a valid path that leads to the current working directory).
I would suggest that 'fail_deleted' be (e.g.) changed to
'fail_condition' where two conditions are defined
#define DPATH_FAIL_DELETED 1
#define DPATH_FAIL_DISCONNECTED 2

and both these are passed in by sys_getcwd.


> @@ -1791,33 +1789,49 @@ static char * __d_path( struct dentry *d
> parent = dentry->d_parent;
> prefetch(parent);
> namelen = dentry->d_name.len;
> - buflen -= namelen + 1;
> - if (buflen < 0)
> + if (buflen <= namelen)
> goto Elong;

This bothers me. You appear to be comparing buflen with namelen, but
are about the change buflen by 'namelen + 1'. It looks like a bug.

In reality, you are comparing "buflen < namelen+1" but spelling it as
"buflen <= namelen". I would prefer the full spelling with least room
for confusion.


> - end -= namelen;
> - memcpy(end, dentry->d_name.name, namelen);
> - *--end = '/';
> - retval = end;
> + buflen -= namelen + 1;
> + buffer -= namelen;
> + memcpy(buffer, dentry->d_name.name, namelen);
> + *--buffer = '/';

This wouldn't be my preference either. It is important that 'buflen'
and 'buffer' move in step, but when I see
buflen -= namelen + 1;
buffer -= namelen;
it looks like they aren't. Maybe:
> + buflen -= namelen + 1;
> + buffer -= namelen + 1;
> + memcpy(buffer+1, dentry->d_name.name, namelen);
> + *buffer = '/';

or am I being too picky?

NeilBrown
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Friday 02 February 2007 19:23, Andreas Gruenbacher wrote:
> Hello,
>
> here is a bugfix to d_path. Please apply (after 2.6.20).
>
> First, when d_path() hits a lazily unmounted mount point, it tries to
> prepend the name of the lazily unmounted dentry to the path name. It
> gets this wrong, and also overwrites the slash that separates the name
> from the following pathname component. This is demonstrated by the
> attached test case, which prints "getcwd returned d_path-bugsubdir"
> with the bug. The correct result would be "getcwd returned
> d_path-bug/subdir".

It turns out that overwriting the separating slash is the intended behavior for
pseudo filesystems such as pipefs, which end up producing pathnames
like "pipe:[deadbeef]" with "pipe:" as the root dentry's name
and "[deadbeef]" as the child's name. Special casing this doesn't make a
lot of sense to me, but I guess it will probably have to stay -- quite a few
programs presumable rely on this convention.

Here is an updated patch that also catches this special case.

While looking into this, I noticed that the inotifyfs and futexfs pseudo
filesystems don't use a trailing slash in the root dentry name (see
get_sb_pseudo() in fs/inotify_user.c and kernel/futex.c). Should this be
changed?

> It could be argued that the name of the root dentry should not be part
> of the result of d_path in the first place. On the other hand, what the
> unconnected namespace was once reachable as may provide some useful
> hints to users, and so that seems okay.
>
> Second, it isn't always possible to tell from the __d_path result whether
> the specified root and rootmnt (i.e., the chroot) was reached: lazy
> unmounts of bind mounts will produce a path that does start with a
> non-slash so we can tell from that, but other lazy unmounts will produce
> a path that starts with a slash, just like "ordinary" paths.
>
> The attached patch cleans up __d_path() to fix the bug with overlapping
> pathname components. It also adds a @fail_deleted argument, which allows
> to get rid of some of the mess in sys_getcwd(). Grabbing the dcache_lock
> can then also be moved into __d_path(). The patch also makes sure that
> paths will only start with a slash for paths which are connected to the
> root and rootmnt.
>
> The @fail_deleted argument could be added to d_path() as well: this would
> allow callers to recognize deleted files, without having to resort to the
> ambiguous check for the " (deleted)" string at the end of the pathnames.
> This is not currently done, but it might be worthwhile.

Updated patch follows.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>

Index: linux-2.6-d_path/fs/dcache.c
===================================================================
--- linux-2.6-d_path.orig/fs/dcache.c
+++ linux-2.6-d_path/fs/dcache.c
@@ -1739,45 +1739,41 @@ shouldnt_be_hashed:
* @rootmnt: vfsmnt to which the root dentry belongs
* @buffer: buffer to return value in
* @buflen: buffer length
+ * @fail_deleted: what to return for deleted files
*
- * Convert a dentry into an ASCII path name. If the entry has been deleted
- * the string " (deleted)" is appended. Note that this is ambiguous.
+ * Convert a dentry into an ASCII path name. If the entry has been deleted,
+ * then if @fail_deleted is true, ERR_PTR(-ENOENT) is returned. Otherwise,
+ * the the string " (deleted)" is appended. Note that this is ambiguous.
*
- * Returns the buffer or an error code if the path was too long.
- *
- * "buflen" should be positive. Caller holds the dcache_lock.
+ * Returns the buffer or an error code.
*/
-static char * __d_path( struct dentry *dentry, struct vfsmount *vfsmnt,
- struct dentry *root, struct vfsmount *rootmnt,
- char *buffer, int buflen)
-{
- char * end = buffer+buflen;
- char * retval;
- int namelen;
+static char *__d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
+ struct dentry *root, struct vfsmount *rootmnt,
+ char *buffer, int buflen, int fail_deleted)
+{
+ int namelen, is_slash;
+
+ if (buflen < 2)
+ return ERR_PTR(-ENAMETOOLONG);
+ buffer += --buflen;
+ *buffer = '\0';

- *--end = '\0';
- buflen--;
+ spin_lock(&dcache_lock);
if (!IS_ROOT(dentry) && d_unhashed(dentry)) {
- buflen -= 10;
- end -= 10;
- if (buflen < 0)
+ if (fail_deleted) {
+ buffer = ERR_PTR(-ENOENT);
+ goto out;
+ }
+ if (buflen < 10)
goto Elong;
- memcpy(end, " (deleted)", 10);
+ buflen -= 10;
+ buffer -= 10;
+ memcpy(buffer, " (deleted)", 10);
}
-
- if (buflen < 1)
- goto Elong;
- /* Get '/' right */
- retval = end-1;
- *retval = '/';
-
- for (;;) {
+ while (dentry != root || vfsmnt != rootmnt) {
struct dentry * parent;

- if (dentry == root && vfsmnt == rootmnt)
- break;
if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
- /* Global root? */
spin_lock(&vfsmount_lock);
if (vfsmnt->mnt_parent == vfsmnt) {
spin_unlock(&vfsmount_lock);
@@ -1791,33 +1787,60 @@ static char * __d_path( struct dentry *d
parent = dentry->d_parent;
prefetch(parent);
namelen = dentry->d_name.len;
- buflen -= namelen + 1;
- if (buflen < 0)
+ if (buflen <= namelen)
goto Elong;
- end -= namelen;
- memcpy(end, dentry->d_name.name, namelen);
- *--end = '/';
- retval = end;
+ buflen -= namelen + 1;
+ buffer -= namelen;
+ memcpy(buffer, dentry->d_name.name, namelen);
+ *--buffer = '/';
dentry = parent;
}
+ /* Get '/' right */
+ if (*buffer != '/')
+ *--buffer = '/';

- return retval;
+out:
+ spin_unlock(&dcache_lock);
+ return buffer;

global_root:
+ /*
+ * We went past the (vfsmount, dentry) we were looking for and have
+ * either hit a root dentry, a lazily unmounted dentry, an
+ * unconnected dentry, or the file is on a pseudo filesystem.
+ */
namelen = dentry->d_name.len;
- buflen -= namelen;
- if (buflen < 0)
+ is_slash = (namelen == 1 && *dentry->d_name.name == '/');
+ if (is_slash || (dentry->d_sb->s_flags & MS_NOUSER)) {
+ /*
+ * Make sure we won't return a pathname starting with '/'.
+ *
+ * Historically, we also glue together the root dentry and
+ * remaining name for pseudo filesystems like pipefs, which
+ * have the MS_NOUSER flag set. This results in pathnames
+ * like "pipe:[439336]".
+ */
+ if (*buffer == '/') {
+ buffer++;
+ buflen++;
+ }
+ if (is_slash)
+ goto out;
+ }
+ if (buflen < namelen)
goto Elong;
- retval -= namelen-1; /* hit the slash */
- memcpy(retval, dentry->d_name.name, namelen);
- return retval;
+ buffer -= namelen;
+ memcpy(buffer, dentry->d_name.name, namelen);
+ goto out;
+
Elong:
- return ERR_PTR(-ENAMETOOLONG);
+ buffer = ERR_PTR(-ENAMETOOLONG);
+ goto out;
}

/* write full pathname into buffer and return start of pathname */
-char * d_path(struct dentry *dentry, struct vfsmount *vfsmnt,
- char *buf, int buflen)
+char *d_path(struct dentry *dentry, struct vfsmount *vfsmnt, char *buf,
+ int buflen)
{
char *res;
struct vfsmount *rootmnt;
@@ -1827,9 +1850,7 @@ char * d_path(struct dentry *dentry, str
rootmnt = mntget(current->fs->rootmnt);
root = dget(current->fs->root);
read_unlock(&current->fs->lock);
- spin_lock(&dcache_lock);
- res = __d_path(dentry, vfsmnt, root, rootmnt, buf, buflen);
- spin_unlock(&dcache_lock);
+ res = __d_path(dentry, vfsmnt, root, rootmnt, buf, buflen, 0);
dput(root);
mntput(rootmnt);
return res;
@@ -1855,10 +1876,10 @@ char * d_path(struct dentry *dentry, str
*/
asmlinkage long sys_getcwd(char __user *buf, unsigned long size)
{
- int error;
+ int error, len;
struct vfsmount *pwdmnt, *rootmnt;
struct dentry *pwd, *root;
- char *page = (char *) __get_free_page(GFP_USER);
+ char *page = (char *) __get_free_page(GFP_USER), *cwd;

if (!page)
return -ENOMEM;
@@ -1870,29 +1891,18 @@ asmlinkage long sys_getcwd(char __user *
root = dget(current->fs->root);
read_unlock(&current->fs->lock);

- error = -ENOENT;
- /* Has the current directory has been unlinked? */
- spin_lock(&dcache_lock);
- if (pwd->d_parent == pwd || !d_unhashed(pwd)) {
- unsigned long len;
- char * cwd;
-
- cwd = __d_path(pwd, pwdmnt, root, rootmnt, page, PAGE_SIZE);
- spin_unlock(&dcache_lock);
-
- error = PTR_ERR(cwd);
- if (IS_ERR(cwd))
- goto out;
-
- error = -ERANGE;
- len = PAGE_SIZE + page - cwd;
- if (len <= size) {
- error = len;
- if (copy_to_user(buf, cwd, len))
- error = -EFAULT;
- }
- } else
- spin_unlock(&dcache_lock);
+ cwd = __d_path(pwd, pwdmnt, root, rootmnt, page, PAGE_SIZE, 1);
+ error = PTR_ERR(cwd);
+ if (IS_ERR(cwd))
+ goto out;
+
+ error = -ERANGE;
+ len = PAGE_SIZE + page - cwd;
+ if (len <= size) {
+ error = len;
+ if (copy_to_user(buf, cwd, len))
+ error = -EFAULT;
+ }

out:
dput(pwd);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Monday 05 February 2007 00:32, Andreas Gruenbacher wrote:
> Here is an updated patch that also catches this special case.
> [...]

The d_path change was to not start unreachable paths with slashes. In the
extreme case, this leads to an empty string. As it turns out, we are
reporting meaningless paths to users in a bunch of places in /proc, like
in /proc/$pid/mounts, where we ended up with entries like this:

rootfs rootfs rw 0 0

No wonder this immediately broke things; sorry for that.

Mountpoints are reported relative to the chroot if they are reachable from the
chroot, and relative to the namespace they are defined in otherwise. This is
big nonsense, but it's unclear to me how to best fix it:

- don't report unreachable mount points,
- somehow indicate which mountpoints are reachable and which are not,
like by prepending a question flag?

What's the point in reporting the rootfs at all -- it's never reachable to an
ordinary process?

The same issue shows up in a few other places as well: /proc/$pid/mountstats
which is similar to /proc/$pid/mounts, and here:

/proc/$pid/maps
/proc/$pid/smaps
/proc/$pid/numa_maps
/proc/swaps
/proc/mdstat
/proc/net/rpc/nfsd.fh/content
/proc/net/rpc/nfsd.export/content

We surely do not want to hide the entries that have unreachable pathnames
here...

Thanks,
Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wed, Feb 14, Andreas Gruenbacher wrote:

> What's the point in reporting the rootfs at all -- it's never reachable to an
> ordinary process?

/init and its childs has it as root, until it passes control over to
/sbin/init
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wednesday 14 February 2007 00:29, Olaf Hering wrote:
> On Wed, Feb 14, Andreas Gruenbacher wrote:
> > What's the point in reporting the rootfs at all -- it's never reachable
> > to an ordinary process?
>
> /init and its childs has it as root, until it passes control over to
> /sbin/init

Yes, that's why I said "ordinary".
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wed, 14 Feb 2007, Andreas Gruenbacher wrote:
>
> Mountpoints are reported relative to the chroot if they are reachable from the
> chroot, and relative to the namespace they are defined in otherwise. This is
> big nonsense, but it's unclear to me how to best fix it:

Well, it's also what a traditional "pwd" implementation would do, so it's
not "nonsense" in that sense.

> - don't report unreachable mount points,
> - somehow indicate which mountpoints are reachable and which are not,
> like by prepending a question flag?

We could prepend another '/' (so that you'd have a path that starts with
"//"). That's still a legal path, but it's also somethign that even POSIX
says is valid to mean something else (eg "//ftp/.." or "//socket/.." to
escape into another namespace).

But the fact is, some things just want a path. And it's generally *better*
to get them a

- path that looks ok and starts from '/' than it is to give them
something that looks strange and doesn't start from root (because the
latter gives many many more possible attack vectors: if somebody
actually looks up the path, a bad user can much more easily fake a
relative path than fake an absolute one).

- the path we've historically always given them.

> What's the point in reporting the rootfs at all -- it's never reachable to an
> ordinary process?

All the paths are generally useful for USER INFORMATION. That's the
primary use of paths for anything but "getcwd()". And the primary use for
"getcwd()" is to do the same thing that any traditional cwd implementation
has done, except faster (and _possibly_ better, but compatibility is more
important than extensions - so the "better" is mainly an issue about
non-readable or non-executable path component that we can show, and
about being able to tell _how_ you got to a point that has multiple ways
of getting there).

Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wednesday 14 February 2007 07:37, Linus Torvalds wrote:
> We could prepend another '/' (so that you'd have a path that starts with
> "//"). That's still a legal path, but it's also somethign that even POSIX
> says is valid to mean something else (eg "//ftp/.." or "//socket/.." to
> escape into another namespace).

This sounds good enough to me. My main point is that users that care should be
able to tell the difference.

Thanks,
Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wednesday 14 February 2007 11:39, Andreas Gruenbacher wrote:
> On Wednesday 14 February 2007 07:37, Linus Torvalds wrote:
> > We could prepend another '/' (so that you'd have a path that starts with
> > "//"). That's still a legal path, but it's also somethign that even POSIX
> > says is valid to mean something else (eg "//ftp/.." or "//socket/.." to
> > escape into another namespace).
>
> This sounds good enough to me. My main point is that users that care should
> be able to tell the difference.

Looking some more into it, appending another slash doesn't look like the best
solution after all. Here is a more elaborate description of the cases we have
to deal with:

* Regular paths that lead up to the root we are interested in. These should
obviously be reported as "/dir/file", so I won't mention them any further.

* Files on pseudo filesystems such as pipefs. Historically, we are
reporting them as "pipe:[439336]" or similar, where "pipe:" is the name of
the root dentry, and "[439336]" is the name of the child; no slash between
them.

* Lazily unmounted directories. Currently, we report them as "dirfile", where
"dir" is the name of the directory dentry, and "file" is the name of the
child dentry. That's the obvious bug.

* Lazily unmounted filesystems. Currently, they and up as "/file", where
"/" is the name of the root dentry, and "file" is the name of the
child dentry.

* Unreachable paths. Currently, we report them as absolute paths, so there
is no way to tell the one from the other.

What I'm after here is two things:

(1) fix the bug with the lazily unmounted dirs, and

(2) allow to tell when a path is unreachable.

The key problem is that we know when we hit a disconnected path, but it's hard
to tell the pseudo filesystem case from the root filesystem case because
rootfs also is a pseudo filesystem.

Let's look at a few choices we have:

[0] Now:
--------
pipe: "pipe:[439336]"
lazily umounted dir: "dirfile" <== bug
lazily unmounted fs: "/file" <== ambiguous
path on rootfs: "/file" <== ambiguous
rootfs root: "/" <== ambiguous

[1] Always make disconnected paths relative:
--------------------------------------------
pipe: "pipe:[439336]" (or "pipe/[439336]")
lazily umounted dir: "dir/file"
lazily unmounted fs: "file"
path on rootfs: "file"
rootfs root: "."

[2] Make disconnected pseudo-fs paths relative and others absolute;
special case the rootfs:
------------------------
pipe: "pipe:[439336]" (or "pipe/[439336]")
lazily umounted dir: "//dir/file"
lazily unmounted fs: "//file"
path on rootfs: "//file"
rootfs root: "//"

[3] Always make disconnected paths double-slashed:
--------------------------------------------------
pipe: "//pipe/[439336]"
lazily umounted dir: "//dir/file"
lazily unmounted fs: "//file"
unreachable root: "//"


From all these choices, I actually like [1] best, together with hiding
unreachable mount points in /proc/$pid/mounts and /proc/$pid/mountstats:
there is no real point in pretending that unreachable paths are reachable;
whichever process uses these paths is just as much in error when trying from
the pwd as when trying from the chroot.

Hiding unreachable mount points has the benefits that processes will not get
to see relative paths in /proc/$pid/mounts and /proc/$pid/mountstats. Those
paths are irrelevant to them anyway; just consider a chroot()ed process that
currently sees all sorts of crap.

A process that hasn't been chroot()ed will still see all filesystems except
the rootfs, so no problem there. Hiding the rootfs by virtue of it being
unreachable is actually be a good thing: nobody is actually going to be able
to do anything with it anyway; it's just confusing, and adds unnecessary
special cases to scripts like mkinitrd.

I would actually also prefer to see the hack go that leaves out the slash
between pathame components (so "pipe:[439336]" would just
become "pipe/[439336]") -- it's good for nothing other than backward
compatibility.

Opinions?

Thanks,
Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Sunday 04 February 2007 16:15, Neil Brown wrote:
> The behaviour in the face of a lazy unmount should be clarified in
> this comment.

Done.

> If sys_getcwd is called on a directory that is no longer
> connected to the root, it isn't clear to me that it should return
> without an error.
> Without your patch it can return garbage which is clearly wrong.
> With you patch it will return a relative path name, which is also
> wrong (it isn't a valid path that leads to the current working directory).

Right, it should return -ENOENT instead in that case. Fixed as well.

> I would suggest that 'fail_deleted' be (e.g.) changed to
> 'fail_condition' where two conditions are defined
> #define DPATH_FAIL_DELETED 1
> #define DPATH_FAIL_DISCONNECTED 2

The much cleaner interface is to check if the path returned starts with a
slash. If it doesn't, we know the path is bad as far as sys_getcwd() is
concerned. We will construct the partial path in __d_path before figuring out
that the path is disconnected, so no performance penalty, either.

> In reality, you are comparing "buflen < namelen+1" but spelling it as
> "buflen <= namelen". I would prefer the full spelling with least room
> for confusion.

I'm fine either way.

> Maybe:
> > + buflen -= namelen + 1;
> > + buffer -= namelen + 1;
> > + memcpy(buffer+1, dentry->d_name.name, namelen);
> > + *buffer = '/';

That's better, yes.

Thanks,
Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wednesday 14 February 2007 14:57, Andreas Gruenbacher wrote:
> [1] Always make disconnected paths relative:
>
> From all these choices, I actually like [1] best, together with hiding
> unreachable mount points in /proc/$pid/mounts and /proc/$pid/mountstats:
> there is no real point in pretending that unreachable paths are reachable;
> whichever process uses these paths is just as much in error when trying
> from the pwd as when trying from the chroot.

So here is how this could be implemented. See the lengthy explanations in the
patch headers, too.

d_path-lazy-unmounts.diff

- Now fails sys_getcwd() with -ENOENT in case the path computed
doesn't lead to the chroot.
- No longer reports empty paths, but report '.' in that case
instead.
- Incorporates Neil's other minor feedback.

no-unreachable-paths.diff

Hide unreachable mount points in /proc/mounts and /proc/$PID/mountstats.
In particular, this includes the rootfs mount for all processes that
are not chroot()ed in that mount.

Andreas
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
Hi,

On Feb 14 2007 14:57, Andreas Gruenbacher wrote:
>[2]
> ------------------------
>pipe: "pipe:[439336]" (or "pipe/[439336]")
>
>[3] Always make disconnected paths double-slashed:
>--------------------------------------------------
>pipe: "//pipe/[439336]"
>lazily umounted dir: "//dir/file"
>lazily unmounted fs: "//file"
>unreachable root: "//"
>
>Opinions?

As for [2]/[3]:

What's the point in changing pipefs... you can *never*
reach it *anyway*, even if it was a /-style path, since
pipefs is a NOMNT filesystem.

That said, programs like lsof might break when it changes
away from "pipe:[integer]" (same goes for socket:, etc.)


Jan
--
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Thursday 15 February 2007 04:53, Jan Engelhardt wrote:
> What's the point in changing pipefs... you can *never*
> reach it *anyway*, even if it was a /-style path, since
> pipefs is a NOMNT filesystem.

The point is that we could then get rid of the special case for MS_NOUSER
filesystems like pipefs in __d_path(). (This special case caused the lazy
unmounted dir bug in the first place.) It is likely not really worth it,
though.

Andreas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Fix d_path for lazy unmounts [ In reply to ]
On Wednesday 14 February 2007 19:13, Andreas Gruenbacher wrote:
> So here is how this could be implemented. See the lengthy explanations in
> the patch headers, too.

Turns out I messed up with one of Neil's review comments. Here is yet another
update.

With this fix, everything works as expected in testing now. sys_getcwd
returns -ENOENT for paths not connected to the chroot. /proc/mounts
and /proc/$pid/mountstats never contain a relative path. gnome-vfs-daemon
seems quite happy again.

Andreas