Mailing List Archive

[xen staging] tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
commit beb105af19cc3e58e2ed49224cfe190a736e5fa0
Author: David Woodhouse <dwmw@amazon.co.uk>
AuthorDate: Thu Mar 19 20:40:24 2020 +0000
Commit: Andrew Cooper <andrew.cooper3@citrix.com>
CommitDate: Fri Aug 14 16:14:21 2020 +0100

tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating

The do_ls() function has somewhat inconsistent handling of errors.

If reading the node's contents with xs_read() fails, then do_ls() will
just quietly not display the contents.

If reading the node's permissions with xs_get_permissions() fails, then
do_ls() will print a warning, continue, and ultimately won't exit with
an error code (unless another error happens).

If recursing into the node with xs_directory() fails, then do_ls() will
abort immediately, not printing any further nodes.

For persistent failure modes — such as ENOENT because a node has been
removed, or EACCES because it has had its permisions changed since the
xs_directory() on the parent directory returned its name — it's
obviously quite likely that if either of the first two errors occur for
a given node, then so will the third and thus xenstore-ls will abort.

The ENOENT one is actually a fairly common case, and has caused tools to
fail to clean up a network device because it *apparently* already
doesn't exist in xenstore.

There is a school of thought that says, "Well, xenstore-ls returned an
error. So the tools should not trust its output."

The natural corollary of this would surely be that the tools must re-run
xenstore-ls as many times as is necessary until its manages to exit
without hitting the race condition. I am not keen on that conclusion.

For the specific case of ENOENT it seems reasonable to declare that,
but for the timing, we might as well just not have seen that node at
all when calling xs_directory() for the parent. By ignoring the error,
we give acceptable output.

The issue can be reproduced as follows:

(dom0) # for a in `seq 1 1000` ; do
xenstore-write /local/domain/2/foo/$a $a ;
done

Now simultaneously:

(dom0) # for a in `seq 1 999` ; do
xenstore-rm /local/domain/2/foo/$a ;
done
(dom2) # while true ; do
./xenstore-ls -p /local/domain/2/foo | grep -c 1000 ;
done

We should expect to see node 1000 in the output, every time.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
tools/xenstore/xenstore_client.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tools/xenstore/xenstore_client.c b/tools/xenstore/xenstore_client.c
index 3afc630ab8..ae7ed3eb9e 100644
--- a/tools/xenstore/xenstore_client.c
+++ b/tools/xenstore/xenstore_client.c
@@ -148,14 +148,20 @@ static void do_ls(struct xs_handle *h, char *path, int cur_depth, int show_perms
int i;
unsigned int num, len;

+ e = xs_directory(h, XBT_NULL, path, &num);
+ if (e == NULL) {
+ if (cur_depth && errno == ENOENT) {
+ /* If a node disappears while recursing, silently move on. */
+ return;
+ }
+
+ err(1, "xs_directory (%s)", path);
+ }
+
newpath = malloc(STRING_MAX);
if (!newpath)
err(1, "malloc in do_ls");

- e = xs_directory(h, XBT_NULL, path, &num);
- if (e == NULL)
- err(1, "xs_directory (%s)", path);
-
for (i = 0; i<num; i++) {
char buf[MAX_STRLEN(unsigned int)+1];
struct xs_permissions *perms;
--
generated by git-patchbot for /home/xen/git/xen.git#staging