Mailing List Archive

bug in xenstored? No notification to subscription on @introduceDomain
Good day.

I think I have run into a strange bug in xenstored.

I have been using XCP for a long time, and all that time we have had
an odd bug that we could not debug properly, because it showed up in a
production environment and only very rarely. Now we have been able to
catch it in a testing environment and have done some research.

We have a Python application running in dom0 that waits for domains to
appear. This is implemented via a subscription to the @introduceDomain
xenstore key. Under some conditions we stop receiving notifications on
the subscription. If we run a second instance of the application, it
receives the notification; if we restart the application, it receives
it too.

I have been unable to pinpoint the exact conditions for this, but it:
a) Happens occasionally but consistently (about once a month on at
least one host in a farm of 50 hosts)
b) Is not related to xenstored uptime
c) Is not related to load on Xen or dom0
d) Is not related to the number of domains
e) Occurs on at least XCP 0.5, 1.0 and 1.1 (I don't know how to get
the version from xenstored)

The last time, I hit it on two hosts in the lab at the same time (with
a single guest domain and no high load) and did some experiments, so I
can confirm exactly what I wrote above.

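The relevant pieces of the Python code we ran:

from xen.lowlevel import xs  # the module; the handle class is xs.xs

conn = xs.xs()  # open a connection to xenstored
conn.watch("@introduceDomain", "+")  # token "+" tags domain creation
conn.watch("@releaseDomain", "-")  # token "-" tags domain release
conn.read_watch()  # blocks until a watch fires; returns (path, token)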
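
A minimal sketch of how such a watch loop could be written so that every
event is logged with a timestamp, which makes it easier to see exactly when
notifications stop arriving. This is an illustration, not the application's
actual code: the function name `log_watch_events`, the `max_events` and `out`
parameters, and the log format are all assumptions. It only relies on the
`watch()` and `read_watch()` calls shown above.

```python
import time

# Special xenstore watch paths and the tokens used in the snippet above.
SPECIAL_WATCHES = {"@introduceDomain": "+", "@releaseDomain": "-"}


def log_watch_events(conn, max_events=None, out=print):
    """Register both special watches and log each event that fires.

    `conn` is any object with watch(path, token) and
    read_watch() -> (path, token), e.g. a xen.lowlevel.xs handle.
    `max_events` (hypothetical helper parameter) bounds the loop for testing;
    None means loop forever.
    """
    for path, token in SPECIAL_WATCHES.items():
        conn.watch(path, token)

    seen = []
    while max_events is None or len(seen) < max_events:
        path, token = conn.read_watch()  # blocks until a watch fires
        seen.append((path, token))
        out("%.3f %s token=%s" % (time.time(), path, token))
    return seen
```

Running two instances of such a logger side by side (one long-lived, one
freshly started) would show whether only the older connection goes quiet.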

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: bug in xenstored? No notification to subscription on @introduceDomain
On Fri, 2011-12-09 at 19:49 +0000, George Shuklin wrote:
> Good day.
>
> I think I have run into a strange bug in xenstored.

If you are using XCP then this will be using oxenstored. I've CC'd
xen-api@ since that is the correct place for XCP discussions.

It's also plausibly a bug in the C client library or the python bindings
to that library (or indeed your application).

> I have been using XCP for a long time, and all that time we have had
> an odd bug that we could not debug properly, because it showed up in a
> production environment and only very rarely. Now we have been able to
> catch it in a testing environment and have done some research.
>
> We have a Python application running in dom0 that waits for domains to
> appear. This is implemented via a subscription to the @introduceDomain
> xenstore key. Under some conditions we stop receiving notifications on
> the subscription. If we run a second instance of the application, it
> receives the notification; if we restart the application, it receives
> it too.

You lose both @introduce and @release notifications or just @introduce?

Does the app do any other XS stuff, e.g. other watches or read/write? Do
these stop working also?

oxenstored (at least in XCP) logs to /var/log/xenstore-access.log -- do
you see any activity in there? There is also /var/log/xenstored.log

Does strace show the daemon writing (or trying to write) to the socket
associated with this client? What about on the client side? (nb:
libxenstore uses a thread to handle watches so be sure to use the
appropriate options to strace.) Identifying the fd associated with the
connection on either end might be tricky, /proc/<pid>/fd and/or netstat
might help narrow it down.
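
As a starting point for narrowing down the fd, here is a small sketch that
lists which descriptors of a process are sockets by reading the
/proc/<pid>/fd symlinks. The function name `socket_fds` is made up for
illustration, and it assumes a Linux /proc filesystem and enough privileges
to read the target process's fd directory.

```python
import os


def socket_fds(pid):
    """Return {fd_number: link_target} for the socket descriptors of `pid`.

    On Linux, a socket fd shows up as a symlink of the form "socket:[inode]".
    """
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir() and readlink()
        if target.startswith("socket:["):
            result[int(name)] = target
    return result
```

The inode number inside "socket:[...]" can then be looked up in
/proc/net/unix to see which Unix socket path it belongs to, and the fd number
is what to look for in the strace output.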

The app being Python presumably makes it hard to attach gdb to it and
get anything sensible, likewise the daemon being OCaml. If anyone has
any hints on attaching a debugger to an existing process of these types
then that might be useful.

Other than that I'm afraid I really don't have any idea what might be
going wrong, or indeed what other next steps can be taken to diagnose
the issue :-(

Ian.




Re: bug in xenstored? No notification to subscription on @introduceDomain
Please don't top post and don't drop people/lists from the CC. I have
reinstated xen-devel and refrained from trimming the quotes as heavily
as I normally would.

Counter to my own advice I have also dropped xen-hosts@googlegroups.com
because last time I got a bounce in Russian to the effect that the group
does not exist (according to google translate).

On Mon, 2011-12-12 at 12:10 +0000, George Shuklin wrote:
> Thanks for the reply.
>
> The problem is that we tried at least two different libraries - xs
> (plus the Python xen.lowlevel.xs bindings) and our own library (pyxs),
> written from scratch in pure Python - and both show exactly the same
> behavior. We lose @introduce and @release at the same time, but only
> for new domains. Older domains (those started before the error
> appeared) still send @release normally during shutdown/migration.
>
> I ran strace: xenstored sends nothing to the application socket when
> 'new' domains appear and disappear (I'm not 100% sure, because my
> strace skills are not very good).
>
> The application performs read/write operations on xenstore (and makes
> many subscriptions, but only after @introduce), and the older
> subscriptions work fine.
>
> PS: We have hit another strange bug, a memory leak in xenstored (it
> happens only with a large number of transactions, and ONLY over the
> socket), but that case is still under investigation, so I decided not
> to post about it (but maybe it is related somehow?).

Are the two events correlated? i.e. is the oxenstored process huge when
these failures occur? Inability to allocate memory could explain some of
your symptoms although I'd expect it to be more fatal more quickly and
obviously than what you describe or to have wider impact.
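
One rough way to check for that correlation is to sample the daemon's
resident set size at the moment a failure is noticed. The sketch below reads
the VmRSS field from /proc/<pid>/status; the function names are hypothetical,
it assumes Linux, and finding the oxenstored pid is left to the caller.

```python
def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status contents.

    Returns None if the field is absent (e.g. for kernel threads).
    """
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     12345 kB"
            return int(line.split()[1])
    return None


def rss_kb(pid):
    """Return the current resident set size of `pid` in kB, or None."""
    with open("/proc/%d/status" % pid) as f:
        return parse_vm_rss_kb(f.read())
```

Logging this value periodically alongside the time of each missed
notification would show whether the two problems line up.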

> Sorry for the question, but how can I gather debug information for
> oxenstored?

What sort of debug information are you after?

There are various logging options which you could turn up to 11
in /etc/xensource/xenstored.conf but I do not have a complete list of
what they are, similarly for command line options -- perhaps someone on
xen-api@ could chime in? Otherwise looking in the source might be the
best way to find out what they are; xenstore.ml, parse_args.ml and
logging.ml would be good places to start. (If, having done so, you feel
motivated to write a patch to add docs/man/oxenstored.1.pod we would be
much obliged...)

Ian.



Re: bug in xenstored? No notification to subscription on @introduceDomain
On Mon, 2011-12-12 at 11:31 +0000, Ian Campbell wrote:
> Does the app do any other XS stuff, e.g. other watches or read/write? Do
> these stop working also?

One other question -- does your app use threading anywhere apart from
the one it gets from libxenstore?

Ian.



Re: bug in xenstored? No notification to subscription on @introduceDomain
On 12.12.2011 17:46, Ian Campbell wrote:
> Please don't top post and don't drop people/lists from the CC. I have
> reinstated xen-devel and refrained from trimming the quotes as heavily
> as I normally would.
>
> Counter to my own advice I have also dropped xen-hosts@googlegroups.com
> because last time I got a bounce in Russian to the effect that the group
> does not exist (according to google translate).
>
> On Mon, 2011-12-12 at 12:10 +0000, George Shuklin wrote:
>> Thanks for the reply.
>>
>> The problem is that we tried at least two different libraries - xs
>> (plus the Python xen.lowlevel.xs bindings) and our own library (pyxs),
>> written from scratch in pure Python - and both show exactly the same
>> behavior. We lose @introduce and @release at the same time, but only
>> for new domains. Older domains (those started before the error
>> appeared) still send @release normally during shutdown/migration.
>>
>> I ran strace: xenstored sends nothing to the application socket when
>> 'new' domains appear and disappear (I'm not 100% sure, because my
>> strace skills are not very good).
>>
>> The application performs read/write operations on xenstore (and makes
>> many subscriptions, but only after @introduce), and the older
>> subscriptions work fine.
>>
>> PS: We have hit another strange bug, a memory leak in xenstored (it
>> happens only with a large number of transactions, and ONLY over the
>> socket), but that case is still under investigation, so I decided not
>> to post about it (but maybe it is related somehow?).
> Are the two events correlated? i.e. is the oxenstored process huge when
> these failures occur? Inability to allocate memory could explain some of
> your symptoms although I'd expect it to be more fatal more quickly and
> obviously than what you describe or to have wider impact.
No. The memory leak occurs only when transactions are used together
with subscriptions, but the 'no notification' problem continued after
we stopped using transactions (which cured the memory leak completely),
so I think this is a separate issue, though I am not sure.

I still can't pin down the conditions under which @introduce
notifications are lost, sorry (I got one more occurrence this morning
in the test pool).

>> Sorry for the question, but how can I gather debug information for
>> oxenstored?
> What sort of debug information are you after?
>
> There are various logging options which you could turn up to 11
> in /etc/xensource/xenstored.conf but I do not have a complete list of
> what they are, similarly for command line options -- perhaps someone on
> xen-api@ could chime in? Otherwise looking in the source might be the
> best way to find out what they are; xenstore.ml, parse_args.ml and
> logging.ml would be good places to start. (If, having done so, you feel
> motivated to write a patch to add docs/man/oxenstored.1.pod we would be
> much obliged...)
>
OK, thanks, I'll dig into the sources to enable them all. We use
xenstore heavily for dynamic memory regulation (about five operations
per domain per second).

Re: bug in xenstored? No notification to subscription on @introduceDomain
On 12.12.2011 17:55, Ian Campbell wrote:
> On Mon, 2011-12-12 at 11:31 +0000, Ian Campbell wrote:
>> Does the app do any other XS stuff, e.g. other watches or read/write? Do
>> these stop working also?
> One other question -- does your app use threading anywhere apart from
> the one it gets from libxenstore?
>
Yes, it does!

We use a multithreaded model (that is why we wrote an alternative
library for accessing xenstore: to get proper multithreaded
subscriptions). But this problem already happened before we started
using multiple threads, with a single-threaded application.
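
To make the threading model concrete, here is one possible shape for
multithreaded subscriptions: a single reader calls `read_watch()` and routes
each event to a per-token queue that worker threads consume. This is a
generic sketch in Python 3, not the pyxs implementation; the class name
`WatchDispatcher` and its methods are hypothetical, and it assumes the
connection object exposes only the `watch()` and `read_watch()` calls used
earlier in this thread.

```python
import queue
import threading


class WatchDispatcher:
    """Fan watch events out from one reader to per-token queues."""

    def __init__(self, conn):
        self._conn = conn
        self._queues = {}
        self._lock = threading.Lock()

    def subscribe(self, path, token):
        """Register a watch and return the queue its events will arrive on."""
        q = queue.Queue()
        with self._lock:
            self._queues[token] = q
        self._conn.watch(path, token)
        return q

    def run(self, max_events=None):
        """Read events and dispatch them; intended to run in one thread only.

        `max_events` bounds the loop (useful for tests); None means forever.
        """
        handled = 0
        while max_events is None or handled < max_events:
            path, token = self._conn.read_watch()  # blocks until an event
            with self._lock:
                q = self._queues.get(token)
            if q is not None:
                q.put(path)  # events with unknown tokens are dropped
            handled += 1
```

In this sketch, subscriptions are best made before `run()` is started in its
own thread, since whether `watch()` may safely be called while another thread
is blocked in `read_watch()` depends on the underlying library.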


