Mailing List Archive

Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'
On Fri, Sep 28, 2012 at 06:34:51PM +0800, Zhenzhong Duan wrote:
> On 2012-09-27 19:59, Konrad Rzeszutek Wilk wrote:
> >On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:
> >>
> >>On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
> >>>On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
> >>>>Konrad Rzeszutek Wilk wrote:
> >>>>>On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
> >>>>>>Hi maintainers,
> >>>>>>
> >>>>>>I found there is an issue when 'xm save' a pvm guest. See below:
> >>>>>>
> >>>>>>When I do save then restore once, CPU(%) in xentop showed around 99%.
> >>>>>>When I do that second time, CPU(%) showed 199%
> >>>>>>
> >>>>>>top in dom0 showed:
> >>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> >>>>>> 20946 root 18 -2 10984 1284 964 S 19.8 0.3 0:48.93 block
> >>>>>> 4939 root 18 -2 10984 1288 964 S 19.5 0.3 1:34.68 block
> >>>>>>
> >>>>>>I could kill the block process, then all look normal again.
> >>>>>What is the 'block' process? If you attach 'perf' to it do you get an idea
> >>>>>of what it is spinning at?
> >>>>It's /etc/xen/scripts/block
> >>>>I add 'set -x' to /etc/xen/scripts/block, found it blocked at claim_lock.
> >>>>When domU was created first time, claim_lock/release_lock finished quickly,
> >>>>when 'xm save' was called, claim_lock spin in its own while loop.
> >>>>I can ensure no other domU create/save/etc happen when I test.
> >>>OK, so how come you have two block processes? Is it b/c you have two
> >>>disks attached to the guest? The are multiple claim_lock in the shell
> >>>script - do you know where each of two threads are spinning? Are they
> >>>spinning on the same function?
> >>In above test, I run save/restore twice, so two block processes.
> >>In other test, run save/restore once, there is only one block process.
> >>After do 'xm save', I see block process spin at line 328:
> >>321 remove)
> >>322 case $t in
> >>323 phy)
> >>324 exit 0
> >>325 ;;
> >>326
> >>327 file)
> >>328 claim_lock "block"
> >>329 node=$(xenstore_read "$XENBUS_PATH/node")
> >>330 losetup -d "$node"
> >>331 release_lock "block"
> >>332 exit 0
> >>333 ;;
> >So with the patches in OVM - do they have this fixed? Can they be upstreamed
> >or are the dependent on some magic OVM sauce?
> After replace locking.sh with OVM's, it worked.
> But xen-tools evolved to use flock based locking currently. We can't
> revert back.
> It seems changeset 25595:497e2fe49455 bring the issue.
> Finally, I came with a small patch that workaround the issue.

Ian, any thoughts on how this can be fixed properly? The locking
is quite heavy and causes two scripts that do deal with blocks
to spin for a long time.

>
> diff -r d364becfb083 tools/hotplug/Linux/locking.sh
> --- a/tools/hotplug/Linux/locking.sh Thu Sep 20 13:31:19 2012 +0200
> +++ b/tools/hotplug/Linux/locking.sh Fri Sep 28 18:27:31 2012 +0800
> @@ -66,6 +66,7 @@
> release_lock()
> {
> _setlockfd $1
> + flock -u $_lockfd
> rm "$_lockfile"
> }

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save' [ In reply to ]
Konrad Rzeszutek Wilk writes ("Is: Contention in block script when doing guest saving. Was:Re: [Xen-devel] an issue with 'xm save'"):
> On Fri, Sep 28, 2012 at 06:34:51PM +0800, Zhenzhong Duan wrote:
> > After replace locking.sh with OVM's, it worked.
> > But xen-tools evolved to use flock based locking currently. We can't
> > revert back.
> > It seems changeset 25595:497e2fe49455 bring the issue.
> > Finally, I came with a small patch that workaround the issue.
>
> Ian, any thoughts on how this can be fixed properly? The locking
> is quite heavy and causes two scripts that do deal with blocks
> to spin for a long time.

The algorithm is not supposed to spin. If it spins there is a bug, I
think.

Can you show me a transcript of the spinning happening with "set -x"
in force ?

> > diff -r d364becfb083 tools/hotplug/Linux/locking.sh
> > --- a/tools/hotplug/Linux/locking.sh Thu Sep 20 13:31:19 2012 +0200
> > +++ b/tools/hotplug/Linux/locking.sh Fri Sep 28 18:27:31 2012 +0800
> > @@ -66,6 +66,7 @@
> > release_lock()
> > {
> > _setlockfd $1
> > + flock -u $_lockfd
> > rm "$_lockfile"

I'm not sure I understand how this would help.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save' [ In reply to ]
On 2012-09-28 22:14, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("Is: Contention in block script when doing guest saving. Was:Re: [Xen-devel] an issue with 'xm save'"):
>> On Fri, Sep 28, 2012 at 06:34:51PM +0800, Zhenzhong Duan wrote:
>>> After replace locking.sh with OVM's, it worked.
>>> But xen-tools evolved to use flock based locking currently. We can't
>>> revert back.
>>> It seems changeset 25595:497e2fe49455 bring the issue.
>>> Finally, I came with a small patch that workaround the issue.
>> Ian, any thoughts on how this can be fixed properly? The locking
>> is quite heavy and causes two scripts that do deal with blocks
>> to spin for a long time.
> The algorithm is not supposed to spin. If it spins there is a bug, I
> think.
>
> Can you show me a transcript of the spinning happening with "set -x"
> in force ?
Hi Ian,
Attachment is part of xen-hotplug.log that will show the spin.
Original file is nearly 7G big. I fetch first 1000 lines.

Below is steps I did:
1. delete xen-hotplug.log
2. reboot dom0
3. add "set -x" to head of block script.
4.create first PV guest
5. save guest
6. get first 1000 lines from xen-hotplug.log

thanks
zduan
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save' [ In reply to ]
Zhenzhong Duan writes ("Re: [Xen-devel] Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'"):
> Attachment is part of xen-hotplug.log that will show the spin.
> Original file is nearly 7G big. I fetch first 1000 lines.

Thanks. This is quite mystifying to me.


Picking a representative example, and translating it to what I think
the syscalls would be:

> + eval 'exec 200>>/var/run/xen-hotplug/block'
> ++ exec

fd = open("/var/run/xen-hotplug/block", O_WRONLY|O_APPEND|O_CREAT, 0666);
dup2(fd, 200);
close(fd);

> + flock -x 200

flock(200, LOCK_EX);

> ++ perl -e '
> open STDIN, "<&200" or die $!;

dup2(200, 0);

> my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;

fstat(0, &stdin_stat);

> my $file_inum = (stat $ARGV[0])[1];

stat("/var/run/xen-hotplug/block", &file_stat);

> print "y\n" if $fd_inum eq $file_inum;

if (stdin_stat.st_ino == file_stat.st_ino)
puts("y");

> ' /var/run/xen-hotplug/block
> + rightfile=

And here we see that perl didn't print "y" so the two files must be
different.


Let's try something else: can you strace it ? That is, get it to the
point where it's spinning, find the pid of the relevant shell process,
and
strace -f -vvs500 -ooutput.strace PID
where PID is the pid in question ?

Let that run for a fraction of a second and then ^C it.

The output may shed some light.

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save' [ In reply to ]
Ian Jackson wrote:
> Zhenzhong Duan writes ("Re: [Xen-devel] Is: Contention in block script when doing guest saving. Was:Re: an issue with 'xm save'"):
>
>> Attachment is part of xen-hotplug.log that will show the spin.
>> Original file is nearly 7G big. I fetch first 1000 lines.
>>
>
> Thanks. This is quite mystifying to me.
>
>
> Picking a representative example, and translating it to what I think
> the syscalls would be:
>
>
>> + eval 'exec 200>>/var/run/xen-hotplug/block'
>> ++ exec
>>
>
> fd = open("/var/run/xen-hotplug/block", O_WRONLY|O_APPEND|O_CREAT, 0666);
> dup2(fd, 200);
> close(fd);
>
>
>> + flock -x 200
>>
>
> flock(200, LOCK_EX);
>
>
>> ++ perl -e '
>> open STDIN, "<&200" or die $!;
>>
>
> dup2(200, 0);
>
>
>> my $fd_inum = (stat STDIN)[1]; die $! unless defined $fd_inum;
>>
>
> fstat(0, &stdin_stat);
>
>
>> my $file_inum = (stat $ARGV[0])[1];
>>
>
> stat("/var/run/xen-hotplug/block", &file_stat);
>
>
>> print "y\n" if $fd_inum eq $file_inum;
>>
>
> if (stdin_stat.st_ino == file_stat.st_ino)
> puts("y");
>
>
>> ' /var/run/xen-hotplug/block
>> + rightfile=
>>
>
> And here we see that perl didn't print "y" so the two files must be
> different.
>
>
> Let's try something else: can you strace it ? That is, get it to the
> point where it's spinning, find the pid of the relevant shell process,
> and
> strace -f -vvs500 -ooutput.strace PID
> where PID is the pid in question ?
>
> Let that run for a fraction of a second and then ^C it.
>
> The output may shed some light.
>
> Thanks,
> Ian.
>
Hi Ian,
Sorry for late, just back from vocation today.
Your requested strace info attached, any info you need, let me know.

--
Regards
zhenzhong
--
Oracle Building, No.24 Building, Zhongguancun Software Park
Haidian District, Beijing 100193, China