Mailing List Archive

weird hvm performance issue.
(I posted this on xen-users last week, and nobody seemed to know what was
up. It is time, I suppose then, to dust off the debugger; I was hoping
one of you could point me in the correct direction to begin
troubleshooting this.)

I'm HVM booting our install-disks (systemimager) here, and I'm spotting some
weird performance issues. Our install uses rsync and prints every file as it
copies it. I notice that files scroll quickly for a while, then slowly. the
slowness continues until I press a button (within xm console) at which point
the slowness immediately vanishes.


I watch 'top' in the Dom0 in another window, and I see that when files scroll
slowly, qemu-dm uses at most 1% cpu; once I press a key and it moves fast
again, qemu-dm uses up to 70% of a CPU. (this is on a box with 4 cores) The
correlation is extremely strong; I'm fairly certain it's not just in my head.

The CPU usage of qemu-dm correlates with the speed of the scroll; exiting from
the xm console does not appear to effect the cpu usage of qemu-dm- it
will drop to 1% after several minutes.

any ideas what is going on here? any ideas how I could debug this
further?

my xm config file looks something like this:

kernel = "/usr/lib/xen/boot/hvmloader"
device_model = '/usr/lib64/xen/bin/qemu-dm'
builder='hvm'
memory = 512
shadow_memory = 8
vcpus=1
vif = [ 'type=ioemu, bridge=xenbr0' ]
nographic=1
sdl=0
vnc=0
serial='pty'
name = "VM1-64"
fda='/var/tmp/disk_dev.img'
disk = ['file:/var/tmp/si.iso,hdb:cdrom,w',
'phy:/dev/sda5,hda,w']
boot='da'
on_poweroff = 'destroy'
on_crash = 'restart'



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: weird hvm performance issue. [ In reply to ]
On 29/1/07 18:08, "Luke S. Crawford" <lsc@prgmr.com> wrote:

> The CPU usage of qemu-dm correlates with the speed of the scroll; exiting
> from
> the xm console does not appear to effect the cpu usage of qemu-dm- it
> will drop to 1% after several minutes.
>
> any ideas what is going on here? any ideas how I could debug this
> further?

This kind of thing can be a pain to debug. Perhaps instrument qemu-dm and
find out interesting things like how often its select() call returns due to
an event on a file descriptor rather than due to hitting the timeout?

Does the slowness happen if you are using the emulated network device at the
same time (i.e., does the event to kick back to normal behaviour have to be
a console event)?

I guess something may have crept in when we moved to using qemu's
asynchronous block i/o code...

-- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: weird hvm performance issue. [ In reply to ]
Thanks for your quick response! (mine was delayed by figuring out
oprofile.)

On Mon, 29 Jan 2007, Keir Fraser wrote:
> On 29/1/07 18:08, "Luke S. Crawford" <lsc@prgmr.com> wrote:
>> The CPU usage of qemu-dm correlates with the speed of the scroll; exiting
>> from
>> the xm console does not appear to effect the cpu usage of qemu-dm- it
>> will drop to 1% after several minutes.

> This kind of thing can be a pain to debug. Perhaps instrument qemu-dm and
> find out interesting things like how often its select() call returns due to
> an event on a file descriptor rather than due to hitting the timeout?

Okay, I have oprofile installed, and mostly figured out;

so opreport without console input:

[root@1950-2 ~]# opreport -l /usr/lib64/xen/bin/qemu-dm|head
CPU: P4 / Xeon with 2 hyper-threads, speed 3191.87 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
13193 32.7483 main_loop_wait
5787 14.3648 cpu_handle_ioreq
3873 9.6138 DMA_run
1904 4.7262 qemu_get_clock
1683 4.1776 qemu_run_timers
1333 3.3088 qemu_del_timer
985 2.4450 __handle_ioreq



with console input:
[root@1950-2 ~]# opreport -l /usr/lib64/xen/bin/qemu-dm|head
CPU: P4 / Xeon with 2 hyper-threads, speed 3191.87 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not
stopped) with a unit mask of 0x01 (mandatory) count 100000
samples % symbol name
11299 29.0904 main_loop_wait
4944 12.7288 cpu_handle_ioreq
3104 7.9916 DMA_run
1729 4.4515 cpu_physical_memory_rw
1481 3.8130 qemu_get_clock
1431 3.6843 iomem_index
1392 3.5838 qemu_run_timers


(those two look about the same; but the slow one, it took a good minute
to get to 3000 samples, wheras on the 'with console input' test it took
maybe 10 seconds.

hm. I need to do more reading on this oprofile in the morning (I don't
have remote access to a HVM-capable system from home at this moment)


> Does the slowness happen if you are using the emulated network device
at the
> same time (i.e., does the event to kick back to normal behaviour have to be
> a console event)?

pinging it doesn't help. (the thing that is triggering this is rsync
over the network to the drive) I will re-install with a HVM that is a
proper linux box (rather than a systemimager boot floppy) and attempt to
re-create the problem with dd or rsync in the morning.

> I guess something may have crept in when we moved to using qemu's
> asynchronous block i/o code...

It is using rsync to copy several gigabytes of files, so it is certainly
excersizing the block I/O

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Re: weird hvm performance issue. [ In reply to ]
On Mon, 29 Jan 2007, Keir Fraser wrote:
> On 29/1/07 18:08, "Luke S. Crawford" <lsc@prgmr.com> wrote:
>
>> The CPU usage of qemu-dm correlates with the speed of the scroll; exiting
>> from
>> the xm console does not appear to effect the cpu usage of qemu-dm- it
>> will drop to 1% after several minutes.
>>
>> any ideas what is going on here? any ideas how I could debug this
>> further?
>
> This kind of thing can be a pain to debug. Perhaps instrument qemu-dm and
> find out interesting things like how often its select() call returns due to
> an event on a file descriptor rather than due to hitting the timeout?


What tools should I use for that? I don't see how to make oprofile give
me the return values of select- in fact oprofile seems to be mostly
tailored to profiling CPU usage, and CPU usage does not appear to be the
problem here. My oprofile results (both opannotate and opreport) seem to
be largely the same in both fast and slow cases; the main difference
being that the samples accumulate much faster in 'fast' mode than in
'slow' mode.

Should I be ignoring these tools and adding logging to the code itself?


Here is some of what I got using strace:

fast:
[root@1950-2 xen-3.0.4-testing.hg]# strace -p 4013 -c
Process 4013 attached - interrupt to quit
Process 4013 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
66.58 0.018072 1 27020 write
14.73 0.003997 0 74281 clock_gettime
8.81 0.002391 0 26481 select
3.77 0.001024 0 24815 read
2.70 0.000733 0 7869 munlock
2.49 0.000677 0 32441 ioctl
0.91 0.000248 0 7869 mlock
0.00 0.000000 0 496 lseek
------ ----------- ----------- --------- --------- ----------------
100.00 0.027142 201272 total

slow:
[root@1950-2 xen-3.0.4-testing.hg]# strace -p 4013 -c
Process 4013 attached - interrupt to quit
Process 4013 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
29.89 0.011048 0 54389 read
27.61 0.010202 0 57631 write
15.52 0.005734 0 158975 clock_gettime
11.93 0.004408 0 73270 ioctl
11.17 0.004127 0 56038 select
2.31 0.000853 0 20786 mlock
1.58 0.000585 0 20786 munlock
0.00 0.000000 0 784 lseek
0.00 0.000000 0 6 gettimeofday
------ ----------- ----------- --------- --------- ----------------
100.00 0.036957 442665 total

It's still not what you are asking for, but somewhat interesting, I hope.
I do notice that the 'fast' spends a whole lot more time in write.

(strace seems to be rather high-overhead; running strace, the 'fast' is
much slower than the normal 'slow', and I'm not seeing much consistancy
here, so perhaps strace is the wrong tool for the job?)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel