Mailing List Archive

Monitoring domU resource usage
Using 2.0-testing, what methods are people using to monitor
individual domU CPU usage?

Obviously I can run an snmpd in each domU and see how busy the CPU
is, but that could be tampered with from inside the domU, and since
the domU doesn't know it doesn't have the whole CPU, the figures need
some interpretation.

I can see CPU time used in "xm list" but that looks like rather a
blunt tool - would I have to do something like this for example:

Name      Id  Mem(MB)  CPU  State  Time(s)  Console
Domain-0   0      507    0  r----   7644.0
foo        1      127    1  -b---   2585.9     9601
bar        2       63    1  -b---    147.6     9602

(5 minutes later)

Name      Id  Mem(MB)  CPU  State  Time(s)  Console
Domain-0   0      507    0  r----   7841.5
foo        1      127    1  -b---   2593.2     9601
bar        2       63    1  -b---    184.9     9602

dom0 has used 7841.5-7644.0=197.5 seconds, foo used 7.3s, bar used
37.3s. In 5 minutes there are 300 seconds so dom0 used
197.5/300*100=65.83% CPU, foo 2.43%, bar 12.43%, machine was 19.31%
idle or overhead.
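
The arithmetic above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper name cpu_percent is hypothetical, the readings are typed in by hand from the two listings rather than parsed from xm, and the 300 second window is taken as exact.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn two sets of cumulative "Time(s)" readings
# into per-domain CPU percentages over the elapsed wall-clock seconds.
sub cpu_percent {
    my ($before, $after, $elapsed) = @_;
    my %pct;
    for my $dom (keys %$after) {
        next unless exists $before->{$dom};   # domain appeared mid-interval
        $pct{$dom} = 100 * ($after->{$dom} - $before->{$dom}) / $elapsed;
    }
    return \%pct;
}

# Readings copied by hand from the two "xm list" snapshots above.
my %t0 = ('Domain-0' => 7644.0, foo => 2585.9, bar => 147.6);
my %t1 = ('Domain-0' => 7841.5, foo => 2593.2, bar => 184.9);

my $pct  = cpu_percent(\%t0, \%t1, 300);
my $used = 0;
for my $dom (sort keys %$pct) {
    printf "%-10s %6.2f%%\n", $dom, $pct->{$dom};
    $used += $pct->{$dom};
}
# Whatever is left over is idle time or overhead.
printf "%-10s %6.2f%%\n", 'idle', 100 - $used;
```

Because the sketch sums the unrounded percentages, its idle figure comes out at 19.30% rather than the 19.31% obtained by adding up the already-rounded values.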

That still doesn't give a way to tell how much CPU a domU *wanted*
to have, though, i.e. whether a domU could make use of a larger CPU
share if it were given one.

Andy
Re: Monitoring domU resource usage
On Wed, 8 Jun 2005, Andy Smith wrote:

> Using 2.0-testing, what methods are people using to monitor
> individual domU CPU usage?
>
> Obviously I can run an snmpd in each domU and see how busy the CPU
> is, but that could be tampered with from inside the domU, and since
> the domU doesn't know it doesn't have the whole CPU, the figures need
> some interpretation.
>
> I can see CPU time used in "xm list" but that looks like rather a
> blunt tool - would I have to do something like this for example:

AFAIK, from what you describe you don't have much choice: you are going to
have to watch from _both_ the inside and the outside. You might be able to
guess which domU is requesting CPU if the total CPU on the box goes to
100%.

I have a script that polls xm list every 60 seconds and logs the results.
From this I can get a feeling for what % of the CPU is being consumed and
which dom is using it...

e.g.

[root@xen1 ~]# more ~tbrown/uptime.log.xen
Wed Jun 8 00:00:01 PDT 2005
  Domain-0  2.48% cpu usage 102461.38 sec over 47.91 days
domain-dns 16.91% cpu usage 501043.99 sec over 34.30 days
   mailman  0.70% cpu usage 6938.28 sec over 11.53 days
        mx 27.56% cpu usage 605861.02 sec over 25.44 days

  1   7   1  20  68
  1  11   2  27  55
  3  96   0  35 -36
  2  48   0  26  21
  1   8   0  32  56
  1   5   0  31  60
  2   6   0  38  51
  1  12   0  21  64
  2  13   0  34  49
  2  21   0  36  37
  1   8   0  31  57
  2  14   0  34  48


The columns are the doms in the order listed in the header, with idle
time as the last column.
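
As a rough illustration of the "guess which domU is requesting CPU" idea, the sample rows can be mapped back onto the domain names and the rows with little or no idle time picked out. This is a sketch only: parse_sample and the 5% idle threshold are assumptions made for the example, the column names are copied from the header above, and the busiest domain in a saturated sample is merely the likeliest candidate, not proof that it wanted more.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Domain names in the order the header block lists them, plus idle.
my @columns = ('Domain-0', 'domain-dns', 'mailman', 'mx', 'idle');

# Hypothetical helper: map one whitespace-separated sample row onto
# @columns. Returns nothing if the field count does not match.
sub parse_sample {
    my ($line) = @_;
    my @vals = split ' ', $line;
    return unless @vals == @columns;
    my %row;
    @row{@columns} = @vals;
    return \%row;
}

# Assumed threshold: 5% idle or less counts as "the box is saturated".
my $IDLE_FLOOR = 5;

# Three rows copied from the log excerpt above.
for my $line ('1 7 1 20 68', '3 96 0 35 -36', '2 48 0 26 21') {
    my $row = parse_sample($line) or next;
    next if $row->{idle} > $IDLE_FLOOR;
    # Pick the domain with the highest share in this saturated sample.
    my ($top) = sort { $row->{$b} <=> $row->{$a} }
                grep { $_ ne 'idle' } @columns;
    print "saturated: $top at $row->{$top}% (idle $row->{idle}%)\n";
}
```

Of the three rows used here, only the second falls under the threshold, and it points at domain-dns.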

hhmm, the script is only 67 lines... I will include it...
The polling frequency and number of samples to take are
commandline parameters, I run it hourly as "script 60 59" in my
crontab.

here goes:

#!/usr/bin/perl
#
# script to dump CPU stats for VMs
#
# vim:ai

use strict;

my $interval = shift || 5;      # seconds to sleep between samples
my $num_samples = shift || 0;   # 0 means keep sampling forever

my $uptime = `cat /proc/uptime`;
$uptime =~ s/ .*$//s; # trim from first space, should leave us uptime in secs

my $XM="/usr/sbin/xm";


my $loop = 0;
my $lasttimestamp = 0;
my %lastcpu = ();
while ( ++$loop ) { # loop forever
    my $buf = '';
    my $count = 0;
    my $dat = `$XM list -l`
        or die "no output from $XM list ?? maybe it isn't in your path?";
    my $datatimestamp = time();
    # wall-clock seconds since the previous sample (not used on the first pass)
    my $period = ($datatimestamp - $lasttimestamp);
    $dat =~ tr/\(\)/{}/; # for readability of regex below.
    my $totcpu = 0;

    # peel one domain record off the front of $dat on each pass
    while ( my($dom,$rest) = ($dat =~ m/^(.*?\n})(.*)$/gs ) ) {
        my ($domcpu) = ($dom =~ m/\{cpu_time\s([\d\.]+)}/)
            or die "couldn't extract cpu_time from $dom on dom $count\n";
        my ($domup) = $uptime;   # dom0 has no up_time, use the host uptime
        if ($count > 0 ) {
            ($domup) = ($dom =~ m/\{up_time\s([\d\.]+)}/)
                or die "couldn't extract up_time from $dom on dom $count\n";
        }
        my $domname = "dom-$count";
        {
            my ($tmp) = ($dom =~ m/\{name\s(.+)}/);
            $domname = $tmp if ($tmp);
        }

        if ($loop <= 1) {
            # first pass: lifetime average per domain
            $buf .= sprintf "%10s %5.2f%% cpu usage %.2f sec over %.2f days\n",
                $domname, 100 * $domcpu/$domup, $domcpu, $domup/24/3600;
        } else {
            # later passes: CPU seconds used since the last sample, as a
            # percentage of the time actually elapsed
            my $cpu = $domcpu-$lastcpu{$count};
            $totcpu += $cpu;
            $buf .= sprintf "%3d ",
                100 * $cpu/$period;
        }
        $lastcpu{$count} = $domcpu;
        $dat = $rest;
        $count++;
        die "count exceeded" if ($count > 100);
    }
    if ($loop > 1) { # add on idle cpu
        # percentages are relative to one CPU's worth of time, so on a
        # multi-CPU box the columns can exceed 100 and idle can go negative
        $buf .= sprintf "%3d", 100 * ($period - $totcpu)/$period;
    }
    $lasttimestamp = $datatimestamp;
    print "$buf\n";
    exit if ($num_samples && $loop > $num_samples);
    sleep $interval;
}


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users