Mailing List Archive

OT: Can a linux vmware guest tell if its host is CPU constrained?
Having performance issues on a Linux VMware guest that doesn't run vmtools
because it's an 'appliance', but it does allow shell access. I assume the CPU
utilisation shown by top etc. is the utilisation of the vCPUs. Is there any
way to discover or infer host CPU issues?
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
On Mon, Jul 27, 2020 at 01:23:46PM +1000, Adam Carter wrote:
> Having performance issues on a Linux VMware guest that doesn't run vmtools
> because it's an 'appliance', but it does allow shell access. I assume the CPU
> utilisation shown by top etc. is the utilisation of the vCPUs. Is there any
> way to discover or infer host CPU issues?

Do you mean that you want to monitor the host system from the guest? Can you not
just SSH into the host from the guest? You can also infer CPU usage from the
/proc/stat file on the host system, if you can share files over NFS or some
other file-sharing means.

--

Ashley Dixon
suugaku.co.uk

2A9A 4117
DA96 D18A
8A7B B0D2
A30E BF25
F290 A8AA
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
On Mon, Jul 27, 2020 at 1:35 PM Ashley Dixon <ash@suugaku.co.uk> wrote:

> On Mon, Jul 27, 2020 at 01:23:46PM +1000, Adam Carter wrote:
> > Having performance issues on a Linux VMware guest that doesn't run vmtools
> > because it's an 'appliance', but it does allow shell access. I assume the CPU
> > utilisation shown by top etc. is the utilisation of the vCPUs. Is there any
> > way to discover or infer host CPU issues?
>
> Do you mean that you want to monitor the host system from the guest? Can you not
> just SSH into the host from the guest? You can also infer CPU usage from the
> /proc/stat file on the host system, if you can share files over NFS or some
> other file-sharing means.
>

I have ssh access (including root) to the guest but no access to the host.
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
On Mon, Jul 27, 2020 at 02:21:06PM +1000, Adam Carter wrote:
> On Mon, Jul 27, 2020 at 1:35 PM Ashley Dixon <ash@suugaku.co.uk> wrote:
> > Do you mean that you want to monitor the host system from the guest? Can you
> > not just SSH into the host from the guest? You can also infer CPU usage from
> > the /proc/stat file on the host system, if you can share files over NFS or
> > some other file-sharing means.
>
> I have ssh access (including root) to the guest but no access to the host.

Do you not even have non-privileged access to the host? If you can't access the
host _at all_, and you can't petition the host owner to give guest systems
access to files like /proc/stat, there isn't really any method of monitoring it.

--

Ashley Dixon
suugaku.co.uk

Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
On Sun, Jul 26, 2020, at 11:21 PM, Adam Carter wrote:
> On Mon, Jul 27, 2020 at 1:35 PM Ashley Dixon <ash@suugaku.co.uk> wrote:
> > On Mon, Jul 27, 2020 at 01:23:46PM +1000, Adam Carter wrote:
> > > Having performance issues on a Linux VMware guest that doesn't run vmtools
> > > because it's an 'appliance', but it does allow shell access. I assume the CPU
> > > utilisation shown by top etc. is the utilisation of the vCPUs. Is there any
> > > way to discover or infer host CPU issues?
> >
> > Do you mean that you want to monitor the host system from the guest? Can you not
> > just SSH into the host from the guest? You can also infer CPU usage from the
> > /proc/stat file on the host system, if you can share files over NFS or some
> > other file-sharing means.
>
> I have ssh access (including root) to the guest but no access to the host.

Compare real time to measured CPU time. If one real-time second is shorter than a
CPU second then you know the host is pausing your VM. There are other ways to
check, but this should always work if you can contact an asynchronous time standard.
You may need to average the time over tens of seconds or a minute.

This method will allow you to figure out that AWS spot instances are
oversubscribed ~1.5x.
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
>
> Compare real time to measured CPU time. If one real-time second is shorter than a
> CPU second then you know the host is pausing your VM. There are other ways to
> check, but this should always work if you can contact an asynchronous time standard.
> You may need to average the time over tens of seconds or a minute.
>
> This method will allow you to figure out that AWS spot instances are
> oversubscribed ~1.5x.
>
>
Nice. FWIW the guest is running NTP.

So should I run something like: date ; time <some command that runs at
100%CPU for a minute> ; date ?
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
On Mon, Jul 27, 2020, at 9:24 PM, Adam Carter wrote:
> > Compare real time to measured CPU time. If one real-time second is shorter than a
> > CPU second then you know the host is pausing your VM. There are other ways to
> > check, but this should always work if you can contact an asynchronous time standard.
> > You may need to average the time over tens of seconds or a minute.
> >
> > This method will allow you to figure out that AWS spot instances are
> > oversubscribed ~1.5x.
> >
>
> Nice. FWIW the guest is running NTP.
>
> So should I run something like: date ; time <some command that runs at
> 100%CPU for a minute> ; date ?

No, date will pull from your RTC, which is usually kept up to date with an asynchronous
counter.

First check top(1) and look in the %Cpu line for "st". That is the % of CPU time stolen. If it is
nonzero then the guest's time accounting is probably working. It's not typical for the
hypervisor to hide this information. It's really important for load balancing.
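
The "st" figure is derived from the steal column of /proc/stat, so it can also be read
directly when top is not convenient. A minimal sketch, assuming the usual Linux field
order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal).
The names CpuTicks, parse_cpu_line, read_cpu_ticks and steal_percent are made up for
illustration:

```cpp
#include <array>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Counters taken from the aggregate "cpu" line of /proc/stat, in clock ticks.
struct CpuTicks {
    uint64_t steal = 0;  // ticks during which the hypervisor ran something else
    uint64_t total = 0;  // sum of user..steal (guest time is already part of user)
    bool ok = false;     // false if the line could not be parsed
};

// Parse one line of /proc/stat. Only the aggregate "cpu" line is accepted.
CpuTicks parse_cpu_line(const std::string &line) {
    CpuTicks out;
    std::istringstream in(line);
    std::string label;
    if (!(in >> label) || label != "cpu")
        return out;
    // Assumed field order: user nice system idle iowait irq softirq steal
    std::array<uint64_t, 8> fields{};
    for (auto &value : fields)
        if (!(in >> value))
            return out;  // fewer than eight fields: no steal column available
    for (auto value : fields)
        out.total += value;
    out.steal = fields[7];
    out.ok = true;
    return out;
}

// Read the first line of /proc/stat (the aggregate line) and parse it.
CpuTicks read_cpu_ticks(const char *path = "/proc/stat") {
    std::ifstream stat(path);
    std::string line;
    if (!std::getline(stat, line))
        return CpuTicks{};
    return parse_cpu_line(line);
}

// Percentage of ticks stolen between an earlier and a later sample.
double steal_percent(const CpuTicks &earlier, const CpuTicks &later) {
    uint64_t elapsed = later.total - earlier.total;
    return elapsed ? 100.0 * (later.steal - earlier.steal) / elapsed : 0.0;
}
```

Sample read_cpu_ticks() twice a few seconds apart and pass both samples to
steal_percent(). A result of zero only means the kernel reported no steal; it does not
prove the host is idle.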

If that doesn't work we're going to have to write some C. Look at clock_gettime(3):
https://linux.die.net/man/3/clock_gettime.

The clocks are performance counters. Usually their only guarantee is that they go up.
On some platforms you may be able to see a difference between CLOCK_REALTIME and
CLOCK_MONOTONIC. On most platforms however, CLOCK_MONOTONIC is clocked
from the CPU timebase and continues to increment when your program is not running.
On Windows the API exposes the per-core clocks as well.

So to get around this, you need to know the frequency of the processor and how long
it takes to execute specific instructions.

% time ./stealcheck
real 0.680168s
expected 0.625681s
./stealcheck 0.69s user 0.00s system 98% cpu 0.698 total

As commented below, I didn't have time to find the exact cycle count for a busy loop.
But six is familiar and these times line up with what `time` gives. The other issue is
I haven't implemented CPU pinning nor have I fixed the frequency.

If possible, do those; otherwise you can still infer an accurate steal time, it just
requires statistics. This will be good enough for a yes/no answer. (I.e. if you
get a noticeable discrepancy, buy more hardware.)

https://github.com/R030t1/stealcheck

g++ -std=gnu++2a -Wall -pedantic \
stealcheck.cc -o stealcheck

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>

#include <string>
#include <regex>
#include <iostream>
#include <fstream>
using namespace std;

uint64_t cpufreq();

int main(int argc, char *argv[]) {
    // If you have a newer processor you can request
    // cpuid level 0x16. For this impl. libpcre is
    // likely faster.
    uint64_t cf = cpufreq(),
        // Six is familiar but likely not right.
        cycles_per_loop = 6;

    struct timespec start = { 0 };
    clock_gettime(CLOCK_REALTIME, &start);

    // Confirm the cycle count of these instructions for
    // accurate results and/or implement loop with asm.
    uint64_t count = 0x10000000, orig = 0x10000000;
    while (count--);

    struct timespec end = { 0 };
    clock_gettime(CLOCK_REALTIME, &end);
    // Calculate delta.
    end.tv_sec -= start.tv_sec;
    end.tv_nsec -= start.tv_nsec;

    double real = (end.tv_sec * 1.0) + (end.tv_nsec / 1000000000.0);
    double expected = (1.0 / cf) * orig * cycles_per_loop;
    printf("real\t\t%lfs\n", real);
    printf("expected\t%lfs\n", expected);

    return 0;
}

uint64_t cpufreq() {
    uint64_t res = 0;
    regex pattern("^cpu MHz.*?([\\d.]+)");
    smatch glean;

    string line;
    ifstream cpuinf("/proc/cpuinfo");
    while (getline(cpuinf, line)) {
        if (!regex_search(line, glean, pattern))
            continue;
        // This effectively returns the last one, but I didn't
        // want to add CPU pinning etc. They are typically close
        // together.
        res = stod(glean[1].str()) * 1000000;
    }

    return res;
}
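
Where pinning and a fixed frequency are not available, the statistical route mentioned
above could look roughly like the sketch below: time the same busy loop many times and
compare the mean run to the fastest run, which needs neither the CPU frequency nor a
cycle count. This is not part of stealcheck. Slowdown, time_busy_loop, summarize and
measure are hypothetical names, and treating the fastest run as the uncontended
baseline is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <time.h>
#include <vector>

// Wall-clock seconds taken by a busy loop of a fixed number of iterations.
double time_busy_loop(uint64_t iterations) {
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    // volatile keeps the compiler from optimising the loop away.
    volatile uint64_t count = iterations;
    while (count)
        count = count - 1;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return (end.tv_sec - start.tv_sec) + (end.tv_nsec - start.tv_nsec) / 1e9;
}

struct Slowdown {
    double fastest = 0;  // best run, assumed to be free of interference
    double mean = 0;     // average over all runs
    double ratio = 0;    // mean / fastest; close to 1.0 means little interference
};

// Reduce a set of per-run timings to the fastest run, the mean and their ratio.
Slowdown summarize(const std::vector<double> &samples) {
    Slowdown s;
    if (samples.empty())
        return s;
    s.fastest = *std::min_element(samples.begin(), samples.end());
    s.mean = std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
    s.ratio = s.fastest > 0 ? s.mean / s.fastest : 0;
    return s;
}

// Time the same loop `runs` times and summarise the spread.
Slowdown measure(int runs, uint64_t iterations) {
    std::vector<double> samples;
    for (int i = 0; i < runs; i++)
        samples.push_back(time_busy_loop(iterations));
    return summarize(samples);
}
```

For example, measure(200, 0x1000000).ratio gives one number to watch. A ratio well
above 1 suggests the loop is being interrupted, though frequency scaling and other
guest processes can cause the same effect, and a host that is constrained all the
time will slow the fastest run as well.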
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
>
> > So should I run something like: date ; time <some command that runs at
> > 100%CPU for a minute> ; date ?
>
> No, date will pull from your RTC, which is usually kept up to date with an
> asynchronous counter.
>
> First check top(1) and look in the %Cpu line for "st". That is the % of CPU
> time stolen. If it is nonzero then the guest's time accounting is probably
> working. It's not typical for the hypervisor to hide this information. It's
> really important for load balancing.
>

Thanks for that. I haven't seen any non-zero stolen time yet, however.

FWIW vmstat also shows stolen time.
Re: OT: Can a linux vmware guest tell if its host is CPU constrained?
On Thu, Jul 30, 2020 at 2:52 AM Adam Carter <adamcarter3@gmail.com> wrote:
>>
>> > So should I run something like: date ; time <some command that runs at
>> > 100%CPU for a minute> ; date ?
>>
>> No, date will pull from your RTC, which is usually kept up to date with an asynchronous
>> counter.
>>
>> First check top(1) and look in the %Cpu line for "st". That is the % of CPU time stolen. If it is
>> nonzero then the guest's time accounting is probably working. It's not typical for the
>> hypervisor to hide this information. It's really important for load balancing.
>
>
> Thanks for that. I haven't seen any non-zero stolen time yet, however.
>
> FWIW vmstat also shows stolen time.

Stolen time reporting through vmstat/top only works on Xen and KVM
hypervisors; it wasn't implemented for VMware. It actually looks like it was
finally submitted for Linux v5.7
(https://lore.kernel.org/lkml/20200331100353.GA37509@gmail.com/).

If you want those numbers for older kernels, fetch this repository:
https://github.com/dagwieers/vmguestlib
and run vmguest-stats. You'll also need open-vm-tools installed. I have
used this many times in the past and the numbers are good.