
High "steal" on new dom0 with no domUs running
A bit of backstory: my Xen experience started about 5 years ago with an
Aaeon EMB-CV1 with 4GB of memory, running four VMs. When I outgrew that
hardware, I moved to an Aaeon EMB-KB1 with 8GB of memory. I outgrew that
too and have now moved to an ASRock J5040-ITX with 16GB of memory. All of
these have worked beautifully and been very performant; the capabilities of
paravirtualization and the efficiency of the platform have amazed me. All
have run a Debian dom0 and Debian domUs, using the Debian-maintained Xen
packages. Each dom0 install has been a fresh build, with the VMs migrated
over afterwards.

I recently decided to add another board into the mix to help with load, so
that things requiring more horsepower (like my ELK stack, Minecraft server,
Nextcloud instance, etc.) can live on the 5040, while lower-CPU services like
DNS, the VPN server, the mail server, etc. can live on a board with a slower
CPU. I wanted to stay on the same CPU architecture so that I could
live-migrate VMs for maintenance, so I picked up an ASRock J4205 with 8GB of
memory. After installing Debian 10 (currently my standard build), the board
was snappy and performant. I then installed xen-tools and xen-system-amd64,
and after rebooting, the system took significantly longer to boot and was
very laggy from the console (and over SSH as well). At this point I wasn't
running any VMs and hadn't made any custom tweaks. Looking at top, there was
a lot of "steal", averaging around 10% across all four cores (all physical
cores).

I tried restricting dom0 to a single CPU, and dom0 was then consistently
performant again. However, any domUs I spun up were very laggy, with high
steal. After live-migrating them back to the 5040 they were fine again. The
same happened if I started them directly on the 4205 instead of
live-migrating them.

I thought maybe I'd goofed something up in the build, so I blew away that
installation, rebuilt it from scratch, and experienced the same thing.

I started logging performance with sysstat and this is what I see:

On the 5040 with 6 VMs running:

08:25:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
08:35:01 AM  all   0.03   0.00     0.11     0.00    0.05  99.81
08:45:01 AM  all   0.03   0.00     0.11     0.00    0.05  99.81
08:55:01 AM  all   0.03   0.00     0.17     0.00    0.20  99.60
09:05:01 AM  all   0.03   0.00     0.12     0.00    0.05  99.80
09:15:01 AM  all   0.03   0.00     0.13     0.00    0.09  99.75
09:25:01 AM  all   0.03   0.00     0.18     0.00    0.26  99.53
Average:     all   0.03   0.00     0.14     0.00    0.12  99.72

On the 4205 with no VMs running:

08:35:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
08:45:02 AM  all   0.03   0.00     0.07     0.01    7.74  92.15
08:55:02 AM  all   0.03   0.00     0.09     0.00    8.95  90.93
09:05:01 AM  all   0.03   0.00     0.07     0.00    9.19  90.70
09:15:01 AM  all   0.03   0.00     0.08     0.00    7.93  91.96
09:25:01 AM  all   0.03   0.00     0.07     0.00    8.85  91.05
09:35:01 AM  all   0.03   0.00     0.19     0.00    6.73  93.05
Average:     all   0.03   0.00     0.09     0.00    8.24  91.63
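To compare the two runs on equal terms, here is a minimal Python sketch that averages the %steal column of the interval rows quoted above. It assumes the rows are pasted in exactly as sar prints them (time, AM/PM, CPU, then six percentages); the function name `mean_steal` and the two sample constants are illustrative only. sar derives its own Average line from the raw counters, so a plain mean of the rounded rows can differ from it in the last digit.

```python
def mean_steal(sar_text: str) -> float:
    """Average the %steal column of `sar -u` interval rows.

    The header row (whose CPU column reads "CPU") and sar's own "Average:"
    row are skipped; only rows of the form "HH:MM:SS AM all ..." are used.
    """
    values = []
    for line in sar_text.strip().splitlines():
        fields = line.split()
        # Interval rows: time, AM/PM, "all", %user, %nice, %system,
        # %iowait, %steal, %idle -> 9 fields, with %steal at index 7.
        if len(fields) == 9 and fields[2] == "all":
            values.append(float(fields[7]))
    if not values:
        raise ValueError("no sar interval rows found")
    return sum(values) / len(values)


# Rows copied from the sar output in this post (illustrative constants).
J5040_ROWS = """
08:25:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
08:35:01 AM  all   0.03   0.00     0.11     0.00    0.05  99.81
08:45:01 AM  all   0.03   0.00     0.11     0.00    0.05  99.81
08:55:01 AM  all   0.03   0.00     0.17     0.00    0.20  99.60
09:05:01 AM  all   0.03   0.00     0.12     0.00    0.05  99.80
09:15:01 AM  all   0.03   0.00     0.13     0.00    0.09  99.75
09:25:01 AM  all   0.03   0.00     0.18     0.00    0.26  99.53
Average:     all   0.03   0.00     0.14     0.00    0.12  99.72
"""

J4205_ROWS = """
08:35:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
08:45:02 AM  all   0.03   0.00     0.07     0.01    7.74  92.15
08:55:02 AM  all   0.03   0.00     0.09     0.00    8.95  90.93
09:05:01 AM  all   0.03   0.00     0.07     0.00    9.19  90.70
09:15:01 AM  all   0.03   0.00     0.08     0.00    7.93  91.96
09:25:01 AM  all   0.03   0.00     0.07     0.00    8.85  91.05
09:35:01 AM  all   0.03   0.00     0.19     0.00    6.73  93.05
Average:     all   0.03   0.00     0.09     0.00    8.24  91.63
"""
```

With these rows, `mean_steal(J5040_ROWS)` comes out at about 0.12 and `mean_steal(J4205_ROWS)` at about 8.23, matching sar's Average lines to within rounding: the idle 4205 shows roughly 70 times the steal of the loaded 5040.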

# top
top - 09:45:55 up 1:29, 1 user, load average: 0.15, 0.11, 0.08
Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 1.1 sy, 0.0 ni, 96.6 id, 0.0 wa, 0.0 hi, 0.0 si, 2.3 st
%Cpu1 : 0.0 us, 0.0 sy, 0.0 ni, 78.9 id, 0.0 wa, 0.0 hi, 0.0 si, 21.1 st
%Cpu2 : 0.0 us, 1.1 sy, 0.0 ni, 89.4 id, 0.0 wa, 0.0 hi, 0.0 si, 9.6 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni, 62.8 id, 0.0 wa, 0.0 hi, 0.0 si, 37.2 st
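The top snapshot shows the steal is far from even across CPUs (2.3% on CPU0, 37.2% on CPU3). To log that per-CPU breakdown over a longer window than one top screen, a minimal sketch is to sample /proc/stat directly: on Linux, the eighth number on each `cpuN` line is cumulative steal time in USER_HZ ticks. The function names below are hypothetical, and this only measures steal as dom0's own kernel sees it; it does not show what Xen is doing with that time.

```python
from __future__ import annotations

import time


def read_cpu_counters(stat_text: str) -> dict[str, list[int]]:
    """Parse the per-CPU lines ('cpu0', 'cpu1', ...) of /proc/stat text."""
    counters = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            # Keep user, nice, system, idle, iowait, irq, softirq, steal.
            # guest and guest_nice are already counted inside user and nice.
            counters[fields[0]] = [int(v) for v in fields[1:9]]
    return counters


def steal_percent(before: dict[str, list[int]],
                  after: dict[str, list[int]]) -> dict[str, float]:
    """Share of elapsed ticks each CPU spent in steal between two samples."""
    result = {}
    for cpu, old in before.items():
        deltas = [new - prev for new, prev in zip(after[cpu], old)]
        total = sum(deltas)
        # Index 7 is the steal counter; guard against a zero-length interval.
        result[cpu] = 100.0 * deltas[7] / total if total else 0.0
    return result


def sample(interval: float = 5.0) -> dict[str, float]:
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        before = read_cpu_counters(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = read_cpu_counters(f.read())
    return steal_percent(before, after)
```

Calling `sample()` from a loop and appending the results to a log would show whether the steal stays concentrated on particular CPUs or moves around over time.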

Is there any way to tell what's causing the performance degradation, and
what is actually happening when dom0 reports CPU time as "stolen"? I've been
googling the issue a lot over the last few days and haven't found anything
useful so far, only threads saying that this happens when you oversubscribe
your domUs. Since I'm not running any domUs at this point, I don't see how
that could be the issue: the box is just sitting there looking cool, not
doing any real work.

Local disk storage on both dom0s is a single 20GB Intel 313 SLC SSD. The VMs
are stored on a Debian NAS box and connected via iSCSI.

# uname -a
Linux vhost2 4.19.0-14-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux
# cat /etc/debian_version
10.9

# xl info
host :
release : 4.19.0-14-amd64
version : #1 SMP Debian 4.19.171-2 (2021-01-30)
machine : x86_64
nr_cpus : 4
max_cpu_id : 3
nr_nodes : 1
cores_per_socket : 4
threads_per_core : 1
cpu_mhz : 1497.612
hw_caps : bfebfbff:47f8e3bf:2c100800:00000101:0000000f:2094e283:00000000:00000100
virt_caps : hvm hvm_directio
total_memory : 8040
free_memory : 7413
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 11
xen_extra : .4
xen_version : 4.11.4
xen_caps : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : placeholder dom0_mem=512M,max:512M no-real-mode edd=off
cc_compiler : gcc (Debian 8.3.0-6) 8.3.0
cc_compile_by : pkg-xen-devel
cc_compile_domain : lists.alioth.debian.org
cc_compile_date : Fri Dec 11 21:33:51 UTC 2020
build_id : 6d8e0fa3ddb825695eb6c6832631b4fa2331fe41
xend_config_format : 4


Chris