Hi,
I've run into an interesting wee issue and wondered if anyone else has seen
this before.
Running a Debian Bullseye dom0 with a Debian Bullseye guest, I am seeing a
very large difference in memory performance between the dom0 and the domU.
I have looked at the online documentation about NUMA and tried pinning the
cores, but nothing I have changed has made a noticeable difference. I have
also searched high and low but couldn't find any obvious documentation about
default memory throughput thresholds. The server I am running on has an AMD
EPYC 7313 processor with DDR4-3200 memory, but I suspect the hardware
specifications don't matter much: I see the same domU performance when the
guest runs on another server with an AMD EPYC 7302 processor, and there
again the dom0 performance is far better.
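For reference, the pinning I tried amounts to restricting the guest's vCPUs
to the pCPUs of a single NUMA node in the guest config. A sketch of what I
mean (the vCPU count and CPU range here are placeholders for my setup; the
real node-to-CPU mapping comes from `xl info -n`):

```
# Xen guest config fragment (sketch): keep all vCPUs on one NUMA node
vcpus = 8
cpus  = "0-15"   # hypothetical node-0 pCPUs; verify with `xl info -n`
```

As I understand it, the `cpus=` restriction has to be in place before the
domain is created, since Xen's NUMA placement allocates guest memory at
domain build time; pinning vCPUs afterwards with `xl vcpu-pin` does not
move memory that was already allocated on the other node.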
Running a `sysbench memory run` on the dom0 gets me the following output:
$ sysbench memory run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 66554847 (6654930.25 per second)

64994.97 MiB transferred (6498.96 MiB/sec)

General statistics:
    total time:                 10.0001s
    total number of events:     66554847

Latency (ms):
    min:                        0.00
    avg:                        0.00
    max:                        0.43
    95th percentile:            0.00
    sum:                        4074.10

Threads fairness:
    events (avg/stddev):           66554847.0000/0.00
    execution time (avg/stddev):   4.0741/0.00
Running this same command on a domU gets me the following:
$ sysbench memory run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time

Running memory speed test with the following options:
  block size: 1KiB
  total size: 102400MiB
  operation: write
  scope: global

Initializing worker threads...

Threads started!

Total operations: 6135308 (613485.30 per second)

5991.51 MiB transferred (599.11 MiB/sec)

General statistics:
    total time:                 10.0001s
    total number of events:     6135308

Latency (ms):
    min:                        0.00
    avg:                        0.00
    max:                        0.27
    95th percentile:            0.00
    sum:                        3469.04

Threads fairness:
    events (avg/stddev):           6135308.0000/0.00
    execution time (avg/stddev):   3.4690/0.00
That is quite a difference (roughly 10x in throughput), so I wondered
whether I am missing anything obvious here.
The `xl info` output, for those interested:
host : dom0
release : 5.10.0-9-amd64
version : #1 SMP Debian 5.10.70-1 (2021-09-30)
machine : x86_64
nr_cpus : 64
max_cpu_id : 255
nr_nodes : 2
cores_per_socket : 16
threads_per_core : 2
cpu_mhz : 3000.046
hw_caps :
178bf3ff:76da320b:2e500800:244037ff:0000000f:219c97a9:0040068c:00000500
virt_caps : pv hvm hvm_directio pv_directio hap shadow
total_memory : 1048433
free_memory : 1026384
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 14
xen_extra : .3
xen_version : 4.14.3
xen_caps : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p
hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : placeholder dom0_mem=6144M,max:6144M
dom0_max_vcpus=6 dom0_vcpus_pin ucode=scan
cc_compiler : x86_64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1
20210110
cc_compile_by : pkg-xen-devel
cc_compile_domain : lists.alioth.debian.org
cc_compile_date : Mon Sep 13 14:28:21 UTC 2021
build_id : 1a67c53a8813b422d2033de494f8b444915791f2
xend_config_format : 4
Any assistance would be greatly appreciated. I have hardware available to
run any tests and am eager to resolve this problem if anyone has any
pointers or has seen something similar before.
Thanks,
Connor