
Slow Memory Performance
Hi,

I've run into an interesting wee issue and wondered whether anyone else has
seen this before.

Running on a Debian Bullseye dom0, with a Debian Bullseye guest, I am seeing
a very large difference in memory performance between the dom0 and the
domU.

I have read the online documentation about NUMA and tried pinning the
cores, but none of the changes I have made seems to have made a big
difference. I have searched high and low but couldn't find any
documentation about default memory throughput thresholds. The server is an
AMD EPYC 7313 with DDR4-3200 memory, but I suspect the hardware
specifications don't matter a great deal: I see the same domU performance
when the guest runs on another server with an AMD EPYC 7302, and again the
dom0 performance there is a lot better.

Running a `sysbench memory run` on the dom0 gets me the following output:

$ sysbench memory run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global

Initializing worker threads...

Threads started!

Total operations: 66554847 (6654930.25 per second)

64994.97 MiB transferred (6498.96 MiB/sec)


General statistics:
total time: 10.0001s
total number of events: 66554847

Latency (ms):
min: 0.00
avg: 0.00
max: 0.43
95th percentile: 0.00
sum: 4074.10

Threads fairness:
events (avg/stddev): 66554847.0000/0.00
execution time (avg/stddev): 4.0741/0.00


Running this same command on a domU gets me the following:

$ sysbench memory run
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Running memory speed test with the following options:
block size: 1KiB
total size: 102400MiB
operation: write
scope: global

Initializing worker threads...

Threads started!

Total operations: 6135308 (613485.30 per second)

5991.51 MiB transferred (599.11 MiB/sec)


General statistics:
total time: 10.0001s
total number of events: 6135308

Latency (ms):
min: 0.00
avg: 0.00
max: 0.27
95th percentile: 0.00
sum: 3469.04

Threads fairness:
events (avg/stddev): 6135308.0000/0.00
execution time (avg/stddev): 3.4690/0.00

That is roughly a tenfold difference (6498.96 vs 599.11 MiB/sec), so am I
missing anything obvious here?
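The gap can be quantified directly from the two sysbench reports. Below is a
minimal Python sketch that extracts the MiB/sec figure from each report and
divides them. It assumes only the "(N MiB/sec)" line format printed by
sysbench 1.0.20 above; `mib_per_sec` and `THROUGHPUT_RE` are hypothetical
names, not part of sysbench.

```python
import re

# Matches the throughput line sysbench 1.0 prints, e.g.
# "64994.97 MiB transferred (6498.96 MiB/sec)"
THROUGHPUT_RE = re.compile(r"\(([\d.]+) MiB/sec\)")


def mib_per_sec(report: str) -> float:
    """Return the first MiB/sec figure found in a `sysbench memory run` report."""
    match = THROUGHPUT_RE.search(report)
    if match is None:
        raise ValueError("no 'MiB/sec' figure found in sysbench output")
    return float(match.group(1))


# The throughput lines from the dom0 and domU runs quoted in this post.
dom0_report = "64994.97 MiB transferred (6498.96 MiB/sec)"
domu_report = "5991.51 MiB transferred (599.11 MiB/sec)"

ratio = mib_per_sec(dom0_report) / mib_per_sec(domu_report)
print(f"dom0 throughput is {ratio:.1f}x the domU throughput")
```

The full reports can be pasted in place of the single lines, since each
report contains only one "MiB/sec" figure.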

The `xl info` output, for those interested:

host : dom0
release : 5.10.0-9-amd64
version : #1 SMP Debian 5.10.70-1 (2021-09-30)
machine : x86_64
nr_cpus : 64
max_cpu_id : 255
nr_nodes : 2
cores_per_socket : 16
threads_per_core : 2
cpu_mhz : 3000.046
hw_caps : 178bf3ff:76da320b:2e500800:244037ff:0000000f:219c97a9:0040068c:00000500
virt_caps : pv hvm hvm_directio pv_directio hap shadow
total_memory : 1048433
free_memory : 1026384
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 14
xen_extra : .3
xen_version : 4.14.3
xen_caps : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset :
xen_commandline : placeholder dom0_mem=6144M,max:6144M dom0_max_vcpus=6 dom0_vcpus_pin ucode=scan
cc_compiler : x86_64-linux-gnu-gcc (Debian 10.2.1-6) 10.2.1 20210110
cc_compile_by : pkg-xen-devel
cc_compile_domain : lists.alioth.debian.org
cc_compile_date : Mon Sep 13 14:28:21 UTC 2021
build_id : 1a67c53a8813b422d2033de494f8b444915791f2
xend_config_format : 4

Any assistance would be greatly appreciated. I have hardware available to
run any tests and am eager to resolve this, so any pointers, or reports of
anyone seeing something similar, would be very welcome.

Thanks,

Connor