Mailing List Archive

Performance Problems, probably network related
Hello!

Hardware: 2 servers, hardware is more or less identical

Server 1: Ubuntu 20.04 (xen 4.11, Kernel 5.4, Linux Bridge)
AMD EPYC 7702P 64-Core Processor
BCM57416 10G NIC
dom0 has 4 vCPUs

Server 2: VMware ESXi 7.0
AMD EPYC 7543P 32-Core Processor
BCM57414 NetXtreme-E 10Gb/25Gb


VM: Ubuntu 20.04, 8 vCPUs. Running Knot DNS name server. I am doing benchmark tests against a VM running either on XEN or VMware.

In both cases no tuning (no cpu pinning ...).

The XEN VM: 170,000 qps
The ESX VM: 575,000 qps

So, the XEN VM is much slower than the VMware VM. I thought this was because the XEN VM is "good old" PV, so I repeated the test with type=pvh, but the results were the same. I did some more tests:
When I test with a name server that is CPU-intensive, VMware is only a bit faster. But if the workload is more network-heavy (pps), VMware is much faster.

I have read https://wiki.xenproject.org/wiki/Network_Throughput_and_Performance_Guide but there are so many things and I do not know which of them are still relevant, or where to start.

Is there any general advice on where to start debugging and tuning (i.e. are there known network bottlenecks)? Or is XEN known to be slower than VMware in network throughput (then I could just stop tuning)?

Thanks
Klaus





--
Klaus Darilion, Head of Operations
nic.at GmbH, Jakob-Haringer-Straße 8/V
5020 Salzburg, Austria
Re: Performance Problems, probably network related
Dear Klaus,

> Hardware: 2 servers, hardware is more or less identical
>
> Server 1: Ubuntu 20.04 (xen 4.11, Kernel 5.4, Linux Bridge)
> AMD EPYC 7702P 64-Core Processor
> BCM57416 10G NIC
> dom0 has 4 vCPUs
>
> Server 2: VMware ESXi 7.0
> AMD EPYC 7543P 32-Core Processor
> BCM57414 NetXtreme-E 10Gb/25Gb
>
>
> VM: Ubuntu 20.04, 8 vCPUs. Running Knot DNS name server. I am doing benchmark
> tests against a VM running either on XEN or VMware.
>
> In both cases no tuning (no cpu pinning ...).
>
> The XEN VM: 170,000 qps
> The ESX VM: 575,000 qps
>
> So, the XEN VM is much slower than the VMware VM. I thought this was because
> the XEN VM is "good old" PV, so I repeated the test with type=pvh, but the
> results were the same. I did some more tests: When I test with a name
> server that is CPU-intensive, VMware is only a bit faster. But if the
> workload is more network-heavy (pps), VMware is much faster.
I run similar setups, although with different NICs. From what I have
observed so far, different offloading settings and different drivers
often make a huge difference. (It might also be the case that the
default settings are better optimized for heavy TCP traffic...)
Do you get huge differences when running iperf benchmarks on those
boxes?
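
For example, something along these lines (eth0 is just a placeholder for
the actual interface name, and which offloads help or hurt depends on the
NIC and driver):

  # show the current offload settings (in the domU and in dom0)
  ethtool -k eth0
  # toggle individual offloads, e.g. generic receive offload
  ethtool -K eth0 gro off
  # raw throughput check between the two boxes
  iperf3 -s                        # on the VM under test
  iperf3 -c <vm-ip> -u -b 0 -t 30  # on the load generator, UDP as fast as possible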

all the best,
Adi
Re: Performance Problems, probably network related [ In reply to ]
Klaus Darilion <klaus.darilion@nic.at> wrote:

> Hardware: 2 servers, hardware is more or less identical
>
> Server 1: Ubuntu 20.04 (xen 4.11, Kernel 5.4, Linux Bridge)
> AMD EPYC 7702P 64-Core Processor
> BCM57416 10G NIC
> dom0 has 4 vCPUs
>
> Server 2: VMware ESXi 7.0
> AMD EPYC 7543P 32-Core Processor
> BCM57414 NetXtreme-E 10Gb/25Gb
>
>
> VM: Ubuntu 20.04, 8 vCPUs. Running Knot DNS name server. I am doing benchmark tests against a VM running either on XEN or VMware.
>
> In both cases no tuning (no cpu pinning …).
>
> The XEN VM: 170,000 qps
> The ESX VM: 575,000 qps
>
> So, the XEN VM is much slower than the VMware VM.

With the disclaimer that it’s now a few years since I was seriously working with Xen (I now only run it at home) ...

From memory, the default is that all I/O goes through a single thread in Dom0 - or it used to, dunno if things have changed. Naturally this is likely to create a bottleneck. There is an option of running the I/O in a separate domain, but I’ve not done it and never looked at how to or whether it might help.
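
For anyone who wants to dig into that: I believe the relevant knob is the
backend= option in the guest's vif configuration, which points netback at a
driver domain instead of Dom0. Roughly like this (untested here, "netdrv" is
a made-up name for a domain that owns the NIC and runs the backend):

  vif = [ 'bridge=br0,backend=netdrv' ]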

Simon
AW: Performance Problems, probably network related [ In reply to ]
Hello!

Answering myself to share my findings.


1. The XEN Wiki performance tips are very old, some are still true, some are just wrong.

2. The bottleneck was CPU, either from having too few CPUs in the dom0, or from not distributing the interrupt workload across multiple CPUs.
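
In hindsight this is easy to spot with the standard tools, for example (eth0 here just stands for whatever the interface is called):

  xentop                                    # per-domain CPU usage at a glance
  watch -n1 "grep eth0 /proc/interrupts"    # are all queue IRQs landing on one CPU?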

Some wiki page mentions that in the domU the network handling is always on vCPU 0. This is not true: the current vif interface is multi-queue, so the interrupt of each queue should be pinned to a dedicated vCPU. This can be done manually or by using irqbalance.
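
Manual pinning looks roughly like this (interface name and IRQ number are just examples; with xen-netfront the queue interrupts usually show up as <iface>-q<N>-tx/rx in /proc/interrupts):

  # in the domU: list the vif queue interrupts
  grep eth0-q /proc/interrupts
  # pin the RX interrupt of queue 0 (IRQ 72 here, just an example) to CPU 2
  echo 4 > /proc/irq/72/smp_affinity    # hex bitmask: 4 = CPU 2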

The vifs in the dom0 and the physical NICs are also multi-queue, and their interrupts need to be distributed over vCPUs too -> manual pinning or irqbalance.
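
irqbalance takes care of that automatically; on Ubuntu (with systemd) that is just:

  apt install irqbalance
  systemctl enable --now irqbalance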

Give the dom0 enough vCPUs to handle the interrupts.
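
On Ubuntu this is set via the Xen boot parameters, roughly like this (16 is simply the value I ended up with; append it to any options already present):

  # /etc/default/grub
  GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=16"
  # then run update-grub and reboot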

I often found claims that PV is old and slow and that the newer PVH is much faster. I cannot confirm that: in my tests PV was as fast as PVH.
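
For reference, switching between the two is just the type setting in the xl guest config:

  type = "pv"      # or type = "pvh"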

So what I did:

- stay with PV

- increased dom0 vCPU from 4 to 16

- installed irqbalance in domU

- (irqbalance was already installed in dom0)

With these changes, the performance of the name server in the domU increased from 170,000 pps to 850,000 pps.

regards
Klaus



From: Klaus Darilion
Sent: Monday, 19 September 2022 23:07
To: 'xen-users@lists.xenproject.org' <xen-users@lists.xenproject.org>
Subject: Performance Problems, probably network related

Re: AW: Performance Problems, probably network related
> I often found claims that PV is old and slow. The newer PVH is much
> faster. I can not confirm that. In my testings PV was as fast as PVH.

Yes, I am also satisfied with PV, and it's even faster at boot time
because there's less of a machine skeleton to set up. However, I
understood the move to PVH is more about security and a reduced attack
surface, so at least for a public cloud use case PVH seems preferable for domUs.

> -increased dom0 vCPU from 4 to 16

I do not pin vCPUs for dom0 but rather give it a higher scheduler priority
(the default weight is 256) and simply keep sharing all the cores:

xl sched-credit2 --domain=0 --weight=512
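
If I remember correctly, running it without --weight just prints the current parameters, which is handy to verify:

  xl sched-credit2 --domain=0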

-elge
AW: AW: Performance Problems, probably network related
Hi Pierre-Philipp!

> > -increased dom0 vCPU from 4 to 16
>
> I do not pin vcpu for dom0 but rather enable higher priority for it
> (default is 256) and simply keep sharing all the cores.
>
> xl sched-credit2 --domain=0 --weight=512

How do you handle the domUs? Do you also share all CPUs with the domU, or only dedicated CPUs? Do you overbook CPUs?

Thanks
Klaus
Re: AW: AW: Performance Problems, probably network related
> How do you handle the domUs? Do you also share all CPUs with the domU, or only dedicated CPUs? Do you overbook CPUs?

for public cloud usage, I would share only one or two cores

vcpus = 2

for private cloud usage, I would simply give all the power to every guest,
then monitor, detect and handle guests that are misbehaving.

vcpus = <as many as physical cores>
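
The physical core count can be read from xl info, e.g.:

  xl info | grep -E 'nr_cpus|cores_per_socket|threads_per_core'

Note that nr_cpus counts hardware threads, so divide by threads_per_core if you only want physical cores.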

-elge