
Performance regressions in networking & storage benchmarks in Linux kernel 5.8
As part of VMware's performance regression testing for upstream Linux kernel
releases, we compared Linux kernel 5.8 against 5.7. Our evaluation revealed
performance regressions of up to 60%, mostly in networking latency/response-time
benchmarks. Storage throughput & latency benchmarks also regressed, by up to 8%.

After bisecting between kernel 5.7 and 5.8, we identified the root cause to be
an interrupt-related change from Thomas Gleixner's commit
633260fa143bbed05e65dc557a492667dfdc45bb ("x86/irq: Convey vector as argument
and not in ptregs"). To confirm this, we backed out the commit from 5.8, reran
our tests, and found that the performance was similar to the 5.7 kernel.
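
For readers unfamiliar with bisection, the idea can be sketched as a binary
search over an ordered commit history. This is only an illustration of the
technique, not the tooling used for the tests above: the commit labels, the
`first_bad` helper and the `is_slow` predicate are hypothetical stand-ins for
real hashes and a real benchmark run.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last commit is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: first bad commit is in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid          # first bad commit is mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


# Hypothetical history: labels c1..c16 stand in for commit hashes.
history = [f"c{i}" for i in range(1, 17)]


def is_slow(commit: str) -> bool:
    # Stand-in for "build this commit, run the benchmark, compare to baseline".
    return int(commit[1:]) >= 11


print(first_bad(history, is_slow))
```

Each step halves the candidate range, so 16 commits need only 4 benchmark runs
in this sketch. The same property is what makes bisecting a full kernel release
cycle practical.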

Impacted test cases:

Networking:
- Netperf TCP_RR & TCP_CRR - Response time
- Ping - Response time
- Memcache - Response time
- Netperf TCP_STREAM small packets (8K socket & 256B message, TCP_NODELAY set)
  - Throughput & CPU utilization (CPU/Gbits)

Storage:
- FIO:
  - 4K (rand|seq)_(read|write) local-NVMe MultiVM tests - Throughput & latency

From our testing, the overall results indicate that the above-mentioned commit
introduced performance regressions in latency-sensitive networking workloads.
For storage, it affected both throughput and latency.

Also, since the Linux 5.9-rc4 kernel was released recently, we repeated the same
experiments on 5.9-rc4. We observed that all regressions were fixed and that the
performance numbers for 5.7 and 5.9-rc4 were similar.

To find the fix commit, we bisected again between 5.8 and 5.9-rc4 and identified
that the regressions were fixed by a commit from the same author, Thomas
Gleixner, which unbreaks the interrupt affinity settings:
e027fffff799cdd70400c5485b1a54f482255985 ("x86/irq: Unbreak interrupt affinity
setting").
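
As background, interrupt affinity determines which CPUs an interrupt may be
delivered to. On Linux it is exposed per IRQ under
/proc/irq/<N>/smp_affinity_list as a CPU list such as "0-3,8". The following is
a minimal sketch for inspecting that setting; the function names are
illustrative and were not part of the tests described above.

```python
from pathlib import Path
from typing import Set


def parse_cpu_list(text: str) -> Set[int]:
    """Parse a kernel CPU list such as '0-3,8' into a set of CPU numbers."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or a trailing comma
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus


def irq_affinity(irq: int) -> Set[int]:
    """Read the CPUs an IRQ may target (Linux only; the IRQ must exist)."""
    path = Path(f"/proc/irq/{irq}/smp_affinity_list")
    return parse_cpu_list(path.read_text())


# Example on a literal string, so it also runs on non-Linux machines.
print(sorted(parse_cpu_list("0-3,8\n")))
```

Comparing the output of `irq_affinity()` for a device's IRQs before and after
changing the affinity is one way to check whether a setting actually took
effect on a given kernel.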

We believe these findings would be useful to the Linux community and wanted to
document them.

Abdul Anshad Azeez
Performance Engineering
VMware, Inc.
Re: Performance regressions in networking & storage benchmarks in Linux kernel 5.8
Abdul,

On Tue, Sep 22 2020 at 08:51, Abdul Anshad Azeez wrote:
> Part of VMware's performance regression testing for Linux Kernel upstream
> releases we compared Linux kernel 5.8 against 5.7. Our evaluation revealed
> performance regressions mostly in networking latency/response-time benchmarks
> up to 60%. Storage throughput & latency benchmarks were also up by 8%.
> In order to find the fix commit, we bisected again between 5.8 and 5.9-rc4 and
> identified that regressions were fixed from a commit made by the same author
> Thomas Gleixner, which unbreaks the interrupt affinity settings -
> "e027fffff799cdd70400c5485b1a54f482255985(x86/irq: Unbreak interrupt affinity
> setting)".
>
> We believe these findings would be useful to the Linux community and wanted to
> document the same.

thanks for letting us know, but the issue is known already and the fix
has been backported to the stable kernel version 5.8.6 as of Sept. 3rd.

Please always check the latest stable version.
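
A rough way to check whether a given 5.8.y kernel predates the 5.8.6 backport
mentioned above is sketched here. The helper names are hypothetical, and the
check only looks at the version string: distribution kernels may carry the fix
under a different version number, and other release series are not covered.

```python
import re
from typing import Tuple

FIX_BACKPORTED_IN = (5, 8, 6)  # stable release named in this thread


def kernel_version(release: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a release string like '5.8.0-45-generic'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def predates_5_8_backport(release: str) -> bool:
    """True for 5.8.y releases older than 5.8.6, judged by version string only."""
    return (5, 8, 0) <= kernel_version(release) < FIX_BACKPORTED_IN


for rel in ("5.7.19", "5.8.0-45-generic", "5.8.5", "5.8.6", "5.9-rc4"):
    print(rel, predates_5_8_backport(rel))
```

On a running Linux system, the release string can be obtained from
`platform.release()` in the standard library and passed to
`predates_5_8_backport()`.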

Thanks,

tglx
Re: Performance regressions in networking & storage benchmarks in Linux kernel 5.8
Hello Thomas,

Thank you very much for your comments.

Since the performance regressions were fixed when we tested version 5.9-rc4, we
were not reporting this as an issue; our intention was only to share it for
information.

Thanks,
Abdul Anshad A


From: Thomas Gleixner <tglx@linutronix.de>
Sent: Tuesday, September 22, 2020 04:55 PM
To: Abdul Anshad Azeez <aazees@vmware.com>; linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; x86@kernel.org <x86@kernel.org>; netdev@vger.kernel.org <netdev@vger.kernel.org>; linux-fsdevel@vger.kernel.org <linux-fsdevel@vger.kernel.org>
Cc: rostedt@goodmis.org <rostedt@goodmis.org>
Subject: Re: Performance regressions in networking & storage benchmarks in Linux kernel 5.8
Abdul,

On Tue, Sep 22 2020 at 08:51, Abdul Anshad Azeez wrote:
> Part of VMware's performance regression testing for Linux Kernel upstream
> releases we compared Linux kernel 5.8 against 5.7. Our evaluation revealed
> performance regressions mostly in networking latency/response-time benchmarks
> up to 60%. Storage throughput & latency benchmarks were also up by 8%.
> In order to find the fix commit, we bisected again between 5.8 and 5.9-rc4 and
> identified that regressions were fixed from a commit made by the same author
> Thomas Gleixner, which unbreaks the interrupt affinity settings -
> "e027fffff799cdd70400c5485b1a54f482255985(x86/irq: Unbreak interrupt affinity
> setting)".
>
> We believe these findings would be useful to the Linux community and wanted to
> document the same.

thanks for letting us know, but the issue is known already and the fix
has been backported to the stable kernel version 5.8.6 as of Sept. 3rd.

Please always check the latest stable version.

Thanks,

tglx