Mailing List Archive

XSA-332 kernel patch - huge network performance drop on pfSense VMs
Hi list,

Another "popular" thread on the XCP-ng forum [1], started in October 2020,
allowed us to detect that patch 12 from the XSA-332 advisory [2] has a
very significant impact on network performance in the case of pfSense VMs.

We reproduced the issue internally (well, we reproduced "something"; the
user setups in that thread are diverse) and our findings seem to confirm
what the users reported. Running iperf3 from the pfSense VM to a Debian
VM gives results around 5 times slower than before. Reverting this
single patch brings the performance back. In the Debian-to-pfSense
direction, the drop is about 25%.

Testing environment:

Host
* XCP-ng 8.2
* kernel 4.19.19 + backports and sec fixes [3]
* Xen 4.13.1 + backports and sec fixes [4]

VM 1 - pfSense, HVM
VM 2 - Debian, HVM
VM 3 - Debian, HVM

All VMs running on the same host.

I can provide more details if needed.

Note: one user reported a performance drop between two Windows VMs [5] and
another a drop when accessing DBF files across the network [6]. They didn't
specify what their network setups were, though. At least the second user
confirmed that reverting patch 12 solved their issues.


*** Test results, without reverting patch 12 ***

Debian to pfSense: ~1.20 Gbit/s, stable

root@samtest:~# iperf3 -c 10.0.90.1
Connecting to host 10.0.90.1, port 5201
[  5] local 10.0.90.2 port 58036 connected to 10.0.90.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   100 MBytes   842 Mbits/sec    0   1.22 MBytes
[  5]   1.00-2.00   sec   148 MBytes  1.24 Gbits/sec    0   2.00 MBytes
[  5]   2.00-3.00   sec   145 MBytes  1.22 Gbits/sec    0   2.11 MBytes
[  5]   3.00-4.00   sec   144 MBytes  1.21 Gbits/sec    0   2.11 MBytes
[  5]   4.00-5.00   sec   141 MBytes  1.18 Gbits/sec    0   2.11 MBytes
[  5]   5.00-6.00   sec   150 MBytes  1.26 Gbits/sec    0   2.11 MBytes
[  5]   6.00-7.00   sec   144 MBytes  1.21 Gbits/sec    0   2.11 MBytes
[  5]   7.00-8.00   sec   132 MBytes  1.11 Gbits/sec    0   2.11 MBytes
[  5]   8.00-9.00   sec   140 MBytes  1.17 Gbits/sec    0   2.11 MBytes
[  5]   9.00-10.00  sec   142 MBytes  1.20 Gbits/sec    0   2.11 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.35 GBytes  1.16 Gbits/sec                  receiver

pfSense to Debian: catastrophic and unstable. We should get 2.2
Gbit/s here.

root@samtest:~# iperf3 -c 10.0.90.1 -R
Connecting to host 10.0.90.1, port 5201
Reverse mode, remote host 10.0.90.1 is sending
[  5] local 10.0.90.2 port 58052 connected to 10.0.90.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  30.2 MBytes   254 Mbits/sec
[  5]   1.00-2.00   sec  28.5 MBytes   239 Mbits/sec
[  5]   2.00-3.00   sec  47.3 MBytes   397 Mbits/sec
[  5]   3.00-4.00   sec  96.4 MBytes   809 Mbits/sec
[  5]   4.00-5.00   sec  43.3 MBytes   363 Mbits/sec
[  5]   5.00-6.00   sec  36.9 MBytes   310 Mbits/sec
[  5]   6.00-7.00   sec  25.2 MBytes   211 Mbits/sec
[  5]   7.00-8.00   sec  43.8 MBytes   368 Mbits/sec
[  5]   8.00-9.00   sec  23.0 MBytes   193 Mbits/sec
[  5]   9.00-10.00  sec  17.0 MBytes   143 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   392 MBytes   329 Mbits/sec  112             sender
[  5]   0.00-10.00  sec   392 MBytes   329 Mbits/sec                  receiver

Debian to Debian: 8.5 Gbit/s

root@samtest:~# iperf3 -c 10.0.90.3
Connecting to host 10.0.90.3, port 5201
[  5] local 10.0.90.2 port 39928 connected to 10.0.90.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   963 MBytes  8.07 Gbits/sec    0    878 KBytes
[  5]   1.00-2.00   sec   994 MBytes  8.34 Gbits/sec    0   1014 KBytes
[  5]   2.00-3.00   sec   969 MBytes  8.13 Gbits/sec    0   1.10 MBytes
[  5]   3.00-4.00   sec  1.02 GBytes  8.80 Gbits/sec    0   1.15 MBytes
[  5]   4.00-5.00   sec  1022 MBytes  8.58 Gbits/sec    0   1.21 MBytes
[  5]   5.00-6.00   sec  1012 MBytes  8.49 Gbits/sec    0   1.33 MBytes
[  5]   6.00-7.00   sec  1.01 GBytes  8.71 Gbits/sec    0   1.33 MBytes
[  5]   7.00-8.00   sec  1.05 GBytes  9.06 Gbits/sec    0   1.33 MBytes
[  5]   8.00-9.00   sec  1.02 GBytes  8.74 Gbits/sec    0   1.40 MBytes
[  5]   9.00-10.00  sec  1018 MBytes  8.54 Gbits/sec    0   1.54 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.95 GBytes  8.54 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  9.94 GBytes  8.51 Gbits/sec                  receiver


*** Test results, after reverting patch 12 (only) ***

Debian to pfSense: better average performance, but with big drops, so it
does not manage to stay at 2.2 Gbit/s.

root@samtest:~# iperf3 -c 10.0.90.1
Connecting to host 10.0.90.1, port 5201
[  5] local 10.0.90.2 port 46946 connected to 10.0.90.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   237 MBytes  1.98 Gbits/sec    0   1.44 MBytes
[  5]   1.00-2.00   sec   240 MBytes  2.01 Gbits/sec    0   1.52 MBytes
[  5]   2.00-3.00   sec   260 MBytes  2.18 Gbits/sec    0   1.68 MBytes
[  5]   3.00-4.00   sec  36.2 MBytes   304 Mbits/sec    0   1.68 MBytes
[  5]   4.00-5.00   sec   115 MBytes   965 Mbits/sec    0   1.68 MBytes
[  5]   5.00-6.00   sec   255 MBytes  2.14 Gbits/sec    0   1.78 MBytes
[  5]   6.00-7.00   sec   181 MBytes  1.52 Gbits/sec    0   1.78 MBytes
[  5]   7.00-8.00   sec  76.2 MBytes   640 Mbits/sec    0   1.78 MBytes
[  5]   8.00-9.00   sec   261 MBytes  2.19 Gbits/sec    0   1.78 MBytes
[  5]   9.00-10.00  sec   251 MBytes  2.11 Gbits/sec    0   1.88 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.87 GBytes  1.60 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  1.87 GBytes  1.60 Gbits/sec                  receiver

pfSense to Debian: back to a stable 2.2 Gbit/s

root@samtest:~# iperf3 -c 10.0.90.1 -R
Connecting to host 10.0.90.1, port 5201
Reverse mode, remote host 10.0.90.1 is sending
[  5] local 10.0.90.2 port 46954 connected to 10.0.90.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   238 MBytes  2.00 Gbits/sec
[  5]   1.00-2.00   sec   239 MBytes  2.01 Gbits/sec
[  5]   2.00-3.00   sec   252 MBytes  2.11 Gbits/sec
[  5]   3.00-4.00   sec   256 MBytes  2.15 Gbits/sec
[  5]   4.00-5.00   sec   263 MBytes  2.20 Gbits/sec
[  5]   5.00-6.00   sec   255 MBytes  2.14 Gbits/sec
[  5]   6.00-7.00   sec   265 MBytes  2.22 Gbits/sec
[  5]   7.00-8.00   sec   266 MBytes  2.23 Gbits/sec
[  5]   8.00-9.00   sec   262 MBytes  2.20 Gbits/sec
[  5]   9.00-10.00  sec   273 MBytes  2.29 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  2.51 GBytes  2.16 Gbits/sec  123             sender
[  5]   0.00-10.00  sec  2.51 GBytes  2.16 Gbits/sec                  receiver

Debian to Debian: no difference.

root@samtest:~# iperf3 -c 10.0.90.3
iperf3: error - unable to connect to server: Connection refused
root@samtest:~# iperf3 -c 10.0.90.3
Connecting to host 10.0.90.3, port 5201
[  5] local 10.0.90.2 port 59498 connected to 10.0.90.3 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   982 MBytes  8.23 Gbits/sec    0    779 KBytes
[  5]   1.00-2.00   sec   982 MBytes  8.24 Gbits/sec    0    904 KBytes
[  5]   2.00-3.00   sec  1.01 GBytes  8.65 Gbits/sec    0    904 KBytes
[  5]   3.00-4.00   sec  1.03 GBytes  8.85 Gbits/sec    0   1.07 MBytes
[  5]   4.00-5.00   sec  1022 MBytes  8.58 Gbits/sec    0   1.13 MBytes
[  5]   5.00-6.00   sec  1.01 GBytes  8.69 Gbits/sec    0   1.13 MBytes
[  5]   6.00-7.00   sec   988 MBytes  8.28 Gbits/sec    0   1.31 MBytes
[  5]   7.00-8.00   sec  1.01 GBytes  8.64 Gbits/sec    0   1.31 MBytes
[  5]   8.00-9.00   sec  1.01 GBytes  8.70 Gbits/sec    0   1.38 MBytes
[  5]   9.00-10.00  sec  1000 MBytes  8.39 Gbits/sec    0   1.52 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  9.93 GBytes  8.53 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  9.92 GBytes  8.49 Gbits/sec                  receiver


Best regards,

Samuel Verschelde

[1]
https://xcp-ng.org/forum/topic/3774/poor-pfsense-wan-speeds-after-xcp-ng-updates
[2] http://xenbits.xen.org/xsa/xsa332-linux-12.patch
[3] https://github.com/xcp-ng-rpms/kernel/tree/8.2
[4] https://github.com/xcp-ng-rpms/xen/tree/8.2
[5] https://xcp-ng.org/forum/post/33264
[6] https://xcp-ng.org/forum/post/33268
Re: XSA-332 kernel patch - huge network performance drop on pfSense VMs
On Fri, Jan 15, 2021 at 03:03:26PM +0000, Samuel Verschelde wrote:
> Hi list,
>
> Another "popular" thread on XCP-ng forum [1], started in october 2020,
> allowed us to detect that patch 12 from the XSA-332 advisory [2] had a very
> significant impact on network performance in the case of pfSense VMs.
>
> We reproduced the issue internally (well, we reproduced "something". The
> user setups in this thread are diverse) and our findings seem to confirm
> what the users reported. Running iperf3 from the pfSense VM to a debian VM
> gives results around 5 times slower than before. Reverting this single patch
> brings the performance back. On the debian to pfSense direction, the drop is
> about 25%.

pfSense is based on FreeBSD, so I would bet that whatever performance
degradation you are seeing would also happen with plain FreeBSD. I
would assume netfront in FreeBSD is triggering the ratelimit on Linux,
and hence it gets throttled.

Do you think you have the bandwidth to look into the FreeBSD side and
try to provide a fix? I'm happy to review and commit it in upstream
FreeBSD, but it would be nice to have someone else in the loop too, as
ATM I'm the only one doing FreeBSD/Xen development AFAIK.

Thanks, Roger.
Re: XSA-332 kernel patch - huge network performance drop on pfSense VMs
Le 18/01/2021 à 11:03, Roger Pau Monné a écrit :
> On Fri, Jan 15, 2021 at 03:03:26PM +0000, Samuel Verschelde wrote:
>> Hi list,
>>
>> Another "popular" thread on XCP-ng forum [1], started in october 2020,
>> allowed us to detect that patch 12 from the XSA-332 advisory [2] had a very
>> significant impact on network performance in the case of pfSense VMs.
>>
>> We reproduced the issue internally (well, we reproduced "something". The
>> user setups in this thread are diverse) and our findings seem to confirm
>> what the users reported. Running iperf3 from the pfSense VM to a debian VM
>> gives results around 5 times slower than before. Reverting this single patch
>> brings the performance back. On the debian to pfSense direction, the drop is
>> about 25%.
>
> pfSense is based on FreeBSD, so I would bet that whatever performance
> degradation you are seeing would also happen with plain FreeBSD. I
> would assume netfront in FreeBSD is triggering the ratelimit on Linux,
> and hence it gets throttled.
>
> Do you think you have the bandwidth to look into the FreeBSD side and
> try to provide a fix? I'm happy to review and commit in upstream
> FreeBSD, but would be nice to have someone else also in the loop as
> ATM I'm the only one doing FreeBSD/Xen development AFAIK.
>
> Thanks, Roger.
>


I would personally not be able to hack on Xen, the Linux kernel or
FreeBSD in any efficient way. My role here is limited to packaging,
testing and acting as a relay between users and developers. We currently
don't have anyone at Vates who would be able to hack on FreeBSD either.

What currently puts FreeBSD on our radar is the large number of users
who run FreeNAS/TrueNAS or pfSense VMs, and the recent bugs they
detected (XSA-360 and this performance drop).

Additionally, regarding this performance issue, some users report an
impact of that same patch 12 on the network performance of their non-BSD
VMs [1][2]. So I think the FreeBSD case might help identify what in that
patch causes the throttling (if that's what happens), because it's
easier to reproduce, but I'm not sure fixes would only be needed in
FreeBSD.

Best regards,

Samuel Verschelde

[1] https://xcp-ng.org/forum/post/35521 mentions the Debian-based Untangle
OS and inter-VLAN traffic
[2] https://xcp-ng.org/forum/post/35476 reports a general slowdown affecting
all VMs (VM-to-workstation traffic), from the first user who identified
patch 12 as the cause.
Re: XSA-332 kernel patch - huge network performance drop on pfSense VMs
On 26/01/2021 15:04, Samuel Verschelde wrote:
> Le 18/01/2021 à 11:03, Roger Pau Monné a écrit :
>> On Fri, Jan 15, 2021 at 03:03:26PM +0000, Samuel Verschelde wrote:
>>> Hi list,
>>>
>>> Another "popular" thread on XCP-ng forum [1], started in october 2020,
>>> allowed us to detect that patch 12 from the XSA-332 advisory [2] had
>>> a very
>>> significant impact on network performance in the case of pfSense VMs.
>>>
>>> We reproduced the issue internally (well, we reproduced "something".
>>> The
>>> user setups in this thread are diverse) and our findings seem to
>>> confirm
>>> what the users reported. Running iperf3 from the pfSense VM to a
>>> debian VM
>>> gives results around 5 times slower than before. Reverting this
>>> single patch
>>> brings the performance back. On the debian to pfSense direction, the
>>> drop is
>>> about 25%.
>>
>> pfSense is based on FreeBSD, so I would bet that whatever performance
>> degradation you are seeing would also happen with plain FreeBSD. I
>> would assume netfront in FreeBSD is triggering the ratelimit on Linux,
>> and hence it gets throttled.
>>
>> Do you think you have the bandwidth to look into the FreeBSD side and
>> try to provide a fix? I'm happy to review and commit in upstream
>> FreeBSD, but would be nice to have someone else also in the loop as
>> ATM I'm the only one doing FreeBSD/Xen development AFAIK.
>>
>> Thanks, Roger.
>>
>
> (sorry about the previous email, looks like my mail client hates me)
>
> I would personnally not be able to hack into either Xen, the linux
> kernel or FreeBSD in any efficient way. My role here is limited to
> packaging, testing and acting as a relay between users and developers.
> We currently don't have anyone at Vates who would be able to hack into
> FreeBSD either.
>
> What currently put FreeBSD on our radar is the large amount of users
> who use FreeNAS/TrueNAS or pfSense VMs, and the recent bugs they
> detected (XSA-360 and this performance drop).
>
> Additionnally, regarding this performance issue, some users report an
> impact of that same patch 12 on the network performance of their
> non-BSD VMs [1][2], so I think the FreeBSD case might be helpful to
> help identify what in that patch caused throttling (if that's what
> happens), because it's easier to reproduce, but I'm not sure fixes
> would only need to be made in FreeBSD.
>
> Best regards,
>
> Samuel Verschelde
>
> [1] https://xcp-ng.org/forum/post/35521 mentions debian based Untangle
> OS and inter-VLAN traffic
> [2] https://xcp-ng.org/forum/post/35476 general slowdown affecting all
> VMs (VM to workstation traffic), from the first user who identified
> patch 12 as the cause.

Further to this, XenServer testing has also observed a ~5x drop in
intra-host VM-to-VM network performance between PV VMs running under
PV-Shim.

As one specific case has been bisected to patch 11, it's obvious that
FreeBSD's netfront is hitting dom0's new spurious-event detection and
mitigation.  It is also reasonable to presume that the other ~5x hits
are related, which means the behaviour isn't unique to the FreeBSD
netfront.

The next step is to figure out whether the event is genuinely spurious
(i.e. the frontend really is sending too many notifications), or whether
dom0's judgement of spuriosity is wrong.

~Andrew