Packet drop between eth0 (domU) and vif (dom0)
Hello,

TL;DR: 1-10% of the frames of 64..256 bytes that enter a virtual ethX
never reach vifN.X when forwarding between two virtual interfaces, but
none are lost when I use a single device. Is this a known issue? How
can I debug it further?

Now the long version.

I use a Xen domU as a network router (OpenWrt) doing firewall, NAT and
forwarding a UDP port to another internal machine (an OpenVPN server).
Our support team reported that VPN links show high packet loss
(1-10%) when pinging, although without a significant effect on user
experience. The packet loss happens only for some specific packet
sizes and never for others.

In order to rule out VPN problems, I built a simple UDP ping service
(socat pipe) and a UDP ping client (also socat) that can detect
packet loss and vary the packet size. For reference:

server# socat -v PIPE udp-recvfrom:4000,fork

client# for size in $(seq 1 500); do
    i=0
    for try in $(seq 50); do
        echo -n "$(date "+%s %c")"
        # send $i padded with spaces to $size bytes, then wait for the echo
        rec=$({ printf "%-${size}s" "$i"; sleep 1; } | socat - udp:my-internet-ip:4000)
        if [ "$rec" ]; then echo " " $rec "$(date "+%s %c")"
        else echo " lost $i (size $size)"; continue 2; fi
        sleep 1; : $((i++))
    done
    echo "No loss (size $size)"
done

I ran the UDP ping server on a completely different internal server
directly connected to the router (ruling out any problem with the
original server, the OpenVPN service, network switching or any
external issues). Something like this:

client (socat client) -> (eth3) router:4000/udp (DNAT) (eth0) ->
internal-server:4000/udp (socat server)

I could reproduce the problem when the UDP ping payload had the same
size as the OpenVPN packet carrying a ping. So I tested it by varying
the UDP payload size from 1 to 500 bytes. The packet loss started and
stopped at some "magic numbers":

udp payload size 1..21, frame size 43..63 bytes: no loss
udp payload size 22..214, frame size 64..256 bytes: 1-5% loss
udp payload size 215..500, frame size 257..542 bytes: no loss
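
The frame sizes in this table follow from the payload sizes if one
assumes untagged Ethernet, IPv4 without options and no FCS in the
count, which gives 42 bytes of headers. A minimal sketch of that
arithmetic (the function and variable names are only illustrative):

```shell
# Assumed header overhead: Ethernet (14) + IPv4 without options (20) + UDP (8).
UDP_OVERHEAD=$((14 + 20 + 8))

# Print the frame size (FCS not counted) for a given UDP payload size.
udp_frame_size() {
    echo $(( $1 + UDP_OVERHEAD ))
}

# Check the boundaries of the three ranges listed above.
for payload in 1 21 22 214 215 500; do
    echo "payload $payload -> frame $(udp_frame_size "$payload")"
done
```

With a VLAN tag or IP options the overhead would be larger, so the
boundaries would shift accordingly.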

It is consistently reproducible: with 50 UDP pings per size, there was
only 1 false negative (a size with no loss) in the 22..214 payload
range and three false positives (sizes with a loss) in the 43..63 and
257..542 frame ranges (probably normal network loss).

I sniffed both ethX (domU) and vifN.X (dom0). It seems that the frame
got into ethX but never appeared on vifN.X. It happened on both ethX
devices: while the router sends to the internal server (eth0) and
also while forwarding to the client (eth3).
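
A comparison like this could be narrowed to the suspect frame sizes
roughly as sketched below. This is only an illustration: the helper
names are made up, vif1.0 is a placeholder interface name, and it
assumes tcpdump with the pcap "len" keyword on both sides. The keys
to compare can be any per-packet value extracted from the two
captures, for example the IP ID printed by tcpdump -v.

```shell
# Build a pcap filter for UDP port $1 and frame lengths between $2 and $3 bytes.
size_filter() {
    printf 'udp port %s and len >= %s and len <= %s\n' "$1" "$2" "$3"
}

# Print the keys (one per line) present in file $1 but absent from file $2.
missing_in_second() {
    a=$(mktemp); b=$(mktemp)
    sort "$1" > "$a"; sort "$2" > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Hypothetical usage, as root, with placeholder interface names:
#   domU# tcpdump -n -v -i eth0   "$(size_filter 4000 64 256)"
#   dom0# tcpdump -n -v -i vif1.0 "$(size_filter 4000 64 256)"
```

Feeding the keys seen on ethX as the first file and those seen on
vifN.X as the second one would list the frames that never made it to
dom0.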

If I run the UDP ping server on the router itself, the problem does
not happen. If I forward the packet on the router in userland (socat
udp-recvfrom:4000,fork udp:internal-server:4000) instead of using port
forwarding, the problem does not appear either. It only happens when I
use two different interfaces and kernel-mode only processing (iptables).

Having isolated the problem, I can now reproduce it with a normal ping
passing through the router, as long as its frame size falls in the
problematic range.
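
A sweep over frame sizes with plain ping might look like the sketch
below. It assumes ICMP echo over IPv4 on untagged Ethernet, so that
the "-s" payload is the frame size minus 42 bytes (14 + 20 + 8). The
function name and the target "internal-server" are placeholders, and
the function only prints the commands instead of running them.

```shell
# Print one ping command per frame size in [$2, $3] for target $1.
# Optional $4 is the number of pings per size (default 50).
ping_sweep_commands() {
    target=$1; first=$2; last=$3; count=${4:-50}
    frame=$first
    while [ "$frame" -le "$last" ]; do
        # Assumed overhead: Ethernet (14) + IPv4 (20) + ICMP header (8) = 42.
        echo "ping -q -c $count -s $((frame - 42)) $target"
        frame=$((frame + 1))
    done
}

# Dry run around the lower boundary (frames of 63..65 bytes).
ping_sweep_commands internal-server 63 65
```

Piping the output to sh would actually run the pings, and the loss
percentage in each summary line can then be compared across sizes.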

The domU that has the problem is using kernel 4.14.63 (OpenWrt 18.06)
with no Xen-related patches. The dom0 is a SLES12SP4 running
4.12.14-95.6-default on Xen 4.11.1_02-2.3.

I'll try changing the domU/dom0/Xen versions in order to isolate the
problem further. However, I believe it did not happen with SLES12SP3
(Xen 4.9.3_03-3.47).

Is this a known issue already fixed in a newer Xen version or kernel release?
I have been using Xen for some years, but I have no experience
debugging Xen internals.

Regards,

---
Luiz Angelo Daros de Luca
luizluca@gmail.com

_______________________________________________
Xen-users mailing list
Xen-users@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-users