Mailing List Archive

[lvs-users] One second connection delay in masquerading mode
Hi all,

I encountered an issue with IPVS load balancing with short-lived
connections. I've seen it in masquerading mode on CentOS 7 (kernel
3.10.0-693.11.6) and CoreOS 1235.12.0 (4.7.3-coreos-r3). After opening
and closing thousands of TCP connections, new connections are delayed
by 1 second.

Please see the short example in [4]; it contains steps to reproduce the
issue. It starts nginx on port 8080 and creates a virtual service
127.0.0.1:80 -> 127.0.0.1:8080. After that, an HTTP load generator
(rakyll/hey) sends 30k requests with keep-alive disabled. All records in
the ip_vs_conn table end up in TIME_WAIT state. Then the IPVS debug level
is enabled and strace runs curl(1) against the same virtual service. curl
encounters the 1 second delay, as shown. Attached you can find the full
versions of strace.log and dmesg.log as well as their short versions [1] [2].

Setting conn_reuse_mode to 0 resolves the issue, but it doesn't fit our
needs and doesn't work well when the list of real servers changes.
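The workaround corresponds to the following sysctl setting (shown here with its revert; a sketch of the commands involved, not a recommendation):

```shell
# Workaround only: stop IPVS from rescheduling reused connections.
sysctl -w net.ipv4.vs.conn_reuse_mode=0
# Default behaviour:
sysctl -w net.ipv4.vs.conn_reuse_mode=1
```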

What could be causing the delay? How can we get rid of it?

Thank you.

[1] strace.log

1516823661.570756 connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
1516823661.571578 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 0 (Timeout)
1516823661.572153 poll([{fd=3, events=POLLOUT}], 1, 1000) = 1 ([{fd=3, revents=POLLOUT}])
1516823662.573304 poll([{fd=3, events=POLLOUT|POLLWRNORM}], 1, 0) = 1 ([{fd=3, revents=POLLOUT|POLLWRNORM}])
1516823662.573556 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
1516823662.573947 getpeername(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
1516823662.574139 getsockname(3, {sa_family=AF_INET, sin_port=htons(34258), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
1516823662.574641 sendto(3, "GET / HTTP/1.1\r\nUser-Agent: curl"..., 73, MSG_NOSIGNAL, NULL, 0) = 73
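The 1 second gap is visible between the 1000 ms poll() and its completion above; it can be computed directly from the strace timestamps:

```shell
# Delay between the poll() issued at 1516823661.572153 and its completion
# at 1516823662.573304 (timestamps taken from the strace excerpt above).
awk 'BEGIN { printf "%.3f\n", 1516823662.573304 - 1516823661.572153 }'
```

This prints 1.001, i.e. almost exactly the 1000 ms poll timeout spent waiting for the retransmitted SYN.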

[2] dmesg.log

706.699647 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
706.699656 IPVS: lookup/out TCP 127.0.0.1:34258->127.0.0.1:80 not hit
706.699662 IPVS: lookup/in TCP 127.0.0.1:34258->127.0.0.1:80 hit
706.700092 IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with tuple=127.0.0.1:34258->127.0.0.1:80/6 for conn 127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:5
706.700097 IPVS: ip_vs_conn_drop_conntrack: no conntrack for tuple=127.0.0.1:34258->127.0.0.1:80/6
706.700102 IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with tuple=127.0.0.1:34258->127.0.0.1:80/6 for conn 127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:5
706.700105 IPVS: ip_vs_conn_drop_conntrack: no conntrack for tuple=127.0.0.1:34258->127.0.0.1:80/6
706.700110 IPVS: Unbind-dest TCP c:127.0.0.1:34258 v:127.0.0.1:80 d:127.0.0.1:8080 fwd:M s:5 conn->flags:10100 conn->refcnt:0 dest->refcnt:15505
707.701103 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
707.701128 IPVS: lookup/out TCP 127.0.0.1:34258->127.0.0.1:80 not hit
707.701143 IPVS: lookup/in TCP 127.0.0.1:34258->127.0.0.1:80 not hit
707.701156 IPVS: lookup service: fwm 0 TCP 127.0.0.1:80 hit
707.701169 IPVS: lookup/in TCP 127.0.0.1:80->127.0.0.1:34258 not hit
707.701180 IPVS: ip_vs_wlc_schedule(): Scheduling...
707.701193 IPVS: WLC: server 127.0.0.1:8080 activeconns 0 refcnt 15504 weight 1 overhead 15503
707.701210 IPVS: Bind-dest TCP c:127.0.0.1:34258 v:127.0.0.1:80 d:127.0.0.1:8080 fwd:M s:5 conn->flags:100 conn->refcnt:1 dest->refcnt:15505
707.701225 IPVS: Schedule fwd:M c:127.0.0.1:34258 v:127.0.0.1:80 d:127.0.0.1:8080 conn->flags:10140 conn->refcnt:2
707.701239 IPVS: TCP input [S...] 127.0.0.1:8080->127.0.0.1:34258 state: NONE->SYN_RECV conn->refcnt:2
707.701252 IPVS: Enter: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 625
707.701269 IPVS: ip_vs_update_conntrack: Updating conntrack ct=ffff8800ce14e5d0, status=0x100, ctinfo=2, old reply=127.0.0.1:80->127.0.0.1:34258/6, new reply=127.0.0.1:8080->127.0.0.1:34258/6, cp=127.0.0.1:34258->127.0.0.1:80->127.0.0.1:8080/6:3
707.701282 IPVS: Leave: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 698
707.701330 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
707.701374 IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132

[3] sysctl

# uname -r
3.10.0-693.11.6.el7.x86_64.debug
# sysctl net.ipv4.vs
net.ipv4.vs.am_droprate = 10
net.ipv4.vs.amemthresh = 1024
net.ipv4.vs.backup_only = 0
net.ipv4.vs.cache_bypass = 0
net.ipv4.vs.conn_reuse_mode = 1
net.ipv4.vs.conntrack = 1
net.ipv4.vs.debug_level = 0
net.ipv4.vs.drop_entry = 0
net.ipv4.vs.drop_packet = 0
net.ipv4.vs.expire_nodest_conn = 0
net.ipv4.vs.expire_quiescent_template = 0
net.ipv4.vs.nat_icmp_send = 0
net.ipv4.vs.pmtu_disc = 1
net.ipv4.vs.secure_tcp = 0
net.ipv4.vs.snat_reroute = 1
net.ipv4.vs.sync_ports = 1
net.ipv4.vs.sync_qlen_max = 61432
net.ipv4.vs.sync_refresh_period = 0
net.ipv4.vs.sync_retries = 0
net.ipv4.vs.sync_sock_size = 0
net.ipv4.vs.sync_threshold = 3 50
net.ipv4.vs.sync_version = 1
# cat /sys/module/ip_vs/parameters/conn_tab_bits
18

[4] Steps to reproduce

# modprobe ip_vs_wlc
# sysctl -w net.ipv4.vs.conntrack=1
# sysctl -w net.ipv4.ip_local_port_range='32768 60999'
# docker run --rm --net=host --detach --name=nginx nginx sh -c 'echo "daemon off; worker_processes 2; events { worker_connections 16384; } http { server { listen 8080 backlog=16384; access_log off; server_name localhost; location / { return 200; } } }" > /etc/nginx/ping.conf && nginx -c /etc/nginx/ping.conf'
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -C
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -A -t 127.0.0.1:80
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -a -t 127.0.0.1:80 -r 127.0.0.1:8080 -m
# docker run --rm --net=host mesosphere/hey -n 30000 -c 1000 -disable-keepalive -t 3 http://127.0.0.1/
# awk '{print $8}' /proc/net/ip_vs_conn | sort | uniq -c
# sysctl -w net.ipv4.vs.debug_level=15; strace -ttt curl --silent http://127.0.0.1/ 2>&1 | tee strace.log; sysctl -w net.ipv4.vs.debug_level=0
# dmesg | sed -re 's!^\[ *([0-9.]+)\]!\1!' | tee dmesg.log
# docker run --rm --net=host --privileged mesosphere/net-toolbox ipvsadm -C
# docker kill nginx
Re: [lvs-users] One second connection delay in masquerading mode
Hello Sergey

I had a quick look at your configuration, and the first thing that
comes to mind is: can you do this on a single host?

The first thing I see is that you are stuck in SYN_RECV, as shown below.
That would normally mean the ARP issue has not been resolved. Which brings
me back to the question: can you do this on the same host, the way you are?

NONE->SYN_RECV conn->refcnt:2

I don't want to say no outright, but I suggest moving your docker nginx
guest somewhere else and not using 127.0.0.1, as that is where the VIP/32
sits on the real server. Since LVS in DR mode cannot talk to itself, I
suspect you can't host services on the director itself, or connect from
the same host to the director's VIP.

Anyone wish to correct me? It makes sense when the director is a separate
VM (KVM etc.); when the docker host is a separate machine from the
director, we no longer use 127.0.0.x addresses for communication.

Andruw Smalley

Loadbalancer.org Ltd.

www.loadbalancer.org
+1 888 867 9504 / +44 (0)330 380 1064
asmalley@loadbalancer.org



On 24 January 2018 at 21:51, Sergey Urbanovich
<surbanovich@mesosphere.io> wrote:
> [...]

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
Re: [lvs-users] One second connection delay in masquerading mode
Hi Andruw,

Thank you for your response.

I initially found the issue in another, more complex network configuration,
with the same 1 second delay. I hope it's the same issue as with the
127.0.0.1 setup from the initial post.

* The HTTP server ran on an overlay network (9.0.3.130:80), on host 10.10.0.21.
* The HTTP clients ran on host 10.10.0.49, using the virtual service 11.171.172.80:80.

Please see the following excerpt and the attached dmesg logs.

[ 1091.367563] IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
[ 1091.367571] IPVS: lookup/out TCP 11.171.172.80:56750->11.171.172.80:80 not hit
[ 1091.367578] IPVS: lookup/in TCP 11.171.172.80:56750->11.171.172.80:80 hit
[ 1091.368097] IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with tuple=11.171.172.80:56750->11.171.172.80:80/6 for conn 11.171.172.80:56750->11.171.172.80:80->9.0.3.130:80/6:5
[ 1091.368102] IPVS: ip_vs_conn_drop_conntrack: no conntrack for tuple=11.171.172.80:56750->11.171.172.80:80/6
[ 1091.368106] IPVS: ip_vs_conn_drop_conntrack: dropping conntrack with tuple=11.171.172.80:56750->11.171.172.80:80/6 for conn 11.171.172.80:56750->11.171.172.80:80->9.0.3.130:80/6:5
[ 1091.368109] IPVS: ip_vs_conn_drop_conntrack: no conntrack for tuple=11.171.172.80:56750->11.171.172.80:80/6
[ 1091.368114] IPVS: Unbind-dest TCP c:11.171.172.80:56750 v:11.171.172.80:80 d:9.0.3.130:80 fwd:M s:5 conn->flags:10100 conn->refcnt:0 dest->refcnt:13880
[ 1092.368128] IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
[ 1092.368134] IPVS: lookup/out TCP 11.171.172.80:56750->11.171.172.80:80 not hit
[ 1092.368138] IPVS: lookup/in TCP 11.171.172.80:56750->11.171.172.80:80 not hit
[ 1092.368141] IPVS: lookup service: fwm 0 TCP 11.171.172.80:80 hit
[ 1092.368143] IPVS: lookup/in TCP 11.171.172.80:80->11.171.172.80:56750 not hit
[ 1092.368144] IPVS: ip_vs_wlc_schedule(): Scheduling...
[ 1092.368146] IPVS: WLC: server 9.0.3.130:80 activeconns 0 refcnt 13879 weight 1 overhead 13878
[ 1092.368154] IPVS: Bind-dest TCP c:11.171.172.80:56750 v:11.171.172.80:80 d:9.0.3.130:80 fwd:M s:5 conn->flags:100 conn->refcnt:1 dest->refcnt:13880
[ 1092.368158] IPVS: Schedule fwd:M c:11.171.172.80:56750 v:11.171.172.80:80 d:9.0.3.130:80 conn->flags:10140 conn->refcnt:2
[ 1092.368163] IPVS: TCP input [S...] 9.0.3.130:80->11.171.172.80:56750 state: NONE->SYN_RECV conn->refcnt:2
[ 1092.368165] IPVS: Enter: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 625
[ 1092.368172] IPVS: ip_vs_update_conntrack: Updating conntrack ct=ffff8801e67ab020, status=0x100, ctinfo=2, old reply=11.171.172.80:80->11.171.172.80:56750/6, new reply=9.0.3.130:80->11.171.172.80:56750/6, cp=11.171.172.80:56750->11.171.172.80:80->9.0.3.130:80/6:3
[ 1092.368183] IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
[ 1092.368187] IPVS: lookup/out TCP 9.0.3.130:80->11.171.172.80:56750 hit
[ 1092.368220] IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
[ 1092.368224] IPVS: lookup/out UDP 10.10.0.49:64000->10.10.0.21:38047 not hit
[ 1092.368244] IPVS: Leave: ip_vs_nat_xmit, net/netfilter/ipvs/ip_vs_xmit.c line 698
[ 1092.368538] IPVS: Enter: ip_vs_out, net/netfilter/ipvs/ip_vs_core.c line 1132
Re: [lvs-users] One second connection delay in masquerading mode
Hello,

On Wed, 24 Jan 2018, Sergey Urbanovich wrote:

> [...]
>
> What could be causing the delay? How can we get rid of it?

It should be this code that leads to the delay:

if (uses_ct)
return NF_DROP;

What happens is that we drop a SYN packet that hits an IPVS
connection in TIME_WAIT state when that connection uses
Netfilter connection tracking (conntrack=1).

conn_reuse_mode=1 relies on selecting a different
real server, but as we can not alter the Netfilter conntrack
tuple after it is confirmed, we drop the conntrack, the IPVS
connection and the current packet, and expect the next SYN
(retransmitted after 1 second, as you observe) to create a new
IPVS connection and a corresponding conntrack to some available
real server. And that is what happens after 1 second.

To get rid of this delay you have the following options:

1. Do not enable IPVS conntrack mode (it can be slower to create
and drop a conntrack on every packet); use conntrack=0 for this.
This allows IPVS to ignore the TIME_WAIT connection and to
create a new one.

2. Use NOTRACK for IPVS connections; it should be faster
because conntracks are not created/removed:

iptables -t raw -A PREROUTING -p tcp -d VIP --dport VPORT -j CT --notrack

For local clients use -A OUTPUT -o lo

If needed, such traffic can be matched with -m state --state UNTRACKED

3. Reduce the TIME_WAIT timeout in the IPVS source (table
tcp_timeouts[]). It does not solve the problem, but it reduces
how often it occurs.
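A sketch of the first two options as commands (VIP and VPORT here are placeholders for the virtual service address and port, not values from the thread; adapt chains and interfaces to your setup):

```shell
# Option 1: run IPVS without the Netfilter conntrack integration.
sysctl -w net.ipv4.vs.conntrack=0

# Option 2: keep conntrack=1 globally, but exempt the IPVS virtual
# service from tracking. Remote clients enter via PREROUTING:
iptables -t raw -A PREROUTING -p tcp -d "$VIP" --dport "$VPORT" -j CT --notrack
# Local clients (as in the 127.0.0.1 reproduction) leave via loopback:
iptables -t raw -A OUTPUT -o lo -p tcp -d "$VIP" --dport "$VPORT" -j CT --notrack
# The untracked traffic can still be matched explicitly if needed:
#   -m state --state UNTRACKED
```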

Regards

--
Julian Anastasov <ja@ssi.bg>

Re: [lvs-users] One second connection delay in masquerading mode
Hi Julian,

Thank you very much for such a valuable answer.

I'd like to add some more information about my use case. In our system the
client and the load balancer run on the same server, and the remote servers
may bind to different TCP ports; so VS/NAT is a requirement for us.

Do I understand correctly that I can’t use the first two options from your
previous message (conntrack=0 and --notrack) in my use case?

Why do we have to drop the first SYN packet if, 1 second later, the next SYN
packet uses the same source and destination TCP ports and works fine?

Thanks again.

--
Yours,
Sergey Urbanovich

On Jan 27, 2018, at 7:16 AM, Julian Anastasov <ja@ssi.bg> wrote:

> [...]


Re: [lvs-users] One second connection delay in masquerading mode
Hello,

On Wed, 31 Jan 2018, Sergey Urbanovich wrote:

> Hi Julian,
>
> Thank you very much for such a valuable answer.
>
> I'd like to add some more information about my use case. In our system client
> and load-balancer work on the same server, and remote servers could bind on
> different TCP ports; so VS/NAT is a requirement for us.
>
> Do I understand correctly that I can’t use the first two options from your
> previous message (conntrack=0 and --notrack) in my use case?

conntrack=1 allows the IPVS traffic to be under
netfilter control: for example, matching by conntrack state (NEW,
ESTABLISHED, RELATED), additional NAT, etc. You do not need
netfilter conntracks to NAT the packets; IPVS can do its
job without using conntracks.
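As a generic illustration (not from the thread) of the state matching that conntrack=1 makes possible for IPVS traffic:

```shell
# With net.ipv4.vs.conntrack=1, IPVS flows get Netfilter conntrack entries,
# so state-based rules like these can apply to the load-balanced traffic:
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m state --state NEW -p tcp --dport 80 -j ACCEPT
```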

> Why do we have to drop the first SYN packet if, in 1 second, the next SYN packet
> uses the same source and destination TCP ports and it works fine?

The original direction (CIP:CPORT->VIP:VPORT) is the
same because the client socket retransmits its SYN packet.
But IPVS can select a different real server, so the reply
direction can be from a different RIP:RPORT to CIP:CPORT.
Both IPVS connections and Netfilter conntracks require a
unique tuple in both directions, so we can not keep the old
connections. As conntracks are created only when a packet
enters the stack, our solution is to drop the old conntrack,
so that the new one is created with unique tuples to the
newly selected real server.

Regards

--
Julian Anastasov <ja@ssi.bg>