Mailing List Archive

[lvs-users] sporadic connection reset on director
Hi,

I am currently trying to get down to the core of a problem where my
LVS-director seems to drop a packet coming from a client from time to
time. We have this problem on our production systems and can reproduce
the problem on staging.

Our setup:
===========
We are using ipvsadm with Linux CentOS5 x86_64 in a PV XEN-DomU.

Current Version details:
Kernel: 2.6.18-348.1.1.el5xen
ipvsadm: 1.24-13.el5

LVS-Setup:
We use IPVS in DR mode; to manage the running connections we use
lvs-kiss.
LVS runs in a heartbeat v1 cluster (two virtual nodes); master and
backup run constantly on both nodes.
For the LVS services we use logical IPs that are set up by heartbeat
(active/passive cluster mode).

The real-servers are physical Linux-machines.

Network-Setup:
The VM acting as director is running as XEN-PV-DomU on a Dom0 using
bridged networks.
Networks "in play":
abn-network: the staging network; used to connect the client to the
director, used by the real-servers to send their answers directly to
the clients (direct-routing approach), and used for the ipvsadm
master/slave multicast sync traffic.

lvs-network: This is a dedicated VLAN which connects director and
real-servers

dr-arp-problem: solved by suppressing ARP answers for the service IP
on the real-servers.

The service-IP is configured as logical IP on the lvs-interface on
the real-servers.
In this setup ip_forwarding is not needed anywhere (neither on
director, nor on real-server).
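
The ARP suppression mentioned above is usually done with sysctl
settings on each real server. A sketch of the standard LVS-DR recipe
(the exact scope - "all" vs. per-interface - is an assumption, not
copied from our config):

```shell
# /etc/sysctl.conf on each real server - standard LVS-DR ARP suppression
# (sketch; narrow the scope to specific interfaces if needed)
net.ipv4.conf.all.arp_ignore = 1      # reply only for addresses configured on the ingress interface
net.ipv4.conf.all.arp_announce = 2    # use the best local source address in ARP requests
```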

VM details:
1 GB RAM, 2 vCPUs, system-load almost 0, memory 73M free, 224M
buffers, 536M cache, no swap.
top shows almost always 100% idle, 0% us/sy/ni/wa/hi/si/st.


Configuration details:

ipvsadm -Ln for the service in question shows:

TCP x.y.183.217:12405 wrr persistent 7200
-> 192.168.83.234:12405 Route 1000 0 0
-> 192.168.83.235:12405 Route 1000 0 0

x.y: the first two octets are from our internal class-B range.
We use 192.168.83.x as lvs-network for staging.

Persistent ipvsadm-configuration:
/etc/sysconfig/ipvsadm: --set 20 20 20
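
For reference, the running configuration above corresponds to roughly
these ipvsadm calls (a sketch; addresses and weights taken from the
ipvsadm -Ln output above, --set as in /etc/sysconfig/ipvsadm):

```shell
ipvsadm --set 20 20 20                                               # tcp / tcpfin / udp timeouts (seconds)
ipvsadm -A -t x.y.183.217:12405 -s wrr -p 7200                       # virtual service, wrr, 7200s persistence
ipvsadm -a -t x.y.183.217:12405 -r 192.168.83.234:12405 -g -w 1000   # -g = gatewaying (direct routing)
ipvsadm -a -t x.y.183.217:12405 -r 192.168.83.235:12405 -g -w 1000
```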

Cluster-configuration:
/etc/ha.d/haresources: $primary_directorname lvs-kiss x.y.183.217

lvs-kiss-configuration-snippet for the service above:

<VirtualServer idm-abn:12405>
ServiceType tcp
Scheduler wrr
DynamicScheduler 0
Persistance 7200
QueueSize 2
Fuzz 0.1
<RealServer rs1-lvs:12405>
PacketForwardingMethod gatewaying
Test ping -c 1 -nq -W 1 rs1-lvs >/dev/null
RunOnFailure "/sbin/ipvsadm -d -t idm-abn:12405 -r rs1-lvs"
RunOnRecovery "/sbin/ipvsadm -a -t idm-abn:12405 -r rs1-lvs"
</RealServer>
<RealServer rs2-lvs:12405>
PacketForwardingMethod gatewaying
Test ping -c 1 -nq -W 1 rs2-lvs >/dev/null
RunOnFailure "/sbin/ipvsadm -d -t idm-abn:12405 -r rs2-lvs"
RunOnRecovery "/sbin/ipvsadm -a -t idm-abn:12405 -r rs2-lvs"
</RealServer>
</VirtualServer>

idm-abn, rs1 and rs2 resolve via /etc/hosts.

About the service:
This is a SOA web service.

How we reproduce the error:
From a client we make constant calls to the web service at an interval
of one call every three seconds.
From time to time the director answers the client with a connection
reset.
Interesting: this happens on the (n x 100 + 1)th try - the "+ 1" is
the curious part.
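
The reproduction boils down to a loop like the following (a sketch;
the use of curl and the URL path are assumptions - the real client is
our SOA test client):

```shell
#!/bin/sh
# Call the service once every three seconds and log any failed call
# (hypothetical path "/"; replace with the real service endpoint).
i=0
while :; do
  i=$((i + 1))
  curl -sS -o /dev/null "http://idm-abn:12405/" \
    || echo "call $i failed: connection reset?"
  sleep 3
done
```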

What we did to trace down the problem:
- Checked /proc/sys/net/ipv4/vs: all values are at their defaults, so
drop_packet is NOT in effect (= 0)
- tcpdump on the client, on the frontend/abn and backend/lvs interfaces
of the director, and on the lvs and abn interfaces of the real-servers

In these tcpdumps we could see a request from the client being answered
with a connection reset by the director.
The packet was NOT forwarded via LVS.
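
To isolate the resets in such a capture, a filter like this on the
director's abn interface helps (a sketch; the interface name eth0 is an
assumption):

```shell
# Show only RST segments for the service port; -n avoids DNS lookups
tcpdump -ni eth0 "tcp port 12405 and tcp[tcpflags] & tcp-rst != 0"
```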

I welcome any ideas on how to track this problem further down.
If any information is unclear/missing to drill down the problem - please
ask.

Kind regards

Nils Hildebrand

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users

Re: [lvs-users] sporadic connection reset on director
Hi,

Sadly, no answer so far.
I am currently stuck trying to compile the IPVS kernel modules (with
debugging enabled) against a Xen DomU kernel on CentOS 5.

I followed these HowTos:

http://wiki.centos.org/HowTos/I_need_the_Kernel_Source (yes, I needed
the full kernel-source),

http://wiki.centos.org/HowTos/BuildingKernelModule (building a .ko),

and followed it up to step 8, setting:

EXTRAVERSION = -348.4.1.el5xen
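
For the record, the module build itself roughly follows these steps in
the unpacked kernel source tree (a sketch based on the HowTos above;
your exact config and paths may differ):

```shell
# After setting EXTRAVERSION in the top-level Makefile as above:
make oldconfig                  # take the running kernel's config as the base
make modules_prepare            # prepare the build infrastructure
make M=net/ipv4/ipvs modules    # build only the IPVS modules (their 2.6.18 location)
```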

Now I got a number of kernel-modules:

ip_vs_dh.ko
ip_vs_ftp.ko
ip_vs.ko
ip_vs_lblc.ko
ip_vs_lblcr.ko
ip_vs_lc.ko
ip_vs_nq.ko
ip_vs_rr.ko
ip_vs_sed.ko
ip_vs_sh.ko
ip_vs_wlc.ko
ip_vs_wrr.ko

The main module "ip_vs" loads fine.

But: as soon as I start to use IPVS (e.g. by starting the replication
slave daemon), the system gets a kernel panic.

I am most probably missing something that is needed to fully integrate
the network stack of the Xen DomU into my debug kernel (and thus into
the compiled modules).

Networking IS different on a PV Xen DomU...

Any pointers for this problem?


Kind regards

Nils
