Mailing List Archive

[lvs-users] ldirectord does not transfer connections when a real server dies
Hello LVS users,

I am using ldirectord to load balance two IIS servers. The
ldirectord.cf looks like this:


autoreload = yes
quiescent = yes
checkinterval = 1
negotiatetimeout = 2
emailalertfreq = 60
emailalert = Konstantin.Boyanov@mysite.com
failurecount = 1

virtual = 172.22.9.100:80
checktimeout = 1
checktype = negotiate
protocol = tcp
real = 172.22.1.133:80 masq 2048
real = 172.22.1.134:80 masq 2048
request = "alive.htm"
receive = "I am not a zombie"
scheduler = wrr

The load balancing is working fine, the real servers are visible, etc.
Nevertheless I am encountering a problem with a simple test:

1. I open some connections from a client browser (IE 8) to the sites that
are hosted on the real servers
2. I change the weight of the real server which serves the above connections
to 0 and leave only the other real server alive
3. I reload the pages to regenerate the connections

What I am seeing with ipvsadm -Ln is that the connections are still on the
"dead" server. I have to wait up to one minute (I suppose some TCP timeout
on the browser side) for them to transfer to the "living" server. And if
in this one minute I keep pressing the reload button, the connections
stay at the "dead" server and their TCP timeout counter gets restarted.

So my question is: Is there a way to tell the load balancer in NAT mode to
terminate / redirect existing connections to a dead server *immediately*
(or close to immediately)?

It seems to me a blunder that a reload on the client side can make a
connection become a "zombie", i.e. be bound to a dead real server although
persistence is not used and the other server is ready and available.

The only thing that I found affecting this timeout is changing the
KeepAliveTimeout on the Windows machine running the IE8 which I use for the
tests. When I changed it from the default value of 60 seconds to 30 seconds,
the connections could be transferred after 30 seconds. It seems very odd
to me that a client setting can affect the operation of a network component
such as the load balancer.

And another thing - what is the column named "Inactive Connections" in the
output from ipvsadm used for? Which connections are considered inactive?

Also, in the output of ipvsadm I see a couple of connections with the
state TIME_WAIT. What are these for?

Any insight and suggestions are highly appreciated!

Cheers,
Konstantin



P.S: Here is some more information about the configuration:

# uname -a
Linux 3.0.58-0.6.2-default #1 SMP Fri Jan 25 08:31:01 UTC 2013 x86_64
x86_64 x86_64 GNU/Linux

# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP lb-mysite.com wrr
-> spwfe001.mysite.com:h Masq 10 0 0
-> spwfe002.mysite.com:h Masq 10 0 0

# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target prot opt source destination

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
SNAT all -- anywhere anywhere
to:172.22.9.100
SNAT all -- anywhere anywhere
to:172.22.1.130


# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
inet 127.0.0.2/8 brd 127.255.255.255 scope host secondary lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN qlen 1000
link/ether 00:50:56:a5:77:ae brd ff:ff:ff:ff:ff:ff
inet 192.168.8.216/22 brd 192.168.11.255 scope global eth0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN qlen 1000
link/ether 00:50:56:a5:77:af brd ff:ff:ff:ff:ff:ff
inet 172.22.9.100/22 brd 172.22.11.255 scope global eth1:1
inet 172.22.8.213/22 brd 172.22.11.255 scope global secondary eth1
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN qlen 1000
link/ether 00:50:56:a5:77:b0 brd ff:ff:ff:ff:ff:ff
inet 172.22.1.130/24 brd 172.22.1.255 scope global eth2


# cat /proc/sys/net/ipv4/ip_forward
1
# cat /proc/sys/net/ipv4/vs/conntrack
1
# cat /proc/sys/net/ipv4/vs/expire_nodest_conn
1
# cat /proc/sys/net/ipv4/vs/expire_quiescent_template
1
Re: [lvs-users] ldirectord does not transfer connections when a real server dies
Konstantin,

Easier said than done but...
You would need to completely remove the server from the LVS table,
then you can put it back in with a weight of zero.
This is similar to the health check behaviour when you set:

quiescent = no
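
Roughly, by hand on the director that would be something like this
(using the VIP and realserver addresses from your ldirectord.cf, adjust
as needed):

  # delete the realserver from the virtual service completely; with
  # expire_nodest_conn=1 (which you already have set) existing connections
  # to it are expired as soon as their next packet arrives
  ipvsadm -d -t 172.22.9.100:80 -r 172.22.1.133:80

  # add it back with weight 0 so it stays in the table but gets no new traffic
  ipvsadm -a -t 172.22.9.100:80 -r 172.22.1.133:80 -m -w 0

  # check the result
  ipvsadm -Ln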





--
Regards,

Malcolm Turnbull.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/

Re: [lvs-users] ldirectord does not transfer connections when a real server dies
Hello,

First off, thanks for your reply! I tried setting quiescent to no, but the
result did not change - when I set the weight of one of the real servers
to 0 and then reload the pages on the client, the connections to the dead
server stay on the dead server. The only thing affecting the behaviour of
the connections which I found was decreasing the KeepAliveTimeout on the
client. But this is no solution...

Do you mean that I have to _manually_ remove the server and then add it
back in with weight of zero?

About the health check - I am using the negotiate method so we can easily
pull off real servers for maintenance. Does the quiescent=no setting mean
that I would have to manually re-add each real server after maintenance?

Best Regards,
Konstantin Boyanov

Re: [lvs-users] ldirectord does not transfer connections when a real server dies
Setting quiescent=no will only give the desired behaviour on health check
failures (it will rip the server out).
Manually setting the weight to 0 will just drain the server...

So the best bet is to force it to fail the negotiate check
OR
manually pull it out....
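
If you want to check by hand what the negotiate check sees, something
like this from the director (same request/receive strings as in your
ldirectord.cf) is a rough equivalent:

  # exit status 0 while the realserver answers correctly, non-zero otherwise
  curl -s --max-time 2 http://172.22.1.133/alive.htm | grep -q "I am not a zombie"
  echo $?

Making alive.htm unavailable (or changing its content) on the realserver
fails the check, and with quiescent = no ldirectord then deletes the
realserver from the IPVS table rather than just quiescing it.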




--
Regards,

Malcolm Turnbull.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/

Re: [lvs-users] ldirectord does not transfer connections when a real server dies
On 30.04.2013, Konstantin Boyanov wrote:
> 1. I open some connections from a client browser (IE 8) to the sites that
> are hosted on the real servers
> 2. I change the weight of the real server which serves the above connections
> to 0 and leave only the other real server alive
> 3. I reload the pages to regenerate the connections
>
> What I am seeing with ipvsadm -Ln is that the connections are still on the
> "dead" server. I have to wait up to one minute (I suppose some TCP timeout
> from the browser-side) for them to transfer to the "living" server. And If
> in this one minute I continue pressing the reload button the connections
> stay at the "dead" server and their TCP timeout counter gets restarted.
>
> So my question is: Is there a way to tell the load balancer in NAT mode to
> terminate / redirect existing connections to a dead server *immediately*
> (or close to immediately)?

The man page for ipvsadm (the --weight parameter) helps in understanding this :)

To remove a dead server "immediately", you need to remove it from the IPVS table
(ipvsadm --delete-server). This also breaks any open tcp connections,
probably resulting in "connection refused" error messages or "broken images"
on the client.

A weight of zero essentially means "don't put new connections on this realserver,
but continue serving existing connections". This is something you usually
use for maintenance: let the server fulfill any pending requests, and after
a few minutes, all connections have usually shifted to other realservers.
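
By hand, that draining (and later restoring) would look roughly like
this, with the addresses and weight from the original mail (bear in mind
ldirectord may adjust the weights again based on its own checks):

  # drain: keep serving existing connections, schedule no new ones to .133
  ipvsadm -e -t 172.22.9.100:80 -r 172.22.1.133:80 -m -w 0

  # after maintenance, give it its configured weight back
  ipvsadm -e -t 172.22.9.100:80 -r 172.22.1.133:80 -m -w 2048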

Of course, as long as the existing connection is being used, IPVS does
recognize your connection as being "in use". If you don't like this, you'll
either have to actively tell your webserver to shut down any connections
(e.g. by shutting down IIS), forcing the client to reconnect, or remove
your realserver from the IPVS table (which will result in timeouts for
any open connections, but ultimately the client will be forced to
reconnect as well).

> It seems to me a blunder that a reload on the client side can make a
> connection become a "zombie", i.e. be bound to a dead real server
> although persistence is not used and the other server is ready and available.

The maintenance use case is usually quite obvious to understand, and
that's the "zero weight" use case. It's for "draining" connections: new
connections won't be sent to this box, but any current connections will be
served (until they're closed).

> The only thing that I found affecting this timeout is changing the
> KeepAliveTimeout on the Windows machine running the IE8 which I use for
> the tests. When I changed it from the default value of 60 seconds to 30
> the connections could be transferred after 30 seconds. It seems to me very
> odd that a client setting can affect the operation of a network component
> as the load balancer.

The server also has similar options to configure:
- you can turn off HTTP Keepalive completely
- IIS 7 by default uses 2 minutes as the keepalive timeout; after this
idle time, IIS will close the connection.

Of course, these options may have some (small) performance impact.

If you turn off HTTP Keepalive completely, the browser is forced to set up a
new tcp connection for every object it retrieves, including the full tcp
handshake and the packets for tearing down the connection.
If your website e.g. contains 40 images, you'll open and close at least 41
connections: the first for retrieving the html part, and another 40
connections, one for every single image (that's over-simplified, almost
every browser opens a few connections in parallel - yet much fewer than 40).

If the network latency between client and server is high, opening that
many connections one after another may result in a noticeable delay.

By using HTTP keepalive, your browser will open one connection and request
one object after the other over this connection; without setting up and
tearing down connections all the time, the website is somewhat faster.
Additionally, long-lived tcp connections automatically increase
their tcp receive window size, which may give you a little bit of extra
performance as well.
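
You can see this from the client side with curl, which re-uses one
connection when you give it several URLs in one invocation (just an
illustration, using the VIP from this thread):

  # the verbose output for the second request should mention re-using the
  # existing connection instead of opening a new one
  curl -sv -o /dev/null -o /dev/null http://172.22.9.100/ http://172.22.9.100/ 2>&1 | grep -i connection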

If you decrease your IIS keepalive timeout below the limit of your
clients, your server is likely the one that will be initiating the closing of
idle tcp connections. If your server does this frequently, you'll end up
collecting lots of connections in the TIME_WAIT state (well, usually
fewer than without keepalive at all, but depending on the exact setup,
this may result in some trouble opening outgoing connections).

> And another thing - what is the column named "Inactive Connections" in the
> output from ipvsadm used for? Which connections are considered inactive?
>
> Also, in the output of ipvsadm I see a couple of connections with the
> state TIME_WAIT. What are these for?

TCP connections do have some kind of lifecycle and many states.

- setting up a connection works by sending a few packets back and forth,
before the connection is known to be established and ready for data transfer.
Search for "tcp 3-way handshake" if you'd like to know more about this.
- tearing down a connection also involves sending a few packets back and
forth. The host initiating the close tracks this connection as
"TIME_WAIT", the other host tracks it as "CLOSE_WAIT". In case some
late packets arrive for this connection, the host can handle them
more appropriately (discard them, and not mistake them for being part of
a subsequent connection). And in case the final FIN packet didn't
arrive, the WAIT state still tells that host to temporarily not
re-use this connection.
Read: if you do have sockets in a TIME_WAIT state, your host tries not to
reuse them immediately for something else, but waits a few minutes before
reusing them. If you're closing too many connections and try to open new
outgoing connections, you may run out of usable sockets. That problem occurs
rarely, and mostly on systems with either poor software or a poor connection
access pattern.

In IPVS, only established connections are counted as "active", while "inactive"
refers to connections in any other state. A few minutes after TIME_WAIT, the
connection becomes "closed", its state is forgotten and it won't show
up in either ipvsadm or netstat anymore.
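
You can look at the individual connection entries the director is
tracking, including their state and remaining expiry timer, with
ipvsadm itself:

  # one line per tracked connection: protocol, expire timer, state,
  # client, virtual service, realserver
  ipvsadm -Lnc

  # or just count the entries per state
  ipvsadm -Lnc | awk 'NR>2 {print $3}' | sort | uniq -c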


Anders
--
1&1 Internet AG Expert Systems Architect (IT Operations)
Brauerstrasse 50 v://49.721.91374.0
D-76135 Karlsruhe f://49.721.91374.225

Amtsgericht Montabaur HRB 6484
Vorstände: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich,
Robert Hoffmann, Andreas Hofmann, Markus Huhn, Hans-Henning Kettler,
Dr. Oliver Mauss, Jan Oetjen, Martin Witt, Christian Würst
Aufsichtsratsvorsitzender: Michael Scheeren

Re: [lvs-users] ldirectord does not transfer connections when a real server dies
Hello,

Thanks for the detailed explanation Andreas! :)

The maintenance use case is one of the possible scenarios I tested. The
other one is killing (hard rebooting / shutting down) one of the real
servers to simulate server failure. In this scenario the user clicking on
reload will keep landing on the dead server until he grabs the phone
and starts yelling at customer support :) That is why we wanted to have the
connections terminate / relocate when a server goes down or is set to a weight
of 0 (which from the ldirectord perspective are the same thing, right?).

So, when a real server goes down, how can I decrease the time in which the
web server seems to be offline for users that were connected to the dead
server? Expiring the quiescent connections seems to be part of it, because we
are using persistence, but are there any useful configuration options for
ldirectord which can help achieve such behaviour?
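
For reference, these are the IPVS settings that are already at 1 on the
director (see the P.S. of my first mail), as far as I understand them:

  # expire a connection immediately when a packet arrives for it and its
  # realserver has been removed from the table
  cat /proc/sys/net/ipv4/vs/expire_nodest_conn

  # expire the persistence template of a quiesced (weight 0) realserver,
  # so clients "stuck" to it by persistence get rescheduled to a live one
  cat /proc/sys/net/ipv4/vs/expire_quiescent_template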

Best regards,
Konstantin Boyanov
_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users