Mailing List Archive

[lvs-users] ECMP Active-Active Setup - Failover issues
Hi,

I'm trying this setup where an upstream router ECMPs to multiple
active-active LVS nodes with the exact same configuration.

--- | router | -- ECMP --> | LVS nodes | -- IPinIP --> | HAProxy nodes |

The LVS nodes are configured as follows:
1. BIRD to advertise VIP to upstream router
2. ipvs setup with VIP and sh scheduler
3. ipvs uses TUN forwarding method
4. loopback interface has VIP

LVS load balances to real servers that have a tunnel interface set up.
The real servers return traffic directly to the client.
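For concreteness, the per-node configuration described above looks roughly like this (addresses, ports, and device names are placeholders, not my actual values; the BIRD config that advertises the VIP upstream is omitted):

```shell
# 4. Put the VIP on loopback so the node accepts traffic addressed to it
ip addr add 192.0.2.10/32 dev lo

# 2. Virtual service on the VIP with the source-hashing (sh) scheduler
ipvsadm -A -t 192.0.2.10:80 -s sh

# 3. Real servers (the HAProxy nodes), reached via IPinIP tunneling (-i)
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.1:80 -i
ipvsadm -a -t 192.0.2.10:80 -r 10.0.0.2:80 -i
```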

In one of my tests, when I stop BIRD and hence drop the route to node
A, the upstream router starts forwarding packets to node B. So far so
good.

What I noticed then is that node B will send TCP RST packets back to
the client. Instead, what I hoped to see was that
1. node B receives the packet
2. sh scheduler means that the same real server picked by node A will
also be picked by node B
3. the same real server receives the packet and the connection stays up

When I enable daemon state sync between node A and B the connection
stays up. However, state sync happens periodically and so it is still
possible for some connections to be dropped during this interval.

From the above observation, my guess is that ipvs evaluated the packet
(perhaps in ip_vs_in()), but somehow handled it differently because
the connection does not exist in the table.

What I hope to understand is
1. is ipvs sending the RST? or is that somewhere else in the netfilter system?
2. since the sh scheduler (and the recent maglev scheduler patch) use
consistent hashing, they don't actually have state to sync (except
for purely optimization purposes). Is it reasonable to hope that we do
not need state sync at all to get this working?
3. should I be setting this up in a different way?

Forgive me if I have gaps in my knowledge of the Linux networking
stack and ipvs's hook with netfilter (and the schedulers and
connection tracking too).

Thank you.

_______________________________________________
Please read the documentation before posting - it's available at:
http://www.linuxvirtualserver.org/

LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
Send requests to lvs-users-request@LinuxVirtualServer.org
or go to http://lists.graemef.net/mailman/listinfo/lvs-users
Re: [lvs-users] ECMP Active-Active Setup - Failover issues
Hello,

On Fri, 1 Jun 2018, Poh Chiat wrote:

> Hi,
>
> I'm trying this setup where an upstream router ECMPs to multiple
> active-active LVS nodes with the exact same configuration.
>
> --- | router | -- ECMP --> | LVS nodes | -- IPinIP --> | HAProxy nodes |
>
> The LVS nodes are configured as follows:
> 1. BIRD to advertise VIP to upstream router
> 2. ipvs setup with VIP and sh scheduler
> 3. ipvs uses TUN forwarding method
> 4. loopback interface has VIP
>
> LVS load balances to real servers that have a tunnel interface set up.
> The real servers return traffic directly to the client.
>
> In one of my tests, when I stop BIRD and hence drop the route to node
> A, the upstream router starts forwarding packets to node B. So far so
> good.
>
> What I noticed then is that node B will send TCP RST packets back to
> the client. Instead, what I hoped to see was that
> 1. node B receives the packet
> 2. sh scheduler means that the same real server picked by node A will
> also be picked by node B
> 3. the same real server receives the packet and the connection stays up

What people use in such case is:

1. Enable sloppy mode to create a connection from any non-RST TCP packet:
echo 1 > /proc/sys/net/ipv4/vs/sloppy_tcp
echo 1 > /proc/sys/net/ipv4/vs/sloppy_sctp

2. For active-active setups do not use sync but:
2.1 Maglev ("mh" scheduler in 4.18): non-persistent connections,
MH is better than SH when real servers are added/removed from
service
2.2 SH scheduler (not better than MH): non-persistent connections

3. For persistence use sync only for persistent templates,
any scheduler can be used:
# sync only templates (requires sloppy_tcp=1):
echo 1 > /proc/sys/net/ipv4/vs/sync_persist_mode
# How often to sync the templates, in seconds:
echo 10 > /proc/sys/net/ipv4/vs/sync_refresh_period
echo 0 > /proc/sys/net/ipv4/vs/sync_retries
echo "0 0" > /proc/sys/net/ipv4/vs/sync_threshold
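Put together, a sketch of 2 and 3 (the VIP, real-server addresses, interface
name and persistence timeout are placeholders, adjust for your setup):

```shell
# Persistent virtual service using the Maglev (mh) scheduler (kernel >= 4.18);
# -p enables persistence, here with a 300 second timeout
ipvsadm -A -t 192.0.2.10:443 -s mh -p 300
ipvsadm -a -t 192.0.2.10:443 -r 10.0.0.1:443 -i
ipvsadm -a -t 192.0.2.10:443 -r 10.0.0.2:443 -i

# With sync_persist_mode=1 only the persistence templates are synced.
# Run both daemons on each node: master announces its templates,
# backup applies the templates received from the other nodes.
ipvsadm --start-daemon master --mcast-interface eth0 --syncid 1
ipvsadm --start-daemon backup --mcast-interface eth0 --syncid 1
```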

> When I enable daemon state sync between node A and B the connection
> stays up. However, state sync happens periodically and so it is still
> possible for some connections to be dropped during this interval.
>
> From the above observation, my guess is that ipvs evaluated the packet
> (perhaps in ip_vs_in()), but somehow handled it differently because
> the connection does not exist in the table.
>
> What I hope to understand is
> 1. is ipvs sending the RST? or is that somewhere else in the netfilter system?

When not in sloppy mode, IPVS simply skips the packet
and it is delivered to the local stack, where the RST is generated
by TCP.

> 2. since the sh scheduler (and the recent maglev scheduler patch) use
> consistent hashing, they don't actually have state to sync (except
> for purely optimization purposes). Is it reasonable to hope that we do
> not need state sync at all to get this working?

Yes, as long as you do not need persistence.

> 3. should I be setting this up in a different way?

Maybe you are missing only the sloppy_tcp setting...

> Forgive me if I have gaps in my knowledge of the Linux networking
> stack and ipvs's hook with netfilter (and the schedulers and
> connection tracking too).

No problem, we are still learning too :)

Regards

--
Julian Anastasov <ja@ssi.bg>
