Mailing List Archive: Re: more failover detail...

Re: more failover detail...

Dec 6, 2000, 10:24 AM

Post #1 of 5

On Thu, Nov 16, 2000 at 03:36:54PM +0000, Stephen Rowles wrote:
> I am running NAT LVS. This is used for a compute cluster and so has
> permanent connections almost 24/7.
>
> Currently there is no redundancy / failover provision for the director, so
> if it dies (for whatever reason) the cluster is out of action. I know that
> heartbeat can be used to enable another director if the main director
> dies... however, what will happen to the current connections? I assume that
> they will be lost as the failover director will not have a list of the
> current connections to manage.

This is correct.

> Is there any way that the failover director can keep track of the current
> connections so that when it takes over the IP address, the LVS NAT
> forwarding can be taken over as well?
>
> I know it's a long shot but I thought it was worth a try.

Essentially what is needed is for information about the affinity for a
connection for a real server to be communicated to the standby. The trick
is that this has to be done without impacting on performance. Because of
this, the information would most likely need to be communicated to the
standby in a non-blocking fashion. This is likely to create some
interesting race conditions. What if the active Linux Director crashes just
as a connection is established, but the communication of this to the standby
is still pending (possibly it failed and is in the process of a retransmit)?
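
A minimal sketch of that non-blocking hand-off, in user-space Python rather
than kernel code. All names here (ConnEntry, SyncQueue, the queue size) are
hypothetical and are not part of LVS; the point is only to show where the
race comes from. The forwarding path enqueues and returns immediately, so
anything still in the queue when the active director dies never reaches the
standby.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnEntry:
    """Hypothetical record: which real server a client connection is bound to."""
    client: tuple   # (ip, port) of the client
    virtual: tuple  # (ip, port) of the virtual service
    real: tuple     # (ip, port) of the chosen real server


class SyncQueue:
    """Bounded queue between the forwarding path and a sender to the standby."""

    def __init__(self, maxlen=1024):
        self._pending = deque()
        self._maxlen = maxlen
        self.dropped = 0

    def offer(self, entry):
        """Called from the forwarding path; never blocks.

        Returns False (and counts a drop) when the queue is full.
        """
        if len(self._pending) >= self._maxlen:
            self.dropped += 1
            return False
        self._pending.append(entry)
        return True

    def drain(self, send, limit=None):
        """Called by the sender; hands up to `limit` entries to `send`."""
        sent = 0
        while self._pending and (limit is None or sent < limit):
            send(self._pending.popleft())
            sent += 1
        return sent

    def pending(self):
        """Entries that would be lost if the active director crashed now."""
        return list(self._pending)
```

Whatever pending() returns at the moment of the crash is the set of
connections the standby knows nothing about. Making offer() wait for an
acknowledgement would close that window, but only by putting the standby on
the forwarding path.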

Just some food for thought.

--
Horms

RE: more failover detail...

Steve.Gonczi at networkengines

Dec 11, 2000, 6:26 PM

Post #2 of 5

IMHO there is no way to do this in a bullet-proof fashion.
The best bet is to minimize the impact of a failover.
I.e. any updates would be sent on a best effort basis,
without any attempts for guaranteed delivery.
Some connections will still get hosed, but you can
minimize the number.

The speed of lvs makes any other approach impractical.
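
As a rough sketch of what "best effort" could look like, assuming IPv4 and
a fixed-size record: the active director fires one UDP datagram per new
connection and ignores send failures, and the standby applies whatever
arrives. The wire format and all function names below are made up for
illustration; this is user-space Python, not LVS code.

```python
import socket
import struct

# Hypothetical wire format: protocol number, then client, virtual service
# and real server as (IPv4 address, port) pairs. Network byte order.
RECORD = struct.Struct("!B4sH4sH4sH")


def encode(proto, client, virtual, real):
    """Pack one connection entry; each endpoint is an (ip, port) tuple."""
    return RECORD.pack(
        proto,
        socket.inet_aton(client[0]), client[1],
        socket.inet_aton(virtual[0]), virtual[1],
        socket.inet_aton(real[0]), real[1],
    )


def decode(data):
    """Unpack a datagram back into (proto, client, virtual, real)."""
    proto, c_ip, c_port, v_ip, v_port, r_ip, r_port = RECORD.unpack(data)
    return (
        proto,
        (socket.inet_ntoa(c_ip), c_port),
        (socket.inet_ntoa(v_ip), v_port),
        (socket.inet_ntoa(r_ip), r_port),
    )


def send_best_effort(sock, standby_addr, record):
    """Fire and forget: no ack, no retransmit. Returns False if the send failed."""
    try:
        sock.sendto(record, standby_addr)
        return True
    except OSError:
        # Covers BlockingIOError on a non-blocking socket with a full buffer.
        return False


def apply_update(table, data):
    """Standby side: record the affinity, ignoring malformed datagrams."""
    if len(data) != RECORD.size:
        return False
    proto, client, virtual, real = decode(data)
    table[(proto, client, virtual)] = real
    return True
```

The sender would use a datagram socket put into non-blocking mode with
sock.setblocking(False), so a full send buffer costs a lost update rather
than a stalled director. A lost datagram means one connection the standby
cannot carry over, which is the trade being proposed.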

>Essentially what is needed is for information about the affinity for a
>connection for a real server to be communicated to the standby. The trick
>is that this has to be done without impacting on performance.

/sG

RE: more failover detail...

spr at ecs

Dec 12, 2000, 2:29 AM

Post #3 of 5

At 20:26 11/12/2000 -0500, you wrote:
>IMHO there is no way to do this in a bullet-proof fashion.
>The best bet is to minimize the impact of a failover.
>I.e. any updates would be sent on a best effort basis,
>without any attempts for guaranteed delivery.
>Some connections will still get hosed, but you can
>minimize the number.
>
>The speed of lvs makes any other approach impractical.

This would seem like the best approach for my particular problem. My
compute cluster does not deal with a high connection rate, but with fewer
long term connections. A best effort (possibly just UDP type packets)
transmission of the connections to the failover director would probably be
sufficient. The idea would be to lose as few connections as possible; I
think that losing one or two connections in the event of a failure would be
brilliant. In a situation where we lost the director and one node, losing
only that node's connections and possibly one or two other connections would
be a massive improvement over losing the entire cluster's connections.
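
To make that accounting concrete, here is a small sketch with invented user
and node names. It compares what the active director knew against what had
reached the standby, and treats any connection bound to a dead node as lost
regardless of whether it was synchronised.

```python
def survivors_after_failover(active_table, standby_table, dead_reals=frozenset()):
    """Split connections into those the standby can carry on and those lost.

    Both tables map a connection key to the real server it is bound to.
    A connection survives only if the standby holds the same binding and
    that real server is still alive.
    """
    kept, lost = {}, {}
    for conn, real in active_table.items():
        if standby_table.get(conn) == real and real not in dead_reals:
            kept[conn] = real
        else:
            lost[conn] = real
    return kept, lost


# Example: 50 connections spread over 5 nodes. The last two updates never
# reached the standby, and node "n3" died together with the director.
active = {f"user{i}": f"n{i % 5}" for i in range(50)}
standby = {k: v for k, v in active.items() if k not in ("user48", "user49")}
kept, lost = survivors_after_failover(active, standby, dead_reals={"n3"})
print(len(kept), len(lost))  # 39 11
```

In this made-up case the ten connections on the dead node are gone either
way, and one more is lost because its update was still in flight; the other
39 carry over.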

> >Essentially what is needed is for information about the affinity for a
> >connection for a real server to be communicated to the standby. The trick
> >is that this has to be done without impacting on performance.

Yep, that pretty much sums up what I was looking for....

>/sG
>
>_______________________________________________
>LinuxVirtualServer.org mailing list - lvs-users@LinuxVirtualServer.org
>Send requests to lvs-users-request@LinuxVirtualServer.org
>or go to http://www.in-addr.de/mailman/listinfo/lvs-users

Steve.

----------------------------------------------------------------------------
Going to church doesn't make you a Christian any more than going to a garage
makes you a mechanic.

Re: RE: more failover detail...

lmb at suse

Dec 12, 2000, 2:33 AM

Post #4 of 5

On 2000-12-12T09:29:48,
Stephen Rowles <spr@ecs.soton.ac.uk> said:

> This would seem like the best approach for my particular problem. My
> compute cluster does not deal with a high connection rate, but with fewer
> long term connections. A best effort (possibly just UDP type packets)
> transmission of the connections to the failover director would probably be
> sufficient. The idea would be to lose as few connections as possible, I
> think that losing one or two connections in the event of a failure would be
> brilliant.

If your application can cope with losing one connection - which potentially
translates to a client being able to cope with having to reconnect to the
server - the probability is high that you could cope with losing all
connections and having the client reconnect.

Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
Development HA

--
Perfection is our goal, excellence will be tolerated. -- J. Yahl

Re: RE: more failover detail...

spr at ecs

Dec 12, 2000, 2:47 AM

Post #5 of 5

At 10:33 12/12/2000 +0100, you wrote:
>On 2000-12-12T09:29:48,
> Stephen Rowles <spr@ecs.soton.ac.uk> said:
>
> > This would seem like the best approach for my particular problem. My
> > compute cluster does not deal with a high connection rate, but with fewer
> > long term connections. A best effort (possibly just UDP type packets)
> > transmission of the connections to the failover director would probably be
> > sufficient. The idea would be to lose as few connections as possible, I
> > think that losing one or two connections in the event of a failure would be
> > brilliant.
>
>If your application can cope with losing one connection - which potentially
>translates to a client being able to cope with having to reconnect to the
>server - the probability is high that you could cope with losing all
>connections and having the client reconnect.

As my cluster is a compute cluster which load balances "long term" ssh /
telnet connections for people running "heavy" linux compute jobs, losing
one connection would mean that one user lost their connection (or only one
of their connections) which is preferable to all users losing all of their
connections - more a case of damage limitation, I'd rather have to deal
with one / two annoyed users as opposed to 50 odd!

Steve.