Mailing List Archive

Failover inconsistent when a node is disconnected from the network forcefully
Hi,

We ran into the following problem while testing wackamole in our lab:
failover is not consistent among the nodes.

We have a 4-node configuration (A, B, C, D) with a single virtual IP
(VIP) configured.

Initially B acquired the VIP. We used the wackatrl tool to check which
node had acquired the VIP, and all 4 nodes (A, B, C and D) showed that B
had acquired it.

We then unplugged B from the LAN (by pulling out the network cable). A
then acquired the VIP: A, C and D all report that A has acquired it.

We then connected B back to the LAN. Once B was reconnected,
'wackatrl -l' on B reports that B has acquired the VIP, whereas A, C and
D report that A has acquired it. When we tried to reach the VIP over ssh
using PuTTY, we were connected to A.

When A was then disconnected from the LAN, B, C and D report that B has
acquired the VIP. But when we tried to connect to the VIP over ssh using
PuTTY, the connection timed out.

Can someone throw some light on what would be the cause?

Thanks,
Suganthi



Re: Failover inconsistent when a node is disconnected from the network forcefully
On Jul 16, 2007, at 8:28 AM, <suganthi.arumugam@wipro.com> wrote:

> Hi,
>
> We got the following problem when we were testing wackamole in our
> Lab. We observed that the failover is not consistent among the nodes.
>
> We had a 4 node(A,B,C,D) configuration.
>
> We have a Single Virtual IP configured.
>
> Initially B acquired the VIP.Through the wackatrl tool we tested
> who has acquired the VIP. All the 4 nodes A,B,C and D showed that B
> has acquired
>
> We tried to unplug B from the LAN(by pulling out the network
> cable). Now A has acquired the VIP. A, C and D tell that A has
> acquired the VIP.

B did not crash, so B believes that A, C and D went down, and B still
has the address. All the others think A has the address, and A ARPs
for it.

> Now B is connected back to the LAN . When B was connected 'wackatrl
> -l' in B tells that B has acquired the VIP, whereas A,C and D tell
> that A has acquired the VIP. When we tried to reach the VIP through
> putty via ssh. It got connected to A.

B _should_ learn soon that A has the address, if you wait a while.

> When A was disconnected from LAN, B,C and D tell that B has
> acquired the VIP. But when we tried to connect the VIP through
> putty via ssh, and the connection timed out.

I think you didn't wait long enough. B still had the address from
before, so when A went away and C and D learned that B owns it, B simply
kept it and saw no need to re-ARP. So no one else on the network knows
that B has the address.
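
The sequence above can be replayed in a toy model. This is only a
sketch under stated assumptions, not wackamole's code: ToyCluster and
replay are hypothetical names, the LAN is reduced to a single ARP
mapping for the VIP, and a node is assumed to announce the address only
when it did not already hold it.

```python
class ToyCluster:
    """Toy model of one VIP on one LAN (not wackamole's implementation)."""

    def __init__(self, names):
        self.plugged = set(names)   # nodes currently connected to the LAN
        self.holders = set()        # nodes with the VIP configured locally
        self.arp_target = None      # node the LAN's ARP caches point at

    def acquire(self, node):
        """Node takes the VIP; it announces only if it did not hold it yet."""
        if node in self.holders:
            return                  # assumption: no re-ARP for a held address
        self.holders.add(node)
        if node in self.plugged:
            self.arp_target = node  # the announcement reaches the LAN

    def unplug(self, node):
        self.plugged.discard(node)  # the node keeps its local state

    def plug(self, node):
        self.plugged.add(node)

    def client_connect(self):
        """Return the node a client reaches via the VIP, or None on timeout."""
        target = self.arp_target
        if target in self.plugged and target in self.holders:
            return target
        return None


def replay():
    """Replay the four steps from the report and record who answers."""
    cluster = ToyCluster("ABCD")
    results = []
    cluster.acquire("B")            # B initially acquires the VIP
    results.append(cluster.client_connect())
    cluster.unplug("B")             # cable pulled; B keeps the address
    cluster.acquire("A")            # A takes over and ARPs for it
    results.append(cluster.client_connect())
    cluster.plug("B")               # B is back but has not re-ARPed
    results.append(cluster.client_connect())
    cluster.unplug("A")             # A is pulled; B "acquires" what it holds
    cluster.acquire("B")
    results.append(cluster.client_connect())
    return results
```

Under these assumptions the last step returns None: the ARP mapping
still points at A, which is unplugged, so the client times out even
though B, C and D agree that B holds the address.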

> Can someone throw some light on what would be the cause?

If B actually crashes, when it comes back up it will work fine.
However, you simulated a network partition, not a node failure. It
would be good if, on a membership change, nodes re-ARPed for the
addresses they hold, "just in case." That would solve this particular
problem.
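
A minimal sketch of what re-ARPing on a membership change could look
like. It is an illustration only, not wackamole's implementation:
build_gratuitous_arp and on_membership_change are hypothetical names,
and send is an assumed callback that puts a raw Ethernet frame on the
wire. The frame is a standard gratuitous ARP request, in which the
sender and target IP are both the address being announced.

```python
import socket
import struct


def build_gratuitous_arp(mac: bytes, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for ip."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP)
    eth_header = broadcast + mac + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                      # hardware type: Ethernet
        0x0800,                 # protocol type: IPv4
        6, 4,                   # hardware / protocol address lengths
        1,                      # opcode: request
        mac, ip_bytes,          # sender MAC and IP
        b"\x00" * 6, ip_bytes,  # target MAC unset, target IP == sender IP
    )
    return eth_header + arp


def on_membership_change(held_ips, mac, send):
    """Hypothetical hook: re-announce every address this node holds."""
    for ip in held_ips:
        send(build_gratuitous_arp(mac, ip))
```

The send callback is left abstract so the sketch stays portable. On
Linux it could be, for example, the send method of a raw AF_PACKET
socket bound to the interface, which needs root privileges.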

// Theo Schlossnagle
// Principal@OmniTI: http://omniti.com
// Esoteric Curio: http://www.lethargy.org/~jesus/


_______________________________________________
wackamole-users mailing list
wackamole-users@lists.backhand.org
http://lists.backhand.org/mailman/listinfo/wackamole-users