Mailing List Archive

hello, i have a question about heartbeat, can somebody give me a reply? thank you
When the master goes down, can the backup take over as master in less than a second?

The following is a reply I received about keepalived's switching time. Can Heartbeat do better?

No, it can't be shorter than 1 second. This is a VRRP protocol limitation. Most enterprise-class VRRP implementations use the BFD protocol to achieve sub-second fault detection. I created a proof-of-concept BFD subsystem for keepalived some time ago: https://github.com/ivoronin/keepalived/tree/bfd . Unfortunately, it is not well tested and not suitable for production use.
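
For context, here is a rough sketch of how VRRP timers bound the failover time. It assumes VRRPv2 timers as defined in RFC 3768 (the thread does not say which VRRP version is in use), where the advertisement interval is a whole number of seconds and a backup waits Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time before taking over. The function names are illustrative only and are not part of keepalived.

```python
def skew_time(priority: int) -> float:
    """Skew_Time in seconds per RFC 3768: (256 - Priority) / 256."""
    return (256 - priority) / 256


def master_down_interval(advert_interval_s: int, priority: int) -> float:
    """Seconds a backup waits without advertisements before taking over.

    advert_interval_s is an integer because VRRPv2 expresses the
    advertisement interval in whole seconds (minimum 1).
    """
    return 3 * advert_interval_s + skew_time(priority)


if __name__ == "__main__":
    # Even with the shortest VRRPv2 advertisement interval (1 s), the backup
    # waits a little over 3 s, whatever its priority.
    for priority in (100, 200, 254):
        print(priority, round(master_down_interval(1, priority), 3))
```

Under these assumptions the detection delay alone is already above one second, before any takeover work begins.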

-------------------------------------------------------------------------------------------------------------------------------------
Re: hello, i have a question about heartbeat, can somebody give me a reply? thank you
You may prefer to use the clusterlabs mailing list. This list is being
phased out.

On 16/10/15 05:20 AM, Shilu wrote:
> When the master goes down, can the backup take over as master in less
> than a second?

Not safely, no.

In HA, if a node is declared dead, it needs to be fenced/stonith'ed
before its services are recovered. Not doing this can lead to a
split-brain. The process of fencing a node takes time; exactly how much
depends on the device or method you are using. IPMI fencing, one of the
most common types, takes a few seconds.
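
The ordering described above can be sketched as follows. This is a minimal illustration, not Pacemaker's actual logic: `handle_node_loss`, `fence` and `recover` are hypothetical names, and the callbacks stand in for whatever fence device and resource manager are in use.

```python
from typing import Callable


def handle_node_loss(node: str,
                     fence: Callable[[str], bool],
                     recover: Callable[[str], None]) -> bool:
    """Recover the lost node's services only after fencing is confirmed.

    fence(node) must return True only when the node is known to be off.
    Returns True if recovery ran, False if it was withheld.
    """
    if not fence(node):
        # Fencing failed or could not be confirmed: the peer may still be
        # running, so recovering now would risk a split-brain.
        return False
    recover(node)
    return True


if __name__ == "__main__":
    recovered = []
    # Stub fence device that always succeeds (assumption for the demo).
    ok = handle_node_loss("node2", lambda n: True, recovered.append)
    print(ok, recovered)
```

The total failover time is therefore at least the detection time plus the fencing time, which is why sub-second recovery is hard to do safely.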

Also, if you shorten the time it takes to declare a node dead, you
increase the chance of having a node declared dead when it's not.
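
A small simulation can illustrate this trade-off. It is only a sketch under an invented delay model (heartbeats nominally every 100 ms plus exponentially distributed jitter with a 50 ms mean); the numbers are assumptions and do not come from any real cluster.

```python
import random


def false_declarations(timeout_s: float, n_beats: int = 10_000,
                       seed: int = 42) -> int:
    """Count heartbeat gaps exceeding timeout_s while the peer is healthy.

    Each such gap would cause a live node to be wrongly declared dead.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    count = 0
    for _ in range(n_beats):
        # Hypothetical model: 100 ms interval plus jitter with mean 50 ms.
        gap = 0.1 + rng.expovariate(1 / 0.05)
        if gap > timeout_s:
            count += 1
    return count


if __name__ == "__main__":
    for timeout in (0.15, 0.2, 0.5, 1.0):
        print(timeout, false_declarations(timeout))
```

Because every run uses the same simulated gaps, a shorter timeout can never produce fewer false declarations than a longer one in this model.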

> The following is a reply I received about keepalived's switching time.
> Can Heartbeat do better?

Heartbeat is long deprecated. The modern stack is Corosync + Pacemaker.

Here is why:

https://alteeve.ca/w/History_of_HA_Clustering

> No, it can't be shorter than 1 second. This is a VRRP protocol
> limitation. Most enterprise-class VRRP implementations use the BFD
> protocol to achieve sub-second fault detection. I created a
> proof-of-concept BFD subsystem for keepalived some time ago:
> https://github.com/ivoronin/keepalived/tree/bfd . Unfortunately, it is
> not well tested and not suitable for production use.

I've never been a big fan of keepalived because it does not fence. It
assumes that the peer is dead, and when people test it, they kill the
node, so in those cases the assumption is safe. In the real world,
though, losing access to a node is no guarantee that it has actually
failed. So people think they're safe, until they're not.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/