Mailing List Archive

Antw: Re: heartbeat failover
We have a server with a "network traffic light" on the front. With
corosync/pacemaker the light is constantly flickering, even if the cluster
"does nothing".
So I guess it's normal. If your firewall has a problem with that, I can tell
you that traffic will increase if your configuration grows and if the cluster
actually does something. I don't know if "communicate like mad" was a design
concept, but on our HP-UX Service Guard cluster there was only the heartbeat
traffic (configured to be one packet every 7 seconds (for good luck reasons
;-)) when the cluster was idle.
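
For reference, the pace of corosync's idle traffic is mostly governed by the
totem timers in corosync.conf. The snippet below is only an illustrative
sketch (the option names are real corosync totem settings, but the values are
examples, not tuning advice for any particular cluster):

---
totem {
        version: 2
        # how long (in ms) to wait for the rotating token before the
        # ring is declared failed; larger values tolerate slower or
        # busier networks at the cost of slower failure detection
        token: 5000
        # how many token retransmits are attempted before the token
        # is considered lost
        token_retransmits_before_loss_const: 10
        # timeouts (in ms) for the membership join and consensus rounds
        join: 60
        consensus: 6000
}
---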

I noticed that with cLVM (part of the "log like mad" family) doing mirroring,
the cluster communication frequently breaks down. When you read about the TOTEM
design goals, you might conclude that either the implementation is broken, or
the configuration you use is. Maybe the system is just too complex.
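
One generic way to watch the ring state while such a resync is running (this
is a suggestion, not something from the original report) is corosync's ring
status query:

---
# print the status of each configured ring on the local node; run it
# repeatedly (e.g. under "watch") while the mirror resync is in progress
corosync-cfgtool -s
---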

Here are two examples: first the "log like mad" output, then the corosync
communication problems:

---
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:09 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffd1d4, len=31, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:09 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:09 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x5000000 XID 1 on node 230314ac
Jan 23 13:51:09 o1 lvm[23717]: Syncing device names
Jan 23 13:51:09 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffd524, len=31, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x5000000 XID 6 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 29
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffd874, len=29, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command LOCK_VG (0x33) for
clientid 0x5000000 XID 8 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: do_lock_vg: resource 'P_#global', cmd = 0x4
LCK_VG (WRITE|VG), flags = 0x4 ( DMEVENTD_MONITOR ), critical_section = 0
Jan 23 13:51:10 o1 lvm[23717]: Refreshing context
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffdbc4, len=31, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x5000000 XID 10 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffdf14, len=31, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x5000000 XID 13 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 29
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffe264, len=29, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command LOCK_VG (0x33) for
clientid 0x5000000 XID 15 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: do_lock_vg: resource 'P_#global', cmd = 0x6
LCK_VG (UNLOCK|VG), flags = 0x4 ( DMEVENTD_MONITOR ), critical_section = 0
Jan 23 13:51:10 o1 lvm[23717]: Refreshing context
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 84
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffe5b4, len=84, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command LOCK_QUERY (0x34) for
clientid 0x5000000 XID 17 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: do_lock_query: resource
'o1ZL06tdiALQl2gL2U58uJu6euJzwvz0Wn6yIpFEBd7mFhTuoxQ2905C35cRf7c2', mode 1
(CR)
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 21
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 21
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 21
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffe94c, len=31, csid=0x7ffff40294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x5000000 XID 19 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
---

Obviously the cLVM mirroring causes the corosync problems:

---
Jan 23 13:51:14 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:15 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:18 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:51:19 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:51:22 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:23 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:26 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:51:27 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:51:27 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:51:30 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:31 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:31 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:37 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:38 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:44 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:45 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:45 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:45 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:51 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:52 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:51:57 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:51:58 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:01 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:52:02 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:52:13 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:52:14 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:18 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:52:19 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:19 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:23 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:52:24 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:52:28 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:52:29 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:32 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:52:33 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:52:37 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:52:38 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:41 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:52:42 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:52:42 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:52:45 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:52:46 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:49 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:52:50 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:52:53 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:52:54 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:52:54 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:02 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:03 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:03 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:06 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:53:07 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:10 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:11 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:14 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:15 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:18 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:53:19 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:19 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:19 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:22 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:23 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:23 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:26 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:53:27 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:27 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:27 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:30 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:31 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:31 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:31 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:35 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:36 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:36 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:36 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:39 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:53:40 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:40 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:53:43 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 13:53:44 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:44 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:44 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 13:53:56 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 13:53:57 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 13:54:05 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
[...]
Jan 23 15:31:00 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 15:31:03 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 15:31:04 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 15:31:04 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 15:31:04 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 15:31:07 o1 corosync[13822]: [TOTEM ] Marking ringid 0 interface
172.20.3.31 FAULTY
Jan 23 15:31:08 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 15:31:08 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 15:31:08 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 0
Jan 23 15:31:12 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 15:31:13 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
Jan 23 15:31:17 o1 corosync[13822]: [TOTEM ] Marking ringid 1 interface
192.168.0.61 FAULTY
Jan 23 15:31:18 o1 corosync[13822]: [TOTEM ] Automatically recovered ring 1
(now mirroring was obviously finished)
---
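
(For context, the two rings being marked FAULTY above belong to a redundant
ring protocol (RRP) setup. A minimal sketch of such a configuration follows;
the bindnetaddr values are merely inferred from the interface addresses in the
log, and the threshold value is illustrative, not a recommended fix.)

---
totem {
        version: 2
        # "passive" alternates packets across the rings instead of
        # sending every packet on both rings ("active")
        rrp_mode: passive
        # number of detected problems before a ring is marked FAULTY
        rrp_problem_count_threshold: 10

        interface {
                ringnumber: 0
                bindnetaddr: 172.20.3.0
        }
        interface {
                ringnumber: 1
                bindnetaddr: 192.168.0.0
        }
}
---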

Regards,
Ulrich

>>> <Bjoern.Becker@easycash.de> wrote on 23.01.2014 at 17:45 in message
<E1F012C748BE4A4EB59827C3F1786CC92194F399@v-msx-5-prd>:
> Uhhh... I now have the same configuration as the example config you sent me.
> But it causes high CPU load on our Cisco ASA firewall...
>
> I guess this traffic is not normal?
>
> root@node01:/etc/corosync# tcpdump dst port 5405
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
> 17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
> 17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
> 17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
> 17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
> 17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
> 17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82
>
> Best regards
> Björn
>
>
> -----Original Message-----
> From: linux-ha-bounces@lists.linux-ha.org
> [mailto:linux-ha-bounces@lists.linux-ha.org] On behalf of Becker, Björn
> Sent: Thursday, 23 January 2014 17:28
> To: linux-ha@lists.linux-ha.org
> Subject: Re: [Linux-HA] heartbeat failover
>
> Hi Lukas,
>
> thank you. Well, I have to wait for some firewall changes for UDP port 5405.
>
> But I'm not sure whether what I'm doing is correct.
>
> Node1:
>     interface {
>         member {
>             memberaddr: 10.128.61.60 # node 1
>         }
>         member {
>             memberaddr: 10.128.62.60 # node 2
>         }
>         # The following values need to be set based on your environment
>         ringnumber: 0
>         bindnetaddr: 10.128.61.0
>         mcastport: 5405
>     }
>     transport: udpu
>
> Node2:
>     interface {
>         member {
>             memberaddr: 10.128.61.60
>         }
>         member {
>             memberaddr: 10.128.62.60
>         }
>         # The following values need to be set based on your environment
>         ringnumber: 0
>         bindnetaddr: 10.128.62.0
>         mcastport: 5405
>     }
>     transport: udpu
>
> Something is definitely wrong. My firewall was under very high load...
>
>
> Best regards
> Björn
>
>
> -----Original Message-----
> From: linux-ha-bounces@lists.linux-ha.org
> [mailto:linux-ha-bounces@lists.linux-ha.org] On behalf of Lukas Grossar
> Sent: Thursday, 23 January 2014 16:54
> To: linux-ha@lists.linux-ha.org
> Subject: Re: [Linux-HA] heartbeat failover
>
> Hi Björn
>
> Here is an example of how you can set up corosync to use unicast UDP:
> https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu
>
> The important parts are "transport: udpu" and that you need to configure
> every member manually using "memberaddr: 10.16.35.115".
>
> Best regards
> Lukas
>
>
> On Thu, 23 Jan 2014 13:36:22 +0000
> <Bjoern.Becker@easycash.de> wrote:
>
>> Hello,
>>
>> thanks a lot! I didn't know that heartbeat is almost deprecated.
>> I'll try corosync and pacemaker, but I read that corosync needs to run
>> over multicast. Unfortunately, I can't use multicast in my network.
>> Do you know of any other possibility? I can't find anything saying that
>> corosync can run without multicast.
>>
>>
>> Best regards
>> Björn
>>
>> -----Original Message-----
>>
>> From: linux-ha-bounces@lists.linux-ha.org
>> [mailto:linux-ha-bounces@lists.linux-ha.org] On behalf of Digimer
>> Sent: Wednesday, 22 January 2014 20:36
>> To: General Linux-HA mailing list
>> Subject: Re: [Linux-HA] heartbeat failover
>>
>> On 22/01/14 10:44 AM, Bjoern.Becker@easycash.de wrote:
>> > Hello,
>> >
>> > I got a drbd+nfs+heartbeat setup and in general it's working. But it
>> > takes too long to fail over, and I am trying to tune this.
>> >
>> > When node 1 is active and I shut down node 2, node 1 tries to
>> > activate the cluster. The problem is that node 1 already has the primary
>> > role, and re-activating takes time again; during this the
>> > NFS share isn't available.
>> >
>> > Is it possible to disable this? Node 1 doesn't have to do anything if
>> > it's already in the primary role and the second node is not available.
>> >
>> > Best regards, Björn
>>
>> If this is a new project, I strongly recommend switching out heartbeat
>> for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed
>> in a long time and there are no plans to restart development in the
>> future. Everything (even RH) is standardizing on the
>> corosync+pacemaker stack, so it has the most vibrant community as
>> well.
>>
>
>
>
> --
> Adfinis SyGroup AG
> Lukas Grossar, System Engineer
>
> Keltenstrasse 98 | CH-3018 Bern
> Tel. 031 550 31 11 | Direkt 031 550 31 06


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: Antw: Re: heartbeat failover
On 2014-01-24T08:16:03, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:

> We have a server with a "network traffic light" on the front. With
> corosync/pacemaker the light is constantly flickering, even if the cluster
> "does nothing".
> So I guess it's normal.

Yes. Totem and other components have on-going health checks, so there's
constant background traffic. But nothing major.

> you that traffic will increase if your configuration grows and if the cluster
> actually does something. I don't know if "communicate like mad" was a design
> concept, but on our HP-UX Service Guard cluster there was only the heartbeat
> traffic (configured to be one packet every 7 seconds (for good luck reasons
> ;-)) when the cluster was idle.

HP SG runs an entirely different protocol and is an entirely different
architecture.

> Obviously the cLVM mirroring causes the corosync problems:

I've never, ever observed this during testing and my continuously
running cLVM RAID setup. Looks like an overload during resync, though.
You need faster NICs ;-)


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: Antw: Re: heartbeat failover
>>> Lars Marowsky-Bree <lmb@suse.com> wrote on 24.01.2014 at 14:24 in message
<20140124132433.GH18969@suse.de>:
> On 2014-01-24T08:16:03, Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de> wrote:
>
>> We have a server with a "network traffic light" on the front. With
>> corosync/pacemaker the light is constantly flickering, even if the cluster
>> "does nothing".
>> So I guess it's normal.
>
> Yes. Totem and other components have on-going health checks, so there's
> constant background traffic. But nothing major.
>
>> you that traffic will increase if your configuration grows and if the cluster
>> actually does something. I don't know if "communicate like mad" was a design
>> concept, but on our HP-UX Service Guard cluster there was only the heartbeat
>> traffic (configured to be one packet every 7 seconds (for good luck reasons
>> ;-)) when the cluster was idle.
>
> HP SG runs an entirely different protocol and is an entirely different
> architecture.
>
>> Obviously the cLVM mirroring causes the corosync problems:
>
> I've never, ever observed this during testing and my continuously
> running cLVM RAID setup. Looks like an overload during resync, though.
> You need faster NICs ;-)

What we do (in the test environment) is mirror a 300GB cLVM LV over iSCSI to
two distinct disk arrays. iSCSI uses dedicated 1Gb NICs in a 2x2x2x2 redundant
way, and the cluster communicates over different 1Gb NICs.

On the production system we have 4Gb and 8Gb FC links to the disk arrays, and
there, too, the cluster communicates over dedicated NICs.

In my experience, if the mirrored LV is small (1GB or so) you don't notice the
problems; you need a big LV that is mirrored while other things are going
on...
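
(For readers unfamiliar with the setup: a mirrored LV like the one described
above is created roughly as in the sketch below. The VG and LV names are
hypothetical, and the command assumes the VG was already created as a
clustered VG.)

---
# create a 300GB LV with one mirror copy (-m 1) in a clustered VG;
# LVM chooses PVs for the two legs and the mirror log, and placement
# can be constrained by listing specific PVs after the VG name
lvcreate --type mirror -m 1 -L 300G -n lv_big vg_cluster
---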

>
>
> Regards,
> Lars
>
> --
> Architect Storage/HA
> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>


_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems