Mailing List Archive

SBD flipping between Pacemaker: UNHEALTHY and OK
Has anyone seen this? Do you know what might be causing the flapping?

Apr 21 22:03:03 qaxen6 sbd: [12962]: info: Watchdog enabled.
Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Servant starting for device /dev/mapper/qa-xen-sbd
Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Monitoring Pacemaker health
Apr 21 22:03:03 qaxen6 sbd: [12973]: info: Device /dev/mapper/qa-xen-sbd uuid: ae835596-3d26-4681-ba40-206b4d51149b
Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Legacy plug-in detected, AIS quorum check enabled
Apr 21 22:03:03 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...
Apr 21 22:03:04 qaxen6 sbd: [12971]: notice: Using watchdog device: /dev/watchdog
Apr 21 22:03:04 qaxen6 sbd: [12971]: info: Set watchdog timeout to 45 seconds.
Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with cluster ...
Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 21 22:26:44 qaxen6 sbd: [12974]: info: Node state: online
Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 21 22:39:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 21 22:42:44 qaxen6 sbd: [12974]: info: Node state: online
Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 01:36:24 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 01:36:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 01:36:34 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 01:36:34 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 06:53:15 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 06:53:15 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 06:54:03 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 06:54:03 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 09:57:21 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 09:57:21 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 09:58:12 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 09:58:12 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 10:59:49 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 10:59:49 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 11:00:41 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 11:00:41 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 11:50:55 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 11:50:55 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 11:51:06 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 11:51:06 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 13:09:12 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 13:09:12 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 13:09:35 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 13:09:35 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 13:31:35 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 13:31:35 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 13:31:44 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 13:31:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 13:32:52 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 13:32:52 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 13:33:01 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 13:33:01 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 13:44:39 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 13:44:39 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 13:44:47 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 13:44:47 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 22 14:07:42 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 22 14:07:42 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 22 14:07:51 qaxen6 sbd: [12974]: info: Node state: online
Apr 22 14:07:51 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
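
To see how long each UNHEALTHY period lasts, the log above can be summarized with a small script. This is only a sketch: the function name unhealthy_windows and the year 2014 are assumptions (syslog timestamps carry no year), and the regular expression is written against the exact line layout shown in this post.

```python
import re
from datetime import datetime

# Matches the health verdict lines exactly as they appear in the log above, e.g.
# "Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY"
HEALTH_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ sbd: \[\d+\]: \w+: "
    r"Pacemaker health check: (OK|UNHEALTHY)"
)


def unhealthy_windows(lines, year=2014):
    """Return (start, seconds) for each UNHEALTHY -> OK transition.

    The year is an assumption: syslog timestamps do not include one.
    """
    windows = []
    started = None
    for line in lines:
        m = HEALTH_RE.match(line)
        if not m:
            continue  # ignore node state, quorum and startup messages
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        if m.group(2) == "UNHEALTHY":
            if started is None:
                started = stamp
        elif started is not None:
            windows.append((started, int((stamp - started).total_seconds())))
            started = None
    return windows


# Four verdict lines and one unrelated line copied from the log above.
sample = [
    "Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY",
    "Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online",
    "Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK",
    "Apr 21 22:39:24 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY",
    "Apr 21 22:42:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK",
]

for start, seconds in unhealthy_windows(sample):
    print(start, seconds)
```

For the sample lines this reports windows of 96 and 200 seconds. Feeding it the full syslog (one line per list entry) would show whether the outages cluster around a particular duration.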

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: SBD flipping between Pacemaker: UNHEALTHY and OK
You have not included the cluster configuration, the sbd configuration, or
the multipath config.


2014-04-22 20:21 GMT+02:00 Tom Parker <tparker@cbnco.com>:

> Has anyone seen this? Do you know what might be causing the flapping?



--
this is my life and I live it for as long as God wills
Re: SBD flipping between Pacemaker: UNHEALTHY and OK
I have attached the config files to this e-mail. The sbd dump is below

[LIVE] qaxen1:~ # sbd -d /dev/mapper/qa-xen-sbd dump
==Dumping header on disk /dev/mapper/qa-xen-sbd
Header version : 2.1
UUID : ae835596-3d26-4681-ba40-206b4d51149b
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 45
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 90
==Header on disk /dev/mapper/qa-xen-sbd is dumped
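
As a quick sanity check on the header above, the dump can be parsed and the two main timeouts compared. This is a sketch under assumptions: the function names are made up for illustration, and the rule it tests (msgwait at least twice the watchdog timeout) is the commonly cited sizing guideline, not something stated in this thread.

```python
# Dump text copied from the sbd output above.
DUMP = """\
==Dumping header on disk /dev/mapper/qa-xen-sbd
Header version : 2.1
UUID : ae835596-3d26-4681-ba40-206b4d51149b
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 45
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 90
==Header on disk /dev/mapper/qa-xen-sbd is dumped
"""


def parse_sbd_dump(text):
    """Turn 'Key : value' lines into a dict, skipping the '==' marker lines."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("==") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def msgwait_covers_watchdog(fields):
    """Assumed guideline: msgwait should be at least 2 x the watchdog timeout."""
    watchdog = int(fields["Timeout (watchdog)"])
    msgwait = int(fields["Timeout (msgwait)"])
    return msgwait >= 2 * watchdog


fields = parse_sbd_dump(DUMP)
print(msgwait_covers_watchdog(fields))
```

With the values shown (watchdog 45, msgwait 90) the check passes, which suggests the on-disk timeouts themselves are consistent with each other.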

On 22/04/14 02:30 PM, emmanuel segura wrote:
> you are missing cluster configuration and sbd configuration and multipath
> config
Re: SBD flipping between Pacemaker: UNHEALTHY and OK
First, you are using no_path_retry the wrong way in your multipath
configuration. Try reading this:
http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html


2014-04-22 20:41 GMT+02:00 Tom Parker <tparker@cbnco.com>:

> I have attached the config files to this e-mail. The sbd dump is below
Re: SBD flipping between Pacemaker: UNHEALTHY and OK
OK. I have changed that to no_path_retry fail, but I don't think this
has anything to do with the errors I am seeing.

They seem to be related to sbd's link with my cluster, not to disk I/O.

Tom

On 23/04/14 03:11 AM, emmanuel segura wrote:
> the first thing, you are using no_path_retry in wrong way in your
> multipath, try to read this
> http://www.novell.com/documentation/oes2/clus_admin_lx/data/bl9ykz6.html

Re: SBD flipping between Pacemaker: UNHEALTHY and OK
What do you mean by "link"?


2014-04-23 15:23 GMT+02:00 Tom Parker <tparker@cbnco.com>:

> ok. I have fixed that to be no_path_retry fail but I don't think this
> has anything to do with the errors I am seeing.
>
> They seem to be related to sbd's link with my cluster, not with disk I/O
>
> Tom



--
This is my life, and I will live it for as long as God wills
Re: SBD flipping between Pacemaker: UNHEALTHY and OK
SBD has a connection to Pacemaker to establish overall cluster health
(the -P flag). This seems to be where the problem is; I just don't
know what the problem might be.

On 23/04/14 11:32 AM, emmanuel segura wrote:
> what do you mean with link?
Re: SBD flipping between Pacemaker: UNHEALTHY and OK
On 2014-04-22T14:21:33, Tom Parker <tparker@cbnco.com> wrote:

Hi Tom,

> Has anyone seen this? Do you know what might be causing the flapping?

No, I've never seen this.

> Apr 21 22:03:04 qaxen6 sbd: [12974]: info: Waiting to sign in with
> cluster ...

So it connected fine. This is the process maintaining the Pacemaker
connection, so the other processes can be disregarded.

> Apr 21 22:03:06 qaxen6 sbd: [12974]: info: We don't have a DC right now.
> Apr 21 22:03:08 qaxen6 sbd: [12974]: WARN: Node state: UNKNOWN
> Apr 21 22:03:09 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:03:10 qaxen6 sbd: [12974]: WARN: Node state: pending
> Apr 21 22:03:11 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
> Apr 21 22:16:37 qaxen6 sbd: [12974]: info: Node state: online
> Apr 21 22:25:08 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!

Is this all that is happening here?

Judging from this, there should be signs of an unstable Pacemaker cluster
to go with it.

Are there any messages from crmd, corosync, etc.?
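
To make it easier to line these events up against crmd/corosync messages, it may help to first extract how long each UNHEALTHY period lasted. The following is only a minimal sketch, not part of sbd: it assumes unwrapped syslog lines in the format shown in the original post, the function name `unhealthy_periods` is hypothetical, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
HEALTH_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+sbd:.*"
    r"Pacemaker health check:\s+(?P<state>OK|UNHEALTHY)\s*$"
)


def unhealthy_periods(lines, year=2014):
    """Yield (start, end, seconds) for each UNHEALTHY -> OK interval."""
    start = None
    for line in lines:
        m = HEALTH_RE.match(line)
        if not m:
            continue  # not a health-check line (e.g. "AIS: Quorum outdated!")
        # Assumption: all lines fall in the given year.
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        if m.group("state") == "UNHEALTHY":
            if start is None:
                start = ts  # remember when the unhealthy period began
        elif start is not None:
            yield start, ts, int((ts - start).total_seconds())
            start = None


# Sample lines copied from the log in the original post.
SAMPLE = """\
Apr 21 22:03:09 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 21 22:15:01 qaxen6 sbd: [12974]: WARN: AIS: Quorum outdated!
Apr 21 22:15:01 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 21 22:16:37 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
Apr 21 22:25:08 qaxen6 sbd: [12971]: WARN: Pacemaker health check: UNHEALTHY
Apr 21 22:26:44 qaxen6 sbd: [12971]: info: Pacemaker health check: OK
""".splitlines()

if __name__ == "__main__":
    for start, end, secs in unhealthy_periods(SAMPLE):
        print(f"{start:%b %d %H:%M:%S} -> {end:%H:%M:%S}  unhealthy for {secs}s")
```

The start and end timestamps of each period give a window in which to look for membership or quorum changes in the corosync and crmd logs on the same node.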


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
