Hi All,
I'm using the pgsql resource agent (resource-agents-3.9.5-9) on Fedora 20,
and I'm testing various failure patterns in a replicated PostgreSQL cluster with it.
It looks like if the MASTER PostgreSQL process is suspended for a long time,
both the resource monitor and the demote operation time out, and the cluster cannot fail over until the process is resumed.
----- Cluster status after the master demotion timed out -----
Online: [ server1 server2 ]

 Master/Slave Set: msPostgresql [pgsql]
     pgsql (ocf::heartbeat:pgsql): FAILED server2
     Stopped: [ server1 ]
 Clone Set: ping-gw-rsc-clone [ping-gw-rsc]
     Started: [ server1 server2 ]

Node Attributes:
* Node server1:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : STREAMING|SYNC
    + pgsql-status      : STOP
    + ping-gw1          : 100
* Node server2:
    + master-pgsql      : -INFINITY
    + pgsql-data-status : LATEST
    + pgsql-status      : PRI
    + ping-gw1          : 100

Migration summary:
* Node server1:
* Node server2:
   pgsql: migration-threshold=1 fail-count=2 last-failure='Fri Apr 11 14:07:43 2014'

Failed actions:
    pgsql_demote_0 on server2 'unknown error' (1): call=77, status=Timed Out, last-rc-change='Fri Apr 11 14:06:43 2014', queued=1ms, exec=60001ms
-------------------------------------------------------
I think pgsql_real_stop() should send SIGKILL to PostgreSQL when the immediate shutdown (-m i) command times out.
What do you think about this idea?
Regards,
Naoya
---
Naoya Anzai
Engineering Department
NEC Solution Inovetors, Ltd.
E-Mail: anzai-naoya@mxu.nes.nec.co.jp
---
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems