Hi
I'm currently building a two-node, DRBD-backed PostgreSQL cluster on
Debian Wheezy and I'm testing how Pacemaker reacts to specific failure
scenarios.
One thing I tested is currently driving me crazy: when I manually stop
PostgreSQL through pg_ctl, or just kill the master process to simulate
a crash, the pgsql resource agent correctly detects the error and
restarts PostgreSQL.
The problem arises when I later call 'crm resource cleanup pgsql' to
delete the failcount and the failed actions: the pgsql resource then
shows up as Stopped, although in reality it is still running fine. I
have the same problem when I delete the failcount separately and then
do the cleanup.
The problem seems to be that pgsql_monitor runs into a timeout:
Feb 21 12:47:59 vm-db-01 crmd: [6494]: WARN: cib_action_update:
rsc_op 44: pgsql_monitor_30000 on vm-db-01 timed out
After the timeout pgsql is being restarted, and the interesting thing
is that I can delete the failed action from the timeout without a
problem.
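In case it helps anyone looking at their own logs, timed-out operations
like the one above can be pulled out of syslog with a small filter. This
is only a minimal sketch: timed_out_ops is a hypothetical helper name (not
part of Pacemaker), and it assumes the crmd message format shown in the
log line above, written on a single line.

```shell
# Hypothetical helper: print every "<rsc>_<op>_<interval> on <node> timed out"
# fragment found in the log text read from stdin.
# Assumption: crmd logs timeouts in the format quoted above.
timed_out_ops() {
    grep -E -o '[A-Za-z0-9_-]+_(monitor|start|stop)_[0-9]+ on [^ ]+ timed out'
}
```

It would be used as 'timed_out_ops < /var/log/syslog', assuming the cluster
daemons log to the default Debian syslog file.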
Does anyone have an idea what the problem could be in this case?
Best regards
Lukas
--
Adfinis SyGroup AG
Lukas Grossar, System Engineer
Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06