To all concerned,
Thank you for your continued support. My name is naka.
I apologize for sending information in bits and pieces.
Environment:
CentOS 6.2(x86_64)
pacemaker-1.0.12-1.el6.x86_64
Active node: hoge01v
Standby node: hoge02v
I am testing how the cluster behaves when a failure occurs.
In a two-node cluster configuration, when the active node is hung via SysRq,
I expect the cluster to fail over (F/O) to the standby node and the service resource (MySQL)
to start there.
When I actually hung the active node with
# echo c > /proc/sysrq-trigger
a MySQL instance was for some reason started twice on the standby node (hoge02v), leaving it in a
strange state. Later, after the OS on the original active node (hoge01v) had booted again, the cluster resources failed back (F/B) to hoge01v.
(crm_mon -Afr1)
Migration summary:
* Node hoge02v:
   service_sfdb01v: migration-threshold=1000000 fail-count=1000000
* Node hoge01v:

Failed actions:
    service_sfdb01v_monitor_300000 (node=hoge02v, call=24, rc=7, status=complete): not running
    service_sfdb01v_start_0 (node=hoge02v, call=27, rc=-2, status=Timed Out): unknown exec error
"service_sfdb01v" is the resource for the MySQL service.
(Configuration)
# crm configure save current.crm
# cat current.crm
node $id="1fc381d6-d6ad-a50f-9aab-cd8ace90fa70" hoge01v
node $id="4a851515-443f-6140-b38f-dfb4bb46c010" hoge02v
primitive ip_sfdb01v ocf:heartbeat:IPaddr2 \
    meta migration-threshold="5" \
    params ip="10.2.28.62" cidr_netmask="24" nic="eth0" iflabel="0" \
    op monitor interval="3s"
primitive res_ping ocf:pacemaker:ping \
    params name="eth0_ping_set" host_list="10.2.28.1" multiplier="200" dampen="1" debug="true" attempts="10" \
    op monitor interval="10s" timeout="60" \
    op start interval="0" timeout="60"
primitive service_naka01v lsb:pkg_naka01v \
    op start interval="0s" timeout="90s" \
    op monitor interval="300s" timeout="20s" \
    op stop interval="0s" timeout="100s" \
    meta is-managed="true"
primitive service_sfdb01v lsb:pkg_sfdb01v \
    op start interval="0s" timeout="90s" \
    op monitor interval="300s" enabled="true" timeout="20s" \
    op stop interval="0s" timeout="100s" \
    meta is-managed="true"
primitive vgsfdb01v ocf:heartbeat:LVM \
    params volgrpname="vgsfdb01v"
primitive vgsfdb01v_LogVol00 ocf:heartbeat:Filesystem \
    meta migration-threshold="5" \
    params device="/dev/vgsfdb01v/LogVol00" fstype="ext4" directory="/mysf" \
    op monitor interval="20s"
primitive vgsfdb01v_lv_quorum ocf:heartbeat:sfex \
    params index="1" device="/dev/vgsfdb01v/lv_quorum"
group pkg_naka01v service_naka01v \
    meta is-managed="true" target-role="Started"
group pkg_sfdb01v ip_sfdb01v vgsfdb01v vgsfdb01v_lv_quorum vgsfdb01v_LogVol00 service_sfdb01v \
    meta is-managed="true" target-role="Started"
clone clone_ping res_ping \
    meta target-role="Started"
location pkg_naka01v-location pkg_naka01v \
    rule $id="pkg_naka01v-location-0" 200: #uname eq hoge01v \
    rule $id="pkg_naka01v-location-1" 100: #uname eq hoge02v
location pkg_naka01v-service-location pkg_naka01v \
    rule $id="pkg_naka01v-service-location-rule" -inf: defined eth0_ping_set and eth0_ping_set lt 200
location pkg_sfdb01v-location pkg_sfdb01v \
    rule $id="pkg_sfdb01v-location-0" 200: #uname eq hoge01v \
    rule $id="pkg_sfdb01v-location-1" 100: #uname eq hoge02v
location pkg_sfdb01v-service-location pkg_sfdb01v \
    rule $id="pkg_sfdb01v-service-location-rule" -inf: defined eth0_ping_set and eth0_ping_set lt 200
property $id="cib-bootstrap-options" \
    dc-version="1.0.12-066152e" \
    cluster-infrastructure="Heartbeat" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    default-action-timeout="120s" \
    last-lrm-refresh="1450681148" \
    maintenance-mode="false"
rsc_defaults $id="rsc-options" \
    resource-stickiness="INFINITY"
I do not yet understand exactly how failover (F/O) is triggered when a failure occurs, and I am
still investigating on my side, but if you know of any areas that look suspect, I would be grateful
for your advice. I have also not yet checked whether the problem lies not in Pacemaker itself but in
our MySQL start/stop script or its error handling.
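As one possible starting point for checking the start/stop script, the sketch below exercises the three behaviours that the LSB init-script convention requires and that Pacemaker relies on for lsb: resources: "status" returns 3 when the service is stopped, "start" on an already running service returns 0, and "stop" on an already stopped service returns 0. The function names lsb_check and expect_rc are hypothetical helpers, and the path /etc/init.d/pkg_sfdb01v is an assumption based on the lsb:pkg_sfdb01v resource definition. The script really starts and stops the service, so it should only be run on a test node while the cluster is not managing the resource.

```shell
#!/bin/sh
# Hypothetical helper: run a command and compare its exit code
# with the expected value given as the first argument.
expect_rc() {
    want="$1"; shift
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -ne "$want" ]; then
        echo "FAIL: $* returned $rc, expected $want"
        return 1
    fi
    echo "ok:   $* returned $rc"
}

# Hypothetical helper: walk an init script through the LSB
# start/status/stop sequence and report any unexpected exit code.
lsb_check() {
    script="$1"
    fail=0
    "$script" stop >/dev/null 2>&1 || true    # begin from a stopped state
    expect_rc 3 "$script" status || fail=1    # stopped: status must return 3
    expect_rc 0 "$script" start  || fail=1
    expect_rc 0 "$script" status || fail=1    # running: status must return 0
    expect_rc 0 "$script" start  || fail=1    # start while running must succeed
    expect_rc 0 "$script" stop   || fail=1
    expect_rc 3 "$script" status || fail=1
    expect_rc 0 "$script" stop   || fail=1    # stop while stopped must succeed
    return "$fail"
}

# Example call (path is an assumption):
#   lsb_check /etc/init.d/pkg_sfdb01v
```

Exit codes alone cannot show whether the second "start" launched another mysqld process, so it may also be worth comparing the process list before and after that step.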
I apologize for sending this email while the situation is still so unclear, and thank you in advance for any help.
Best regards,
--
Nakamura