Hello everyone on the ML,
My name is Kozumi.
We have experienced a split-brain incident.
I would appreciate your advice on how to deal with it.
■ Incident
1/7 18:12 A high CPU load occurred on the active node (host1).
1/7 18:12:51 The standby node (host2) promoted itself to primary, resulting in split brain.
■ Questions
1) Understanding of the incident
Is it correct to understand that, because of the high load on the active node,
the UDP heartbeat communication with the standby node went unanswered, so the
standby node promoted itself to active and split brain occurred?
If there is another interpretation, I would be grateful if you could point it out.
2) Parameter tuning
I would like to extend the time the standby node waits before promoting itself
to active, even when the active node is under high load.
In this incident the load on the active node came back down after about 3 minutes,
so for this environment I am considering a timeout of about 5 minutes.
From the manual below, it looks as though the default-action-timeout parameter
could achieve this. Is that understanding correct?
If any other parameters seem relevant, I would appreciate it if you could point them out.
https://access.redhat.com/documentation/ja-JP/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/ch-clusteropts-HAAR.html
■ Environment
OS: CentOS6.5
corosync-1.4.1-17
pacemaker-1.1.10
■ Log excerpt from the standby node (node 2) at the time of the incident
Jan 07 18:12:43 [9102] <host2> cib: info: crm_client_new:
Connecting 0x1007450 for uid=0 gid=0 pid=28534
id=bbc03f9e-ca10-4332-bca7-273466553815
Jan 07 18:12:43 [9102] <host2> cib: info: cib_process_request:
Completed cib_query operation for section nodes: OK (rc=0,
origin=local/crm_attribute/2, version=0.508.3)
Jan 07 18:12:43 [9102] <host2> cib: info: cib_process_request:
Completed cib_query operation for section
//cib/configuration/nodes//node[@id='<host2>']//instance_attributes//nvpair[@name='pgsql-nirc-data-status']: OK (rc=0, origin=local/crm_attribute/3, version=0.508.3)
Jan 07 18:12:43 [9102] <host2> cib: info: crm_client_destroy:
Destroying 0 events
Jan 07 18:12:51 [9102] <host2> cib: notice:
plugin_handle_membership: Membership 8880: quorum lost
Jan 07 18:12:51 [9102] <host2> cib: info: crm_update_peer_proc:
plugin_handle_membership: Node <host1>[1191880896] - unknown is now lost
Jan 07 18:12:51 [9102] <host2> cib: notice: crm_update_peer_state:
plugin_handle_membership: Node <host1>[1191880896] - state is now lost
(was member)
Jan 07 18:12:51 [9107] <host2> crmd: notice:
plugin_handle_membership: Membership 8880: quorum lost
Jan 07 18:12:51 [9107] <host2> crmd: info: crm_update_peer_proc:
plugin_handle_membership: Node <host1>[1191880896] - unknown is now lost
Jan 07 18:12:51 [9107] <host2> crmd: info: peer_update_callback:
Client <host1>/peer now has status [offline] (DC=<host1>)
Jan 07 18:12:51 [9107] <host2> crmd: notice: crm_update_peer_state:
plugin_handle_membership: Node <host1>[1191880896] - state is now lost
(was member)
Jan 07 18:12:51 [9107] <host2> crmd: info: peer_update_callback:
<host1> is now lost (was member)
Jan 07 18:12:51 [9107] <host2> crmd: warning: reap_dead_nodes: Our
DC node (<host1>) left the cluster
Jan 07 18:12:51 [9107] <host2> crmd: notice: do_state_transition:
State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION
cause=C_FSA_INTERNAL origin=reap_dead_nodes ]
Jan 07 18:12:51 [9107] <host2> crmd: info: update_dc:
Unset DC. Was <host1>
Jan 07 18:12:51 [9107] <host2> crmd: info: do_log: FSA: Input
I_ELECTION_DC from do_election_check() received in state S_ELECTION
Jan 07 18:12:51 [9107] <host2> crmd: notice: do_state_transition:
State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC
cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 07 18:12:51 [9107] <host2> crmd: info: do_te_control:
Registering TE UUID: f6fe7fd1-0859-4bb3-b98a-574f5be7b3f9
Jan 07 18:12:51 [9107] <host2> crmd: info: set_graph_functions:
Setting custom graph functions
Jan 07 18:12:51 [9106] <host2> pengine: info: crm_client_new:
Connecting 0xb3b490 for uid=189 gid=0 pid=9107
id=381681d3-4fc2-42a9-97f7-286c014297dd
Jan 07 18:12:51 [9107] <host2> crmd: info: do_dc_takeover:
Taking over DC status for this partition
Jan 07 18:12:51 [9102] <host2> cib: info: cib_process_readwrite:
We are now in R/W mode
Jan 07 18:12:51 [9102] <host2> cib: info: cib_process_request:
Completed cib_master operation for section 'all': OK (rc=0,
origin=local/crmd/13, version=0.508.3)
Jan 07 18:12:51 [9102] <host2> cib: info: cib_process_request:
Completed cib_modify operation for section cib: OK (rc=0,
origin=local/crmd/14, version=0.508.3)
Jan 07 18:12:51 [9102] <host2> cib: info: cib_process_request:
Completed cib_query operation for section
//cib/configuration/crm_config//cluster_property_set//nvpair[@name='dc-version']:
OK (rc=0, origin=local/crmd/15, version=0.508.3)
Jan 07 18:12:51 [9102] <host2> cib: info: cib_process_request:
Completed cib_modify operation for section crm_config: OK (rc=0,
origin=local/crmd/16, version=0.508.3)
Jan 07 18:12:51 [9102] <host2> cib: info: cib_process_request:
Completed cib_query operation for section
//cib/configuration/crm_config//cluster_property_set//nvpair[@name='cluster-infrastructure']:
OK (rc=0, origin=local/crmd/17, version=0.508.3)
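To find the moment the peer was declared lost in a longer log, a small script along the following lines might help. This is only a sketch: the regular expression and the function name find_membership_events are hypothetical, and it assumes each log entry is on a single unwrapped line in the format shown in the excerpt above (the excerpt itself was wrapped by the mail client).

```python
import re

# Assumed layout of one unwrapped log line, inferred from the excerpt above:
#   "<Mon DD HH:MM:SS> [<pid>] <host> <daemon>: <level>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \[(?P<pid>\d+)\] (?P<host>\S+)\s+"
    r"(?P<daemon>\w+):\s+(?P<level>\w+):\s+(?P<msg>.*)$"
)


def find_membership_events(lines):
    """Return (timestamp, daemon, message) for lines reporting lost quorum or a lost peer."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        msg = m.group("msg")
        if "quorum lost" in msg or "state is now lost" in msg:
            events.append((m.group("ts"), m.group("daemon"), msg))
    return events


# Sample lines taken from the excerpt above, rejoined onto single lines.
sample = [
    "Jan 07 18:12:43 [9102] <host2> cib: info: crm_client_destroy: Destroying 0 events",
    "Jan 07 18:12:51 [9102] <host2> cib: notice: plugin_handle_membership: Membership 8880: quorum lost",
    "Jan 07 18:12:51 [9107] <host2> crmd: notice: crm_update_peer_state: "
    "plugin_handle_membership: Node <host1>[1191880896] - state is now lost (was member)",
]

for ts, daemon, msg in find_membership_events(sample):
    print(ts, daemon, msg)
```

Running the same function over the log of the active node for the same period should show whether host1 also declared host2 lost at around 18:12:51.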
--
‐///―――――――――――――――――――――///‐
Kozumi Koichi (古積 広一)
E-Mail: kozumi@repica.co.jp
―――――――――――――――――――――
repica inc.
Repica Business Division, Technology Headquarters, Operations Department
2-24-15 Minami-Aoyama, Minato-ku, Tokyo 107-0062
Aoyama Tower Building Annex
TEL: 03-5414-3611 FAX: 03-5414-3622
URL: http://repica.jp/
‐///―――――――――――――――――――――///‐