Mailing List Archive

qb_ipcs_disconnect message in corosync cluster
Hi,

We run a Pacemaker + Corosync cluster on openSUSE 13.1 QEMU guests.

Frequently, one node gets disconnected from the CIB. These are the messages
seen in the corosync log:

Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_ipcs_dispatch_connection_request: HUP conn (3760-5529-13)
Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_ipcs_disconnect: qb_ipcs_disconnect(3760-5529-13) state:2
Nov 25 08:36:07 [3760] sysmon-secondary cib: info: crm_client_destroy: Destroying 0 events
Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_ro-response-3760-5529-13-header
Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_ro-event-3760-5529-13-header
Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: qb_rb_close: Free'ing ringbuffer: /dev/shm/qb-cib_ro-request-3760-5529-13-header


Can you please help fix the issue?

--
Bharathiraja
Re: qb_ipcs_disconnect message in corosync cluster
> On 12 Dec 2014, at 9:57 pm, Bharathiraja P <raja@where2getit.com> wrote:
>
> Can you please help fix the issue?

What issue?


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: qb_ipcs_disconnect message in corosync cluster
Hi Andrew,

Frequently, one node gets disconnected from the CIB and stops the cluster
resources. I am then unable to start resources or clean up failed actions
for any of them. For example, if nodeA gets disconnected from the CIB, I
can't run actions such as cleanup, stop, or restart on a resource, because
the command hangs forever.

In the corosync log I see a message like this:
"cib: debug: qb_ipcs_disconnect: qb_ipcs_disconnect(3760-5529-13) state:2"

The only way I could recover was to force-kill the cib process on both
nodes, multiple times.

Let me know if you need any other info to nail down this issue.

--
Bharathiraja

On Mon, Dec 15, 2014 at 9:19 AM, Andrew Beekhof <andrew@beekhof.net> wrote:

> What issue?
Re: qb_ipcs_disconnect message in corosync cluster
> On 15 Dec 2014, at 4:29 pm, Bharathiraja P <raja@where2getit.com> wrote:
>
> Let me know if you need any other info to nail down this issue.

For starters, we'd need to know what process 5529 was and what the rest of the processes in the cluster were doing.
It's impossible to say anything from so few non-error logs.
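
As a rough illustration of how "process 5529" is read out of the log, here is a minimal Python sketch that splits the connection ID `3760-5529-13` into its parts. It assumes the first field is the PID of the daemon serving the connection (it matches the `[3760]` cib PID in the log prefix), the second is the client PID, and the third is a descriptor number; that last reading is an assumption, not something the thread confirms. The names `IpcConnection`, `parse_connection` and `process_name` are hypothetical helpers, not part of libqb or Pacemaker.

```python
import re
from typing import NamedTuple, Optional


class IpcConnection(NamedTuple):
    server_pid: int  # assumed: PID of the daemon owning the IPC server (cib here)
    client_pid: int  # PID of the client that hung up (5529 in the thread)
    descriptor: int  # assumed: a descriptor number; not confirmed by the thread


# Matches the parenthesised "<pid>-<pid>-<n>" ID seen in the qb_ipcs_* messages.
CONN_ID = re.compile(r"\((\d+)-(\d+)-(\d+)\)")


def parse_connection(line: str) -> Optional[IpcConnection]:
    """Return the connection ID found in a log line, or None if there is none."""
    match = CONN_ID.search(line)
    if match is None:
        return None
    return IpcConnection(*(int(group) for group in match.groups()))


def process_name(pid: int) -> Optional[str]:
    """Best-effort name lookup via /proc (Linux only); None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/comm") as handle:
            return handle.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    line = ("Nov 25 08:36:07 [3760] sysmon-secondary cib: debug: "
            "qb_ipcs_disconnect: qb_ipcs_disconnect(3760-5529-13) state:2")
    conn = parse_connection(line)
    print(conn)
    print(process_name(conn.client_pid))
```

The `/proc` lookup only helps while the client is still running; after a disconnect the PID may have exited or been reused, so the name would have to come from process listings or logs captured at the time.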

Re: qb_ipcs_disconnect message in corosync cluster
Thanks Andrew.

I upgraded corosync and pacemaker and the cluster works fine now.

On Thu, Jan 8, 2015 at 8:26 AM, Andrew Beekhof <andrew@beekhof.net> wrote:

> For starters, we'd need to know what process 5529 was and what the rest of
> the processes in the cluster were doing.
> It's impossible to say anything from so few non-error logs.