Mailing List Archive

CoroSync's UDPu transport for public IP addresses?
Hello.

I have a geographically distributed cluster; all machines have public IP
addresses. No virtual IP subnet exists, so no multicast is available.

I thought the UDPu transport could work in such an environment, couldn't it?

To test everything in advance, I've set up corosync+pacemaker on Ubuntu
14.04 with the following corosync.conf:

totem {
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: ip-address-of-the-current-machine
        mcastport: 5405
    }
}
nodelist {
    node {
        ring0_addr: node1
    }
    node {
        ring0_addr: node2
    }
}
...

(here node1 and node2 are hostnames from /etc/hosts on both machines).
After running "service corosync start; service pacemaker start", the logs
show no problems, but both nodes are always offline:

root@node1:/etc/corosync# crm status | grep node
OFFLINE: [ node1 node2 ]

and "crm node online" (as all other attempts to make crm to do something)
are timed out with "communication error".

No iptables, SELinux, AppArmor, or other filtering is active: just plain
virtual machines with a single public IP address each. Also, tcpdump shows
that UDP packets on port 5405 are going in and out, and if I e.g. stop
corosync on node1, the tcpdump output on node2 changes significantly. So
they definitely see each other.
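
For reference, the capture was something along these lines (the interface
name eth0 is just an example; adjust it to your setup):

root@node1:~# tcpdump -n -i eth0 udp port 5405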

And if I attach a gvpe adapter to these two machines with a private subnet
and switch the transport to the default one, corosync + pacemaker begin to work.

So my question is: what am I doing wrong? Maybe UDPu is not suitable for
communication among machines that have only public IP addresses?
Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Hi,

On Mon, Dec 29, 2014 at 06:11:49AM +0300, Dmitry Koterov wrote:
> [...]
>
> To test everything in advance, I've set up corosync+pacemaker on Ubuntu
> 14.04 with the following corosync.conf:
>
> totem {
>     transport: udpu
>     interface {
>         ringnumber: 0
>         bindnetaddr: ip-address-of-the-current-machine
>         mcastport: 5405
>     }

You need to add the member directives too. See corosync.conf(5).
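
From memory, roughly like this (the addresses are placeholders):

totem {
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: <network-address>
        mcastport: 5405
        member {
            memberaddr: <ip-of-node1>
        }
        member {
            memberaddr: <ip-of-node2>
        }
    }
}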

Thanks,

Dejan

Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
On Mon, Dec 29, 2014 at 1:50 PM, Dejan Muhamedagic <dejanmm@fastmail.fm> wrote:
> Hi,
>
>> [...]
>
> You need to add the member directives too. See corosync.conf(5).
>

Aren't the member directives for corosync 1.x and the nodelist directives
for corosync 2.x?

Dmitry, which version do you have?

Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Hi,

On Mon, Dec 29, 2014 at 03:47:16PM +0300, Andrei Borzenkov wrote:
> On Mon, Dec 29, 2014 at 1:50 PM, Dejan Muhamedagic <dejanmm@fastmail.fm> wrote:
> > Hi,
> >
> > On Mon, Dec 29, 2014 at 06:11:49AM +0300, Dmitry Koterov wrote:
> >> Hello.
> >>
> >> I have a geographically distributed cluster, all machines have public IP
> >> addresses. No virtual IP subnet exists, so no multicast is available.
> >>
> >> I thought that UDPu transport can work in such environment, doesn't it?
> >>
> >> To test everything in advance, I've set up a corosync+pacemaker on Ubuntu
> >> 14.04 with the following corosync.conf:
> >>
> >> totem {
> >> transport: udpu
> >> interface {
> >> ringnumber: 0
> >> bindnetaddr: ip-address-of-the-current-machine
> >> mcastport: 5405
> >> }
> >
> > You need to add the member directives too. See corosync.conf(5).
> >
>
> Aren't the member directives for corosync 1.x and the nodelist directives
> for corosync 2.x?

Yes, that's right. Looks like my memory's still on 1.x.

Thanks,

Dejan

Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
> Dmitry, which version do you have?


root@node1:~# corosync -v
Corosync Cluster Engine, version '2.3.3'
Copyright (c) 2006-2009 Red Hat, Inc.

- so nodelist is definitely enough, and totem->interface->member is
deprecated.

So, am I at least right that the configuration with UDPu SHOULD work with
geo-distributed nodes with only public IP addresses and no private/virtual
subnetwork? If yes, how could I debug it?

Here's some more info (x.x.x.x is a public IP associated with node1):

root@node1:~# netstat -nap | grep coro
udp        0      0 x.x.x.x:41083     0.0.0.0:*     7037/corosync
udp        0      0 x.x.x.x:49299     0.0.0.0:*     7037/corosync
udp        0      0 x.x.x.x:5405      0.0.0.0:*     7037/corosync
unix  2    [ ACC ]  STREAM  LISTENING  52458  7037/corosync  @quorum
unix  2    [ ACC ]  STREAM  LISTENING  52455  7037/corosync  @cmap
unix  2    [ ACC ]  STREAM  LISTENING  52456  7037/corosync  @cfg
unix  2    [ ACC ]  STREAM  LISTENING  52457  7037/corosync  @cpg
unix  3    [ ]      STREAM  CONNECTED  52512  7037/corosync  @cpg
unix  3    [ ]      STREAM  CONNECTED  52625  7037/corosync  @cpg
unix  3    [ ]      STREAM  CONNECTED  52504  7037/corosync  @cfg
unix  3    [ ]      STREAM  CONNECTED  52520  7037/corosync  @quorum
unix  2    [ ]      DGRAM              52420  7037/corosync
unix  3    [ ]      STREAM  CONNECTED  52643  7037/corosync  @quorum
unix  3    [ ]      STREAM  CONNECTED  52568  7037/corosync  @cpg
unix  3    [ ]      STREAM  CONNECTED  52588  7037/corosync  @cpg
unix  3    [ ]      STREAM  CONNECTED  52554  7037/corosync  @cpg

root@node1:~# crm status
Last updated: Tue Dec 30 04:33:40 2014
Last change: Sun Dec 28 21:40:41 2014 via crmd on node2
Stack: corosync
Current DC: NONE
2 Nodes configured
0 Resources configured
OFFLINE: [ node1 node2 ]

root@node1:~# crm node online
Error setting standby=off (section=nodes, set=nodes-1084751873):
Communication error on send
Error performing operation: Communication error on send
Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Oh, it seems I've found the solution! At least two mistakes were in my
corosync.conf (BTW, the logs did not report any errors, so my conclusion is
based on my experiments alone).

1. nodelist.node MUST contain only IP addresses. No hostnames! They simply
do not work: "crm status" shows no nodes, and no warnings about this appear
in the logs.
2. quorum {} MUST NOT be empty (in the sample config it IS empty): in my
case, the following fixed the problem together with (1):

quorum {
    provider: corosync_votequorum
    two_node: 1
}

So, below is my final corosync.conf. Now "crm status" shows "Online: [
node1 node2 ]", the UDPu transport is used, and no virtual network exists at
all (only public IP addresses are specified in corosync.conf).

========================

# This seems to be a really WORKING configuration.
# Ubuntu 14.04, corosync 2.3.3, pacemaker 1.1.10
totem {
    version: 2
    cluster_name: cluster
    crypto_cipher: none
    crypto_hash: none
    clear_node_high_bit: yes
    interface {
        ringnumber: 0
        bindnetaddr: <public-ip-address-of-the-current-machine>
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
    heartbeat_failures_allowed: 3
}
logging {
    fileline: off
    to_logfile: no
    to_syslog: yes
    debug: on
    timestamp: off
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
nodelist {
    node {
        ring0_addr: <public-ip-address-of-the-first-machine>
    }
    node {
        ring0_addr: <public-ip-address-of-the-second-machine>
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}

=========================
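
As a quick check that membership and quorum actually form, something along
these lines can be used (corosync-quorumtool ships with corosync 2.x; the
output will vary):

root@node1:~# corosync-quorumtool -s
root@node1:~# crm status | grep -i online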


Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Dmitry Koterov <dmitry.koterov@gmail.com> writes:

> [...]
>
> 1. nodelist.node MUST contain only IP addresses. No hostnames! They simply
> do not work: "crm status" shows no nodes, and no warnings about this appear
> in the logs.

You can add a name like this:

nodelist {
    node {
        ring0_addr: <public-ip-address-of-the-first-machine>
        name: node1
    }
    node {
        ring0_addr: <public-ip-address-of-the-second-machine>
        name: node2
    }
}

I used it on Ubuntu Trusty with udpu.
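
If you also want stable node IDs, a nodeid can be set per node as well; as
far as I know it is optional for IPv4 (where it is derived from the ring0
address) and required for IPv6. A sketch:

nodelist {
    node {
        ring0_addr: <public-ip-address-of-the-first-machine>
        name: node1
        nodeid: 1
    }
    node {
        ring0_addr: <public-ip-address-of-the-second-machine>
        name: node2
        nodeid: 2
    }
}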

Regards.

--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
No, I meant that if you pass a domain name in ring0_addr, there are no
errors in the logs, corosync even seems to find the nodes (based on its
logs), and crm_node -l shows them, but in practice nothing really works.
A verbose error message would be very helpful in such a case.

Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Dmitry,


> No, I meant that if you pass a domain name in ring0_addr, there are no
> errors in the logs, corosync even seems to find the nodes (based on its
> logs), and crm_node -l shows them, but in practice nothing really works.
> A verbose error message would be very helpful in such a case.

This sounds weird. Are you sure that the DNS names really map to the correct
IP addresses? In the logs there should be something like "adding new UDPU
member {IP_ADDRESS}".
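
A quick way to check both (a sketch; this assumes corosync logs to syslog,
as in your config, and that Ubuntu writes it to /var/log/syslog):

root@node1:~# getent hosts node1 node2
root@node1:~# grep -i "UDPU member" /var/log/syslog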

Regards,
Honza

Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Sure, in the logs I see "adding new UDPU member {IP_ADDRESS}" (so the DNS
names are definitely resolved), but in practice the cluster does not work,
as I said above. So it would be very helpful if corosync validated the
ringX_addr values in corosync.conf.

Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Dmitry,


> Sure, in the logs I see "adding new UDPU member {IP_ADDRESS}" (so the DNS
> names are definitely resolved), but in practice the cluster does not work,
> as I said above. So it would be very helpful if corosync validated the
> ringX_addr values in corosync.conf.

that's weird, because as long as the DNS name resolves, corosync works only
with the IP. This means the code path is exactly the same with an IP as with
a DNS name. Do you have logs from corosync?
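
It might also help to dump what corosync actually parsed from corosync.conf
(corosync-cmapctl ships with corosync 2.x; the key names below are from
memory):

root@node1:~# corosync-cmapctl | grep nodelist.node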

Honza


Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Yes, now I have a clean experiment. Sorry, I misinformed you about
"adding new UDPU member": when I use DNS names in ringX_addr, I don't see
such messages (for now). But anyway, DNS names in ringX_addr still do not
seem to work, and no relevant messages appear in the default logs. Maybe
add some validation for ringX_addr?

I have resolvable DNS names:

root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms

root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from
64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms

root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms


With the corosync.conf below, nothing works:
...
nodelist {
    node {
        ring0_addr: node1
    }
    node {
        ring0_addr: node2
    }
    node {
        ring0_addr: node3
    }
}
...
Jan 14 10:47:44 node1 corosync[15061]: [MAIN  ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
Jan 14 10:47:44 node1 corosync[15061]: [MAIN  ] Corosync built-in features: dbus testagents rdma watchdog augeas pie relro bindnow
Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] The network interface [a.b.c.d] is now up.
Jan 14 10:47:44 node1 corosync[15062]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Jan 14 10:47:44 node1 corosync[15062]: [QB    ] server name: cmap
Jan 14 10:47:44 node1 corosync[15062]: [SERV  ] Service engine loaded: corosync configuration service [1]
Jan 14 10:47:44 node1 corosync[15062]: [QB    ] server name: cfg
Jan 14 10:47:44 node1 corosync[15062]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jan 14 10:47:44 node1 corosync[15062]: [QB    ] server name: cpg
Jan 14 10:47:44 node1 corosync[15062]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Jan 14 10:47:44 node1 corosync[15062]: [WD    ] No Watchdog, try modprobe <a watchdog>
Jan 14 10:47:44 node1 corosync[15062]: [WD    ] no resources configured.
Jan 14 10:47:44 node1 corosync[15062]: [SERV  ] Service engine loaded: corosync watchdog service [7]
Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Using quorum provider corosync_votequorum
Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jan 14 10:47:44 node1 corosync[15062]: [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jan 14 10:47:44 node1 corosync[15062]: [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.


But with IP addresses specified in ringX_addr, everything works:
...
nodelist {
    node {
        ring0_addr: 104.236.71.79
    }
    node {
        ring0_addr: 188.166.54.190
    }
    node {
        ring0_addr: 128.199.116.218
    }
}
...
Jan 14 10:48:28 node1 corosync[15155]: [MAIN  ] Corosync Cluster Engine ('2.3.3'): started and ready to provide service.
Jan 14 10:48:28 node1 corosync[15155]: [MAIN  ] Corosync built-in features: dbus testagents rdma watchdog augeas pie relro bindnow
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing transport (UDP/IP Unicast).
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] The network interface [a.b.c.d] is now up.
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync configuration map access [0]
Jan 14 10:48:28 node1 corosync[15156]: [QB    ] server name: cmap
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync configuration service [1]
Jan 14 10:48:28 node1 corosync[15156]: [QB    ] server name: cfg
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jan 14 10:48:28 node1 corosync[15156]: [QB    ] server name: cpg
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync profile loading service [4]
Jan 14 10:48:28 node1 corosync[15156]: [WD    ] No Watchdog, try modprobe <a watchdog>
Jan 14 10:48:28 node1 corosync[15156]: [WD    ] no resources configured.
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync watchdog service [7]
Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Using quorum provider corosync_votequorum
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jan 14 10:48:28 node1 corosync[15156]: [QB    ] server name: votequorum
Jan 14 10:48:28 node1 corosync[15156]: [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jan 14 10:48:28 node1 corosync[15156]: [QB    ] server name: quorum
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member {a.b.c.d}
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member {e.f.g.h}
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member {i.j.k.l}
Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] A new membership (m.n.o.p:80) was formed. Members joined: 1760315215
Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Members[1]: 1760315215
Jan 14 10:48:28 node1 corosync[15156]: [MAIN  ] Completed service synchronization, ready to provide service.


Re: CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Dmitry,


> Yes, now I have a clean experiment. Sorry, I misinformed you about
> "adding new UDPU member": when I use DNS names in ringX_addr, I don't see

This is good to know.

> such messages (for now). But anyway, DNS names in ringX_addr still do not
> seem to work, and no relevant messages appear in the default logs. Maybe
> add some validation for ringX_addr?
>
> I have resolvable DNS names:
>
> root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>

This is the problem. Resolving node1 to a loopback address (127.0.1.1) is
simply wrong. The names you use in corosync.conf should resolve to the
node's interface address. I believe the other nodes have a similar setting
(so node2 resolved on node2 is again a loopback address).
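
In other words, /etc/hosts on every node should map each node name to its
real address, something like this (a sketch; the addresses are taken from
your earlier mail, and the 127.0.1.1 alias for the node's own name must go):

104.236.71.79    node1
188.166.54.190   node2
128.199.116.218  node3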

Please try to fix this problem first and let's see if it solves the issue
you are hitting.

Regards,
Honza

Re: [corosync] CoroSync's UDPu transport for public IP addresses? [ In reply to ]
> > [...]
> >
> > root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
> > 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>
> This is the problem. Resolving node1 to a loopback address (127.0.1.1) is
> simply wrong. The names you use in corosync.conf should resolve to the
> node's interface address. I believe the other nodes have a similar setting
> (so node2 resolved on node2 is again a loopback address).
>

Wow! What a shame! How could I miss it... So you're absolutely right,
thanks: that was the cause, an entry in /etc/hosts. On some machines I had
removed it manually, but on others I hadn't. Now I remove it automatically
with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d" /etc/hosts in the
initialization script.
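
A quick sanity check after the cleanup (a sketch; the node's own name should
now resolve to its public address, e.g. 104.236.71.79 for node1, not
127.0.1.1):

root@node1:~# getent hosts node1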

I apologize for the mess.

So now I have only one place in corosync.conf where I need to specify a
plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
0.0.0.0 there, I get the message "Service engine 'corosync_quorum'
failed to load for reason 'configuration error: nodelist or
quorum.expected_votes must be configured!'" in the logs (BTW, it does not
say that the mistake is in bindnetaddr). Is there a way to avoid
hard-coding IP addresses entirely?



> Please try to fix this problem first and let's see if this will solve
> issue you are hitting.
>
> Regards,
> Honza
>
> > root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from
> > 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
> >
> > root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
> > 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
> >
> >
> > With corosync.conf below, nothing works:
> > ...
> > nodelist {
> > node {
> > ring0_addr: node1
> > }
> > node {
> > ring0_addr: node2
> > }
> > node {
> > ring0_addr: node3
> > }
> > }
> > ...
> > Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync Cluster Engine
> > ('2.3.3'): started and ready to provide service.
> > Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync built-in
> > features: dbus testagents rdma watchdog augeas pie relro bindnow
> > Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transport
> > (UDP/IP Unicast).
> > Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing
> > transmit/receive security (NSS) crypto: aes256 hash: sha1
> > Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] The network interface
> > [a.b.c.d] is now up.
> > Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
> > corosync configuration map access [0]
> > Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cmap
> > Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
> > corosync configuration service [1]
> > Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cfg
> > Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
> > corosync cluster closed process group service v1.01 [2]
> > Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cpg
> > Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
> > corosync profile loading service [4]
> > Jan 14 10:47:44 node1 corosync[15062]: [WD ] No Watchdog, try
> modprobe
> > <a watchdog>
> > Jan 14 10:47:44 node1 corosync[15062]: [WD ] no resources configured.
> > Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
> > corosync watchdog service [7]
> > Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Using quorum provider
> > corosync_votequorum
> > Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Quorum provider:
> > corosync_votequorum failed to initialize.
> > Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine
> > 'corosync_quorum' failed to load for reason 'configuration error:
> nodelist
> > or quorum.expected_votes must be configured!'
> > Jan 14 10:47:44 node1 corosync[15062]: [MAIN ] Corosync Cluster Engine
> > exiting with status 20 at service.c:356.
> >
> >
> > But with IP addresses specified in ringX_addr, everything works:
> > ...
> > nodelist {
> > node {
> > ring0_addr: 104.236.71.79
> > }
> > node {
> > ring0_addr: 188.166.54.190
> > }
> > node {
> > ring0_addr: 128.199.116.218
> > }
> > }
> > ...
> > Jan 14 10:48:28 node1 corosync[15155]: [MAIN ] Corosync Cluster Engine
> > ('2.3.3'): started and ready to provide service.
> > Jan 14 10:48:28 node1 corosync[15155]: [MAIN ] Corosync built-in
> > features: dbus testagents rdma watchdog augeas pie relro bindnow
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing transport
> > (UDP/IP Unicast).
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing
> > transmit/receive security (NSS) crypto: aes256 hash: sha1
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] The network interface
> > [a.b.c.d] is now up.
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync configuration map access [0]
> > Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cmap
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync configuration service [1]
> > Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cfg
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync cluster closed process group service v1.01 [2]
> > Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cpg
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync profile loading service [4]
> > Jan 14 10:48:28 node1 corosync[15156]: [WD ] No Watchdog, try
> modprobe
> > <a watchdog>
> > Jan 14 10:48:28 node1 corosync[15156]: [WD ] no resources configured.
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync watchdog service [7]
> > Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Using quorum provider
> > corosync_votequorum
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync vote quorum service v1.0 [5]
> > Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: votequorum
> > Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
> > corosync cluster quorum service v0.1 [3]
> > Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: quorum
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member
> > {a.b.c.d}
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member
> > {e.f.g.h}
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member
> > {i.j.k.l}
> > Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] A new membership
> > (m.n.o.p:80) was formed. Members joined: 1760315215
> > Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Members[1]: 1760315215
> > Jan 14 10:48:28 node1 corosync[15156]: [MAIN ] Completed service
> > synchronization, ready to provide service.
> >
> >
> > On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse@redhat.com> wrote:
> >
> >> Dmitry,
> >>
> >>
> >>> Sure, in logs I see "adding new UDPU member {IP_ADDRESS}" (so DNS names
> >>> are definitely resolved), but in practice the cluster does not work,
> as I
> >>> said above. So validations of ringX_addr in corosync.conf would be very
> >>> helpful in corosync.
> >>
> >> that's weird. Because as long as DNS is resolved, corosync works only
> >> with IP. This means, code path is exactly same with IP or with DNS. Do
> >> you have logs from corosync?
> >>
> >> Honza
> >>
> >>
> >>>
> >>> On Fri, Jan 2, 2015 at 2:49 PM, Jan Friesse <jfriesse@redhat.com>
> wrote:
> >>>
> >>>> Dmitry,
> >>>>
> >>>>
> >>>> No, I meant that if you pass a domain name in ring0_addr, there are
> no
> >>>>> errors in logs, corosync even seems to find nodes (based on its
> logs),
> >> And
> >>>>> crm_node -l shows them, but in practice nothing really works. A
> verbose
> >>>>> error message would be very helpful in such case.
> >>>>>
> >>>>
> >>>> This sounds weird. Are you sure that DNS names really maps to correct
> IP
> >>>> address? In logs there should be something like "adding new UDPU
> member
> >>>> {IP_ADDRESS}".
> >>>>
> >>>> Regards,
> >>>> Honza
> >>>>
> >>>>
> >>>>> On Tuesday, December 30, 2014, Daniel Dehennin <
> >>>>> daniel.dehennin@baby-gnu.org>
> >>>>> wrote:
> >>>>>
> >>>>> Dmitry Koterov <dmitry.koterov@gmail.com <javascript:;>> writes:
> >>>>>>
> >>>>>> Oh, seems I've found the solution! At least two mistakes was in my
> >>>>>>> corosync.conf (BTW logs did not say about any errors, so my
> >> conclusion
> >>>>>>> is
> >>>>>>> based on my experiments only).
> >>>>>>>
> >>>>>>> 1. nodelist.node MUST contain only IP addresses. No hostnames! They
> >>>>>>>
> >>>>>> simply
> >>>>>>
> >>>>>>> do not work, "crm status" shows no nodes. And no warnings are in
> logs
> >>>>>>> regarding this.
> >>>>>>>
> >>>>>>
> >>>>>> You can add name like this:
> >>>>>>
> >>>>>> nodelist {
> >>>>>> node {
> >>>>>> ring0_addr: <public-ip-address-of-the-first-machine>
> >>>>>> name: node1
> >>>>>> }
> >>>>>> node {
> >>>>>> ring0_addr: <public-ip-address-of-the-second-machine>
> >>>>>> name: node2
> >>>>>> }
> >>>>>> }
> >>>>>>
> >>>>>> I used it on Ubuntu Trusty with udpu.
> >>>>>>
> >>>>>> Regards.
> >>>>>>
> >>>>>> --
> >>>>>> Daniel Dehennin
> >>>>>> Fetch my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
> >>>>>> Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
>
Re: [corosync] CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Dmitry Koterov wrote:
>>
>>> such messages (for now). But, anyway, DNS names in ringX_addr seem not
>>> to work, and no relevant messages appear in the default logs. Maybe add
>>> some validation for ringX_addr?
>>>
>>> I have resolvable DNS names:
>>>
>>> root@node1:/etc/corosync# ping -c1 -W100 node1 | grep from
>>> 64 bytes from node1 (127.0.1.1): icmp_seq=1 ttl=64 time=0.039 ms
>>>
>>
>> This is the problem. Resolving node1 to a loopback address (127.0.1.1)
>> is simply wrong. Names you want to use in corosync.conf should resolve
>> to the interface address. I believe the other nodes have a similar
>> setting (so node2 resolved on node2 is again a loopback address).
>>
>
> Wow! What a shame! How could I miss it... You're absolutely right,
> thanks: that was the cause, an entry in /etc/hosts. On some machines I
> had removed it manually, but on others I hadn't. Now I do it
> automatically with sed -i -r "/^.*[[:space:]]$host([[:space:]]|\$)/d"
> /etc/hosts in the initialization script.
>
> I apologize for the mess.
>
> So now I have only one place in corosync.conf where I need to specify a
> plain IP address for UDPu: totem.interface.bindnetaddr. If I specify
> 0.0.0.0 there, I get the message "Service engine 'corosync_quorum'
> failed to load for reason 'configuration error: nodelist or
> quorum.expected_votes must be configured!'" in the logs (BTW it does not
> say that I made a mistake in bindnetaddr). Is there a way to completely
> untie the configuration from IP addresses?

You can just remove the whole interface section completely. Corosync will
find the correct address from the nodelist.

Regards,
Honza
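Putting the whole fix together, a sketch for the three-node setup in this
thread (the addresses are the ones quoted below; this is an illustration,
not a verbatim tested configuration):

# /etc/hosts on every node: each name must resolve to that node's public
# address, never to 127.0.0.1 or 127.0.1.1 ("getent hosts node1" should
# print the public IP):
104.236.71.79 node1
188.166.54.190 node2
128.199.116.218 node3

# /etc/corosync/corosync.conf: no totem.interface section at all;
# corosync derives the bind address from the nodelist.
totem {
version: 2
transport: udpu
}
nodelist {
node {
ring0_addr: 104.236.71.79
name: node1
}
node {
ring0_addr: 188.166.54.190
name: node2
}
node {
ring0_addr: 128.199.116.218
name: node3
}
}
quorum {
provider: corosync_votequorum
}

The name: lines are optional; as in Daniel's example above, they give
Pacemaker human-readable node names.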

>
>
>
>> Please try to fix this problem first and let's see if this solves the
>> issue you are hitting.
>>
>> Regards,
>> Honza
>>
>>> root@node1:/etc/corosync# ping -c1 -W100 node2 | grep from
>>> 64 bytes from node2 (188.166.54.190): icmp_seq=1 ttl=55 time=88.3 ms
>>>
>>> root@node1:/etc/corosync# ping -c1 -W100 node3 | grep from
>>> 64 bytes from node3 (128.199.116.218): icmp_seq=1 ttl=51 time=252 ms
>>>
>>>
>>> With the corosync.conf below, nothing works:
>>> ...
>>> nodelist {
>>> node {
>>> ring0_addr: node1
>>> }
>>> node {
>>> ring0_addr: node2
>>> }
>>> node {
>>> ring0_addr: node3
>>> }
>>> }
>>> ...
>>> Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync Cluster Engine
>>> ('2.3.3'): started and ready to provide service.
>>> Jan 14 10:47:44 node1 corosync[15061]: [MAIN ] Corosync built-in
>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing transport
>>> (UDP/IP Unicast).
>>> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] Initializing
>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>> Jan 14 10:47:44 node1 corosync[15062]: [TOTEM ] The network interface
>>> [a.b.c.d] is now up.
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
>>> corosync configuration map access [0]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cmap
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
>>> corosync configuration service [1]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cfg
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
>>> corosync cluster closed process group service v1.01 [2]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QB ] server name: cpg
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
>>> corosync profile loading service [4]
>>> Jan 14 10:47:44 node1 corosync[15062]: [WD ] No Watchdog, try modprobe
>>> <a watchdog>
>>> Jan 14 10:47:44 node1 corosync[15062]: [WD ] no resources configured.
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine loaded:
>>> corosync watchdog service [7]
>>> Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Using quorum provider
>>> corosync_votequorum
>>> Jan 14 10:47:44 node1 corosync[15062]: [QUORUM] Quorum provider:
>>> corosync_votequorum failed to initialize.
>>> Jan 14 10:47:44 node1 corosync[15062]: [SERV ] Service engine
>>> 'corosync_quorum' failed to load for reason 'configuration error:
>>> nodelist or quorum.expected_votes must be configured!'
>>> Jan 14 10:47:44 node1 corosync[15062]: [MAIN ] Corosync Cluster Engine
>>> exiting with status 20 at service.c:356.
>>>
>>>
>>> But with IP addresses specified in ringX_addr, everything works:
>>> ...
>>> nodelist {
>>> node {
>>> ring0_addr: 104.236.71.79
>>> }
>>> node {
>>> ring0_addr: 188.166.54.190
>>> }
>>> node {
>>> ring0_addr: 128.199.116.218
>>> }
>>> }
>>> ...
>>> Jan 14 10:48:28 node1 corosync[15155]: [MAIN ] Corosync Cluster Engine
>>> ('2.3.3'): started and ready to provide service.
>>> Jan 14 10:48:28 node1 corosync[15155]: [MAIN ] Corosync built-in
>>> features: dbus testagents rdma watchdog augeas pie relro bindnow
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing transport
>>> (UDP/IP Unicast).
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] Initializing
>>> transmit/receive security (NSS) crypto: aes256 hash: sha1
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] The network interface
>>> [a.b.c.d] is now up.
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync configuration map access [0]
>>> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cmap
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync configuration service [1]
>>> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cfg
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync cluster closed process group service v1.01 [2]
>>> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: cpg
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync profile loading service [4]
>>> Jan 14 10:48:28 node1 corosync[15156]: [WD ] No Watchdog, try modprobe
>>> <a watchdog>
>>> Jan 14 10:48:28 node1 corosync[15156]: [WD ] no resources configured.
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync watchdog service [7]
>>> Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Using quorum provider
>>> corosync_votequorum
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync vote quorum service v1.0 [5]
>>> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: votequorum
>>> Jan 14 10:48:28 node1 corosync[15156]: [SERV ] Service engine loaded:
>>> corosync cluster quorum service v0.1 [3]
>>> Jan 14 10:48:28 node1 corosync[15156]: [QB ] server name: quorum
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member
>>> {a.b.c.d}
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member
>>> {e.f.g.h}
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] adding new UDPU member
>>> {i.j.k.l}
>>> Jan 14 10:48:28 node1 corosync[15156]: [TOTEM ] A new membership
>>> (m.n.o.p:80) was formed. Members joined: 1760315215
>>> Jan 14 10:48:28 node1 corosync[15156]: [QUORUM] Members[1]: 1760315215
>>> Jan 14 10:48:28 node1 corosync[15156]: [MAIN ] Completed service
>>> synchronization, ready to provide service.
>>>
>>>
>>> On Mon, Jan 5, 2015 at 6:45 PM, Jan Friesse <jfriesse@redhat.com> wrote:
>>> [...]
>


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [corosync] CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Great, it works! Thank you.

It would be extremely helpful if this information were included in the
default corosync.conf as comments:
- that the totem.interface section may be omitted, and preferably should
be, in the case of UDPu;
- that the quorum section must not be empty, and that the default
quorum.provider could be corosync_votequorum (but not an empty value).

It would help novices install and launch corosync instantly.
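For instance, the shipped file could carry comments along these lines (a
sketch of possible wording only, not the actual upstream file):

# With transport: udpu, the totem.interface section may be omitted
# entirely (and preferably should be): corosync derives the bind
# address from the nodelist.
# If quorum.provider is set, it must name a valid provider such as
# corosync_votequorum; an empty value is a configuration error.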


On Fri, Jan 16, 2015 at 7:31 PM, Jan Friesse <jfriesse@redhat.com> wrote:

> Dmitry Koterov wrote:
>
>> [...]
>> Is there a way to completely untie the configuration from IP addresses?
>
> You can just remove the whole interface section completely. Corosync will
> find the correct address from the nodelist.
>
> Regards,
> Honza
>
> [...]
Re: [corosync] CoroSync's UDPu transport for public IP addresses? [ In reply to ]
Dmitry,


> Great, it works! Thank you.
>
> It would be extremely helpful if this information were included in the
> default corosync.conf as comments:
> - that the totem.interface section may be omitted, and preferably should
> be, in the case of UDPu;

Yep

> - that the quorum section must not be empty, and that the default
> quorum.provider could be corosync_votequorum (but not an empty value).

This is not entirely true. quorum.provider cannot be an empty string; if
set, it must be a valid provider like corosync_votequorum. But an
unspecified quorum.provider works without any problem (as in the example
configuration file). The truth is that Pacemaker must then be configured
so that quorum is not required.
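Concretely, a sketch of the two working variants (the crm command assumes
the crmsh shell used earlier in this thread):

# Variant 1: enable a real quorum provider in corosync.conf:
quorum {
provider: corosync_votequorum
}

# Variant 2: leave quorum.provider unset and tell Pacemaker not to
# require quorum:
crm configure property no-quorum-policy=ignore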

Regards,
Honza

>
> It would help novices install and launch corosync instantly.
>
>
> On Fri, Jan 16, 2015 at 7:31 PM, Jan Friesse <jfriesse@redhat.com> wrote:
>
>> [...]


_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org