Mailing List Archive

Rebooting first machine causes wackamole on second machine to abort

Hi

I'm trying out wackamole to see if I could use it for HA of a
mailing-list server address, testing on three identical machines
(mahler, haydn and mozart), all running RedHat 8 (Linux 2.4.18). We
currently use LVS, so I'd like to see if wackamole would be an
alternative, particularly as I only want HA and not load balancing.
Spread is installed and working OK on the three machines -- I can send
messages between them with spuser. I'm using the local multicast address
239.0.0.1 for the machine group.

I start wackamole on one of the machines successfully, assigning the
virtual IP (128.243.40.99) to eth0:1. (However, it does log "No such
interface" in /var/log/messages, as other people have reported in this
list, even though the conf file only mentions eth0 -- it would be useful
if the log message stated what interface it was objecting to).

Starting wackamole on the second machine is OK as well -- for this
exercise, I run it under strace. However, when I reboot the first
machine, wackamole on the second machine dies with an ABRT signal. (To
avoid sending the 31K of strace output to the list, I put it on our
webserver at http://www.nottingham.ac.uk/~cczdao/wackamole/strace.txt)

A similar thing happens using all three machines. I can start wackamole
on the first two OK, but starting it on the third machine makes the
instance on the second machine die with an ABRT signal.

It's as if the group membership change when the first machine is
rebooted, or when the third machine joins, upsets the second wackamole.
For the three-machine case, it doesn't matter which order I start the
wackamoles: it's always the second one which ABRTs, on whichever
machine.

spread.conf, wackamole.conf attached.

David
--
David Osborne
Central Systems & Security Team
Information Services
The University of Nottingham

Attachment: spread.conf

# Blank lines are permitted in this file.
# spread.conf sample file
#
# questions to spread@spread.org
#

#MINIMAL REQUIRED FILE
#
# Spread should work fine on one machine with just the uncommented
# lines below. The rest of the file documents all the options and
# more complex network setups.
#
# This configures one spread daemon running on port 4803 on localhost.

Spread_Segment 239.0.0.1:4803 {
haydn 128.243.40.92
mozart 128.243.40.93
mahler 128.243.40.94
}

#Spread_Segment 127.0.0.255:4803 {
# localhost 127.0.0.1
# haydn 128.243.40.92
# mozart 128.243.40.93
#}

# Spread options
#---------------------------------------------------------------------------
#---------------------------------------------------------------------------
#Set what internal Spread events are logged to the screen or file
# (see EventLogFile).
# Default setting is to enable PRINT and EXIT events only.
#The PRINT and EXIT types should always be enabled. The names of others are:
# EXIT PRINT DEBUG DATA_LINK NETWORK PROTOCOL SESSION
# CONFIGURATION MEMBERSHIP FLOW_CONTROL STATUS EVENTS
# GROUPS MEMORY SKIPLIST ALL NONE
# ALL and NONE are special and represent either enabling every type
# or enabling none of them.
# You can also use a "!" sign to negate a type,
# so { ALL !DATA_LINK } means log all events except data_link ones.

#DebugFlags = { PRINT EXIT }

#Set whether to log to a file as opposed to stdout/stderr and what
# file to log to.
# Default is to log to stdout.
#
#If option is not set then logging is to stdout.
#If option is set then logging is to the filename specified.
# The filename can include a %h or %H escape that will be replaced at runtime
# by the hostname of the machine upon which the daemon is running.
# For example "EventLogFile = spreadlog_%h.log" with 2 machines
# running Spread (machine1.mydomain.com and machine2.mydomain.com) will
# cause the daemons to log to "spreadlog_machine1.mydomain.com.log" and
# "spreadlog_machine2.mydomain.com.log" respectively.

EventLogFile = /usr/local/spread/testlog.out

#Set whether to add a timestamp in front of all logged events or not.
# Default is no timestamps. Default format is "[%a %d %b %Y %H:%M:%S]".
#If option is commented out then no timestamp is added.
#If option is enabled then a timestamp is added with the default format
#If option is enabled and set equal to a string, then that string is used
# as the format string for the timestamp. The string must be a valid time
# format string as used by the strftime() function.

#EventTimeStamp
# or
#EventTimeStamp = "[%a %d %b %Y %H:%M:%S]"

#Set whether to allow dangerous monitor commands
# like "partition, flow_control, or kill"
# Default setting is FALSE.
#If option is set to false then only "safe" monitor commands are allowed
# (such as requesting a status update).
#If option is set to true then all monitor commands are enabled.
# THIS IS A SECURITY RISK IF YOUR NETWORK IS NOT PROTECTED!

#DangerousMonitor = false

#Set handling of SO_REUSEADDR socket option for the daemon's TCP
# listener. This is useful for facilitating quick daemon restarts (OSes
# often hold onto the interface/port combination for a short period of time
# after daemon shut down).
#
# AUTO - Active when bound to specific interfaces (default).
# ON - Always active, regardless of interface.
# SECURITY RISK FOR ANY OS WHICH ALLOWS DOUBLE BINDS BY DIFFERENT USERS
# OFF - Always off.

#SocketPortReuse = AUTO

#Sets the runtime directory used when the Spread daemon is run as root
# as the directory to chroot to. Defaults to the value of the
# compile-time preprocessor define SP_RUNTIME_DIR, which is generally
# "/var/run/spread".

#RuntimeDir = /var/run/spread

#Sets the unix user that the Spread daemon runs as (when launched as
# the "root" user). Not effective on a Windows system. Defaults to
# the user and group "spread".

#DaemonUser = spread
#DaemonGroup = spread


#Set the list of authentication methods that the daemon will allow
# and those which are required in all cases.
# All of the methods listed in "RequiredAuthMethods" will be checked,
# regardless of what methods the client chooses.
# Of the methods listed in "AllowedAuthMethods" the client is
# permitted to choose one or more, and all the ones the client chooses
# will also be checked.
#
# To support older clients, if NULL is enabled, then older clients can
# connect without any authentication. Any methods which do not require
# any interaction with the client (such as IP) can also be enabled
# for older clients. If you enable methods that require interaction,
# then essentially all older clients will be locked out.
#
#The current choices are:
# NULL for default, allow anyone authentication
# IP for IP based checks using the spread.access_ip file

#RequiredAuthMethods = " "
#AllowedAuthMethods = "NULL"

#Set the current access control policy.
# This is only needed if you want to establish a customized policy.
# The default policy is to allow any actions by authenticated clients.
#AccessControlPolicy = "PERMIT"


# network description line.
# Spread_Segment <multicast address for subnet>:<port> {
# port is optional, if not specified the default 4803 port is used.

#Spread_Segment 127.0.0.255:4803 {

# either a name or IP address. If both are given, then the name is taken
# as-is, and the IP address is used for that name.

# localhost 127.0.0.1
#}
# repeat for next sub-network

#Spread_Segment x.2.2.255 {

# other1 128.2.2.10
# 128.2.2.11
# other3.my.com
#}
# Spread will feel free to use broadcast messages within a sub-network.
# if you do not want this to happen, you should specify your machines on
# different logical sub-networks.

# IP-Multicast addresses can also be used as the multicast address for
# the logical sub-network as in this example. If IP-multicast is supported
# by the operating system, then the messages will only be received
# by those machines who are in the group and not by all others in the same
# sub-network as happens with broadcast addresses

#Spread_Segment 225.0.1.1:3333 {
# mcast1 1.2.3.4
# mcast2 1.2.3.6
#}

# Multi-homed host setup
#
# If you run Spread on hosts with multiple interfaces you may want to
# control which interfaces Spread uses for client connections and for
# the daemon-to-daemon (and monitor control) messages. This can be done
# by adding an extra stanza to each configured machine.
#
#Sample:
#
#Spread_Segment 225.0.1.1 {
# multihomed1 1.2.3.4 {
# D 192.168.0.4
# C 1.2.3.4 }
# multihomed2 1.2.3.5 {
# D 192.168.0.5
# C 1.2.3.5
# C 127.0.0.1 }
# multihomed3 1.2.3.6 {
# 192.168.0.6
# 1.2.3.6 }
#}
# This configuration sets up three multihomed machines into a Spread segment.
# The first host has a 'main' IP address of 1.2.3.4 and listens for client
# connections only on that interface. All daemon-to-daemon UDP multicasts and
# the tokens and any monitor messages must use the 192.168.0.4 interface.
# The second host multihomed2 has a similar setup, except it also listens for
# client connections on the localhost interface as well as the 1.2.3.5 interface.
# If you make any use of the extra interface stanza ( a { } block ) then you must
# explicitly configure ALL interfaces you want as Spread removes all defaults when
# you use the explicit notation.
# The third multihomed3 host uses a shorthand form of omitting the D or C option and
# just listening for all types of traffic and events on both the 192.168.0 and 1.2.3
# networks. If no letter is listed before the interface address then ALL types of
# events are handled on that interface.

Attachment: wackamole.conf

# The Spread daemon we are going to connect to. It should be on the local box
Spread = 4803
SpreadRetryInterval = 5s
# The group name
Group = wack1
# Named socket for online control
Control = /var/run/wack.it

# Denote the interface we prefer to have
#prefer eth0:10.3.4.5/8
#prefer { eth0:10.2.3.4/8 eth1:192.168.10.23/24 }

# In most cases, I just don't care. Let wackamole decide.
Prefer None

# List all the virtual interfaces (ALL of them)
VirtualInterfaces {
{ eth0:128.243.40.99/16 }
}

# Collect and broadcast the IPs in our ARP table every so often
Arp-Cache = 90s

# List who we will notify
# Here the netblock (/24 or /28) can be deceptive. It is NOT a netmask
# for a single IP. It is how one will describe that they want to
# notify ALL IPs in a segment.
Notify {
# Wackamole shares arp-cache across machines, this says to
# notify every IP address in the aggregate shared arp-cache.
arp-cache
}
balance {
# This field is the maximum number of IP addresses that will move
# from one wackamole to another during a round of balancing.
AcquisitionsPerRound = all
# Time interval in each balancing round.
interval = 4s
}
# How long it takes us to mature
mature = 5s

Rebooting first machine causes wackamole on second machine to abort
Hi,

I would recommend setting up Spread not to use multicast. It looks
like you have a multicast issue on your network.
Try this first and see if the problem persists:

Spread_Segment 128.243.40.255:4803 {
haydn 128.243.40.92
mozart 128.243.40.93
mahler 128.243.40.94
}
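
A note on where 128.243.40.255 comes from: it is the directed broadcast address of the subnet the three hosts share, on the assumption that the subnet is a /24. The thread does not state the netmask, and the wackamole.conf above gives the virtual IP a /16, which would yield a different broadcast address. The following minimal Python sketch shows the arithmetic for both cases; the function name segment_broadcast is made up for illustration.

```python
import ipaddress

def segment_broadcast(host_ip: str, prefix_len: int) -> str:
    """Return the broadcast address of the subnet containing host_ip."""
    # strict=False lets us pass a host address rather than a network address.
    network = ipaddress.ip_network(f"{host_ip}/{prefix_len}", strict=False)
    return str(network.broadcast_address)

# Hosts as listed in the Spread_Segment stanza.
hosts = {
    "haydn": "128.243.40.92",
    "mozart": "128.243.40.93",
    "mahler": "128.243.40.94",
}

for name, ip in hosts.items():
    # Prefix lengths 24 and 16 are assumptions; check the real netmask
    # on eth0 before choosing the segment address.
    print(name, segment_broadcast(ip, 24), segment_broadcast(ip, 16))
```

If the interfaces really use a /16 netmask, the segment address may need to be the /16 broadcast address instead; compare against the broadcast address the kernel reports for eth0.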

Ciprian
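
To check independently of Spread whether multicast datagrams sent to 239.0.0.1 actually reach the other hosts, a small probe along these lines can help. This is only a sketch: port 5007 is an arbitrary test port chosen to stay clear of Spread's 4803, the function names are hypothetical, and a successful probe shows basic delivery only, not that the network handles membership changes correctly.

```python
import ipaddress
import socket
import sys

GROUP = "239.0.0.1"  # multicast group from the spread.conf in this thread
PORT = 5007          # assumed free test port, deliberately not Spread's 4803

def is_multicast_group(address: str) -> bool:
    """True if address lies in the IPv4/IPv6 multicast range."""
    return ipaddress.ip_address(address).is_multicast

def membership_request(group: str) -> bytes:
    """Build a struct ip_mreq: group address + local interface (any)."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")

def receive(group: str = GROUP, port: int = PORT) -> None:
    """Join the group and print every datagram that arrives."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    while True:
        data, sender = sock.recvfrom(1024)
        print(f"{sender[0]}: {data.decode(errors='replace')}")

def send(message: str, group: str = GROUP, port: int = PORT) -> None:
    """Send one datagram to the group, limited to the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(message.encode(), (group, port))
    sock.close()

if __name__ == "__main__":
    # Run with "recv" on two hosts and "send" on the third.
    if len(sys.argv) > 1 and sys.argv[1] == "recv":
        receive()
    elif len(sys.argv) > 1 and sys.argv[1] == "send":
        send(f"hello from {socket.gethostname()}")
```

Running the receiver on two machines and the sender on the third, then rotating roles, shows whether each host sees traffic from each of the others.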

Rebooting first machine causes wackamole on second machine to abort
Ciprian

Thanks for the quick reply -- that solved the problem. The second
wackamole kept running and took over the virtual IP.

The next problem, however, is that after I reboot the first machine (and
start spread and wackamole on it), it configures eth0:1 with the virtual
IP, even though the second machine still holds it and I have Prefer None
in wackamole.conf. With "prefer none", I expected the VIP to stay on
the second machine and that the first machine wouldn't try to claim it
back.

David

On Wed, 2003-05-14 at 15:15, Ciprian Tutu wrote:
> Hi,
>
> I would recommend setting up Spread not to use multicast. It looks
> like you have a multicast issue on your network.
> Try this first and see if the problem persists:
>
> Spread_Segment 128.243.40.255:4803 {
> haydn 128.243.40.92
> mozart 128.243.40.93
> mahler 128.243.40.94
> }
>
> Ciprian

--
David Osborne david.osborne@nottingham.ac.uk
Central Systems & Security Team
Information Services
The University of Nottingham http://www.nottingham.ac.uk/~cczdao/