Mailing List Archive

some answers and other questions...

So, I guess to answer some of my own questions: yes, I should have the
same wackamole.conf on each box, with the same VIP for both. The
Changelog reference in 1.2.0 was about having more real machines
than VIPs by setting the VIP to 0.0.0.0 on some of them. That must have been
changed in 2.0.0. Oh, well. The core dump is still happening predictably.
I am working around it as mentioned below. My other question is that on
one of the machines the failover is not working too well. It keeps getting
these messages in /var/log/messages. I assume the "connect failed" and
"Illegal session" errors are bad. I'm going to be a total jerk
and cross-post, since it is awfully quiet around here:

Jan 27 23:57:24 spokane wackamole[1147]: No such interface
Jan 27 23:57:26 spokane wackamole[1147]: connecting to 4803
Jan 27 23:57:26 spokane wackamole[1147]: Dequeued arp spoof notifier.
Jan 27 23:57:26 spokane wackamole[1147]: No such interface
Jan 27 23:57:26 spokane wackamole[1147]: Spread connect failed [-6].
Jan 27 23:57:29 spokane wackamole[1147]: SP_error: (-11) Illegal session was supplied
Jan 27 23:57:29 spokane wackamole[1147]: connecting to 4803
Jan 27 23:57:29 spokane wackamole[1147]: Dequeued arp spoof notifier.
Jan 27 23:57:29 spokane wackamole[1147]: No such interface
Jan 27 23:57:31 spokane wackamole[1147]: connecting to 4803
Jan 27 23:57:31 spokane wackamole[1147]: Dequeued arp spoof notifier.
Jan 27 23:57:31 spokane wackamole[1147]: No such interface
Jan 27 23:57:31 spokane wackamole[1147]: Spread connect failed [-6].
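
For reading logs like the one above, here is a minimal sketch that tallies the Spread return codes seen in wackamole's syslog lines. It assumes the two message shapes shown in this log ("Spread connect failed [N]." and "SP_error: (N) ..."). The function name tally_spread_codes and the table SPREAD_CODES are made up for illustration; only the two codes that appear in this thread are mapped, using the names from Spread's client header.

```python
import re
from collections import Counter

# Only the two codes that appear in the log above. The names are the ones
# used in Spread's client header; other codes are reported as UNKNOWN here.
SPREAD_CODES = {
    -6: "REJECT_NOT_UNIQUE",
    -11: "ILLEGAL_SESSION",
}

# Matches "Spread connect failed [-6]." and "SP_error: (-11) ..."
CODE_RE = re.compile(r"(?:Spread connect failed \[|SP_error: \()(-?\d+)[\])]")


def tally_spread_codes(lines):
    """Count the Spread return codes found in wackamole syslog lines."""
    counts = Counter()
    for line in lines:
        match = CODE_RE.search(line)
        if match:
            code = int(match.group(1))
            counts[SPREAD_CODES.get(code, f"UNKNOWN({code})")] += 1
    return counts


sample = [
    "Jan 27 23:57:26 spokane wackamole[1147]: Spread connect failed [-6].",
    "Jan 27 23:57:29 spokane wackamole[1147]: SP_error: (-11) Illegal session was supplied",
    "Jan 27 23:57:31 spokane wackamole[1147]: No such interface",
    "Jan 27 23:57:31 spokane wackamole[1147]: Spread connect failed [-6].",
]

for name, count in tally_spread_codes(sample).items():
    print(name, count)
```

Lines without a return code, such as "No such interface", are simply skipped, so the tally only reflects the Spread-side failures.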
----- Original Message -----
From: Sumeet Pannu
To: wackamole-users@lists.backhand.org
Sent: Sunday, January 26, 2003 1:43 AM
Subject: [Wackamole-users] wackamole core dump


I would like to set up wackamole as a simple failover mechanism for a
stateless web server running Linux 2.2.16.
The web servers have IP addresses 192.168.0.1 and 192.168.0.2
respectively.
I set up the following wackamole.conf on each:

Spread = 4803
SpreadRetryInterval = 5s
Group = wack1
Control = /var/run/wack.it
Prefer None
VirtualInterfaces {
    { eth0:10.1.1.2/32 }
}
Arp-Cache = 90s
Notify {
    eth0:10.1.1.5/32
    eth0:10.1.1.4/32
    eth0:10.1.1.6/32
    eth0:192.168.0.0/24 throttle 128
    arp-cache
}
balance {
    AcquisitionsPerRound = all
    interval = 4s
}
mature = 5s

My first question is: would that suffice for a failover scenario, or
should the .2 backup web server have a virtual interface of
eth0:0.0.0.0/32 (as per a changelog I read)?
This is of course all theoretical, since my real problem seems to be
that wackamole segfaults when I try to start it. If I create a
secondary interface (eth2, for example) and change the virtual interface
references to eth2, it does not core dump. I can attach the core if you are
interested, but it is 7.6 MB...
Spread seems to run fine (according to spmonitor, and spuser passes
messages back and forth), although I do start it with spread -n
hostname.
Thanks for your time.

some answers and other questions... [ In reply to ]
Thanks for the response.
> fewer Spread daemon's than Wackamole hosts
Does this imply that I started another wackamole session on the same
machine without killing the first one? I can buy that.
However, if I have only one interface I still get an error. I don't know if
that has anything to do with the -11 and -6 Spread errors.
I guess I can live with that as long as spread/wackamole works in a
predictable manner.
Thanks again for your help.
P.S. Should I be cross-posting this stuff?

wackamole.conf
Spread = 4803
SpreadRetryInterval = 5s
Group = wack1
Control = /var/run/wack.it
Prefer None
VirtualInterfaces {
{ eth2:10.1.1.2/32 }
}
Arp-Cache = 90s
Notify {
eth0:10.1.1.6/32
eth0:10.1.1.4/32
eth0:10.1.1.5.2/32
eth2:192.168.0.0/24 throttle 128
arp-cache
}
balance {
AcquisitionsPerRound = all
interval = 4s
}
mature = 5s

spread.conf:
Spread_Segment 192.168.0.255:4803 {

spokanea 192.168.0.1
spokaneb 192.168.0.2
}
DaemonUser = spread
DaemonGroup = spread




----- Original Message -----
From: "Ryan Caudy" <caudy@jhu.edu>
To: <wackamole-users@lists.backhand.org>
Sent: Tuesday, January 28, 2003 1:34 PM
Subject: Re: [Wackamole-users] some answers and other questions...


> I guess you've found out some of this stuff by yourself already.
>
> First, the change between 1.2.0 and 2.0.0 for configuration files
> basically consists of a more intuitive, flexible syntax. "vip" on the
> old configuration was for a preferred address, and "of" was for the
> virtual interfaces to be managed. In 2.0.0, you have "Prefer" and
> "VirtualInterfaces" declarations, and you can explicitly prefer None,
> rather than the dummy address 0.0.0.0.
>
> If the virtual interface lists between two daemons are not the same, I
> don't think you can expect things to work correctly, as this was
> originally an assumption in the system. In your system, it sounds like
> you want the main webserver to prefer the virtual address being managed,
> the backup to Prefer None, and both to have the same list of virtual
> addresses (i.e. eth0:10.1.1.2/32).
>
> Connect failed and illegal session are complaints from the Spread
> library routines used to connect to Spread and join the appropriate
> group. Could you send your current wackamole.conf and spread.conf files?
>
> The -6 return code translates to REJECT_NOT_UNIQUE. This may happen if
> the Wackamole daemon died without properly disconnecting from Spread...
> after a short interval Spread should accept the same private group name
> again, however. The other possible cause for this is trying to use
> fewer Spread daemon's than Wackamole hosts.
>
> --Ryan
>
> _______________________________________________
> wackamole-users mailing list
> wackamole-users@lists.backhand.org
> http://lists.backhand.org/mailman/listinfo/wackamole-users
>
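
Ryan's point above, that both daemons need the same list of virtual interfaces, can be checked mechanically. The following is only a rough sketch: it assumes the brace syntax of the wackamole.conf files posted in this thread, the helper name virtual_interfaces is made up, and the second host's address 10.1.1.3 is a hypothetical mismatch invented for the example.

```python
import re

# ifname:a.b.c.d/mask, as written in the configs posted in this thread.
ENTRY_RE = re.compile(r"[A-Za-z0-9_.]+:\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}")


def virtual_interfaces(conf_text):
    """Return the set of entries inside the VirtualInterfaces { ... } block."""
    start = conf_text.find("VirtualInterfaces")
    if start < 0:
        return set()
    open_idx = conf_text.find("{", start)
    if open_idx < 0:
        return set()
    depth = 0
    for i in range(open_idx, len(conf_text)):
        ch = conf_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Closing brace of the outer block reached.
                return set(ENTRY_RE.findall(conf_text[open_idx:i]))
    return set()  # unbalanced braces: treat as no usable block


HOST_A = """
Prefer None
VirtualInterfaces {
    { eth0:10.1.1.2/32 }
}
Arp-Cache = 90s
"""

# Hypothetical second host whose list differs from HOST_A.
HOST_B = """
Prefer None
VirtualInterfaces {
    { eth0:10.1.1.3/32 }
}
Arp-Cache = 90s
"""

mismatch = virtual_interfaces(HOST_A) ^ virtual_interfaces(HOST_B)
if mismatch:
    print("VirtualInterfaces differ:", sorted(mismatch))
else:
    print("VirtualInterfaces match")
```

The comparison is on the full ifname:address/mask string. Whether a differing interface name alone matters to wackamole is not settled in this thread, so treat a name-only difference as something to look at rather than a confirmed error.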
some answers and other questions... [ In reply to ]
Well, this is a pretty quiet town. I've found out further that it is in fact
wackamole dying. Running wackamole -d on both hosts gets me an interface up
and running. Then running wackatrl -f drops the interface on the first host,
and on the second gives me:
wackamole: wackamole.c:672: Send_state_message: Assertion 'ret == My.num_allocated' failed.
Aborted (core dumped)

I already have a VIP of x.x.x.x/32, as mentioned in some FreeBSD fix, so that
isn't the problem. Some other workarounds were mentioned, but I'm not sure
what the person was trying to say. Any suggestions would be appreciated.

So I guess wackamole is pretty broken in two respects. First, the above
obviously shouldn't happen. Second, when wackamole on machine A realizes
that the wackamole process on machine B has died, the correct behaviour would be
for it to snatch the IP right back, even though this may cause a core dump
on the original host as well. On top of this, spread still experiences a core
dump when I try to start it as a second interface. Maybe these problems are
related.
Thanks, Sumeet.
Sample testlog.out:
handle_events: select with timeout (1, 999948)
E_handle_events: next event
E_handle_events: exec time event
new: reusing pointer 0x813ee58 to object type 35 named time_event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
DL_send: sent a message of 28 bytes to (192.168.0.2,4804) on channel 5
Prot_token_hurry: retransmiting token 9 1
dispose: disposing pointer 0x813ee80 to object type 35 named time_event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999891)
E_handle_events: exec handler for fd 4, fd_type 0, priority 1
DL_recv: received 28 bytes on channel 4
Received Token
new: reusing pointer 0x813ee80 to object type 35 named time_event
dispose: disposing pointer 0x813ee58 to object type 35 named time_event
E_queue: dequeued a (first) simillar event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
new: reusing pointer 0x813ee58 to object type 35 named time_event
dispose: disposing pointer 0x81851b8 to object type 35 named time_event
E_queue: dequeued a simillar event
E_queue: (last) event queued func 0x8054a04 code 0 data 0x0 in future (5:0)
dispose: disposing pointer 0x813edd8 to object type 8 named token_head_obj
new: reusing pointer 0x813edd8 to object type 8 named token_head_obj
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999946)
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (0, 587)
E_handle_events: next event
E_handle_events: exec time event
new: reusing pointer 0x81851b8 to object type 35 named time_event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
DL_send: sent a message of 28 bytes to (192.168.0.2,4804) on channel 5
Prot_token_hurry: retransmiting token 10 1
dispose: disposing pointer 0x813ee80 to object type 35 named time_event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999893)
E_handle_events: exec handler for fd 4, fd_type 0, priority 1
DL_recv: received 28 bytes on channel 4
Received Token
new: reusing pointer 0x813ee80 to object type 35 named time_event
dispose: disposing pointer 0x81851b8 to object type 35 named time_event
E_queue: dequeued a (first) simillar event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
new: reusing pointer 0x81851b8 to object type 35 named time_event
dispose: disposing pointer 0x813ee58 to object type 35 named time_event
E_queue: dequeued a simillar event
E_queue: (last) event queued func 0x8054a04 code 0 data 0x0 in future (5:0)
dispose: disposing pointer 0x813ee00 to object type 8 named token_head_obj
new: reusing pointer 0x813ee00 to object type 8 named token_head_obj
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999948)
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (0, 41)
E_handle_events: next event
E_handle_events: exec time event
new: reusing pointer 0x813ee58 to object type 35 named time_event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
DL_send: sent a message of 28 bytes to (192.168.0.2,4804) on channel 5
Prot_token_hurry: retransmiting token 11 1
dispose: disposing pointer 0x813ee80 to object type 35 named time_event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999881)
E_handle_events: exec handler for fd 4, fd_type 0, priority 1
DL_recv: received 28 bytes on channel 4
Received Token
new: reusing pointer 0x813ee80 to object type 35 named time_event
dispose: disposing pointer 0x813ee58 to object type 35 named time_event
E_queue: dequeued a (first) simillar event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
new: reusing pointer 0x813ee58 to object type 35 named time_event
dispose: disposing pointer 0x81851b8 to object type 35 named time_event
E_queue: dequeued a simillar event
E_queue: (last) event queued func 0x8054a04 code 0 data 0x0 in future (5:0)
dispose: disposing pointer 0x813edd8 to object type 8 named token_head_obj
new: reusing pointer 0x813edd8 to object type 8 named token_head_obj
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999947)
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (0, 325)
E_handle_events: next event
E_handle_events: exec time event
new: reusing pointer 0x81851b8 to object type 35 named time_event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
DL_send: sent a message of 28 bytes to (192.168.0.2,4804) on channel 5
Prot_token_hurry: retransmiting token 12 1
dispose: disposing pointer 0x813ee80 to object type 35 named time_event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999896)
E_handle_events: exec handler for fd 4, fd_type 0, priority 1
DL_recv: received 28 bytes on channel 4
Received Token
new: reusing pointer 0x813ee80 to object type 35 named time_event
dispose: disposing pointer 0x81851b8 to object type 35 named time_event
E_queue: dequeued a (first) simillar event
E_queue: (first) event queued func 0x804b498 code 0 data 0x0 in future (2:0)
new: reusing pointer 0x81851b8 to object type 35 named time_event
dispose: disposing pointer 0x813ee58 to object type 35 named time_event
E_queue: dequeued a simillar event
E_queue: (last) event queued func 0x8054a04 code 0 data 0x0 in future (5:0)
dispose: disposing pointer 0x813ee00 to object type 8 named token_head_obj
new: reusing pointer 0x813ee00 to object type 8 named token_head_obj
E_handle_events: next event
E_handle_events: poll select
E_handle_events: select with timeout (1, 999947)
E_handle_events: next event
some answers and other questions... [ In reply to ]
On Thursday, Jan 30, 2003, at 04:33 US/Eastern, Sumeet Pannu wrote:
> So i guess wackamole is pretty broken in two respects, obviously the above
> thing shouldn't happen and secondly when wackamole on machine A realizes
> that wackamole process on machine B has died, the correct behaviour would be
> for it to snatch the ip right back, even though this may cause a core dump
> on the original host as well. On top of this spread still experiences a core
> dump when i try to start it as a second interface. Maybe these problems are
> related.

First, please don't top-post... (include your comments at the bottom of
the mail you respond to).

Second, I have wackamole running production without any crashes.

Third, I am sorry you are having problems, but I would assume that most
people on this list don't share your problem or you would have seen
responses saying "Oh, I had that problem, here is what I did to fix it."

Now for what I see.

You are notifying on the 192.168 network which is useless as none of
your VIPs are on that network and an ARP-spoof for a 10/8 address on a
192.168/16 will not cause any reaction. That shouldn't be causing the
problem you are griping about.
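
A quick sanity check along those lines, sketched with Python's standard ipaddress module. It assumes the VIPs live in 10.0.0.0/8 (the "10/8" mentioned above) and takes the Notify entries from the second wackamole.conf posted in this thread, with the throttle option and the arp-cache keyword left out. The function name check_notify is made up for illustration.

```python
import ipaddress


def check_notify(entries, vip_network):
    """Classify each ifname:cidr Notify entry against the VIP network."""
    vip_net = ipaddress.ip_network(vip_network)
    report = {}
    for entry in entries:
        _ifname, _, cidr = entry.partition(":")
        try:
            net = ipaddress.ip_network(cidr, strict=False)
        except ValueError:
            # e.g. an address with too many octets
            report[entry] = "malformed"
            continue
        if net.subnet_of(vip_net):
            report[entry] = "ok"
        else:
            report[entry] = "outside VIP network"
    return report


# Notify entries as posted earlier in the thread.
notify = [
    "eth0:10.1.1.6/32",
    "eth0:10.1.1.4/32",
    "eth0:10.1.1.5.2/32",
    "eth2:192.168.0.0/24",
]

for entry, status in check_notify(notify, "10.0.0.0/8").items():
    print(f"{entry}: {status}")
```

Besides the 192.168.0.0/24 entry, this also flags eth0:10.1.1.5.2/32 from the posted config, which is not a valid IPv4 address. How wackamole itself handles such an entry is not covered here.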

You are getting a "no such interface" error, right? Well, that is a
problem. And the assertion failure is an issue as well. However, I
think that the "no such interface" error is more severe. If a wackamole
assertion fails, then it has come across an unexpected condition
(usually a misconfiguration or a byzantine participant), and its peers
should pick up the VIPs and go on. If you see a "no such interface"
error, then something went wrong with ifup'ing or ifdown'ing a virtual
interface.

This has been tested under FreeBSD 4.0, Solaris 8/9, and Linux 2.4.x
and seems to work fine. If you are using an OS other than those, YMMV.
You are welcome to troubleshoot the problem and send in patches -- or
hire someone to do that for you. The wackamole core group is small and
doesn't have the resources to test wackamole on every
platform/architecture.

--
Theo Schlossnagle
Principal Consultant
OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Phone: +1 410 872 4910 x201 Fax: +1 410 872 4911
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
some answers and other questions... [ In reply to ]
> > So i guess wackamole is pretty broken in two respects, obviously the above
> > thing shouldn't happen and secondly when wackamole on machine A realizes
> > that wackamole process on machine B has died, the correct behaviour would be
> > for it to snatch the ip right back, even though this may cause a core dump
> > on the original host as well. On top of this spread still experiences a core
> > dump when i try to start it as a second interface. Maybe these problems are
> > related.

FWIW - I have had wackamole running in a fairly large environment in
production mode for almost a year, and solely on FreeBSD 4.x. I can attest
that it works in that environment, and works VERY well. We took over a month
to put it through all kinds of tests before moving it to production, and I
have to say it has been a very stable and overall an incredible product.
Hats off to Theo and crew!

some answers and other questions... [ In reply to ]
From: "Jay West"
Subject: Re: [Wackamole-users] some answers and other questions...
Date: Thu, 30 Jan 2003 08:52:25 -0600

> > So i guess wackamole is pretty broken in two respects, obviously the
> > above thing shouldn't happen and secondly when wackamole on machine A
> > realizes that wackamole process on machine B has died, the correct
> > behaviour would be for it to snatch the ip right back, even though this
> > may cause a core dump on the original host as well. On top of this
> > spread still experiences a core dump when i try to start it as a second
> > interface. Maybe these problems are related.

FWIW - I have had wackamole running in a fairly large environment in
production mode for almost a year, and solely on FreeBSD 4.x. I can attest
that it works in that environment, and works VERY well. We took over a month
to put it through all kinds of tests before moving it to production, and I
have to say it has been a very stable and overall an incredible product.
Hats off to Theo and crew!

---

Ditto, with a RedHat Linux 7.3 environment and Apache/JBOSS module as the
actual application we've made redundant. We've a couple hundred users
hitting these servers per day, no failures yet. It's been very nice to have
as we can do live maintenance by just taking down one VIP, working on that
box, testing it some, and then bringing it back up, and then working on the
other box.

I'm actually going to deploy it soon on a pair of multihomed RH 7.3 servers
acting as dns/dhcp/routers for each of the 4 segments.

Jason Roysdon
http://jason.roysdon.net/


some answers and other questions... [ In reply to ]
Jason wrote...
> Ditto, with a RedHat Linux 7.3 environment and Apache/JBOSS module as the
> actual application we've made redundant. We've a couple hundred users
> hitting these servers per day, no failures yet. It's been very nice to
> have as we can do live maintenance by just taking down one VIP, working
> on that box, testing it some, and then bringing it back up, and then
> working on the other box.

Just as an example... we have four servers (two webservers {FreeBSD, apache,
php}, two database servers {FreeBSD, mysql}), each with the following specs:

Dual P3 1.13 GHz CPUs
Dual 10/100 LAN
2 GB ECC RAM
Three 18 GB 10K rpm UltraSCSI drives (hot swap, RAID 5 via Adaptec 2100S)

Monthly averages: 330,000 unique visitors, 7.4 million requests for pages,
and 345.2 GB data transferred. For the past 6 months we have grown in
activity by about 20% per month.

The servers generally never dip below 70% idle.

Jason hits the "sweet spot" in his email - we can down one of the servers in
the pair, upgrade it, test things out, and put it back into service -
without our customers missing a beat on the website. Our unplanned downtime
in the last 6 months (downtime as to the sites being unavailable to the
customers) is 20 minutes (due to a silly fat-finger-mistake on my part).
Theo and crew are on my Christmas list :)

Jay West

some answers and other questions... [ In reply to ]
----- Original Message -----
From: "Theo Schlossnagle" <jesus@omniti.com>
To: <wackamole-users@lists.backhand.org>
Cc: "Theo Schlossnagle" <jesus@omniti.com>;
<spread-users-request@lists.spread.org>
Sent: Thursday, January 30, 2003 6:37 AM
Subject: Re: [Wackamole-users] some answers and other questions...


>
> On Thursday, Jan 30, 2003, at 04:33 US/Eastern, Sumeet Pannu wrote:
> > So i guess wackamole is pretty broken in two respects, obviously the
> > above thing shouldn't happen and secondly when wackamole on machine A
> > realizes that wackamole process on machine B has died, the correct
> > behaviour would be for it to snatch the ip right back, even though this
> > may cause a core dump on the original host as well. On top of this
> > spread still experiences a core dump when i try to start it as a second
> > interface. Maybe these problems are related.
>
> First, please don't top-post... (include your comments at the bottom of
> the mail you respond to).
>
> Second, I have wackamole running production without any crashes.
>
> Third, I am sorry you are having problems, but I would assume that most
> people on this list don't share your problem or you would have seen
> responses saying "Oh, I had that problem, here is what I did to fix it."
>
> Now for what I see.
>
> You are notifying on the 192.168 network which is useless as none of
> your VIPs are on that network and an ARP-spoof for a 10/8 address on a
> 192.168/16 will not cause any reaction. That shouldn't be causing the
> problem you are griping about.
>
> You are getting a "no such interface" error, right? Well, that is a
> problem. And the assertion failure is an issue as well. However, I
> think that the "no such interface" error is more severe. If wackamole
> assertion fails then it has come across an unexpected condition
> (usually a misconfiguration or a byzantine participant), then its peers
> should pick the VIPs and go on. If you see a "no such interface"
> error, then something went wrong with ifup'ing or ifdown'ing a virtual
> interface.
>
> This has been tested under FreeBSD 4.0, Solaris 8/9, and Linux 2.4.x
> and seems to work fine. If you are using an OS other than those, YMMV.
> You are welcome to troubleshoot the problem and send in patches -- or
> hire someone to do that for you. The wackamole core group is small and
> doesn't have the resources to test wackamole on every
> platform/architecture.
>
> --
> Theo Schlossnagle
> Principal Consultant
> OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
> Phone: +1 410 872 4910 x201 Fax: +1 410 872 4911
> 1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
> 2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
>
First, sorry for annoying everyone by top-posting. I'm over that habit now.
Second, great! And thanks to the other two people who sent testimonials as
well.
Third, I am entirely certain the problem lies with bringing the interface up
and down. I had not seen anywhere before that wackamole/spread has to be run
on particular OSes or kernels. Well, as per your suggestion, I did upgrade to
2.4.18-3 with RedHat 7.3. Also, per someone's suggestion, I ran gdb in the
hope that it might help; the results are below. As another issue that may or
may not be related, I have to run wackamole's configure with
--with-ldflags=/tmp/spread-src-3.1.7. If I give it /usr/local/lib instead,
I get "*** wackamole requires Spread ***".
One last thing: it seems Geoff Campbell was seeing the same thing back in
August 2002, and there seemed to be no real solution to the problem.
Thanks.

Going to write state: 904 bytes, filled 904
wackamole: wackamole.c:672: Send_state_message: Assertion `ret == My.num_allocated' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 1024 (LWP 20084)]
0x42029241 in kill () from /lib/i686/libc.so.6
(gdb) bt
#0 0x42029241 in kill () from /lib/i686/libc.so.6
#1 0x40043c4b in raise () from /lib/i686/libpthread.so.0
#2 0x4202a7d2 in abort () from /lib/i686/libc.so.6
#3 0x42022ddb in __assert_fail () from /lib/i686/libc.so.6
#4 0x0804b889 in main ()
#5 0x0804b186 in main ()
#6 0x0804ac6b in main ()
#7 0x400520d5 in E_handle_events () at events.c:675
#8 0x0804abdf in main ()
#9 0x42017499 in __libc_start_main () from /lib/i686/libc.so.6