Greetings,
We've had some problems with wackamole and ARP notification. Specifically, the
Cisco PIX firewall doesn't seem to get the ARP notifications from all the
machines in the wackamole cluster behind it. We found a posting in the
mailing list archives about this, and it appeared that it had all been taken
care of (problem verified and fixed) in the latest CVS version. It was
something about now notifying the bc_mac address instead of the ze_mac address.
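For anyone following along, here is my understanding of what such a notification looks like on the wire. I'm assuming that bc_mac and ze_mac are the broadcast (ff:ff:ff:ff:ff:ff) and all-zero (00:00:00:00:00:00) MACs placed in the ARP target-hardware field; I haven't confirmed that from the wackamole source. The helper build_arp_notification and the sample MAC below are hypothetical. This sketch only builds the frame bytes and sends nothing (that would need a raw socket and root).

```python
import struct

BC_MAC = b"\xff" * 6  # assumed meaning of bc_mac: broadcast MAC
ZE_MAC = b"\x00" * 6  # assumed meaning of ze_mac: all-zero MAC

def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))

def ip_bytes(ip: str) -> bytes:
    """Convert a dotted-quad IPv4 string to 4 raw bytes."""
    return bytes(int(part) for part in ip.split("."))

def build_arp_notification(sender_mac: str, vip: str, target_ip: str,
                           target_hw: bytes = BC_MAC) -> bytes:
    """Hypothetical helper: an Ethernet frame carrying an ARP request that
    announces `vip` at `sender_mac` to the host at `target_ip`.
    `target_hw` is the ARP target-hardware field (bc_mac vs ze_mac)."""
    src = mac_bytes(sender_mac)
    # Ethernet header: broadcast destination, our source MAC, EtherType ARP.
    eth = BC_MAC + src + struct.pack("!H", 0x0806)
    # ARP header: hw type Ethernet, proto IPv4, hlen 6, plen 4, op 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP (the VIP we now own), then target MAC/IP (e.g. the PIX).
    arp += src + ip_bytes(vip) + target_hw + ip_bytes(target_ip)
    return eth + arp

frame = build_arp_notification("00:11:22:33:44:55", "192.168.55.100", "192.168.55.1")
print(len(frame), frame[32:38].hex())
```

The only difference between the two variants, under this assumption, is bytes 32 to 37 of the frame (the target hardware address); everything else is identical.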
So we put in the latest CVS, and it appears we still have the problem. If we
fail all machines in the cluster, then when they come back up, outside requests
to those machines hit the PIX, and the PIX has the ARP entries for those IPs
pointing at the wrong (old) machines. We can fix this immediately by doing a
"clear arp" on the PIX. However, I don't think this is a PIX issue.
I'm wondering if perhaps the ARP entries are being cleaned up, just not as
quickly as I would have expected. It's hard to test this theory because I
can't keep those machines down for longer than a minute or two or top brass
gets a little irked :)
So, in searching for what could be causing this, I'm wondering about the
time-related variables in wackamole.conf; perhaps I don't understand the
implications of their settings well. Here is the file (identical on all
machines in the cluster except for the Spread= line):
Spread = 4803@britney.kwcorp.com
Group = web
SpreadRetryInterval = 5s
Control = /var/tmp/wack.it
Prefer None
VirtualInterfaces {
{em0:192.168.55.100/32 em0:192.168.55.101/32 em0:192.168.55.102/32
em0:192.168.55.103/32 em0:192.168.55.104/32}
{em0:192.168.55.110/32 em0:192.168.55.111/32 em0:192.168.55.112/32
em0:192.168.55.113/32 em0:192.168.55.114/32}
{em0:192.168.55.120/32 em0:192.168.55.121/32 em0:192.168.55.122/32
em0:192.168.55.123/32 em0:192.168.55.124/32}
{em0:192.168.55.130/32 em0:192.168.55.131/32 em0:192.168.55.132/32
em0:192.168.55.133/32 em0:192.168.55.134/32}
}
Arp-Cache = 90s
Notify {
em0:192.168.55.1/32
em0:192.168.55.0/24 throttle 128
arp-cache
}
balance {
AcquisitionsPerRound = all
interval = 4s
}
mature = 5s
Basically there are 4 machines in the cluster and 4 VIP groups of 5 addresses
each; each group should move as a unit. 192.168.55.1 is the address of the
inside interface on the PIX (where these webserver machines are located). I
can't find much in the documentation that explains exactly what the
time-related settings do, such as SpreadRetryInterval, Arp-Cache, throttle 128,
interval, and mature. I have a basic idea of what they mean, but not really of
their impact or how to set them intelligently. Am I headed down the right path
here, and if so, can someone educate me a bit more on these settings? Also, to
my knowledge there is no special access list or other PIX configuration needed
to allow this to happen.
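To get a feel for the timing, here is a rough back-of-the-envelope sketch in Python. It rests on assumptions I have not verified in the wackamole docs: that "throttle 128" caps notifications at 128 packets per second, and that every VIP acquired in a round is announced once to every address in the Notify list. The function name notify_burst_seconds is my own invention, and the sketch applies one throttle to all targets even though my config attaches it only to the /24 line.

```python
import ipaddress
import math

def notify_burst_seconds(targets, vips_announced, throttle):
    """Estimate how long one round of ARP notifications takes.

    ASSUMPTIONS (unverified): each acquired VIP is announced once to every
    Notify target address, and `throttle` is a cap in packets per second.
    Returns (unique target addresses, total packets, seconds rounded up).
    """
    addresses = set()
    for net in targets:
        # Iterating a network yields every address in it, including the
        # network and broadcast addresses; the set removes duplicates
        # such as 192.168.55.1, which is also inside the /24.
        addresses.update(ipaddress.ip_network(net, strict=False))
    packets = len(addresses) * vips_announced
    return len(addresses), packets, math.ceil(packets / throttle)

# Notify targets from my wackamole.conf, with one 5-address VIP group moving.
print(notify_burst_seconds(["192.168.55.1/32", "192.168.55.0/24"], 5, 128))
```

Under those assumptions one group takes about 10 seconds (256 targets, 1280 packets), and if all four groups (20 VIPs) are reacquired at once after a full failure it would be about 40 seconds, which is in the same range as the minute or two we can tolerate.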
Thanks!
Jay West
Knights Direct
---
[This E-mail scanned for viruses by Declude Virus]