Well, after a couple days of investigation I have found lots of little
problems with my LVS:
  1 firewall with NAT (FW)
  1 LVS box with NAT (LVS, the director)
  1 real server (RS)

FW:
  eth2:   4.18.45.98
  eth1:   192.168.0.254
  eth1:0: 192.168.1.1
LVS:
  eth0:   192.168.0.34
  eth0:0: 192.168.1.2
RS:
  eth0:   192.168.0.135
A common wire connects FW:eth1, LVS and RS.
So, a packet shows up at 4.18.45.98:80. It gets rewritten for dest
192.168.1.2:80 by FW. LVS receives it and it's rewritten for dest
192.168.0.135:80. Then RS receives it and responds. Soon after this, LVS
decides 'RS should just talk to FW directly' and issues an ICMP-REDIRECT.
Arrgghh.
I read the HOWTO and it says 'I think because the director is a NAT box,
icmp redirects are off.' I'm wondering whether I turned on some kernel
option when compiling (I've got 2.2.18 with the Dec 30 1.0.3 patch), such as
'Optimize as router not host'.
In section 9.5 of the HOWTO it says 'For 2.2.13 the ipvs patch handles this
[icmp redirect problem].' This doesn't appear to be the case.
Fortunately, since I'm not using the config scripts, I found the line in
the HOWTO that shows how to stop the director from sending ICMP redirects:
echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects
However, even after this was done, my director was still sending out
ICMP-redirects...?!
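
A possible next step, sketched under assumptions: the kernel documentation
for later kernels says redirects are sent if either the 'all' entry or the
per-interface entry is set, so clearing eth0 alone may not be enough. Whether
2.2.18 behaves the same way is not confirmed here. The function name
disable_send_redirects and its optional directory argument are made up for
illustration; the argument only exists so the loop can be tried on a scratch
directory first.

```shell
#!/bin/sh
# Sketch: write 0 to send_redirects for every entry (all, default, eth0, ...)
# under a conf directory. Defaults to the real /proc path; pass another
# directory to try it out without touching the kernel settings.
disable_send_redirects() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/send_redirects; do
        # Skip the unexpanded glob if no entries exist.
        [ -e "$f" ] || continue
        echo 0 > "$f"
    done
}
```

Run as root on the director with no argument, e.g. disable_send_redirects,
then check the values with cat /proc/sys/net/ipv4/conf/*/send_redirects.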
Another question: one test client is at IP 192.168.1.3, on the same wire as
everything else. When the RS arps for the address 192.168.1.3, it gets a
reply from my client. Is it then ignoring the default gw (LVS: 192.168.0.34)
and just trying to respond directly to the client? According to the HOWTO
this shouldn't happen, but I can't think of any other reason why my
connection is hanging.

Unfortunately, the version of tcpdump that I have (the default one with
RedHat 6.2) _doesn't_ print out both MACs with the -e option (which is
supposed to print link-level data). It only prints one full MAC, and then
either 0:0:0:0:0:0 or 0:0:0:0:0:1 for the other. Is this some kind of
shorthand? Seems like a bug...?

... Well, I downloaded the latest tcpdump (v3.6.1; RedHat's is 3.4) from
www.tcpdump.org. It fixes the -e problem.
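
On the earlier question of whether the RS would answer 192.168.1.3 directly:
that depends on whether the client's address falls inside the network the RS
considers directly connected. The netmask on the RS is not given above, so
the prefix lengths below are assumptions, and ip_to_int and same_subnet are
hypothetical helper names. A rough sketch of the check:

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# usage: same_subnet ADDR1 ADDR2 PREFIXLEN
# Succeeds (exit 0) if both addresses share the same network prefix.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

With an assumed /24, same_subnet 192.168.0.135 192.168.1.3 24 fails, so the
RS should send replies via its default gw; with an assumed /16 it succeeds,
and the RS would arp for the client and answer it directly.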
Rather than testing my local (same wire) client, I then tested with an
outside client (OC). Indeed, the ICMP-redirects are taking place, _and_ the
RS is paying attention to them:
OC
|
FW
|
---LVS
|
---RS
1. OC Syn
2. RS Syn/Ack
3. LVS ICMP-Redir; this first one appears not to take immediate effect, thus
I get a successful connection
4. OC Ack
I then simply close the telnet connection from the OC.
5. OC Fin/Ack
6. RS Ack
7. LVS ICMP-Redir; this one does the damage
8. RS Fin to FW MAC (instead of LVS MAC)
9. FW Reset to RS
As you can see, either the first or the second redirect is causing the RS to
send its FIN to the FW rather than to the LVS. Grrrr.
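
To reproduce a trace like the one above while looking only at the redirects,
something along these lines may help. It is a sketch: the filter matches ICMP
type 5 (redirect) by testing the first byte of the ICMP header, the default
interface eth0 is taken from the setup above, and watch_redirects is a
made-up helper name.

```shell
#!/bin/sh
# pcap filter for ICMP redirects: byte 0 of the ICMP header is the message
# type, and type 5 is "redirect".
redirect_filter='icmp[0] = 5'

# usage: watch_redirects [IFACE]
# -e prints the link-level (MAC) header, -n skips name resolution.
watch_redirects() {
    tcpdump -e -n -i "${1:-eth0}" "$redirect_filter"
}
```

Run as root on the RS, e.g. watch_redirects eth0; with -e the source MAC on
each line shows which box actually sent the redirect.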
Is there any way to stop this madness without resorting to two NICs on my
LVS? Two NICs is no problem, but the HOWTO states this is not necessary...
If anyone has similar experience/fixes please respond!
Thanks for all your help!
Justin