Mailing List Archive

LVS and Notes
Has anyone reported or seen any problems with Lotus Notes and LVS in
general or RedHat's Piranha stuff in particular? We're doing LVS-NAT on
the LVS boxes.

We have a client with lots of NT-based Notes machines that we're trying
to locate behind LVS boxes. (The servers do things other than Notes, so
simply using Notes' own replication/failover stuff isn't enough.)
They're saying that a given machine "works fine" before it's located
behind the LVS setup and "doesn't work right" or "is really slow" after
being relocated behind the LVS setup. If we take the box from behind
the LVS and set it back up the way it was, it "works fine" again.

But ... from what's been reported to me, any other services running on
that box work just fine after being placed on the LVS'ed network. It
really sounds like it's just a Notes configuration issue to me. But
since we're not Notes experts (our clients are), since they say they've
properly reconfigured the boxes once they're behind the LVS, and since
I don't know what to look for to check, I can't say for sure.

As best I can tell from what they've (not very coherently) reported,
client machines trying to contact a Notes server behind the LVS will
make a connection and log in (maybe) but be unable to actually open any
database for periods of time. After 15 or 20 minutes of
inaccessibility, the clients can work just fine for another 15 or 20
minutes, after which they are again unable to connect. Or any
connections to the Notes server are simply very very slow at
accomplishing anything as if they were communicating over a slow dialup
connection.

We're not really doing "load balancing" but "failover" with the LVS
boxes. We've weighted a "primary" box very heavily and a "backup" box
very low, so that the connections all go to the primary unless it goes
down, in which case the backup takes over. And neither the LVS nor the
Notes servers themselves are very heavily loaded at all, only a few
simultaneous connections at any given time.

And yes, we're redirecting Notes port 1352. (Or at least, we've been
told that is the proper and only needed port for Notes to work
correctly.)
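
For reference, the ipvsadm setup on the director looks basically like the
following -- the VIP, real-server addresses, scheduler and weights below
are made up, but this is the shape of it:

director# ipvsadm -A -t 10.0.0.100:1352 -s wlc
director# ipvsadm -a -t 10.0.0.100:1352 -r 192.168.1.2:1352 -m -w 100
director# ipvsadm -a -t 10.0.0.100:1352 -r 192.168.1.3:1352 -m -w 1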

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
With Microsoft products, failure is not an option - it's a standard component.
Choose your life. Choose your future. Choose Linux.

Derek Glidden
http://3dlinux.org/
http://www.tbcpc.org/
http://www.illusionary.com/
Re: LVS and Notes
Derek Glidden wrote:

> They're saying that a given machine "works fine" before it's located
> behind the LVS setup and "doesn't work right" or "is really slow" after
> being relocated behind the LVS setup. If we take the box from behind
> the LVS and set it back up the way it was, it "works fine" again.

This is usually an identd problem (it's mentioned a bit in the HOWTO).
Do the NT boxes have identd?
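
(A quick way to check from another machine is to see whether anything
answers on the auth port at all -- the hostname here is made up:

somebox$ telnet ntbox.foo.net 113

If the connection is refused straight away, the box is reachable but has
no identd; if it just hangs until a timeout, the packets aren't getting
through at all, which is what produces the long delays.)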

Here's the bit from the next (unreleased) version of the HOWTO

Joe
---


6.6.2.x auth/identd (port 113) and tcpwrappers (tcpd).

You may not think you're using the auth service when you set up your LVS,
but you probably are. Any service in inetd running under tcpwrappers
(probably just about every service, if tcpwrappers is installed)
and sendmail (see the section on sendmail) use it.
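
A quick way to see which inetd services are wrapped (assuming the usual
/etc/inetd.conf location) is

realserver# grep tcpd /etc/inetd.conf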

When a request arrives at an IP on a server for such a service (telnet,
sendmail), the auth client on the server will connect from a high port to
client:auth, asking "who is the owner of the process requesting this
service?". If the client's authd replies with a username@nodename, the
reply is optionally logged on the server (eg to syslog) and the
connection is handed over to telnet (or whichever service) to proceed.
If the reply is "root@nodename" or there is no authd on the client, then
the server's auth client will wait for a timeout before allowing the
connection, and the server will optionally log " @nodename" to syslog.
The delay is about 5 secs for Slackware and 2 mins for RedHat 7.0.
There is no checking of the validity of the reply, so username@nodename
could be bogus. authd is meant as a security feature, but it doesn't
gain the server very much (you don't know who made the connect request,
only what they told you), while clients whose lookup fails are delayed.
This may be a nuisance for people telnetting in, but it will bring mail
delivery to a crawl.
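
You can see what the exchange looks like by making the auth query by
hand from the real-server. For the telnet connection in the tcpdump
below (client2 port 1038 to lvs port telnet/23), the query and a typical
reply would look roughly like this (the username is made up; the format
is from RFC 1413, connect banners omitted):

realserver# telnet client2.foo.net auth
1038, 23
1038, 23 : USERID : UNIX : someuser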

Here's the tcpdump of the interaction (a client telnetting to a VS-DR
LVS, with telnet on the real-server running inside tcpwrappers; the
client and the real-servers cannot connect directly, i.e. they have no
routing to each other).

seen from client:


telnet connect request -
12:56:05.427252 client2.1038 > lvs.telnet: S 1170880662:1170880662(0) win 32120
<mss 1460,sackOK,timestamp 6539901[|tcp]> (DF) [tos 0x10]
12:56:05.427949 client2.1038 > lvs.telnet: . ack 416490630 win 32120
<nop,nop,timestamp 6539901 161874539> (DF) [tos 0x10]
12:56:05.431752 client2.1038 > lvs.telnet: P 0:27(27) ack 1 win 32120
<nop,nop,timestamp 6539902 161874539> (DF) [tos 0x10]

client replying to real-server's auth request
12:56:05.465152 client2.auth > lvs.1377: S 1159930752:1159930752(0) ack
417813448 win 32120 <mss 1460,sackOK,timestamp 6539905[|tcp]> (DF)
12:56:05.465405 lvs.1377 > client2.auth: R 417813448:417813448(0) win 0
12:56:08.464671 client2.auth > lvs.1377: S 1162930275:1162930275(0) ack
417813448 win 32120 <mss 1460,sackOK,timestamp 6540205[|tcp]> (DF)
12:56:08.464901 lvs.1377 > client2.auth: R 417813448:417813448(0) win 0

6 second delay then trying again
12:56:14.466048 client2.auth > lvs.1377: S 1168931649:1168931649(0) ack
417813448 win 32120 <mss 1460,sackOK,timestamp 6540805[|tcp]> (DF)
12:56:14.466275 lvs.1377 > client2.auth: R 417813448:417813448(0) win 0

client login to LVS
12:56:15.501272 client2.1038 > lvs.telnet: . ack 13 win 32120 <nop,nop,timestamp
6540908 161875546> (DF) [tos 0x10]
12:56:15.503946 client2.1038 > lvs.telnet: P 27:125(98) ack 52 win 32120
<nop,nop,timestamp 6540909 161875546> (DF) [tos 0x10]
12:56:15.509024 client2.1038 > lvs.telnet: P 125:128(3) ack 55 win 32120
<nop,nop,timestamp 6540909 161875547> (DF) [tos 0x10]
12:56:15.538816 client2.1038 > lvs.telnet: P 128:131(3) ack 88 win 32120
<nop,nop,timestamp 6540912 161875550> (DF) [tos 0x10]
12:56:15.551836 client2.1038 > lvs.telnet: . ack 90 win 32120 <nop,nop,timestamp
6540914 161875550> (DF) [tos 0x10]
12:56:15.571837 client2.1038 > lvs.telnet: . ack 106 win 32120
<nop,nop,timestamp 6540916 161875551> (DF) [tos 0x10]

Here's what it looks like on the real-server (this is a different
connection from the above sample, so the times are not the same).

real-server receives telnet request on VIP
12:50:58.049909 client2.1040 > lvs.telnet: S 1605709966:1605709966(0) win 32120
<mss 1460,sackOK,timestamp 6580274[|tcp]> (DF) [tos 0x10]
12:50:58.051263 lvs.telnet > client2.1040: S 862075007:862075007(0) ack
1605709967 win 32120 <mss 1460,sackOK,timestamp 161914907[|tcp]> (DF
)
12:50:58.051661 client2.1040 > lvs.telnet: . ack 1 win 32120 <nop,nop,timestamp
6580274 161914907> (DF) [tos 0x10]
12:50:58.052819 client2.1040 > lvs.telnet: P 1:28(27) ack 1 win 32120
<nop,nop,timestamp 6580274 161914907> (DF) [tos 0x10]
12:50:58.053036 lvs.telnet > client2.1040: . ack 28 win 32120 <nop,nop,timestamp
161914907 6580274> (DF)

real-server initiates auth request from VIP to client:auth
12:50:58.088510 lvs.1379 > client2.auth: S 852509908:852509908(0) win 32120 <mss
1460,sackOK,timestamp 161914911[|tcp]> (DF)
12:51:01.083659 lvs.1379 > client2.auth: S 852509908:852509908(0) win 32120 <mss
1460,sackOK,timestamp 161915211[|tcp]> (DF)

real-server waits for timeout (about 8 secs), sends final request to client:auth
12:51:07.083164 lvs.1379 > client2.auth: S 852509908:852509908(0) win 32120 <mss
1460,sackOK,timestamp 161915811[|tcp]> (DF)

telnet replies from real-server continue, login occurs
12:51:08.117727 lvs.telnet > client2.1040: P 1:13(12) ack 28 win 32120
<nop,nop,timestamp 161915914 6580274> (DF)
12:51:08.118142 client2.1040 > lvs.telnet: . ack 13 win 32120 <nop,nop,timestamp
6581281 161915914> (DF) [tos 0x10]

In an LVS, authd on the real-server will be able to contact the client if -

VS-NAT, the real-servers are on public IPs (not likely, since you usually
hide the real-servers from public view and they'll be on 192.168.x.x or
10.x.x.x networks)

VS-NAT, and high ports are nat'ed out with a command like

director:/etc/lvs# ipchains -A forward -j MASQ -s 192.168.1.0/24 -d 0.0.0.0/0

You usually don't want to blanket masquerade all ports. You really
only want to masquerade the ports that are being LVS'ed (so you can still
get to the other services), in which case, for each service being
LVS'ed, you use ipchains rules like

director:/etc/lvs# ipchains -A forward -p tcp -j MASQ \
    -s realserver1.foo.net telnet -d 0.0.0.0/0

Since the auth client (on your telnet server) connects from a high port
on the server, a better ipchains rule, which will allow auth to work when
the real-servers are on private IPs, is

director:/etc/lvs# ipchains -A forward -p tcp -j MASQ \
    -s realserver1.foo.net 1024:65535 -d 0.0.0.0/0

This handles auth for VS-NAT.
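
To check that the auth traffic is actually getting out, you can watch
port 113 on the director's outside interface while making a connection
(the interface name here is just an example):

director:/etc/lvs# tcpdump -n -i eth0 port 113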

There is no solution for VS-DR at the moment. The auth client on the
real-server initiates the connection from the VIP, and there is no way
for a packet from VIP:high port to get a reply back through the LVS:

1. the incoming packet from the client on the internet is destined for a
   non-LVS'ed high port.

2. the incoming packet is not a connect request.

3. the incoming packet is not associated with an established connection.

The reply from the client (on the internet) will be dropped.

Here's how to turn off tcpwrappers.

inetd.conf will have a line like

telnet stream tcp nowait root /usr/sbin/tcpd in.telnetd

change this to

telnet stream tcp nowait root /usr/sbin/in.telnetd in.telnetd

and re-HUP inetd.
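
On most Linux boxes, re-HUPping inetd is just

realserver# killall -HUP inetd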




--
Joseph Mack PhD, Senior Systems Engineer, Lockheed Martin
contractor to the National Environmental Supercomputer Center,
mailto:mack.joseph@epa.gov ph# 919-541-0007, RTP, NC, USA