I apologize in advance for the length of this post. I have been messing
extensively with mod_backhand, and I have questions, suggestions, and
some solutions I have come up with.
Please bear with me as I explain my situation, we'll get to the meat
soon enough :)
BACKGROUND
I have recently founded my second startup, a specialized portal (FYI,
it's about plastic surgery, and is currently at www.talksurgery.com, not
backhanded yet). My first startup was a high-end hosting company, a
spin-off of a big telco, where I designed a gigabit load-balanced HA
hosting infrastructure.
Having incredible budgets at the time, I had the opportunity to play
with a lot of expensive hardware (Cisco, Foundry, Arrowpoint, Sun
Cluster, etc.)
Having been spoiled by HA and load balancing, I am trying to find a
solution to upgrade my current (dinky) one-machine configuration to
something more robust and scalable.
Of course, this current startup is the real "basement" type, all money
being out-of-pocket. So I need a very cost-effective solution. So why
mod_backhand?
1) Cost-effective (Some other software solutions require $$, can't beat
the mod_backhand price :)
2) LVS is based on Heartbeat, which includes the "fake" code.
Unfortunately, Heartbeat requires a serial cable between boxes (please
feel free to flame and correct me if I'm wrong), so it may cramp
your scalability style somewhat. Also, I have heard that the solution
does not scale to more than 8 machines in practice, and that you cannot
have geographically distributed clusters (I have suggestions for this
with mod_backhand, see below).
3) Other packages such as Piranha (the RH solution based on LVS) have
been shown to be unreliable and insecure.
4) Although I played with complex solutions, I am a fervent adept of the
KISS principle. The other software packages are "be-all-end-all" for all
your HA needs. I prefer the modular approach where I use the pieces I
really need.
CURRENT SETUP
My hosting company has pretty good bandwidth, and my machines are all
connected to the same 100Mbps VLAN with a Cisco Catalyst switch.
I've got three new machines I ordered from our hosting company:
(note: mydomain.com is an example)
www1.mydomain.com, www2.mydomain.com
- 2 x PIII 700, 128MB, RH 6.2, Kernel 2.4.2, to be used as static web
page servers and light CGI
Running apache 1.3.17 and mod_backhand
www3.mydomain.com
- 1 x PIII 800, 512MB, RH 6.2, Kernel 2.4.2, to be used for DB and heavy
CGI (forums, EmbPerl), in addition to serving static content.
Running two apache 1.3.17 servers: one mod_backhand, the other mod_perl
mod_php
RewriteRules are used to proxy requests for dynamic pages to www3.
"fake" is used for the virtual ip (VIP) for the main url:
www.mydomain.com, and can be put on any one of the machines. I actually
run it simultaneously on all machines, as Cisco switches correctly
handle gratuitous ARPs from different sources with the same IP (the
protocol states that the lowest MAC address is chosen for IP
forwarding). If you try this config, YMMV, as I know it breaks the
Windows TCP stack if you are testing the configuration locally, and not
across a router. This setup ensures that if any machine goes down, the
VIP will move to the next machine. You can also have a MON script which
turns on and off fake depending on what is up or down.
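For reference, such a watchdog could be as simple as the sketch below. The peer address, interface name, polling interval, and use of netcat are all made-up examples, not my actual setup:

```shell
#!/bin/sh
# Hypothetical fake/VIP watchdog: poll the current VIP holder and claim
# the address when it stops answering. All names/addresses are examples.
VIP=1.1.1.1
VIP_IF=eth0:0
PEER=1.1.1.2          # machine currently expected to hold the VIP

peer_up() {
    # returns 0 if something answers on $1 port 80 within 2s (needs netcat)
    nc -z -w 2 "$1" 80
}

decide() {
    # map the peer's status ("up"/"down") to the action we should take
    case "$1" in
        up)   echo standby  ;;   # peer is healthy: leave the VIP alone
        down) echo takeover ;;   # peer is dead: claim the VIP ourselves
    esac
}

# Main loop, commented out so the sketch is side-effect free:
# while sleep 5; do
#     if peer_up "$PEER"; then st=up; else st=down; fi
#     [ "$(decide $st)" = takeover ] && ifconfig "$VIP_IF" "$VIP" up
# done
```

A real MON setup would of course also have to turn fake off again when the peer comes back, to avoid two boxes fighting over the VIP.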
For this configuration to work, you unfortunately (fortunately?) need
iptables to do destination NAT from the VIP to the real machine IP so
the packets reach their destination. Otherwise, you need to duplicate your
Apache config or do an internal Rewrite. Example: if www1 is the
current holder of the VIP:
VIP: 1.1.1.1 (fake)
www1: 1.1.1.2
/sbin/iptables -A PREROUTING -t mangle -p tcp --dport 80 -d 1.1.1.1/32
-j MARK --set-mark 444
/sbin/iptables -A PREROUTING -t nat -m mark --mark 444 -j DNAT
--to-destination 1.1.1.2
I have tried in vain to duplicate this with ipchains (kernel 2.2). Once
the packet gets to the mod_backhand machine, mod_backhand uses the real
machine IPs for its proxying.
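To save retyping when the VIP moves, the two rules can be generated for whichever machine currently holds the VIP. This helper is purely illustrative (the function name is my invention); it only prints the commands, so you can inspect them before piping to sh:

```shell
#!/bin/sh
# Emit the mark + DNAT rules above for a given real server IP.
# Pipe the output to sh on the current VIP holder to apply them.
VIP=1.1.1.1
MARK=444

dnat_rules_for() {
    real="$1"    # e.g. 1.1.1.2 for www1
    echo "/sbin/iptables -A PREROUTING -t mangle -p tcp --dport 80 -d $VIP/32 -j MARK --set-mark $MARK"
    echo "/sbin/iptables -A PREROUTING -t nat -m mark --mark $MARK -j DNAT --to-destination $real"
}

# dnat_rules_for 1.1.1.2 | sh     # apply on the current VIP holder
```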
PROGRESS and QUESTIONS
I have successfully tested mod_backhand in proxy mode, with no problem
whatsoever. I have a performance question though: I've tested the
platform with ApacheBench (ab), and I seem to top out at about 500
requests per second, whatever the number of backhanded machines (even
just one). Requests get distributed properly. Shouldn't it scale? Or is
there some OS limitation I am not aware of? I've tried a bunch of stuff
to tweak the performance, including the SendBufferSize modifications and
other stuff in the FAQ.
Also, the backhanding machine spirals to its death when too many
simultaneous connections occur. I've been up to 1000 concurrent
connections, and it's ok for a couple of runs (the backhanding machine
is really, really busy though), but the box does not accept any
further requests, and I get a bunch of kernel error messages. I figure
that the number of incoming connections + proxy connections from the
backhander is just too much for the IP stack. I've tried increasing some
of the IP parameters, without much difference:
echo 16368 > /proc/sys/net/ipv4/ip_conntrack_max
echo "7168 32767 65535" > /proc/sys/net/ipv4/tcp_mem
echo 32768 > /proc/sys/net/ipv4/tcp_max_orphans
echo 4096 > /proc/sys/net/ipv4/tcp_max_syn_backlog
echo 1 > /proc/sys/net/ipv4/tcp_syncookies
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 4 > /proc/sys/net/ipv4/tcp_syn_retries
echo 7 > /proc/sys/net/ipv4/tcp_retries2
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 30 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 16384 > /proc/sys/fs/file-max
echo 16384 > /proc/sys/kernel/rtsig-max
It would be nice to limit the number of concurrent connections issued by
mod_backhand for proxying, or provide some way to pipeline the requests
and responses from one backhand process to another using a limited
number of connections, to be buffered on each machine until enough TCP
resources become available under heavy load. Another approach to the
connection problem would be to tear connections down intelligently. I know that
keepalive connections are good, but a large number of incoming clients
will kill your backhanding box. I know it's not an easy question.
Who knows, maybe a funky solution like multicasting the requests to the
backend and the first backhand that responds wins (status info needs to
be exchanged) might be workable? You would only need specialized
data-transfer connections for the responses, which could then come back
pipelined more efficiently. Just a thought.
I also tested the redirect mode of mod_backhand. This presents some
challenges, but also some interesting advantages. So this is what I did:
- defined www1.mydomain.com, www2.mydomain.com, www3.mydomain.com in
DNS, and as the default ServerNames of the boxes
- Ensured that the Apache configurations had aliases for the redirected
virtual hosts
- Added the proper backhand directives
This is what my config looks like:
ServerName www1.mydomain.com
...
<IfModule mod_backhand.c>
UnixSocketDir /home/apache/backhand
MulticastStats 225.0.0.2:4445,1
AcceptStats 1.1.1.0/24
<Location "/backhand/">
SetHandler backhand-handler
</Location>
</IfModule>
...
<Directory /home/httpd/www/www.mydomain.com>
Backhand HTTPRedirectToName
Backhand byAge
Backhand byRandom
Backhand addSelf
</Directory>
...
<VirtualHost 1.1.1.2>
ServerName www.mydomain.com
ServerAlias www1.mydomain.com
ServerAlias www2.mydomain.com
ServerAlias www3.mydomain.com
...
</VirtualHost>
(repeat for other servers)
Note that I didn't put any parameters in HTTPRedirectToName. When I did
(%1H.%-2S), the redirects were only going to the local server. No
parameter seems to redirect them to all servers, maybe because the
machine names match the default ServerName? I couldn't find any entries
in the server logs, even with BackhandLogLevel directives (+dcsnall and
+mbcsall). Also, no connections were reported by the backhand-handler,
which is strange.
Anyway, the significant benefit I was seeing from this is the drastic
reduction in the workload and number of connections on the backhanding
machine. The fact that each individual machine is responding directly to
the client should provide close to n times the throughput. I noticed
however that once a client was redirected, it never came back or changed
servers. Is this normal? I couldn't benchmark this properly, as
ApacheBench does not follow redirects.
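As a workaround, one could pre-resolve the redirect and point ab at the target directly. The little filter below just pulls the Location header out of a raw HTTP response; the surrounding curl/ab invocation is an example, not something I have actually benchmarked with:

```shell
#!/bin/sh
# Extract the Location header from a raw HTTP response -- the one step
# ApacheBench won't do for you.
location_of() {
    sed -n 's/^[Ll]ocation: *//p' | tr -d '\r' | head -1
}

# Example (hypothetical hosts):
#   target=$(curl -sI http://www.mydomain.com/ | location_of)
#   ab -n 1000 -c 50 "$target"
```

This only measures one redirect target at a time, of course, so it sidesteps rather than solves the benchmarking problem.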
Another potential advantage of redirection is the possibility of
geographically distributed servers, using multicast routing (mrouted) to
allow backhand to talk to the remote machines. This would be
rudimentary, as it is difficult to determine what the "closest" machine
is to the client.
The major problem is cookies. Some browsers (like mine) have "security"
settings which prevent setting cookies through entire domains
(*.mydomain.com), or setting cookies with a domain different from
the current ServerName. This can break some applications. I haven't
tackled this yet, but I'm working on it. Has anyone done this?
SUGGESTIONS (some really funky)
Initially, with my first startup, we used the Cisco LocalDirector,
essentially a bridge. It used a very interesting way of load balancing.
It would:
1) Receive the request packet
2) Make the load balancing decision and select a target server
3) Rewrite (spoof) the destination MAC address of the packet to that of
the target server, touching nothing else, and put it back on the wire
4) The target server would pick it up (note: the target had to have the
VIP address aliased to its loopback device, so the packet would go
through the IP stack)
5) The target server would respond directly to the client (TCP sequence
numbers would be correct, etc.)
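Step 4 translates to something like the following on a Linux real server. The tricky part is ARP suppression: the "hidden" sysctl shown here comes from the LVS hidden-interface patch, not a stock kernel, so treat this as an assumption on my part:

```shell
# Direct-return real server sketch: accept packets addressed to the VIP
# without advertising the VIP on the wire. Addresses are the examples above.
ifconfig lo:0 1.1.1.1 netmask 255.255.255.255 up
# The box must NOT answer ARP for the VIP; with the LVS "hidden" patch:
# echo 1 > /proc/sys/net/ipv4/conf/all/hidden
# echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
```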
This had a tremendous advantage in that each machine responded to the
client individually. Unfortunately, since it was a bridge, the return
traffic had to go back through it, which limited its capacity.
The Foundry ServerIron did the same thing, except that it was a true
router, so you could bypass it if it wasn't your gateway router. Foundry
called this configuration "Direct Return". Since you could be a Layer 3
(IP) hop or more away from the Foundry, the servers could be anywhere
and respond from there. Also, even though the source IP address on the
return packet didn't match the server's own, the Foundry had fabricated
the original packet so that the TCP sequence number was correct (since
it sourced it). The IP stack is very forgiving at Layer 4, so the client
was never the wiser.
My suggestion is for backhand proxying to only forward the request and
let the target machine respond directly: essentially a "half-proxy" that
immediately tears down its forwarding connection (or uses a special
pipelined connection?). I know this is easier said than done. I don't know if
mod_backhand can have direct access to the IP stack, and do some packet
mangling. Maybe in conjunction with some external util? I haven't found
this capability with iptables. The TCP sequence number could be passed in
the same fashion as the client address (in the headers). Again, this has
advantages in scalability and performance.
Another suggestion is to simplify configuration by having something like
a BackhandMaster <preference> directive, and only put the backhand
configuration parameters there, instead of duplicating them everywhere.
The configuration would be loaded by the "slaves" at startup. The
preference could be used to have one other server have a copy of the
configuration in case the first fails. (in an arrangement such as above,
or using an external load balancing device)
Lastly, a cookie duplication scheme would be interesting for redirected
requests to different servers (see the note about cookies above). E.g., all
cookies for the domain "www.mydomain.com" are duplicated by mod_backhand
on www1.mydomain.com to the domain "www1.mydomain.com", etc. This could
be done by appending the cookie values to the redirect URL, which are
stripped by the receiving backhand. Just a thought.
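The folding could look something like this; the "bkcookie" parameter name and the minimal percent-encoder are inventions of mine to illustrate the idea:

```shell
#!/bin/sh
# Fold a Cookie header into a redirect URL so the target host can re-set
# the cookies under its own domain. The "bkcookie" parameter is made up.
urlencode() {
    # minimal percent-encoder for the characters that matter here;
    # order matters: '%' must be escaped first
    echo "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/;/%3B/g' \
                    -e 's/=/%3D/g' -e 's/&/%26/g'
}

make_redirect() {
    host="$1"; path="$2"; cookies="$3"
    echo "http://$host$path?bkcookie=$(urlencode "$cookies")"
}

# make_redirect www2.mydomain.com /forum/ "sid=42; lang=en"
```

The receiving backhand would strip bkcookie from the URL and emit matching Set-Cookie headers for its own ServerName.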
Ok, that's enough! (for now :) Any comments, flames, and suggestions are
welcome.
Nice work, Theo.
Regards,
Dejan Macesic