Mailing List Archive

[mod_backhand-users] Performance + configuration questions
I'm testing mod_backhand 1.1.0 + Apache-1.3.12, using
WebLoad to test the performance.
I have 5 machines: test[1-5]
My test sends GET + CGI requests randomly to the 5
machines and runs for 5 minutes.

All 5 machines are configured to Backhand byLoad only.

It seems that the random test without backhand works
better:

The RoundTime is higher for mod_backhand:
30 sec (compared to 12 without)

The number of processed requests per 5 minutes:
11000 with mod_backhand, 30000 without

Number of rounds (= number of users)
1000 with mod_backhand, 2300 without.

So it seems that mod_backhand is actually redirecting
requests, but it makes my site slower.
Does that make sense?

What's the best configuration for such a scenario?

Thanks,
Gali

[mod_backhand-users] Performance + configuration questions
This is a really tough question. Questions you might want to consider that I
won't address in too much detail here:

o What do these tests mean? Are they a valid test of performance? (i.e., are
you setting up a business to handle traffic from WebLoad, or from customers?)
-- That said, WebLoad is a pretty good testing program.

o Why are you distributing requests randomly across the machines? That is a
pretty damn good algorithm all by itself (randomized distribution), assuming
your cluster is homogeneous... What are you using to direct your *clients'*
requests randomly across your servers? If you can't do that in production,
then it is invalid to allow WebLoad to randomly distribute your requests
during testing. As far as I remember, the random dispersal of requests from
WebLoad doesn't resemble the random pattern you will get from DNS RR -- it is
closer to that of a hardware solution.

o Does testing a cluster at a pressure that causes 12-second turn-around
times make any sense? You have already lost your customers' attention at that
point, so why not just shut your machines off? ;-) Try targeting your tests
in the realm in which you plan on operating (sessions/hits/customers per
second). Then, see which solution works best. Stress testing is useful, but
it is only part of the data that should determine your "best" solution.

So, here is the issue you're probably hitting:

If you are already distributing your load across your machines by using a
hardware load-balancer (as the WebLoad configuration would imply), then you
should be testing it that way, and you should only use mod_backhand to
compensate for mistakes.

If you are using byLoad (and byLoad alone), there will be resource contention
issues due to stale load information. The load that mod_backhand sees is a
rolling one-minute load average updated every 5 seconds --
exactly what is provided by the OS. So, all of the machines see more-or-less
the same information at the same time. Using only byLoad, all of the machines
will choose the same machine and overwhelm it until the resource information
is updated. This is bad.

To solve this problem, several approaches can be used. One is a random
selection that is weighted by the inverse of the relative loads of the
machines -- left as an exercise to the reader (though a sketch follows
below). Another is to randomly select a subset of machines and choose the
least loaded in that subset. This ensures that not all requests will be
directed to the same machine.
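
For the first approach, here is a minimal sketch in C. This is not
mod_backhand's API -- the function name and the loads[] array are
hypothetical -- it just illustrates the selection technique:

#include <stdlib.h>

/* Pick a server index with probability proportional to 1/load.
 * loads[] holds each machine's load average; n is the machine count. */
int pick_by_inverse_load(const double *loads, int n)
{
    double sum = 0.0, r;
    int i;

    for (i = 0; i < n; i++)
        sum += 1.0 / (loads[i] + 0.01);  /* +0.01 guards against idle (0.00) machines */

    r = ((double)rand() / RAND_MAX) * sum;  /* a point in [0, sum] */
    for (i = 0; i < n; i++) {
        r -= 1.0 / (loads[i] + 0.01);
        if (r <= 0.0)
            return i;                    /* landed in machine i's slice */
    }
    return n - 1;                        /* floating-point slop; take the last machine */
}

A lightly loaded machine gets a proportionally larger slice of [0, sum], so
it is chosen more often, but no machine is chosen deterministically -- which
is what breaks the herd behavior described above.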

Another issue is that it requires resources to proxy the connection to
another server. You need to account for these somewhere. This is why the
byLoad function takes an optional "bias" parameter. It allows you to favor
yourself when choosing the best candidate.

And, of course, you always want to be able to choose yourself -- so make sure
you add yourself back in if you are removed due to a call to byLogWindow.

So, try something like the following. (Note that the addSelf candidacy
function is available in versions *after* 1.1.0, including the stable
snapshot available on the mod_backhand download page -- its implementation is
trivial.)

# remove dead servers
Backhand byAge
# randomize list
Backhand byRandom
# choose a log window (now a random subset of size lg(n))
Backhand byLogWindow
# add yourself back if you were removed
Backhand addSelf
# pick the least loaded, giving yourself a bias of 2 normalized load points
Backhand byLoad 2
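
As an aside, since addSelf's implementation is trivial, here is roughly what
it boils down to. The signature and types below are a hypothetical
simplification, not mod_backhand's actual candidacy-function API -- see the
module source for the real thing:

/* Hypothetical sketch: ensure this server is in the candidate list.
 * 'candidates' holds server indices, 'n' is the list length, and
 * 'self' is this server's own index. */
static void add_self(int *candidates, int *n, int self)
{
    int i;
    for (i = 0; i < *n; i++)
        if (candidates[i] == self)
            return;              /* already a candidate; nothing to do */
    candidates[(*n)++] = self;   /* append ourselves to the end */
}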

This sort of setup should work well for compensating for mistakes made by a
hardware load balancer. You have to remember that in order to get a
randomized distribution of requests, you must already have a hardware
solution, as DNS RR provides a very different distribution "signature."

If you are using a load-balancer in practice to distribute requests, you
should be running WebLoad against the load-balancer and *not* your web servers
-- otherwise you aren't really testing your end implementation.

If you are planning on using DNS RR to distribute the requests, I would try
to persuade you otherwise. It is very hard to model and thus extremely
difficult to test accurately (WebLoad's (and others') randomization
algorithms are too "good," for lack of a better word... the granularity is
too fine). Additionally, it has fault-tolerance issues that will have to be
solved using external mechanisms -- good luck.

So, now for my questions:

o what is the resource utilization of the average CGI (rusage)? (a quick way
to measure this is sketched after these questions)

o how big are the average requests and responses? (there may be some tuning
within mod_backhand that will help)

o what is your end topology going to look like? -- so I can see how
"applicable" your tests are.


Just as a perspective:
Suppose you were to use a single UNIX box as the load balancer to distribute
the requests, running mod_proxy, LVS, or BIG/ip on it. There are a few
advantages you would gain by running mod_backhand on it instead:

o You get the same fault tolerance.

o mod_backhand *knows* more about the backend machines, so it can handle
crippled machines and heterogeneous clusters automatically.

o mod_backhand will keep active (pipelined) HTTP sessions open to *all* of
the backend servers regardless of whether or not your clients turn on
keep-alives.

o if your cluster is heterogeneous, mod_backhand will adaptively compensate
-- no clumsy "ratios" or "weights" need to be guessed for your backend
machines.


--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7