Mailing List Archive

FW: [mod_backhand-users] Another New Candidacy Function: byChooseMeOverLocal
Well, that will effectively keep the requests from going to the proxy
servers. But, if you receive 1000 requests in one second, all of the
proxies *see* the same picture and will choose the *same* app server.
This can be horrific. Because the load on a UNIX box is the 5 second
average, you won't notice that you have beaten the box to death until it
is too late. Then you will proceed to beat another box to death.
(Round robin has its advantages, no?)

So, if you look at a randomized window (Log base 2 has some nice
theoretical properties -- but any decently small window will do), not
all of the machines will contend for the same resource. You are assured
however that you will not go to the n-1 most heavily loaded machines
(where n is the window size), because even in the case where your random
window covers exactly the n most loaded servers, you will choose the most
lightly loaded of them.

Basically, a randomized log window will prevent you from beating a
single machine to death. I think this is discussed in a paper by
Mitzenmacher and Dahlin.
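
To make the herd effect concrete, here is a minimal Python sketch (not
mod_backhand code). It assumes every request in a burst sees the same
stale snapshot of loads, and it compares "always pick the least loaded"
with "pick the least loaded from a random window". The server loads, the
function names and the ceil(log2(n)) window size are all assumptions made
for illustration.

```python
import math
import random

def pick_least_loaded(loads):
    """Every request picks the globally least-loaded server in the snapshot."""
    return min(range(len(loads)), key=lambda i: loads[i])

def pick_from_random_window(loads, rng, window=None):
    """Pick the least-loaded server from a random subset of the servers.

    The default window of ceil(log2(n)) is an assumption for this sketch.
    """
    n = len(loads)
    if window is None:
        window = max(1, math.ceil(math.log2(n)))
    candidates = rng.sample(range(n), window)
    return min(candidates, key=lambda i: loads[i])

def simulate(chooser, loads, requests):
    """Assign `requests` requests, all using one stale snapshot of `loads`."""
    hits = [0] * len(loads)
    for _ in range(requests):
        hits[chooser(loads)] += 1
    return hits

rng = random.Random(42)
# Hypothetical load snapshot for 8 app servers (window size becomes 3).
stale_loads = [0.9, 0.4, 0.7, 0.2, 0.5, 0.8, 0.3, 0.6]

herd = simulate(pick_least_loaded, stale_loads, 1000)
windowed = simulate(
    lambda snapshot: pick_from_random_window(snapshot, rng), stale_loads, 1000
)
print("least loaded: ", herd)
print("random window:", windowed)
```

In the first case all 1000 requests land on one server. In the second
the requests are spread out, and the two most heavily loaded servers
(n-1 with a window of 3) never receive anything.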

So
<Directory>
Backhand byAge 3
BackhandFromSO libexec/byHostname.so byHostname app
Backhand byRandom
Backhand byLogWindow
Backhand byLoad -1000
</Directory>

would probably do the trick, if you were using proxies in the front
end... Note that byLoad has changed in the current CVS (stable)
version, and the change will be in the next release, to more accurately
reflect my miscoded intentions and the FAQ :)

Stephen Nickels wrote:
>
> That looks pretty cool. But why not use byHostname?
> Say you've got the following server setup:
>
> Proxies:
> proxy1
> proxy2
> proxy3
>
> mod_perl boxes:
> app1
> app2
> app3
>
> You could use the following setup on your proxies:
>
> <Directory /path/to/docroot>
> Backhand byAge 3
> BackhandFromSO libexec/byHostname.so byHostname app
> Backhand byRandom
> Backhand byLoad -1000
> </Directory>
>
> This will make all requests go off to the mod_perl application boxes and
> should take the proxy boxes out of the loop for filling requests.

--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
FW: [mod_backhand-users] Another New Candidacy Function: byChooseMeOverLocal
Sean Chittenden wrote:
> Quickie, but wasn't the point of byCost to prevent overloading a
> single machine?

Quickie question... Not quickie answer :) Here we go:

Not exactly. Actually, not at all. byCost suffers from problems
similar to byLoad. byCost is a re-approach to the problem based on the
idea that "load isn't all that matters". Memory utilization is
important too. byCost is an algorithm designed to take into account the
utilization of various resources. The current implementation only
accounts for load and memory utilization, but could easily take into
account any other resource you have in mind (the equations are there,
just nothing else is being *plugged* in).

byCost takes memory into consideration, where byLoad does not. So, it
doesn't suffer as severely from the stale information problem. The
memory utilization info is updated in realtime (because that info is
made available to mod_backhand by the OS). However, the resource
utilization is only shared between the machines every second. This
means that for a full second (at least) they all see *almost* the same
thing. So byCost and byLoad suffer from stale information, because no
matter how hard you try, the information will always be at least
partially stale.

There are several methods to take stale information into account when
attempting to select the *best* server (you see now why you DO NOT want
to select the least loaded server :). Here are a few:

(1) addPrediction

The addPrediction function adds incremental values to a machine's
*perceived* load after assigning requests to it (a little less if you
assign requests locally). These increments are wiped out when you get
the next resource broadcast for that machine, but then it adds up
again. It tries to calculate it *smartly* by determining how long
requests have been taking to that machine and guessing at the load a
request *would* incur. Sounds really cool! Does it work? Truthfully,
our simulation environment is not complete, so I can't tell you for
sure. (It can't hurt though).
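
The bookkeeping described above can be sketched roughly as follows. This
is not the mod_backhand implementation: the class name, the fixed
per_request_cost argument and the local_discount factor are hypothetical
stand-ins (the real function estimates the increment from how long
requests have been taking).

```python
class PerceivedLoad:
    """Tracks one machine's load as seen by the local balancer."""

    def __init__(self, broadcast_load=0.0):
        self.broadcast_load = broadcast_load  # last value the machine broadcast
        self.predicted_extra = 0.0            # our guess at load added since then

    def on_broadcast(self, load):
        """A fresh resource broadcast wipes out the accumulated increments."""
        self.broadcast_load = load
        self.predicted_extra = 0.0

    def on_assign(self, per_request_cost, local=False, local_discount=0.5):
        """Bump the perceived load after assigning a request to this machine.

        Local assignments add a little less (local_discount is an assumption).
        """
        factor = local_discount if local else 1.0
        self.predicted_extra += per_request_cost * factor

    @property
    def perceived(self):
        return self.broadcast_load + self.predicted_extra


app1 = PerceivedLoad(broadcast_load=0.25)
app1.on_assign(0.125)
app1.on_assign(0.125)
before_broadcast = app1.perceived   # broadcast value plus two increments
app1.on_broadcast(0.75)
after_broadcast = app1.perceived    # increments are gone, fresh value only
print(before_broadcast, after_broadcast)
```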

(2) randomized subset (byLogWindow)

This is discussed in detail in a paper by Mitzenmacher and Dahlin, and
the math is out of the scope of this mailing list :) But it obviously
tackles the problem in a VERY different way than the first method does.

(3) randomized equalization

This is not implemented yet in mod_backhand, but when I get time, it
will be. The idea is that you assign the requests randomly, but the
randomness is not uniformly distributed. Instead, the distribution is
based inversely upon the normalized cost of the machines -- the lower the
(relative) cost, the higher the probability. Perhaps I will implement
this for the 1.1.0 release :) It should work very well (gut feeling).
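
Since this one is not implemented, the following is only one possible
reading of the idea, sketched in Python: weight each machine by the
inverse of its cost, normalize the weights into probabilities, and draw
from that distribution. The cost values, the epsilon guard and the
function names are assumptions for illustration.

```python
import random

def inverse_cost_weights(costs, epsilon=1e-9):
    """Turn per-machine costs into probabilities: lower cost, higher probability."""
    inverse = [1.0 / (c + epsilon) for c in costs]  # epsilon avoids division by zero
    total = sum(inverse)
    return [w / total for w in inverse]

def choose_server(costs, rng):
    """Draw one server index at random, biased toward cheaper machines."""
    weights = inverse_cost_weights(costs)
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]

rng = random.Random(7)
costs = [1.0, 2.0, 4.0]          # hypothetical relative costs for three machines
weights = inverse_cost_weights(costs)

counts = [0, 0, 0]
for _ in range(7000):
    counts[choose_server(costs, rng)] += 1

print(weights)   # roughly 4/7, 2/7, 1/7
print(counts)
```

Every machine still gets some traffic, so nobody is beaten to death, but
the cheapest machine gets the largest share.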

--
Theo Schlossnagle
FW: [mod_backhand-users] Another New Candidacy Function: byChooseMeOverLocal
Sean Chittenden wrote:
> True true. I always group byCost and addPrediction together. That
> said, I've included the config that we use (works very well right now!), and
> am curious to know if there is some mix/match ideal that you
> personally/academically/professionally prefer?
>
> <Location />
> Backhand byAge 2
> BackhandFromSO libexec/byHostname.so byHostname (app)
> Backhand byRandom
> Backhand byLogWindow
> Backhand byCost
> Backhand addPrediction
> </Location>

That looks pretty good to me. You might want to set Age up to something
high, maybe 10. On busy systems, the resource information collection
and broadcasting process has a tendency to not get scheduled for a long
time. You can get a strange artifact in that you will wind up with a
COMPLETELY blank candidacy list in the end and the apache instance will
just pass the request right through (this could be bad).
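
A small Python sketch of why a low age limit can empty the list. It
models an age filter as "drop every server whose last broadcast is older
than max_age seconds"; the timestamps, the dictionary layout and the
by_age name are made up for illustration and are not mod_backhand
internals.

```python
def by_age(candidates, max_age, now):
    """Keep only servers whose last resource broadcast is at most max_age seconds old."""
    return [s for s in candidates if now - s["last_seen"] <= max_age]

# Hypothetical busy cluster: broadcasts were delayed, so every entry is
# between 4 and 8 seconds old at time `now`.
now = 101.0
servers = [
    {"name": "app1", "last_seen": 97.0},
    {"name": "app2", "last_seen": 96.0},
    {"name": "app3", "last_seen": 93.0},
]

strict = by_age(servers, 2, now)     # nothing is fresh enough
relaxed = by_age(servers, 10, now)   # everything survives the filter

print([s["name"] for s in strict])
print([s["name"] for s in relaxed])
```

With the limit at 2 the candidacy list is already empty before any of
the later functions run; with 10 all three app servers stay in play.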

> Does anyone know of a way to improve this? Here's the quickie
> drawing:
>
> | High availability
> V
> |Proxy w/ mod_backhand (w/ backhand config above)
> V
> |App w/ mod_backhand (but no proxying/relaying of requests in config)

If your App servers are TOO heavy weight, you might try making the Proxy
servers App servers and putting them all in the front line with
mod_backhand to balance amongst themselves. If your App servers (each
apache child) have database connections and you can only afford (money
or resources) to have a specific quantity, then your existing set up is
the right way to go.


--
Theo Schlossnagle