Mailing List Archive

arriba calculation? (slightly long)
We have an odd problem; we recently migrated our application to 4 new
servers, identically configured with identical hardware. mod_backhand
1.2.1 is installed on all servers. Upon startup, the Arriba value
calculated for each of these servers varies wildly:

server 1: 625364
server 2: 723409
server 3: 643787
server 4: 562925

These results persist after a restart of the Apache server and removal of
the old Arriba files. Our backhand configuration is as follows:

BackhandSelfRedirect On
Backhand byAge
Backhand byRandom
Backhand byLogWindow
Backhand addSelf
Backhand byBusyChildren 1

The weight value on the byBusyChildren directive is intended to prevent
backhanding unless load goes over 1. We were testing this value to see how
it worked. These servers are 2-processor Sun 280Rs, and have no problem
serving requests at low load, so we wanted to let users who landed on the
server stay on the server unless load went above a certain threshold
(which would probably indicate a problem on the server, or an especially
long-running CGI). We intended to increase the threshold somewhat if
initial results were satisfactory. This could have been more thoroughly
tested, except that the application in question is a web-based e-mail
client, and it's difficult to emulate real-world conditions for that, for
economic and other reasons.

In production, we're observing excessive usage of server 4, to the
point where it's doing up to 50% of the work, according to our statistics.
I can pinpoint three possible reasons for this:

1) Backhand is working properly (i.e., letting requests stay on the
recipient server at low load), but more requests are coming in to
server 4. This may be the result of misconfigured DNS servers that
aren't caching our round-robin DNS entry correctly. I know that all
Windows 2000 and higher boxes now have a built-in caching DNS server,
so this is not out of the realm of possibility.

2) The Arriba "miscalculation" is causing more requests to go to server 4.
(By implication, wouldn't this mean fewer requests would end up on
server 2? However, we don't observe this; load seems to be evenly spread over
the three remaining boxes; if anything, server 2 is doing slightly
more work than servers 1 and 3.)

3) Server 4 is actually faster, somehow, despite being identical in every
way.

I tend to discount option 3, and option 1 isn't really a topic for this
list. However, I can't seem to find any solid information on how the
Arriba value is calculated. Does anyone know of any? I looked at arriba.c
slightly, but I'm not very adept at C--it appears to create 12 threads,
measure the time it took, and calculate Arriba based on that--I was hoping
to confirm that suspicion.

Also, with this backhand configuration, does the Arriba value have any
import anyway? I thought putting the "byBusyChildren" directive last in
the list would cause the redirection to happen solely on the basis of the
number of busy Apache servers, which is essentially an estimate of the
length of the run queue, and ignore the Arriba or other resource estimates
on the server. We've avoided using the byLoad directive because we found
that in a cluster of servers with different hardware configurations,
byLoad tended to place too much emphasis on servers with extra memory.
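
The reasoning in the paragraph above can be sketched in code. This is a
simplified model, not mod_backhand's source: it assumes each directive is a
function that filters or reorders a candidate list, that the request goes to
the first server left in the list, and that the byBusyChildren argument is
subtracted from the local server's busy count. The Server fields, the
function names, the busy counts and the 20-second staleness cutoff are all
made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    busy_children: int  # Apache children currently handling requests
    age: int            # seconds since this server's stats were last heard
    arriba: int         # speed estimate; none of the steps below read it


def by_age(servers, max_age=20):
    # Drop servers whose stats are stale (max_age is an assumed cutoff).
    return [s for s in servers if s.age <= max_age]


def by_random(servers, rng):
    # Shuffle so that ties are not always broken the same way.
    shuffled = list(servers)
    rng.shuffle(shuffled)
    return shuffled


def by_log_window(servers):
    # Keep the first floor(log2(n)) servers, but at least one.
    keep = max(1, len(servers).bit_length() - 1)
    return servers[:keep]


def add_self(servers, me):
    # Make sure the local server is among the candidates.
    return servers if any(s is me for s in servers) else servers + [me]


def by_busy_children(servers, me, bias=0):
    # Keep only the servers with the lowest busy count; the local
    # server's count is reduced by `bias` so it wins at low load.
    def score(s):
        return s.busy_children - bias if s is me else s.busy_children

    best = min(score(s) for s in servers)
    return [s for s in servers if score(s) == best]


def pick(servers, me, rng, bias=1):
    # Same order as the configuration quoted earlier in this message.
    candidates = by_age(servers)
    candidates = by_random(candidates, rng)
    candidates = by_log_window(candidates)
    candidates = add_self(candidates, me)
    candidates = by_busy_children(candidates, me, bias)
    return candidates[0]


me = Server("server4", busy_children=3, age=1, arriba=562925)
others = [
    Server("server1", busy_children=3, age=2, arriba=625364),
    Server("server2", busy_children=4, age=1, arriba=723409),
    Server("server3", busy_children=5, age=3, arriba=643787),
]
print(pick(others + [me], me, random.Random(0)).name)  # server4
```

In this model the arriba field is never consulted, which matches the
expectation stated above. Whether the real module reads it somewhere else in
this configuration is exactly the open question, and the sketch does not
answer it.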

Lastly, if option 1 is actually what's happening, shouldn't removing the
weight value on byBusyChildren cause server 4 to begin redirecting more
requests to the other servers? That would be acceptable.

I apologize for not being able to provide more information about what's
actually happening; the application is in production and for performance
reasons, we chose not to enable logging of backhand information;
unfortunately, the application is dynamic and session-based, so now it's
politically expedient to wait for an appropriate time of low usage to
restart one or more of the servers to put the logging directives in.

Thanks for any suggestions-

James Ervin
UNC-Chapel Hill
arriba calculation? (slightly long) [ In reply to ]
On Thursday, August 15, 2002, at 11:53 , James Ervin wrote:
> The weight value on the byBusyChildren directive is intended to prevent
> backhanding unless load goes over 1. We were testing this value to see
> how

There is a bug in byBusyChildren in the 1.2.0 release. Make sure you
are running 1.2.1.

> it worked. These servers are 2-processor Sun 280Rs, and have no problem
> serving requests at low load, so we wanted to let users who landed on
> the
> server stay on the server unless load went above a certain threshold
> (which would probably indicate a problem on the server, or an especially
> long-running CGI). We intended to increase the threshold somewhat if
> initial results were satisfactory. This could have been more thoroughly
> tested, except that the application in question is a web-based e-mail
> client, and it's difficult to emulate real-world conditions for that,
> for
> economic and other reasons.
>
> When in production, we're observing excessive usage of server 4; to the
> point where it's doing up to 50% of the work, according to our
> statistics.
> I can pinpoint three possible reasons for this:
>
> 1) Backhand is working properly (i.e., letting requests stay on the
> recipient server at low load), but more requests are coming in to
> server 4. This may be the result of misconfigured DNS servers that
> aren't caching our round-robin DNS entry correctly. I know that all
> Windows 2000 and higher boxes now have a built-in caching DNS server,
> so this is not out of the realm of possibility.

DNS RR suffers from these problems in a cyclic nature. You should see
it on all the machines at different times.

> 2) The Arriba "miscalculation" is causing more requests to go to server
> 4.
> (By implication, wouldn't this mean fewer requests would end up on
> server 2? However, we don't observe this; load seems to be evenly
> spread over
> the three remaining boxes; if anything, server 2 is doing slightly
> more work than servers 1 and 3.)

You can manually set the Arriba by just editing the number in the Arriba
file. Make sure that you start up Apache while the machine is
absolutely idle the first time. It calculates arriba by doing some
tight loop calculations. If you feel that these calculations don't
really represent the speed of your machines, just change the number in
the mod_backhand-arriba file and see how you fare.
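
To illustrate why a tight-loop benchmark is sensitive to whatever else the
machine is doing at startup, here is a rough sketch. It is not the algorithm
in arriba.c; the loop body, the iteration count and the function names are
invented for the example. It times a fixed amount of arithmetic several times
and reports how far apart the runs land.

```python
import time


def tight_loop_score(iterations=200_000):
    # Time a fixed amount of integer arithmetic and report
    # iterations per second (higher means a faster run).
    start = time.perf_counter()
    acc = 0
    for i in range(iterations):
        acc = (acc * 31 + i) % 1_000_003
    elapsed = time.perf_counter() - start
    return int(iterations / elapsed)


def spread(samples):
    # Relative gap between the highest and lowest sample.
    return (max(samples) - min(samples)) / min(samples)


scores = [tight_loop_score() for _ in range(5)]
print(scores, f"spread on this machine: {spread(scores):.1%}")

reported = [625364, 723409, 643787, 562925]  # values from the original post
print(f"spread across the four servers: {spread(reported):.1%}")
```

The four reported values differ by about 28.5% from lowest to highest.
Running the sketch on a busy machine and on an idle one gives a feel for how
much of a gap like that background load alone can produce.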

> 3) Server 4 is actually faster, somehow, despite being identical in
> every
> way.

So, change the arriba.

> Lastly, if option 1 is actually what's happening, shouldn't removing the
> weight value on byBusyChildren cause server 4 to begin redirecting more
> requests to the other servers? That would be acceptable.

I don't think case 1 is what you are looking at, as it should cause a
similar effect, after some time, on each of the other machines.

The byBusyChildren is interesting in that it tries to balance the
Apache-load (the number of Apache instances in the "run" queue). It is
only an approximation, but it works pretty well in all of the scenarios
I have tested.

--
Theo Schlossnagle
Principal Consultant
OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Phone: +1 301 776 6376 Fax: +1 410 880 4879
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
arriba calculation? (slightly long) [ In reply to ]
Theo-

On Sat, 24 Aug 2002, Theo Schlossnagle wrote:

> There is a bug in byBusyChildren in the 1.2.0 release. Make sure you
> are running 1.2.1.

We're running 1.2.1; I forgot to mention that.

> DNS RR suffers from these problems in a cyclic nature. You should see
> it on all the machines at different times.

Since I mailed this to the list, we've collected a week's worth of data,
and have observed exactly what you describe; yesterday, server 1 was in
the lead; today it looks like server 3 is catching up and server 4, which
was the original offender, is now behaving properly.

> So, change the arriba.

It's so simple I never thought of it. However, over the past week we've
also noticed some variance in the Arriba calculation, so we need to
ensure our machines are more idle when issuing restarts. This is an
argument in favor of a two-tier architecture that I'm unable to convince
my superiors of at the moment; it would give us more flexibility in
issuing restarts and doing maintenance.

> I don't think case 1 is what you are looking at, as it should cause a
> similar effect, after some time, on each of the other machines.
>
> The byBusyChildren is interesting in that it tries to balance the
> Apache-load (the number of Apache instances in the "run" queue). It is
> only an approximation, but it works pretty well in all of the scenarios
> I have tested.

Since I'm now almost wholly convinced that the slightly lopsided weighting
of the servers is the result primarily of DNS and broken name resolution
all over our campus, at our next maintenance window I'm going to try
removing the "weight" from the byBusyChildren directive and see if that
stabilizes things somewhat. In the meantime, slight lopsidedness is
acceptable.

Thanks again for the help-

James Ervin
ATN Messaging Systems
UNC-Chapel Hill
arriba calculation? (slightly long) [ In reply to ]
In lists.backhand.users, you wrote:
> On a variety of days, several people wrote several things.
>
>> DNS RR suffers from these problems in a cyclic nature. You should see
>> it on all the machines at different times.
>
> Since I mailed this to the list, we've collected a week's worth of data,
> and have observed exactly what you describe; yesterday, server 1 was in
> the lead; today it looks like server 3 is catching up and server 4, which
> was the original offender, is now behaving properly.

Though this only partially solves your problem (for example, it reduces
redundancy to a large extent)... this is what we do. We have a single
"dumb" front end machine that doesn't serve itself, which then talks to
multiple backend boxes that do the real work and feed it back. Perhaps
some ASCII art is called for:

               Internet
                   |
            **************
            * Dumb Server*
            **************
                   |
                   |  Local Network
                   |
                   |
        -----------------------
        |          |          |
    ********   ********   ********
    *Serv 1*   *Serv 2*   *Serv 3*
    ********   ********   ********

So forth and so on. The redundancy isn't as much of an issue as it would
seem; one can use simple high-availability scripts to fail over if the
"dumb" frontend server fails.

A nice side effect from using this approach is that there are
dramatically fewer machines exposed to the outside world. Fewer targets
for malicious crackers and Nimda scripts.

The trick to implementing this is, on the frontend server, to simply call
"removeSelf". As a side benefit, you can put a simple "Down for
maintenance" page on the frontend server for the astronomically rare case
that you lose all of your backend servers at the same time. removeSelf
is apparently disregarded when there are *no* other machines available.

cheers,


--
GnuPG fingerprint AAE4 8C76 58DA 5902 761D 247A 8A55 DA73 0635 7400
James Blackwell -- Director http://www.linuxguru.net
arriba calculation? (slightly long) [ In reply to ]
James Blackwell wrote:

>Though this only partially solves your problem (for example, it reduces
>redundancy to a large extent)... this is what we do. We have a single
>"dumb" front end machine that doesn't serve itself, which then talks to
>multiple backend boxes that do the real work and feed it back. Perhaps
>some ascii art is called for::
>
>               Internet
>                   |
>            **************
>            * Dumb Server*
>            **************
>                   |
>                   |  Local Network
>                   |
>                   |
>        -----------------------
>        |          |          |
>    ********   ********   ********
>    *Serv 1*   *Serv 2*   *Serv 3*
>    ********   ********   ********
>
>So forth and so on. The redundancy isn't as much of an issue as it would
>seem; One can use simple high availability scripts to back up if the
>"dumb" frontend server fails.
>
>
We do the same thing for one of our clients. But we run two front-end
"dumb servers" each running Apache+mod_ssl+mod_backhand and wackamole
for failover. It works like a charm. You turn one machine off and
there is no noticeable service interruption.

One of the biggest benefits in this particular client's architecture is
that the developers can log in and restart the back-end servers (which was
a requirement), while the SSL keys are completely self-contained on the
front-end machines and only the System Admins have shell access to those
machines. The developers can't get at the keys even if they wanted to!

This is very useful if you have very "heavy" servers on the back-end.
If you have "thin" servers throughout, I am not sure of the advantage
of tiering your architecture. One immediate downside is that you can
only support the number of concurrent connections that your front tier
supports. If that is an issue, there are many interesting things you
can do to have a workable two-tier solution, but I think it is much
healthier to first ask yourself why a single tier doesn't work.

--
Theo Schlossnagle
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
arriba calculation? (slightly long) [ In reply to ]
In lists.backhand.users, you wrote:
>>Though this only partially solves your problem (for example, it reduces
>>redundancy to a large extent)... this is what we do. We have a single
>>"dumb" front end machine that doesn't serve itself, which then talks to
>>multiple backend boxes that do the real work and feed it back. Perhaps
>>some ascii art is called for::

>>So forth and so on. The redundancy isn't as much of an issue as it would
>>seem; One can use simple high availability scripts to back up if the
>>"dumb" frontend server fails.

> We do the same thing for one of our clients. But we run two front-end
> "dumb servers" each running Apache+mod_ssl+mod_backhand and wackamole
> for failover. It works like a charm. You turn one machine off and
> there is no noticeable service interruption.

> This is very useful if you have very "heavy" servers on the back-end.
> If you have "thin" servers throughout, I am not sure of the advantage
> of tiering your architecture. One immediate downside is that you can
> only support the number of concurrent connections that your front tier
> supports. If that is an issue, there are many interesting things you
> can do to have a workable two-tier solution, but I think it is much
> healthier to first ask yourself why a single tier doesn't work.

Yes. Using a light frontend in our case makes perfect sense; that is why
we do it. I was trying to help that other gentleman by offering a possible
solution. I'm actually quite happy with backhand on our servers.

Regarding the thin front end being a potential bottleneck... I can see
that happening. However, I figure that an Apache process on that front-end
box is much smaller than the ones running on my back end. After all,
I don't have PHP, pgsql or any of a million other bells and whistles all
contending for memory.

Grin. Do you have any idea how sliced-bread neat backhand is? My only
disappointment with backhand is that bySession is not quite smart enough
to pull session IDs from forms. We tried hacking bySession to grab the
session ID from the POST data and ended up causing a mess, so we
understand why it's not done (it's difficult!).

Thank you very much for backhand!


--
GnuPG fingerprint AAE4 8C76 58DA 5902 761D 247A 8A55 DA73 0635 7400
James Blackwell -- Director http://www.linuxguru.net


arriba calculation? (slightly long) [ In reply to ]
> >
> >               Internet
> >                   |
> >            **************
> >            * Dumb Server*
> >            **************
> >                   |
> >                   |  Local Network
> >                   |
> >                   |
> >        -----------------------
> >        |          |          |
> >    ********   ********   ********
> >    *Serv 1*   *Serv 2*   *Serv 3*
> >    ********   ********   ********
> >


Can you please show me the backhand configuration for architecture
above, if possible with session support.


thanks.
arriba calculation? (slightly long) [ In reply to ]
In lists.backhand.users, you wrote:
>> >
>> >               Internet
>> >                   |
>> >            **************
>> >            * Dumb Server*
>> >            **************
>> >                   |
>> >                   |  Local Network
>> >                   |
>> >                   |
>> >        -----------------------
>> >        |          |          |
>> >    ********   ********   ********
>> >    *Serv 1*   *Serv 2*   *Serv 3*
>> >    ********   ********   ********
>> >
>
>
> Can you please show me the backhand configuration for architecture
> above, if possible with session support.
>
>
> thanks.


Sure. It's pretty easy.


However, we're currently not using bySession because the bySession
handler doesn't know how to get session IDs out of HTTP POST bodies (i.e.
form variables). Our solution to getting sessions to work was to tell PHP
to save sessions in files and to export that directory over NFS, from the
server that also does the SQL serving, to the backend boxes. It would be
convenient to pull the session data out of the SQL server, but I don't
think PHP supports that at this point. If you do add bySession.php, make
sure you take care to add a session ID to every URI.


I.e., do not do this:
<form action="/dothis">
<input type="hidden" name="PHPSESSID" value="<thesessionid>">
</form>

But do this instead:
<form action="/dothis?PHPSESSID=<thesessionid>">
</form>

Obviously the latter is kinda messy, and dumber browsers can get really
confused when they're asked to handle POST and GET at the same time.
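
A small helper can take some of the mess out of the second form. The sketch
below appends the session ID to a URI's query string. It is written in
Python purely for illustration (the site discussed here uses PHP), and
with_session_id and its parameter names are made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_session_id(url, session_id, name="PHPSESSID"):
    # Drop any stale copy of the parameter, then append the current one,
    # so the ID always travels in the request line.
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != name
    ]
    query.append((name, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_session_id("/dothis", "abc123"))
# /dothis?PHPSESSID=abc123
print(with_session_id("/dothis?page=2", "abc123"))
# /dothis?page=2&PHPSESSID=abc123
```

Routing every form action and link through one function like this makes it
harder to forget the ID on a single URI.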

On the front server, add the following options (I assume you use
192.168.2.0/24 for your backend boxes). I make no promises that this is
the best (or even necessarily entirely proper) way to do this, but it
does work for me. On the front-end server, we do virtual hosting for
anything not linuxguru (such as an excellent poetry site at www.ath.cx).
For everything else, everything is sent to the default index.html in
/www/linuxguru, which actually consists of:

<html><head><title>Sorry</title></head><body>We are down for maintenance.
Try back later.</body></html>

Anyways, the pertinent configuration options are:

Dumb server (192.168.2.1 and 209.173.6.52):
------------
UnixSocketDir /var/run/libapache-mod-backhand
MulticastStats 192.168.2.1:4445
AcceptStats 192.168.2.0/24


<Directory /www/linuxguru>
AllowOverride None
Backhand byLoad
Backhand removeSelf
</Directory>
------------

Backend servers (192.168.2.93):
----------------

UnixSocketDir /var/run/libapache-mod-backhand
MulticastStats 192.168.2.93 192.168.2.255:4445
AcceptStats 192.168.2.0/24
---------------

--
GnuPG fingerprint AAE4 8C76 58DA 5902 761D 247A 8A55 DA73 0635 7400
James Blackwell -- Director http://www.linuxguru.net
arriba calculation? (slightly long) [ In reply to ]
James Blackwell wrote:

>Sure. It's pretty easy.
>
>
>However, we're currently not using bySession because the bySession
>handler doesn't know how to get session IDs out of HTTP POST bodies (i.e.
>form variables). Our solution to getting sessions to work was to tell PHP
>to use files to save sessions and NFSing out from the server that also does
>the SQL serving to the backend boxes. It would be convenient to pull the
>session data out of the sql server, but I don't think php supports that
>at this point. If you do add bySession.php, make sure you take care to
>add a session id to every URI.
>
>
Why don't you put the session in a cookie? That is what they are there
for. As they come in with the rest of the Headers, mod_backhand
candidacy functions can see them and base their decisions on them.
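
A rough illustration of the point, in Python rather than anything taken from
mod_backhand: a router that only looks at the request line and headers can
find a session ID in the query string or in a Cookie header, but never in a
POST body. The function name, the sample requests and the example.invalid
host are invented for the sketch.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit


def session_id_from_head(request, name="PHPSESSID"):
    # Inspect only the request line and headers; stop at the blank
    # line that separates the headers from the body.
    lines = request.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)

    query = parse_qs(urlsplit(target).query)
    if name in query:
        return query[name][0]

    for line in lines[1:]:
        if not line:
            break  # end of headers; the body is never examined
        key, _, value = line.partition(":")
        if key.strip().lower() == "cookie":
            cookie = SimpleCookie()
            cookie.load(value.strip())
            if name in cookie:
                return cookie[name].value
    return None


with_cookie = (
    "POST /dothis HTTP/1.1\r\n"
    "Host: example.invalid\r\n"
    "Cookie: PHPSESSID=abc123; theme=dark\r\n"
    "\r\n"
)
body_only = (
    "POST /dothis HTTP/1.1\r\n"
    "Host: example.invalid\r\n"
    "\r\n"
    "PHPSESSID=abc123"
)
print(session_id_from_head(with_cookie))  # abc123
print(session_id_from_head(body_only))    # None
```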

--
Theo Schlossnagle
1024D/82844984/95FD 30F1 489E 4613 F22E 491A 7E88 364C 8284 4984
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
arriba calculation? (slightly long) [ In reply to ]
In lists.backhand.users, you wrote:
> James Blackwell wrote:
>
>>Sure. It's pretty easy.
>>
>>
>>However, we're currently not using bySession because the bySession
>>handler doesn't know how to get session IDs out of HTTP POST bodies (i.e.
>>form variables). Our solution to getting sessions to work was to tell PHP
>>to use files to save sessions and NFSing out from the server that also does
>>the SQL serving to the backend boxes. It would be convenient to pull the
>>session data out of the sql server, but I don't think php supports that
>>at this point. If you do add bySession.php, make sure you take care to
>>add a session id to every URI.
>>
>>
> Why don't you put the session in a cookie? That is what they are there
> for. As they come in with the rest of the Headers, mod_backhand
> candidacy functions can see them and base their decisions on them.

Actually, we support that as well, and it is our preferred method.
bySession works absolutely great with cookies and GET. However, for users
that don't have support for cookies (usually because they disabled them),
we use POST and GET. It of course works out to doing nearly the exact
same thing, but some people swear by it. Now that I think about it,
maybe they have a point. Those users that are disabling cookies
effectively avoid getting tracked by places like DoubleClick...

Anyways, my site is basically fully usable with every browser with or
without cookies or graphics. I'm big on the old sendmail concept that
programs should be flexible on their input and strict on their output.

Andrew Rodland and I did take a healthy stab at getting bySession to
parse POST data, but the patch we came up with only succeeded in causing
Apache to go into a never-ending loop. If no one gets around to
figuring out how to do it before I win the lottery, I'll scratch up the
money to pay someone to add that functionality to bySession. :)

--
GnuPG fingerprint AAE4 8C76 58DA 5902 761D 247A 8A55 DA73 0635 7400
James Blackwell -- Director http://www.linuxguru.net