We have an odd problem; we recently migrated our application to 4 new
servers, identically configured with identical hardware. mod_backhand
1.2.1 is installed on all servers. Upon startup, the Arriba value
calculated for each of these servers varies wildly:
    server 1: 625364
    server 2: 723409
    server 3: 643787
    server 4: 562925
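To put a number on "varies wildly", here is a minimal Python sketch that uses only the four values listed above. The variable names are just for illustration and have nothing to do with mod_backhand itself.

```python
# Arriba values as reported at startup on each server (from the list above).
arriba = {
    "server 1": 625364,
    "server 2": 723409,
    "server 3": 643787,
    "server 4": 562925,
}

mean = sum(arriba.values()) / len(arriba)
lowest = min(arriba, key=arriba.get)    # server with the smallest Arriba
highest = max(arriba, key=arriba.get)   # server with the largest Arriba
spread_ratio = arriba[highest] / arriba[lowest]

# Show each server's deviation from the mean, then the overall spread.
for name, value in arriba.items():
    deviation = (value - mean) / mean * 100
    print(f"{name}: {value} ({deviation:+.1f}% vs. mean)")
print(f"highest/lowest: {highest} / {lowest} = {spread_ratio:.2f}")
```

In other words, server 2 reports a value roughly 28% higher than server 4, even though the hardware is supposed to be identical.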
These results persist after a restart of the Apache server and removal of
the old Arriba files. Our backhand configuration is as follows:
    BackhandSelfRedirect On
    Backhand byAge
    Backhand byRandom
    Backhand byLogWindow
    Backhand addSelf
    Backhand byBusyChildren 1
The weight value on the byBusyChildren directive is intended to prevent
backhanding unless load goes over 1. We were testing this value to see how
it worked. These servers are 2-processor Sun 280Rs, and have no problem
serving requests at low load, so we wanted to let users who landed on the
server stay on the server unless load went above a certain threshold
(which would probably indicate a problem on the server, or an especially
long-running CGI). We intended to increase the threshold somewhat if
initial results were satisfactory. This could have been more thoroughly
tested, except that the application in question is a web-based e-mail
client, and it's difficult to emulate real-world conditions for that, for
economic and other reasons.
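To make the intent concrete, here is a minimal Python sketch of the behavior we expect from the weight value. It is only an illustration of our assumption (the weight is subtracted from the local server's busy-children count before the servers with the fewest busy children are kept); it is not mod_backhand's actual code, and pick_candidates and its parameters are hypothetical names.

```python
def pick_candidates(busy_children, self_name, bias=0):
    """Return the servers with the fewest busy children.

    ASSUMPTION: `bias` is subtracted from the local server's count so that
    the local server is favored. This mirrors what we expect the weight on
    byBusyChildren to do; it is not taken from the mod_backhand source.
    """
    adjusted = {
        name: (count - bias) if name == self_name else count
        for name, count in busy_children.items()
    }
    fewest = min(adjusted.values())
    return [name for name, score in adjusted.items() if score == fewest]


# Local server (server 4) is one child busier than server 1.
busy = {"server 1": 2, "server 4": 3}
print(pick_candidates(busy, "server 4", bias=0))  # no bias: only server 1 remains
print(pick_candidates(busy, "server 4", bias=1))  # bias 1: server 4 stays a candidate
```

Under this reading, the request would only leave the local server once it is more than one busy child ahead of the least busy peer.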
In production, we're observing excessive usage of server 4, to the
point where it's doing up to 50% of the work, according to our statistics.
I can pinpoint three possible reasons for this:
1) Backhand is working properly (i.e., letting requests stay on the
recipient server at low load), but more requests are coming in to
server 4. This may be the result of misconfigured DNS servers that
aren't caching our round-robin DNS entry correctly. I know that all
Windows 2000 and higher boxes now have a built-in caching DNS server,
so this is not out of the realm of possibility.
2) The Arriba "miscalculation" is causing more requests to go to server 4.
(By implication, wouldn't this mean fewer requests would end up on
server 2? However, we don't observe this; load seems to be evenly spread over
the three remaining boxes; if anything, server 2 is doing slightly
more work than servers 1 and 3.)
3) Server 4 is actually faster, somehow, despite being identical in every
way.
I tend to discount option 3, and option 1 isn't really a topic for this
list. However, I can't seem to find any solid information on how the
Arriba value is calculated. Does anyone know of any? I skimmed arriba.c,
but I'm not very adept at C. It appears to create 12 threads, measure how
long they take, and calculate Arriba from that; I was hoping to confirm
that suspicion.
Also, with this backhand configuration, does the Arriba value have any
import anyway? I thought putting the "byBusyChildren" directive last in
the list would cause the redirection to happen solely on the basis of the
number of busy Apache servers, which is essentially an estimate of the
length of the run queue, and ignore the Arriba or other resource estimates
on the server. We've avoided using the byLoad directive because we found
that in a cluster of servers with different hardware configurations,
byLoad tended to place too much emphasis on servers with extra memory.
Lastly, if option 1 is actually what's happening, shouldn't removing the
weight value on byBusyChildren cause server 4 to begin redirecting more
requests to the other servers? That would be acceptable.
I apologize for not being able to provide more information about what's
actually happening; the application is in production and for performance
reasons, we chose not to enable logging of backhand information;
unfortunately, the application is dynamic and session-based, so now it's
politically expedient to wait for an appropriate time of low usage to
restart one or more of the servers to put the logging directives in.
Thanks for any suggestions-
James Ervin
UNC-Chapel Hill