Mailing List Archive

[mod_backhand-users] latency of Spread
Hello,

I enjoyed Theo's talks at ApacheCon, and they got me thinking about the
nature of Spread and the broader uses of multicast in a web cluster. Theo
pointed out that mod_backhand does balancing based on load, while most
hardware balancers do not. What I wonder is, given the latency in the load
information, does this truly do a better job of evenly distributing load
than random distribution, or methods like "least connections" that some
hardware balancers provide? I know it would with a very heterogenous
cluster, but what about in the typical case where all the machines are
basically equivalent?

On a related note, I'm thinking of implementing Spread as a storage
mechanism for Apache::Session, the popular mod_perl module for storing
session data. My concern is that unless Spread can distribute the data to
all machines in the cluster very quickly, you'd still have to use sticky
load-balancing in order to make sure the user reaches a machine that has his
data. Can Spread keep up with the rate needed to handle things like robots
or quick reloads?

Apologies if I should be asking these questions on some Spread mailing list.
Just point me to it, if that's the case.

- Perrin
[mod_backhand-users] latency of Spread
On Sunday, April 8, 2001, at 11:10 AM, Perrin Harkins wrote:

> Hello,
>
> I enjoyed Theo's talks at ApacheCon, and they got me thinking about the
> nature of Spread and the broader uses of multicast in a web cluster. Theo
> pointed out that mod_backhand does balancing based on load, while most
> hardware balancers do not. What I wonder is, given the latency in the load
> information, does this truly do a better job of evenly distributing load
> than random distribution, or methods like "least connections" that some
> hardware balancers provide? I know it would with a very heterogenous
> cluster, but what about in the typical case where all the machines are
> basically equivalent?

Well, I take it you went to the talk on Friday (and not the mod_backhand
talk). I detailed in the mod_backhand talk that load is a poor resource
to make decisions on because of its inherently stale nature. On the
other hand, the current run queue or the current memory usage could be a
decent measure depending on your application.

The beauty of mod_backhand is that it is very flexible. If you don't
want to balance on the system load, then balance on something that makes
a little more sense for your application.

> On a related note, I'm thinking of implementing Spread as a storage
> mechanism for Apache::Session, the popular mod_perl module for storing
> session data. My concern is that unless Spread can distribute the data to
> all machines in the cluster very quickly, you'd still have to use sticky
> load-balancing in order to make sure the user reaches a machine that has his
> data. Can Spread keep up with the rate needed to handle things like robots
> or quick reloads?

Okay, just to clarify, mod_backhand doesn't use Spread. mod_log_spread
uses Spread to distribute Apache's access logs over a cluster.
mod_log_spread has its own mailing list... Check the mod_log_spread site
off of http://www.backhand.org/

As for the distributed session cache, this is an application that is
pretty hard to do "right." I would definitely propose the idea on the
Spread mailing list (I will see it there as I participate in that
project as well). They will be happy to talk in detail about your ideas
and the goals involved with implementing a shared session cache. Like
I said, it won't be trivial, so having a few people who understand the
more complicated scenarios review your approach could be extremely
useful.

> Apologies if I should be asking these questions on some Spread mailing list.
> Just point me to it, if that's the case.

mod_backhand and Spread are both products/projects in the same lab at
Hopkins (CNDS), so the confusion is understandable and fairly common. I
would definitely ask that question on the Spread mailing list.

--
Theo Schlossnagle
1024D/A8EBCF8F/13BD 8C08 6BE2 629A 527E 2DC2 72C2 AD05 A8EB CF8F
2047R/33131B65/71 F7 95 64 49 76 5D BA 3D 90 B9 9F BE 27 24 E7
[mod_backhand-users] latency of Spread
On Sun, 8 Apr 2001, Theo Schlossnagle wrote:
> Well, I take it you went to the talk on Friday (and not the mod_backhand
> talk). I detailed in the mod_backhand talk that load is a poor resource
> to make decisions on because of its inherently stale nature. On the
> other hand, the current run queue or the current memory usage could be a
> decent measure depending on your application.

I actually saw both talks. I meant "load" in the generic sense.

After reviewing the docs, I see that lack of freshness in data is dealt
with by the randomization, which obviates the need to know the single best
server at any given time. Seems like a good compromise to me, although
not necessarily better at simple load-balancing than BIG-IP's
least-connections algorithm, which always has up-to-date data because of
its position in the cluster.
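
To make that trade-off concrete, here is a minimal simulation sketch in
Python. It is not mod_backhand's or BIG-IP's actual algorithm. Every name
and parameter in it (simulate, the pick_* policies, the refresh interval,
the service time, k) is a hypothetical chosen for illustration. It assumes
one request arrives per tick, every request takes the same number of ticks,
and load reports reach the balancer only every `refresh` ticks.

```python
import random

def simulate(policy, servers=8, requests=10000, refresh=50, service=16, seed=1):
    """Return the peak number of in-flight requests seen on any one server.

    policy(stale, live, rng) returns the index of the chosen server.
    stale: load snapshot refreshed every `refresh` ticks (delayed reports).
    live:  current in-flight counts (what an inline balancer would see).
    """
    rng = random.Random(seed)
    live = [0] * servers
    stale = list(live)
    finishing = {}  # tick -> servers on which a request completes at that tick
    peak = 0
    for tick in range(requests):
        for s in finishing.pop(tick, []):
            live[s] -= 1
        if tick % refresh == 0:
            stale = list(live)  # a fresh load report arrives
        s = policy(stale, live, rng)
        live[s] += 1
        peak = max(peak, live[s])
        finishing.setdefault(tick + service, []).append(s)
    return peak

def pick_random(stale, live, rng):
    # Ignore load information entirely.
    return rng.randrange(len(live))

def pick_stale_best(stale, live, rng):
    # Always trust the stale report: every request chases the same "best" server.
    return min(range(len(stale)), key=stale.__getitem__)

def pick_stale_top_k(stale, live, rng, k=4):
    # Randomize among the k least-loaded servers in the stale report.
    ranked = sorted(range(len(stale)), key=stale.__getitem__)
    return rng.choice(ranked[:k])

def pick_least_connections(stale, live, rng):
    # An inline balancer that always sees current connection counts.
    return min(range(len(live)), key=live.__getitem__)

policies = (pick_random, pick_stale_best, pick_stale_top_k, pick_least_connections)
results = {p.__name__: simulate(p) for p in policies}
for name, peak in results.items():
    print(f"{name:24s} peak in-flight on one server: {peak}")
```

Under these assumptions, pick_stale_best piles all 16 in-flight requests
onto one server between reports, while pick_least_connections never
exceeds 2 per server. The two randomized policies land somewhere in
between, which is the compromise described above.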

> As for the distributed session cache, this is an application that is
> pretty hard to do "right." I would definitely propose the idea on the
> Spread mailing list

Okay, thanks. Too bad you guys ran out of time in your talk on Friday; I
was hoping to hear your solution to the sticky sessions problem.

- Perrin
[mod_backhand-users] latency of Spread
Hi Perrin,

Spread has its own mailing list off of http://www.spread.org.

Spread is able to distribute thousands of messages per second in a cluster, with latencies well below a second, depending on how big your system is and how high the traffic is. Latency of less than 100 milliseconds should not be a problem on a mid-size system.
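
As a rough way to relate that latency figure to the quick-reload concern, the sketch below counts how many requests in one session would arrive before the previous request's session update could have reached the other machines. The function name, the timestamps, and the pessimistic assumption that every follow-up request lands on a different machine are all hypothetical. It is plain arithmetic, not a model of Spread itself.

```python
def stale_reads(request_times_ms, replication_delay_ms):
    """Count requests arriving before the prior session update has replicated.

    Assumes each request writes the session, and that the next request is
    served by a different machine (the worst case without sticky sessions).
    """
    stale = 0
    for prev, cur in zip(request_times_ms, request_times_ms[1:]):
        if cur - prev < replication_delay_ms:
            stale += 1
    return stale

# Hypothetical arrival times (ms) for one user, including two quick reloads.
times = [0, 40, 500, 560, 2000]
slow = stale_reads(times, replication_delay_ms=100)
fast = stale_reads(times, replication_delay_ms=30)
print(slow, fast)
```

With these made-up timestamps, a 100 ms delay leaves two requests exposed to stale data and a 30 ms delay leaves none, so whether sticky sessions are still needed depends on how the real inter-request gaps compare with the measured replication latency.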

Cheers,

:) Yair. http://www.cnds.jhu.edu
