Mailing List Archive

fastest response
Just a thought,

Concerning a fastest-response class of scheduling algorithm, which would
balance response time throughout the cluster by manipulating the number and
type of connections to each node: would it be possible, and if so how, to
develop an algorithm that works by evaluating the rate of change of the sum
of the client TCP windows? My knowledge is not exact, but would it be
possible to do this? Realistically, how fast does the environment between a
client and a server on the Internet change? Take a TCP protocol like FTP,
which for now we assume is primarily performing bulk data transfers, so that
over a significant period of time it continuously fills up the window.
This would therefore result in a number of connections to a single node
having filled or filling windows most of the time. Now, the rate at which the
windows are or become filled would depend on three factors: the capacity of
the physical links along the path of the connection, the current congestion
on those links, and the current congestion at the end server node (the
node's ability to process data). If the rates of change over these links are
summed and averaged, a load value may be calculated for each node and, as a
result, connections may be intelligently balanced. This should also apply to
data exchange types that do not resemble bulk FTP.

I also wonder whether it would be possible to create a framework within LVS
to enable balancing at packet granularity rather than at connection
granularity. Would it be possible for LVS to record the necessary state
information to manipulate ACKs in such a manner that it could pass
connections off between identical servers while the connection always
appeared to be "up" to the client? Seems like all this would require lots of
packet mugging, so yeah...
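
A minimal sketch of the load value described above, assuming the balancer can
sample the sum of the TCP window sizes for each node at known times. The
names WindowSample, window_rate and pick_node are hypothetical, and treating
the node with the fastest-growing windows as the most responsive one is an
assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowSample:
    t: float         # sample time in seconds
    window_sum: int  # sum of TCP window sizes (bytes) over a node's connections


def window_rate(samples):
    """Average rate of change (bytes/s) of the summed windows over the samples."""
    if len(samples) < 2:
        return 0.0
    first, last = samples[0], samples[-1]
    dt = last.t - first.t
    if dt <= 0:
        return 0.0
    return (last.window_sum - first.window_sum) / dt


def pick_node(history):
    """Pick the node whose summed windows change fastest.

    Assumption: a higher rate is read as a more responsive node.
    """
    return max(history, key=lambda node: window_rate(history[node]))


history = {
    "node_a": [WindowSample(0.0, 1000), WindowSample(2.0, 5000)],
    "node_b": [WindowSample(0.0, 1000), WindowSample(2.0, 2000)],
}
print(pick_node(history))
```

Whether this rate tracks real server load is exactly the open question; the
sketch only shows how such a value could be computed on paper.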

Thanks, Tyrel
Re: fastest response [ In reply to ]
Hello,

On Thu, 25 Jan 2001, Tyrel Beede wrote:

> Just a thought,
>
> Concerning a fastest response class scheduling algorithm which would
> function to balance response time through out the cluster by manipulating the
> number and type of connection to each node of the cluster, is and how would it
> be possible to develop an algorithm that works by evaluating the rate of change
> for the sum of the client tcp windows? My knowledge is not exact but would it
> be possible to this? Realistically how fast does the environment between a
> client and server on the internet change? Assuming a tcp protocol like ftp
> which for now we assume is primarily performing bulk data transactions and thus
> over a significant period of time it would act to continuously fill up the
> window size. Therefor this would result in a number of connections to a single

Only for NAT or for any forwarding method when the client uploads
data.

> node having filled windows or filling windows most of the time. Now the rate at
> which the windows are/or become filled would be dependent on three factors: the
> size of the physical links over the length of the connection, the current
> congestion over those links, and also the current congestion at the end server
> node(the node's ability to process data). Now if the sum of the rate of change
> of these links is averaged a load value may be calculated for each node and as a
> result connections maybe intelligently balanced. This should also apply to data
> exchange types which do not resemble bulk ftp. I also wonder if it would be
> possible to create a framework within the lvs to enable balancing at a packet
> granularity rather than at a connection granularity? Would it be possible for

Are you trying to balance FTP traffic?

Hm, what do you mean by balancing at a packet/connection
granularity? I don't understand. Scheduling of independent packets?
What service needs this?

> the lvs to record the necessary state information to manipulate acks in such a
> manner that it could pass off connections between identical servers while the
> connection always appeared to be "up" to the client? Seems like all this would
> require lots of packet mugging so yeah...

You have a very strange view of the load. Be more specific.
Explain carefully each idea you mention in this mail. I hear your
bullets but can't see them :)

IMHO, the director needs information from the real servers to
balance the load. There are many parameters we can monitor, and we can
build different expressions from these parameters: packet rate,
CPU usage, free memory. In this way, we can select different expressions
for different services. There is a reason for this: each service
loads the real host differently, and may also load other hosts, for
example databases.
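
A minimal sketch of per-service load expressions, assuming each metric has
already been normalized to the range 0..1. The service names, the weights in
each expression, and the metric keys are hypothetical and only illustrate the
idea of selecting a different expression for each service.

```python
# Hypothetical load expressions, one per service. Metrics are assumed to be
# normalized to 0..1 before they reach these functions.
EXPRESSIONS = {
    # A web service is assumed here to be bound by CPU and packet rate.
    "http": lambda m: 0.7 * m["cpu_usage"] + 0.3 * m["packet_rate"],
    # A database is assumed here to be bound by CPU and memory pressure.
    "db": lambda m: 0.5 * m["cpu_usage"] + 0.5 * (1.0 - m["free_memory"]),
}


def load(service, metrics):
    """Evaluate the load expression selected for this service."""
    return EXPRESSIONS[service](metrics)


metrics = {"cpu_usage": 0.5, "packet_rate": 1.0, "free_memory": 0.2}
print(load("http", metrics), load("db", metrics))
```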

I don't believe in your theory about fastest-response
scheduling, but you can surprise us with more specific details and
maybe results :) Is this scheduler for NAT only?

> Thanks, Tyrel


Regards

--
Julian Anastasov <ja@ssi.bg>
Re: fastest response [ In reply to ]
Julian Anastasov wrote:

> Hello,
>
> On Thu, 25 Jan 2001, Tyrel Beede wrote:
>
> > Just a thought,
> >
> > Concerning a fastest response class scheduling algorithm which would
> > function to balance response time through out the cluster by manipulating the
> > number and type of connection to each node of the cluster, is and how would it
> > be possible to develop an algorithm that works by evaluating the rate of change
> > for the sum of the client tcp windows? My knowledge is not exact but would it
> > be possible to this? Realistically how fast does the environment between a
> > client and server on the internet change? Assuming a tcp protocol like ftp
> > which for now we assume is primarily performing bulk data transactions and thus
> > over a significant period of time it would act to continuously fill up the
> > window size. Therefor this would result in a number of connections to a single
>
> Only for NAT or for any forwarding method when the client uploads
> data.

True, but there is a window size and a congestion window size for each side of a
connection. Which direction the data was primarily flowing in doesn't matter.

>
>
> > node having filled windows or filling windows most of the time. Now the rate at
> > which the windows are/or become filled would be dependent on three factors: the
> > size of the physical links over the length of the connection, the current
> > congestion over those links, and also the current congestion at the end server
> > node(the node's ability to process data). Now if the sum of the rate of change
> > of these links is averaged a load value may be calculated for each node and as a
> > result connections maybe intelligently balanced. This should also apply to data
> > exchange types which do not resemble bulk ftp. I also wonder if it would be
> > possible to create a framework within the lvs to enable balancing at a packet
> > granularity rather than at a connection granularity? Would it be possible for
>
> Are you trying to balance a ftp traffic?
>
> Hm, what do you mean: balancing at a packet/connection
> granularity. I don't understand. Scheduling of independent packets?
> What service needs this?

For example, within a cluster there could be two nodes. Assuming these nodes each
had three established TCP connections, it would be possible that one of the two nodes
had three established connections which were not transmitting data. Therefore,
when we schedule at connection granularity, load is only shared to the extent
that the number of connections is balanced between machines. With TCP this does
not mean that the amount of data transmitted is going to be the same per connection,
and thus the total would not be the same per node in the cluster. Now, what I
wonder is whether such a thing would be possible within the current TCP implementation. As
you indicated, I'm not sure which services would benefit from this the most, but it
wouldn't be hard to characterize a type of data transaction which would benefit the
most. From this special case it would be possible to evaluate whether or not any real
performance gains could be made. This, however, is getting a little further away
from the topic than my original question had envisioned. I was just wondering if it
would be possible and, if possible, how it would be done on paper.
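
The two-node example above can be written out as a small sketch. The node
names and the byte rates are made up for illustration; the point is only
that equal connection counts do not imply equal data rates.

```python
from collections import defaultdict


def per_node_totals(connections):
    """Return (connection count, summed bytes/s) per node."""
    counts = defaultdict(int)
    rates = defaultdict(int)
    for node, bytes_per_sec in connections:
        counts[node] += 1
        rates[node] += bytes_per_sec
    return dict(counts), dict(rates)


# Hypothetical cluster: three connections per node, node2's are all idle.
connections = [
    ("node1", 1000), ("node1", 1000), ("node1", 1000),
    ("node2", 0), ("node2", 0), ("node2", 0),
]
counts, rates = per_node_totals(connections)
print(counts)  # balanced by connection count
print(rates)   # not balanced by data transmitted
```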

>
>
> > the lvs to record the necessary state information to manipulate acks in such a
> > manner that it could pass off connections between identical servers while the
> > connection always appeared to be "up" to the client? Seems like all this would
> > require lots of packet mugging so yeah...
>
> You have a very strange vision for the load. Be more specific.
> Explain carefully each idea you mention in this mail, I hear your
> bullets but can't see them :)

Yeah, I would agree. Most of the time I don't even see the bullets. :-)

If LVS could keep a record of all transactions between a server and host, and if
the connection were closed at the server end, it would have the ability to
regenerate the connection on another server, provided that the two servers were able
to serve identical content. Now, the idea that LVS could store all the
information for each transaction through it would most likely be impossible. But
would just the control information (ACKs and such) be enough to regenerate the
connection? A good visualization of this would be something like a proxy server
designed to work at the protocol level instead of the application layer.

>
>
> IMHO, the director needs information from the real servers to
> balance the load. There are many parameters we can monitor and we can
> make different expressions based on these parameters: packet rate,
> cpu usage, free memory. In this way, we can select different expressions
> for the different services. There is a reason for this: each service
> loads differently the real host or may be other hosts too, for example
> databases, etc.

What do you mean by "the director needs information from the real servers to balance
the load"? Should this information be a direct result of a platform/application
specific modification? How should it get this information?

>
>
> I don't believe in your theory about the fastest response
> schedulng but you can surprise us with more specific details and
> may be results :) Is this scheduler for NAT only?

If I were able to figure out the details and implement something of this nature, it
would be done in NAT to prove the idea.

Thanks, Tyrel
Re: fastest response [ In reply to ]
Hello,

On Sat, 27 Jan 2001, Tyrel Beede wrote:

> > > exchange types which do not resemble bulk ftp. I also wonder if it would be
> > > possible to create a framework within the lvs to enable balancing at a packet
> > > granularity rather than at a connection granularity? Would it be possible for
> >
> > Are you trying to balance a ftp traffic?
> >
> > Hm, what do you mean: balancing at a packet/connection
> > granularity. I don't understand. Scheduling of independent packets?
> > What service needs this?
>
> For example, within a cluster their could be two nodes. Assuming these nodes each
> had three established tcp connections it would be possible that one of the two nodes
> could have three established connections which were not transmitting data. Therefor
> when we schedual according to a connection granularity load is only shared at a level
> where the number of connections between machines are balanced. With tcp this does
> not mean that the amount of data transmitted is going to be the same per connection
> and thus, in total, would not be the same per node in the cluster. Now, what I
> wonder is would such a thing be possible within the current tcp implementation. As
> you indicated I'm not sure which services would benifit from this the most but it
> wouldn't be hard to characterize a type of data transaction which would benifit the
> most. From this special case it would be possible to evaluate where or not any real
> preformance gains could be made. This, however, is getting a little bit further away
> from the topic than my original question had invisioned. I was just wondering if it
> would be possible and if possible how would it be done on paper.

I don't see what can be done here. In this example one of the
connections can transfer 10MB/sec while the other connections can
transfer only 1KB/sec. In this case we have communication between two
ends and I don't see a way to load the network traffic equally. The
first goal is to connect both ends, and only then comes the second goal,
to load the links equally. Only in this order. Splitting a connection across
different real servers is logically incorrect. We are not sure whether
the two real servers will forward the traffic to the same host, i.e. we assume
the real servers are one of the connection ends. The other end is the
client. The balancing effect is achieved when many connections
are scheduled. This is "load-informed connection scheduling" and
not "load-informed balancing", because the second term is too ambiguous.
So, LVS schedules connections, not packets.

> If the lvs could keep a record of all transactions between a server and host and if
> the connection were to be closed at the server end it would have the ability to
> regenerate the connection on another server providing that the two servers were able
> to serve identical content. Now the idea that the lvs could store all the
> information for each transaction through it would most likely be impossible. But
> would just the control information(acks and such) be enough to regenerate the
> connection? A good visualization of this would be something like a a proxy server
> designed to work at a protocol level instead of the application layer.

Yes, this is possible in theory, for LVS/NAT. But there are some
questions:

- how does the director know which part of the connection can be continued
from another real server?

- how will the second real server agree to start a connection in the middle
and continue transferring the data started by the failed real
server? This leads to big changes in the real server's TCP stack and,
of course, in the applications. You would need an accept() syscall with
support for the initial connection position :) This way the internal
web server would know at which position to start sending static content.

This sounds like a Layer 7 job. For example, a virtual web server.
I'm not sure whether this theory can be applied to TCP. The solution
is to make the application protocol robust, so that a drop in one
connection is not fatal, for example FTP/HTTP reget. Everything else
is very complex and breaks many standards.
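
As an illustration of the HTTP reget idea, a client that lost a connection
can ask for the rest of a file with a Range request header. This is a
minimal sketch; resume_headers is a hypothetical helper, and it assumes the
server honors byte ranges (it answers 206 Partial Content when it does).

```python
def resume_headers(bytes_received):
    """Build the request headers needed to continue an interrupted download."""
    if bytes_received < 0:
        raise ValueError("bytes_received must be non-negative")
    if bytes_received == 0:
        # Nothing received yet: a plain request fetches the whole resource.
        return {}
    # Ask for everything from the first missing byte to the end.
    return {"Range": f"bytes={bytes_received}-"}


print(resume_headers(1024))
```

With this approach any real server holding identical content can serve the
remainder, and nothing below the application layer has to change.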

> > IMHO, the director needs information from the real servers to
> > balance the load. There are many parameters we can monitor and we can
> > make different expressions based on these parameters: packet rate,
> > cpu usage, free memory. In this way, we can select different expressions
> > for the different services. There is a reason for this: each service
> > loads differently the real host or may be other hosts too, for example
> > databases, etc.
>
> What do you mean my "the director needs information from the real servers to balance
> the load" should this information be a direct result of a platform/application
> specific modification? How sould it get this information?

There are agents on the real servers that report information.
The director uses this information to control the connection scheduling.
There is a WRR method in LVS that needs good cluster software to
achieve this goal. Yes, the agents retrieve OS-specific information.
But this is an application-level solution only. Nobody touches the
lower layers. It is cluster software. I have some postings on this issue
in the mailing list; you can search for them. I'm also preparing a preview
version, but it is delayed again: I was busy creating a healthcheck
program, which is now complete.
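
A minimal sketch of how agent reports could drive weighted round-robin,
assuming each agent reports a load value between 0 and 1. The functions
weights_from_load and wrr_cycle are hypothetical, and the interleaving below
is a simplified weighted round-robin, not the actual LVS WRR code.

```python
def weights_from_load(loads, scale=10):
    """Map a reported load (0..1) to an integer weight; more load, less weight."""
    return {server: max(1, round((1.0 - load) * scale))
            for server, load in loads.items()}


def wrr_cycle(weights):
    """Return one full scheduling cycle: each server appears `weight` times."""
    remaining = dict(weights)
    order = []
    while any(remaining.values()):
        for server in sorted(remaining):
            if remaining[server] > 0:
                order.append(server)
                remaining[server] -= 1
    return order


# Hypothetical agent reports: rs1 is lightly loaded, rs2 is nearly saturated.
weights = weights_from_load({"rs1": 0.2, "rs2": 0.9})
print(weights)
print(wrr_cycle({"rs1": 2, "rs2": 1}))
```

The cluster software would recompute the weights whenever new agent reports
arrive, while the kernel scheduler keeps handling only connections.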

> > I don't believe in your theory about the fastest response
> > schedulng but you can surprise us with more specific details and
> > may be results :) Is this scheduler for NAT only?
>
> If I was able to figure out the details and implement something of this nature it
> would be done in NAT to prove the idea

OK, you know the source code well, but if you have any
questions you can post them to the mailing list for discussion.

> Thanks, Tyrel


Regards

--
Julian Anastasov <ja@ssi.bg>