Mailing List Archive

Westnet and Utah outage
Small clarification: I have no problem with Westnet and do not mean to
be picking on it in any way. I have a friend who lives in Westnet's
service area who gets bothered by these outages and passes them on to me.

********************************************************************
Gordon Cook, Editor & Publisher Subscript.: Individ-ascii $85
The COOK Report on Internet Non Profit. $150
431 Greenway Ave, Ewing, NJ 08618 Small Corp & Gov't $200
(609) 882-2572 Corporate $350
Internet: cook@cookreport.com Corporate. Site Lic $650
Newly expanded COOK Report Web Pages http://pobox.com/cook/
********************************************************************
Re: Westnet and Utah outage [ In reply to ]
So this is to imply that you do have a problem w/ Sprint and that
you are picking on them in a specific way?

-scott

Re: Westnet and Utah outage [ In reply to ]
Scott asks:

So this is to imply that you do have a problem w/ Sprint and that
you are picking on them in a specific way?

COOK: Your suggestion, Scott, not mine... If I remember correctly, on
my one previous query six weeks ago, the problem seemed to be more MCI's
than Sprint's. I am in the midst of writing a long cover story on how
backbones are responding to Internet growth pressure, and when something
breaks I am interested in understanding what happened. It looks in this
case, however, like people want me to yell on the Sprint outage list rather
than here, so I'll check out the possibility of doing that. Do you guys
have an outage list I can join? My apologies if I have offended anyone.

Re: Westnet and Utah outage [ In reply to ]

I really hate to make Gordon's points here, but the network is so
broken at times, it is hard to get interactive work done. Even an FTP
between two NSF supercomputer centers ((so far) idle 266MHz machines at
the end points) went at a whopping:

3320903 bytes sent in 1.1e+03 seconds (3.1 Kbytes/s)

And that was already the second try, as the uncompressed version of the
file just took way too long. The packet losses were between 8 and 10
percent.

This kind of performance is way too regular for me these days. And
as a "user" I have very little means to find out what the hell is wrong
with this network. I am sometimes so sick and tired of this that I am
tempted to use the tools I have (ping and traceroute) and broadly post
to people as to where things seem broken. And I will not care at all if
you guys tell me "well, that's unfair, as ping and traceroute go to
the main processor." Give me a working network, better tools, or SHUT
THE HELL UP AND GO BACK TO FARMING. I will be glad to shut up myself,
once you get your act together and provide smooth and transparent
network services.

Obviously I am able to send much more polite notes, but I am really
getting sick and tired of this lousy performance and degrading network
service quality.

I suspect MANY will increase their amplitude over the next few months
if this continues.

And I don't want to hear this bullshit about regular 10% packet losses
being just fine, and 100% being just marginal.

*At least* let people know if things are broken, so they look for
alternatives (be it a cup of tea if short term, or another service
provider if persistent).

I think this problem is widespread and not confined to specific
service providers. So if someone points a finger at your competitor,
don't be too happy about it. You may be next.

Geez.
Re: Westnet and Utah outage [ In reply to ]
Sean:

I would wish that at least the major service providers would meet in
some smoke-filled room, get to a fate-sharing mindset, and conclude
"we will fix the joint problem." I have heard there are some rooms
available now at WPAFB? Right next to all the archived UFOs? A good
part of the problem is a questionable mindset in a rather young
commercial environment. There is no Fed daddy any more, and I suspect
that you would not want one either. So, who is driving a global
collaboration? Has to be you and your pals. I do not see another
choice. Neither the Feds, nor the IETF, ISoc, whoever, will fix it for
you. And NANOG is way too weak and can't quite get things together
itself.

I routinely see high packet losses; the environment, from what I see,
has severely degraded compared to what it was, say, a year ago. I submit
it also has significantly grown and we have many more cooks in the
kitchen, but, as ill-defined as they are, we cannot even keep our old
standards up. I still believe y'all got the Internet on a silver
platter, and just need to fix things up.

I suspect at least 80% of the problem is non-technical, but
administrative and mindset. If those 80% could be resolved to a
somewhat high degree, I suspect we will find solutions for the
technical problems. Not overnight, I am sure, but something we could
plan for.

Unfortunately, the problem is too big to be resolved solely by the
engineers. Takes more than just intelligence.
Re: Westnet and Utah outage [ In reply to ]
HWB -

You make some very important points through your frustration,
the most important of which is this one: "...if someone points a finger
at your competitor, don't be too happy about it. You may be next".

Unfortunately with the available technology and labour pool,
there is no entity which could provide a very clean global Internet,
period. There is no organization you can pay more money to that
will improve your general-case performance, unless your general-case
utilization is really much more like a VPN than the Internet.

That is to say, if what you're looking for is high performance
to sites X, Y and Z, and OK current-Internet performance to
everywhere else, there are several options open to you, one of
which is putting X, Y, and Z on a VPN and hooking each of these
sites individually to the Internet. This is something that
Sprint Managed Router Networks does all the time and is very
good at, and is essentially what (modulo policy) the vBNS is for.

However, with respect to the Internet, anybody who tells
you that they can provide a better quality of service,
connectivity-wise, than their equivalent competitors is
either lying or in for a really big shock.

As to your other points, I'm not sure how to address
them, but I think you probably have some ideas you'd be
willing to share... :)

Sean.
Re: Westnet and Utah outage [ In reply to ]
Gordon,

I'm sorry if I misinterpreted your note, but you seemed to be trying
to suggest exactly that. In one breath you excuse yourself from
pointing one finger in hopes that people don't notice the other
arm wildly gesturing the other way. It's quite disingenuous to state
otherwise. It wasn't appropriate for you to do this to Sprint and it
wouldn't be appropriate to do this to anyone else, either.

I consider this list a place for ISPs to discuss general policy and
planning issues that affect all of us. It is a very inappropriate
place to discuss problems with a specific provider. If you have
a problem with a specific provider, whether it be Sprint, or ANS,
or MCI, or podunk-ISP, it would seem to me that it would be
most appropriate to talk to them directly rather than
shouting to the wind.

As to the rest, this is a competitive market with lots of fine
service providers, national, regional, and local, with more
coming on board all the time. Some problems we will all face,
and other problems will be provider-specific. This list
has always been appropriate and useful for discussing the former;
let's stick with that.

-scott (huddle@mci.net)

Gordon Cook <gcook@tigger.jvnc.net> writes
> Scott asks:
>
> So this is to imply that you do have a problem w/ Sprint and that
> you are picking on them in a specific way?
>
> COOK: your suggestion scott not mine.......
Re: Westnet and Utah outage [ In reply to ]
Gordon Cook writes:
COOK: your suggestion scott not mine....... If I remember correctly on
my one previous querry 6 weeks ago, the problem seemed to be more MCI's
than sprints. I am in the midst of writing a long cover story on how
backbones are responding to internet growth pressure and when something
breaks I am interest in understanding what happened.

Gordon,

Why would this be a backbone issue? It sounds to me like Westnet
relies on a single connection from its NSP, and that this failed for
some period of time. Any single component, be it circuit, router,
etc. *will fail*. If a regional depends on a single point of failure
the outcome is inevitable.

--
Jeff Hayward
Re: Westnet and Utah outage [ In reply to ]
[clip]
>>It looks in this
>>case however like people want me to yell on the sprint outage list rather
>>than here so I'll check out the possibility of doing that. Do you guys
>>have an outtage list I can join? My apologies if I have offended anyone.

Gordon, you're taking my comments a bit differently than intended. More
of an "outage was notified and then commented on when closed" than a "go
yell there". The current outage list at Sprint is light-years better
than the old sl-<mumble> lists where no traffic lived. My point was
that if you want to know the specifics of a supplier's outages, go get
on their list[s]. When in doubt, go off and VRFY some standard
list-managers at the mail-hosts you see people's mail coming from.

IMO, nanog is like a mega-NAP for the humans involved... a meetpoint, not
a notification list.

[clip]
>the main processor." Give me a working network, better tools, or SHUT
>THE HELL UP AND GO BACK TO FARMING. I will be glad to shut up myself,
>once you get your act together and provide smooth and transparent
>network services.
>
>Obviously I am able to send much more polite notes, but I am really
>getting sick and tired of this lousy performance and degrading network
>service qualities.

I think this may be the first case *for* decaf.

[hwb, if you want to flame, then please direct to <flamage@rsuc.gen.ma.us>.
I'll deal with it when there's time. Or delete it. None of us need it
here.]

Seriously, though, while we _are_ seeing the impact of the End of NSFnet
and the insane increase in day-to-day, joe-average network usage, it was
obviously impending and affects everyone, everywhere, and ranting will do
no one any good.

>*At least* let people know if things are broken, so they look for
>alternatives (be it a cup of tea if short term, or another service
>provider if persistent).
Again, all providers have [at least private] outage lists. Some of
them only have them as part of their normal local announcements, but
that's due to change RSN according to my crystal ball... we all affect
each other so much that we all will benefit, at least in exchanging
outage data with near neighbors.

Perhaps "known outage lists" would be a good appendix to the
FAQ-in-progress? I like the idea of a meta-list, but it would have to
be out-bound only [to be useful], and then you're starting to establish
"who is important enough to be allowed" or "who/what method for
moderation", etc., etc. No one's got time for that.

What gets me is when you said earlier:
>I suspect at least 80% of the problem is non-technical, but
>abministrative and mindset.
On what do you base this suspicion?

Everyone working on these things has to work on facts and tangibles.
"It's broke! Fix it!" is less useful than looking down the fiber jumper
to see what's clogging the data.

Anyway, to second mo's comment: may your pagers not go off while
carving the [turkey|tofu].

Cheers,

Joe Provo
Network Operations Center
UltraNet Communications, Inc.
Re: Westnet and Utah outage [ In reply to ]
>I think this may be the first case *for* decaf.

I don't drink coffee. And I never noticed caffeine having any impact on
me. Wish it would; these recurring arguments make me tired.

>What gets me is when you said earlier:
>>I suspect at least 80% of the problem is non-technical, but
>>abministrative and mindset.
>On what do you base this suspicion?

Experience in such matters. This kind of problem is not exactly new
to me. Been there, done that. Hope someone will step up who has the
guts to fix things.

>Everone working on these things has to work on facts and tangibles.
>"It's broke! Fix it!" is less useful than looking down the fiber jumper
>to see what's clogging the data.

Then do it. Don't waste time sending email instead.
Re: Westnet and Utah outage [ In reply to ]
In message <199511221821.KAA15673@upeksa.sdsc.edu>, Hans-Werner Braun writes:
>
> These kind of performances are ways too regular for me these days. And
> as a "user" I have very little means to find out what the hell is wrong
> with this network. I am sometimes so sick and tired of this that I am
> tempted to use the tools I have (ping and traceroute) and broadly post
> to people as to where things seem broken. And I will not care at all if
> you guys tell me "well, that's unfair, as ping and traceroute go to
> the main processor." Give me a working network, better tools, or SHUT
> THE HELL UP AND GO BACK TO FARMING. I will be glad to shut up myself,
> once you get your act together and provide smooth and transparent
> network services.

Since when can't you use ping and traceroute? You just have to ignore
the results from routers that are probably too heavily loaded. (And
avoid loading them further by pinging them, but the few packets that
determine which way your traffic is headed should be no problem.)
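In that spirit, end-to-end loss taken from ping's own summary counters is still a legitimate measurement; only results obtained by pinging busy routers themselves are suspect, since routers answer ICMP in the main processor. A minimal sketch of pulling the loss figure out of a ping run (the summary-line format assumed here is the common BSD/Linux one, not any standard):

```python
import re

def packet_loss(summary):
    """Extract sent/received counts from a ping summary line and
    return the loss percentage computed from them (more trustworthy
    than the rounded percentage ping itself prints)."""
    m = re.search(r"(\d+) packets transmitted, (\d+)(?: packets)? received",
                  summary)
    if m is None:
        raise ValueError("unrecognized ping summary: %r" % summary)
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent

# Loss measured to the far *end host* is meaningful; loss "measured"
# by pinging an intermediate router is not, since the router may drop
# or delay ICMP replies for traffic it would happily have forwarded.
line = "1000 packets transmitted, 915 received, 8% packet loss"
print(packet_loss(line))  # 8.5
```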

Curtis
Re: Westnet and Utah outage [ In reply to ]
In message <199511221908.LAA15917@upeksa.sdsc.edu>, Hans-Werner Braun writes:
> [... discussion of smoke-filled rooms and UFOs deleted. ;-) ]
>
> I routinely see high packet losses, the environment, from what I see,
> has severely degraded to what it was, say, a year ago. I submit it also
> has significantly grown and we have many more cooks in the kitchen,
> but, as ill as they are defined, we cannot even keep our old standards
> up. I still believe y'all got the Internet on a silver plate, and just
> need to fix things up.
>
> I suspect at least 80% of the problem is non-technical, but
> abministrative and mindset. If those 80% could be resolved to a
> somewhat high degree, I suspect we will find solutions for the
> technical problems. Not over night, I am sure, but something we could
> plan for.

One requirement that we have been pushing hard for is:

Zero packet loss to stable destinations if the path is uncongested
in the presence of arbitrarily high levels of route flap.

We are trying to enforce this by requiring router vendors to report
the highest traffic rates they can sustain where this holds true.
We completely ignore the Bradner test results and emphatically insist
that those tests are completely useless and have done the Internet a
great disservice.

This zero-loss condition would seem to the naive Internet user to be a
given. It absolutely is not (unless your router is an NSS :). If you
have a cache of prefixes and can forward really fast with that cache
and have a much slower secondary means of forwarding, you had better
not invalidate any cache entries by flushing subsets of cache entries
(or worse yet the whole cache), and you had better not ensure cache
consistency by timing out cache entries.

The cache problems are particularly difficult when trying to maintain
overlaps while components of the overlap or the aggregate flap. Route
deletion is a hard problem with a partially populated cache. Router
vendors seem to have overlooked this difficulty and/or not been
acutely aware of the requirement.
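To picture the consistency problem (this is a toy sketch of my own, not any vendor's implementation): a demand-filled forwarding cache has to invalidate exactly the cached destinations covered by a changed prefix, or traffic keeps following stale next hops:

```python
# Toy demand-filled forwarding cache with targeted invalidation.
import ipaddress

class ToyRouter:
    """Slow path: longest-prefix match over the full table.
    Fast path: an exact-match cache of recently seen destinations."""

    def __init__(self):
        self.table = {}   # ip_network -> next hop
        self.cache = {}   # ip_address -> next hop

    def add_route(self, prefix, nexthop):
        net = ipaddress.ip_network(prefix)
        self.table[net] = nexthop
        self._invalidate(net)

    def del_route(self, prefix):
        net = ipaddress.ip_network(prefix)
        self.table.pop(net, None)
        self._invalidate(net)

    def _invalidate(self, net):
        # Drop only the cached destinations covered by the changed
        # prefix; flushing the whole cache (or timing entries out)
        # would push all traffic onto the slow path at once.
        for dest in [d for d in self.cache if d in net]:
            del self.cache[dest]

    def lookup(self, dest):
        addr = ipaddress.ip_address(dest)
        if addr in self.cache:
            return self.cache[addr]          # fast path
        matches = [n for n in self.table if addr in n]
        if not matches:
            return None                      # no route
        best = max(matches, key=lambda n: n.prefixlen)
        self.cache[addr] = self.table[best]  # fill cache on demand
        return self.table[best]

r = ToyRouter()
r.add_route("10.0.0.0/8", "hop-A")
print(r.lookup("10.1.2.3"))          # hop-A (now cached)
r.add_route("10.1.0.0/16", "hop-B")  # a more-specific route appears
print(r.lookup("10.1.2.3"))          # hop-B; skip _invalidate and it
                                     # would still say hop-A (stale)
```

Under heavy flap of the more-specific route, every add/delete evicts those cache entries again, which is exactly why flap plus caching produces loss between otherwise stable endpoints.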

Unless this is fixed, you will get loss between two stable points even
if there is no congestion, since high levels of route flap are sort of a
given now that the Internet has become this big.

Of course, things get much worse if links in the path are congested,
or if routers can't handle the PPS load, or if routers in the path
fall over and die. I'm just pointing out that for some routers, when
things are "working" they are not working by your definition or mine.

And - Yes, it is being fixed!

> Unfortunately, the problem is too big to be resolved solely by the
> engineers. Takes more than just intelligence.

Wow! Sounds real complicated. Must require bureaucrats. ;-)

Curtis
Re: Westnet and Utah outage [ In reply to ]
> > These kind of performances are ways too regular for me these days. And
> > as a "user" I have very little means to find out what the hell is wrong
> > with this network. I am sometimes so sick and tired of this that I am
> > tempted to use the tools I have (ping and traceroute) and broadly post
> > to people as to where things seem broken. And I will not care at all if
> > you guys tell me "well, that's unfair, as ping and traceroute go to
> > the main processor." Give me a working network, better tools, or SHUT
> > THE HELL UP AND GO BACK TO FARMING. I will be glad to shut up myself,
> > once you get your act together and provide smooth and transparent
> > network services.

Well, it's not terribly useful to complain to nanog about specific and
temporary problems. Unfortunately, it's often really only the Network
Operations Centers (NOCs) of the various providers, equipped with
knowledge of the topology of the various networks, who can properly
diagnose network problems.

[So it's not a problem to use ping & traceroute, acknowledging that they
may not be useful when seeing round-trip times from (Cisco) routers. The
problem is that you should pick a productive forum, such as your
provider's NOC, to complain to.]

For example, it's easy to say "My provider sucked this morning - my
traceroute stopped in DC and there were huge packet losses". But what
you don't see is that the AC power was *off* @ MAE-East this morning
from 4am or so until 9:30am or so. And mailing to NANOG is not the
right way to find out these things.

Avi
Re: Westnet and Utah outage [ In reply to ]
>Question: Which RFC should I consult to determine acceptable delay and packet
>loss?

RFCs are the result of IETF activities. The IETF is essentially a
protocol standardization group, not an operations group. I don't think
you perceive the IETF as "running" your network, or? There may not be
much of an alternative, though, which to a large extent is the issue at
hand. Nobody is responsible (individually, or as a consortium, or
whatever) for this anarchically organized and largely uncoordinated (at
a systemic level) global operational environment. While IETF/RFCs could
be utilized somehow, this is not really an issue of theirs. I sure
would not blame the IETF for not delivering here, as this is not their
mandate.

From other email I have seen, it seems the important issues are hard
for some to understand. I (and I suspect several others) don't really
care much about a specific tactical issue (be it an outage or whatever).
The issue is how to make the system work with predictable performance
and a fate-sharing attitude at a global level, in a commercial and
competitive environment that is still extremely young at that, and
attempts to accommodate everything from mom'n'pop shops to multi-billion
dollar industry. And exhibits exponential usage and ubiquity growth,
without the resources to upgrade quickly to satisfy all the demands.
And no control over in-flows, and major disparities across the
applications. And TCP flow control not working that well, as the
aggregation of transactions is very heavy, and the
packet-per-transaction count is so low on average that TCP may not be
all that much better to the network than UDP (in terms of adjusting to
jitter in available resources). Not to mention this age-old problem
with routing table sizes and routing table updates.
Re: Westnet and Utah outage [ In reply to ]
Question: Which RFC should I consult to determine acceptable delay and packet
loss?

- jeff -

Re: Westnet and Utah outage [ In reply to ]
> The IETF is essentially a protocol standardization group, not
> an operations group.

As one of the ADs in the Operational Requirements area of the IETF I'll
second what Hans-Werner said.

The Operational Requirements area is designed mostly to provide
feedback to protocol developers about what happened when
a new protocol was tried out in the real world. It can also
house efforts to establish measurement methodologies, and it has
been used as a meeting place and self-help group for people who are
deploying some technology.

There is a new effort in the IETF working on figuring out what to
measure to gauge the QoS of a network (the IPPM effort in the
Benchmarking Methodology Working Group), but Mike O'Dell (the other AD)
and I made it very clear that the working group could talk about
terminology, methodology, and even tools, but could not start saying
what a "good" or a "bad" service would be.

Bestowing good-housekeeping awards on Internet providers cannot be the
task of an organization that has the level of participation from the
Internet providers that the IETF does. There are a number of legal
issues here.

If someone would like to start up an Internet users' group, I'd
sure be interested in watching the mailing list discussions, but
the IETF would not be a good home for such a group.


Scott
Re: Westnet and Utah outage [ In reply to ]


This belongs on the end2end-interest list or IPPM or elsewhere, but
I'll save a lot of people going through the archives.

In order to get X bandwidth on a given TCP flow you need to have an
average window size of X * RTT. This is expressed in terms of TCP
segments N = (X * RTT) / MSS (or more correctly the segment size in
use rather than MSS). To sustain an average window of N segments, you
must ideally reach a steady state where you cut cwnd (current window)
in half, then grow linearly, fluctuating between 2/3 and 4/3 of the
target size. This would mean one drop in 2/3 N windows or DropRate in
terms of time is 2/3 N * RTT. In one RTT on average X * RTT amount of
data flows. In practice, you rarely drop at the perfect time, so the
constant 2/3 (call it K) can be raised to 1-2. Since N = (X * RTT) /
MSS, DropRate = K * X * RTT * X * RTT / MSS. Units are b/s * sec *
b/s * sec / b, or b. The DropRate expressed in bits can be converted
to seconds or packets (divide by X or by MSS). This type of analysis
is courtesy of the good folks at PSC (Matt, Jamshid, et al).

For example, to get 40 Mb/s at 70 msec RTT and a 4096-byte MSS, you get
one error about every 6 seconds (K=1), or 1 in 7,300 packets. If you
look at 56 Kb/s and a 512-byte MSS you get a very interesting result:
you need one error every 66 msec, or 1 error in 0.9 packets. This gives
a good incentive to increase delay. At 250 msec, you get a result of one
error in 11.7 packets (much better!).
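The arithmetic above can be reproduced with a short sketch (function and variable names are mine; the formula and K are as given in the text, with MSS taken in bytes, which is the convention that matches the worked numbers):

```python
def tcp_drop_interval(rate_bps, rtt_s, mss_bytes, k=1.0):
    """How rarely drops must occur to sustain a target TCP rate:
    DropRate = K * (X * RTT)^2 / MSS (in bits between drops),
    returned here as (seconds between drops, packets between drops)."""
    mss_bits = mss_bytes * 8
    window_bits = rate_bps * rtt_s          # X * RTT, the needed window
    interval_bits = k * window_bits ** 2 / mss_bits
    return interval_bits / rate_bps, interval_bits / mss_bits

# 40 Mb/s at 70 ms RTT with a 4096-byte MSS:
secs, pkts = tcp_drop_interval(40e6, 0.070, 4096)
print(round(secs, 1), round(pkts))     # 6.0 7301

# 56 Kb/s with a 512-byte MSS at the same 70 ms RTT:
secs, pkts = tcp_drop_interval(56e3, 0.070, 512)
print(round(secs, 3), round(pkts, 1))  # 0.067 0.9

# ...and at 250 ms RTT one drop per ~11.7 packets is tolerable:
_, pkts = tcp_drop_interval(56e3, 0.250, 512)
print(round(pkts, 1))                  # 11.7
```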

Another interesting point to note is that you need 3 duplicate ACKs
for TCP fast retransmit to work, so your window must be at least 4
segments (and should be more). If you have a very large number of TCP
flows, where on average people get less than 1200 baud or so, the delay
you need to make TCP work well starts to exceed the magic 3-second
boundary. This was discussed ad nauseam on end2end-interest.
An important result is that you need more queueing than the
delay-bandwidth product for severely congested links. Another is that
there is a limit to the number of active TCP flows that can be
supported per unit of bandwidth. One suggestion to address the latter
problem is to further drop the segment size if cwnd is less than 4
segments and/or when the estimated RTT gets into the seconds range.

This analysis of how much loss is acceptable to TCP may not be outside
the bounds of an informational RFC, but so far none exists.

Curtis
Re: Westnet and Utah outage [ In reply to ]
Hans;

Sorry... I waited for additional replies, but you seemed to be the only
one to take my bait. My question was rhetorical.

I hear all this complaining on this forum by the ISP community about
unacceptable delay and packet loss, yet no "respected" industry
standards body has set QOS guidelines for ISPs! An old management
dictum says "if it's important, measure it".

I know where to look for QOS criteria on my physical plant (T1s/DS3s);
I even know where to look for QOS criteria for my old X.25 network. If
we want things to get better within the ISP community... let's define
what better is.

- jeff -


On Thu, 23 Nov 95, hwb@upeksa.sdsc.edu (Hans-Werner Braun) wrote:
>>Question: Which RFC should I consult to determine acceptable delay and
packet
>>loss?
>
>RFCs are the result of IETF activities. The IETF is essentially a
>protocol standardization group, not an operations group. I don't think
>you perceive the IETF as "running" your network, do you? There may not be
>much of an alternative, though, which to a large extent is the issue at
>hand. Nobody is responsible (individually or as a consortium or
>whatever) for this anarchically organized and largely uncoordinated (at
>a systemic level) global operational environment. While IETF/RFCs could
>be utilized somehow, this is not really an issue of theirs. I sure
>would not blame the IETF for not delivering here, as this is not their
>mandate.
>
>In other email I saw it seems that the important issues are hard to
>understand for some. I (and I suspect several others) don't really care
>much about a specific tactical issue (be it an outage or whatever).
>The issue is how to make the system work with predictable performance
>and a fate sharing attitude at a global level, in a commercial and
>competitive environment that is still extremely young at that, and
>attempts to accommodate everything from mom'n'pop shops to multi-billion
>dollar industry. And exhibits exponential usage and ubiquity growth,
>without the resources to upgrade quickly to satisfy all the demands.
>And no control over in-flows, and major disparities across the
>applications. And TCP flow control not working that well, as the
>aggregation of transactions is very heavy, and the
>packet-per-transaction count is so low on average that TCP may not be
>all that much better to the network than UDP (in terms of adjusting to
>jitter in available resources). Not to mention this age-old problem
>with routing table sizes and routing table updates.
>
>


-----------------------------------------------------------------------------
Jeff Oliveto | Phone: +1.703.760.1764
Sr.Mgr Opns Technical Services | Fax: +1.703.760.3321
Cable & Wireless, Inc | Email: joliveto@cwi.net
1919 Gallows Road | URL: http://www.cwi.net/
Vienna, VA 22182 | NOC: +1.800.486.9999
-----------------------------------------------------------------------------
Re: Westnet and Utah outage [ In reply to ]
Other people have touched on it, but I'd like to re-iterate:

The quality that someone can expect out of their Internet connection,
as a practical matter, will somewhat vary with how much they're willing to
pay. It seems to me that giving someone <<1% downtime is an expensive
level of service. The Internet market today is not one where most customers
question the providers on the level of service; quite the contrary, they
question the providers on how cheap they can go. This type of market
will be cost driven, and for my $19.95 unlimited PPP account, do you think
my ISP will be able to give me <<1% inaccessibility? Not without operating in
the red, I don't think.

I think most ISP's would be *delighted* to offer customers
Very High Quality service, but few customers are willing to pay for that
service. As a result, the final judgement of "how good is good enough"
will be "whatever the customer can live with," as compared to anything
that engineers like (ie 1%, 5%, etc).

Ed
ed@texas.net

(p.s. you notice I'm brushing aside the first question, being
"how do I *measure* the quality of service." Offhand, a weighted average
of all of the components that a given customer needs for a connection
makes the most sense to me.)


--
On Tue, 28 Nov 1995 joliveto@cwi.net wrote:

> Hans;
>
> Sorry...I waited for additional replies but you seemed to be the only one to
> take my bait. My question was rhetorical.
>
> I hear all this complaining on this forum about unacceptable delay and packet
> loss by the ISP Community yet no "respected" industry standards body has yet
> set QOS guidelines for ISPs! An old management dictum says "if it's
> important, measure it".
>
> I know where to look for QOS criteria on my physical plant (T1/DS3's), I even
> know where to look for QOS criteria for my old X.25 network. If we want
> things to get better w/i the ISP Community...let's define what better is.
>
> - jeff -
Re: Westnet and Utah outage [ In reply to ]
I made a private reply to Curtis on his posting earlier this week, and he
gave a nice analysis and cc'd end2end-interest rather than nanog. For
those that don't care to read all this, here's the summary:

> Which would you prefer? 140 msec and 0% loss or 70 msec and 5% loss?

So we get to choose between large delay or large lossage. Doesn't sound
wonderful...

I thought you folks in nanog might be interested, so with Curtis'
permission, here's the full exchange (the original posting by Curtis is
at the very end).

-- Jim

Here's what I wrote:

> In message <199511272220.OAA01151@stilton.cisco.com>, Jim Forster writes:
> > Curtis,
> >
> > I think these days for lots of folks the interesting question is not what
> > happens when a single or a few high-rate TCPs get in equlibrium, but rather
> > what happens when a DS-3 or higher is filled with 56k or slower flows, each
> > of which only lasts for an average of 20 packets or so. Unfortunately,
> > these 20 packet TCP flows are what's driving the stats these days, due I
> > guess to the silly WWW (TCP per file; file per graphic; many graphics per
> > page) that's been so successful.

And Curtis's reply:

> The analysis below also applies to just under 800 TCP flows each
> getting 1/800th of a DS3 link or about 56Kb/s. The loss rate on the
> link should be about one packet in 11 if the delay can be increased to
> 250 msec. If the delay is held at 70 msec, lots of timeouts and
> terrible fairness and poor overall performance will result.
>
> Do we need an ISP to prove this to you by exhibiting terrible
> performance? If so, please speak to Jon Crowcroft. His case is 400
> flows on 4 Mb/s which is far worse, since delay would have to be
> increased over 3 seconds or segment size reduced below 552. :-(
>
> > I could try to derive the results but I'm sure you or others would do
> > better :-). How many of the packets in the 20 packet flow are at
> > equilibrium? What's the drop rate? Hmmm, very simple minded analysis says
> > that it will be large: exponential growth (doubling cwnd every ack) should
> > get above best case pretty quickly, certainly within the 20 packet flow.
> > Assume it's only above optimum once, then the packet loss rate is 1 in 20.
> > Sounds grim. Vegas TCP sounds better for these reasons, since it tracks
> > actual bw, but I'm not really qualified to judge.
> >
> > -- Jim
>
>
> Jim,
>
> The end2end-interest thread was quite long and I didn't want to repeat
> the whole thing. The initial topic was very tiny TCP flows of 3 to 4
> packets. That is a really bad problem, but should no longer be a
> realistic problem once HTTP is modified to allow it to pick up both
> the HTML page and all inline images in one TCP connection.
>
> Your example is quite reasonable. At 20 packets per flow, with no
> loss you get 1, 2, 4, 8, 5 packets per RTT or complete transfer in
> about 5 RTT. On average each TCP flow will get 20 packets / 5 RTT of
> bandwidth until congestion of 4 packets/RTT (for 552/70 msec, this is
> about 64 Kb/s). If the connection is temporarily overloaded by a
> factor of 2, this must be reduced to 2 packets/RTT. If we drop 1
> packet in 20, roughly 35% of the flows go completely untouched
> (0.95^20). Some 15% will drop one packet of the first 3 and timeout
> and slow start, resulting in less than 20 packet / 3 seconds (3
> seconds >> 5*RTT). Some 60% will drop one packet of the 4th through
> 20th, resulting in fast retransmit, no timeout, and linear growth in
> window. If the 4th is dropped, the window is cut to 2, so next few
> RTTs you get 2, 3, 4, 5, 3, or 8 RTTs (2 initial, 1 drop, 5 more).
> This is probably not quite enough to slow things down.
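Curtis's bucket percentages are quick to verify; a small sketch (my own, assuming independent 1-in-20 drops across a 20-packet flow; note the last bucket actually comes out nearer 50% than 60%):

```python
p = 0.95  # per-packet survival probability at a 1-in-20 drop rate

untouched = p ** 20               # no drops at all: Curtis's "roughly 35%"
early_timeout = 1 - p ** 3        # first drop within packets 1-3: "some 15%"
fast_rexmit = p ** 3 - p ** 20    # first drop in packets 4-20: fast retransmit

print(f"{untouched:.1%} {early_timeout:.1%} {fast_rexmit:.1%}")
# roughly 35.8%, 14.3%, 49.9%
```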
>
> On a DS3 with 70 msec RTT and 1500 simultaneous flows of 20 packets
> each (steady state such that the number of active flows remains about
> 1500, roughly twice what a DS3 could support) you would need a drop
> rate of on the order of 5% or more. Alternately, you could queue
> things up, doubling the delay to 140 msec and give every flow the same
> slower rate (perfect fairness in your example) and have a zero drop
> rate.
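Both figures in this paragraph check out on the back of an envelope (a sketch with assumed numbers: 45 Mb/s of DS3 payload, ~56 Kb/s per flow as in the 800-flow example, and 70 msec of added queueing delay):

```python
ds3_bps = 45e6                 # assumed DS3 payload rate
per_flow_bps = 56e3            # ~56 Kb/s per flow, as in the 800-flow example

flows = ds3_bps / per_flow_bps   # ~800 flows, so 1500 active is roughly 2x
queue_bits = ds3_bps * 0.070     # buffer needed to add 70 msec of queueing
print(f"{flows:.0f} flows, {queue_bits / 8 / 1024:.0f} KB of queue")
```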
>
> Which would you prefer? 140 msec and 0% loss or 70 msec and 5% loss?
> Delay is good. We want delay for elastic traffic! But not for real
> time - use RSVP, admission control, police at the ingress and stick it
> on the front of the queue.
>
> In practice, I'd expect overload to be due to lots of flows, but not
> enough little guys to overload the link (if so, get a bigger pipe, we
> can say that and put it in practice). The overload will be due to a
> high baseline of little guys (20 packet flows, or a range of fairly
> small ones), plus some percentage of longer duration flows capable of
> sucking up the better part of a T1, giving half a chance. It is the
> latter that you want to slow down, and these are the ones that you
> *can* slow down with a fairly low drop rate.
>
> I leave it as an exercise to the reader to determine how RED fits into
> this picture (either one, my overload scenario or Jim's where all the
> flows are 20 packets in duration).
>
> The 400 flows on 4 Mb/s is an interesting (and difficult) case. I've
> suggested both allowing delay to get very large (ie: as high as 2
> seconds) and hacking the host implementation to reduce segment size to
> as low as 128 bytes when RTT gets huge or cwnd drops below 4 segments,
> holding the window to no less than 512 (4 segments) in hopes that fast
> retransmit will almost always work even in 15-20% loss situations.
>
> Curtis
>


Curtis's original posting:


> In order to get X bandwidth on a given TCP flow you need to have an
> average window size of X * RTT. This is expressed in terms of TCP
> segments N = (X * RTT) / MSS (or more correctly the segment size in
> use rather than MSS). To sustain an average window of N segments, you
> must ideally reach a steady state where you cut cwnd (current window)
> in half, then grow linearly, fluctuating between 2/3 and 4/3 of the
> target size. This would mean one drop in 2/3 N windows or DropRate in
> terms of time is 2/3 N * RTT. In one RTT on average X * RTT amount of
> data flows. In practice, you rarely drop at the perfect time, so the
> constant 2/3 (call it K) can be raised to 1-2. Since N = (X * RTT) /
> MSS, DropRate = K * X * RTT * X * RTT / MSS. Units are b/s * sec *
> b/s * sec / b, or b. The DropRate expressed in bits can be converted
> to seconds or packets (divide by X or by MSS). This type of analysis
> is courtesy of the good folks at PSC (Matt, Jamshid, et al).
>
> For example, to get 40 Mb/s at 70 msec RTT and 4096 MSS, you get one
> error about every 6 seconds (K=1) or 1 in 7,300 packets. If you look
> at 56 Kb/s and 512 MSS you get a very interesting result. You need
> one error every 66 msec or 1 error in 0.9 packets. This gives a good
> incentive to increase delay. At 250 msec, you get a result of one
> error in 11.7 packets (much better!).
>
> Another interesting point to note is that you need 3 duplicate ACKs
> for TCP fast retransmit to work, so your window must be at least 4
> segments (and should be more). If you have a very large number of TCP
> flows, where on average people get less than 1200 baud or so, the
> delay you need to make TCP work well starts to exceed the magic 3
> second boundary. This was discussed ad nauseam on end2end-interest.
> An important result is that you need more queueing than the delay
> bandwidth product for severely congested links. Another is that there
> is a limit to the number of active TCP flows that can be supported per
> bandwidth. One suggestion to address the latter problem is to further
> drop segment size if cwnd is less than 4 segments in size and/or when
> estimated RTT gets into the seconds range.
>
> This analysis of how much loss is acceptable to TCP may not be outside
> the bounds of an informational RFC, but so far none exists.
>
> Curtis
Re: Westnet and Utah outage [ In reply to ]
Jim,

> I made a private reply to Curtis on his posting earlier this week, and he
> gave a nice analysis and cc'd end2end-interest rather than nanog. For
> those that don't care to read all this, here's the summary:

[ .. summary deleted .. ]

I did mention that I didn't mind you forwarding that note.

There was subsequent discussion on the end2end-interest list. I may
be overstating the buffering requirements since the assumption is made
that a high degree of synchronization could occur. This really needs
to be backed up by simulations.

Curtis