Mailing List Archive

Re: links on the blink (fwd) [ In reply to ]
In message <95Nov9.012409-0000_est.20701+10@chops.icp.net>, Sean Doran writes:
>
> First you will have to explain to me how taking routing
> from the RA route server would help avoid a collapse
> of my iBGP mesh...
>
> Sean.

You got me on that. I can't help you. It doesn't look like we've lost
an IBGP connection since Nov 5 at 00:06:37 (E147). :-)

We only lost a number of EBGP sessions between AS690 and our Cisco
concentrators. We did lose quite a few EBGP sessions to other providers.

It's usually been a year or more between anything of the magnitude
of an IBGP collapse on our net since about 1992, when rcp_routed
managed three backbone-wide core dumps in a week, so my experience with
this sort of thing is quite limited. We still find an occasional
gated bug, though.

I defer to the experts. :-)

Curtis

ps - faster routers in the world. yeah right. so much for the
Bradner tests.

:-) :-)
Re: links on the blink (fwd) [ In reply to ]
On Wed, 8 Nov 1995, Dennis Ferguson wrote:

> to be a bleak prospect. There comes a point where you just run out of
> router bandwidth, and nothing but more router bandwidth is going to fix
> it, but the bigger bandwidth boxes are no where to be found.

Are you sure that creative ways of using lots of smaller T3 bandwidth
boxes couldn't solve the problem?

If we assume that bandwidth on the lines is not a problem (no shortages)
and that T3 routers with smaller routing tables could make effective use
of the bandwidth, then is it possible to do the following?

In Hypothetica, PA, there are two ISPs: ABC, which has a T1 to Sprint,
and XYZ, which has a line to MCI. Both have so-called portable addresses
from the swamp and thus consume space in the core routing tables. This
means that traffic
from ABC to XYZ travels from Hypothetica to Pennsauken, thence to MCI and
back to Hypothetica. However, suppose we clean up the swamp by simply
removing it entirely from all the core routing tables. What then? Every
provider puts a default route in each core router. This default route
points to a special router whose job is to just deal with the swamp
routes and nothing else. In effect we are partitioning the routing tables
in two. Under this regimen packets from ABC to XYZ travel to Pennsauken,
then follow the default to Fort Worth and thence to Chicago where the
swamp router lives. The swamp router uses a separate continental backbone
to route the traffic back to Fort Worth, back to Pennsauken and thence to
MCI where the traffic takes a similar circuitous route before reaching
Hypothetica.
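
To make the partition concrete, here is a rough Python sketch of the
two-stage lookup described above; the prefixes and next-hop names are
invented purely for illustration. The core table carries only provider
aggregates plus a default pointing at the swamp box, which is exactly
what produces the long detour just described.

    # Core routers carry only provider aggregates plus a default that
    # points at the dedicated "swamp" router; the swamp router carries
    # the swamp prefixes and nothing else.
    import ipaddress

    CORE_TABLE = {
        ipaddress.ip_network("0.0.0.0/0"):     "swamp-router",  # default
        ipaddress.ip_network("208.0.0.0/10"):  "sprint-agg",    # hypothetical aggregate
        ipaddress.ip_network("204.70.0.0/15"): "mci-agg",       # hypothetical aggregate
    }

    SWAMP_TABLE = {
        ipaddress.ip_network("192.41.177.0/24"): "xyz-isp",     # hypothetical swamp route
    }

    def lookup(table, dst):
        addr = ipaddress.ip_address(dst)
        matches = [net for net in table if addr in net]
        return table[max(matches, key=lambda n: n.prefixlen)] if matches else None

    dst = "192.41.177.10"
    hop = lookup(CORE_TABLE, dst)          # falls through to the default
    if hop == "swamp-router":
        hop = lookup(SWAMP_TABLE, dst)     # second-stage lookup on the swamp box
    print(hop)                             # -> "xyz-isp"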

Seems terribly wasteful of bandwidth, doesn't it? But if something like
this can help prevent routers from flapping and if bandwidth is
available, perhaps it could work. If the parallel lines carrying "swamp"
traffic are of lower bandwidth than the main lines and suffer congestion,
then I suppose ABC could simply renumber to be within Sprint's aggregate
and be back on the mainline.

In fact, if this really is a viable technical solution, perhaps the
threat of deployment would cause a rush of renumbering and make it easier
for NSP's to just say no to swamp addresses.

> seem to be anything to spend the money on which is clearly going to fix
> anything. I don't think this is a happy state to be in, in fact it sucks,

If you are right, then yes it sucks. Obviously the ATM and OC3
technologies are right where you have pegged them, but what about
parallelism using existing DS3 technology? And if this is done, are there
mux/demux boxes that can handle DS3's<->OC3 ?

> profit motives. I think we're victims of our having own success creep up
> to and pass the technology when we weren't paying close enough attention,
> and the only thing left to do seems to be to try to play catch-up from
> a position of increasing disadvantage.

One nice side effect is that this may force the video-on-demand folks off
the Internet and into straight ATM instead. I rather like the future
scenario where the globe is girdled by an IPng data network and a separate
parallel video/ATM network.

Michael Dillon Voice: +1-604-546-8022
Memra Software Inc. Fax: +1-604-542-4130
http://www.memra.com E-mail: michael@memra.com
Re: links on the blink (fwd) [ In reply to ]
Michael Dillon <michael@memra.com> wrote:

>Are you sure that creative ways of using lots of smaller T3 bandwidth
>boxes couldn't solve the problem?

There are hard architectural limits on the number of core routers in
the defaultless backbone. The backbone has to have a relatively small
number of BGP speakers to avoid severe routing-information propagation
problems.
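
One concrete reason is the full iBGP mesh among a backbone's own
speakers (the "iBGP mesh" Sean mentioned earlier in the thread). A
back-of-the-envelope Python snippet, with purely hypothetical router
counts chosen only to show the growth rate:

    # Full-mesh iBGP needs a session between every pair of speakers,
    # so the session count grows quadratically: n * (n - 1) / 2.
    def ibgp_full_mesh_sessions(n):
        return n * (n - 1) // 2

    for n in (10, 30, 100):                # hypothetical speaker counts
        print(n, "speakers ->", ibgp_full_mesh_sessions(n), "iBGP sessions")
    # 10 -> 45, 30 -> 435, 100 -> 4950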

There _are_ "creative ways"; see for example the SprintLink presentations
at NANOG on the planned "3-dimensional grid" backbone topology (it allows
the aggregate capacity to grow to about OC-3). However, you inevitably
run into the capacity limitation of LAN interconnects. Then, there's a
problem with load balancing, as it generally cannot be done with exterior
protocols, which have to select a single path. (And there's no easy
way to do per-destination load distribution on a large scale.)

It's only a kludge to survive until (and if) somebody builds real
central-office routers.

>If you are right, then yes it sucks. Obviously the ATM and OC3
>technologies are right where you have pegged them, but what about
>parallelism using existing DS3 technology? And if this is done, are there
>mux/demux boxes that can handle DS3's<->OC3 ?

There are boxes which can *statically* mux/demux OC-192 to DS-3s.
Synchronous muxes are not high technology, being basically decorated
shift registers.
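
For scale, a quick back-of-the-envelope in Python on how DS-3s map into
SONET; these are the standard rates, nothing vendor-specific.

    # SONET carries DS-3s by mapping one DS-3 into each STS-1 payload.
    STS1_MBPS = 51.84        # one STS-1 (OC-1)
    DS3_MBPS  = 44.736       # one DS-3

    for name, sts1s in (("OC-3", 3), ("OC-12", 12), ("OC-192", 192)):
        print("%-6s carries up to %3d DS-3s (%8.1f Mb/s of DS-3 payload)"
              % (name, sts1s, sts1s * DS3_MBPS))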

>One nice side effect is that this may force the video-on-demand folks off
>the Internet and into straight ATM instead. I rather like the future
>scenario where the globe is girdled by an IPng data network and a separate
>parallel video/ATM network.

That already happened. I would rather see things going in the opposite
direction. (For VOD applications ATM is adequate, as it only demultiplexes
big pipes from VOD servers into small access pipes; there's no backward
data flow, and no statistical multiplexing.)

However, the utility of VOD is very questionable, as the basic need to see
a movie is quite adequately and cheaply satisfied by low-tech video rentals.
It is definitely not a "killer application". Video telephony and distributed
computing networks could be such applications, but they beg for symmetrical
IP connectivity.

--vadim
Re: links on the blink (fwd) [ In reply to ]
>
> > > a/ people MUST withdraw as many prefixes as possible
> > > b/ the background route-flap MUST be reduced
> >
> > You could also take routing from the RA route servers. This would
> > solve your problem. I think this was mentioned at the last NANOG
> > meeting so it is not a new solution either.
>
> The RA does route aggregation? I didn't know that. So I can send it
> all of my more specifics and it will aggregate them for me? Neat.
> --asp

Well, the RA does not, but the route server code running in the route servers
does.

--bill
Re: links on the blink (fwd) [ In reply to ]
> However, you inevitably
> run into capacity limitation of LAN interconnects. Then, there's a
>
> --vadim
>

Just for grins, how fast do the LAN interconnects need to be?


--bill
Re: links on the blink (fwd) [ In reply to ]
>However, the utility of VOD is very questionable, as the basic need to see
>a movie is quite adequately and cheaply satisfied by low-tech video rentals.

In a cost/benefit tradeoff, delivering videos like pizza (including "30
minutes or it's free") is probably a more efficient way to go than
wrapping them into packets (or cells), at least for the time being. You
just need a good way to pick 'em up the next day, a problem pizza solved
ages ago as well (translation into something volatile after use).
Re: links on the blink (fwd) [ In reply to ]
In message <QQzpbq15078.199511090730@rodan.UU.NET>, Andrew Partan writes:
> > > a/ people MUST withdraw as many prefixes as possible
> > > b/ the background route-flap MUST be reduced
> >
> > You could also take routing from the RA route servers. This would
> > solve your problem. I think this was mentioned at the last NANOG
> > meeting so it is not a new solution either.
>
> The RA does route aggregation? I didn't know that. So I can send it
> all of my more specifics and it will aggregate them for me? Neat.
> --asp

It just reduces route flap by doing BGP dampening, plus it reduces the
number of peering sessions you need to maintain. That helps with b/ in
the list above.
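
For those who haven't seen it, here is a rough Python sketch of the
flavor of route-flap dampening meant here: an exponentially decaying
penalty with suppress and reuse thresholds. The constants are made up
for illustration and are not anyone's production settings.

    import math

    PENALTY_PER_FLAP = 1000.0
    SUPPRESS_LIMIT   = 2000.0
    REUSE_LIMIT      = 750.0
    HALF_LIFE_SECS   = 900.0      # 15 minutes, hypothetical

    class DampenedRoute:
        def __init__(self):
            self.penalty = 0.0
            self.last_update = 0.0
            self.suppressed = False

        def _decay(self, now):
            # Penalty halves every HALF_LIFE_SECS of quiet time.
            elapsed = now - self.last_update
            self.penalty *= math.exp(-math.log(2) * elapsed / HALF_LIFE_SECS)
            self.last_update = now

        def flap(self, now):
            # Each announce/withdraw cycle adds a fixed penalty.
            self._decay(now)
            self.penalty += PENALTY_PER_FLAP
            if self.penalty >= SUPPRESS_LIMIT:
                self.suppressed = True

        def usable(self, now):
            # Route is advertised again once the penalty decays enough.
            self._decay(now)
            if self.suppressed and self.penalty < REUSE_LIMIT:
                self.suppressed = False
            return not self.suppressed

    r = DampenedRoute()
    for t in (0, 60, 120):        # three quick flaps
        r.flap(t)
    print(r.usable(130))          # False: suppressed
    print(r.usable(3600))         # True: penalty has decayed below reuse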

As Sean pointed out, if your network is imploding, possibly because the
flap is within your own network (you pointed out at NANOG that this
may be the problem for SprintLink and AlterNet), then there is nothing
the RS can do to help you.

Curtis
Re: links on the blink (fwd) [ In reply to ]
Bill Manning wrote:

Just for grins, how fast do the LAN interconnects need to be?

An FDDI is about as fast as a single DS-3. So it makes little
sense to connect more than one DS-3 to a BB box, since there's
no way to get that traffic through a LAN to customer access boxes
in the cluster!

Ideally, the capacity of LAN *attachments* (not the total bandwidth
of the LAN switch!) should be about the same as the capacity
of backbone links, multiplied by the number of backbone links attached
to each backbone router.
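
A quick back-of-the-envelope along these lines, in Python; the FDDI and
DS-3 rates are the standard ones, and the backbone-link count is
hypothetical.

    # LAN-attachment capacity a backbone router would need if it
    # terminates several DS-3 backbone links, vs. one FDDI attachment.
    DS3_MBPS  = 44.736
    FDDI_MBPS = 100.0

    backbone_links = 3                     # hypothetical DS-3 count per BB box
    needed = backbone_links * DS3_MBPS

    print("needed LAN attachment capacity: %5.1f Mb/s" % needed)     # ~134 Mb/s
    print("one FDDI attachment offers:     %5.1f Mb/s" % FDDI_MBPS)
    print("shortfall:                      %5.1f Mb/s" % (needed - FDDI_MBPS))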

Of course, there are tricks to partially offload traffic from the
cluster's LAN (like connecting high-speed customers directly to
backbone boxes, and arranging BB topology so the transit traffic
won't cross the LAN -- this works in the case of a "duplicate" backbone).

Anyway, that boils down to the LAN switch being the single point
of failure. There's no reasonable way to do "parallel" LANs with
load sharing, as the number of destinations in intra-POP traffic
is relatively small. Aggregation makes things even worse.

All in all, the backbone router should be scalable to the point that
it does not need any clustering; and redundancy is provided by
"internal" duplication.

--vadim
Re: links on the blink (fwd) [ In reply to ]
>First you will have to explain to me how taking routing
>from the RA route server would help avoid a collapse
>of my iBGP mesh...
>
> Sean.

Just out of curiosity, is SprintLink still using AGS+'s in its
backbone?

Eric

--
Eric Kozowski Structured Network Systems, Inc.
kozowski@structured.net Better, Cheaper, Faster -- pick any two.
(503)656-3530 Voice "Providing High Quality, Reliable Internet Service"
(800)881-0962 Voice 56k to DS1
Re: links on the blink (fwd) [ In reply to ]
>
> --
> Paul
>
Re: links on the blink (fwd) [ In reply to ]
>
> --
> Paul
>
Re: links on the blink (fwd) [ In reply to ]
I have no idea how those two blank messages got out. My apologies.

> Ideally, the capacity of LAN *attachments* (not the total bandwidth
> of the LAN switch!) should be about the same as the capacity
> of backbone links, multiplied by the number of backbone links attached
> to each backbone router.

Agreed, which is why the GIGAswitch works and why, once Cisco has a reasonable
100BaseT interface processor, a Grand Junction will work.

> Anyway, that boils down to the LAN switch being the single point
> of failure.

Statistically speaking, the things that go wrong with modern machines that
have no moving parts are usually software or configuration related. Since
the LAN switch is mostly hardware with a little bit of bridging and spanning
tree stuff, it is a lot less complicated than the average router. I think
router software and configuration errors are going to pretty much drive the
failure rates for the next few years. I'm not worried about the LAN switches
since there are so many worse/nearer things to worry about.

> All in all, the backbone router should be scalable to the point that
> it does not need any clustering; and redundancy is provided by
> "internal" duplication.

Agreed. I had a conversation over coffee a few months ago and this topic
came up; my conclusion was that we were in for a really rough ride since each
previous time that the Internet backbone (or the average high end customer)
needed more bandwidth, there was something available from the datacom folks
to fit the bill. (9.6 analog, 56K digital, 1.5M digital, 45M digital.)
Furthermore, the various framing and tariff issues moved right along and have
made it possible to provision Internet service in ways that would have made
no sense just a few years ago (witness frame relay and SMDS.)

But this time, we are well and truly screwed. The datacom people have
gradually moved to the "one world" model of ATM, and have put all their
effort behind the mighty Cell. They thought their grand unification of
voice, data, VOD, etc, was finally a way to stop dinking around with lots
of incompatible systems and oddball gateways (those of you who have tried
to get inter-LATA ISDN or international T1/E1 working know what I mean).

So the datacom community's answer to the Internet community is "if you're
not able to get what you need from T3, use ATM which will be infinitely
scalable and can be used as an end-to-end system." Who knows? If DEC can
make a 90MHz NVAX chip after some PhD somewhere "proved" that the multibyte
instruction encoding made certain kinds of parallelism impossible and that
the fastest VAX would run at 60MHz, and if Intel can make a sewing machine's
CPU into a 64-bit monster with three kinds of virtual memory, then perhaps
the ATM folks can figure out how to do all the lookasides and exceptions
they need to do in their little itty bitty hundred-picosecond window.

But I don't think so. I think we are going to have to get clever, and that
we have used up our bank account of times when brute force coming out of the
sky can save us. This time, gentlemen, the cavalry is not coming. As much
as I despise the topic Matt and Vadim have been discussing today, I think
we're looking at some kind of load sharing. Probably static load sharing
with no adaptation to traffic type or flow, since as Jerry pointed out a
few weeks back, that's a slippery slope and a lot of people have died on it.
Re: links on the blink (fwd) [ In reply to ]
> By "load sharing" I presume you mean some sort of TDM where you have n
> real lines and every n'th packet goes out on any particular line. I
> suppose this would be even simpler to do at the byte level if we assume
> that all n lines go to the same endpoint.
>
> Or do you mean something different?

There's one nice thing about the brick wall we're headed for, and that's
that it'll nail every part of our system simultaneously. We're running
out of bus bandwidth, link capacity, route memory, and human brain power
for keeping it all straight -- and we'll probably hit all the walls in the
same week.

But no, to answer your question, I mean something different. Ganging links
and doing reverse aggregation would still require unifying the flows inside
each switching center. (ATM avoids this by only putting the flows back
together at endpoints.) One important wall we're about to hit is in the
capacity of the switching centers (routers). It's not just DS3 that's not
enough; if I ask a Cisco 75xx to route among four OC12*'s it'll up and die.

The boxes that are doing well at higher bit rates are the ones that don't do
anything complicated. Thus the GIGAswitch and the Cascade, which each do
their (limited) jobs well even though they have many more bits flowing when
they're full and busy than a 75xx can handle without getting dizzy.

So, what I think I mean by static load balancing would look (at the BGP level)
like a bazillion NAPs and a bazillion**2 MED's to keep local traffic local.
It means doing star topologies with DS3's rather than buying a virtual star
via SMDS or FR or ATM from some company who doesn't have a physical star to
implement it with.
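
As a purely illustrative Python sketch of the MED tie-break being leaned
on here (AS numbers and exit names are made up): MEDs are only compared
among routes learned from the same neighboring AS, and the lowest wins,
which is what lets a mesh of interconnects keep local traffic local.

    # Pick the preferred exit per neighboring AS for one prefix, using
    # the MED tie-break: a lower MED (a "closer" exit) wins, but only
    # among routes learned from the same neighbor AS.
    from collections import namedtuple

    Route = namedtuple("Route", "neighbor_as exit_point med")

    candidates = [
        Route(neighbor_as=64500, exit_point="NAP-east",  med=50),
        Route(neighbor_as=64500, exit_point="NAP-west",  med=10),  # wins for AS 64500
        Route(neighbor_as=64501, exit_point="NAP-south", med=5),   # different AS: not compared
    ]

    def best_per_neighbor(routes):
        best = {}
        for r in routes:
            cur = best.get(r.neighbor_as)
            if cur is None or r.med < cur.med:
                best[r.neighbor_as] = r
        return best

    for asn, route in best_per_neighbor(candidates).items():
        print(asn, "->", route.exit_point)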

This assumes that we can handle 100 or 500 views of 30,000 routes inside a
commonly occurring switching center. Right now we can't. If pressured, I
think Sean would admit that this is at the root of his desire for "CIDR uber
alles" and a six-entry routing table in his core routers. I don't consider
that goal achievable for IPv4 and the allocation plans I've seen for IPv6
do not give me cause for hope. So we are going to see Ncubed enter the
routing business with 1GB routers and 256 RP's and an SSE in every IP. It
will cost $1M to provision a superhub and a lot of the smaller folks will
just go under or become second tier customers of the few who can afford to
run a defaultless core.
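
For what it's worth, a rough back-of-the-envelope on what "100 or 500
views of 30,000 routes" costs in memory; the per-path byte count is a
guess, purely for illustration.

    # Memory to hold many independent views of a full routing table.
    routes_per_view = 30000
    views           = 500
    bytes_per_path  = 64       # rough guess at per-path storage, hypothetical

    total = routes_per_view * views * bytes_per_path
    print("%.0f MB" % (total / 2**20))     # ~915 MB -- hence the "1GB routers"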
Re: links on the blink (fwd) [ In reply to ]
On Thu, 9 Nov 1995, Paul A Vixie wrote:

> sky can save us. This time, gentlemen, the cavalry is not coming. As much
> as I despise the topic Matt and Vadim have been discussing today, I think
> we're looking at some kind of load sharing. Probably static load sharing
> with no adaptation to traffic type or flow, since as Jerry pointed out a
> few weeks back, that's a slippery slope and a lot of people have died on it.

By "load sharing" I presume you mean some sort of TDM where you have n
real lines and every n'th packet goes out on any particular line. I
suppose this would be even simpler to do at the byte level if we assume
that all n lines go to the same endpoint.

Or do you mean something different?

Michael Dillon Voice: +1-604-546-8022
Memra Software Inc. Fax: +1-604-542-4130
http://www.memra.com E-mail: michael@memra.com
Re: links on the blink (fwd) [ In reply to ]
Dennis,

Geeze, another rare country heard from. I see you have moved again. From
recent messages, it seems that the self-similar phenomenon has been
rediscovered again. I would assume the providers have added that worry
to their bag of troubles.

Dave
Re: links on the blink (fwd) [ In reply to ]
Michael Dillon wrote:

>By "load sharing" I presume you mean some sort of TDM where you have n
>real lines and every n'th packet goes out on any particular line. I
>suppose this would be even simpler to do at the byte level if we assume
>that all n lines go to the same endpoint.

Yep. However, the round-robin load sharing you described breaks the
sequencing of packets (i.e. they will often arrive in a different
order than they were sent) and that breaks a lot of TCP stacks
out there. Cisco's solution is to do sharing on a per-host or per-network
basis (i.e. all packets to a host/network go along the same path), which
is, to say the least, of very little use in the backbone. It breaks down
particularly spectacularly when you have a high degree of aggregation or
"calling centers", i.e. hosts or networks attracting particularly heavy
traffic.
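
For concreteness, a rough Python sketch of the per-destination style of
sharing described above, next to the common hash-over-(source,
destination) variant; this is illustrative only, not Cisco's actual
implementation and certainly not the algorithm hinted at below.

    # Two ways to pick an outbound link without reordering packets
    # within a single conversation.  Both keep a given flow on one
    # link; hashing on (src, dst) just spreads flows more evenly when
    # a few destinations are "calling centers".
    import zlib

    LINKS = ["ds3-0", "ds3-1", "ds3-2"]    # hypothetical parallel links

    def per_destination(dst_ip):
        # Everything headed to one destination rides the same link, so
        # a hot destination pins all of its traffic onto a single pipe.
        return LINKS[zlib.crc32(dst_ip.encode()) % len(LINKS)]

    def per_flow(src_ip, dst_ip):
        # Hashing over (source, destination) keeps per-connection
        # ordering but spreads many sources talking to the same hot
        # destination across the parallel links.
        key = ("%s>%s" % (src_ip, dst_ip)).encode()
        return LINKS[zlib.crc32(key) % len(LINKS)]

    print(per_destination("192.0.2.80"))             # always the same link
    print(per_flow("198.51.100.7", "192.0.2.80"))
    print(per_flow("203.0.113.9",  "192.0.2.80"))    # may land elsewhere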

I have a simple solution for the sequencing problem in load sharing
setups, which produces load patterns nearly as good as round-robin.
Don't ask how if you're not going to invest :) That algorithm is
central to the proposed hypercube router architecture, and i'm
past the age when i did things just for fun.

--vadim
Re: links on the blink (fwd) [ In reply to ]
At 18:28 95/11/10, Vadim Antonov wrote:
>Michael Dillon wrote:
>
>>By "load sharing" I presume you mean some sort of TDM where you have n
>>real lines and every n'th packet goes out on any particular line. I
>>suppose this would be even simpler to do at the byte level if we assume
>>that all n lines go to the same endpoint.
>
>Yep. However, the round-robin load sharing you described breaks the
>sequencing of packets (i.e. they will often arrive in a different
>order than they were sent) and that breaks a lot of TCP stacks
>out there. Cisco's solution is to do sharing on a per-host or per-network
>basis (i.e. all packets to a host/network go along the same path), which
>is, to say the least, of very little use in the backbone. It breaks down
>particularly spectacularly when you have a high degree of aggregation or
>"calling centers", i.e. hosts or networks attracting particularly heavy
>traffic.
>
>I have a simple solution for the sequencing problem in load sharing
>setups, which produces load patterns nearly as good as round-robin.
>Don't ask how if you're not going to invest :) That algorithm is
>central to the proposed hypercube router architecture, and i'm
>past the age when i did things just for fun.
>
>--vadim

Vadim! You've already said too much!! :-)

Robert.
