Mailing List Archive

mae-west route servers are both down hard
This...

>From: noc@noc.ns.itd.umich.edu
>Date: Thursday, May 23, 1996 4:26 PM
>Subject: rs2.mae-west down indefinitely
>
>The route server rs2.mae-west.ra.net has crashed with the same symptoms as
>rs1.mae-west. Both servers will be down indefinitely. Further
>information will be passed on as it becomes available. We apologize for
>any inconvenience this may cause.
>
>------------------------------------------------------------------------
> UMNet MichNet CICNet
>Merit RA
>------------------------------------------------------------------------
>Network Systems Consultant Network Operations Center
>trouble@noc.ns.itd.umich.edu U of M Information Technology Division

...is not just an inconvenience. On the strength of Merit's publicity,
quite a few peerships have been done through route servers, and to have
both route servers down at the same time is a massive disruption to many
of the folks who connect at MAE-W.

I have two heavily overconfigured Alpha machines loaded in my van, with
Stephen Stuart's port of RSd all ready for someone from Merit or ISI to
load up their configuration tables. I've got a driver and a technician.

I need someone from Merit or ISI to call me and arrange physical access
as well as scheduling the download of the initial configuration files and
turnover of the root account and other "system keys".

The use of the machines, and the time to deliver and install them, have
all been donated by people who are suffering due to MAE-West's ills. It
is most extraordinary that we have let things suffer this long, and if
the current plan is to wait until after the three day weekend, I consider
that flatly unacceptable.

E-mail me (paul@vix.com) or call me (1 415 747 0204) to arrange details.
- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
I suppose this would be an incredibly tacky time to say I told you so?

Sean.
- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
> I suppose this would be an incredibly tacky time to say I told you so?
> Sean.

That depends. I expected you to say "I told you so", and if you want to call
that "Paul expected Sean to be incredibly tacky" that's your business ;-).

If any other NAP is running on one RS, then whoever is in charge of that NAP
should be on Red Alert with people running up and down the halls, yelling for
parts.

- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
On Fri, 24 May 1996, Paul A Vixie wrote:

> >The route server rs2.mae-west.ra.net has crashed with the same symptoms as
> >rs1.mae-west.

Hmmm....

> I have two heavily overconfigured Alpha machines loaded in my van, with
> Stephen Stuart's port of RSd all ready for someone from Merit or ISI to
> load up their configuration tables. I've got a driver and a technician.

This is nice as a fast fix, but it seems to me that identically configured
mission critical machines have crashed for the same reason (not
surprising) and now may be replaced by two other identically configured
machines. Might it not be wise to have two DIFFERENT machines doing this
job, maybe a SPARC and an Alpha?

> The use of the machines, and the time to deliver and install them, have
> all been donated by people who are suffering due to MAE-West's ills. It
> is most extraordinary that we have let things suffer this long, and if
> the current plan is to wait until after the three day weekend, I
> consider that flatly unacceptable.
> E-mail me (paul@vix.com) or call me (1 415 747 0204) to arrange details.

Glad to see that people are willing to do what it takes to keep things
operating. Somehow I suspect that Bob Metcalfe's next InfoWorld column
will be gloating over the route server failures and fail to mention
Paul and the other people who put the effort in to deal with the problem.

Sort of like how people used to say Ethernet could never be any good
because of the packet collisions... ;-)

Michael Dillon ISP & Internet Consulting
Memra Software Inc. Fax: +1-604-546-3049
http://www.memra.com E-mail: michael@memra.com


- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
### On Fri, 24 May 1996 19:41:00 +0100, Sean Doran <smd@icp.net> wrote to
### nanog@Merit.Net, paul@vix.com concerning "Re: mae-west route servers are
### both down hard":

SD> I suppose this would be an incredibly tacky time to say I told you so?

s/tacky/misleading/g

The RA services at MAE-West were never declared production because we did
not have two operational RSes deployed there. The RSes at the other NAPs
that have been declared for production were able to sustain a single-machine
failure without loss of service.


--
/*===================[ Jake Khuon <khuon@Merit.Net> ]======================+
| Systems Research Programmer, IE Group /| /|[.~|)|~|~ N E T W O R K |
| VOX: (313) 763-4907 FAX: (313) 747-3185 / |/ |[_|\| | Incorporated |
+==[. Suite C2122, Bldg. 1 4251 Plymouth Rd. Ann Arbor, MI 48105-2785 ]==*/
- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
At 09:50 PM 5/24/96 -0400, Jake Khuon wrote:

>
>The RA services at MAE-West were never declared production because we did
>not have two operational RSes deployed there. The RSes at the other NAPs
>that have been declared for production were able to sustain a single-machine
>failure without loss of service.
>

Let me see if I correctly understand this. The RA services at MAE-West
were never declared operational, yet there are service & transit providers
using them for production services?

If so, this will still be interpreted as a black eye for the RSs.

- paul

- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
| SD> I suppose this would be an incredibly tacky time to say I told you so?
|
| s/tacky/misleading/g

Huh? I was completely agreeing with you. Strange that you should
call that "misleading"!

| The RA services at MAE-West were never declared production because we did
| not have two operational RSes deployed there.

Only a strange group of people could assign blame to MERIT
for Paul's "quite a few peerships" which failed because they
relied upon something that was not a production service,
unless they were uncharitable and asserted that it was
MERIT's duty to push back against people using it even
as an explicitly experimental service. I wouldn't.
People should be free to have unsafe peerings if they want.

Indeed, one would have to be VERY uncharitable to MERIT
because "the strength of Merit's publicity" is in itself
pretty weak. Moreover, given the countless times Vadim Antonov,
many others and I have spent saying "relying upon the RSes
and the RA is a baaaaaaaad idea" (whether you agree or not:
the pro and con side are both debatable, and arguments for
either side are forms of publicity), it's practically
not much more than sour grapes on the part of people who
never believe an iota of the argument that the current
Internet will not safely sustain large numbers of bilateral
and multilateral peerings.

The RS failure at MAE-WEST is just one example of the
problems one likely will encounter when refusing to
accept that as a current (but not necessarily permanent) reality.

However, I do welcome people's efforts to prove me wrong,
especially where such efforts advance IP routing technologies
in potentially useful ways.

Sean.
- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
> Let me see if I correctly understand this. The RA services at MAE-West
> were never declared operational, yet there are service & transit providers
> using them for production services?
>
> If so, this will still be interpreted as a black eye for the RSs.

You're right. Technically the fault was with us (we who let our production
peerage depend on nonproduction devices in the middle). But the eye most
blackened by this week's events will be, undeservedly, the RA's.

My own view is: good idea, bad execution in this instance. We're fixing it.
- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
| Let me see if I correctly understand this. The RA services at MAE-West
| were never declared operational, yet there are service & transit providers
| using them for production services?
|
| If so, this will still be interpreted as a black eye for the RSs.

Surely, since the RA services (AFAIK) have been explicitly experimental
for some time, and it was no secret that there was a potential
single point of failure at MAE-WEST, this should be interpreted
as a black eye for those people who decided to use the RSes anyway?

Sean.

P.S.: Surely the RA has enough self-inflicted black eyes without
people blaming them for something that some form of payment
by the non-NSF93-52 clients of the RSes easily could have prevented?

Remember that MAE-WEST is (unfortunately) not the priority NAP
in the San Francisco Bay Area.

Of course, this depends a lot on how one interprets:

"Route servers are to support stable routing of the
Internet and to provide for simplified routing information to NSPs
and other attached networks" [nsf9352 s. C para. 2]

and in particular what one considers an "attached network".
(The term NSP is very clearly defined in the solicitation).

Personally, I believe that given that the only other place
in "NSF 93-52 - NETWORK ACCESS POINT MANAGER, ROUTING ARBITER, REGIONAL
NETWORK PROVIDERS, AND VERY HIGH SPEED BACKBONE NETWORK SERVICES
PROVIDER FOR NSFNET AND THE NREN(SM) PROGRAM" where these
words are used are with respect to networks attached to
the NSFNET NAPs, the folks at MAE-WEST may not really be covered by
the award at all.

Certainly the bulk of the people affected in the current outage
aren't doing much with respect to the NSFNET and the NREN program.

[One could also be a weasel and argue that there's not much
more simple than no routing information at all...]

Of course, I don't have a copy of the RA award handy,
and I've never seen any of the NSF<->RA communications
about the RA's obligations to MAE-WEST, so please take
this for what it's worth until someone really in the know
says something concrete.

- - - - - - - - - - - - - - - - -
Re: mae-west route servers are both down hard
Does this mean that we can expect a new agenda topic at the NANOG
meeting next week? ;-)

- paul

- - - - - - - - - - - - - - - - -