Mailing List Archive

Any2 LAX
Did Any2 LAX barf last night between about 1am and 8am Pacific time?
Re: Any2 LAX
Something happened... All my traffic dropped between 1am and 3am.

-Mike

> On Jun 11, 2021, at 10:11, Seth Mattinen <sethm@rollernet.us> wrote:
>
> Did Any2 LAX barf last night between about 1am and 8am Pacific time?
Re: Any2 LAX
On Fri, 11 Jun 2021, Seth Mattinen wrote:

> Did Any2 LAX barf last night between about 1am and 8am Pacific time?

More like 00:00-7:45 (Pacific time).

Anyone know what broke, and why the IX was dead for nearly 8 hours?
This is our second recent issue with "an Any2 IX", having dealt with an IX
partition event at Any2 Denver just a few weeks ago.

----------------------------------------------------------------------
Jon Lewis, MCP :) | I route
StackPath, Sr. Neteng | therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
Re: Any2 LAX
On 6/11/21 10:16 AM, Jon Lewis wrote:
> On Fri, 11 Jun 2021, Seth Mattinen wrote:
>
>> Did Any2 LAX barf last night between about 1am and 8am Pacific time?
>
> More like 00:00-7:45 (Pacific time).
>
> Anyone know what broke, and why the IX was dead for nearly 8 hours?
> This is our second recent issue with "an Any2 IX", having dealt with an
> IX partition event at Any2 Denver just a few weeks ago.
>


What I saw was a lot of unreachable nexthops (I'm in LA2) on routes
advertised through the route servers. Most of my direct BGP sessions
were down, but a handful were still working including the route servers.

For example, I was getting routes for AS29791 from the route servers,
but nexthop 206.72.211.106 was dead to me. Not to pick on Internap; it's
just that a mutual customer called me directly at 1am and wanted to know
why things were down.

I killed the route server sessions and went back to sleep.

Feels like LA1 and LA2 got split, but whatever path the route servers use
to interconnect still worked, which was problematic.
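
The failure mode described above (route servers still advertising routes whose nexthops cannot be reached across the fabric) can be sketched as a simple check. This is only an illustration under stated assumptions, not anything Any2 or CoreSite runs: the `Route` type, the `unreachable_routes` helper, the example prefixes, and the second nexthop address are all hypothetical, and the set of reachable nexthops is assumed to come from some external probe such as ARP or ping results.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Route:
    prefix: str        # advertised prefix (documentation ranges used below)
    next_hop: str      # BGP nexthop on the IX LAN
    learned_from: str  # "route-server" or "direct" (hypothetical labels)


def unreachable_routes(routes, reachable_next_hops):
    """Return the routes whose nexthop is not in the reachable set."""
    reachable = {ip_address(h) for h in reachable_next_hops}
    return [r for r in routes if ip_address(r.next_hop) not in reachable]


# 206.72.211.106 is the nexthop mentioned in the thread; 206.72.211.10 is a
# made-up peer address used only to show a healthy route.
routes = [
    Route("192.0.2.0/24", "206.72.211.106", "route-server"),
    Route("198.51.100.0/24", "206.72.211.10", "direct"),
]
reachable = {"206.72.211.10"}  # assumed output of a reachability probe

dead = unreachable_routes(routes, reachable)
for r in dead:
    print(f"{r.prefix} via {r.next_hop} ({r.learned_from}): nexthop unreachable")
```

A route that shows up in `dead` with `learned_from == "route-server"` is the case reported here: the session to the route server is up, so the route stays in the table, but traffic sent to it is blackholed until the session is dropped or the fabric heals.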
Re: Any2 LAX
Yeah, it was down, but both route servers stayed online and kept feeding us
unreachable nexthops during the outage.

On Sat, Jun 12, 2021 at 1:27 AM Seth Mattinen <sethm@rollernet.us> wrote:

> On 6/11/21 10:16 AM, Jon Lewis wrote:
> > On Fri, 11 Jun 2021, Seth Mattinen wrote:
> >
> >> Did Any2 LAX barf last night between about 1am and 8am Pacific time?
> >
> > More like 00:00-7:45 (Pacific time).
> >
> > Anyone know what broke, and why the IX was dead for nearly 8 hours?
> > This is our second recent issue with "an Any2 IX", having dealt with an
> > IX partition event at Any2 Denver just a few weeks ago.
> >
>
>
> What I saw was a lot of unreachable nexthops (I'm in LA2) on routes
> advertised through the route servers. Most of my direct BGP sessions
> were down, but a handful were still working including the route servers.
>
> For example, I was getting routes for AS29791 from the route servers,
> but nexthop 206.72.211.106 was dead to me. Not to pick on Internap other
> than a mutual customer called me directly at 1am and wanted to know why
> things were down.
>
> I killed the route server sessions and went back to sleep.
>
> Feels like LA1 and LA2 got split, but however the route servers
> interconnect still worked, which was problematic.
>
Re: Any2 LAX
Also saw a major traffic drop. I'm told a root cause analysis will be
issued early in the week.


-jim

On Fri, Jun 11, 2021 at 2:42 PM Siyuan Miao <aveline@misaka.io> wrote:

> Yeah, it was down, but both route servers stayed online and kept feeding
> us unreachable nexthops during the outage.
Re: Any2 LAX
This is what I got from those guys ...

--

CoreSite Incident Notification


Description: During a planned maintenance event to integrate new
hardware into our MPLS core an extreme dip in Any2 traffic was observed.
After about 4 hours running in a degraded state, an emergency case was
opened with the hardware vendor. After working with the hardware vendor
to rule out any possible hardware or software bugs, the network
engineering team located the source of the traffic loss. It was an
errant configuration applied by the custom automation written to build
LSP's in our MPLS network. A formal IR will be provided for this event.




On 6/11/21 8:03 PM, jim deleskie wrote:
> Also saw a major traffic drop. There is a Root Cause to be issued early
> in the week I'm told.
>
>
> -jim
Re: Any2 LAX
On 6/11/21 11:18 AM, Bryan Holloway wrote:
> This is what I got from those guys ...
>
> --
>
> CoreSite Incident Notification
>
>
> Description:  During a planned maintenance event to integrate new
> hardware into our MPLS core an extreme dip in Any2 traffic was observed.
> After about 4 hours running in a degraded state, an emergency case was
> opened with the hardware vendor. After working with the hardware vendor
> to rule out any possible hardware or software bugs, the network
> engineering team located the source of the traffic loss. It was an
> errant configuration applied by the custom automation written to build
> LSP's in our MPLS network. A formal IR will be provided for this event.
>
>


Was that an automated email? The last time I got any email from CoreSite
was April 22.
Re: Any2 LAX
On 6/11/21 8:25 PM, Seth Mattinen wrote:
> On 6/11/21 11:18 AM, Bryan Holloway wrote:
>> This is what I got from those guys ...
>>
>> --
>>
>> CoreSite Incident Notification
>>
>>
>> Description:  During a planned maintenance event to integrate new
>> hardware into our MPLS core an extreme dip in Any2 traffic was
>> observed. After about 4 hours running in a degraded state, an
>> emergency case was opened with the hardware vendor. After working with
>> the hardware vendor to rule out any possible hardware or software
>> bugs, the network engineering team located the source of the traffic
>> loss. It was an errant configuration applied by the custom automation
>> written to build LSP's in our MPLS network. A formal IR will be
>> provided for this event.
>>
>>
>
>
> Was that an automated email? Last time I got any email from Coresite was
> April 22.


Automated.
Re: Any2 LAX
Like Seth, I haven’t gotten anything from them.

-Mike

> On Jun 11, 2021, at 12:08, Bryan Holloway <bryan@shout.net> wrote:
>
>> On 6/11/21 8:25 PM, Seth Mattinen wrote:
>>> On 6/11/21 11:18 AM, Bryan Holloway wrote:
>>> This is what I got from those guys ...
>>>
>>> --
>>>
>>> CoreSite Incident Notification
>>>
>>>
>>> Description: During a planned maintenance event to integrate new hardware into our MPLS core an extreme dip in Any2 traffic was observed. After about 4 hours running in a degraded state, an emergency case was opened with the hardware vendor. After working with the hardware vendor to rule out any possible hardware or software bugs, the network engineering team located the source of the traffic loss. It was an errant configuration applied by the custom automation written to build LSP's in our MPLS network. A formal IR will be provided for this event.
>>>
>>>
>> Was that an automated email? Last time I got any email from Coresite was April 22.
>
>
> Automated.