Mailing List Archive

Optus/Akamai
Off of Facebook of all places, I hear this report that Optus had a major outage in Australia last night, which is believed to have been a BGP screw up involving Akamai:

"Nope: it was Akamai increasing their advertised prefix count from their cdn nodes inside the Optus network, and Optus’s configured max-prefix settings on their core bgp sessions from their route reflectors weren’t quite high enough and their whole network went down. Someone didn’t quite think through the value for internal max-prefix rules.."

Anybody confirm, deny, got hit by it?

Cheers,
-- jra
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Optus/Akamai [ In reply to ]
Something definitely went very wrong. Completely out for ~6h, with ongoing reports of some mobile cells still down, fixed line not working, etc.

https://ioda.inetintel.cc.gatech.edu/asn/4804?from=1699351533&until=1699481133

________________________________
From: Outages <outages-bounces@outages.org> on behalf of Jay Ashworth via Outages <outages@outages.org>
Sent: 08 November 2023 15:20
To: outages@outages.org <outages@outages.org>
Subject: [outages] Optus/Akamai


Off of Facebook of all places, I hear this report that Optus had a major outage in Australia last night, which is believed to have been a BGP screw up involving Akamai:

"Nope: it was Akamai increasing their advertised prefix count from their cdn nodes inside the Optus network, and Optus’s configured max-prefix settings on their core bgp sessions from their route reflectors weren’t quite high enough and their whole network went down. Someone didn’t quite think through the value for internal max-prefix rules.."

Anybody confirm, deny, got hit by it?

Cheers,
-- jra
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: Optus/Akamai [ In reply to ]
Disclaimer: I am employed by Akamai

I’m not aware of any changes that could have triggered this. If someone has information they can share, you can reach out to me directly or to our NOCC. I can be reached at jared@akamai.com if you have something private to share, and my mobile number is easily findable as well.

- Jared

> On Nov 8, 2023, at 9:20 AM, Jay Ashworth via Outages <outages@outages.org> wrote:
>
> Off of Facebook of all places, I hear this report that Optus had a major outage in Australia last night, which is believed to have been a BGP screw up involving Akamai:
>
> "Nope: it was Akamai increasing their advertised prefix count from their cdn nodes inside the Optus network, and Optus’s configured max-prefix settings on their core bgp sessions from their route reflectors weren’t quite high enough and their whole network went down. Someone didn’t quite think through the value for internal max-prefix rules.."
>
> Anybody confirm, deny, got hit by it?
>
> Cheers,
> -- jra
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
> _______________________________________________
> Outages mailing list
> Outages@outages.org
> https://puck.nether.net/mailman/listinfo/outages

Re: Optus/Akamai [ In reply to ]
There was definitely a major outage with Optus yesterday - not just
internet services, but mobile and landline. Roughly a 10-hour outage for
effectively their entire consumer network (weirdly, I have an
enterprise service which was not affected by the outage). Definitely
got hit by it - our entire corporate mobile fleet was out the whole
day.

Nobody has given an RFO yet - but I have my doubts it was just BGP
issues. That's speculation at this point.

D

On Thu, 9 Nov 2023 at 01:21, Jay Ashworth via Outages
<outages@outages.org> wrote:
>
> Off of Facebook of all places, I hear this report that Optus had a major outage in Australia last night, which is believed to have been a BGP screw up involving Akamai:
>
> "Nope: it was Akamai increasing their advertised prefix count from their cdn nodes inside the Optus network, and Optus’s configured max-prefix settings on their core bgp sessions from their route reflectors weren’t quite high enough and their whole network went down. Someone didn’t quite think through the value for internal max-prefix rules.."
>
> Anybody confirm, deny, got hit by it?
>
> Cheers,
> -- jra
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

--
veg·e·tar·i·an:
Ancient tribal slang for the village idiot who can't hunt, fish or ride
Re: Optus/Akamai [ In reply to ]
That makes no sense.  How would tripping the max prefix on a single peer
cause a major outage?



On 11/8/2023 3:13 PM, DaZZa via Outages wrote:
> There was definitely a major outage with Optus yesterday - not just
> internet services, but mobile and landline. Roughly a 10-hour outage for
> effectively their entire consumer network (weirdly, I have an
> enterprise service which was not affected by the outage). Definitely
> got hit by it - our entire corporate mobile fleet was out the whole
> day.
>
> Nobody has given an RFO yet - but I have my doubts it was just BGP
> issues. That's speculation at this point.
>
> D

--
================================================================
Aaron Wendel
Chief Technical Officer
Wholesale Internet, Inc. (AS 32097)
(816)550-9030
http://www.wholesaleinternet.com
================================================================

Re: Optus/Akamai [ In reply to ]
Hi,

On Wed, Nov 08, 2023 at 03:25:13PM -0600, Aaron Wendel via Outages wrote:
> That makes no sense.  How would tripping the max prefix on a single peer cause
> a major outage?

If you have

client -> border router -> route reflector -> all other BGP speakers

and the "RR -> BGP speakers" sessions get tripped due to "client sending
in too many new routes", then your whole network will fall apart until
you can shut down that initial BGP session (or re-provision the other
sessions, which might not work due to "there is no connectivity to
the management systems, because BGP is down").

*Iff* this happens, and you do not have working OOB access including
being able to do local config changes on the routers ("all configs are
done by the automation, no local access possible"), such a problem will
be extremely messy to recover from. Figuring out *what* happened is
especially hard if you have no visibility because the routers have lost
the route to your syslog servers....
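
The cascade described above can be sketched as a toy model. This is only an illustration under stated assumptions: the Session class, the simulate function, the route counts, and the limit of 1000 are all hypothetical and are not taken from any real router configuration or from the incident itself.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Receiver-side view of one BGP session (hypothetical toy model)."""
    name: str
    max_prefix: int                      # assumed per-session prefix limit
    received: set = field(default_factory=set)
    up: bool = True

    def advertise(self, prefixes):
        """Deliver prefixes; tear the session down once the limit is exceeded."""
        if not self.up:
            return
        self.received |= set(prefixes)
        if len(self.received) > self.max_prefix:
            self.up = False              # the session is shut down
            self.received.clear()        # every route learned over it is withdrawn


def simulate(client_count, baseline_count, ibgp_limit, speakers=4):
    """Push client routes through the border router and RR to all other speakers."""
    client_routes = {f"client-{i}" for i in range(client_count)}
    baseline = {f"base-{i}" for i in range(baseline_count)}

    # The RR reflects everything it knows, including the client's routes.
    rr_table = baseline | client_routes

    sessions = [Session(f"rr->speaker{i}", ibgp_limit) for i in range(speakers)]
    for session in sessions:
        session.advertise(rr_table)
    return sessions


normal = simulate(client_count=50, baseline_count=900, ibgp_limit=1000)
burst = simulate(client_count=200, baseline_count=900, ibgp_limit=1000)
print([s.up for s in normal])  # every session stays up: 950 <= 1000
print([s.up for s in burst])   # every session drops at once: 1100 > 1000
```

The point of the sketch is that the limit is evaluated on every "RR -> BGP speakers" session against the same reflected table, so one client's extra routes trip all of them together rather than a single peer.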

gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany gert@greenie.muc.de
Re: Optus/Akamai [ In reply to ]
Are people using max-prefix for iBGP sessions?

That seems.....unwise.

-Steve



On Thu, Nov 9, 2023 at 1:24 AM Gert Doering via Outages <outages@outages.org>
wrote:

> Hi,
>
> On Wed, Nov 08, 2023 at 03:25:13PM -0600, Aaron Wendel via Outages wrote:
> > That makes no sense. How would tripping the max prefix on a single peer
> cause
> > a major outage?
>
> If you have
>
> client --> border router -> route reflector -> all other BGP speakers
>
> and the "RR -> BGP speakers" sessions get tripped due to "client sending
> in too many new routes", then your whole network will fall apart until
> you can shutdown that initial BGP session (or re-provision the other
> sessions, which might not work due to "there is no connectivity to
> the management systems, because, BGP is down").
>
> *Iff* this happens, and you do not have working OOB access including
> being able to do local config changes on the routers ("all configs are
> done by the automatization, no local access possible"), such a problem will
> be extremely messy to recover. Especially figuring out *what* happened,
> if you have no visibility because the routers have lost the route to your
> syslog servers....
Re: Optus/Akamai [ In reply to ]
On Thu, 9 Nov 2023 at 23:18, Steve Meuse via Outages <outages@outages.org>
wrote:

> Are people using max-prefix for iBGP sessions?
>
> That seems.....unwise.
>


Yes, I find it hard to imagine what risks would be mitigated by applying
max-prefix limits to IBGP sessions.

Kind regards,

Job

Re: Optus/Akamai [ In reply to ]
On Thu, Nov 9, 2023 at 7:22 PM Job Snijders via Outages
<outages@outages.org> wrote:
>
> On Thu, 9 Nov 2023 at 23:18, Steve Meuse via Outages <outages@outages.org> wrote:
>>
>> Are people using max-prefix for iBGP sessions?
>>
>> That seems.....unwise.
>
>
>
> Yes, I find it hard to imagine what risks would be mitigated by applying max-prefix limits to IBGP sessions.

TCAM limits, perhaps?


Rubens
Re: Optus/Akamai [ In reply to ]
Surely it's better to drop some routes than to drop the whole session.

On Thu, Nov 9, 2023, 5:26 PM Rubens Kuhl via Outages <outages@outages.org>
wrote:

> On Thu, Nov 9, 2023 at 7:22 PM Job Snijders via Outages
> <outages@outages.org> wrote:
> >
> > On Thu, 9 Nov 2023 at 23:18, Steve Meuse via Outages <outages@outages.org> wrote:
> >>
> >> Are people using max-prefix for iBGP sessions?
> >>
> >> That seems.....unwise.
> >
> > Yes, I find it hard to imagine what risks would be mitigated by applying max-prefix limits to IBGP sessions.
>
> TCAM limits, perhaps ?
>
>
> Rubens
Re: Optus/Akamai [ In reply to ]
----- On Nov 9, 2023, at 2:31 PM, Ross Tajvar via Outages outages@outages.org wrote:

Hi,

> Surely it's better to drop some routes than to drop the whole session.

Not necessarily. If part of your IBGP feed includes 10/8 (common in datacenters), you
may end up in a situation where it's better to disable a leaking peer.

Imagine a Trident 3 box in a spine layer, with 32 northbound peers and 32 southbound
peers. Northbound you'll receive a few hundred routes, including 10/8. Southbound,
you'll receive a few hundred host subnets and perhaps some anycast /32s. You'll
aggregate the host subnets northbound.

If one of those northbound peers leaks a full table, you run the risk of losing some
of your host subnets, or perhaps important anycast routes. I would prefer to lose
the offending peer over risking host routes, primarily because shutting down the peer
is deterministic, while you have no control over which routes are lost.
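
A rough sketch of that trade-off, under loudly stated assumptions: the hardware table is modelled as a plain capacity limit that fills in arrival order, and the names install_until_full and apply_max_prefix, the capacity of 1000, and all route counts are made up for illustration. Real forwarding hardware behaves in more complicated ways.

```python
import random


def install_until_full(routes, capacity, seed):
    """Install routes in arrival order; whatever arrives after the table is full is lost."""
    order = sorted(routes)
    random.Random(seed).shuffle(order)   # arrival order is not under the operator's control
    return set(order[:capacity])


def apply_max_prefix(peers, limit):
    """Shut down every peer that exceeds the limit and keep all routes from the rest."""
    kept = set()
    for routes in peers.values():
        if len(routes) <= limit:
            kept |= routes
    return kept


hosts = {f"host-{i}" for i in range(300)}          # southbound host subnets
leak = {f"leak-{i}" for i in range(5000)}          # leaked table from one northbound peer
peers = {"south": hosts, "north-leaking": leak}

# No limit: a 1000-entry table fills in arrival order, so the survivors vary per run.
run_a = install_until_full(hosts | leak, capacity=1000, seed=1)
run_b = install_until_full(hosts | leak, capacity=1000, seed=2)
print(len(hosts - run_a), len(hosts - run_b))      # host subnets lost in each run

# With a limit: the leaking peer is dropped and every host subnet survives.
protected = apply_max_prefix(peers, limit=1000)
print(protected == hosts)
```

Under these assumptions the unprotected table loses a different, arbitrary set of host subnets on each run, while the per-peer limit always produces the same outcome: the leaking peer is gone and the host routes are intact.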

Thanks,

Sabri
Re: Optus/Akamai [ In reply to ]
In my experience with such things, it's not a matter of "some routes will
be dropped", but "traffic to certain destinations will blackhole unless
there's a covering route in FIB."
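
To illustrate the covering-route point, here is a minimal longest-prefix-match sketch using Python's standard ipaddress module. The FIB contents, the next-hop names, and the lookup helper are hypothetical and only meant to show the behaviour.

```python
import ipaddress


def lookup(fib, destination):
    """Longest-prefix match; returns the next hop, or None if traffic is blackholed."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


# Hypothetical FIB: a covering 10/8 plus one more-specific host subnet.
fib = {
    ipaddress.ip_network("10.0.0.0/8"): "spine",
    ipaddress.ip_network("10.1.2.0/24"): "leaf7",
}
print(lookup(fib, "10.1.2.5"))   # the more-specific /24 wins

# The /24 fails to make it into the FIB: traffic follows the covering route.
del fib[ipaddress.ip_network("10.1.2.0/24")]
print(lookup(fib, "10.1.2.5"))

# No covering route either: there is no match at all and the traffic is dropped.
del fib[ipaddress.ip_network("10.0.0.0/8")]
print(lookup(fib, "10.1.2.5"))
```

Whether following the covering route is harmless or just a different kind of blackhole depends on where that covering route points, which this sketch does not model.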

On Thu, 9 Nov 2023, Ross Tajvar via Outages wrote:

> Surely it's better to drop some routes than to drop the whole session.

----------------------------------------------------------------------
Jon Lewis, MCP :) | I route
Blue Stream Fiber, Sr. Neteng | therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________
Re: Optus/Akamai [ In reply to ]
Those are actually one and the same if you think about it, but yes, you are correct… To make matters worse, it’s not particularly deterministic which routes are dropped.

However, I’ll make the argument that losing some random assortment of destinations is almost always going to be better than losing your entire ability to control your peering routers.

YMMV.

Owen


> On Nov 14, 2023, at 11:37, Jon Lewis via Outages <outages@outages.org> wrote:
>
> In my experience with such things, it's not a matter of "some routes will be dropped", but "traffic to certain destinations will blackhole unless there's a covering route in FIB."
