Mailing List Archive

Re: BGP route hijack by AS10990 [ In reply to ]
>
> So while I will continue pushing for the rest of the world to create
> ROA's, turn on RPKI and enable ROV, I'll also advocate that operators
> continue to have both AS- and prefix-based filters. Not either/or, but
> both. Also, max-prefix as a matter of course.
>

This is the correct approach. We are a very long way from being able to
flip the switch to say "everyone drop any RPKI UNKNOWN", so in the
meantime best practices for non-ROA covered prefixes still have to be done.
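
To make the "UNKNOWN" case concrete, here is a minimal Python sketch of route origin validation along the lines of RFC 6811. The Roa tuple, the validate() helper, and the documentation prefixes and ASNs used with it are illustrative assumptions, not taken from any real validator or from this incident.

```python
import ipaddress
from typing import Iterable, NamedTuple


class Roa(NamedTuple):
    prefix: str      # prefix covered by the ROA, e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ROA authorises
    origin_as: int   # AS authorised to originate the prefix


def validate(route_prefix: str, origin_as: int, roas: Iterable[Roa]) -> str:
    """Return 'valid', 'invalid' or 'not-found' for one announcement."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA only matters if it covers the announced prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        if route.prefixlen <= roa.max_length and origin_as == roa.origin_as:
            return "valid"
    # Covered but never matched means invalid; no covering ROA means not-found.
    return "invalid" if covered else "not-found"
```

With a single made-up ROA such as Roa("192.0.2.0/24", 24, 64500), a /25 more-specific or a different origin AS comes back "invalid", while an announcement for 198.51.100.0/24 comes back "not-found". A "drop invalids" policy only acts on the first case, which is why prefixes without ROAs still need conventional filters.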

On Fri, Jul 31, 2020 at 9:35 AM Mark Tinka <mark.tinka@seacom.com> wrote:

>
>
> On 31/Jul/20 03:57, Aftab Siddiqui wrote:
> > Not a single prefix was signed, what I saw. May be good reason for
> > Rogers, Charter, TWC etc to do that now. It would have stopped the
> > propagation at Telia.
>
> While I am a huge proponent for ROA's and ROV, it is a massive
> expectation to req filtering to work on the basis of all BGP
> participants creating their ROA's. It's what I would like, but there is
> always going to be a lag on this one.
>
> If none of the prefixes had a ROA, no amount of Telia's shiny new "we
> drop invalids" machine would have helped, as we saw with this incident.
> ROV really only comes into its own when the majority of the Internet has
> correct ROA's setup. In the absence of that, it's a powerful but
> toothless feature.
>
> So while I will continue pushing for the rest of the world to create
> ROA's, turn on RPKI and enable ROV, I'll also advocate that operators
> continue to have both AS- and prefix-based filters. Not either/or, but
> both. Also, max-prefix as a matter of course.
>
> Mark.
>
Re: BGP route hijack by AS10990 [ In reply to ]
They solve a need that isn't reasonably solved any other way that doesn't have similar drawbacks.


Some optimizers need to be redesigned to be safer by default.


Some networks need to be safer by default as well.




-----
Mike Hammett
Intelligent Computing Solutions
http://www.ics-il.com

Midwest-IX
http://www.midwest-ix.com

----- Original Message -----

From: "Mark Tinka" <mark.tinka@seacom.com>
To: nanog@nanog.org
Sent: Friday, July 31, 2020 8:59:51 AM
Subject: Re: BGP route hijack by AS10990



On 30/Jul/20 19:44, Tom Beecher wrote:
> It's not like there are scorecards, but there's a lot of fault to go
> around.
>
> However, again, BGP "Optimizers" are bad. The conditions by which the
> inadvertent leak occurs need to be fixed, no question. But in
> scenarios like this, as-path length generally limits impact to "Oh
> crap, I'll fix that, sorry!." Once you start squirting out more
> specifics, you get to own some of the egg on the face.
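
The point about more-specifics can be illustrated with a small Python sketch. It is a deliberate simplification that folds longest-prefix match and AS-path comparison into one hypothetical best_route() function; the prefixes and ASNs are made up from documentation ranges.

```python
import ipaddress
from typing import List, Optional, Tuple

Route = Tuple[str, List[int]]  # (prefix, AS path)


def best_route(destination: str, rib: List[Route]) -> Optional[Tuple[ipaddress.IPv4Network, List[int]]]:
    """Pick the route used to reach destination from a list of candidates."""
    addr = ipaddress.ip_address(destination)
    matching = []
    for prefix, as_path in rib:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matching.append((net, as_path))
    if not matching:
        return None
    # The longest prefix wins first; AS-path length only breaks ties
    # between routes for the same prefix.
    return max(matching, key=lambda entry: (entry[0].prefixlen, -len(entry[1])))
```

Given a legitimate 198.51.100.0/24 with a one-hop path and a leaked 198.51.100.0/25 with a four-hop path, traffic to 198.51.100.10 follows the /25. A leak of the same prefix loses on path length; a leaked more-specific does not.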

For about a year or so, I've been saying that the next generation of
network engineers are being trained for a GUI-based point & click world,
as opposed to understanding what protocols and CLI do.

There is no shortage of annual workshops that teach BGP Multi-Homing.

Despite the horror BGP optimizers have displayed in recent years, they
seem to be flying off the shelves, still. Is this a clear example of the
next generation of network engineers that we are breeding?

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
On 31/Jul/20 16:01, Baldur Norddahl wrote:
> How do you know that none of the prefixes had ROA? The ones that had
> got stopped by Telias filter, so we would never know.

Like I said, "if". If they did, then they were protected. If they
didn't, well...


>
> This is exactly the situation where RPKI already works. My and yours
> prefixes, provided you like me have ROAs, will not be leaked through
> Telia and a number of other large transits. Even if they did not have
> proper filters in place.

I don't have to like you, but I will always honour your ROA :-).

That is my point, though - this works if ROA's are present. We know this
to not be the case - so having proper filters in place is not optional.
Not at least until we have 100% diffusion of ROA's + ROV. And even then,
we probably still want some kind of safety net.


>
> Driving without RPKI / ROA is like driving without a seatbelt. You are
> fine until the day someone makes a mistake and then you wish you did
> your job at signing those prefixes sooner.

Don't disagree with you there.

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
On 31/Jul/20 16:07, Job Snijders wrote:

> Could it be ... we didn't see any RPKI Invalids through Telia *because*
> they are rejecting RPKI invalids?
>
> As far as I know the BGP Polluter software does not have a configuration
> setting to only ruin the day of operators without ROAs. :-)
>
> I think the system worked as designed: without RPKI ROV @ Telia the
> damage might have been worse.

Indeed.

What I was saying is we don't know how many of the leaked routes were
dropped by Telia's ROV, if any.

We really shouldn't be having to discuss how bad this could have gotten,
because it means we are excusing Telia's inability to do proper
filtering across its eBGP sessions with its customers.

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
On 31/Jul/20 16:29, Mike Hammett wrote:
> They solve a need that isn't reasonably solved any other way that
> doesn't have similar drawbacks.
>
> Some optimizers need to be redesigned to be safer by default.
>
> Some networks need to be safer by default as well.

Almost every product ever made does solve a need. You will find at least
one customer who is happy with what they paid their money for.

But BGP-4 is vulnerable enough as it is, and the Internet has moved on
in leaps and bounds since 1994 (RFC 1654).

Until we see BGP-5, we need to look after our community. And if that
means holding the BGP optimizers to a higher standard, so be it.

As they say, "You can't blame a monkey for botching a brain surgery".

Plenty of industries strongly "guide" (I'll avoid "regulate") their
actors to ensure standards and results (medicine, aviation, energy,
construction, etc.). If the acceptance bar to a BGP actor is an
optional CCNA or JNCIA certification, we shall learn the hard way, as we
did with this and similar incidents.

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
Telia's statement:

https://blog.teliacarrier.com/2020/07/31/bgp-hijack-of-july-30-2020/

(tl;dr: it was as-path filtering only, as opposed to prefix filtering,
the former has been removed as an option)
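
A rough Python sketch of the difference between the two filter types, assuming a leaked route reaches the provider with an AS path the customer is allowed to send. The customer ASN, the registered prefix, and the helper names are hypothetical; this is not Telia's actual policy or configuration.

```python
import ipaddress

ALLOWED_ASNS = {64500}                                      # hypothetical customer AS
ALLOWED_PREFIXES = [ipaddress.ip_network("203.0.113.0/24")]  # hypothetical registered prefix


def as_path_ok(as_path):
    """AS-path filter: every ASN in the path must belong to the customer."""
    return len(as_path) > 0 and all(asn in ALLOWED_ASNS for asn in as_path)


def prefix_ok(prefix, max_len=24):
    """Prefix filter: the prefix must sit inside a registered block and
    must not be longer than max_len."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in ALLOWED_PREFIXES
    )


def accept(prefix, as_path):
    """Both checks together, as argued for earlier in the thread."""
    return as_path_ok(as_path) and prefix_ok(prefix)
```

A route for someone else's 198.51.100.0/24 with AS path [64500] passes as_path_ok() but fails prefix_ok(), so an as-path-only policy would pass it on while the combined accept() would not.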
Re: BGP route hijack by AS10990 [ In reply to ]
----- On Jul 31, 2020, at 2:33 PM, Lukas Tribus lists@ltri.eu wrote:

Hi,

> Telia's statement:
>
> https://blog.teliacarrier.com/2020/07/31/bgp-hijack-of-july-30-2020/
>
> (tl;dr: it was as-path filtering only, as opposed to prefix filtering,
> the former has been removed as an option)

Kudos to Telia for admitting their mistakes, and fixing their processes.

Thanks,

Sabri
Re: BGP route hijack by AS10990 [ In reply to ]
On 31/Jul/20 23:38, Sabri Berisha wrote:

> Kudos to Telia for admitting their mistakes, and fixing their processes.

Considering Telia's scope and "experience", that is one thing. But for
the general good of the Internet, the number of intended or
unintentional route hijacks in recent years, and all the noise that
rises on this and other lists each time we have such incidents (this
won't be the last), Telia should not have waited to be called out in
order to get this fixed.

Do we know if they are fixing this on just this customer of theirs, or
all their customers? I know this has been their filtering policy with us
(SEACOM) since 2014, as I pointed out earlier today. There has not been
a shortage of similar incidents between now and then, where the
community has consistently called for more deliberate and effective
route filtering across inter-AS arrangements.

There is massive responsibility for the community to act correctly for
the Internet to succeed. Especially so during these Coronavirus times
where the world depends on us to keep whatever shred of an economy is
left up and running. Doubly so if you are a major concern (like Telia)
for the core of the Internet.

It's great that they are fixing this - but this was TOTALLY avoidable.
That we won't see this again - even from the same actors - isn't
something I have high confidence in guaranteeing, based on current
experience.

We can all do better. We should all do better.

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
----- On Jul 31, 2020, at 2:50 PM, Mark Tinka mark.tinka@seacom.com wrote:

Hi Mark,

> On 31/Jul/20 23:38, Sabri Berisha wrote:
>
>> Kudos to Telia for admitting their mistakes, and fixing their processes.
>
> It's great that they are fixing this - but this was TOTALLY avoidable.

I'm not sure if you read their entire Mea Culpa, but they did indicate that
the root cause of this issue was the provisioning of a legacy filter that
they are no longer using. So effectively, that makes it a human error.

We're going to a point where a single error is no longer causing outages,
something very similar to my favorite analogy: aviation. Pretty much every
major air disaster was caused by a combination of factors. Pretty much
every major outage these days is caused by a combination of factors.

The manual provisioning of an inadequate filter, combined with an
automation error on the side of a customer (which by itself was probably
caused by a combination of factors), caused this issue.

We learn from every outage. And instead of radio silence, they fessed up
and fixed the issue. Have a look at the ASRS program :)

Thanks,

Sabri
Re: BGP route hijack by AS10990 [ In reply to ]
To your point with regards to multiple failures combined causing an outage, here's some basic reading on the Swiss cheese model: https://en.wikipedia.org/wiki/Swiss_cheese_model

From over here it looks like the legacy filter was a latent failure, and the BGP automation at Telia's downstream peer was an active failure (the two combined caused the outage). From the downstream peer's point of view, the cause of their BGP automation failure may have been latent as well, but we wouldn't know without more details.

Pretty interesting topic.
Re: BGP route hijack by AS10990 [ In reply to ]
On 1/Aug/20 02:17, Sabri Berisha wrote:

> I'm not sure if you read their entire Mea Culpa, but they did indicate that
> the root cause of this issue was the provisioning of a legacy filter that
> they are no longer using. So effectively, that makes it a human error.
>
> We're going to a point where a single error is no longer causing outages,
> something very similar to my favorite analogy: aviation. Pretty much every
> major air disaster was caused by a combination of factors. Pretty much
> every major outage these days is caused by a combination of factors.
>
> The manual provisioning of an inadequate filter, combined with an
> automation error on the side of a customer (which by itself was probably
> caused by a combination of factors), caused this issue.
>
> We learn from every outage. And instead of radio silence, they fessed up
> and fixed the issue. Have a look at the ASRS program :)

What I meant by "TOTALLY avoidable" is that "this particular plane
crash" has happened in the exact same way, for the exact same reasons,
over and over again.

Aviation learns from mistakes that don't generally recur in the exact
same way for the exact same reasons.

Telia and others have known about these issues from them happening to
other operators. When we see these issues, we go back and look at our
own networks to implement the fixes that solve the problem the last time
it happened. That's the idea.

The difference between us and aviation is that fundamental flaws or
mistakes that impact safety are required to be fixed and checked if you
want to keep operating in the industry. We don't have that, so...

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
On Sat, Aug 1, 2020 at 4:21 AM Mark Tinka <mark.tinka@seacom.com> wrote:

>
>
> What I meant by "TOTALLY avoidable" is that "this particular plane
> crash" has happened in the exact same way, for the exact same reasons,
> over and over again.
>
> Aviation learns from mistakes that don't generally recur in the exact
> same way for the exact same reasons.


Aviation is regulated.

I do not normally support a heavy hand in regulation, but I think it is
fair to say Noction and similar BGP optimizers are unsafe at any speed and
the FTC or similar should ban them in the USA. They harm consumers and are
a risk to national security / critical infrastructure.

Noction and similar could have set basic defaults (no-export, only create
/25 bogus routes to limit scope), but they have been clear that their greed
to suck up traffic does not benefit from these defaults and they won't do
it.

Tar and feather them. FTC, do your job.

FTC has done good work before
https://www.ftc.gov/news-events/press-releases/2017/01/ftc-charges-d-link-put-consumers-privacy-risk-due-inadequate

Noction — delete your account
Re: BGP route hijack by AS10990 [ In reply to ]
Mark Tinka wrote on 01/08/2020 12:20:
> The difference between us and aviation is that fundamental flaws or
> mistakes that impact safety are required to be fixed and checked if you
> want to keep operating in the industry. We don't have that, so...

... so once again, route optimisers were at the heart of another serious
route leaking incident.

BGP is designed to prevent loops from happening, and has tools like
no-export to help prevent inadvertent leaks.

When people build "BGP optimisers" which reinject a prefix into a
routing mesh with the entire as-path stripped and then they refuse to
apply the basic minimum of common sense by refusing point blank to tag
prefixes with no-export, it's a matter of certainty that leaks are going
to happen, and that when they do, they'll cause damage.
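
For reference, a minimal Python sketch of what honouring no-export amounts to on the sending side. NO_EXPORT is the well-known community 65535:65281 from RFC 1997; the route dictionaries and the routes_to_advertise() helper are assumptions for illustration, and confederations are ignored.

```python
NO_EXPORT = (65535, 65281)  # well-known community 0xFFFFFF01 (RFC 1997)


def routes_to_advertise(routes, peer_is_ebgp):
    """Return the routes that may be sent to a peer.

    Routes tagged NO_EXPORT stay inside the local AS, so they are
    withheld from eBGP peers and still sent to iBGP peers.
    """
    return [
        route for route in routes
        if not (peer_is_ebgp and NO_EXPORT in route["communities"])
    ]
```

If an optimiser tagged every more-specific it injected with this community, a router that lost its outbound filter would still hold those routes back from its eBGP neighbours.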

It's about as responsible as shipping a shotgun with the safety disabled
and then handing it to a newbie. After all, the safety makes it more
difficult to operate and if the newbie shoots themselves, it was their
fault. And if they shot someone else, they shouldn't have got in the
way, right?

Nick
Re: BGP route hijack by AS10990 [ In reply to ]
On 1/Aug/20 15:50, Ca By wrote:

>
> Aviation is regulated.

Which is my point. While, like you, I am not in support of heavy-handed
regulation like most life & death industries require, we also can't be
leaving our industry open for any actor to do as they please.


>
> I do not normally support a heavy hand in regulation, but I think
> it is fair to say Noction and similar BGP optimizers are unsafe at any
> speed and the FTC or similar should ban them in the USA. They harm
> consumers and are a risk to national security / critical infrastructure.
>
> Noction and similar could have set basic defaults (no-export, only
> create /25 bogus routes to limit scope), but they have been clear that
> their greed to suck up traffic does not benefit from these defaults
> and they won't do it.
>
> Tar and feather them. FTC, do your job. 
>
> FTC has done good work before 
> https://www.ftc.gov/news-events/press-releases/2017/01/ftc-charges-d-link-put-consumers-privacy-risk-due-inadequate
>
> Noction — delete your account

+1.

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
On 1/Aug/20 16:44, Nick Hilliard wrote:

> ... so once again, route optimisers were at the heart of another
> serious route leaking incident.
>
> BGP is designed to prevent loops from happening, and has tools like
> no-export to help prevent inadvertent leaks.
>
> When people build "BGP optimisers" which reinject a prefix into a
> routing mesh with the entire as-path stripped and then they refuse to
> apply the basic minimum of common sense by refusing point blank to tag
> prefixes with no-export, it's a matter of certainty that leaks are
> going to happen, and that when they do, they'll cause damage.
>
> It's about as responsible as shipping a shotgun with the safety
> disabled and then handing it to a newbie.  After all, the safety makes
> it more difficult to operate and if the newbie shoots themselves, it
> was their fault.  And if they shot someone else, they shouldn't have
> got in the way, right?

All in all, agreed.

While gun ownership and use is highly regulated (and penalized if
violated) in almost all countries, it suffers the same problem as folk
that have access to and drive cars without a valid license.

In our case, we don't really have anything beyond person-to-person trust
in doing their part to not only adhere to global BCOP's for BGP
operation, but to also understand what they are doing with the equipment
they have, as well as the BGP protocol itself.

Without some plan in place to make sure BGP actors do so with sufficient
knowledge and care, these problems are only going to worsen as the next
crop of network engineers prefer a BGP optimizer with a point & click
GUI to actually understanding BGP Multi-Homing principles and techniques.

I'm not opposed to Cameron's suggestion on how to deal with BGP
optimizers :-).

The issue of correctly filtering at eBGP hand-off points has been beaten
to death probably longer than I have been a member of this mailing list.
So...

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
> On Aug 1, 2020, at 04:20 , Mark Tinka <mark.tinka@seacom.com> wrote:
>
>
>
> On 1/Aug/20 02:17, Sabri Berisha wrote:
>
>> I'm not sure if you read their entire Mea Culpa, but they did indicate that
>> the root cause of this issue was the provisioning of a legacy filter that
>> they are no longer using. So effectively, that makes it a human error.
>>
>> We're going to a point where a single error is no longer causing outages,
>> something very similar to my favorite analogy: aviation. Pretty much every
>> major air disaster was caused by a combination of factors. Pretty much
>> every major outage these days is caused by a combination of factors.
>>
>> The manual provisioning of an inadequate filter, combined with an
>> automation error on the side of a customer (which by itself was probably
>> caused by a combination of factors), caused this issue.
>>
>> We learn from every outage. And instead of radio silence, they fessed up
>> and fixed the issue. Have a look at the ASRS program :)
>
> What I meant by "TOTALLY avoidable" is that "this particular plane
> crash" has happened in the exact same way, for the exact same reasons,
> over and over again.

That’s also true of Asiana 214.
(Root cause: 5 pilots failed to pay attention to the approach)

https://www.ntsb.gov/investigations/AccidentReports/Reports/AAR1401.pdf

(The full report probably only interests pilots, but the executive summary on
pages xi - xv is a good read).

Worth noting, contrary to the public perception of airline accidents, despite the
near total destruction of the airframe in this incident, 288 of 291 passengers
and all of the crew survived. Of the 307 people on board, only 49 suffered
serious injuries. (serious is defined as an injury requiring >48 hours of
hospitalization within 7 days of the accident in which the injury was sustained).
(49CFR§830.2)

For those that find 5 pages of type TL;DR, the key findings are in the first
paragraph after the last bullet point on page xv.

> Aviation learns from mistakes that don't generally recur in the exact
> same way for the exact same reasons.

Aviation makes a strong effort in this area, perhaps stronger than any other
human endeavor, especially when you’re talking about the fraction of
Aviation known in the US as “Part 121 Scheduled Air Carrier Services”.

However, as noted above, there are exceptions.

In fact, there are striking parallels between Asiana 214 and this incident.

The tools to avoid the accident in question automatically were available to the
pilots, but they failed to turn them on (autothrottle).

The tools to avoid this incident were available to Telia, but they
failed to turn them on.

Owen
Re: BGP route hijack by AS10990 [ In reply to ]
On 1/Aug/20 17:49, Owen DeLong wrote:

> Aviation makes a strong effort in this area, perhaps stronger than any other
> human endeavor, especially when you’re talking about the fraction of
> Aviation known in the US as “Part 121 Scheduled Air Carrier Services”.
>
> However, as noted above, there are exceptions.
>
> In fact, there are striking parallels between Asiana 214 and this incident.
>
> The tools to avoid the accident in question automatically were available to the
> pilots, but they failed to turn them on (autothrottle).
>
> The tools to avoid this incident were available to Telia, but they
> failed to turn them on.

Agreed, the leading cause of aircraft incidents is human error. When
human errors in aeroplane accidents are repeated, it's usually because
of poor crew resource management, poor training, low experience, poor
situational awareness, crew fatigue, crew disorientation, not following
checklists... that sort of thing.

We've made a whole hymn out of "do proper filtering at eBGP hand-off
points" over the years. Network operators are not always working under
pressure like airline pilots do. On a quiet, calm afternoon, an engineer
can comb the network to make sure all potential mistakes that have been
shouted about for years within our community are plugged, especially
when working at an "experienced" operation such as Telia and similar.

It's almost a "do once and forget, and watch it repeat" type-thing, vs.
airline pilots who need to be on it 110%, every second of every flight,
even if they've got 25,000hrs under their epaulettes.

It shouldn't be this hard...

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
> On Aug 1, 2020, at 09:09 , Mark Tinka <mark.tinka@seacom.com> wrote:
>
>
>
> On 1/Aug/20 17:49, Owen DeLong wrote:
>
>> Aviation makes a strong effort in this area, perhaps stronger than any other
>> human endeavor, especially when you’re talking about the fraction of
>> Aviation known in the US as “Part 121 Scheduled Air Carrier Services”.
>>
>> However, as noted above, there are exceptions.
>>
>> In fact, there are striking parallels between Asiana 214 and this incident.
>>
>> The tools to avoid the accident in question automatically were available to the
>> pilots, but they failed to turn them on (autothrottle).
>>
>> The tools to avoid this incident were available to Telia, but they
>> failed to turn them on.
>
> Agreed, the leading cause of aircraft incidents is human error. When
> human errors in aeroplane accidents are repeated, it's usually because
> of poor crew resource management, poor training, low experience, poor
> situational awareness, crew fatigue, crew disorientation, not following
> checklists... that sort of thing.

Let’s be clear… This was not an incident, it was an accident.

In the US at least, under 49CFR§830.2 the two are specifically defined
as follows:

Aircraft accident means an occurrence associated with the operation of an aircraft
which takes place between the time any person boards the aircraft with the intention
of flight and all such persons have disembarked, and in which any person suffers death
or serious injury, or in which the aircraft receives substantial damage. For purposes of
this part, the definition of “aircraft accident” includes “unmanned aircraft accident,”
as defined herein.

Serious injury is defined in my previous message (reference to the same code section)

Substantial damage is defined as “damage or failure which adversely affects the
structural strength, performance, or flight characteristics of the aircraft and which
would normally require major repair or replacement of the affected component. Engine
failure or damage limited to an engine if only one engine fails or is damaged, bent
fairings or cowling, dented skin, small punctured holes in the skin or fabric, ground
damage to rotor or propeller blade, and damage to landing gear, wheels, tires, flaps,
engine accessories, brakes, or wingtips are not considered “substantial damage” for
the purpose of this part.”

An “Incident” is an occurrence other than an accident, associated with the operation of an aircraft,
which affects or could affect the safety of operations.

> We've made a whole hymn out of "do proper filtering at eBGP hand-off
> points" over the years. Network operators are not always working under
> pressure like airline pilots do. On a quiet, calm afternoon, an engineer
> can comb the network to make sure all potential mistakes that have been
> shouted about for years within our community are plugged, especially
> when working at an "experienced" operation such as Telia and similar.

Airline pilots are not always under pressure, either. In fact, airline flying is
90% boredom, 9+% routine operations (procedures for preparation and
departure, departure and climb-out, preparations for approach and landing,
descent, approach, and landing) and <1% actual pressure (IROPS,
in-flight emergencies, etc.).

I say this not only as someone who’s spent a lot of time as a passenger,
but also as a commercial instrument-rated pilot.

> It's almost a "do once and forget, and watch it repeat" type-thing, vs.
> airline pilots who need to be on it 110%, every second of every flight,
> even if they've got 25,000hrs under their epaulettes.

ROFLMAO, if you truly believe this, you have no concept of life in the
cockpit.

Yes, airline pilots need to be paying attention even in the most routine
phases of flight, but in reality, 90+% of every flight is routine monitoring
of systems, essentially checking the “Ts”…
Time: Is the flight progressing as expected?
    Are we where we expected to be at this time?
    Is the fuel consumption in line with our expectations?
Turn: Are we on course?
    How far to the next heading change?
Throttle: Is our performance correct for this point in the flight?
    Are we at the desired altitude, attitude, power, and airspeed?
    If applicable, are the auto throttles in the correct mode?
Twist: Are any adjustments or preparations on Radios/Navigation/FMS needed?
    This is where you check to make sure that not only are you on the correct
    frequency now (com, navigation, etc.), but that you also have things set
    for the next change.
    It’s also where you make those changes (e.g. flip to the next VOR) if that’s
    due.
Track: How does our track compare to our intended course?
    This is where the heading is adjusted, if necessary, to achieve the desired course.
    With modern automation in the cockpit, this is mostly a glance at the indicators
    to see that the autopilot is still engaged in the correct mode and holding the
    desired course.
Talk: Interaction with ATC
    Any compulsory reports due? Are we in compliance with our clearance, etc.

Now, in a classic single-engine aircraft, these 6 Ts are a constant effort
for the pilot. In a modern airliner, once it’s at cruise, it’s:
Time: 99% automated, check the gas gauges and ground vs. airspeed to
    make sure they match expectations.
Turn: 99+% automated, you programmed your route into the FMS and George
    has it from there. (All autopilots are named George[1]).
Throttle: 99+% automated. Are the auto throttles active in the correct mode?
Twist: 99+% automated. Other than the occasional frequency handoff, the
    radios are 99% managed by the FMS… Thanks, George!
Track: 99+% automated. Is the automation doing something untoward?
Talk: Workload here, but only if ATC calls you for the most part. Generally
    about 5 seconds every 30-90 minutes in cruise flight.

This is not to take away from the skill, training, or capabilities of those who have
put in the effort and have what it takes to attain not only an ATP (Airline Transport
Pilot) certificate, but also get on and keep a pilot job at an airline. It’s definitely no
minor feat to accomplish all of that and as a general rule, they are hard-working
highly skilled highly trained professionals. However, 99% of piloting is best
summed up as “Pilots use their superior training and planning abilities and
their superior judgment to avoid situations in which their superior skills are
required.”

I have tremendous respect for pilots. I am a pilot. But hyperbole such
as what you present above is merely common misconception.

Owen

[1] Why is the autopilot called “George?” — There’s no definite answer, but the
two most common theories are:
+ The first practical autopilot was invented by George DeBeeson (This is fact, but whether
or not the colloquial name for autopilots is because of this isn’t certain)
+ RAF pilots named their aircraft in general “George” after King George, the owner
of all RAF aircraft. (started in WWII under King George VI)
https://airplaneacademy.com/why-is-the-autopilot-called-george-two-prevailing-theories/
Re: BGP route hijack by AS10990 [ In reply to ]
On 01/08/2020 00:50, Mark Tinka wrote:
> On 31/Jul/20 23:38, Sabri Berisha wrote:
>
>> Kudos to Telia for admitting their mistakes, and fixing their processes.
> Considering Telia's scope and "experience", that is one thing. But for
> the general good of the Internet, the number of intended or
> unintentional route hijacks in recent years, and all the noise that
> rises on this and other lists each time we have such incidents (this
> won't be the last), Telia should not have waited to be called out in
> order to get this fixed.
>
> Do we know if they are fixing this on just this customer of theirs, or
> all their customers? I know this has been their filtering policy with us
> (SEACOM) since 2014, as I pointed out earlier today. There has not been
> a shortage of similar incidents between now and then, where the
> community has consistently called for more deliberate and effective
> route filtering across inter-AS arrangements.


AS level filtering is easy. IP prefix level filtering is hard. Especially when you are in the top 200:

https://asrank.caida.org/




That being said, and due to these BGP "polluters" constantly doing the same thing, wouldn't an easy fix be to use the max-prefix/prefix-limit option:

https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/25160-bgp-maximum-prefix.html

https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/prefix-limit-edit-protocols-bgp.html




For every BGP peer, the ISP determines how many prefixes that peer currently announces, then adds 2% and sets max-prefix to that value.

An errant BGP polluter would then only be able to do limited damage to the Internet routing table.

Not the greatest solution, but easy to implement via a one line change on every BGP peer.




Smaller ISPs can easily do it on their 10 BGP peers so as to limit damage as to what they will hear from their neighbors.
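
As a rough Python sketch of the arithmetic, assuming the limit is the peer's current prefix count plus 2% rounded up, and that the session is dropped once the received count exceeds it. The function names are made up for illustration; the actual knobs are the vendor options linked above.

```python
def max_prefix_limit(current_count, headroom_percent=2):
    """Current prefix count plus headroom, rounded up to a whole prefix."""
    # Integer arithmetic avoids floating-point rounding surprises.
    extra = (current_count * headroom_percent + 99) // 100
    return current_count + extra


def session_stays_up(received_count, limit):
    """The session survives only while the peer stays at or under the limit."""
    return received_count <= limit
```

A peer that normally sends 1000 prefixes would get a limit of 1020, so a sudden burst of several hundred extra routes would trip it. Ordinary growth would also trip a margin this tight, so the limit would need to be refreshed from time to time.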





-Hank




Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer
Re: BGP route hijack by AS10990 [ In reply to ]
On 1/Aug/20 18:46, Owen DeLong wrote:

> ROFLMAO, if you truly believe this, you have no concept of life in the
> cockpit.

I was born into aviation, with both my mom and dad licensed ATPL pilots
for several decades. So I know my way around a number of different
cockpits.

The goal wasn't to turn this thread into an aviation one, but to focus
on what we can do better in Internet operations for more accountability.

Let's stay on-topic, please. Thanks.

Mark.
Re: BGP route hijack by AS10990 [ In reply to ]
Hi,

----- On Aug 1, 2020, at 8:49 AM, Owen DeLong owen@delong.com wrote:

> In fact, there are striking parallels between Asiana 214 and this incident.

Yes. Children of the magenta line. Depending on automation, and no clue what to
do when the Instrument Landing System goes down.

But, the most important parallel is (hopefully) yet to come. One major outcome of
the Asiana investigation was the call for more training, as the crew did not
properly understand how the aircraft worked.

The same can be said here. Noction and/or its operators appear to not understand
how BGP works, and/or what safety measures must be deployed to ensure that the
larger internet will not be hurt by misconfiguration.

I also agree with Job, that Noction has some responsibility here. And as I
understand more and more about it, I must now agree with Mark T that this
was an avoidable incident (although not because of Telia, but because of
Noction's decision not to enable NO_EXPORT by default).
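
For anyone less familiar with the community, here is a minimal Python sketch of the behaviour in question: a route carrying the well-known NO_EXPORT community (65535:65281, RFC 1997) is not advertised to eBGP peers. The `Route` class and both helper functions are hypothetical names for illustration only, not Noction's or any router vendor's code, and confederations are ignored.

```python
from dataclasses import dataclass, field

# Well-known NO_EXPORT community from RFC 1997 (0xFFFFFF01, i.e. 65535:65281).
NO_EXPORT = (65535, 65281)


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


def tag_injected_route(route: Route) -> Route:
    """Hypothetical safe default: tag every optimizer-injected route with NO_EXPORT."""
    return Route(route.prefix, route.communities | {NO_EXPORT})


def should_advertise(route: Route, peer_is_ebgp: bool) -> bool:
    """Return False when a NO_EXPORT route would otherwise leave the local AS."""
    if peer_is_ebgp and NO_EXPORT in route.communities:
        return False
    return True


# A more-specific injected for local traffic engineering (documentation prefix).
injected = tag_injected_route(Route("192.0.2.0/25"))
print(should_advertise(injected, peer_is_ebgp=False))  # iBGP neighbour
print(should_advertise(injected, peer_is_ebgp=True))   # eBGP neighbour
```

With the tag applied by default, a missing export filter would not by itself be enough to leak the injected route to a transit provider.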

Thanks,

Sabri
Re: BGP route hijack by AS10990 [ In reply to ]
> On Aug 1, 2020, at 11:14 , Hank Nussbacher <hank@interall.co.il> wrote:
>
> On 01/08/2020 00:50, Mark Tinka wrote:
>> On 31/Jul/20 23:38, Sabri Berisha wrote:
>>
>>> Kudos to Telia for admitting their mistakes, and fixing their processes.
>> Considering Telia's scope and "experience", that is one thing. But for
>> the general good of the Internet, the number of intended or
>> unintentional route hijacks in recent years, and all the noise that
>> rises on this and other lists each time we have such incidents (this
>> won't be the last), Telia should not have waited to be called out in
>> order to get this fixed.
>>
>> Do we know if they are fixing this on just this customer of theirs, or
>> all their customers? I know this has been their filtering policy with us
>> (SEACOM) since 2014, as I pointed out earlier today. There has not been
>> a shortage of similar incidents between now and then, where the
>> community has consistently called for more deliberate and effective
>> route filtering across inter-AS arrangements.
>>
>>
> AS level filtering is easy. IP prefix level filtering is hard. Especially when you are in the top 200:
> https://asrank.caida.org/
IP Prefix level filtering at backbone<->backbone connections is hard (and mostly pointless).

IP Prefix level filtering at the customer edge is not that hard, no matter how large of a transit
provider you are. Customer edge filtration by Telia in this case would have prevented this
problem from spreading beyond the misconfigured ASN.
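
As a rough illustration of what customer edge filtration means, here is a minimal Python sketch using the standard `ipaddress` module. The allowed-prefix list, the /24 maximum length, and the function names are assumptions for the example (IPv4 only), not Telia's actual policy or any router's implementation.

```python
import ipaddress


def build_customer_filter(allowed_prefixes, max_len=24):
    """Parse the prefixes a customer is registered to announce (e.g. from IRR data)."""
    return [ipaddress.ip_network(p) for p in allowed_prefixes], max_len


def accept(announced, customer_filter):
    """Accept an announcement only if it falls inside an allowed prefix
    and is no more specific than max_len."""
    nets, max_len = customer_filter
    net = ipaddress.ip_network(announced)
    if net.prefixlen > max_len:
        return False
    # Compare only like address families; subnet_of() raises on a mismatch.
    return any(net.version == a.version and net.subnet_of(a) for a in nets)


# Documentation prefixes stand in for a real customer's address space.
flt = build_customer_filter(["198.51.100.0/24"])
print(accept("198.51.100.0/24", flt))  # the customer's own prefix
print(accept("203.0.113.0/24", flt))   # someone else's prefix
print(accept("198.51.100.0/25", flt))  # a de-aggregated more-specific
```

A filter of this shape rejects both foreign prefixes and optimizer-generated more-specifics at the first hop, regardless of whether a ROA exists.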

> That being said, and due to these BGP "polluters" constantly doing the same thing, wouldn't an easy fix be to use the max-prefix/prefix-limit option:
> https://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/25160-bgp-maximum-prefix.html
> https://www.juniper.net/documentation/en_US/junos/topics/reference/configuration-statement/prefix-limit-edit-protocols-bgp.html
That’s a decent pair of suspenders to go with the belt of prefix filtration at the edge, but it’s no substitute.

> For every BGP peer, the ISP determines how many prefixes the peer currently sends, then adds 2% and sets that as the max-prefix.
> An errant BGP polluter could then do only limited damage to the Internet routing table.
> Not the greatest solution, but easy to implement via a one line change on every BGP peer.

To the best of my knowledge, that’s already fairly common practice. It’s usually more like 10% (2% would require way
too much active change and create churn and risk).
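
To make the arithmetic concrete, here is a minimal Python sketch that derives a per-peer limit from the current received-prefix count plus a percentage of headroom. The peer names and counts are made up for illustration, and the function name is hypothetical; the result would still have to be applied with the vendor's max-prefix or prefix-limit knob.

```python
def max_prefix_limit(current_count: int, headroom_pct: int = 10) -> int:
    """Return current_count plus headroom_pct percent, rounded up."""
    if current_count < 0 or headroom_pct < 0:
        raise ValueError("count and headroom must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    headroom = -(-current_count * headroom_pct // 100)
    return current_count + headroom


# Hypothetical peers and the number of prefixes each currently sends.
peers = {"customer-a": 45, "customer-b": 1000, "peer-c": 12000}

for name, count in peers.items():
    print(name, max_prefix_limit(count, 2), max_prefix_limit(count, 10))
```

For a peer sending 1000 prefixes, 2% leaves room for only 20 additional prefixes before the limit has to be raised by hand, while 10% leaves room for 100.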

Owen
Re: BGP route hijack by AS10990 [ In reply to ]
> On Aug 1, 2020, at 12:03 , Sabri Berisha <sabri@cluecentral.net> wrote:
>
> Hi,
>
> ----- On Aug 1, 2020, at 8:49 AM, Owen DeLong owen@delong.com wrote:
>
>> In fact, there are striking parallels between Asiana 214 and this incident.
>
> Yes. Children of the magenta line. Depending on automation, and no clue what to
> do when the Instrument Landing System goes down.

This wasn’t a case of the ILS going down. This was a case where the automation was
put in the wrong mode (accidentally) without any of the pilots in the cockpit noticing
it until it was too late. The problem was discovered and power applied 8 seconds before impact.
It takes 19 seconds for the engines on a 777 to spool up to adequate power for a go-around
at the airspeed and in the configuration that existed at the time.

> But, the most important parallel is (hopefully) yet to come. One major outcome of
> the Asiana investigation was the call for more training, as the crew did not
> properly understand how the aircraft worked.

That’s true in virtually every human factors accident, but in reality, failure to understand
the automation was a tiny contributing factor in this accident. Every pilot is taught
early in their ab initio training that they must monitor the approach carefully and make
sure not to bleed off too much energy (airspeed) in the process.

There’s a very common and easily identifiable pattern to an under-powered approach
on autopilot that all of the pilots in the cockpit should have readily recognized if they
were even paying the slightest attention to the approach…

1. Airplane begins to dip below glide slope.
2. Autopilot raises nose to reduce descent rate and recapture glide slope.
3. Increased pitch = greater induced drag = lower airspeed.
4. Lower airspeed = less lift = goto 1.

Until power is applied, this process will repeat until one of the following events occurs:
1. Landing short of the runway (as in the case of Asiana 214)
2. Power is applied and the approach is stabilized
3. The pitch attitude exceeds the critical angle of attack and the wings stall,
causing an abrupt pitch down.

This cycle is well understood by every student pilot before they can be endorsed
for their first solo flight.

No amount of training will make up for the utter and complete failure to pay attention
to the approach. This is one of the reasons US carriers have a “sterile cockpit” rule.

In most cases, the sterile cockpit rule is approximately this: “Below 10,000 feet or in
other critical phases of flight (emergency situations, unusual climbs or descents,
mechanical difficulties, etc.), cockpit communications are limited to those related to
the safe operation of the aircraft.”

> The same can be said here. Noction and/or its operators appear to not understand
> how BGP works, and/or what safety measures must be deployed to ensure that the
> larger internet will not be hurt by misconfiguration.

On one level, there’s validity to your claim here. On the other hand, there’s a certain
extent to which you’re telling hammer manufacturers that they have to make it impossible
for a carpenter to injure his thumb by missing the nail.

> I also agree with Job, that Noction has some responsibility here. And as I
> understand more and more about it, I must now agree with Mark T that this
> was an avoidable incident (although not because of Telia, but because of
> Noction's decision not to enable NO_EXPORT by default).

I disagree. I think Noction and Telia are both culpable here. Most of the top 200 providers
manage to do prefix filtering at the customer edge, so I don’t see any reason to give
Telia a free pass here.

Owen
Re: BGP route hijack by AS10990 [ In reply to ]
Sabri Berisha wrote on 01/08/2020 20:03:
> but because of Noction's decision not to enable NO_EXPORT by default

the primary problem is not this but that Noction reinjects prefixes into
the local ibgp mesh with the as-path stripped and then prioritises these
prefixes so that they're learned as the best path.

The as-path is the primary loop detection mechanism in eBGP. Removing
this is like hot-wiring your electrical distribution board because you
found out you could get more power if you bypass those stupid RCDs.
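
A minimal Python sketch of that loop check, and of what happens once the path is stripped, might look like the following. The ASNs come from the documentation range (RFC 5398) and the function names are hypothetical; this models the idea only, not any BGP implementation or the optimiser's code.

```python
LOCAL_AS = 64500  # documentation ASN standing in for the receiving network


def accept_ebgp(as_path, local_as=LOCAL_AS):
    """eBGP loop detection: reject a route whose AS_PATH already contains our ASN."""
    return local_as not in as_path


def strip_as_path(as_path):
    """Model of re-injecting a route with its as-path removed."""
    return []


# A route that has already passed through our own AS once.
looped_path = [64501, 64500, 64502]

print(accept_ebgp(looped_path))                 # loop is detected
print(accept_ebgp(strip_as_path(looped_path)))  # loop can no longer be detected
```

Once the path is empty there is nothing left for the check to match against, so a leaked route looks like a short, locally originated one to everyone who hears it.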

Once you strip off the as-path in the local view, it's like the AS7007
incident desperately begging to happen all over again.

As long as route optimiser vendors ship their products with such deeply
harmful defaults, we're going to continue to see these problems ad nauseam.

Nick
Re: BGP route hijack by AS10990 [ In reply to ]
----- On Aug 1, 2020, at 12:50 PM, Nick Hilliard nick@foobar.org wrote:

Hi,

> Sabri Berisha wrote on 01/08/2020 20:03:
>> but because of Noction's decision not to enable NO_EXPORT by default
>
> the primary problem is not this but that Noction reinjects prefixes into
> the local ibgp mesh with the as-path stripped and then prioritises these
> prefixes so that they're learned as the best path.

Yeah, but that's not a problem as far as I'm concerned. Their network,
their rules. I've done weirder stuff than that, in tightly controlled
environments.

> The as-path is the primary loop detection mechanism in eBGP. Removing
> this is like hot-wiring your electrical distribution board because you
> found out you could get more power if you bypass those stupid RCDs.

Well, let's be honest. Sometimes we need to get rid of that pesky mechanism.
For example, when using BGP-as-IGP, the "allowas-in" disregards the as-path,
in a controlled manner (and yes, I know, different use case).

My point is that there can be operational reasons to do so, and whatever
they wish to do on their network is perfectly fine. As long as they don't
bother the rest of the world with it.

Thanks,

Sabri
