Mailing List Archive: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

Jun 19, 2020, 4:32 PM

Post #1 of 34

< ranting of a curmudgeonly old privileged white male >

>>> MPLS was since day one proposed as enabler for services originally
>>> L3VPNs and RSVP-TE.
>> MPLS day one was mike o'dell wanting to move his city/city traffic
>> matrix from ATM to tag switching and open cascade's hold on tags.
> And IIRC, Tag switching day one was Cisco overreacting to Ipsilon.

i had not thought of it as overreacting; more embrace and devour. mo
and yakov, aided and abetted by sob and other ietf illuminati, helped
cisco take the ball away from Ipsilon, Force10, ...

but that is water over the damn, and my head is hurting a bit from
thinking on too many levels at once.

there is saku's point of distributing labels in IGP TLVs/LSAs. i
suspect he is correct, but good luck getting that anywhere in the
internet vendor task force. and that tells us a lot about whether we
can actually effect useful simplification and change.

is a significant part of the perception that there is a forwarding
problem the result of the vendors, 25 years later, still not
designing for v4/v6 parity?

there is the argument that switching MPLS is faster than IP; when the
pressure points i see are more at routing (BGP/LDP/RSVP/whatever),
recovery, and convergence.

did we really learn so little from IP routing that we need to
recreate analogous complexity and fragility in the MPLS control
plane? ( sound of steam emanating from saku's ears :)

and then there is buffering; which seems more serious than simple
forwarding rate. get it there faster so it can wait in a queue? my
principal impression of the Stanford/Google workshops was the parable
of the blind men and the elephant. though maybe Matt had the main
point: given scaling 4x, Moore's law can not save us and it will all
become paced protocols. will we now have a decade+ of BBR evolution
and tuning? if so, how do we engineer our networks for that?

and up 10,000m, we watch vendor software engineers hand crafting in
an assembler language with if/then/case/for, and running a chain of
checking software to look for horrors in their assembler programs.
it's the bleeping 21st century. why are the protocol specs not
formal and verified, and the code formally generated and verified?
and don't give me too slow given that the hardware folk seem to be
able to do 10x in the time it takes to run valgrind a few dozen
times.

we're extracting ore with hammers and chisels, and then hammering it
into shiny objects rather than safe and securable network design and
construction tools.

apologies. i hope you did not read this far.

randy

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mohta at necom830

Jun 20, 2020, 6:39 AM

Post #2 of 34

Randy Bush wrote:

>>>> MPLS was since day one proposed as enabler for services originally
>>>> L3VPNs and RSVP-TE.
>>> MPLS day one was mike o'dell wanting to move his city/city traffic
>>> matrix from ATM to tag switching and open cascade's hold on tags.
>> And IIRC, Tag switching day one was Cisco overreacting to Ipsilon.
>
> i had not thought of it as overreacting; more embrace and devour. mo
> and yakov, aided and abetted by sob and other ietf illuminati, helped
> cisco take the ball away from Ipsilon, Force10, ...

Ipsilon was hopeless because, as Yakov correctly pointed out, flow
driven approach to automatically detect flows does not scale.

The problem of MPLS, however, is that, it must also be flow driven,
because detailed route information at the destination is necessary
to prepare nested labels at the source, which costs a lot and should
be attempted only for detected flows.

> there is the argument that switching MPLS is faster than IP; when the
> pressure points i see are more at routing (BGP/LDP/RSVP/whatever),
> recovery, and convergence.

Routing table at IPv4 backbone today needs at most 16M entries to be
looked up by simple SRAM, which is as fast as MPLS look up, which is
one reason why we should obsolete IPv6.
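
As a rough illustration of the "16M entries" figure: 2^24 = 16,777,216, which is the number of slots needed to index a table directly on the top 24 bits of an IPv4 destination. The sketch below is only an assumption about what such a direct-indexed lookup might look like, not a description of any real forwarding chip. The names `install` and `lookup`, the next-hop strings, and the sparse dict standing in for an SRAM array are all hypothetical, and prefixes longer than /24 are deliberately out of scope.

```python
import ipaddress
from typing import Dict, Optional

TABLE_BITS = 24                 # index on the top 24 bits of the address
TABLE_SIZE = 1 << TABLE_BITS    # 16,777,216 slots, the "16M entries"

# A sparse dict stands in for the SRAM array; keys are slot indices.
table: Dict[int, str] = {}


def slot_index(addr: str) -> int:
    """Return the table slot for an IPv4 address: its top 24 bits."""
    return int(ipaddress.IPv4Address(addr)) >> (32 - TABLE_BITS)


def install(prefix: str, next_hop: str) -> None:
    """Expand a prefix of length <= 24 into every /24 slot it covers.

    Install shorter prefixes first so that more specific ones
    overwrite the slots they share.
    """
    net = ipaddress.IPv4Network(prefix)
    if net.prefixlen > TABLE_BITS:
        raise ValueError("prefixes longer than /24 are out of scope here")
    first = int(net.network_address) >> (32 - TABLE_BITS)
    count = 1 << (TABLE_BITS - net.prefixlen)
    for slot in range(first, first + count):
        table[slot] = next_hop


def lookup(addr: str) -> Optional[str]:
    """One indexed read, no tree walk: the property the SRAM argument relies on."""
    return table.get(slot_index(addr))
```

The cost of this design is that a short prefix is replicated into many slots, so memory is traded for a single-access lookup.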

Though resource reserved flows need their own routing table entries,
they should be charged proportional to duration of the reservation,
which can scale to afford the cost to have the entries.

Masataka Ohta

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

robert at raszuk

Jun 20, 2020, 8:08 AM

Post #3 of 34

> there is saku's point of distributing labels in IGP TLVs/LSAs. i
> suspect he is correct, but good luck getting that anywhere in the
> internet vendor task force.

Perhaps I will surprise a few but this is not only already in RFC formats -
it is also shipping already across vendors for some time now.

SR-MPLS (as part of its spec) does exactly that. You do not need to use any
SR if you do not want, you still can encapsulate your packets with
transport label corresponding to your exit at any ingress and forget about
LDP for good.

But with that let's not forget that aggregation here is still not spec-ed
out well and to the best of my knowledge it is also not shipping yet. I
recently proposed an idea how to aggregate SRGBs .. one vendor is analyzing
it.

Best,
R.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

robert at raszuk

Jun 20, 2020, 8:12 AM

Post #4 of 34

> The problem of MPLS, however, is that, it must also be flow driven,
> because detailed route information at the destination is necessary
> to prepare nested labels at the source, which costs a lot and should
> be attempted only for detected flows.
>

MPLS is not flow driven. I sent some mail about it but perhaps it bounced.

MPLS LDP or L3VPNs was NEVER flow driven.

Since day one till today it was and still is purely destination based.

Transport is using LSP to egress PE (dst IP).

L3VPNs are using either per dst prefix, or per CE or per VRF labels. No
implementation does anything upon "flow detection" - to prepare any nested
labels. Even in FIBs all information is preprogrammed in hierarchical
fashion well before any flow packet arrives.
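
A minimal sketch of what "preprogrammed in hierarchical fashion" could mean, under stated assumptions: the table names, PE names, VRF names, and label values below are all hypothetical and are not taken from any vendor implementation. The point it tries to show is that the ingress PE only reads state that the control plane installed earlier, and that VPN routes point at an egress PE rather than carrying their own copy of the transport label.

```python
# Hypothetical state, installed by the control plane before any
# customer packet arrives.

# Egress PE -> transport (top) label, learned e.g. via LDP.
transport_labels = {"PE2": 16002, "PE3": 16003}

# (VRF, destination prefix) -> (egress PE, service label), learned via BGP.
vpn_routes = {
    ("CUST-A", "10.1.0.0/16"): ("PE2", 24010),
    ("CUST-A", "10.2.0.0/16"): ("PE3", 24020),
    ("CUST-B", "10.1.0.0/16"): ("PE2", 24011),
}


def label_stack(vrf, prefix):
    """Return [transport label, service label] for a matched VPN route.

    Nothing is computed per flow: both lookups hit tables that were
    filled in advance, keyed by destination only.
    """
    egress_pe, service_label = vpn_routes[(vrf, prefix)]
    return [transport_labels[egress_pe], service_label]
```

Because the VPN routes reference the egress PE indirectly, changing the single `transport_labels["PE2"]` entry affects every VPN route behind PE2 at once, which is the practical benefit of the hierarchy.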

Thx,
R.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mark.tinka at seacom

Jun 21, 2020, 1:17 AM

Post #5 of 34

On 20/Jun/20 01:32, Randy Bush wrote:

> there is saku's point of distributing labels in IGP TLVs/LSAs. i
> suspect he is correct, but good luck getting that anywhere in the
> internet vendor task force. and that tells us a lot about whether we
> can actually effect useful simplification and change.

This is shipping today with SR-MPLS.

Besides still being brand new and not yet fully field tested by the
community, my other concern is unless you are running a Juniper and have
the energy to pull a "Vijay Gill" and move your entire backbone to
IS-IS, you'll get either no SR-ISISv6 support, no SR-OSPFv3 support, or
both, with all the vendors.

Which brings me back to the same piss-poor attention LDPv6 is getting,
which is, really, poor attention to IPv6.

Kind of hard for operators to take IPv6 seriously at this level if the
vendors, themselves, aren't.

> is a significant part of the perception that there is a forwarding
> problem the result of the vendors, 25 years later, still not
> designing for v4/v6 parity?

I think the forwarding is fine, if you're carrying the payload in MPLS.

The problem is the control plane. It's not insurmountable; the vendors
just want to do less work.

The issue is IPv4 is gone, and trying to keep it around will only lead
to the creation of more hacks, which will further complicate the control
and data plane.

>
> there is the argument that switching MPLS is faster than IP; when the
> pressure points i see are more at routing (BGP/LDP/RSVP/whatever),
> recovery, and convergence.

Either way, the MPLS or IP problem already has an existing solution. If
you like IP, you can keep it. If you like MPLS, you can keep it.

So I'd be spending less time on the forwarding (of course, if there are
ways to improve that and someone has the time, why not), and as you say,
work on fixing the control plane and the signaling for efficiency and scale.

>
> did we really learn so little from IP routing that we need to
> recreate analogous complexity and fragility in the MPLS control
> plane? ( sound of steam emanating from saku's ears :)

The path to SR-MPLS's inherent signaling carried in the IGP is an
optimum solution, that even I have been wanting since inception.

But, it's still too fresh, global deployment is terrible, and there is
still much to be learned about how it behaves outside of the lab.

For me, a graceful approach toward SR via LDPv6 makes sense. But, as
always, YMMV.

> and then there is buffering; which seems more serious than simple
> forwarding rate. get it there faster so it can wait in a queue? my
> principal impression of the Stanford/Google workshops was the parable
> of the blind men and the elephant. though maybe Matt had the main
> point: given scaling 4x, Moore's law can not save us and it will all
> become paced protocols. will we now have a decade+ of BBR evolution
> and tuning? if so, how do we engineer our networks for that?

This deserves a lot more attention than it's receiving. The problem is
it doesn't sound sexy enough to compile into a PPT that you can project
to suits whom you need to part with cash.

It doesn't have that 5G or SRv6 or Controller or IoT ring to it :-).

It's been a while since vendors that control a large portion of the
market paid real attention to their geeky side. The buffer problem, for
me, would fall into that category. Maybe a smaller, more agile, more
geeky start-up, can take the lead with this one.

> and up 10,000m, we watch vendor software engineers hand crafting in
> an assembler language with if/then/case/for, and running a chain of
> checking software to look for horrors in their assembler programs.
> it's the bleeping 21st century. why are the protocol specs not
> formal and verified, and the code formally generated and verified?
> and don't give me too slow given that the hardware folk seem to be
> able to do 10x in the time it takes to run valgrind a few dozen
> times.

And for today's episode of Jeopardy:

"What used to be the IETF?"

> we're extracting ore with hammers and chisels, and then hammering it
> into shiny objects rather than safe and securable network design and
> construction tools.

Rush it out the factory, fast, even though it's not ready. Get all their
money before they board the ship and sail for Mars.

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mark.tinka at seacom

Jun 21, 2020, 1:21 AM

Post #6 of 34

On 20/Jun/20 15:39, Masataka Ohta wrote:

> Ipsilon was hopeless because, as Yakov correctly pointed out, flow
> driven approach to automatically detect flows does not scale.
>
> The problem of MPLS, however, is that, it must also be flow driven,
> because detailed route information at the destination is necessary
> to prepare nested labels at the source, which costs a lot and should
> be attempted only for detected flows.

Again, I think you are talking about what RSVP should have been.

RSVP != MPLS.

> Routing table at IPv4 backbone today needs at most 16M entries to be
> looked up by simple SRAM, which is as fast as MPLS look up, which is
> one reason why we should obsolete IPv6.

I'm not sure I should ask this in fear of taking this discussion way off
tangent... aaah, what the heck:

So if we can't assign hosts IPv4 anymore because it has run out, should
we obsolete IPv6 in favour of CGN? I know this works.

>
> Though resource reserved flows need their own routing table entries,
> they should be charged proportional to duration of the reservation,
> which can scale to afford the cost to have the entries.

RSVP failed to take off when it was designed.

Outside of capturing Netflow data (or tracking firewall state), nobody
really cares about handling flows at scale (no, I'm not talking about
ECMP).

Why would we want to do that in 2020 if we didn't in 2000?

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mark.tinka at seacom

Jun 21, 2020, 1:23 AM

Post #7 of 34

On 20/Jun/20 17:08, Robert Raszuk wrote:

>
>
> But with that let's not forget that aggregation here is still not
> spec-ed out well and to the best of my knowledge it is also not
> shipping yet. I recently proposed an idea how to aggregate SRGBs ..
> one vendor is analyzing it.

Hence why I think SR still needs time to grow up.

There are some things I can be maverick about. I don't think SR is it,
today.

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mark.tinka at seacom

Jun 21, 2020, 2:59 AM

Post #8 of 34

On 20/Jun/20 17:12, Robert Raszuk wrote:

>
> MPLS is not flow driven. I sent some mail about it but perhaps it
> bounced.
>
> MPLS LDP or L3VPNs was NEVER flow driven.
>
> Since day one till today it was and still is purely destination based.
>
> Transport is using LSP to egress PE (dst IP).
>
> L3VPNs are using either per dst prefix, or per CE or per VRF labels.
> No implementation does anything upon "flow detection" - to prepare any
> nested labels. Even in FIBs all information is preprogrammed in
> hierarchical fashion well before any flow packet arrives.

If you really don't like LDP or RSVP-TE, you can statically assign
labels and manually configure FEC's across your entire backbone. If
trading state for administration is your thing, of course :-).

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mohta at necom830

Jun 21, 2020, 4:11 AM

Post #9 of 34

Robert Raszuk wrote:

> MPLS LDP or L3VPNs was NEVER flow driven.
>
> Since day one till today it was and still is purely destination based.

If information to create labels at or near sources to all the
possible destinations is distributed in advance, maybe. But
it is effectively flat routing, or, in extreme cases, flat host
routing.

Or, if information to create labels to all the active destinations
is supplied on demand, it is flow driven.

On day one, Yakov said MPLS had scaled because of nested labels
corresponding to routing hierarchy.

Masataka Ohta

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mark.tinka at seacom

Jun 21, 2020, 4:31 AM

Post #10 of 34

On 21/Jun/20 13:11, Masataka Ohta wrote:

>
> If information to create labels at or near sources to all the
> possible destinations is distributed in advance, maybe.

But this is what happens today.

Whether you do it manually or use a label distribution protocol, FEC's
are pre-computed ahead of time.

What am I missing?

> But
> it is effectively flat routing, or, in extreme cases, flat host
> routing.

I still don't get it.

>
> Or, if information to create labels to all the active destinations
> is supplied on demand, it is flow driven.

What would the benefit of this be? Ingress and egress nodes don't come
and go. They are stuck in racks in data centres somewhere, and won't
disappear until a human wants them to. So why create labels on-demand if
a box to handle the traffic is already in place and actively working,
day-in, day-out?

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mohta at necom830

Jun 21, 2020, 5:36 AM

Post #11 of 34

Mark Tinka wrote:

>> If information to create labels at or near sources to all the
>> possible destinations is distributed in advance, maybe.
>
> But this is what happens today.

That is a tragedy.

> Whether you do it manually or use a label distribution protocol, FEC's
> are pre-computed ahead of time.
>
> What am I missing?

If all the link-wise (or, worse, host-wise) information of possible
destinations is distributed in advance to all the possible sources,
it is not hierarchical but flat (host) routing, which scales poorly.

Right?

>> But
>> it is effectively flat routing, or, in extreme cases, flat host
>> routing.
>
> I still don't get it.

Why, do you think, flat routing does not but hierarchical
routing does scale?

It is because detailed information to reach destinations
below certain level is advertised not globally but only for
small part of the network around the destinations.

That is, with hierarchical routing, detailed information
around destinations is actively hidden from sources.

So, with hierarchical routing, routing protocols can
carry only rough information around destinations, from
which, source side can not construct detailed (often
purposelessly nested) labels required for MPLS.

> So why create labels on-demand if
> a box to handle the traffic is already in place and actively working,
> day-in, day-out?

According to your theory to ignore routing traffic, we can be happy
with global *host* routing table with 4G entries for IPv4 and a lot
lot lot more than that for IPv6. CIDR should be unnecessary
complication to the Internet

With nested labels, you don't need so many labels at certain nesting
level, which was the point of Yakov, which does not mean you don't
need so much information to create entire nested labels at or near
the sources.

The problem is that we can't afford traffic (and associated processing
by all the related routers or things like those) and storage (at or
near source) for routing (or MPLS, SR* or whatever) with such detailed
routing at the destinations.

Masataka Ohta

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

robert at raszuk

Jun 21, 2020, 6:24 AM

Post #12 of 34

It is destination based flat routing distributed 100% before any data
packet within each layer - yes. But layers are decoupled so in a sense this
is what defines a hierarchy overall.

So transport is using MPLS LSPs: most often host IGP routes are matched
with LDP FECs and flooded everywhere, in spite of RFC 5283 at least allowing
the IGP to be aggregated.

Then say L2VPNs or L3VPNs with their own choice of routing protocols are in
turn distributing reachability for the customer sites. Those are service
routes linked to transport by BGP next hop(s).

Many thx,
R.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

robert at raszuk

Jun 21, 2020, 6:35 AM

Post #13 of 34

Let's clarify a few things ...

On Sun, Jun 21, 2020 at 2:39 PM Masataka Ohta <
mohta@necom830.hpcl.titech.ac.jp> wrote:

> If all the link-wise (or, worse, host-wise) information of possible
> destinations is distributed in advance to all the possible sources,
> it is not hierarchical but flat (host) routing, which scales poorly.
>
> Right?
>

Neither link wise nor host wise information is required to accomplish say
L3VPN services. Imagine you have three sites which would like to
interconnect each with 1000s of users.

So all you are exchanging as part of VPN overlay is three subnets.

Moreover if you have 1000 PEs and those three sites are attached only to 6
of them - only those 6 PEs will need to learn those routes (Hint: RTC -
RFC4684)
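
A small sketch of the idea behind that hint, under assumptions: with constrained route distribution (RFC 4684), a PE signals which route targets it imports, and only VPN routes carrying one of those targets need to be sent to it. The route-target values, PE names, and the `routes_sent_to` helper below are hypothetical and only model the filtering outcome, not the BGP machinery.

```python
# Hypothetical VPN routes, each tagged with its export route targets.
vpn_routes = [
    {"prefix": "10.1.0.0/16", "route_targets": {"65000:100"}},
    {"prefix": "10.2.0.0/16", "route_targets": {"65000:100"}},
    {"prefix": "10.3.0.0/16", "route_targets": {"65000:100"}},
    {"prefix": "172.16.0.0/16", "route_targets": {"65000:200"}},
]

# Route targets each PE has said it imports (its locally configured VRFs).
pe_import_targets = {
    "PE1": {"65000:100"},   # has a site of the three-site customer
    "PE7": {"65000:200"},   # serves a different customer
    "PE9": set(),           # no VRFs configured at all
}


def routes_sent_to(pe):
    """Return only the prefixes whose route targets the PE asked for."""
    wanted = pe_import_targets[pe]
    return [r["prefix"] for r in vpn_routes if r["route_targets"] & wanted]
```

In this model PE9 receives nothing, which mirrors the claim that the other 994 PEs never need to learn the three customer subnets.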

> It is because detailed information to reach destinations
> below certain level is advertised not globally but only for
> small part of the network around the destinations.
>

Same thing here.

> That is, with hierarchical routing, detailed information
> around destinations is actively hidden from sources.
>

Same thing here.

That is why as described we use label stack. Top label is responsible to
get you to the egress PE. Service label sitting behind top label is
responsible to get you through to the customer site (with or without IP
lookup at egress PE).
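
The division of labour between the two labels can be sketched as below. This is an illustrative model only: the function names, label values, and tables are hypothetical, and penultimate hop popping is ignored for simplicity.

```python
def core_swap(stack, lfib):
    """A core (P) router looks only at the top label and swaps it.

    The service label underneath is carried through untouched, so the
    core holds no per-customer state.
    """
    top, *rest = stack
    return [lfib[top], *rest]


def egress_deliver(stack, service_table):
    """The egress PE pops the transport label, then uses the service
    label to pick the customer attachment."""
    _transport, service = stack
    return service_table[service]
```

For example, `core_swap([100, 24010], {100: 200})` changes only the first element of the stack, and `egress_deliver` on the result is keyed purely by the second.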

> So, with hierarchical routing, routing protocols can
> carry only rough information around destinations, from
> which, source side can not construct detailed (often
> purposelessly nested) labels required for MPLS.
>

Usually sources have no idea of MPLS. MPLS to the host never took off.

> According to your theory to ignore routing traffic, we can be happy
> with global *host* routing table with 4G entries for IPv4 and a lot
> lot lot more than that for IPv6. CIDR should be unnecessary
> complication to the Internet
>

I do not think anyone is saying that here.

> With nested labels, you don't need so many labels at certain nesting
> level, which was the point of Yakov, which does not mean you don't
> need so much information to create entire nested labels at or near
> the sources.
>

Label stack is here from day one. Each layer of the stack has a completely
different role. That is your hierarchy.

Kind regards,
R.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

mark.tinka at seacom

Jun 21, 2020, 8:50 AM

Post #14 of 34

On 21/Jun/20 14:36, Masataka Ohta wrote:

>
>
> That is a tragedy.

Well...

> If all the link-wise (or, worse, host-wise) information of possible
> destinations is distributed in advance to all the possible sources,
> it is not hierarchical but flat (host) routing, which scales poorly.
>
> Right?

Host NLRI is summarized in iBGP within the domain, and eBGP outside the
domain.

It's no longer novel to distribute end-user NLRI in the IGP. If folk are
still doing that, I can't feel sympathy for the pain they may experience.

>
> Why, do you think, flat routing does not but hierarchical
> routing does scale?
>
> It is because detailed information to reach destinations
> below certain level is advertised not globally but only for
> small part of the network around the destinations.
>
> That is, with hierarchical routing, detailed information
> around destinations is actively hidden from sources.
>
> So, with hierarchical routing, routing protocols can
> carry only rough information around destinations, from
> which, source side can not construct detailed (often
> purposelessly nested) labels required for MPLS.

But hosts often point default to a clever router.

That clever router could also either point default to the provider, or
carry a full BGP table from the provider.

Neither the host nor their first-hop gateway need to be MPLS-aware.

There are use-cases where a customer CPE can be MPLS-aware, but I'd say
that in nearly 99.999% of all cases, CPE are never MPLS-aware.

> According to your theory to ignore routing traffic, we can be happy
> with global *host* routing table with 4G entries for IPv4 and a lot
> lot lot more than that for IPv6. CIDR should be unnecessary
> complication to the Internet

Not sure what Internet you're running, but I, generally, accept
aggregate IPv4 and IPv6 BGP routes from other AS's. I don't need to know
every /32 or /128 host that sits behind them.
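
To make the aggregate point concrete, here is a minimal longest-prefix-match sketch using Python's standard `ipaddress` module. The prefixes are documentation ranges and the AS numbers are from the private/documentation space; the `rib` contents and the `best_route` helper are assumptions for illustration, not anyone's real routing table.

```python
import ipaddress

# Hypothetical table holding two aggregates instead of per-host routes.
rib = {
    ipaddress.ip_network("203.0.113.0/24"): "AS64500",
    ipaddress.ip_network("2001:db8::/32"): "AS64501",
}


def best_route(addr):
    """Return the origin for the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in rib if net.version == ip.version and ip in net]
    if not matches:
        return None
    return rib[max(matches, key=lambda net: net.prefixlen)]
```

A single /24 entry answers for all 256 addresses behind it, and one IPv6 /32 answers for 2^96 addresses, so no /32 or /128 host routes are needed to reach them.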

>
> With nested labels, you don't need so many labels at certain nesting
> level, which was the point of Yakov, which does not mean you don't
> need so much information to create entire nested labels at or near
> the sources.

I don't know what Yakov advertised back in the day, but looking at what
I and a ton of others are running in practice, in the real world, today,
I don't see what you're talking about.

Again, if you can identify an actual scenario today, in a live, large
scale (or even small scale) network, I'd like to know.

I'm talking about what's in practice, not theory.

>
> The problem is that we can't afford traffic (and associated processing
> by all the related routers or things like those) and storage (at or
> near source) for routing (or MPLS, SR* or whatever) with such detailed
> routing at the destinations.

Again, I disagree, as I mentioned earlier, because you won't be able to
buy a router today that does only IP any cheaper than one that does both
IP and MPLS.

MPLS has become so mainstream that its economies of scale have made the
consideration between it and IP a non-starter. Heck, you can even do it
in Linux...

Mark.

RE: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

adamv0025 at netconsultings

Jun 22, 2020, 12:06 AM

Post #15 of 34

But MPLS can be made flow driven (it can be made whatever the policy dictates), for instance DSCP driven…

adam

RE: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?)

adamv0025 at netconsultings

Jun 22, 2020, 12:43 AM

Post #16 of 34

> Masataka Ohta
> Sent: Sunday, June 21, 2020 1:37 PM
>
> > Whether you do it manually or use a label distribution protocol, FEC's
> > are pre-computed ahead of time.
> >
> > What am I missing?
>
> If all the link-wise (or, worse, host-wise) information of possible
> destinations is distributed in advance to all the possible sources,
> it is not hierarchical but flat (host) routing, which scales poorly.
>
> Right?
>
On the Internet, yes; in controlled environments, no, as in these
environments the set of possible destinations is well scoped.

Take an MPLS-enabled DC for instance: every VM needs to talk to only a
small subset of all the VMs hosted in a DC.
Hence each VM gets flow transport labels programmed via centralized
end-to-end flow controllers on a need-to-know basis (not everything to
everyone).
(E.g. dear vm1 this is how you get your EF/BE flows via load-balancer and FW
to backend VMs in your local pod, this is how you get via local pod fw to
internet gw, etc..., done)
Now that you have these neat "pipes" all over the place connecting VMs it's
easy for the switching fabric controller to shuffle elephant and mice flows
around in order to avoid any link saturation.

And now imagine a bit further doing the same as above but with CPEs on a
Service Provider network... yep, no PEs acting as chokepoints for MPLS label
switch paths to flow assignment, needing massive FIBs and even bigger, just
dumb MPLS switch fabric, all the "hard-work" is offloaded to centralized
controllers (and CPEs for label stack imposition), but only on a
need-to-know basis (not everything to everyone).

Now in both cases you're free to choose to what extent the MPLS
switch fabric should be involved with the end-to-end flows by imposing
hierarchies on the MPLS stack.

In light of the above, does it suck to have just 20 bits of MPLS label space?
Absolutely.
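
For a sense of scale, here is a rough sketch of the arithmetic behind the 20-bit label space. The reserved range (values 0-15) is from RFC 3032; the stack_combinations helper is purely illustrative and only gives an upper bound, since real deployments carve label ranges up per platform and per application.

```python
# Illustrative arithmetic only. LABEL_BITS and the reserved range follow
# RFC 3032; stack_combinations is a hypothetical helper for this sketch.
LABEL_BITS = 20
RESERVED_LABELS = 16  # label values 0-15 are reserved

total_labels = 2 ** LABEL_BITS
usable_labels = total_labels - RESERVED_LABELS


def stack_combinations(depth: int) -> int:
    """Upper bound on distinct label stacks of the given depth."""
    return usable_labels ** depth


print(total_labels)           # 1048576
print(usable_labels)          # 1048560
print(stack_combinations(2))  # upper bound for a two-label stack
```

So a single label tops out at roughly 1M values, which is where the "~1M unique identifiers" figure comes from; hierarchy in the stack is what multiplies that.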

Adam

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

mohta at necom830

Jun 22, 2020, 5:49 AM

Post #17 of 34 (1243 views)

Robert Raszuk wrote:

> Neither link wise nor host wise information is required to accomplish say
> L3VPN services. Imagine you have three sites which would like to
> interconnect each with 1000s of users.

For a single customer of an ISP with 1000s of end users. OK.

But, it should be noted that a single class B routing table entry
often serves for an organization with 10000s of users, which is
at least our case here at titech.ac.jp.

It should also be noted that, my concern is scalability in ISP side.

> Moreover if you have 1000 PEs and those three sites are attached only to 6
> of them - only those 6 PEs will need to learn those routes (Hint: RTC -
> RFC4684)

If you have 1000 PEs, you should be serving for somewhere around 1000
customers.

And, if I understand BGP-MP correctly, all the routing information of
all the customers is flooded by BGP-MP in the ISP.

Then, it should be a lot better to let customer edges encapsulate
L2 or L3 over IP, with which, routing information within customers
is exchanged by customer provided VPN without requiring extra
overhead of maintaining customer local routing information by the
ISP.

If a customer want customer-specific SLA, it can be described
as SLA between customer edge routers, for which, intra-ISP MPLS
may or may not be used.

For the ISP, it can be as profitable as PE-based VRF solutions,
because customers so relying on ISPs will let the ISP provide
and maintain customer edges.

The only difference should be on profitability for router makers,
which want to make routing system as complex as possible or even
a lot more than that to make backbone routers a lot profitable
product.

>> With nested labels, you don't need so much labels at certain nesting
>> level, which was the point of Yakov, which does not mean you don't
>> need so much information to create entire nested labels at or near
>> the sources.

> Label stack is here from day one.

Label stack was there, because of, now recognized to be wrong,
statement of Yakov on day one and I can see no reason still to
keep it.

Masataka Ohta

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

mohta at necom830

Jun 22, 2020, 6:08 AM

Post #18 of 34 (1243 views)

Mark Tinka wrote:

>> So, with hierarchical routing, routing protocols can
>> carry only rough information around destinations, from
>> which, source side can not construct detailed (often
>> purposelessly nested) labels required for MPLS.
>
> But hosts often point default to a clever router.

The requirement from the E2E principle is that routers should be
dumb and hosts should be clever or the entire system does not
scale reliably.

In this case, such clever router can ever exist only near the
destination unless very detailed routing information is flooded
all over the network to all the possible sources.

A router can't be clever on something, unless it is provided
with very detailed information on all the possible destinations,
which needs a lot of routing traffic making entire system not
to scale.

Masataka Ohta

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

mohta at necom830

Jun 22, 2020, 6:17 AM

Post #19 of 34 (1243 views)

adamv0025@netconsultings.com wrote:

> But MPLS can be made flow driven (it can be made whatever the policy
> dictates), for instance DSCP driven...

The point of Yakov on day one was that, flow driven approach of
Ipsilon does not scale and is unacceptable.

Though I agree with Yakov here, we must also eliminate all the
flow driven approaches by MPLS or whatever.

Masataka Ohta

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

mark.tinka at seacom

Jun 22, 2020, 6:40 AM

Post #20 of 34 (1243 views)

On 22/Jun/20 14:49, Masataka Ohta wrote:

>
> But, it should be noted that a single class B...

CIDR - let's not teach the kids old news :-).

> If you have 1000 PEs, you should be serving for somewhere around 1000
> customers.

It's not linear.

We probably have 1 edge router serving several-thousand customers.

>
> And, if I understand BGP-MP correctly, all the routing information of
> all the customers is flooded by BGP-MP in the ISP.

Yes, best practice is in iBGP.

Some operators may still be using an IGP for this. It would work, but
scales poorly.

>
> Then, it should be a lot better to let customer edges encapsulate
> L2 or L3 over IP, with which, routing information within customers
> is exchanged by customer provided VPN without requiring extra
> overhead of maintaining customer local routing information by the
> ISP.

You mean like IP-in-IP or GRE? That already happens today, without any
intervention from the ISP.

>
> If a customer want customer-specific SLA, it can be described
> as SLA between customer edge routers, for which, intra-ISP MPLS
> may or may not be used.

l2vpn's and l3vpn's attract a higher SLA because the services are mostly
provisioned on-net. If an off-net component exists, it would be via a
trusted NNI partner.

Regular IP or GRE tunnels don't come with these kinds of SLA's because
the ISP isn't involved, and the B-end would very likely be off-net with
no SLA guarantees between the A-end customer's ISP and the remote ISP
hosting the B-end.

>
> For the ISP, it can be as profitable as PE-based VRF solutions,
> because customers so relying on ISPs will let the ISP provide
> and maintain customer edges.

There are few ISP's who would be able to terminate an IP or GRE tunnel
on-net, end-to-end.

And even then, they might be reluctant to offer any SLA's because those
tunnels are built on the CPE, typically outside of their control.

>
> The only difference should be on profitability for router makers,
> which want to make routing system as complex as possible or even
> a lot more than that to make backbone routers a lot profitable
> product.

If ISP's didn't make money from MPLS/VPN's, router vendors would not be
as keen on adding the capability in their boxes.

>
> Label stack was there, because of, now recognized to be wrong,
> statement of Yakov on day one and I can see no reason still to
> keep it.

Label stacking is fundamental to the "MP" part of MPLS. Whether your
payload is IP, ATM, Ethernet, Frame Relay, PPP, HDLC, e.t.c., the
ability to stack labels is what makes an MPLS network payload agnostic.
There is value in that.
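
To make the stacking point concrete, here is a minimal sketch of how a label stack is laid out on the wire, assuming the 32-bit label stack entry format from RFC 3032 (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL). The function names and the label values 100 and 200 are made up for illustration; no vendor implementation is implied.

```python
import struct


def encode_entry(label: int, tc: int, bottom: bool, ttl: int) -> bytes:
    """Pack one 32-bit label stack entry: label(20) | tc(3) | S(1) | ttl(8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8 or not 0 <= ttl < 256:
        raise ValueError("tc must fit in 3 bits and ttl in 8 bits")
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def encode_stack(labels, tc: int = 0, ttl: int = 64) -> bytes:
    """Encode labels top-first; only the last entry has the S bit set."""
    last = len(labels) - 1
    return b"".join(
        encode_entry(label, tc, i == last, ttl) for i, label in enumerate(labels)
    )


def decode_stack(data: bytes):
    """Return a list of (label, tc, bottom_of_stack, ttl) tuples."""
    return [
        (word >> 12, (word >> 9) & 0x7, bool((word >> 8) & 1), word & 0xFF)
        for (word,) in struct.iter_unpack("!I", data)
    ]


# Hypothetical two-label stack: transport label 100 on top, service label 200 below.
wire = encode_stack([100, 200])
print(wire.hex())
print(decode_stack(wire))
```

The core only ever looks at the top entry; whatever the bottom label identifies (IP VPN, Ethernet pseudowire, and so on) is opaque to it, which is the payload-agnostic property described above.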

Mark.

RE: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

adamv0025 at netconsultings

Jun 22, 2020, 6:49 AM

Post #21 of 34 (1243 views)

> From: Masataka Ohta <mohta@necom830.hpcl.titech.ac.jp>
> Sent: Monday, June 22, 2020 2:17 PM
>
> adamv0025@netconsultings.com wrote:
>
> > But MPLS can be made flow driven (it can be made whatever the policy
> > dictates), for instance DSCP driven.
>
> The point of Yakov on day one was that, flow driven approach of Ipsilon does
> not scale and is unacceptable.
>
> Though I agree with Yakov here, we must also eliminate all the flow driven
> approaches by MPLS or whatever.
>
First, I'd need a definition of what "flow" means in this discussion: are we
considering a 5-tuple, a 4-tuple, or just SRC-IP & DST-IP? Is DSCP marking part
of it?
Second, although I agree that ~1M unique identifiers is not ideal, can you
provide examples of MPLS applications where 1M is limiting?
What particular aspect?
Is it 1M interfaces per MPLS switching fabric box?
Or 1M unique flows (or better flow groups) originated by a given
VM/Container/CPE?
Or 1M destination entities (IPs or apps on those IPs) that any particular
VM/Container/CPE needs to talk to?
Or 1M customer VPNs or 1M PE-CPE links, if PE acts as a bottleneck?

adam

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

mark.tinka at seacom

Jun 22, 2020, 6:51 AM

Post #22 of 34 (1243 views)

On 22/Jun/20 15:08, Masataka Ohta wrote:

>
> The requirement from the E2E principle is that routers should be
> dumb and hosts should be clever or the entire system does not
> scale reliably.

And yet in the PTT world, it was the other way around. Clever switching
and dumb telephone boxes. How things have since evened out.

I can understand the concern about making the network smart. But even a
smart network is not as smart as a host. My laptop can do a lot of
things more cleverly than any of the routers in my network. It just
can't do them at scale, consistently, for a bunch of users. So the
responsibility gets to be shared, with the number of users being served
diminishing as you enter and exit the edge of the network.

It's probably not yet an ideal networking paradigm, but it's the one we
have now that is a reasonably fair compromise.

>
> In this case, such clever router can ever exist only near the
> destination unless very detailed routing information is flooded
> all over the network to all the possible sources.

I will admit that bloating router code over recent years to become
terribly smart (CGN, Acceleration, DoS mitigation, VPN's, SD-WAN, IDS,
Video Monitoring, e.t.c.) can become a runaway problem. I've often
joked that with all the things being thrown into BGP, we may just see it
carrying DNS too, hehe.

Personally, the level of intelligence we have in routers now beyond
being just Layer 1, 2, 3 - and maybe 4 - crunching machines is just as
far as I'm willing to go. If, like me, you keep pushing back on vendors
trying to make your routers also clean your dishes, they'll take the
hint and stop bloating the code.

Are MPLS/VPN's overly clever? I think so. But considering the pay-off
and how much worse it could get, I'm willing to accept that.

>
> A router can't be clever on something, unless it is provided
> with very detailed information on all the possible destinations,
> which needs a lot of routing traffic making entire system not
> to scale.

Well, if you can propose a better way to locate hosts on a global
network not owned by anyone, in a connectionless manner, I'm sure we'd
all be interested.

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

mark.tinka at seacom

Jun 22, 2020, 6:51 AM

Post #23 of 34 (1243 views)

On 22/Jun/20 15:17, Masataka Ohta wrote:

>
>
> The point of Yakov on day one was that, flow driven approach of
> Ipsilon does not scale and is unacceptable.
>
> Though I agree with Yakov here, we must also eliminate all the
> flow driven approaches by MPLS or whatever.

I still don't see them in practice, even though they may have been proposed.

Mark.

Re: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

Jun 22, 2020, 6:55 AM

Post #24 of 34 (1243 views)

Masataka Ohta wrote on 22/06/2020 13:49:
> But, it should be noted that a single class B routing table entry

"a single class B routing table entry"? Did 1993 just call and ask for
its addressing back? :-)

> But, it should be noted that a single class B routing table entry
> often serves for an organization with 10000s of users, which is
> at least our case here at titech.ac.jp.
>
> It should also be noted that, my concern is scalability in ISP side.

This entire conversation is puzzling: we already have "hierarchical
routing" to a large degree, to the extent that the public DFZ only sees
aggregate routes exported by ASNs. Inside ASNs, there will be internal
aggregation of individual routes (e.g. an ISP DHCP pool), and possibly
multiple levels of aggregation, depending on how this is configured.
Aggregation is usually continued right down to the end-host edge, e.g. a
router might have a /26 assigned on an interface, but the hosts will be
aggregated within this /26.
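
As a small illustration of the aggregation described above, here is a sketch using Python's standard ipaddress module. The prefixes are documentation ranges chosen for the example, not anything from a real network.

```python
import ipaddress

# A router interface with a /26 assigned; individual hosts need no
# routing entries of their own because the /26 covers them.
interface_net = ipaddress.ip_network("192.0.2.0/26")
hosts = [
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.45"),
]
covered = all(host in interface_net for host in hosts)

# Inside the AS, adjacent internal prefixes (e.g. two halves of a DHCP
# pool) collapse into one aggregate before being announced further up.
internal = [
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("198.51.100.128/25"),
]
aggregate = list(ipaddress.collapse_addresses(internal))

print(covered)    # True
print(aggregate)  # [IPv4Network('198.51.100.0/24')]
```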

> If you have 1000 PEs, you should be serving for somewhere around 1000
> customers.
>
> And, if I understand BGP-MP correctly, all the routing information of
> all the customers is flooded by BGP-MP in the ISP.

Well, maybe. Or maybe not. This depends on lots of things.

> Then, it should be a lot better to let customer edges encapsulate
> L2 or L3 over IP, with which, routing information within customers
> is exchanged by customer provided VPN without requiring extra
> overhead of maintaining customer local routing information by the
> ISP.

If you have 1000 or even 10000s of PEs, injecting simplistic
non-aggregated routing information is unlikely to be an issue. If you
have 1,000,000 PEs, you'll probably need to rethink that position.

If your proposition is that the nature of the internet be changed so
that route disaggregation is prevented, or that addressing policy be
changed so that organisations are exclusively handed out IP address
space by their upstream providers, then this is a simple matter of
misunderstanding how impractical the proposition is: that horse
bolted from the barn 30 years ago; no organisation would accept
exclusive connectivity provided by a single upstream; and today's world
of dense interconnection would be impossible on the terms you suggest.
You may not like that there are lots of entries in the DFZ and many
operators view this as a bit of a drag, but on today's technology, this
can scale to significantly more than what we foresee in the medium-long
term future.

Nick

RE: why am i in this handbasket? (was Devil's Advocate - Segment Routing, Why?) [ In reply to ]

adamv0025 at netconsultings

Jun 22, 2020, 7:30 AM

Post #25 of 34 (1243 views)

> Masataka Ohta
> Sent: Monday, June 22, 2020 1:49 PM
>
> Robert Raszuk wrote:
>
> > Moreover if you have 1000 PEs and those three sites are attached only
> > to 6 of them - only those 6 PEs will need to learn those routes (Hint:
> > RTC -
> > RFC4684)
>
> If you have 1000 PEs, you should be serving for somewhere around 1000
> customers.
>
> And, if I understand BGP-MP correctly, all the routing information of all the
> customers is flooded by BGP-MP in the ISP.
>
Not quite.
The routing information is flooded by default, but the receivers will cherry
pick what they need and drop the rest.
And even if the default flooding of all and dropping most is a concern, it
can be addressed so that only the relevant subset of all the routing info is
sent to each receiver.
The key takeaway, however, is that no single entity in an SP network, be it PE,
RR, or ASBR, ever needs everything; you can always slice and dice
indefinitely.
So to sum it up, you simply cannot run into any scaling ceiling with the MP-BGP
architecture.
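
A toy sketch of the "send only the relevant subset" idea, loosely modelled on route-target based filtering (the RTC/RFC4684 hint earlier in the thread). This is not the protocol encoding: the VpnRoute class, the filter_by_rt helper, and all prefixes, route targets, and PE names are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VpnRoute:
    prefix: str
    route_targets: frozenset


def filter_by_rt(routes, wanted_rts):
    """Keep only routes carrying at least one route target the receiver wants."""
    return [route for route in routes if route.route_targets & wanted_rts]


# Hypothetical VPN routes held by a route reflector.
routes = [
    VpnRoute("10.1.0.0/16", frozenset({"65000:100"})),
    VpnRoute("10.2.0.0/16", frozenset({"65000:200"})),
    VpnRoute("10.3.0.0/16", frozenset({"65000:100", "65000:300"})),
]

# Each PE declares which route targets it imports; a PE with no matching
# VRFs configured receives nothing.
pe_import_rts = {
    "pe1": frozenset({"65000:100"}),
    "pe2": frozenset({"65000:200"}),
    "pe3": frozenset(),
}

sent = {pe: filter_by_rt(routes, rts) for pe, rts in pe_import_rts.items()}
for pe, pe_routes in sent.items():
    print(pe, [route.prefix for route in pe_routes])
```

Without sender-side filtering the same function runs on the receiving PE instead: everything arrives, and whatever does not match an import target is dropped.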

adam