Mailing List Archive

Re: LDPv6 Census Check
On Thu, 11 Jun 2020 at 18:32, David Sinn <dsinn@dsinn.com> wrote:

> However if you move away from large multi-chip systems, which hide internal links which can only be debugged and monitored if you know the obscure, often different ways in which they are partially exposed to the operator, and to a system of fixed form-factor, single-chip systems, labels fall apart at scale with high ECMP. Needing to enumerate every possible path within the network or having to have a super-deep label stack removes all of the perceived benefits of cheap and simple. The argument about IP lookups being slow is one best left to the 1990s, when it was true. Fixed-pipeline systems have proven this to be false.

It continues to be very much true. IP lookups require external memory,
which takes SERDES that could otherwise be used for revenue ports. IP
lookups are fundamentally slow, expensive and complex; no amount of
advancement will change that nature. Sure, we can come up with all
kinds of implementations which bridge the gap, but the gap is there.
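
To make the gap concrete, here is a toy Python sketch (purely
illustrative; the tables are invented and no real pipeline is this
naive). An MPLS label lookup is a single exact-match probe on a
fixed-width key, while an IP lookup has to hunt for the longest
matching prefix:

  # Toy contrast between exact-match (MPLS label) and longest-prefix-
  # match (IP destination) lookups. Real ASICs use tries, TCAM or hash
  # stages, but the probe-count asymmetry is the point.
  import ipaddress

  label_table = {100042: ("swap", 100043, "et-0/0/1")}  # one fixed key

  prefix_table = {
      ipaddress.ip_network("192.0.2.0/24"): "et-0/0/2",
      ipaddress.ip_network("0.0.0.0/0"): "et-0/0/0",
  }

  def lpm(dst):
      addr = ipaddress.ip_address(dst)
      probes = 0
      for plen in range(32, -1, -1):  # naive LPM: longest prefix first
          probes += 1
          net = ipaddress.ip_network(f"{addr}/{plen}", strict=False)
          if net in prefix_table:
              return prefix_table[net], probes
      return None, probes

  print(label_table[100042])  # MPLS: one exact-match probe
  print(lpm("192.0.2.55"))    # IP: ('et-0/0/2', 9 probes) in this model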

If we take, say, the JNPR MX, your lookup speed isn't limited by the
instruction count on the PPE; the PPE spends most of its time
sleeping. When the platform is fully PPS-congested, the PPE is waiting
for the memory to return!

--
++ytti

Re: LDPv6 Census Check
On Thu, 11 Jun 2020 at 19:49, Phil Bedard <bedard.phil@gmail.com> wrote:

> As for normal v6 forwarding, the way most recently made higher-speed routers work, there is little difference in latency, since the encapsulation for the packet is done in a common function at the end of the pipeline and the lookups are often in the same memory space. NPUs are also being built today with enough on-package memory to hold larger routing tables. Whether a packet has to be buffered on-chip vs. off-chip has a much larger impact on latency/PDV than a forwarding lookup.

On-package is not what's important; on-chip or off-chip is what
matters, i.e. whether you eat SERDES to connect the memory or not.

Of course you could always implement a software feature that says
these 32-bit /32 or 128-bit /128 addresses are blessed and need to
live in the tiny on-chip memory, and that from this CIDR we guarantee
all routes are host routes. That would achieve similar-to-MPLS
performance, with a few more bytes per entry.
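
If such a knob existed, the dispatch could look like this minimal
sketch (hypothetical feature; the prefix, tables and next-hops are all
made up):

  # Hypothetical "blessed CIDR": host routes from one well-known range
  # live in a small on-chip exact-match table; everything else goes to
  # LPM in external memory. All names and addresses are invented.
  import ipaddress

  BLESSED = ipaddress.ip_network("2001:db8:ffff::/48")

  on_chip_hosts = {  # guaranteed /128s only, so exact match suffices
      ipaddress.ip_address("2001:db8:ffff::1"): "nexthop-A",
  }

  def lookup(dst):
      addr = ipaddress.ip_address(dst)
      if addr in BLESSED:
          # Fast path: one exact-match probe in on-chip memory.
          return on_chip_hosts.get(addr, "drop")
      return lpm_in_external_memory(addr)  # slow path, stubbed out

  def lpm_in_external_memory(addr):
      return "default-nexthop"

  print(lookup("2001:db8:ffff::1"))  # fast path -> nexthop-A
  print(lookup("2001:db8::99"))      # slow path -> default-nexthop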

The demand is that we need tunneling; the question then is what the
metrics of a good tunneling solution are. Answering this honestly,
MPLS is better. We could surely do better still, but IP is not that.

--
++ytti

Re: LDPv6 Census Check
Phil Bedard wrote on 11/06/2020 17:49:
> Just to clarify, the only routers that potentially need to inspect or
> do anything with those headers are endpoints that require information
> in the extension header, or hops in an explicit path. In the simple
> example I gave, there are no extension headers at all.

perhaps, but no-one planning to use SRv6 is going to invest in kit
which can handle SRv6 but not the TE component, or deploy SRv6 on
existing kit which can't handle TE.

Nick

Re: LDPv6 Census Check
> On Jun 11, 2020, at 8:41 AM, Saku Ytti <saku@ytti.fi> wrote:
>
> On Thu, 11 Jun 2020 at 18:32, David Sinn <dsinn@dsinn.com> wrote:
>
>> However if you move away from large multi-chip systems, which hide internal links which can only be debugged and monitored if you know the obscure, often different ways in which they are partially exposed to the operator, and to a system of fixed form-factor, single-chip systems, labels fall apart at scale with high ECMP. Needing to enumerate every possible path within the network or having to have a super-deep label stack removes all of the perceived benefits of cheap and simple. The argument about IP lookups being slow is one best left to the 1990s, when it was true. Fixed-pipeline systems have proven this to be false.
>
> It continues to be very much true. IP lookups require external memory,
> which takes SERDES that could otherwise be used for revenue ports. IP
> lookups are fundamentally slow, expensive and complex; no amount of
> advancement will change that nature. Sure, we can come up with all
> kinds of implementations which bridge the gap, but the gap is there.

But now you are comparing apples and oranges. You're asserting that all IP lookups require external memory, but you're really comparing a lite core to a heavy core. As I said, it depends on the deployment. If you have a lite IP core you don't need external memories, so there is no latency penalty for going out to the massive memories needed for 2M+ entries. So it is not always true that lookups are slow, expensive and complex. Sure, you can build a network around a heavy core, but you can also build one without. The sweeping generalization that MPLS is always better than all other technologies is just that, a sweeping generalization. It misses a ton of points.

Rewrites on MPLS are horrible from a memory perspective, as maintaining the state and label transitions to explore all possible discrete paths across the overall end-to-end path you are trying to take is hugely inefficient. Applying circuit switching to a packet network was bad from the start. SR doesn't resolve that, as you are stuck with either a global label problem and the associated inability to engineer your paths, or a label stack problem on ingress that means you need massive ASICs and memories there.

IP at least gives you rewrite sharing, so in a lite core you have a way better trade-off on resources, especially in a heavily ECMP'ed network, such as one built of a massive number of open small boxes vs. a small number of huge opaque ones. Pick your poison, but saying one is inherently better than the other in all cases is just plain false.

> If we take, say, the JNPR MX, your lookup speed isn't limited by the
> instruction count on the PPE; the PPE spends most of its time
> sleeping. When the platform is fully PPS-congested, the PPE is waiting
> for the memory to return!

You've made my point for me. If you are building the core of your network out of MXs, then, to turn a phrase from a past life, "I fully support my competitors doing so". Large numbers of small boxes, as they have already shown in the data center, have major cost, control and operational advantages over a small number of large ones. They also expose the inherent problems of label switching and where IP has its merits.

David

>
> --
> ++ytti

Re: LDPv6 Census Check
On Thu, 11 Jun 2020 at 21:04, David Sinn <dsinn@dsinn.com> wrote:

> You've made my point for me. If you are building the core of your network out of MXs, then, to turn a phrase from a past life, "I fully support my competitors doing so". Large numbers of small boxes, as they have already shown in the data center, have major cost, control and operational advantages over a small number of large ones. They also expose the inherent problems of label switching and where IP has its merits.

Except this implementation does not exist, though we can argue that is
a missing feature. We can argue we should be able to tell the lookup
engine that this CIDR is on-chip and contains host routes only. This
is certainly doable, and would make IP tunnels like MPLS tunnels in
lookup cost, just with a larger lookup key, which is not a significant
cost.

But even if we had this (we don't; we have it for MPLS), IP would
still be inferior: it is more tunneling overhead, i.e. I need more
overspeed. Technically MPLS is just a better tunneling header. I can
understand sentimental arguments for IPv4, and the market seems to
appreciate those arguments particularly well.
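
The overspeed point is simple arithmetic; a back-of-the-envelope calc
(packet sizes are arbitrary examples; the 4/20/40-byte headers are a
single MPLS label, an IPv4 header and an IPv6 header respectively):

  # Encap overhead as extra bandwidth ("overspeed") needed on core links.
  ENCAPS = {"MPLS label": 4, "IPv4-in-IP": 20, "IPv6-in-IP": 40}

  for inner in (128, 512, 1500):  # arbitrary example packet sizes
      for name, hdr in ENCAPS.items():
          pct = 100.0 * hdr / inner
          print(f"{inner:>5}B inner, {name:<10}: +{hdr:>2}B = {pct:4.1f}% overspeed")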

--
++ytti

Re: LDPv6 Census Check
> From: David Sinn
> Sent: Thursday, June 11, 2020 4:32 PM
>
> However if you move away from large multi-chip systems,
> to a system of fixed form-factor, single chip systems, labels fall
> apart at scale with high ECMP. Needing to enumerate every possible path
> within the network or having to have a super-deep label stack removes all
> of the perceived benefits of cheap and simple.
>
Looks like the deployments you describe are large DC Clos/Benes fabrics;
the potentially deep label imposition would then be done by the VMs,
right? On transit nodes, 64-way ECMP or super-deep label stacks are no
problem for the NPU/lookup process, as it's always just a top-label
lookup resolving to a single egress interface.
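
That transit behaviour can be sketched in a few lines (label values and
port names invented): the LSR keys only on the top label, so the depth
of the rest of the stack is invisible to the lookup:

  # LSR fast path: exact-match on the top label only; the rest of the
  # stack is opaque payload that rides along. Labels here are invented.
  lfib = {30001: ("swap", 30077, "port7")}

  def transit(stack, payload):
      action, out_label, port = lfib[stack[0]]  # one exact-match lookup
      assert action == "swap"
      return [out_label] + stack[1:], payload, port  # depth irrelevant

  new_stack, _, port = transit([30001, 16, 24001, 24002], b"ip-packet")
  print(new_stack, port)  # [30077, 16, 24001, 24002] port7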

adam

Re: LDPv6 Census Check
On 11/Jun/20 17:32, David Sinn wrote:

> Respectfully, that is deployment dependent. In a traditional SP topology that focuses on large, do-everything boxes, where the topology is fairly point-to-point and you only have a small handful of nodes at a PoP, labels can be fast, cheap and easy. Given the lack of ECMP/WECMP, they remain fairly efficient within the hardware.
>
> However if you move away from large multi-chip systems, which hide internal links which can only be debugged and monitored if you know the obscure, often different ways in which they are partially exposed to the operator, and to a system of fixed form-factor, single-chip systems, labels fall apart at scale with high ECMP.

I'm curious about this statement - have you hit practical ECMP issues
with label switching at scale?

We have ECMP'ed label-switched paths with multiple paths for a single
FEC all over the place, and those work fine on both Cisco and Junos (of
all sizes), for both IPv4 and IPv6 FECs. Have done for years.

Unless I misunderstand your concern.

Mark.

Re: LDPv6 Census Check
> From: Mark Tinka <mark.tinka@seacom.mu>
> Sent: Thursday, June 11, 2020 3:59 PM
>
> > No, my line of reasoning is if you have MPLS LSPs signalled over v4 I see no
> > point having them signalled also over v6 in parallel.
>
> It's not about signaling IPv4 LSP's over IPv6.
> LDPv4 creates IPv4 FEC's.
> LDPv6 creates IPv6 FEC's.
>
> The idea is to create IPv6 FEC's so that IPv6 traffic can be label-switched in
> the network natively, allowing you to remove BGPv6 in a native dual-stack
> core.
>
Right, I see: what you are striving to achieve is to migrate from BGP in the core to a BGP-free core, but without leveraging 6PE or 6VPE?

>
> As you can see, just as with IPv4, IPv6 packets are now being MPLS-switched
> in the core, allowing you to remove BGPv6 in the core and simplify
> operations in that area of the network.
>
> So this is native MPLSv6. It's not 6PE or 6VPE.
>
So, considering you already had v4 FECs, wouldn't it be simpler to do 6PE/6VPE? What do you see as the drawbacks of these compared to native MPLSv6, please?

> > Apart from X months' worth of functionality, performance, scalability and
> > interworking testing, network-wide code upgrades to address the bugs
> > found during the testing process, and then finally rollout across the core and
> > possibly even migration from LDPv4 to LDPv6, involving dozens of people
> > from Arch, Design, OPS, Project management, etc... with potential for things
> > to break while making changes in a live network.
>
> Which you wouldn't have to do with SRv6, because you trust the vendors?
>
Well, my point was that if v4 FECs were enough to carry v6 traffic then I wouldn't need SRv6 or LDPv6, hence I'm curious to hear from you about the benefits of v6 FECs over v4 FECs (in other words, MPLSv6 vs 6PE/6VPE).

adam

Re: LDPv6 Census Check
On 11/Jun/20 23:45, adamv0025@netconsultings.com wrote:

> Right, I see: what you are striving to achieve is to migrate from BGP in the core to a BGP-free core, but without leveraging 6PE or 6VPE?

Yes sir.


> So, considering you already had v4 FECs, wouldn't it be simpler to do 6PE/6VPE? What do you see as the drawbacks of these compared to native MPLSv6, please?

Because 6PE, for us, adds a lot more complexity in how we design the
network.

But most importantly, it creates a dependency for the success of IPv6 on
IPv4. If my IPv4 network were to break, for whatever reason, it would
take my IPv6 network down with it.

Years back, there was a nasty bug in the ASR920 that set an upper limit
on the MPLS label space it created FECs for. Since Juniper sometimes
uses higher label numbers than Cisco, traffic between that ASR920 and
our Juniper network was blackholed. It took weeks to troubleshoot;
Cisco sent some engineering code, I confirmed it fixed the issue, and
it was rolled out generally. During that time, while the ASR920 was
unreachable over IPv4, it was still reachable over IPv6.

Other issues involve the ASR920 and ME3600X/3800X routers, where 0/0
and ::/0 are the last routes to be programmed into FIB when you run
BGP-SD. It can be a while before those boxes can reach the rest of the
world via the default route; IPv6 will get there faster.

I also remember another issue, back in 2015, where a badly-written IPv4
ACL kicked one of our engineers out of the box. Thankfully, he got back
in via IPv6.

I guess what I'm saying is we don't want to fate-share. IPv4 and IPv6
can operate independently. In a native, dual-stack network, a failure
mode in one of them does not necessarily propagate to the other. You
can deploy something in your IPv6 control/data plane without impacting
IPv4, and vice versa, if you want to roll out gracefully without
impacting the other protocol.

6PE simply has too many moving parts to set up, compared to just adding
an IPv6 address to a router interface and updating your IGP. Slap on
LDPv6 for good measure, and you've achieved MPLSv6 forwarding without
all the 6PE faffing.


> Well, my point was that if v4 FECs were enough to carry v6 traffic then I wouldn't need SRv6 or LDPv6, hence I'm curious to hear from you about the benefits of v6 FECs over v4 FECs (in other words, MPLSv6 vs 6PE/6VPE).

No need for 6PE deployment, nor its day-to-day operational complexity.

A simplified and more native tunneling for IPv6-in-MPLSv6, rather than
IPv6-in-MPLSv4-on-IPv4.

No interdependence between IPv6 and IPv4.

Easier troubleshooting if one of the protocols is misbehaving, because
then you are working on just one protocol, and not trying to figure out
whether IPv4 or MPLSv4 is breaking IPv6, or vice versa.

For me, those 4 simple points help me sleep well at 3AM, meaning I can
stay up longer having more wine, in peace :-).

Mark.

Re: LDPv6 Census Check
> On Jun 11, 2020, at 12:39 PM, Saku Ytti <saku@ytti.fi> wrote:
>
> On Thu, 11 Jun 2020 at 21:04, David Sinn <dsinn@dsinn.com> wrote:
>
>> You've made my point for me. If you are building the core of your network out of MXs, then, to turn a phrase from a past life, "I fully support my competitors doing so". Large numbers of small boxes, as they have already shown in the data center, have major cost, control and operational advantages over a small number of large ones. They also expose the inherent problems of label switching and where IP has its merits.
>
> Except this implementation does not exist, though we can argue that is
> a missing feature. We can argue we should be able to tell the lookup
> engine that this CIDR is on-chip and contains host routes only. This
> is certainly doable, and would make IP tunnels like MPLS tunnels in
> lookup cost, just with a larger lookup key, which is not a significant
> cost.

I'm not sure what implementation you are saying doesn't exist. The Broadcom XGS line is all on-die. The two largest cloud providers are using them in their transport network (to the best of my understanding). So I'm not sure if you're saying that no one is using small boxes like I'm describing, or what. And it doesn't have to be MPLS over IP; that is one option, but IPIP is another.

> But even if we had this (we don't; we have it for MPLS), IP would
> still be inferior: it is more tunneling overhead, i.e. I need more
> overspeed. Technically MPLS is just a better tunneling header. I can
> understand sentimental arguments for IPv4, and the market seems to
> appreciate those arguments particularly well.

Again, feel free to look at only one small aspect and say that it is completely better in all cases. MPLS is not better in wide ECMP cases, full stop. SR doesn't help that when you actually look at the problems at massive scale, as I have done. You are continually on a trade-off spectrum between irrationally deep label stacks and enumerating all of the possible paths through the network, burning all of your next-hop rewrites. At least if you want high-radix, single-chip systems. So this is not sentimentality around a protocol; it's the practical reality when you look at the problems at scale using commodity components. So if you want to optimize for cost and power (which is operational cost), MPLS is not where it's at.

David

> --
> ++ytti

Re: LDPv6 Census Check
> On Jun 11, 2020, at 2:02 PM, Mark Tinka <mark.tinka@seacom.mu> wrote:
>
>
>
> On 11/Jun/20 17:32, David Sinn wrote:
>
>> Respectfully, that is deployment dependent. In a traditional SP topology that focuses on large, do-everything boxes, where the topology is fairly point-to-point and you only have a small handful of nodes at a PoP, labels can be fast, cheap and easy. Given the lack of ECMP/WECMP, they remain fairly efficient within the hardware.
>>
>> However if you move away from large multi-chip systems, which hide internal links which can only be debugged and monitored if you know the obscure, often different ways in which they are partially exposed to the operator, and to a system of fixed form-factor, single-chip systems, labels fall apart at scale with high ECMP.
>
> I'm curious about this statement - have you hit practical ECMP issues
> with label switching at scale?

Yes. Path enumeration when you use multi-tier Clos topologies within a PoP causes you many, many problems.

> We have ECMP'ed label switch paths with multiple paths for a single FEC
> all over the place, and those work fine both on Cisco and Junos (of all
> sizes), both for IPv4 and IPv6 FEC's. Have done for years.

The protocols will work fine. And if you are still buying SP-class chips, you're fine. But you are paying a lot for those chips in cost and power. If you look to move to the class of single-chip systems, which gives you lower cost and higher radix, you have to pay the trade-off somewhere. MPLS with high-radix ECMP exposes this.

David

> Unless I misunderstand your concern.
>
> Mark.

Re: LDPv6 Census Check
On Fri, 12 Jun 2020 at 18:16, David Sinn <dsinn@dsinn.com> wrote:

> I'm not sure what implementation you are saying doesn't exist. The Broadcom XGS line is all on-die. The two largest cloud providers are using them in their transport network (to the best of my understanding). So I'm not sure if you're saying that no one is using small boxes like I'm describing, or what. And it doesn't have to be MPLS over IP; that is one option, but IPIP is another.

I'm saying an implementation which has off-chip memory and supports
putting some routes on-chip, so that you could have a full-table lookup
for edge packets and a fast exact-match lookup for the others. Of
course, we do have platforms with large LEM tables off-chip.

> Again, feel free to look at only one small aspect and say that it is completely better in all cases. MPLS is not better in wide ECMP cases, full stop. SR doesn't help that when you actually look at the problems at massive scale, as I have done. You are continually on a trade-off spectrum between irrationally deep label stacks and enumerating all of the possible paths through the network, burning all of your next-hop rewrites. At least if you want high-radix, single-chip systems. So this is not sentimentality around a protocol; it's the practical reality when you look at the problems at scale using commodity components. So if you want to optimize for cost and power (which is operational cost), MPLS is not where it's at.

I'm not sure why this deep label stack keeps popping up; if we need
multiple levels of tunneling, we need them in IP too, and they are if
anything more expensive in IP. In the SR cases I can think of, you'll
only look up the top label or two, even if you might have 10 labels.
In every apples-to-apples case MPLS tunnels are superior to IP
tunnels. If you want a cheap, very-small-FIB backbone, then all
traffic will need to be IP-tunneled to egress, and you get all the
MPLS problems plus more overhead and larger keys (larger keys are not
a big deal).

Now, if the discussion is whether we need tunnelling at all, that is a
very different discussion.

--
++ytti

Re: LDPv6 Census Check
On 11/Jun/20 19:19, Saku Ytti wrote:

> The demand is, we need tunneling, then the question is what are the
> metrics of a good tunneling solution. By answering this honestly, MPLS
> is better. We could do better surely, but IP is not that.

One unexpected benefit, I will say, of going native LDPv6 is that
MTRs for IPv6 destinations no longer report packet loss on the
intermediate core routers (CRS-X).

I know this was due to the control plane, and nothing to do with the
actual data plane, but it was always a chore explaining to customers
why MTRs for IPv4 destinations show 0% packet loss in our core, while
IPv6 ones show 30% - 50% (in spite of the final end-host reporting 0%
packet loss).

Since going LDPv6, IPv6 traffic is now label-switched in the core, in
lieu of hop-by-hop IPv6 forwarding. The unforeseen-but-welcome side
effect is that customers' MTR packet-loss reports for IPv6 destinations
that traverse the CRS-X core are as 0% as they are for IPv4 (even
though we haven't yet removed BGPv6 from the core, due to IOS XE
platforms that don't yet run LDPv6).

One less trouble ticket for our NOC to have to explain; I'll gladly
take that...

As my Swedish friend would say, "That gives me an avenue of pleasure and
joy" :-).

Mark.

Re: LDPv6 Census Check
> On Jun 12, 2020, at 8:26 AM, Saku Ytti <saku@ytti.fi> wrote:
>
> On Fri, 12 Jun 2020 at 18:16, David Sinn <dsinn@dsinn.com> wrote:
>
>> I'm not sure what implementation you are saying doesn't exist. The Broadcom XGS line is all on-die. The two largest cloud providers are using them in their transport network (to the best of my understanding). So I'm not sure if you're saying that no one is using small boxes like I'm describing, or what. And it doesn't have to be MPLS over IP; that is one option, but IPIP is another.
>
> I'm saying an implementation which has off-chip memory and supports
> putting some routes on-chip, so that you could have a full-table lookup
> for edge packets and a fast exact-match lookup for the others. Of
> course, we do have platforms with large LEM tables off-chip.

But why do you need a full-table lookup in the middle of the network? Why place that class of gear where it's not needed?

>> Again, feel free to look at only one small aspect and say that it is completely better in all cases. MPLS is not better in wide ECMP cases, full stop. SR doesn't help that when you actually look at the problems at massive scale, as I have done. You are continually on a trade-off spectrum between irrationally deep label stacks and enumerating all of the possible paths through the network, burning all of your next-hop rewrites. At least if you want high-radix, single-chip systems. So this is not sentimentality around a protocol; it's the practical reality when you look at the problems at scale using commodity components. So if you want to optimize for cost and power (which is operational cost), MPLS is not where it's at.
>
> I'm not sure why this deep label stack keeps popping up; if we need
> multiple levels of tunneling, we need them in IP too, and they are if
> anything more expensive in IP. In the SR cases I can think of, you'll
> only look up the top label or two, even if you might have 10 labels.
> In every apples-to-apples case MPLS tunnels are superior to IP
> tunnels. If you want a cheap, very-small-FIB backbone, then all
> traffic will need to be IP-tunneled to egress, and you get all the
> MPLS problems plus more overhead and larger keys (larger keys are not
> a big deal).

The label stack question is about the comparison between the two extremes of SR deployment. You either label your packet just for its ultimate destination, or you apply the full stack of the points you want to pass through.

In the former case you are, at the forwarding plane, equal to what you see with traditional MPLS today, with every node along the path needing to know how to reach the end-point. Yes, you have lowered the label space relative to traditional MPLS, but that can be done with site-cast labels already. And, while the nodes don't have to actually swap labels, when you look at commodity implementations (across the last three generations, since you want to do this with what is deployed, not wholesale-replace the network) a null swap still ends up eating a unique egress next-hop entry. So from a hardware perspective, you haven't improved anything. Your ECMP group count is high.

In the extreme latter case, you have to place, on ingress, the full stack of every "site" you want to pass through. That has the benefit that "sites" only need labels for their directly connected sites, so you have optimized the implications for commodity hardware. However, you now have a label stack that can be quite tall, at least if you want to walk the long way around the world, say due to a failure. On top of that, a label stack of that depth means devices in the middle can't look at the original payload to make ECMP decisions. So you can turn to entropy labels, but that sort of makes matters worse.
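
That hashing limitation is easy to model (the parse depth, hash and
member count are invented for illustration): once the stack exceeds
what the pipeline can parse, the payload 5-tuple drops out of the hash
key and every flow lands on the same ECMP member:

  # Toy ECMP hash with a fixed label parse depth, as in fixed pipelines.
  # Depth, hash function and member count are invented for illustration.
  import zlib

  PARSE_DEPTH = 4

  def ecmp_member(stack, five_tuple, n_members):
      if len(stack) <= PARSE_DEPTH:
          key = tuple(stack) + five_tuple   # payload visible: good spray
      else:
          key = tuple(stack[:PARSE_DEPTH])  # payload hidden: polarization
      return zlib.crc32(repr(key).encode()) % n_members

  flows = [("10.0.0.1", "10.0.0.2", 6, sport, 443)
           for sport in range(1024, 2048)]
  shallow = {ecmp_member([1, 2], f, 8) for f in flows}
  deep = {ecmp_member([1, 2, 3, 4, 5], f, 8) for f in flows}
  print(len(shallow), len(deep))  # e.g. 8 vs 1: deep stacks polarize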

The practical reality is somewhere in the middle. At the least, you probably want some form of path engineering, so the first model really doesn't work, or is at best equal to just doing traditional MPLS-TE. The closer you get to the latter, the higher your stack goes. So then you have to look at the hardware you want at the edge of your network and how many labels it can impose. If you want an all-commodity network, that doesn't work.

So, yes, MPLS works fine if you want to buy big-iron boxes. But that comes at a cost. So the point that MPLS is always better is not accurate. Engineering is about trade-offs, and there are trade-offs to be made when you optimize in a different direction; they point away from MPLS and back to IPIP.

David

> Now if discussion is do we need tunnelling at all, the discussion is
> very different.
>
> --
> ++ytti

Re: LDPv6 Census Check
On Fri, 12 Jun 2020 at 18:42, David Sinn <dsinn@dsinn.com> wrote:

> But why do you need a full-table lookup in the middle of the network? Why place that class of gear where it's not needed?

Some people do collapsed core. But this is getting a bit theoretical,
because we definitely could do this with IP: we could do some lookups
on-chip and some off-chip to get both, should the market want it.

> The label stack question is about the comparison between the two extremes of SR deployment. You either label your packet just for its ultimate destination, or you apply the full stack of the points you want to pass through.

Quite, but transit won't inspect the stack; it doesn't have to care
about it, so it can be very deep.

> In the former case you are, at the forwarding plane, equal to what you see with traditional MPLS today, with every node along the path needing to know how to reach the end-point. Yes, you have lowered the label space relative to traditional MPLS, but that can be done with site-cast labels already. And, while the nodes don't have to actually swap labels, when you look at commodity implementations (across the last three generations, since you want to do this with what is deployed, not wholesale-replace the network) a null swap still ends up eating a unique egress next-hop entry. So from a hardware perspective, you haven't improved anything. Your ECMP group count is high.

I don't disagree. What I'm trying to say is that however you tunnel,
you have the same issues. If you need to tunnel, then MPLS is a better
tunnel than IP. Ultimately both can be made LEM on-chip should the
market want it, so the difference left is the overhead of the tunnel,
and MPLS wins here hands down. This is objectively true; practical
market reality may be different, because the market doesn't optimise
for the best solution.

> So, yes, MPLS works fine if you want to buy big-iron boxes. But that comes at a cost. So the point that MPLS is always better is not accurate. Engineering is about trade-offs, and there are trade-offs to be made when you optimize in a different direction; they point away from MPLS and back to IPIP.

Always, if you need a tunnel. Because the fundamental questions are
how much overhead we have and what our key width is. In both the IP
tunnel and MPLS tunnel cases we will assume a LEM lookup, to keep the
lookup cheap.

--
++ytti

Re: LDPv6 Census Check
>> The label stack question is about the comparison between the two extremes of SR deployment. You either label your packet just for its ultimate destination, or you apply the full stack of the points you want to pass through.
>
> Quite, but transit won't inspect the stack; it doesn't have to care
> about it, so it can be very deep.

Unless you want ECMP, in which case it VERY much matters. But I guess since we are only talking theoretically, instead of building an actual practical network, it doesn't matter.

David

Re: LDPv6 Census Check
Greetings,

On Thu, Jun 11, 2020 at 8:08 PM David Sinn <dsinn@dsinn.com> wrote:

> Rewrites on MPLS are horrible from a memory perspective, as maintaining
> the state and label transitions to explore all possible discrete paths
> across the overall end-to-end path you are trying to take is hugely
> inefficient. Applying circuit switching to a packet network was bad from
> the start. SR doesn't resolve that, as you are stuck with either a global
> label problem and the associated inability to engineer your paths, or a
> label stack problem on ingress that means you need massive ASICs and
> memories there.
>

I don't think rewrites are horrible, just very flexible, and this *can*
come at a certain price. In regard to your memory argument that path
engineering in vanilla TE takes a lot of forwarding slots, we should
remind ourselves that this is not a design principle of MPLS. Discrete
paths could also be signalled in MPLS with shared link-labels, so that
you end up with the same big instructional headend packet as in SR.
There are even implementations offering this.

> IP at least gives you rewrite sharing, so in a lite core you have a way
> better trade-off on resources, especially in a heavily ECMP'ed network,
> such as one built of a massive number of open small boxes vs. a small
> number of huge opaque ones. Pick your poison, but saying one is
> inherently better than the other in all cases is just plain false.
>

If I understand this argument correctly, then it shouldn't be one,
because "rewrite sharing" is irrelevant for the addressability of
single nodes in a BGP network. Why a header lookup depth of 4B per
label, in engineered and non-engineered paths alike, should be a bad
requisite for h/w designers of modern networks is beyond me. In most
MPLS networks (unengineered L3VPN) you need to read less header than
in e.g. a VXLAN fabric to make ECMP work (24B vs. 20B).

Re: LDPv6 Census Check
On Fri, 12 Jun 2020 at 18:52, David Sinn <dsinn@dsinn.com> wrote:

> Unless you want ECMP, in which case it VERY much matters. But I guess since we are only talking theoretically, instead of building an actual practical network, it doesn't matter.

Well, blatantly we are, because in the real world most of the value of
MPLS tunnels is not available as IP tunnels. Again, it is technically
entirely possible to replace MPLS tunnels with IP tunnels; it's just a
question of how much overhead you have in transporting the tunnel key
and how wide the keys are.

Should we design a rational cost-efficient solution, we should choose
the lowest overhead and narrowest working keys.
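
Key width is easy to put numbers on too; a rough sizing calc (tunnel
counts are arbitrary examples, and only raw key bits are counted,
ignoring hashing overhead and result data):

  # Raw key storage for an exact-match tunnel table, by key width.
  KEY_BITS = {"MPLS label": 20, "IPv4 DADDR": 32, "IPv6 DADDR": 128}

  for entries in (4_000, 64_000):  # arbitrary example scales
      for name, bits in KEY_BITS.items():
          kib = entries * bits / 8 / 1024
          print(f"{entries:>6} tunnels, {name:<10}: {kib:7.1f} KiB of key SRAM")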

--
++ytti

Re: LDPv6 Census Check
Saku Ytti wrote on 2020-06-12 12:10:
> On Fri, 12 Jun 2020 at 18:52, David Sinn <dsinn@dsinn.com> wrote:
>
>> Unless you want ECMP, in which case it VERY much matters. But I guess
>> since we are only talking theoretically, instead of building an actual
>> practical network, it doesn't matter.
>
> Well, blatantly we are, because in the real world most of the value of
> MPLS tunnels is not available as IP tunnels. Again, it is technically
> entirely possible to replace MPLS tunnels with IP tunnels; it's just a
> question of how much overhead you have in transporting the tunnel key
> and how wide the keys are.
>
> Should we design a rational cost-efficient solution, we should choose
> the lowest overhead and narrowest working keys.

Sorry for jumping in in the middle of the discussion; as a side note,
in the case of IPIP tunneling, shouldn't another protocol type be used
in the MAC header? As I understand it, in VXLAN the VTEP IP is
dedicated to this purpose, so receiving a packet with the VTEP DST IP
already means "decapsulate and look up the next header". But in
traditional routers, loopback IPs are used for multiple purposes, and
usually receiving a packet with the lo0 IP means punting it to the
control plane. Isn't an additional differentiator needed here to tell
a router which type of action it has to take? Or, alternatively, if a
dedicated set of IPs is used for tunneling, then another lookup table
is needed for it, isn't it? And now it looks like we are arriving at a
header structure and forwarding process similar to what we already
have in MPLS, only with a different label format. Please correct me if
I went off track somewhere in this logical chain.
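
For what it's worth, that logical chain can be written down as a small
sketch (addresses invented): the differentiator is membership in a
decap table, which is indeed a second, MPLS-like lookup stage:

  # Ingress dispatch: tunnel-endpoint decap vs. control-plane punt.
  # All addresses here are invented.
  import ipaddress

  TUNNEL_ENDPOINTS = {ipaddress.ip_address("203.0.113.1")}    # decap table
  LOOPBACKS        = {ipaddress.ip_address("203.0.113.254")}  # punt to CPU

  def receive(outer_dst, inner_packet):
      dst = ipaddress.ip_address(outer_dst)
      if dst in TUNNEL_ENDPOINTS:
          return ("decap-and-lookup-inner", inner_packet)  # MPLS-like stage
      if dst in LOOPBACKS:
          return ("punt-to-control-plane", None)
      return ("forward", None)

  print(receive("203.0.113.1", b"inner"))
  print(receive("203.0.113.254", b"bgp"))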

To David's point about ECMP, I'd like to mention that in WAN networks
the number of diverse paths is always limited, so having multiple
links taking the same path doesn't make much sense. With current
economics, 4x10G and 1x100G are usually close from a price POV.
Obviously, there are situations where multiple links are the only
option, but how many: usually 4-8. Sure, if you need multiple 400G
links there is currently no option to go to higher speeds, but that's
more of a DC use case than a WAN network. So ECMP in a WAN network
isn't that big a scale problem IMHO; besides, there are existing and
proposed solutions for it, like SR.

Kind regards,
Andrey

Re: LDPv6 Census Check
On Fri, 12 Jun 2020 at 20:12, Andrey Kostin <ankost@podolsk.ru> wrote:

> Sorry for jumping in in the middle of the discussion; as a side note,
> in the case of IPIP tunneling, shouldn't another protocol type be used
> in the MAC header? As I understand it, in VXLAN the VTEP IP is
> dedicated to this purpose, so receiving a packet with the VTEP DST IP
> already means "decapsulate and look up the next header". But in
> traditional routers, loopback IPs are used for multiple purposes, and
> usually receiving a packet with the lo0 IP means punting it to the
> control plane. Isn't an additional differentiator needed here to tell
> a router which type of action it has to take? Or, alternatively, if a
> dedicated set of IPs is used for tunneling, then another lookup table
> is needed for it, isn't it? And now it looks like we are arriving at a
> header structure and forwarding process similar to what we already
> have in MPLS, only with a different label format. Please correct me if
> I went off track somewhere in this logical chain.

I don't think a new etherType is mandatory by any means. The biggest
gain is security. SRv6 will necessarily have a lot of issues where an
unauthorised packet gets treated as SRv6, which is much harder in an
MPLS network. Many real-life devices work very differently with EH
chains (with a massive performance drop, which can be 90%!). JNPR Trio
will parse up to N EHs, then drop if it cannot finish parsing. NOK FP
will parse up to N EHs, then forward if it cannot finish parsing (i.e.
it now bypasses TCP/UDP denies, as it didn't know the packet is
TCP/UDP, or the packet could have an SRv6 EH which it couldn't drop,
as it didn't know it had one).

But as for the big MPLS advantage of having guaranteed exact-match
lookups on a small space, compared to LPM lookups on a large space:
we could guarantee this in IPIP tunnels too, without any difference in
the headers, other than the obligation/guarantee that all LSR packets
are IPIP-encapsulated with a small set of outer-packet DADDRs.

--
++ytti

Re: LDPv6 Census Check
>
> I'm not sure why this deep label stack keeps popping up; if we need
> multiple levels of tunneling, we need them in IP too, and they are if
> anything more expensive in IP.
>

Well imagine you need only one level of tunneling but rich ECMP.

Then with IP encap (even with the MPLS app demux carried in UDP) you
just make sure the src UDP port is random, and voilà, it works very
nicely.

In contrast, to achieve the same with MPLS you need the entropy label
(which is two labels: first the marker/indicator, then the actual
value) + hardware capable of reading it.
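
The mechanism being described is that the encapsulating router derives
the outer UDP source port from a hash of the inner flow; a minimal
sketch (the hash function and port range here are just example
choices):

  # Entropy via the outer UDP source port: hash the inner flow into the
  # ephemeral port range. Hash and range choices are example assumptions.
  import zlib

  def outer_udp_sport(inner_five_tuple):
      h = zlib.crc32(repr(inner_five_tuple).encode())
      return 49152 + (h % 16384)  # 49152..65535

  print(outer_udp_sport(("10.0.0.1", "10.0.0.2", 6, 33333, 443)))
  print(outer_udp_sport(("10.0.0.1", "10.0.0.2", 6, 33334, 443)))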

R.

Re: LDPv6 Census Check
On Fri, 12 Jun 2020 at 21:41, Robert Raszuk <robert@raszuk.net> wrote:

> Well imagine you need only one level of tunneling but rich ECMP.
>
> Then with IP encap (even with the MPLS app demux carried in UDP) you just make sure the src UDP port is random, and voilà, it works very nicely.
>
> In contrast, to achieve the same with MPLS you need the entropy label (which is two labels: first the marker/indicator, then the actual value) + hardware capable of reading it.

Extending MPLS is byte-expensive; we could certainly improve on that
in new tunneling headers. But it's still cheaper than looking up UDP,
and potentially impossible there (IPv6, EH). Of course, if you are
working with an existing pipeline which has support for IP/UDP parsing
but not entropy-label parsing, then it's easy to say that in that
special case UDP is a better solution. But all things being equal,
MPLS still wins, though we could definitely improve upon it.

--
++ytti

Re: LDPv6 Census Check
> Rewrites on MPLS are horrible from a memory perspective, as maintaining the state and label transitions to explore all possible discrete paths across the overall end-to-end path you are trying to take is hugely inefficient. Applying circuit switching to a packet network was bad from the start. SR doesn't resolve that, as you are stuck with either a global label problem and the associated inability to engineer your paths, or a label stack problem on ingress that means you need massive ASICs and memories there.
>
> I don't think rewrites are horrible, just very flexible, and this *can* come at a certain price. In regard to your memory argument that path engineering in vanilla TE takes a lot of forwarding slots, we should remind ourselves that this is not a design principle of MPLS. Discrete paths could also be signalled in MPLS with shared link-labels, so that you end up with the same big instructional headend packet as in SR. There are even implementations offering this.

Except that is actually the problem if you look at it in hardware. And to be very specific, I'm talking about commodity hardware, not flexible pipelines like you find in the MX and a number of the ASRs. I'm also talking about the more recent approach of using Clos in PoPs instead of "big iron" or chassis-based systems. On those boxes, it's actually better not to do shared labels, as this pushes the ECMP decision to the ingress node. That does mean you have to enumerate every possible path (or some approximation) through the network; however, the action on the commodity gear is greatly reduced. It's a pure label swap, so you don't run into any egress next-hop problems. You definitely do on the ingress nodes. Very, very badly, actually.

So you can move to a shared-label mode. Now the commodity boxes have to perform ECMP. That means they also have to have a unique ECMP group for every site/anycast label passing through them, as every label is being swapped differently. You get no reuse for two labels that are on identical paths, because the "swaps" are not identical. So you run up against ECMP next-hop group starvation, forcing you to lower radix and limiting the total any-/site-cast count.
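
Counted in a toy model (numbers invented), the starvation is easy to
see: with shared IP rewrites everything on the same path set reuses
one next-hop group, while per-label null-swaps cannot share:

  # Toy count of ECMP next-hop groups, shared IP rewrites vs. per-label
  # swaps. 500 labels spread over 4 path sets; numbers are invented.
  paths_of = {label: label % 4 for label in range(500)}

  ip_groups = {paths_of[l] for l in paths_of}         # rewrite is shared
  mpls_groups = {(paths_of[l], l) for l in paths_of}  # swap embeds label

  print(len(ip_groups), len(mpls_groups))  # 4 vs 500 groups consumed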

> IP at least gives you rewrite sharing, so in a lite core you have a way better trade-off on resources, especially in a heavily ECMP'ed network, such as one built of a massive number of open small boxes vs. a small number of huge opaque ones. Pick your poison, but saying one is inherently better than the other in all cases is just plain false.
>
> If I understand this argument correctly, then it shouldn't be one, because "rewrite sharing" is irrelevant for the addressability of single nodes in a BGP network. Why a header lookup depth of 4B per label, in engineered and non-engineered paths alike, should be a bad requisite for h/w designers of modern networks is beyond me. In most MPLS networks (unengineered L3VPN) you need to read less header than in e.g. a VXLAN fabric to make ECMP work (24B vs. 20B).

What I'm getting at is that IP allows rewrite sharing, in that two IP frames taking the same path but ultimately reaching different destinations get rewritten identically (e.g. DMAC, egress port). And, at least with IPIP, you are able to look at the inner frame for ECMP calculations. Depending on your MPLS design, that may not be the case. If you have too deep a label stack (3-5 labels, depending on the ASIC), you can't look at the payload and you end up with polarization.

David

Re: LDPv6 Census Check
>
>> Unless you want ECMP, in which case it VERY much matters. But I guess since we are only talking theoretically, instead of building an actual practical network, it doesn't matter.
>
> Well, blatantly we are, because in the real world most of the value of
> MPLS tunnels is not available as IP tunnels. Again, it is technically
> entirely possible to replace MPLS tunnels with IP tunnels; it's just a
> question of how much overhead you have in transporting the tunnel key
> and how wide the keys are.

You may be; I am not. I'm talking about practical networks, and the path that multiple large networks are going down, built around commodity ASICs. And it is a practical question about the total solution, not point-specific ones. Overhead is only one part. Lookup delay is not a factor at all for the class I'm referring to.

> Should we design a rational cost-efficient solution, we should choose
> the lowest overhead and narrowest working keys.

In the abstract, sure. But if you want a practical, deployable, production network, it's multi-dimensional.

David

Re: LDPv6 Census Check
> On Jun 12, 2020, at 11:41 AM, Robert Raszuk <robert@raszuk.net> wrote:
>
> I'm not sure why this deep label stack keeps popping up; if we need
> multiple levels of tunneling, we need them in IP too, and they are if
> anything more expensive in IP.
>
> Well imagine you need only one level of tunneling but rich ECMP.
>
> Then with IP encap (even with the MPLS app demux carried in UDP) you just make sure the src UDP port is random, and voilà, it works very nicely.

In reality you don't even need to do UDP and consume those extra bytes. All of the commodity pipelines, as well as the big-iron ones, will look at the inner header of an IPIP frame for entropy calculations if you configure them to do so. You also have the problem that most of the very high-performance commodity ASICs are moving away from VXLAN/UDP encap, since they sit in the middle of the classic core/edge feature split and deem VXLAN an edge feature.

David