BGP unnumbered examples from data center network using RFC 5549 et al. [was: Re: RFC 5549 - IPv4 Routes with IPv6 next-hop - Does it really exists?]
Mark Tinka writes:
> On 29/Jul/20 15:51, Simon Leinen wrote:

>>
>> Neighbor         V    AS MsgRcvd MsgSent TblVer InQ OutQ  Up/Down State/PfxRcd
>> sw-o(swp16)      4 65108  953559  938348      0   0    0 03w5d00h          688
>> sw-m(swp18)      4 65108  885442  938348      0   0    0 03w5d00h          688
>> s0001(swp1s0.3)  4 65300  748971  748977      0   0    0 03w5d00h            1
>> [...]
>>
>> Note the host names/interface names - this is how you generally refer to
>> neighbors, rather than using literal (IPv6) addresses.

> Are the names based on DNS look-ups, or is there some kind of protocol
> association between the device underlay and its hostname, as it pertains
> to neighbors?

As Nick mentions, the hostnames are from the BGP hostname extension.

I should have noticed that, but we use "BGP unnumbered"[1][2], which
uses RAs to discover the peer's IPv6 link-local address, and then builds
an IPv6 BGP session (that uses RFC 5549 to transfer IPv4 NLRIs as well).

Here are some excerpts of the configuration on such a leaf router.

General BGP boilerplate:

------------------------------
router bgp 65111
 bgp router-id 10.1.1.46
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 !
 address-family ipv4 unicast
  network 10.1.1.46/32
  redistribute connected
  redistribute static
 exit-address-family
 !
 address-family ipv6 unicast
  network 2001:db8:1234:101::46/128
  redistribute connected
  redistribute static
 exit-address-family
------------------------------

Leaf switch <-> server connection: (We use an 802.1Q-tagged subinterface
for the BGP peering and L3 server traffic; the untagged interface is
used only for netbooting the servers when (re)installing the OS. Here,
servers just get IPv4+IPv6 default routes, and each server announces
only a single IPv4+IPv6 (loopback) address, i.e. the leaf/server
links are also "unnumbered". It is a very simple redundant setup without
any LACP/MLAG protocols... it's all just BGP+IPv6 ND. You can basically
connect any server to any switch port and things will "just work"
without special inter-switch links etc.)

------------------------------
interface swp1s0
 description s0001.s1.scloud.switch.ch p8p1
!
interface swp1s0.3
 description s0001.s1.scloud.switch.ch p8p1
 ipv6 nd ra-interval 3
 no ipv6 nd suppress-ra
!
[...]
router bgp 65111
 neighbor servers peer-group
 neighbor servers remote-as external
 neighbor servers capability extended-nexthop
 neighbor swp1s0.3 interface peer-group servers
 !
 address-family ipv4 unicast
  neighbor servers default-originate
  neighbor servers soft-reconfiguration inbound
  neighbor servers prefix-list DEFAULTV4-PERMIT out
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor servers activate
  neighbor servers default-originate
  neighbor servers soft-reconfiguration inbound
  neighbor servers prefix-list DEFAULTV6-PERMIT out
 exit-address-family
!
ip prefix-list DEFAULTV4-PERMIT permit 0.0.0.0/0
!
ipv6 prefix-list DEFAULTV6-PERMIT permit ::/0
------------------------------
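
For illustration, the server end of such a link could be sketched roughly
as follows. This assumes the servers also run FRR, which the thread does
not state. AS 65300 is the server AS visible in the summary output above;
the interface names p8p1.3 and p8p2.3, the existence of a second uplink on
p8p2, and the loopback addresses are hypothetical.

------------------------------
! Assumption: tagged subinterfaces towards two different leaf switches
interface p8p1.3
 ipv6 nd ra-interval 3
 no ipv6 nd suppress-ra
!
interface p8p2.3
 ipv6 nd ra-interval 3
 no ipv6 nd suppress-ra
!
router bgp 65300
 ! Hypothetical router-id, reusing the (hypothetical) IPv4 loopback
 bgp router-id 10.1.2.1
 bgp bestpath as-path multipath-relax
 neighbor uplinks peer-group
 neighbor uplinks remote-as external
 neighbor uplinks capability extended-nexthop
 neighbor p8p1.3 interface peer-group uplinks
 neighbor p8p2.3 interface peer-group uplinks
 !
 address-family ipv4 unicast
  ! Announce only the server's own loopback address
  network 10.1.2.1/32
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor uplinks activate
  network 2001:db8:1234:102::1/128
 exit-address-family
------------------------------

In this sketch, "multipath-relax" matters on the server because each leaf
has its own AS: the two default routes arrive with AS paths of equal
length but different content, and would not otherwise be considered for
multipath.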

Leaf <-> spine:

------------------------------
interface swp16
 description sw-o port 22
 ipv6 nd ra-interval 3
 no ipv6 nd suppress-ra
!
[...]
router bgp 65111
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric capability extended-nexthop
 neighbor swp16 interface peer-group fabric
 !
 address-family ipv4 unicast
  neighbor fabric soft-reconfiguration inbound
 !
 address-family ipv6 unicast
  neighbor fabric activate
  neighbor fabric soft-reconfiguration inbound
------------------------------

Note the "remote-as external" - this will accept any AS other than the
router's own AS. AS numbering in this DC setup is a bit weird if you're
used to BGP... each leaf switch has its own AS, all spine switches
should have the same AS number (for reasons...), and all servers have
the same AS because who cares. (We are talking about three disjoint
sets of AS numbers for leaves/spines/servers though.)
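
To make the numbering concrete, a spine-side counterpart might look
roughly like the sketch below. AS 65108 is the spine AS shown in the
summary output above; the port name swp22 (guessed from the "sw-o port 22"
description) and the router-id are assumptions, not taken from a real
configuration.

------------------------------
! Sketch of a spine such as sw-o; swp22 and the router-id are hypothetical
interface swp22
 ipv6 nd ra-interval 3
 no ipv6 nd suppress-ra
!
router bgp 65108
 bgp router-id 10.1.1.1
 neighbor fabric peer-group
 neighbor fabric remote-as external
 neighbor fabric capability extended-nexthop
 neighbor swp22 interface peer-group fabric
 !
 address-family ipv6 unicast
  neighbor fabric activate
 exit-address-family
------------------------------

Because every leaf has a different AS, "remote-as external" lets one
peer-group cover all leaf-facing ports without listing AS numbers. One
commonly cited reason for giving all spines the same AS is eBGP loop
detection: a route that has already passed through one spine carries
65108 in its AS path, so any other spine rejects it rather than using a
leaf as transit. Whether that is the reason meant above is not stated.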
--
Simon.

[1] https://cumulusnetworks.com/blog/bgp-unnumbered-overview/
[2] https://support.cumulusnetworks.com/hc/en-us/articles/212561648-Configuring-BGP-Unnumbered-with-Cisco-IOS
Re: BGP unnumbered examples from data center network using RFC 5549 et al. [was: Re: RFC 5549 - IPv4 Routes with IPv6 next-hop - Does it really exists?]
On 30/Jul/20 12:00, Simon Leinen wrote:

> [...]
>
> Note the "remote-as external" - this will accept any AS other than the
> router's own AS. AS numbering in this DC setup is a bit weird if you're
> used to BGP... each leaf switch has its own AS, all spine switches
> should have the same AS number (for reasons...), and all servers have
> the same AS because who cares. (We are talking about three disjoint
> sets of AS numbers for leaves/spines/servers though.)

Interesting.

Data centre bits are, interesting :-).

Thanks for sharing.

Mark.