Mailing List Archive

BGP route-reflection question
Hi All,

Let's imagine that I have two plain IPv4 BGP route reflectors in the same
cluster, say RR1 and RR2. Now imagine that I have a reflector client, which
is:

a) Connected to both of them via dedicated point to point links;
b) Advertising the same routes to both of them.

Now imagine that one of the point-to-point links (say client to RR1) has
failed. The client would still advertise its routes to RR2, and RR2 would
readvertise them to RR1.

The experience I have had with another vendor's equipment is that RR1 would
not accept the client route advertisements from RR2, because it would reject
them on seeing its own cluster ID in them. The only way to make this
work was to remove the cluster ID (the cluster list is used instead to
prevent routing loops).

In JunOS, setting the cluster ID on the BGP group to which the client belongs
is a prerequisite for, and the only way of, configuring a route reflector
client.
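
For reference, the kind of configuration I'm talking about looks roughly
like this on the RR (addresses made up):

protocols {
    bgp {
        group reflector-clients {
            type internal;
            /* RR loopback */
            local-address 10.0.0.1;
            /* the cluster statement is what turns the peers below into RR clients */
            cluster 10.0.0.1;
            neighbor 10.0.0.3;
        }
    }
}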

http://www.juniper.net/techpubs/software/junos/junos57/swconfig57-routing/html/bgp-config37.html#1015846

states:

"By default, the BGP route reflector performs intracluster reflection
because it assumes that all the client peers are not fully meshed."

Could anybody please clarify how this is implemented in JunOS and what this
"intracluster reflection" is?

SY,
--
D.K.
BGP route-reflection question [ In reply to ]

If there is a connection between RR1 and RR2 then the client will maintain
iBGP connections to both of the RRs. Remember, iBGP is inherently capable
of multihop.
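
For instance, on the client, something like this (purely a sketch, with
hypothetical loopback addresses) keeps both sessions independent of any one
physical link, as long as the IGP can still reach the RRs' loopbacks:

protocols {
    bgp {
        group to-reflectors {
            type internal;
            /* peer loopback-to-loopback so a session can reroute around a failed link */
            local-address 10.0.0.3;
            /* RR1 and RR2 loopbacks */
            neighbor 10.0.0.1;
            neighbor 10.0.0.2;
        }
    }
}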

The other possible solution for getting RR1 to accept routes from RR2 is to
give them both different cluster IDs. There is no requirement for all RRs
associated with a single "cluster" to have a common cluster ID. Therefore,
each RR would see the other's cluster ID in the cluster list and accept the
prefix. However, it would still be receiving a better path direct from the
client (shorter cluster list) :-)
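
As a rough sketch (made-up values), that just means giving the same client
group a different cluster value on each RR, e.g.:

RR1:  set protocols bgp group reflector-clients type internal
      set protocols bgp group reflector-clients cluster 10.0.0.1
      set protocols bgp group reflector-clients neighbor 10.0.0.3

RR2:  set protocols bgp group reflector-clients type internal
      set protocols bgp group reflector-clients cluster 10.0.0.2
      set protocols bgp group reflector-clients neighbor 10.0.0.3

Because the values differ, each RR sees only the other's cluster ID in the
CLUSTER_LIST of the reflected route, never its own, so it accepts it.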

To answer your question: intracluster reflection is client-to-client
reflection (i.e. routes learned by the RR from client A will be reflected to
client B), so there is no requirement for a full mesh within the cluster. If
you want to turn this off, you can do so with "no-client-reflect" in the
particular group.
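
In configuration terms it's just one extra knob in the client group,
something like this (sketch only, made-up addresses):

set protocols bgp group reflector-clients type internal
set protocols bgp group reflector-clients cluster 10.0.0.1
set protocols bgp group reflector-clients no-client-reflect
set protocols bgp group reflector-clients neighbor 10.0.0.3
set protocols bgp group reflector-clients neighbor 10.0.0.4

With no-client-reflect set, the RR assumes the clients are meshed among
themselves, so routes learned from one client are only passed on to
non-client peers.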

Regards,

Guy

BGP route-reflection question [ In reply to ]
i second guy's assessment: having RRs with different cluster-IDs is the way
we recommend setting up route reflection;

/hannes

BGP route-reflection question [ In reply to ]
Hannes, Guy:

Thanks for the info. It's nothing critical, just something to remember, I
guess. :)

Guy: in the case where the client uses its point-to-point link address to
communicate with RR2, that session will go down with the link. Sorry for not
specifying this in my "case". :) Yes, I know that it's good practice to
talk Lo0 to Lo0. :)

P.S. I noticed yesterday that the other vendor now also says that
having more than one RR in the same cluster is "not recommended". *Sigh*,
the world has changed, hasn't it? ;)

SY,
--
D.K.

BGP route-reflection question [ In reply to ]
On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek@hades.uz> wrote:

> P.S. I've noticed yesterday that the other vendor is now also says that
> having more than one RR in the same cluster is "not recommended". *Sigh*,
> the world has changed, hasn't it? ;)

Folks should be careful here. I'm not sure that this is truly a
"recommended" design, per se, as it can affect lots of things significantly:
for example, less optimal BGP update packing and, subsequently, slower
convergence and much higher CPU resource utilization. In addition, it
increases Adj-RIB-In sizes [on many boxes] and can have a significant impact
on steady-state memory utilization. Imagine multiple levels of reflection
or more than two reflectors for a given cluster. The impact of
propagating and maintaining redundant paths with slightly different
attribute pairings, especially in complex topologies, should be heavily
weighed.

What I'd _probably_ recommend is a common cluster_id for all RRs within a
cluster, a full mesh of iBGP sessions between clients, and loopback iBGP
peering everywhere, such that if the client<->RR1 link fails there's an
alternative path for the BGP session via RR2 (after all, the connectivity is
there anyway) and nothing's disrupted. There are lots of other variables to
be considered as well, but IMO, simply using different cluster_ids isn't a
clean solution.
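
The common-cluster_id, loopback-peering part of that would look roughly the
same on both RRs, e.g. (sketch only, made-up values; the client-side mesh is
configured separately):

set protocols bgp group reflector-clients type internal
set protocols bgp group reflector-clients local-address 10.0.0.1
set protocols bgp group reflector-clients cluster 10.0.0.10
set protocols bgp group reflector-clients neighbor 10.0.0.3

RR2 would be identical apart from its own local-address, and the client
peers with 10.0.0.1 and 10.0.0.2 (the loopbacks) rather than with link
addresses.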

-danny
BGP route-reflection question [ In reply to ]
On Wed, May 28, 2003 at 06:04:33PM -0600, Danny McPherson wrote:

danny,

most of the BGP scaling properties are bound to memory size, you are right;

however, what i fail to see is why path diversity is negatively impacting convergence;
what i have seen so far is the contrary: healthy path diversity speeds
up convergence;

of course many paths do cost memory; so the main challenge
is to convince "the other vendor" to ship proper memory with their boxes and
not to tweak the design of the routing mesh to the limitations of a single
implementation;

typically the design lives much longer than a single box's lifespan ;-)

/hannes
BGP route-reflection question [ In reply to ]

Hi Dmitri,

I have to say that I don't necessarily *recommend* using different cluster
IDs in the same cluster. I merely said that it is a means of achieving what
you wanted. I knew that Hannes specifically, and possibly Juniper generally,
recommends doing this, but I am with Danny on this and personally recommend
using the same cluster ID and doing all iBGP from lo0 to lo0. IMHO, using
different cluster IDs wins you little in a well-structured network and can
cost you a lot (as described by Danny).

No offence intended Hannes :-)

Regards,

Guy

BGP route-reflection question [ In reply to ]
On 5/29/03 2:18 AM, "Hannes Gredler" <hannes@juniper.net> wrote:

> most of the BGP scaling properties are bound to memory size, you are right:
>
> however, what i fail to see is why path diversity is negatively impacting
> convergence;
> what i have seen to far is contrary: a healthy path diversity speeds
> up convergence;

I think you're confusing the issue: what new diversity do you get that
simple loopback peering wouldn't provide? The transmission substrate
doesn't change; you're simply adding lots of overhead unnecessarily,
subsequently affecting convergence, memory consumption and CPU utilization
in _any_ router.

> of course many paths do cost memory; so the main challenge
> is to convince "the other vendor" to ship proper memory with their boxes and
> not to tweak the design of the routing mesh to the limitations of a single
> implementation;

It's not just about memory, or any particular vendor. Although memory is
one factor (in which case, by following your recommendation, you'll (J) use
more as well, no?), it's also about the protocol capabilities. If you send
2x or 3x the number of updates because they can't be packed as efficiently,
that affects the entire routing system -- not just a single box -- although
every individual box is affected as well. If receivers now have to pack and
process twice as many updates, or aggregate Adj-RIBs-In are much larger,
that eventually affects the characteristics of the entire routing system.

> typically the design lives much longer than a single boxes' lifespan ;-)

Indeed, and that's why you, as well as any other vendor, should be concerned
with the effects that recommendations you make have on the larger routing
system. This isn't about Cisco, Juniper or insert_vendor_here; don't make
it that. It's about clean network architecture, something that will affect
not only the local routing system, but distant networks as well.


-danny
BGP route-reflection question [ In reply to ]
On Thu, May 29, 2003 at 09:14:18AM -0600, Danny McPherson wrote:
| On 5/29/03 2:18 AM, "Hannes Gredler" <hannes@juniper.net> wrote:
|
| > most of the BGP scaling properties are bound to memory size, you are right:
| >
| > however, what i fail to see is why path diversity is negatively impacting
| > convergence;
| > what i have seen to far is contrary: a healthy path diversity speeds
| > up convergence;
|
| I think you're confusing the issue, what new diversity do you get that
| simple loopback peering wouldn't provide? The transmission substrate
| doesn't change, you're simply adding lots of overhead unnecessarily,
| subsequently effecting convergence, memory consumption and CPU utilization,
| in _any router.

no doubt simple loopback peering inside the cluster does do the trick;
however, from an administration point of view, SPs in my theatre here
try to avoid the intracluster full mesh and prefer to
go with diverse cluster IDs at the same RR level;

| > of course many paths do cost memory; so the main challenge
| > is to convince "the other vendor" to ship proper memory with their boxes and
| > not to tweak the design of the routing mesh to the limitations of a single
| > implementation;
|
| It's not just about memory, or any particular vendor. Although memory is
| one factor (in which case by following your recommendation you'll(J) use
| more as well, no?), it's also about the protocol capabilities. If you send
| 2x or 3x the amount of updates because they can't be packed as efficiently
| that effects the entire routing system -- not just a single box -- although
| every individual box is effected as well. If receivers now have to pack and
| process twice as many updates, or aggregate Adj-RIBs-In are much larger,
| that eventually effects the characteristics of the entire routing system.

indeed it is; you are assuming that it is double the amount of processing load;
however, that entirely depends on the implementation, i.e. how the system internally
maintains its path and prefix structures;

| > typically the design lives much longer than a single boxes' lifespan ;-)
|
| Indeed, and that's why you, as well as any other vendor, should be concerned
| with the effects that recommendations you make have on the larger routing
| system. This isn't about Cisco, Juniper or insert_vendor_here, don't make
| it. It's about clean network architecture, something that will effect not
| only the local routing system, but distant networks as well.

don't get me wrong;
i did not want to ride the "let's burn memory, coz we have lots of it" wave;
it's simply that too often i have seen network architectures being built around the
limitations of a 20K$ box; the administration cost of maintaining the intracluster
full mesh IMHO outweighs the extra 10% of memory that diverse cluster IDs
cost; if a box cannot stand that extra few MBs then it does not belong
in the core;

/hannes
BGP route-reflection question [ In reply to ]

Hi Hannes,

> -----Original Message-----
>
> On Thu, May 29, 2003 at 09:14:18AM -0600, Danny McPherson wrote:
> | On 5/29/03 2:18 AM, "Hannes Gredler" <hannes@juniper.net> wrote:
> |
> | > most of the BGP scaling properties are bound to memory
> size, you are right:
> | >
> | > however, what i fail to see is why path diversity is
> negatively impacting
> | > convergence;
> | > what i have seen to far is contrary: a healthy path
> diversity speeds
> | > up convergence;
> |
> | I think you're confusing the issue, what new diversity do
> you get that
> | simple loopback peering wouldn't provide? The
> transmission substrate
> | doesn't change, you're simply adding lots of overhead unnecessarily,
> | subsequently effecting convergence, memory consumption and
> CPU utilization,
> | in _any router.
>
> no doubt simple loopback peering inside the lcuster does do the trick;
> however from an administration point of view, SPs in my theatre here,
> try to avoid the intracluster full mesh and better
> go with diverse cluster IDs in the same RR level;

I may be missing something, but what do a full mesh within the cluster and
unique cluster IDs gain you over client-to-client reflection, with each
client peering with both RRs' loopbacks? I cannot even
contrive a network design where this would improve convergence, stability or
functionality.

> | > of course many paths do cost memory; so the main challenge
> | > is to convince "the other vendor" to ship proper memory
> with their boxes and
> | > not to tweak the design of the routing mesh to the
> limitations of a single
> | > implementation;
> |
> | It's not just about memory, or any particular vendor.
> Although memory is
> | one factor (in which case by following your recommendation
> you'll(J) use
> | more as well, no?), it's also about the protocol
> capabilities. If you send
> | 2x or 3x the amount of updates because they can't be packed
> as efficiently
> | that effects the entire routing system -- not just a single
> box -- although
> | every individual box is effected as well. If receivers now
> have to pack and
> | process twice as many updates, or aggregate Adj-RIBs-In are
> much larger,
> | that eventually effects the characteristics of the entire
> routing system.
>
> indeed it is; you are assumming that it is double the amount
> of processing load,
> however that entirely depends on the implementation; i.e. how
> the system internally
> maintains its path and prefix structures;

I don't think Danny said that there would be twice the processing load. He
said there would be twice as many updates. As you say, how linear the
processing load is relative to the number of updates received is entirely
implementation dependent.

> | > typically the design lives much longer than a single
> boxes' lifespan ;-)
> |
> | Indeed, and that's why you, as well as any other vendor,
> should be concerned
> | with the effects that recommendations you make have on the
> larger routing
> | system. This isn't about Cisco, Juniper or
> insert_vendor_here, don't make
> | it. It's about clean network architecture, something that
> will effect not
> | only the local routing system, but distant networks as well.
>
> don't get me wrong;
> i did not want to ride the "lets burn memeory, coz we have
> lots of them" wave
> its simply that too often i have seen network architectures
> being built around the
> limitations of a 20K$ box; the extra 10% of memory that
> diverse cluster IDs do
> cost IMHO outweight the administration cost of maintaining
> the intracluster
> full-mesh; if a box does not stand that extra few MBs then it
> should not belong
> in the core;

An intracluster full-mesh is nasty. I'd avoid that if at all possible.
However, I still don't understand why you can't get the necessary resilience
and performance with a client-to-client reflection system where the RRs use
the same cluster ID. Of course, peerings must be between lo0.

Regards,

Guy

BGP route-reflection question [ In reply to ]
Hmm, this has turned out to be a somewhat hotter-than-anticipated
discussion, so I went to the source, as any good Luke would. RFC 2796
says:

"In a simple configuration the backbone could be divided into many
clusters. Each RR would be configured with other RRs as Non-Client peers
(thus all the RRs will be fully meshed.). The Clients will be configured
to maintain IBGP session only with the RR in their cluster. Due to route
reflection, all the IBGP speakers will receive reflected routing
information."

So, having a client talking to two RRs in different clusters contradicts
this RFC. We're back to square one.

What I want to say is that in an ideal world I would have appreciated the
ability NOT to set the cluster ID, reverting back to the originator-id loop
detection mechanism. I think that the network designer should be given the
right to choose his own poison, and I feel that the way Juniper's config
imposes the use of cluster-ids when configuring an RR client is a weeny bit
pushy. ;^P

Just my 2c.
--
D.K.

BGP route-reflection question [ In reply to ]
Dmitri,

Two points that must be made clearer, in my view.

1) Cluster IDs are required to prevent looping in hierarchical RR designs,
regardless of whether or not the clients are originator_id-aware. Since the
RRs will not match the originator_ids, they will not be able to tell that
they have already reflected a route. This would be analogous to saying "I
wish Juniper didn't require my ASN to be prepended to the AS PATH." There
are reasons - good ones - based on well-known DV mechanisms that require
enforcement of these rules. Sure, if you hate Split Horizon you can disable
it in RIP, but you had better have a love for loops! Since IBGP has no path
state, is recursive to the endpoints, and is largely unaware of the
loopiness (or lack thereof) of the underlying transport, certain limitations
are imposed to ensure NLRI loop-freedom. One is full mesh, or more
accurately, update non-transitivity. Break this rule, and there is nothing
in IBGP that will tell a generic BGP speaker that propagated an external
route that it has just learned that same route from an internal peer. RRs
"bend" this rule, but impose new ones. Break them at your own risk!

2) Danny made a subtle, but very important point - one that those of us who
were overambitious with our intracluster RR redundancy goals were sad to
learn for ourselves. The more reflection performed to the same client, the
more routes it must store/process/index/sort/match against policy/etc. If
you have three RRs, then the major current implementations will store all
three of the same thing (with there being enough difference that it forces
BGP to package them differently, but doesn't affect the decision process all
that much). This could mean 330,000 routes for a singly-connected, full BGP
feed. With two full feeds, this number can double - and so on. At what
point does your router run out of gas? Do you want your router to have to
store, package, sort, index, scan, etc. 220,000 routes or 660,000? Which do
you think is easier? How much redundancy do you need?

I say these things because I have lived them. Direct iBGP sessions have
little utility compared to Lo0-Lo0 peerings. If you have the latter, then 2
RRs with the same cluster ID should be all you need for a router with degree
2 (two uplinks). Anything more provides more pain than pleasure...

Just my .02

-chris

PS

The guys who "invented" RR were pretty thorough in exploring most of these
issues. In fact, with the exception of persistent oscillation (more a MED
prob than RR/confed), there are no known issues (outside of abstract,
loosely applied theory and misconfig/buggy code or load/processing
pathologies) that are known to cause loops or divergence of an iBGP network.
And its been a few years since the first RR draft was posted! ;)



BGP route-reflection question [ In reply to ]
Hi Martin,

I guess now I should go back to the issue I had that showed me the good
side of being able to disable the use of cluster-ids. Consider the
following configuration:

RR1---RR2
  \   /
   \C1/
    +----Important LAN---

Links RR1 - RR2 is POS, both links to C1 is GigE, and C1 is a L3 switch.
Both RR1 and RR2 provide independent links to the rest of the network.

Now to the configs (sorry, but all configs in the example will be cisco):

RR1:
---
int lo0
ip add 10.0.0.1 255.255.255.255
!
int pos1/0
ip add 1.1.1.1 255.255.255.252
!
int Gig2/0
ip add 2.2.2.1 255.255.255.0 <= NOTE THE /24 mask
no ip proxy-arp
!
router ospf 1
network 1.1.1.0 0.0.0.255 area 0
!
router bgp 11111
bgp cluster-id 10
neigh 10.0.0.2 remote-as 11111
neigh 10.0.0.2 update-source Lo0

RR2:
---
int lo0
ip add 10.0.0.2 255.255.255.255
!
int pos1/0
ip add 1.1.1.2 255.255.255.252
!
int Gig2/0
ip add 2.2.2.2 255.255.255.0 <= NOTE THE /24 mask
no ip proxy-arp
!
router ospf 1
network 1.1.1.0 0.0.0.255 area 0
!
router bgp 11111
bgp cluster-id 10
neigh 10.0.0.1 remote-as 11111
neigh 10.0.0.1 update-source Lo0


C1:
---
int lo0
ip add 10.0.0.3 255.255.255.255
!
int VLAN10
ip add 2.2.2.3 255.255.255.0 <= NOTE THE /24 mask
!
int VLANx
desc Important VLAN#1
ip add <whatever>

Requirement:

C1 connects a few networks in quite an important LAN to the internet via
RR1/RR2. iBGP is a requirement between C1 and RR1/RR2 (there is another
exit from this LAN, so static routing in from RR1/RR2 is not an option), and
there is NO dynamic routing between RR1/RR2 and C1, nor is it a good idea to
configure any (so presume static routing ONLY, plus iBGP). C1 also talks
to some other L3 switches in the Important_LAN via OSPF.

Yes, I know that this situation is a nasty stockpile of recipes for
disaster, but that's what I had to deal with.

Um, one more complication: you can't change cluster-ids or delete them.
I'll be very interested to see the elegant solution to this (although now
obsolete, thank God!) situation.

SY,
--
D.K.

BGP route-reflection question [ In reply to ]
Wouldn't the solution be to use Lo0 to Lo0 peering, as you said?
Then you don't have to worry about the cluster-id problem in the
first place...

-c

BGP route-reflection question [ In reply to ]

Hi Dmitri,

See inline...

> -----Original Message-----
>
> Hi Martin,
>
> I guess now I should go back to the issue that I had that
> prompted me with
> the good side of being able to disable to use of cluster-ids.
> Consider the
> following configuration:
>
> RR1---RR2
> \ /
> \C1/
> +----Important LAN---
>
> Links RR1 - RR2 is POS, both links to C1 is GigE, and C1 is a
> L3 switch.
> Both RR1 and RR2 provide independent links to the rest of the network.

No problem so far...

> Now to the configs (sorry, but all configs in the example
> will be cisco):
>
> RR1:
> ---
> int lo0
> ip add 10.0.0.1 255.255.255.255
> !
> int pos1/0
> ip add 1.1.1.1 255.255.255.252
> !
> int Gig2/0
> ip add 2.2.2.1 255.255.255.0 <= NOTE THE /24 mask
> no ip proxy-arp
> !
> router ospf 1
> network 1.1.1.0 0.0.0.255 area 0
> !
> router bgp 11111
> bgp cluster-id 10
> neighbor 10.0.0.2 remote-as 11111
> neighbor 10.0.0.2 update-source Loopback0
>
> RR2:
> ---
> int lo0
> ip add 10.0.0.2 255.255.255.255
> !
> int pos1/0
> ip add 1.1.1.2 255.255.255.252
> !
> int Gig2/0
> ip add 2.2.2.2 255.255.255.0 <= NOTE THE /24 mask
> no ip proxy-arp
> !
> router ospf 1
> network 1.1.1.0 0.0.0.255 area 0
> !
> router bgp 11111
> bgp cluster-id 10
> neighbor 10.0.0.1 remote-as 11111
> neighbor 10.0.0.1 update-source Loopback0
>
>
> C1:
> ---
> int lo0
> ip add 10.0.0.3 255.255.255.255
> !
> int VLAN10
> ip add 2.2.2.3 255.255.255.0 <= NOTE THE /24 mask
> !
> int VLANx
> desc Important VLAN#1
> ip add <whatever>
>
> Requirement:
>
> C1 connects a few networks in quite an important LAN to the internet via
> RR1/RR2. iBGP is a requirement between C1 and RR1/RR2 (there is another
> exit from this LAN, so static routing in from RR1/RR2 is not an option),
> and there is NO dynamic routing between RR1/RR2 and C1, nor is it a good
> idea to configure it (so presume - static routing ONLY plus iBGP).

I'm slightly confused here. You say that iBGP is a requirement between C1
and RR1/RR2 but there is no dynamic routing between C1 and RR1/RR2. You've
effectively created an RR cluster with no clients! If you want no dynamic
routing between C1 and RR1/RR2, then your only real option is to have static
routes on RR1 and RR2 for "important LAN" pointing to C1. Those static
routes then need to be redistributed (through a route-map ;-) into iBGP.
You'll end up with two routes in BGP for "important LAN" with the next-hops
of RR1 and RR2's loopbacks.
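
On RR1 (RR2 would be analogous) that boils down to something like the
following - a rough sketch only, where 192.168.10.0/24 and the prefix-list
and route-map names are invented to stand in for the real "important LAN":

! RR1 - sketch; the prefix, prefix-list and route-map names are made up
ip route 192.168.10.0 255.255.255.0 2.2.2.3
!
ip prefix-list IMPORTANT-LAN seq 5 permit 192.168.10.0/24
!
route-map STATIC-TO-BGP permit 10
 match ip address prefix-list IMPORTANT-LAN
!
router bgp 11111
 redistribute static route-map STATIC-TO-BGP
 ! next-hop-self keeps the advertised next-hop on RR1's loopback,
 ! since 2.2.2.0/24 isn't carried in any IGP
 neighbor 10.0.0.2 next-hop-self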

> The C1 also talks
> to some other L3 switches in the Important_LAN via OSPF.
>
> Yes, I know that this situation is a nasty stockpile of recipes for
> disaster, but that's what I had to deal with.
>
> Um, one more complication: you can't change cluster-id's or
> delete them.

Since you're peering between lo0, you shouldn't need to change anything.
The lack of iBGP between C1 and RR1/RR2 is confusing me since this totally
removes the relevance of the RR.

Regards,

Guy

> I'll be very interested to see the elegant solution to this (although
> now obsolete, thank God!) situation.
>
> SY,
> --
> D.K.
>
> On Thu, May 29, 2003 at 08:58:48PM -0400, Martin, Christian wrote:
> > Dmitri,
> >
> > Two points that must be made clearer, in my view.
> >
> > 1) Cluster id's are required to prevent looping in hierarchical RR
> > designs, regardless of whether or not the clients are
> > originator_id-aware. Since the RRs will not match the originator_id's,
> > they will not be able to tell that they have already reflected a route.
> > This would be analogous to saying "I wish Juniper didn't require my ASN
> > to be prepended to the AS PATH." There are reasons - good ones - based
> > on well-known DV mechanisms that require enforcement of these rules.
> > Sure, if you hate Split Horizon you can disable it in RIP, but you'd
> > better have a love for loops! Since IBGP has no path state, is recursive
> > to the endpoints, and is largely unaware of the loopiness (or lack
> > thereof) of the underlying transport, certain limitations are imposed to
> > ensure NLRI loop-freedom. One is full mesh, or more accurately, update
> > non-transitivity. Break this rule, and there is nothing in IBGP that
> > will tell a generic BGP speaker that propagated an external route that
> > it has just learned that same route back from an internal peer. RRs
> > "bend" this rule, but impose new ones. Break them at your own risk!
> >
> > 2) Danny made a subtle, but very important point - one that those of us
> > who were overambitious with our intracluster RR redundancy goals were
> > sad to learn for ourselves. The more reflection performed to the same
> > client, the more routes it must store/process/index/sort/match against
> > policy/etc. If you have three RRs, then the major current
> > implementations will store all three of the same thing (with there being
> > enough difference that it forces BGP to package them differently, but
> > doesn't affect the decision process all that much). This could mean
> > 330,000 routes for a singly-connected, full BGP feed. With two full
> > feeds, this number can double - and so on. At what point does your
> > router run out of gas? Do you want your router to have to store,
> > package, sort, index, scan, etc. 220,000 routes or 660,000? Which do you
> > think is easier? How much redundancy do you need?
> >
> > I say these things because I have lived them. Direct iBGP sessions have
> > little utility compared to Lo0-Lo0 peerings. If you have the latter,
> > then 2 RRs with the same cluster id should be all you need for a router
> > with degree 2 (two uplinks). Anything more provides more pain than
> > pleasure...
> >
> > Just my .02
> >
> > -chris
> >
> > PS
> >
> > The guys who "invented" RR were pretty thorough in exploring most of
> > these issues. In fact, with the exception of persistent oscillation
> > (more a MED prob than RR/confed), there are no issues (outside of
> > abstract, loosely applied theory and misconfig/buggy code or
> > load/processing pathologies) that are known to cause loops or divergence
> > of an iBGP network. And it's been a few years since the first RR draft
> > was posted! ;)
> >
> >
> >
> > > -----Original Message-----
> > > From: Dmitri Kalintsev [mailto:dek@hades.uz]
> > > Sent: Thursday, May 29, 2003 8:12 PM
> > > To: juniper-nsp@puck.nether.net
> > > Subject: Re: [j-nsp] BGP route-reflection question
> > >
> > >
> > > Hmm, this has turned out to be a somewhat
> > > hotter-than-anticipated discussion, so I went to the source,
> > > as any good Luke would. The RFC2796
> > > says:
> > >
> > > "In a simple configuration the backbone could be divided into many
> > > clusters. Each RR would be configured with other RRs as Non-Client
> > > peers (thus all the RRs will be fully meshed.). The Clients will be
> > > configured to maintain IBGP session only with the RR in their cluster.
> > > Due to route reflection, all the IBGP speakers will receive reflected
> > > routing information."
> > >
> > > So, having a client talking to two RRs in different clusters
> > > contradicts this RFC. We're back to square one.
> > >
> > > What I want to say is that in an ideal world I would have
> > > appreciated the ability NOT to set the cluster ID, reverting
> > > back to the originator-id loop detection mechanism. I think
> > > that the network designer should be given the right to choose
> > > his own poison, and I feel that the way Juniper's config
> > > imposes the use of cluster-ids when configuring an RR client
> > > is a weeny bit pushy. ;^P
> > >
> > > Just my 2c.
> > > --
> > > D.K.
> > >
> > > On Thu, May 29, 2003 at 09:25:48AM +0100, Guy Davies wrote:
> > > >
> > > > -----BEGIN PGP SIGNED MESSAGE-----
> > > > Hash: SHA1
> > > >
> > > > Hi Dmitri,
> > > >
> > > > I have to say that I don't necessarily *recommend* using different
> > > > cluster IDs in the same cluster. I merely said that it is a means to
> > > > achieving what you wanted. I knew that Hannes specifically and
> > > > possibly Juniper generally recommends doing this but I am with Danny
> > > > on this and personally recommend using the same cluster ID and doing
> > > > all iBGP from lo0 to lo0. IMHO, using different cluster IDs wins you
> > > > little in a well structured network and can cost you a lot (as
> > > > described by Danny).
> > > >
> > > > No offence intended Hannes :-)
> > > >
> > > > Regards,
> > > >
> > > > Guy
> > > >
> > > > > -----Original Message-----
> > > > > From: Danny McPherson [mailto:danny@tcb.net]
> > > > > Sent: Thursday, May 29, 2003 1:05 AM
> > > > > To: juniper-nsp@puck.nether.net
> > > > > Subject: Re: [j-nsp] BGP route-reflection question
> > > > >
> > > > >
> > > > > On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek@hades.uz> wrote:
> > > > >
> > > > > > P.S. I've noticed yesterday that the other vendor now also says
> > > > > > that having more than one RR in the same cluster is "not
> > > > > > recommended". *Sigh*, the world has changed, hasn't it? ;)
> > > > >
> > > > > Folks should be careful here, I'm not sure that this is truly a
> > > > > "recommended" design, per se, as it can affect lots of things
> > > > > significantly. For example, less optimal BGP update packing and
> > > > > subsequently, slower convergence & much higher CPU resource
> > > > > utilization, etc... In addition, it increases Adj-RIB-In sizes [on
> > > > > many boxes] and can have a significant impact on steady state
> > > > > memory utilization. Imagine multiple levels of reflection or more
> > > > > than two reflectors for a given cluster, etc.. The impact of
> > > > > propagating and maintaining redundant paths with slightly
> > > > > different attribute pairings, especially in complex topologies,
> > > > > should be heavily weighed.
> > > > >
> > > > > What I'd _probably_ recommend is a common cluster_id for all RRs
> > > > > within a cluster, a full mesh of iBGP sessions between clients and
> > > > > loopback iBGP peering everywhere such that if the client<->RR1
> > > > > link fails there's an alternative path for the BGP session via RR2
> > > > > (after all, the connectivity is there anyway) and nothing's
> > > > > disrupted. There are lots of other variables to be considered as
> > > > > well, but IMO, simply using different cluster_ids isn't a clean
> > > > > solution.
> > > > >
> > > > > -danny
> > > ---end quoted text---
> > > _______________________________________________
> > > juniper-nsp mailing list juniper-nsp@puck.nether.net
> > > http://puck.nether.net/mailman/listinfo/juniper-nsp
> > >
> ---end quoted text---
>
> --
> D.K.
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> http://puck.nether.net/mailman/listinfo/juniper-nsp
>

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBPtcZlI3dwu/Ss2PCEQLeBQCgo9KkbcLWq07vN3ivG2GLDYlcrDMAnj+b
FExpCdMCtflLTGK2h2Mcno0T
=Xhtk
-----END PGP SIGNATURE-----
BGP route-reflection question [ In reply to ]
On Fri, 2003-05-30 at 03:45, 'Dmitri Kalintsev' wrote:
> C1 connects a few networks in quite an important LAN to the internet via
> RR1/RR2. iBGP is a requirement between C1 and RR1/RR2 (there is another
> exit from this LAN, so static routing in from RR1/RR2 is not an option), and
> there is NO dynamic routing between RR1/RR2 and C1, nor is it a good idea to
> configure it (so presume - static routing ONLY plus iBGP). C1 also talks
> to some other L3 switches in the Important_LAN via OSPF.

Well, as you correctly say, Lo0-Lo0 peering will not work in the setup
described here since C1 obviously does not know how to reach the
loopbacks of your RR's and vice versa. This whole setup of yours looks
fundamentally WEIRD though. :)

But still, if there is some really really good reason you cannot run an
IGP between these, then the solution still shouldn't be more complicated
than using appropriate static routes on all the boxes, pointing at the
various loopbacks - and then floating the statics for the backup paths
using distance/preference.
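
On RR1, for example, that could be as simple as the following (a sketch
using the addresses from the configs earlier in the thread; the admin
distance of 250 for the floating static is just an illustrative choice):

! RR1 - sketch: primary path to C1's loopback over the direct GigE,
! floating backup via RR2 across the POS link if the GigE next-hop fails
ip route 10.0.0.3 255.255.255.255 2.2.2.3
ip route 10.0.0.3 255.255.255.255 1.1.1.2 250

C1 would carry the mirror-image pair of statics towards each RR loopback,
and RR2 the equivalent pair towards 10.0.0.3.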

/leg
BGP route-reflection question [ In reply to ]
> On 5/28/03 5:23 PM, "'Dmitri Kalintsev'" <dek@hades.uz> wrote:
>> P.S. I've noticed yesterday that the other vendor now also says that
>> having more than one RR in the same cluster is "not recommended". *Sigh*,
>> the world has changed, hasn't it? ;)

* danny@tcb.net (Danny McPherson) [Thu 29 May 2003, 02:05 CEST]:
> Folks should be careful here, I'm not sure that this is truly a
> "recommended" design, per se, as it can affect lots of things significantly.
> For example, less optimal BGP update packing and subsequently, slower
> convergence & much higher CPU resource utilization, etc... In addition, it
> increases Adj-RIB-In sizes [on many boxes] and can have a significant impact
> on steady state memory utilization. Imagine multiple levels of reflection
> or more than two reflectors for a given cluster, etc.. The impact of
> propagating and maintaining redundant paths with slightly different
> attribute pairings, especially in complex topologies, should be heavily
> weighed.
>
> What I'd _probably_ recommend is a common cluster_id for all RRs within a
> cluster, a full mesh of iBGP sessions between clients and loopback iBGP
> peering everywhere such that if the client<->RR1 link fails there's an
> alternative path for the BGP session via RR2 (after all, the connectivity is
> there anyway) and nothing's disrupted. There are lots of other variables to
> be considered as well, but IMO, simply using different cluster_ids isn't a
> clean solution.

I concur with this. My experience has taught me similar things.

Divide your network into a few hubs, nominate two routers as route
reflectors in each, give each pair identical cluster ID's, configure
full iBGP mesh between all route reflectors, have every router in each
cluster peer with the two route reflectors only. Choose the route
reflectors as it makes engineering sense (e.g. because all routers in a
certain location have different paths to each of them).
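
In Cisco-style terms each reflector in a hub would then look roughly like
this (a sketch only - the AS number and addresses are invented, and the
partner RR plus the reflectors in the other hubs are configured as
ordinary, non-client iBGP peers):

! One of the two RRs in a hub; its partner uses the same cluster-id
router bgp 65000
 bgp cluster-id 1
 ! the partner RR in this hub and the RRs in the other hubs (full mesh)
 neighbor 10.1.0.2 remote-as 65000
 neighbor 10.1.0.2 update-source Loopback0
 ! one of the client routers in this hub
 neighbor 10.1.0.11 remote-as 65000
 neighbor 10.1.0.11 update-source Loopback0
 neighbor 10.1.0.11 route-reflector-client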


-- Niels.

--