Mailing List Archive

RR scalability question
Hello,

I was wondering what techniques can be used to estimate the upper limit on
the number of route reflector clients. Currently we run CSR1000vs, which now
have over 50 clients. BGP scale is as below:

2841509 network entries using 727426304 bytes of memory
2841513 path entries using 340981560 bytes of memory

They are rock solid so far, but this is in a steady state. I'm thinking
about a scenario where multiple BGP sessions fail at once, meaning the
RRs would have much more work to do.

One thing that probably affects RR performance is the number of update
groups (2 at the moment, soon to be 3), but that still doesn't help me come
up with a hard limit on the number of clients.


Thank you!

Marcin


_______________________________________________
cisco-nsp mailing list cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
Re: RR scalability question
> Marcin Kurek
> Sent: Wednesday, November 25, 2020 3:55 PM
>
> I was wondering what techniques can be used to estimate the upper limit on
> the number of route reflector clients. Currently we run CSR1000vs, which now
> have over 50 clients.
> [...]
>
Best thing you can do is not estimate, but actually test.
Source a server identical to the one your production RRs are running on for lab purposes, run the CSR1000v on it, and perform scale testing to the breaking point with some open-source route generator.
That will give you an idea of how much headroom you have relative to the numbers you quoted above (e.g. if it breaks at 20M prefixes, you might set yourself a conservative limit of 10M).
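
As a very rough illustration of the route-generator side (taking ExaBGP as one such open-source generator; the supernet, prefix length and next-hop below are made-up lab values, and ExaBGP itself would be configured separately to peer with the RR and to run this script as one of its API processes):

#!/usr/bin/env python3
# Rough route-pump sketch for ExaBGP's text API: every line an ExaBGP
# "process" prints to stdout is treated as an API command, e.g.
# "announce route <prefix> next-hop <ip>". All values are lab placeholders.
import sys
import time
from ipaddress import IPv4Network

NEXT_HOP = "192.0.2.1"                # assumption: a next-hop the RR can resolve
SUPERNET = IPv4Network("10.0.0.0/8")  # carved into test prefixes
PREFIX_LEN = 22                       # /8 split into /22s -> 16,384 prefixes per supernet

count = 0
for prefix in SUPERNET.subnets(new_prefix=PREFIX_LEN):
    sys.stdout.write(f"announce route {prefix} next-hop {NEXT_HOP}\n")
    count += 1
    if count % 1000 == 0:
        sys.stdout.flush()            # feed ExaBGP in batches

sys.stdout.flush()
sys.stderr.write(f"announced {count} prefixes\n")

# Keep the process alive so ExaBGP does not withdraw the announcements.
while True:
    time.sleep(60)

Scaling the test toward a breakpoint is then just a matter of using more supernets, longer prefix lengths, or several ExaBGP speakers.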

You can also carry out performance testing with the same setup to, for example, get an idea of how long your RR would take to establish sessions to, say, 20 clients and pump 3M prefixes to each.
Tests like these will give you real-life expectations for the performance of your production RR setup.
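
One rough way to put numbers on that (a sketch only, assuming Netmiko for CLI access to the lab clients; the addresses, credentials and the ~2.8M target are placeholders) is to time how long each client takes to receive the full table from the RR after the sessions are cleared:

#!/usr/bin/env python3
# Rough convergence-timer sketch: after clearing the BGP sessions, poll each
# RR client until the prefix count received from the RR reaches the expected
# table size, and report how long that took. All values are placeholders.
import time
from netmiko import ConnectHandler

RR_NEIGHBOR = "10.255.0.1"            # the RR's address as seen by the clients (assumption)
EXPECTED_PREFIXES = 2_841_000         # roughly the table size quoted earlier in the thread
CLIENTS = ["10.0.0.11", "10.0.0.12"]  # lab client management addresses (placeholders)

def received_from_rr(conn):
    # Parse "show bgp ipv4 unicast summary" and return PfxRcd for the RR neighbor.
    output = conn.send_command("show bgp ipv4 unicast summary")
    for line in output.splitlines():
        if line.startswith(RR_NEIGHBOR):
            last = line.split()[-1]
            return int(last) if last.isdigit() else 0  # column shows a state word until Established
    return 0

for host in CLIENTS:
    conn = ConnectHandler(device_type="cisco_xe", host=host,
                          username="lab", password="lab")
    start = time.monotonic()
    while received_from_rr(conn) < EXPECTED_PREFIXES:
        time.sleep(5)
    print(f"{host}: full table received after {time.monotonic() - start:.0f}s")
    conn.disconnect()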

With a lab setup like this you can then test the effects of adding new features, address families, or update groups.
RRs are a critical piece of infrastructure and an absolute must-have in the lab, so you can test changes in a safe environment before rolling them into production.

In my experience estimating is useless; each setup is unique. Heck, I even had to do some fine-tuning of the lab setup to replicate specific scaling problems seen in production.

adam


Re: RR scalability question
> Best thing you can do is not estimate, but actually test.
> [...]
>
> adam
>

Thank you, Adam!

I agree that lab testing would be a good move. Actually, I was thinking
about spinning up a test topology in AWS; they have the CSR1000v, XR9000v and
other images in the AWS Marketplace.
However, that way I won't be comparing apples to apples because of
differences in the underlying hardware.

Is there any particular open-source route generator you had in mind? ExaBGP?

Kind regards,

Marcin


Re: RR scalability question
On 11/25/20 20:57, adamv0025@netconsultings.com wrote:

>> Best thing you can do is not estimate, but actually test.

Agree.

I recall doing this back in 2014 when we first launched the CSR1000v as
an RR.

It had about 20 clients, each needing a full feed.

We reset all sessions on the RR. Between the RRs, we are talking
seconds to exchange the full feed. Toward the other clients, the push rate
was whatever they could accommodate. Either way, the CSR1000v didn't flinch.

In real life, we've had RRs that have lost sessions to hundreds of
routers due to IS-IS issues in recent years. Again, no hit on the CSR1000v.
That thing just keeps chugging along.

I know this is more anecdotal than empirical, but my feeling is it will take
A LOT to break a well-designed CSR1000v running on a decent server.

Mark.