Mailing List Archive

Junos 20 - slow RPD
Hi,

On MX204 with ~4M routes, after upgrading from 18.2 to 20.2 the RPD is
way slower in processing BGP policies and sending the routes to neighbors.
For example, on a BGP group with one neighbor and an export policy
containing 5 terms, each matching a community, it takes ~1 min (at 100% RPD
utilisation) to send 1k routes to the neighbor in 20.2, compared to 15 s
in 18.2.
Disabling terms reduces the time.
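
For anyone trying to reproduce this, a minimal sketch of the kind of setup
described above. The policy, group, and community names, the community
values, the neighbor address, and the AS number are all hypothetical
placeholders, not the actual configuration, and the "then accept" action is
an assumption. It is shown in the main instance for brevity; for routes in a
VRF the bgp statements would sit under "routing-instances <name> protocols bgp".

```junos
# Hypothetical communities, one per policy term
set policy-options community COMM-1 members 64500:101
set policy-options community COMM-2 members 64500:102
set policy-options community COMM-3 members 64500:103
set policy-options community COMM-4 members 64500:104
set policy-options community COMM-5 members 64500:105
# Export policy with 5 terms, each matching one community (action assumed)
set policy-options policy-statement EXPORT-TO-PEER term T1 from community COMM-1
set policy-options policy-statement EXPORT-TO-PEER term T1 then accept
set policy-options policy-statement EXPORT-TO-PEER term T2 from community COMM-2
set policy-options policy-statement EXPORT-TO-PEER term T2 then accept
set policy-options policy-statement EXPORT-TO-PEER term T3 from community COMM-3
set policy-options policy-statement EXPORT-TO-PEER term T3 then accept
set policy-options policy-statement EXPORT-TO-PEER term T4 from community COMM-4
set policy-options policy-statement EXPORT-TO-PEER term T4 then accept
set policy-options policy-statement EXPORT-TO-PEER term T5 from community COMM-5
set policy-options policy-statement EXPORT-TO-PEER term T5 then accept
set policy-options policy-statement EXPORT-TO-PEER term REJECT-REST then reject
# BGP group with a single neighbor using the export policy
set protocols bgp group PEER type external
set protocols bgp group PEER peer-as 64511
set protocols bgp group PEER export EXPORT-TO-PEER
set protocols bgp group PEER neighbor 192.0.2.1
```

Deactivating individual terms (for example "deactivate policy-options
policy-statement EXPORT-TO-PEER term T5") is one way to test how the send
time changes with the number of terms.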

Anyone experienced something similar?

Thanks!

_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: Junos 20 - slow RPD
Hey,

> On MX204 with ~4M routes, after upgrading from 18.2 to 20.2 the RPD is
> way slower in processing BGP policies and sending the routes to neighbors.
> For example, on a BGP group with one neighbor and an export policy
> containing 5 terms, each matching a community, it takes ~1 min (at 100% RPD
> utilisation) to send 1k routes to the neighbor in 20.2, compared to 15 s
> in 18.2.
> Disabling terms reduces the time.
>
> Anyone experienced something similar?

I don't recognise this problem specifically. It seems a rather terrible
regression, so you should probably either open a JTAC case or do the
Junos dance. If you have a large RIB/FIB ratio, allowing more than one
core to work on BGP will produce an improvement:

set system processes routing bgp rib-sharding number-of-shards 4
set system processes routing bgp update-threading

This is a disruptive change. JNPR wanted us on 20.3 (we are on
20.3R3-S2) for rib-sharding, but we did run it previously on 20.2R3-S3
with success. We are currently targeting 21.4R1-S1.

If you have memory pressure, you can expand the default 16GB DRAM to
24GB DRAM via a configuration toggle (post 21.2R1). If you are
comfortable hacking the QEMU/KVM config manually, you can do it on any
release and can entertain other sizes.

--
++ytti

Re: Junos 20 - slow RPD
Hi Saku,

The routes are in VRF so no support for rib-sharding unfortunately.
This MX204 is running 20.2R3-S3 so probably the only option is to try
another version.

Thank you for your time and info, very useful as always.

On 22/03/2022 17:58, Saku Ytti wrote:
> [...]

Re: Junos 20 - slow RPD
On 3/22/22 22:42, Mihai via juniper-nsp wrote:

>
> Hi Saku,
>
> The routes are in VRF so no support for rib-sharding unfortunately.
> This MX204 is running 20.2R3-S3 so probably the only option is to try
> another version.

We've had some terrible experiences with RPD due to NSR sync of BGP to
re1, on an RE-S-1800 running Junos 20.4R3.8. It turns out the code can't
deal with grouping outbound updates to eBGP neighbors at scale on that
RE, which crashes RPD on re1.

The options were to either disable NSR, rewrite our outbound policies
and combine multiple customers in the same outbound group, or get more
memory. We went for the last option.
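
To illustrate the second option (the one not chosen here), a sketch of
several customers sharing one external group and one export policy instead
of one group per customer. The group and policy names, neighbor addresses,
and AS numbers are hypothetical, and the policy body is only a stand-in for
a full-table export.

```junos
# Stand-in export policy shared by all customers in the group (hypothetical)
set policy-options policy-statement EXPORT-FULL-TABLE term BGP from protocol bgp
set policy-options policy-statement EXPORT-FULL-TABLE term BGP then accept
set policy-options policy-statement EXPORT-FULL-TABLE term REST then reject
# One shared external group; the export policy is set once, at group level
set protocols bgp group CUSTOMERS-FULL type external
set protocols bgp group CUSTOMERS-FULL export EXPORT-FULL-TABLE
# Each customer is a neighbor in the same group, with its own peer AS
set protocols bgp group CUSTOMERS-FULL neighbor 192.0.2.1 peer-as 64501
set protocols bgp group CUSTOMERS-FULL neighbor 192.0.2.5 peer-as 64502
set protocols bgp group CUSTOMERS-FULL neighbor 192.0.2.9 peer-as 64503
```

Per-neighbor export statements inside the group would likely work against
the goal, since the point is for the customers to share the same outbound
policy.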

No more problems on the RE-S-X6-64G.

Juniper have some work to do to optimize the code in these use-cases.

Mark.

Re: Junos 20 - slow RPD
Hi,
I thought I was the only one with this issue.

Even with an RE-S-X6-64G, we have very slow outbound updates. Sending a
lot of full routing tables to customers may take up to 60 minutes or more
when you have a lot of BGP groups, for instance one group per customer. If
we have an issue with the preferred upstream provider, the customer
routers may be offline until all updates are sent.

We got new routers and we are going to try the latest Junos 20.4R3 service
release with update threading and rib-sharding to see if we get some
improvement. It is better to lose NSR than to blackhole traffic for over
an hour.



On Wed, 23 Mar 2022 at 06:41, Mark Tinka via juniper-nsp <
juniper-nsp@puck.nether.net> wrote:

> [...]

Re: Junos 20 - slow RPD
On 3/25/22 02:58, Gustavo Santos via juniper-nsp wrote:

> Hi,
> I thought I was the only one with this issue.

From their feedback, it seems the issue of scaling outbound updates of
full tables to eBGP neighbors is known within Juniper, because they told
us they have had to come up with all manner of hacks for many of their
large scale customers as well.

So it's a fundamental problem, one I'm not sure they are addressing very
well.

We can't keep throwing hardware at the problem.


> It is better to lose NSR than to blackhole traffic for over
> an hour.

Agreed - we had gotten to the point where we were willing to give up NSR
until we figure this out.

Mark.

Re: Junos 20 - slow RPD
In my case I just upgraded one MX204 in the lab to 21.2R2, enabled
rib-sharding and increased the JunosVM memory to 24G and things look
better now.


On 25/03/2022 00:58, Gustavo Santos via juniper-nsp wrote:
> [...]

Re: Junos 20 - slow RPD
On 3/25/22 11:21, Mihai via juniper-nsp wrote:

> In my case I just upgraded one MX204 in the lab to 21.2R2, enabled
> rib-sharding and increased the JunosVM memory to 24G and things look
> better now.

Glad to hear!

Mark.

Re: Junos 20 - slow RPD
I've been down the path of very slow RPD with JTAC recently. In our case
it was due to some mildly complex BGP community matching that we do, which
was exhausting memory limits.
A good fix for us was to bump up the memory allocation using these hidden
commands:

set policy-options as-path-match memory-limit 16m
set policy-options community-match memory-limit 16m

The default limit is 2097152 bytes (2 MiB), so very small. You can see some
interesting numbers with some other hidden commands:

show policy community-match
show policy as-path-match

Also, if you're running EVPN, check out this PR, which is a whole world of
fun:
https://prsearch.juniper.net/InfoCenter/index?page=prcontent&id=PR1616167



On Fri, Mar 25, 2022 at 6:27 AM Mark Tinka via juniper-nsp <
juniper-nsp@puck.nether.net> wrote:

> [...]