Mailing List Archive

Latency/Packet Loss on ASR1006
Hi,

We have ...

ASR1006 that has the following cards...
1 x ESP40
1 x SIP40
4 x SPA-1x10GE-L-V2
1 x 6TGE
1 x RP2

We've been having latency and packet loss during peak periods...

We notice all is good until we reach 50% utilization on output of...

'show platform hardware qfp active datapath utilization summary'

Literally ... 47% good... 48% good... 49% latency to next hop goes from 1ms
to 15-20ms... 50% we see 1-2% packet-loss and 30-40ms latency... 53% we see
60-70ms latency and 8-10% packet loss.

Is this expected... the ESP40 can only really push 20G and then starts to
have performance issues?
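For anyone wanting to trend this over time, here is a minimal sketch of pulling the 'Processing: Load (pct)' row out of that show command's output. The column layout is assumed from memory and varies by IOS XE release, so treat the sample text and regex as assumptions to adapt:

```python
import re

# Sample output of 'show platform hardware qfp active datapath utilization summary'
# (format is an assumption; verify against your IOS XE release).
SAMPLE = """\
CPP 0: Subdev 0             5 secs        1 min        5 min       60 min
Input:  Priority (pps)         120          110          100           90
Input:  Non-Priority (pps) 2400000      2300000      2100000      1900000
Processing: Load (pct)          49           48           47           45
"""

def qfp_load(text):
    """Return the QFP 'Processing: Load (pct)' values as interval -> percent."""
    m = re.search(r"Processing:\s+Load \(pct\)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", text)
    if not m:
        return None
    intervals = ("5s", "1m", "5m", "60m")
    return dict(zip(intervals, (int(v) for v in m.groups())))

print(qfp_load(SAMPLE))
```

Feeding this into a monitoring system and alerting well below the 50% knee described above would catch the degradation before latency climbs.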



---
Colin Legendre
RE: Latency/Packet Loss on ASR1006
https://www.cisco.com/c/en/us/support/docs/routers/asr-1000-series-aggregation-services-routers/200674-Throughput-issues-on-ASR1000-Series-rout.html



It's been many years since I used an ASR1000 but, honestly, you have an ESP40 in a box with 10x10G interfaces? That's a very underpowered forwarding processor for that job. The ESP40 was designed for a box that would have 1G interfaces and perhaps a couple of 10s. The ASR1000 is a CPU-based box; everything goes back to the processor, and remember that Cisco math means half duplex, not full.



From: NANOG <nanog-bounces+tony=wicks.co.nz@nanog.org> On Behalf Of Colin Legendre
Sent: Saturday, 27 November 2021 8:09 am
To: nanog <nanog@nanog.org>
Subject: Latency/Packet Loss on ASR1006

[original message snipped]
Re: Latency/Packet Loss on ASR1006
Hi,

We see similar problems on an ASR1006-X with ESP100 and MIP100. At about
~45 Gbit/s of traffic (on ~30k PPPoE sessions and ~700k CGN sessions)
the QFP utilization skyrockets from ~45% straight to ~95% :(
I don't know if it's the CGN sessions or the traffic/packets causing the
load increase; the datasheet says it supports something like 10M
sessions... but maybe not if you really intend to push packets through it?
We have not seen such spikes with way higher pps but a lower CGN session
count, when we had DDoS attacks against end customers.

Fiona

On 11/26/21 20:09, Colin Legendre wrote:
> [original message snipped]
Re: Latency/Packet Loss on ASR1006
On Fri, 26 Nov 2021 at 21:37, Tony Wicks <tony@wicks.co.nz> wrote:

> So many years since I have used an asr1000 but, honestly you have an esp40 in a box with 10x10G interfaces? That’s a very underpowered processor for that job. The ESP40 was designed for a box that would have 1G interfaces and perhaps a couple of 10’s. The ASR1000 is a CPU based box, everything goes back to the processor and remember cisco math means half duplex not full.

I'm not sure what a CPU-based box means here. ASR1k isn't using a
general-purpose core like PQ3, Intel, or AMD. Like CRS-1 and nPower,
ASR1k has Cisco-made forwarding logic using cores from Tensilica
(CPP10/popey, I believe, was 40 x Tensilica DI 570T; the next iteration
was 64 cores).

--
++ytti
RE: Latency/Packet Loss on ASR1006
I mean a router without ASIC-based forwarding, like a Juniper MX or Nokia 7750. The advantage of the 1k is that you don't need a services card for CGNAT, but the large disadvantage is that everything passes through the ESP processor, and this often leads to disappointing results under load.

> I'm not sure what a CPU based box means here. ASR1k isn't using a general purpose core like PQ3, INTC or AMD. Like CRS-1 and nPower, ASR1k has Cisco made forwarding logic using cores from tensilica (CPP10/popey I believe was 40 x Tensilica DI 570T, next iteration was 64 cores).

--
++ytti
Re: Latency/Packet Loss on ASR1006
On Sat, 27 Nov 2021 at 13:32, Tony Wicks <tony@wicks.co.nz> wrote:

> I mean a router without ASIC based forwarding like a Juniper MX or Nokia 7750. The advantage of the 1k is you don't need a services card for cgnat, but the large disadvantage is everything passes through the ESP processor and this often leads to disappointing results under load.

I think the ASR1k NPU is a perfect analog for Juniper MX Trio or Nokia
7750 FP; these all fall under the very common description of an NPU. We
could dive deep and explain why the 7750 and MX are vastly different, in
their decision to use many small cores versus a few large ones, but
ultimately they all easily fall under the NPU definition.

--
++ytti
Re: Latency/Packet Loss on ASR1006
In the past we had packet loss issues due to the SIP's PLIM buffer.

The following docs may provide some guidance:

https://www.cisco.com/c/en/us/support/docs/routers/asr-1000-series-aggregation-services-routers/200674-Throughput-issues-on-ASR1000-Series-rout.html
https://www.cisco.com/c/en/us/td/docs/interfaces_modules/shared_port_adapters/configuration/ASR1000/asr1000-sip-spa-book/asr-spa-pkt-class.html

--
Tassos

On 27/11/21 02:11, Fiona Weber via NANOG wrote:
> [earlier messages snipped]
Re: Latency/Packet Loss on ASR1006
Thanks, will look into this.

---
Colin Legendre
President and CTO

Coextro - Unlimited. Fast. Reliable.
w: www.coextro.com
e: clegendre@coextro.com

p: 647-693-7686 ext.101
m: 416-560-8502
f: 647-812-4132


On Sat, Nov 27, 2021 at 7:42 AM Tassos <achatz@forthnet.gr> wrote:

> [quoted messages snipped]
Re: Latency/Packet Loss on ASR1006
On 11/26/2021 1:09 PM, Colin Legendre wrote:
> [original message snipped]

I haven't experienced that across about a dozen ASR 1ks. Though I just
checked, and we are not pushing any of our ESPs over 50% currently (the
closest we have is an ESP40 doing 18Gbps). However, I'm pretty sure
we've pushed older ESPs (5s, 10s, and 20s) to ~75% or so in the past.

Given the components you have, I would have expected your router to
handle 40Gbps input and 40Gbps output. That could either be 40Gbps into
the 6 port card [and 40Gbps out of the four 1 port cards] or it could be
40Gbps input that is spread across the 6 port and 1 port cards [that is
then output across both cards as well].

Despite other comments, I think your components are well matched. The
only non-obvious thing here is that the 6 port card only has a ~40Gbps
connection to the backplane so you cannot use all 6 ports at full
bandwidth. I think this router is well suited to handle 20-30Gbps of
customer demand doing standard destination based routing (if you're
doing traffic shaping, NAT, tunnelling, or something else more involved
than extended ACLs you may need something beefier at those traffic levels).
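A back-of-the-envelope sketch of that capacity reasoning, using nominal datasheet-style ratings (the figures below are assumptions, not measurements, and the half-duplex question debated earlier in the thread applies):

```python
# Nominal ratings (assumptions, in Gbps) for the chassis described above.
ESP40 = 40                 # aggregate forwarding capacity of the ESP40
SIP40_SLOT = 40            # SIP40 backplane bandwidth (carries the 4 x 10GE SPAs)
SIXTGE_SLOT = 40           # 6TGE card: 60G of ports but only ~40G to the backplane

ports = 4 * 10 + 6 * 10    # 100G of physical port capacity
backplane = SIP40_SLOT + SIXTGE_SLOT

# Every packet crosses the ESP exactly once, so the ESP caps aggregate
# throughput well below both the port and the backplane capacity.
bottleneck = min(ports, backplane, ESP40)
print(f"ports={ports}G backplane={backplane}G bottleneck={bottleneck}G (the ESP)")
```

Whichever way the ports are spread across the two slots, the ESP remains the shared chokepoint, which is why per-card port counts matter less than total offered load here.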
Re: Latency/Packet Loss on ASR1006
That's what I thought.

Our total inbound bandwidth from upstreams is about 20G at max.. so that
really is the total bandwidth...

Now we are terminating about 1800 PPPoE sessions on the router as well, and
have policing set on them, as well as shaping on a couple of our major
downstream links.

Is anyone interested in making a few $ and taking a look for us, to see if
we are really hitting capacity, or if some sort of tuning could be done to
help us eke out a little bit more from this device before upgrading?

---
Colin Legendre

On Tue, Dec 7, 2021 at 10:34 AM Blake Hudson <blake@ispn.net> wrote:

> [quoted message snipped]
Re: Latency/Packet Loss on ASR1006
On 07/12/2021 17:32, Blake Hudson wrote:

Suggestion: move this thread to cisco-nsp where you might find more
assistance.

Regards,
Hank

> [quoted messages snipped]
RE: Latency/Packet Loss on ASR1006
> On 11/26/2021 1:09 PM, Colin Legendre wrote:
> > [original message snipped]

We had a similar issue about 4 years ago.
We were seeing packet loss and drops getting progressively worse, and the router was falling over when reaching about 70% of usage.
We could see the interface reliability go down and input errors due to overruns on the interfaces.
Cisco blamed it on microbursts not being able to be handled under load:

"We were able to replicate this scenario in our lab as well.
QFP under high load generated input errors and overruns, which in turn led to unicast failures/drops/latency.
The issue is not consistent with QFP % utilization, as sometimes with even 80%+ traffic we do not see the drops."

and recommended removing traffic or upgrading the ESP.

One of our guys disabled NBAR on the router and the problem disappeared.
I would suggest taking a look at what features you are using and, if you can, trying to disable them to see if it makes any impact.
We then upgraded ESPs and all has been fine since.

Brian
Re: Latency/Packet Loss on ASR1006
Thanks for this... turned off NetFlow export... and it dropped our QFP load
from 44% to 18%. Ugh...

---
Colin Legendre



On Thu, Dec 9, 2021 at 4:22 AM Brian Turnbow via NANOG <nanog@nanog.org>
wrote:

> [quoted messages snipped]
Re: Latency/Packet Loss on ASR1006
NBAR was not enabled... just NetFlow export... and that was enough.

---
Colin Legendre
President and CTO

Coextro - Unlimited. Fast. Reliable.
w: www.coextro.com
e: clegendre@coextro.com

p: 647-693-7686 ext.101
m: 416-560-8502
f: 647-812-4132


On Thu, Dec 9, 2021 at 7:17 PM Colin Legendre <clegendre@coextro.com> wrote:

> [quoted messages snipped]
RE: Latency/Packet Loss on ASR1006
If you still need NetFlow to gain some visibility into what's happening, you could check the sampling rate of your NetFlow export.

Usually 1/1000 is good, i.e. 0.1%. Maybe for you even 1/1,000,000 could be good enough.

If 100% (unsampled) was used, then indeed there are real-time performance penalties. Not many people need an accurate 100% of NetFlow exports. If you need 100% accuracy, then you need dedicated hardware.

0%, i.e. totally disabled, is also often good enough if you don't need the visibility. :)

NetFlow is useful in my opinion, but maybe not for every case.

Jean
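To put rough numbers on that advice, here is a quick sketch of how sampling shrinks the flow-accounting workload on the forwarding processor (the peak packet rate below is an illustrative assumption, not a measurement from this thread):

```python
def sampled_load(pps, rate):
    """Packets/sec the flow-accounting path must touch at a 1-in-`rate` sampler."""
    return pps / rate

pps = 3_000_000   # illustrative peak packets/sec (assumption)
for rate in (1, 1_000, 1_000_000):
    print(f"1/{rate:>9,}: ~{sampled_load(pps, rate):,.0f} pps into flow accounting")
```

Going from unsampled to 1/1000 cuts the accounting work by three orders of magnitude, which is consistent with the large QFP-load drop reported above when export was disabled entirely.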



From: NANOG <nanog-bounces+jean=ddostest.me@nanog.org> On Behalf Of Colin Legendre
Sent: December 9, 2021 7:18 PM
To: Brian Turnbow <b.turnbow@twt.it>
Cc: nanog <nanog@nanog.org>
Subject: Re: Latency/Packet Loss on ASR1006

[quoted messages snipped]