Slow RE path 20x faster than PFE path
Hi,

Would anyone have any idea why IP packets with options are forwarded through
an MX104 20x faster than regular IP packets?

"fast" PFE path - 24-35 ms
"slow" RE path - 1-4 ms

Example (I used Record Route to force the IP-options punt to the slow path):

rraszuk@cto-lon2:~$ ping 62.189.71.209 -R -v
PING 62.189.71.209 (62.189.71.209) 56(124) bytes of data.
64 bytes from 62.189.71.209: icmp_seq=1 ttl=63 time=1.44 ms
RR: 69.191.176.206
10.249.23.7
62.189.71.209
10.249.23.7
69.191.176.206

64 bytes from 62.189.71.209: icmp_seq=2 ttl=63 time=1.38 ms
64 bytes from 62.189.71.209: icmp_seq=3 ttl=63 time=1.46 ms
64 bytes from 62.189.71.209: icmp_seq=4 ttl=63 time=1.41 ms
64 bytes from 62.189.71.209: icmp_seq=5 ttl=63 time=1.49 ms
64 bytes from 62.189.71.209: icmp_seq=6 ttl=63 time=1.46 ms
64 bytes from 62.189.71.209: icmp_seq=8 ttl=63 time=1.52 ms
64 bytes from 62.189.71.209: icmp_seq=9 ttl=63 time=2.84 ms
64 bytes from 62.189.71.209: icmp_seq=10 ttl=63 time=1.77 ms
^C
--- 62.189.71.209 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9014ms
rtt min/avg/max/mdev = 1.386/1.892/4.117/0.849 ms

Now a normal ping between 62.189.71.209 and 69.191.176.206, which takes the
"fast" path:

rraszuk@cto-lon2:~$ ping 62.189.71.209 -v
PING 62.189.71.209 (62.189.71.209) 56(84) bytes of data.
64 bytes from 62.189.71.209: icmp_seq=1 ttl=63 time=24.1 ms
64 bytes from 62.189.71.209: icmp_seq=2 ttl=63 time=31.5 ms
64 bytes from 62.189.71.209: icmp_seq=3 ttl=63 time=24.1 ms
64 bytes from 62.189.71.209: icmp_seq=4 ttl=63 time=24.1 ms
64 bytes from 62.189.71.209: icmp_seq=5 ttl=63 time=24.0 ms
64 bytes from 62.189.71.209: icmp_seq=6 ttl=63 time=24.1 ms
^C
--- 62.189.71.209 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5006ms
rtt min/avg/max/mdev = 24.097/25.369/31.563/2.774 ms
rraszuk@cto-lon2:~$
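As a rough check of the "20x" figure, the two rtt summary lines above can be
compared directly. This is a minimal sketch that assumes the Linux iputils
summary format shown in the output; on these two samples the plain path comes
out at about 17x the options path on minimum RTT and about 13x on average RTT.

```python
import re

# Summary lines copied from the two ping runs above (Linux iputils format).
RR_SUMMARY = "rtt min/avg/max/mdev = 1.386/1.892/4.117/0.849 ms"
PLAIN_SUMMARY = "rtt min/avg/max/mdev = 24.097/25.369/31.563/2.774 ms"

_RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(summary: str) -> dict[str, float]:
    """Return min/avg/max/mdev in milliseconds from a ping summary line."""
    m = _RTT_RE.search(summary)
    if m is None:
        raise ValueError(f"not a ping rtt summary: {summary!r}")
    return dict(zip(("min", "avg", "max", "mdev"), map(float, m.groups())))


def slowdown(plain: str, with_options: str) -> dict[str, float]:
    """Plain-path RTT divided by options-path RTT, for min and avg."""
    p, o = parse_rtt(plain), parse_rtt(with_options)
    return {k: round(p[k] / o[k], 1) for k in ("min", "avg")}


if __name__ == "__main__":
    print(slowdown(PLAIN_SUMMARY, RR_SUMMARY))  # {'min': 17.4, 'avg': 13.4}
```

Min RTT is usually the fairer comparison here, since it is least affected by
the odd slow sample in each run.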

Best,
R.
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: Slow RE path 20x faster than PFE path
There isn't enough information to answer your question.

But one possible reason is that you're choosing a different path in SW
and HW, or that the answers are not even coming from the host you think
(tshark might add information).

a) Is 1.4 ms possible in terms of speed of light?
b) Where are the 24 ms packets sitting? Do you also have a longer path
available, or are you heavily congested, causing massive queueing delay?
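Question (a) can be turned into a quick upper bound. This sketch assumes light
in fibre covers about 200 km per millisecond (roughly c divided by a
refractive index of about 1.5) and treats the whole RTT as propagation, so the
result is a ceiling on distance, not an estimate.

```python
# ASSUMPTION: ~200 km/ms one way in single-mode fibre (c / ~1.5), rounded.
FIBRE_KM_PER_MS = 200.0


def max_one_way_km(rtt_ms: float, km_per_ms: float = FIBRE_KM_PER_MS) -> float:
    """Upper bound on one-way fibre distance implied by an RTT.

    Any serialization, queueing or forwarding delay only shortens the real
    distance, so this is a ceiling rather than an estimate.
    """
    if rtt_ms < 0:
        raise ValueError("RTT cannot be negative")
    return rtt_ms / 2 * km_per_ms


if __name__ == "__main__":
    for rtt in (1.4, 24.1):
        print(f"{rtt:>5} ms RTT -> at most {max_one_way_km(rtt):.0f} km one way")
```

For the RTTs in this thread that is at most about 140 km one way for 1.4 ms
and about 2,410 km for 24.1 ms, so explaining the 24 ms purely by propagation
would need on the order of 2,400 km of fibre in the path.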



On Mon, 23 Mar 2020 at 15:09, Robert Raszuk <robert@raszuk.net> wrote:
> [...]



--
++ytti
Re: Slow RE path 20x faster than PFE path
Hi Saku,

This is a very simple setup:

linux (.206) ---- LAN---- mx104(.210) ---- p2p---- isp (.209)

Pretty much 1.4 ms is what is expected. The only place the delay occurs is
on ingress from the ISP to the MX104. If I ping the MX104's outbound
interface (.210), I get 0.5 ms.

No worries anyway... I just thought someone might have run into this before.

Cheers,
R.



On Mon, Mar 23, 2020 at 2:14 PM Saku Ytti <saku@ytti.fi> wrote:

> [...]
Re: Slow RE path 20x faster than PFE path
Hey,

> This is very simple setup:
>
> linux (.206) ---- LAN---- mx104(.210) ---- p2p---- isp (.209)
>
> Pretty much 1.4 is what is expected. The only place the delay occurs is on ingress from the ISP to MX104. If I ping MX104 outbound (.210) int I get 0.5 ms.
>
> No worries anyway ... just thought anyone run into this before.

Is this a simplified topology or the actual one? Is it not possible that
the return path is different? It's entirely possible for SW- and
HW-forwarded packets to experience different path selection, so perhaps
the ISP is returning HW-forwarded packets via one path and SW-forwarded
packets via another.

It's very difficult to imagine a failure mode where packets would wait a
consistent ~23 ms on the MX104.

--
++ytti
Re: Slow RE path 20x faster than PFE path
That is the actual topology, and during testing no other return path
existed.

It seems that there are PFE loops, which would explain why packets punted
to the RE are forwarded so much faster... JTAC is debugging :)

Thx,
R.

On Mon, Mar 23, 2020 at 4:17 PM Saku Ytti <saku@ytti.fi> wrote:

> [...]
Re: Slow RE path 20x faster than PFE path
I'm not sure what you mean by 'PFE loops'; this is a single-NPU,
fabricless platform, and all ports are local. The only way to delay a
packet that long is to send it to off-chip DRAM (the delay buffer).
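To put the delay-buffer point in numbers: the plain ping sits at about 24.1 ms
versus about 1.4 ms with options, so roughly 22.7 ms is unaccounted for. In a
simple FIFO model a packet waits backlog / drain rate, so the sketch below
computes the backlog needed for that wait. The thread never states the
interface speeds, so the 1 and 10 Gbit/s rates are assumptions for
illustration only.

```python
def backlog_bytes(wait_ms: float, drain_rate_bps: float) -> float:
    """Bytes queued ahead of a packet that waits wait_ms in a FIFO drained
    at drain_rate_bps (fluid model: wait = backlog / rate)."""
    if wait_ms < 0 or drain_rate_bps <= 0:
        raise ValueError("wait must be >= 0 and rate must be > 0")
    return drain_rate_bps * (wait_ms / 1000.0) / 8.0


if __name__ == "__main__":
    # Plain-path RTT minus options-path RTT, from the pings earlier in the thread.
    extra_ms = 24.1 - 1.4
    # ASSUMPTION: interface speeds are not given in the thread; these are examples.
    for rate in (1e9, 10e9):
        mb = backlog_bytes(extra_ms, rate) / 1e6
        print(f"{rate / 1e9:.0f} Gbit/s: ~{mb:.1f} MB queued ahead of each probe")
```

That is roughly 2.8 MB at 1 Gbit/s or about 28 MB at 10 Gbit/s, and the queue
would have to be that full for every probe, which is why a consistent ~23 ms
is hard to explain by ordinary queueing.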

But please update the list once you figure it out.


On Mon, 23 Mar 2020 at 17:28, Robert Raszuk <robert@raszuk.net> wrote:
> [...]



--
++ytti
Re: Slow RE path 20x faster than PFE path
On 23/Mar/20 17:37, Saku Ytti wrote:
> I'm not sure what you mean by 'PFE loops', this is single NPU
> fabricless platform, all ports are local. The only way to delay packet
> that long, is to send it to off-chip DRAM (delay buffer).
>
> But please update list once you figure it out.

Just for giggles, Robert, are you able to test this with another Juniper
platform (even if it's the MX80)?

Mark.
Re: Slow RE path 20x faster than PFE path
Hi Mark,

Oh yes... the exact same config, the same setup and even the same hardware
(MX104) work fine in another location, giving a consistent 1 ms without IP
options and mostly 1-3 ms with IP options, but that is all fine. Going to
the RE is always non-deterministic :)

Another interesting observation: a show command indicated over 33 Mpps of
services-inline input traffic with zero output, while the total traffic
coming into the box at that time was 1 Mpps...

Thx,
R.

On Mon, Mar 23, 2020 at 5:47 PM Mark Tinka <mark.tinka@seacom.mu> wrote:

> [...]
Re: Slow RE path 20x faster than PFE path
On 23/Mar/20 19:25, Robert Raszuk wrote:

> Hi Mark,
>
> Oh yes ... exact same config and same setup and even same hw (mx104)
> works fine in other location giving consistent 1 ms without IP options
> and mostly 1-3 ms with IP options - but that is all fine. Going to RE
> is always non deterministic :)

That is quite curious.

Given the same kit in another location and the results, what immediately
stands out to you?

Mark.
Re: Slow RE path 20x faster than PFE path
PFE bugs... but these days I will happily let the vendor handle it :)

On Mon, Mar 23, 2020 at 6:30 PM Mark Tinka <mark.tinka@seacom.mu> wrote:

> [...]
Re: Slow RE path 20x faster than PFE path
On 23/Mar/20 19:31, Robert Raszuk wrote:
>
> PFE bugs ... but I will happily let vendor handle it these days :)

Please keep us posted.

Would be interesting to see if this can be replicated on non-MX104
hardware as well.

Mark.
Re: Slow RE path 20x faster than PFE path
Not really: if I ping .209 from the Internet side without IP options (this
is the ISP edge), I get a 30 ms RTT across Europe.

Besides, this is not the only MX104 that is slow in the fast path.

If and when J finds an answer and makes it public, I will share it with
the list.

Thx,
R.

On Mon, Mar 23, 2020 at 9:32 PM Timur Maryin <timamaryin@mail.ru> wrote:

>
>
> On 23-Mar-20 14:03, Robert Raszuk wrote:
> > Hi,
> >
> > Would anyone have any idea why IP packets with options are forwarded via
> > MX104 20x faster then regular IP packets ?
> >
> > "fast" PFE path - 24-35 ms
> > "slow" RE path - 1-4 ms
>
>
> 24 ms is ages in terms of PFE.
> I can hardly imagine that is possible.
>
>
> Is it possible that .209 answers faster to packets with options?
>
Re: Slow RE path 20x faster than PFE path
Yes, NAT is configured there, as I hinted via the presence of the si-
phantom load... Having NAT there was not my idea, though :). But sorry, I
cannot share the config.

If you could shed some more light on your comment, i.e. how to configure it
properly and what to avoid, I think it would be very useful for many folks
on this list.

Many thx,
R.



On Tue, Mar 24, 2020 at 5:00 AM Alexander Arseniev <arseniev@btinternet.com>
wrote:

> Hello,
>
>
>
> Another interesting observation is that show command indicated services
> inline input traffic over 33 Mpps zero output while total coming to the box
> was at that time 1 Mpps ....
>
>
> Do you have inline NAT configured on this box? Is it possible to share the
> config, please?
> It is quite easy to loop traffic with NAT (inline or not), and while it is
> looped inside the same box, TTL does not get decremented, so you end up
> with eternal PFE saturation.
>
> Thanks
> Alex
>
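The loop point can be illustrated with a toy model: a packet going round a
forwarding loop is only dropped once its TTL reaches zero, so if a pass does
not decrement TTL the loop never ends on its own. This is a sketch, not
Juniper's actual datapath; the decrement per pass is just a parameter. It also
computes the apparent amplification from the counters quoted above (33 Mpps of
services-inline input against 1 Mpps arriving at the box), assuming that
counter increments once per pass through the loop.

```python
from typing import Optional


def passes_until_drop(ttl: int, decrement_per_pass: int,
                      cap: int = 10_000) -> Optional[int]:
    """Toy model: count how many times a packet goes around a forwarding loop
    before TTL expiry drops it. Returns None if it is still alive after `cap`
    passes, i.e. the loop does not terminate by itself."""
    if ttl <= 0:
        raise ValueError("TTL must be positive")
    for passes in range(1, cap + 1):
        ttl -= decrement_per_pass
        if ttl <= 0:
            return passes
    return None


def loop_amplification(observed_pps: float, offered_pps: float) -> float:
    """Average passes per offered packet, assuming the observed counter
    increments once per pass."""
    return observed_pps / offered_pps


if __name__ == "__main__":
    print(passes_until_drop(64, 1))  # 64: normal hop-by-hop decrement
    print(passes_until_drop(64, 0))  # None: loop that never touches TTL
    print(loop_amplification(33e6, 1e6))  # 33.0
```

Read that way, each arriving packet would be making about 33 passes on
average, which would fit an NPU busy with looped traffic, though the thread
does not confirm that this is the cause.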
Re: Slow RE path 20x faster than PFE path
You can confirm PPE load by:

RMPC0(r24.labxtx01.us.bb-re0 vty)# show xl-asic 0 npu_stats
NPU Stats Dump for FPC0:NPU0
Global NPU Utilization: 0.

(xl or lu, depending on platform)

Also, piping the output of 'show xl-asic 0 ppe cntx' to a file and
aggregating it can be very useful to see what the PPEs are doing,
particularly when you have data points from a working and a non-working
box: aggregating how many PPE contexts are at which uCode address will be
a dead giveaway.

I was recently troubleshooting an issue where PPS performance was good
after some reboots and less good after others, and looking at the PPE
contexts was illuminating for that. I suspect that if the PPEs are
spending time on the NAT thing, then on the bad box the global NPU load
should be high and the context uCode address distribution should be
wildly different (the good box mostly sleeping).
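The aggregation idea can be sketched as follows: save 'show xl-asic 0 ppe
cntx' from a good box and a bad box, count contexts per uCode address in each,
and rank addresses by how much more often they appear on the bad box. The
output format of that command is not shown in the thread, so the "PC: 0x..."
pattern below is a hypothetical placeholder that would need adjusting to what
your release actually prints.

```python
import re
from collections import Counter

# HYPOTHETICAL pattern: assumes each PPE context line carries its microcode
# address as "PC: 0x...". Replace with whatever the real dump contains.
_ADDR_RE = re.compile(r"\bPC:\s*(0x[0-9a-fA-F]+)")


def ucode_histogram(dump: str) -> Counter:
    """Count PPE contexts per microcode address in one saved dump."""
    return Counter(m.group(1).lower() for m in _ADDR_RE.finditer(dump))


def compare(good: str, bad: str, top: int = 5) -> list[tuple[str, int, int]]:
    """Return (address, good_count, bad_count) for the addresses with the
    largest bad-minus-good difference, ties broken by address."""
    g, b = ucode_histogram(good), ucode_histogram(bad)
    ranked = sorted(set(g) | set(b), key=lambda a: (-(b[a] - g[a]), a))
    return [(a, g[a], b[a]) for a in ranked[:top]]
```

Ranking by difference rather than raw count keeps addresses that are busy on
both boxes (for example an idle loop) from crowding out the one that only the
bad box is stuck in.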





On Tue, 24 Mar 2020 at 10:31, Robert Raszuk <robert@raszuk.net> wrote:
> [...]



--
++ytti