Mailing List Archive

What exactly causes inconsistent RTT seen using ping utility in Junos?
Hi,

ping utility(or "icmp-ping" RPM probe without hardware-timestamp) in
Junos shows occasional high RTT even in case of pinging for example
directly connected devices in the LAN. Example:

64 bytes from 10.55.55.1: icmp_seq=40 ttl=64 time=0.441 ms
64 bytes from 10.55.55.1: icmp_seq=41 ttl=64 time=0.399 ms
64 bytes from 10.55.55.1: icmp_seq=42 ttl=64 time=67.192 ms
64 bytes from 10.55.55.1: icmp_seq=43 ttl=64 time=0.617 ms

This is well know behavior and documented in several KB articles.
However, what exactly causes this?

KB11792 says that it is caused by an internal scheduler in router
which gives lowest priority to ICMP ping packets. KB28157 gives an
impression that it has something to do with default CoS configuration.
When I measure the deltas between ICMP "echo request" messages seen by
the server directly connected to the router, then there is always a
longer delay for the delayed ICMP "echo request" message. For example,
here is the packet capture with delays between ICMP "echo request"
messages for those four ping results shows above:

00:00:01.000395 IP 10.55.55.2 > 10.55.55.1: ICMP echo request, id
59201, seq 40, length 64
00:00:01.000218 IP 10.55.55.2 > 10.55.55.1: ICMP echo request, id
59201, seq 41, length 64
00:00:01.024137 IP 10.55.55.2 > 10.55.55.1: ICMP echo request, id
59201, seq 42, length 64
00:00:00.983721 IP 10.55.55.2 > 10.55.55.1: ICMP echo request, id
59201, seq 43, length 64

Packet 42 arrives later than normal(normal is ~1000ms) and thus next
packet arrives earlier than normal. This means that (at least some of)
this occasional abnormal delay should be introduced when ICMP "echo
request" is traveling out of the box.

Based on this information I changed the CoS configuration in two MX
series routers. First I tried with vMX(runs in performance mode) and
configured RE-sent ICMP "echo request" messages into forwarding-class
named "ef" and associated this class with low-latency queue:

root@vmx1> show configuration class-of-service
forwarding-classes {
class ef queue-num 1 priority high policing-priority premium;
class be queue-num 0 priority low policing-priority normal;
}
interfaces {
ge-0/0/9 {
scheduler-map test_schedmap;
}
}
scheduler-maps {
test_schedmap {
forwarding-class be scheduler be_sched;
forwarding-class ef scheduler ef_sched;
}
}
schedulers {
be_sched {
priority low;
}
ef_sched {
buffer-size percent 0;
priority strict-high;
}
}

root@vmx1>

I can confirm that ICMP "echo request" messages are put into queue
number 1(associated with "ef" forwarding-class) with "show interfaces
queue ge-0/0/9 forwarding-class ef" command. However, it made no
difference compared to default CoS configuration where ICMP "echo
request" messages sent by RE are in the best-effort class. When I try
the same configuration on MX960, then again, there is no difference
compared to default CoS configuration.

I also noticed that when I load the RE CPU, then this does not have
affect on ping RTT. However, for example, when I pin the vPFE vCPU's
to specific CPU core and then additionally do something CPU-heavy on
that specific core, then it has noticeable affect on RTT of RE-sent
ICMP "echo request" messages.


Maybe someone (from Juniper) can shed some light on it. Please let me
know if anything was unclear.


thanks,
Martin
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
Hey Martin,

> This is well know behavior and documented in several KB articles.
> However, what exactly causes this?

I think just CPU doing something else before given time to do the ICMP
packets. Like busy running some RPD task. You are facing uphill battle
if you need to rely on precise ICMP echo times, it is of course
technically possible, but will not be easy to convince vendor that it
is needed.

> Based on this information I changed the CoS configuration in two MX
> series routers. First I tried with vMX(runs in performance mode) and
> configured RE-sent ICMP "echo request" messages into forwarding-class
> named "ef" and associated this class with low-latency queue:

This won't do anything, this is only applicable for forwarding-plane
traffic. For control-plane you want 'set class-of-service
host-outbound-traffic forwarding-class ef'. However this won't do
anything to give CPU scheduling priority on sending and receiving ICMP
messages.

--
++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
Hi Saku,

thanks for reply!

> > This is well know behavior and documented in several KB articles.
> > However, what exactly causes this?
>
> I think just CPU doing something else before given time to do the ICMP
> packets. Like busy running some RPD task.

I also thought that it has something to do with control-plane
process/thread scheduling at first. However, I would expect the RTT of
ping to become even more inconsistent when less CPU-time is available
for sending the ICMP "echo request" messages, but this is not the
case. For example, if I pin the vCPU of virtual control plane to
physical CPU core 0:

$ sudo virsh vcpuinfo vcp-vmx1
VCPU: 0
CPU: 0
State: running
CPU time: 142291.0s
CPU Affinity: y-------

$

..and at the same time run stress-ng on the core 0:

$ sudo taskset 0x00000001 stress-ng --cpu 1 --timeout 600s

..and "while :; do /usr/bin/nice -n -20 sha256 /var/tmp/2GB &
/usr/bin/nice -n -20 sha256 /var/tmp/2GB & /usr/bin/nice -n -20 sha256
/var/tmp/2GB; done" on the RE, then it has no affect to RTT. Physical
core CPU utilization was ~100% and RE CPU utilization seen with "show
chassis routing-engine" was >80%:

1 min CPU utilization:
User 83 percent
Background 0 percent
Kernel 16 percent
Interrupt 1 percent
Idle 0 percent


In addition, the KB28157 says that "An ICMP ping shows intermittent
latency on a directly connected link with the default Junos Class of
Service (CoS) configuration" which gives an impression that there is a
non-default CoS configuration where there is steady latency.
This makes me think that the this occasional abnormal latency is not
caused by RE. However, I could be easily wrong.


> > Based on this information I changed the CoS configuration in two MX
> > series routers. First I tried with vMX(runs in performance mode) and
> > configured RE-sent ICMP "echo request" messages into forwarding-class
> > named "ef" and associated this class with low-latency queue:
>
> This won't do anything, this is only applicable for forwarding-plane
> traffic. For control-plane you want 'set class-of-service
> host-outbound-traffic forwarding-class ef'. However this won't do
> anything to give CPU scheduling priority on sending and receiving ICMP
> messages.

I classified control-plane ICMP traffic into "ef" forwarding-class
with multifield classifier on lo0. It has the same effect as
"host-outbound-traffic".


Martin
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
> Martin T
> Sent: Monday, April 15, 2019 11:47 AM
>
> Hi Saku,
>
> thanks for reply!
>
> > > This is well know behavior and documented in several KB articles.
> > > However, what exactly causes this?
> >
> > I think just CPU doing something else before given time to do the ICMP
> > packets. Like busy running some RPD task.
>
> I also thought that it has something to do with control-plane
process/thread
> scheduling at first. However, I would expect the RTT of ping to become
even
> more inconsistent when less CPU-time is available for sending the ICMP
> "echo request" messages, but this is not the case. For example, if I pin
the
> vCPU of virtual control plane to physical CPU core 0:
>
> $ sudo virsh vcpuinfo vcp-vmx1
> VCPU: 0
> CPU: 0
> State: running
> CPU time: 142291.0s
> CPU Affinity: y-------
>
> $
>
> ..and at the same time run stress-ng on the core 0:
>
> $ sudo taskset 0x00000001 stress-ng --cpu 1 --timeout 600s
>
I'm afraid this is not a valid test to prove the effect of relative process
priorities within the RE, doing this you're slowing down the clock on the
complete RE simulation as a whole (all simulated processes slowed down
equally).


> ..and "while :; do /usr/bin/nice -n -20 sha256 /var/tmp/2GB &
/usr/bin/nice -
> n -20 sha256 /var/tmp/2GB & /usr/bin/nice -n -20 sha256 /var/tmp/2GB;
> done" on the RE, then it has no affect to RTT.

Again this could be because these kinds of operations have lower process
priority than the process handling ping.

Maybe try setting up BGP session with a tool where you can simulate route
churn or even load high number of paths via that BGP session (subject to vMX
license limits) this would allow you to see if high RE CPU utilization
caused by RPD has any effect on the ping RTTs (the assumption here is that
RDP should have one of the highest priorities...)

adam


_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
On Mon, 15 Apr 2019 at 14:37, <adamv0025@netconsultings.com> wrote:

> I'm afraid this is not a valid test to prove the effect of relative process
> priorities within the RE, doing this you're slowing down the clock on the
> complete RE simulation as a whole (all simulated processes slowed down
> equally).
..

> Again this could be because these kinds of operations have lower process
> priority than the process handling ping.

ACK, RPD is involved in ICMP and RIB/FIB management, and to the OS
it's just single thread (in this context, newer RPD does have few
separate OS threads). So what you'd need to do, is have RPD run some
high priority task. BGP churn which will cause FIB changes likely is
good candidate as way to cause variant delay to ICMP.

--
++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
? 15 avril 2019 15:09 +03, Saku Ytti <saku@ytti.fi>:

>> I'm afraid this is not a valid test to prove the effect of relative process
>> priorities within the RE, doing this you're slowing down the clock on the
>> complete RE simulation as a whole (all simulated processes slowed down
>> equally).
> ..
>
>> Again this could be because these kinds of operations have lower process
>> priority than the process handling ping.
>
> ACK, RPD is involved in ICMP and RIB/FIB management, and to the OS
> it's just single thread (in this context, newer RPD does have few
> separate OS threads). So what you'd need to do, is have RPD run some
> high priority task. BGP churn which will cause FIB changes likely is
> good candidate as way to cause variant delay to ICMP.

Can you confirm that rpd is answering ICMP echo requests? I find this
surprising as I would have expected the FreeBSD kernel to do that.
--
Replace repetitive expressions by calls to a common function.
- The Elements of Programming Style (Kernighan & Plauger)
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
On Tue, 16 Apr 2019 at 16:35, Vincent Bernat <bernat@luffy.cx> wrote:

> Can you confirm that rpd is answering ICMP echo requests? I find this
> surprising as I would have expected the FreeBSD kernel to do that.

You're probably right. So more likely is that LC CPU is busy doing
programming RPD asked it to do, instead of giving ICMP towards RE.

--
++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
On 16/04/2019 15:52, Saku Ytti wrote:
> On Tue, 16 Apr 2019 at 16:35, Vincent Bernat <bernat@luffy.cx> wrote:
>
>> Can you confirm that rpd is answering ICMP echo requests? I find this
>> surprising as I would have expected the FreeBSD kernel to do that.
>
> You're probably right. So more likely is that LC CPU is busy doing
> programming RPD asked it to do, instead of giving ICMP towards RE.
>

This is interesting discussion. It was always unclear to me what are
handled by the freebsd kernel, rpd, or the micro junos kernel.

--
Raphael Mazelier
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
> Raphael Mazelier
> Sent: Tuesday, April 16, 2019 3:51 PM
>
> On 16/04/2019 15:52, Saku Ytti wrote:
> > On Tue, 16 Apr 2019 at 16:35, Vincent Bernat <bernat@luffy.cx> wrote:
> >
> >> Can you confirm that rpd is answering ICMP echo requests? I find this
> >> surprising as I would have expected the FreeBSD kernel to do that.
> >
> > You're probably right. So more likely is that LC CPU is busy doing
> > programming RPD asked it to do, instead of giving ICMP towards RE.
> >
>
> This is interesting discussion. It was always unclear to me what are
handled
> by the freebsd kernel, rpd, or the micro junos kernel.
>
Hi gents,

For those of you interested in all the details around how the transit as
well as host-inbound and host-outbound traffic is handled on juniper MX3D
Trio architecture I'd recommend reading the following FREE book in its
entirety.
https://www.juniper.net/us/en/training/jnbooks/day-one/networking-technologi
es-series/packet-walkthrough-mx-series/
It's an excellent book that will answer, among others, all your questions
around where the ICMP might be delayed in the system and where and how it is
handled.

I'd say it's a must read for all NetEng/Arch working with Juniper MX.


adam



_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
>
> For those of you interested in all the details around how the transit as
> well as host-inbound and host-outbound traffic is handled on juniper MX3D
> Trio architecture I'd recommend reading the following FREE book in its
> entirety.
> https://www.juniper.net/us/en/training/jnbooks/day-one/networking-technologi
> es-series/packet-walkthrough-mx-series/
> It's an excellent book that will answer, among others, all your questions
> around where the ICMP might be delayed in the system and where and how it is
> handled.
>
> I'd say it's a must read for all NetEng/Arch working with Juniper MX.
>
>


This is sure a very good resource (from the excellent David Roy). I've
already read it in the past and forget most of it. So for icmp we are in
the case of an "exception packet" that are "punted" to the RE. But the
document did not detail what really happen in the RE. It mention that
theses packets transit by gigabit ethernet interfaces in the TPP
proprietary protocol, but nothing after. What daemon is in charge of
handling TPP flow on the RE side ? rpd ? for icmp is at the end the
packet go through the freebsd kernel (seems logic but). And what cause
latency in response ?


--
Raphael Mazelier
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
Also it's now different again. Because Linux KVM running FreeBSD
guest. Lot of things are very slow now due to the Linux=>FreeBSD
limit.

And then again different with Junos Evolved.

But certainly LC_CPU doing something else and needing to send ICMP
towards RE will cause some jitter. Of course many other places could
introduce jitter too.

On Wed, 17 Apr 2019 at 13:33, Raphael Mazelier <raph@futomaki.net> wrote:
>
> >
> > For those of you interested in all the details around how the transit as
> > well as host-inbound and host-outbound traffic is handled on juniper MX3D
> > Trio architecture I'd recommend reading the following FREE book in its
> > entirety.
> > https://www.juniper.net/us/en/training/jnbooks/day-one/networking-technologi
> > es-series/packet-walkthrough-mx-series/
> > It's an excellent book that will answer, among others, all your questions
> > around where the ICMP might be delayed in the system and where and how it is
> > handled.
> >
> > I'd say it's a must read for all NetEng/Arch working with Juniper MX.
> >
> >
>
>
> This is sure a very good resource (from the excellent David Roy). I've
> already read it in the past and forget most of it. So for icmp we are in
> the case of an "exception packet" that are "punted" to the RE. But the
> document did not detail what really happen in the RE. It mention that
> theses packets transit by gigabit ethernet interfaces in the TPP
> proprietary protocol, but nothing after. What daemon is in charge of
> handling TPP flow on the RE side ? rpd ? for icmp is at the end the
> packet go through the freebsd kernel (seems logic but). And what cause
> latency in response ?
>
>
> --
> Raphael Mazelier
> _______________________________________________
> juniper-nsp mailing list juniper-nsp@puck.nether.net
> https://puck.nether.net/mailman/listinfo/juniper-nsp



--
++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
Hi,

I built a setup where vMX(local-as 64512) has following 20 eBGP
neighbors over point-to-point connections:

AS64513: vmx1[ge-0/0/6.13] <-> [ge-0.0.6.13]PC
AS64514: vmx1[ge-0/0/6.14] <-> [ge-0.0.6.14]PC
AS64515: vmx1[ge-0/0/6.15] <-> [ge-0.0.6.15]PC
AS64516: vmx1[ge-0/0/6.16] <-> [ge-0.0.6.16]PC
AS64517: vmx1[ge-0/0/6.17] <-> [ge-0.0.6.17]PC
AS64518: vmx1[ge-0/0/6.18] <-> [ge-0.0.6.18]PC
AS64519: vmx1[ge-0/0/6.19] <-> [ge-0.0.6.19]PC
AS64520: vmx1[ge-0/0/6.20] <-> [ge-0.0.6.20]PC
AS64521: vmx1[ge-0/0/6.21] <-> [ge-0.0.6.21]PC
AS64522: vmx1[ge-0/0/6.22] <-> [ge-0.0.6.22]PC
AS64523: vmx1[ge-0/0/7.23] <-> [ge-0.0.7.23]PC
AS64524: vmx1[ge-0/0/7.24] <-> [ge-0.0.7.24]PC
AS64525: vmx1[ge-0/0/7.25] <-> [ge-0.0.7.25]PC
AS64526: vmx1[ge-0/0/7.26] <-> [ge-0.0.7.26]PC
AS64527: vmx1[ge-0/0/7.27] <-> [ge-0.0.7.27]PC
AS64528: vmx1[ge-0/0/7.28] <-> [ge-0.0.7.28]PC
AS64529: vmx1[ge-0/0/7.29] <-> [ge-0.0.7.29]PC
AS64530: vmx1[ge-0/0/7.30] <-> [ge-0.0.7.30]PC
AS64531: vmx1[ge-0/0/7.31] <-> [ge-0.0.7.31]PC
AS64532: vmx1[ge-0/0/7.32] <-> [ge-0.0.7.32]PC

vMX has an evaluation license installed in order to avoid shallow max
prefix limitation. In PC there is a BGP daemon(BIRD) running in a way
that first set of 4000 prefixes oscillates between sessions with
AS64513 and AS64523, i.e between vmx1[ge-0/0/6.13] <-> [ge-0.0.6.13]PC
and vmx1[ge-0/0/7.23] <-> [ge-0.0.7.23]PC connections. Second set of
4000 prefixes oscillates between vmx1[ge-0/0/6.14] <-> [ge-0.0.6.14]PC
and vmx1[ge-0/0/7.24] <-> [ge-0.0.7.24]PC connections. Third set of
4000 prefixes oscillates between vmx1[ge-0/0/6.15] <-> [ge-0.0.6.15]PC
and vmx1[ge-0/0/7.25] <-> [ge-0.0.7.25]PC connections, etc.
CPU usage of rpd fluctuates between 20% and 90% under such conditions.
Now when I execute the "ping 10.55.55.1 source 10.55.55.2 count 1000"
command, then summary of the results is following:

--- 10.55.55.1 ping statistics ---
1000 packets transmitted, 992 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.189/19.572/315.915/36.297 ms

I also plotted the results: https://i.imgur.com/wWTgLfX.png As
expected, the RTT is very unstable and there is even some packet loss.
When I repeat the "ping 10.55.55.1 source 10.55.55.2 count 1000"
command without route churn, and thus the CPU usage of rpd is <0.5%,
then the summary of the results is much more stable:

--- 10.55.55.1 ping statistics ---
1000 packets transmitted, 1000 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.193/2.259/85.518/9.258 ms

Plotted results are here: https://i.imgur.com/bFM5zHj.png

This might give an impression, that this delay is directly related to
rpd process or CPU load caused by rpd, but this does not seem to be
true. vMX provides an excellent way to prove this as traffic between
vCP and vFP goes through the Linux bridge in the host machine and can
be captured. When I execute the "ping 10.55.55.1 source 10.55.55.2"
command and measure the delay between the ICMP "echo request" packages
on forwarding plane Ethernet interface facing the RE and ICMP "echo
request" packages on server interface with 10.55.55.1 address
configured, then first packet capture does not detect the occasional
delay where the second packet capture in the server does. For example,
see the following five ping results from the Junos ping utility:

64 bytes from 10.55.55.1: icmp_seq=1 ttl=64 time=0.459 ms
64 bytes from 10.55.55.1: icmp_seq=2 ttl=64 time=0.434 ms
64 bytes from 10.55.55.1: icmp_seq=3 ttl=64 time=51.631 ms
64 bytes from 10.55.55.1: icmp_seq=4 ttl=64 time=0.622 ms
64 bytes from 10.55.55.1: icmp_seq=5 ttl=64 time=0.294 ms

Delays measured on the forwarding-plane Ethernet port facing the RE are stable:

00:00:01.008455 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4
(0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.001053 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4
(0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.006539 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4
(0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.000228 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4
(0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124
00:00:01.004250 52:54:00:1c:7d:e6 > 52:54:00:17:a4:ae, ethertype IPv4
(0x0800), length 158: 128.0.0.1 > 128.0.0.16: ip-proto-84 124

I captured those with "tcpdump -ttt -nei vfp-int-vmx1 'ip proto 84 &&
src host 128.0.0.1 && ip[80:2] = 0x0800'" command. IP protocol 84 used
in the packet filter is Juniper proprietary Trivial Tunneling Protocol
(TTP) used to encapsulate external control traffic between the RE and
line-card microkernel. IPv4 address 128.0.0.1 is configured to em1.0
in RE. "ip[80:2] = 0x0800" matches ICMP "echo request" message ICMP
header type and code bytes inside the TTP packet, i.e type needs to be
0x08(Echo Request) and code needs to be 0x00.


However, same five packets captured on the ge-0.0.9-vmx1(this one has
the 10.55.55.1 configured) interface can be seen below:

00:00:01.008451 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4
(0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id
55340, seq 1, length 64
00:00:01.001077 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4
(0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id
55340, seq 2, length 64
00:00:01.017782 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4
(0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id
55340, seq 3, length 64
00:00:00.989049 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4
(0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id
55340, seq 4, length 64
00:00:01.004109 02:06:0a:0e:ff:f9 > fe:06:0a:0e:ff:f9, ethertype IPv4
(0x0800), length 98: 10.55.55.2 > 10.55.55.1: ICMP echo request, id
55340, seq 5, length 64

Third packet arrives later than normal and thus fourth packet arrives
earlier than normal, i.e exactly the same behavior demonstrated in my
initial e-mail. Packet capture was done with "tcpdump -ttt -nei
ge-0.0.9-vmx1 'icmp[icmptype] == 8'" command. In short, it seems to be
certain that ICMP "echo request" messages arrive to line-card
microkernel without the delay.

As already said by Saku, one logical explanation for this delay is the
line-card microkernel. As written by David Roy, the microkernel
processes TTP packets in software("TTP Transmit" process seen in the
output of "sh threads" on PFE CLI) and if the microkernel is for
example busy with FIB updates, then for example, the decapsulating of
TTP traffic and then forwarding the payload to MQ chip(or similar ASIC
in PFE) could indeed be affected. On the other hand, the overall LC
CPU usage according to "sh linux cpu usage" did not exceed 80% even
during the route churn, but I actually do not know what exactly this
utilization means..

Please let me know if anything was unclear or if someone has other
ideas or theories.


Martin
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
On Thu, 25 Apr 2019 at 00:19, Martin T <m4rtntns@gmail.com> wrote:

> in PFE) could indeed be affected. On the other hand, the overall LC
> CPU usage according to "sh linux cpu usage" did not exceed 80% even
> during the route churn, but I actually do not know what exactly this
> utilization means..

Like ethernet, CPU core is either busy (100%) or not busy (0%), and
load is looking some specific time frame and telling how often in that
time frame CPU was busy.
So 80% of time, work needed to compete for access to the CPU.

If I understood your explanation right (I may not have):

WireI => PFEi => RE => PFEe => WireE

You measure stable delay from WireI to PFEe even under load (so PFEi
is not contributing to jitter, neither is RE (running RPD)
contributing on either receive or send direction).
But you measure variant delay from WireI to WireE (so PFEe is largely
responsible of jitter)

So perhaps ingress side punt is done through LC CPU interrupt and
egress side is not, causing egress side to cause more jitter,
depending on what the egress PFE LC CPU is doing.

--
++ytti
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
hey,

> Please let me know if anything was unclear or if someone has other
> ideas or theories.

Been following this thread and do not have anything to contribute at
this point but wanted to say I (and I hope many others) appreciate this
type of proper debugging given the tools we have available these days.

--
tarko
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
Hi Saku,

> So 80% of time, work needed to compete for access to the CPU.

Yes. I hesitated because when I compare the ping results and vFP CPU
utilization plus microkernel threads CPU usage during the route churn
and without the route churn, then while CPU utilization clearly
increased, it did not *seem* to increase enough to explain such
drastic RTT fluctuation and even some packet loss. On the other hand,
I have absolutely no idea how the microkernel schedules its
processes/threads or how the virtualized forwarding plane distributes
those four CPU cores it has.
Graphical representation of the output of "show linux cpu usage" PFE
CLI command can be seen here: https://i.imgur.com/f1vMAn4.png and
https://i.imgur.com/vm5nr2S.png. First graph represent CPU cores
utilization under normal conditions. Second graph represents CPU cores
utilization during the route churn. Those four CPU cores are the host
machine CPU cores where the quemu virtualizes the forwarding plane.


> If I understood your explanation right (I may not have):

At this point, I was focusing on the ICMP "echo request" message only.
Network topology seen from Linux is following:

vcp-vmx1[vcp-int-vmx1] <-> [2]br-int-vmx1[3] <->
[vfp-int-vmx1]vfp-vmx1[ge-0.0.9-vmx1]

vcp-vmx1 is RE running Junos and vcp-int-vmx1 is em1 interface in
Junos. br-int-vmx1(Linux bridge) serves the similar purpose to
integrated switch in SCB seen with the "show chassis ethernet-switch"
command in Junos. vfp-int-vmx1 is the line-card Ethernet interface
used by microkernel to communicate with RE. ge-0.0.9-vmx1 represents
the ge-0/0/9 seen in Junos which is directly connected to TAP device
ge-0.0.9-vmx1:

martin@PC:~$ sudo ethtool -i ge-0.0.9-vmx1
driver: tun
version: 1.6
firmware-version:
expansion-rom-version:
bus-info: tap
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
martin@PC:~$

So when I execute the "ping 10.55.55.1 source 10.55.55.2 count 1"
command in RE, then the ICMP "echo request" package travels out from
the RE via vcp-int-vmx1 interface, through the Linux bridge into the
line-card via vfp-int-vmx1 and then to ge-0.0.9-vmx1 in host machine
which has 10.55.55.1 configured. I did packet capture simultaneously
on vfp-int-vmx1 and ge-0.0.9-vmx1 interfaces.


Martin
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
On Thu, 25 Apr 2019 at 08:49, Tarko Tikan <tarko@lanparty.ee> wrote:
>
> hey,
>
> > Please let me know if anything was unclear or if someone has other
> > ideas or theories.
>
> Been following this thread and do not have anything to contribute at
> this point but wanted to say I (and I hope many others) appreciate this
> type of proper debugging given the tools we have available these days.

I agree, this has been interesting to read and a commendable level of
debugging has been carried out, however I'm not sure why.

There are 100 different reasons why packets could be delayed when
passing through the RE as opposed to the forwarding plane.

Trying to debug which one is causing this specific issue seems like a
dog chasing his tail. Once the cause has been identified, and
resolved, after the next software update something else will trigger
the same unwanted behaviour :)

The transit forwarding path through the router has a much more
strictly bound packet processing loop than the RE so I wouldn't really
be relying on it for anything.

Cheers,
James.
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: What exactly causes inconsistent RTT seen using ping utility in Junos? [ In reply to ]
As I had access to lab MX960, then I made a similar packet capture as
described in my Apr 25 e-mail. Only difference is that I didn't
capture packets on forwarding-plane Ethernet interface facing the RE,
but on RE Ethernet interface facing the forwarding plane, i.e "start
shell sh" in Junos and then "su" and "tcpdump -ttt -nei em0 'ip proto
84 && src host 128.0.0.1 && ip[86:2] = 0x0800'" where em0 connects to
SCB integrated switch. As with vMX, the delay does not come from the
RE.

I also started following two commands and let them collect ping
results and line-card CPU utilization over night:

$ ssh -qT mx960 'ping 10.66.66.1 source 10.66.66.2' | grep
--line-buffered -oP '(?<=time=)[0-9.]+' | \
> while read -r rtt; do time=$(printf '%(%s)T'); printf '%s %s\n' "$time" "$rtt" >> MX960_ping_results; done

$ while true; do \
> regex='([0-9]+)% \(1 second\), ([0-9]+)% \(5 second\)'; \
> while read -r line; do \
> if [[ $line =~ $regex ]]; then \
> time=$(printf '%(%s)T'); \
> printf '%s %s\n' "$time" "${BASH_REMATCH[1]}" >> LC_CPU_util_1s; \
> printf '%s %s\n' "$time" "${BASH_REMATCH[2]}" >> LC_CPU_util_5s; \
> fi; \
> done < <(timeout 1 ssh -qT mx960 'request pfe execute target fpc3 command "show sched"'); \
> done

First command logs ping("ping 10.66.66.1 source 10.66.66.2" command in
Junos) results with Unix timestamps into file. File content will be:

1556775780 0.586
1556775781 0.579
1556775782 0.607
/* output removed for brevity */

Second command logs 1 second and 5 second LC CPU utilization value
from the PFE3(egress interface for 10.66.66.1 is ge-3/3/9) CLI "show
sched" command into two files. This means that both 1 second and 5
second CPU utilization log files have similar content:

1556775780 9
1556775782 9
1556775783 9
/* output removed for brevity */

One 360 seconds long period with three high-RTT cases can be seen
here: https://i.imgur.com/TZeIbDd.png Left-hand y-axis is RTT and
right-hand y-axis is CPU utilization in percentages. Again, there
doesn't seem to be much correlation between line-card CPU utilization
and high RTT..

At this point I'm fairly sure that this occasional RTT is introduced
in line-card and not in RE, but I'm not convinced that it has anything
to do with the CPU utilization on line-card running the microkernel.
However, I'm not sure if it is possible to debug this much further
without access to microkernel source code and knowing the forwarding
plane internals. There is for example possibility to make a primitive
packet capture inside the microkernel("show pfe host-outbound
packets"), but it does not allow filtering and printing of timestamps.

Again, please let me know if my explanations weren't clear or if there
are any additional ideas.


thanks,
Martin
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp