Mailing List Archive

High latency and slow connections
A Juniper M5 running 5.5R1.2 has two STM-1's and a gigabit ethernet
connection. It's been running fine for ages (uptime is 366 days). Today, we
suddenly started seeing extremely high latency through the box (40 ms ping
from a fast ethernet connected box, where it was usually below 1 ms). All
connections exhibit this, no matter which interface they're coming from or
where they're going to. Traffic levels on all interfaces are normal, no high
packet per second rates. No configuration changes.

The only unusual thing that happened today is that one of the two STM-1's went
down for about two minutes. This was the first outage on that STM-1 in the
past 365 days; it was caused by a problem with the SDH equipment.

I have scheduled an emergency reboot of the box because at this point that's
all I can do. I'll upgrade to 5.7 as well to get rid of the annoying
"temperature sensor failed" messages.

Any idea what could be going on? Has anyone seen something similar?
High latency and slow connections [ In reply to ]
Are the pings transiting the M5 or sourced from the RE?

You say you have a fast ethernet connected box. Where is this box
connected? Is it on the production network, or on the management
network via fxp0?

If the FE Box is on the network, is there any other equipment in the
path? Do you have a way to isolate a path to carry out transit pings
across the M5 only?

It is hard to think of a cause for such a dramatic increase in the data
plane of the M5; only buffering would introduce such delays, and as the
interfaces have low levels of traffic this should not be the case. If,
however, the pings are being sourced from the RE, there may be some
increased control traffic that is taking precedence over the pings.

Gary

High latency and slow connections [ In reply to ]
> Are the pings transiting the M5 or sourced from the RE?

Both packets that transit the M5 and those where the RE is the source or
destination exhibited this problem. The load on the RE was 0.1, although I did
happen to catch a moment where the load jumped to 1.0 with "sampled" and "rpd"
taking most of the CPU. I have netflow accounting turned on (with every 100th
packet being sampled), so some spikes in sampled's CPU usage are to be expected.
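
For reference, the sampling setup is roughly along these lines (a minimal
sketch from memory; the cflowd export part is omitted, the interface and unit
are just an example, and statement details may differ slightly by release):

forwarding-options {
    sampling {
        input {
            family inet {
                rate 100;                /* sample every 100th packet */
            }
        }
    }
}
interfaces {
    ge-0/0/0 {
        unit 0 {
            family inet {
                sampling {
                    input;               /* sample packets arriving on this unit */
                }
            }
        }
    }
}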

> You say you have a fast ethernet connected box. Where is this box
> connected? Is it on the production network, or on the management
> network via fxp0?

The fast ethernet connected box is connected to a VLAN on a Cisco 3550, which
has a gigabit trunk to the M5. But even packets coming in on one STM-1 and
going out the other STM-1 exhibited this problem. Actually, *any* packet going
through the M5 in any direction had this problem.

> If the FE Box is on the network, is there any other equipment in the
> path? Do you have a way to isolate a path to carry out transit pings
> across the M5 only?

Here is some crude ASCII art. Obviously you need a monospace font to see it.

us1             us2
 |               |
 |STM1           |STM1
 |               |
 |               |
lj2----STM1----mb3
 |               |
3550            3550
 |             / |  \
s1           s2  s3  s4

"mb3" is the M5 having trouble. "lj2" is another M5, connected to mb3 through
an STM-1. "us1" and "us2" are our upstreams. "3550" are two Cisco 3550
switches, while s1 to s4 are various servers.

All tests were done with both ping and traceroute. We also had complaints
from customers about slow connections (a customer connected directly to one of
the switches and rate limited to 4 Mbps saw 350 Kbps download speeds).

I did traceroutes from s2 to mb3, from s2 to lj2, from s2 to s1, from s2 to
us1, from s1 to us2, from s1 to s2, from s1 to us2. Also from mb3 to lj2, us1,
us2 and s2, etc. All showed the same results.

Pings from s2 to mb3 showed 10 ms (they are below 1 ms normally). Pings from
s2 to s1 showed around 30 to 40 ms, they are normally around 4 ms.

> It is hard to think of a cause for such a dramatic increase in the data
> plane of the M5; only buffering would introduce such delays, and as the
> interfaces have low levels of traffic this should not be the case. If,
> however, the pings are being sourced from the RE, there may be some
> increased control traffic that is taking precedence over the pings.

Well, "low levels of traffic" is a relative term. The links had normal traffic
levels for this time of day. The STM-1 between lj2 and mb3 was loaded at around
60 Mbps in both directions, the STM-1 to us2 was at around 100 Mbps, and the
gigabit ethernet from mb3 to the 3550 was at around 140 Mbps.

I do have some QoS configuration on some of the links, but I removed all of it
(literally) as a test. The interesting effect was even higher latency (by
about another 10 ms).
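
(Removing and restoring it is quick to test, something like the following;
prompt illustrative:)

[edit]
user@mb3# delete class-of-service
user@mb3# commit
... test latency ...
user@mb3# rollback 1
user@mb3# commit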

I checked the pps rates on all the links. Most were between 10000 and
30000 pps, which is normal. During DoS attacks we usually see peaks of
80000 pps or more, and the M5 has handled that without any trouble.

Interestingly enough, I saw a similar problem a couple of months ago. A DDoS
attack was targeted at a customer downstream of mb3 (connected to another
router which is connected to the 3550). The DDoS was hitting us through us2
with about 100000 pps and it completely filled up the remaining capacity on
the STM-1 to us2. Latency shot through the roof, which was to be expected.

But the annoying part was that even traffic traversing the gigabit ethernet,
for example going from s2 to s1 (which does not traverse us2 but goes
s2-mb3-lj2-s1), was seeing extremely high latency (300 ms and above). I
would have expected a gigabit ethernet to easily handle 100000 pps and
about 200 Mbps of traffic. It was difficult to diagnose anything while under
the pressure of a DoS attack, so our upstream filtered the attack and I did
not collect any useful data. But it seemed very similar to what we
experienced today, except that today *all* traffic going through *any*
interface was affected.

Here's the output of "show chassis hardware detail" on the troubled box:

Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                59496             M5
Midplane         REV 03   710-002650   HB1361
Power Supply A   Rev 04   740-002497   LK22841           AC
Power Supply B   Rev 04   740-002497   LK23113           AC
Display          REV 04   710-001995   HJ3124
Routing Engine   REV 04   740-003877   9000019802        RE-2.0
Routing Engine                         7400000734579001  RE-2.0
FEB              REV 05   710-003311   HJ3655            E-FEB
FPC 0
  PIC 0          REV 01   750-005091   HD1292            1x G/E, 1000 BASE-SX
  PIC 1          REV 03   750-003748   HH1347            2x STM-1 SDH, SMIR

The box is running 5.7R3.4. Here's a traceroute from s2 to s1, now
that everything seems to be ok again:

traceroute to fog.amis.net (212.18.32.146), 64 hops max, 44 byte packets
 1  mb3-ge-0-0-0-3.router.amis.net (212.18.32.1)  0.859 ms  0.505 ms  0.626 ms
 2  lj2-so-0-2-1-0.router.amis.net (212.18.35.114)  13.176 ms  3.131 ms  3.170 ms
 3  fog (212.18.32.146)  3.292 ms  4.109 ms  3.458 ms
High latency and slow connections [ In reply to ]
Yesterday's voodoo seems to be continuing today. After the reboot and upgrade
from JunOS 5.5 to JunOS 5.7, the box seemed to behave itself, with normal
latency below 1 ms for traffic going from one gig VLAN to another gig VLAN
through the M5.

But today, when we reached the day's peak utilization, latency started to ramp
up again and "stabilized" around 30 ms for traffic going through the
gigabit ethernet on the box from one VLAN to another VLAN (from server 1 to
server 2):

                           ___ server 1
                          /
Juniper M5 --- Cisco 3550
                          \___ server 2

This time I did some experimentation. I started with CoS and configured the
"best-effort" forwarding class with a buffer size of 0 percent:

scheduler-maps {
    data-scheduler {
        buffer-size percent 0;
    }
}
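
(For context, in the JunOS hierarchy the buffer-size statement normally sits
under a scheduler at [edit class-of-service schedulers], with a scheduler-map
tying that scheduler to the best-effort forwarding class. A minimal sketch of
the surrounding hierarchy, where everything except the data-scheduler name is
illustrative:)

class-of-service {
    schedulers {
        data-scheduler {
            transmit-rate percent 55;        /* value illustrative */
            buffer-size percent 0;           /* the knob changed for this test */
            priority low;                    /* value illustrative */
        }
    }
    scheduler-maps {
        data-map {                           /* map name illustrative */
            forwarding-class best-effort scheduler data-scheduler;
        }
    }
    interfaces {
        ge-0/0/0 {
            scheduler-map data-map;
        }
    }
}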

As soon as I committed this, the latency dropped below 1 ms. *BUT*, now I saw
about 2% packet loss on pings going from one VLAN to the other VLAN through
the M5. So apparently the box fills up the queue on the gigabit PIC, and the
packets buffered in that queue are what shoots up the latency. If I remove
the buffer, it instead drops the excess packets, as it can't do anything else
with them when the queue is full.

Somebody might say: normal behaviour for a congested link. Sure, but at the
time this was happening, the gigabit was doing about 130 Mbps in both
directions. So either I can't read or I have the world's first gigabit ethernet
that only does 130 Mbps. Even if you consider traffic spikes, they can't shoot
up from a 1-second average of 130 Mbps to a 1-second average of 1 Gbps to be
able to fill up the queues on the PIC.

Later that day, latency suddenly dropped below 1 ms even with "buffer-size
percent 95". Looking at the traffic rate on the gigabit PIC, it was around 100
Mbps. As soon as the traffic rate went above 130 Mbps again, latency was again
around 30 ms. So 130 Mbps seems to be the "sweet spot".

To make sure there are no mistakes in my CoS configuration, I deleted the
complete class-of-service hierarchy from the configuration. There are no
firewall filters or policers on any of the VLANs on the gigabit ethernet,
except for a firewall filter that classifies traffic from our VoIP gateways
and puts it into the VoIP queue. I removed that as well. We do have
"encapsulation vlan-ccc" configured, as a couple of Layer 2 VPN's terminate on
this box. But otherwise, there's nothing unusual in there that could be
affecting the box in this way.
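
For what it's worth, that kind of classifier is nothing fancy, roughly along
these lines (the gateway prefix is a placeholder; such a filter is typically
applied as an input filter under family inet on the unit facing the gateways):

firewall {
    filter classify-voip {
        term voip-gateways {
            from {
                source-address {
                    192.0.2.0/28;            /* VoIP gateway prefix - placeholder */
                }
            }
            then {
                forwarding-class voice;      /* put this traffic in the VoIP queue */
                accept;
            }
        }
        term everything-else {
            then accept;
        }
    }
}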

With all this information, I can actually partly explain yesterday's weirdness.
Apparently our traffic utilization on the gigabit PIC went above 130 Mbps for
the first time yesterday; that's why we didn't see the high latency until then.
Looking at our MRTG graphs, this indeed seems to be the case.

A spare gigabit PIC which we need for another project should be shipped any
time now, so I'll try to replace the PIC as soon as the spare arrives.

Other than hardware, does anyone have any suggestions? What kind of stupidity
could I have committed to the configuration to degrade a gigabit ethernet link
to the level of an STM-1?
High latency and slow connections [ In reply to ]
You should open a case with JTAC. I recall that there was an issue at one
time when using an E-FPC (b3 chip) where congestion on a gig-e might cause
the queues on other PICs sharing that FPC to start backing up. JTAC will
know the preferred way to resolve if this is what you are seeing.

HTHs.
High latency and slow connections [ In reply to ]
> You should open a case with JTAC. I recall that there was an issue at one
> time when using an E-FPC (b3 chip) where congestion on a gig-e might cause
> the queues on other PICs sharing that FPC to start backing up. JTAC will
> know the preferred way to resolve if this is what you are seeing.

I'd love to, but we don't have a support contract...
High latency and slow connections [ In reply to ]
Hi Harry,

Where is the chip with the B3 label? EFPC or GigE?

Thank you
Ariel Brunetto


High latency and slow connections [ In reply to ]
The B3 chip is on the FPC.


High latency and slow connections [ In reply to ]
On Fri, Nov 07, 2003 at 04:55:56PM -0300, Ariel Brunetto wrote:
> Hi Harry,
>
> Where is the chip with the B3 label? EFPC or GigE?

The B chip is on the FPC, which in the case of the M5 is built into the
FEB.

If you have an E-FEB ("show chassis hardware" should say so, if I remember
correctly), you have the B3 chip.
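
(For example, the inventory Blaz posted earlier in the thread shows the
enhanced FEB; the pipe to match is just a convenient way to pull out that
line, prompt illustrative:)

user@mb3> show chassis hardware | match FEB
FEB              REV 05   710-003311   HJ3655            E-FEB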

/Jesper

--
Jesper Skriver, jesper(at)skriver(dot)dk - CCIE #5456

One Unix to rule them all, One Resolver to find them,
One IP to bring them all and in the zone to bind them.
High latency and slow connections [ In reply to ]
> > Where is the chip with the B3 label? EFPC or GigE?
> The B chip is on the FPC, which in the case of the M5 is built into the
> FEB.
>
> If you have an E-FEB ("show chassis hardware" should say so, if I remember
> correctly), you have the B3 chip.

We just put in the replacement M10 and are waiting for the traffic to go above
130 Mbps on the gigabit. Unfortunately, Saturday mornings are very slow going,
so we'll have to wait some time. The box is new, but the gigabit ethernet PIC
stayed the same. Unfortunately, I found out the hard way that this M10 only
has the non-enhanced FEB. It's very badly documented; I only noticed because
the previously working configuration produced this:

blaz@maribor3> show interfaces queue ge-0/0/0
Queue statistics are not applicable to this interface.

So I started investigating and found out that the M10 has the non-E FEB (it
was bought used). Curiously enough, it accepted all the CoS commands from the
previous configuration, which included queue priorities and DSCP marking, both
of which are unsupported on the non-E FEB according to www.juniper.net.
Even weirder, "show interface extensive" shows this:

...
CoS transmit queue              Bandwidth          Buffer     Priority   Limit
                              %         bps       %   bytes
0 data                       55   550000000      55       0       low    none
1 voice                      20   200000000      20       0      high    none
2 vpn                        20   200000000      20       0       low    none
3 network                     5    50000000       5       0      high    none
...

Look at the Priority column. This is what I have configured. But why is it
displayed, if the hardware does not support it? Anybody have a spare
FEB-M10-E-S that they're willing to trade for a FEB-M10-S? :-(
High latency and slow connections [ In reply to ]
> Look at the Priority column. This is what I have configured. But why is it
> displayed, if the hardware does not support it?

As always, look at the message log and you will see a bunch of errors
noting that the configured CoS settings are not supported. At commit time
the CLI does not know that a non-E FPC sits underneath; it only finds out
once it starts programming the ASICs, and then it reports the errors in
the message log.
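
(For example, something like the following right after the commit; the exact
text to look for varies by release, so I just scan the tail of the log:)

user@m10> show log messages | last 100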


High latency and slow connections [ In reply to ]
I once again have access to my voodoo doll, so the black magic is over - I've
figured out the cause of my problems with high latency and slow connections
through the gigabit ethernet on our Juniper box.

After replacing the complete hardware (M5 and E-FEB with M10 and non-E FEB,
plus a new gigabit ethernet PIC and SDH PIC), the problem persisted. Thinking
about it logically, something must be telling the box to slow down. What do we
call that? Flow control, of course.

Looking at the settings on the Cisco 3550 switch, we had the defaults: output
flow control on, input flow control off. A bit of reading in the Cisco
documentation told me what that actually means. The Cisco sends PAUSE frames
to the Juniper when it thinks the Juniper is sending too much traffic and
overwhelming other ports (probably the 24 100baseTX ports).
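
(One way to see whether PAUSE frames are actually arriving is the MAC
statistics section of the interface output on the Juniper; counter names may
differ slightly by release, prompt illustrative:)

user@mb3> show interfaces ge-0/0/0 extensive
(look in the "MAC statistics" section for the pause/control frame counters)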

So, I configured "flowcontrol send off" on the Cisco 3550 and "set interfaces
ge-0/0/0 gigether-options no-flow-control" on the Juniper. As soon as I did
this, latency dropped considerably. Later I successfully moved nearly 200 Mbps
of traffic from one VLAN through the Juniper to another VLAN without any
noteworthy increase in latency, whereas previously latency would climb above
100 ms when we were moving around 130 Mbps of traffic. Case closed.
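
For the record, the two changes side by side (the Cisco interface number is
illustrative, the rest is exactly what is described above):

! Cisco 3550, on the gigabit trunk towards the Juniper
interface GigabitEthernet0/1
 flowcontrol send off

# Juniper, on the gigabit PIC
set interfaces ge-0/0/0 gigether-options no-flow-control
commit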

Why the Cisco was sending the PAUSE frames is a question for cisco-nsp, but I
suspect it is related to a note in the Cisco documentation which says that you
have to turn off flow control when you have QoS active (mls qos). QoS was
indeed turned on, although not used.