
ATM PIC and congestion
Those of you on the NANOG list may remember an inquiry I posted there a
short while back asking whether anyone was seeing congestion at the
Chicago SBC/AADS NAP.

After much hair-pulling by me, my team, and our vendors, we finally
figured out a latency problem that turned out to be due to the way the
Juniper ATM1 PIC handles high traffic loads. The following is a summary
of the problem and some information about ATM buffer management that is
not yet available publicly from Juniper (I was told it was OK to share).

We had an M5 with an OC3c at the Chicago SBC/AADS NAP using the Juniper
ATM1 PIC. Utilization on the link as a whole was high, and outbound
traffic toward the Internet was maxed out, with the largest share of it
going to our primary upstream. Recently we began experiencing latency
on the order of hundreds of milliseconds.

At first it looked like a latency problem on multiple PVCs, but we
eventually realized it was concentrated on the PVC carrying the most
outbound traffic.

After checking for latency in the ATM switch network, for problems on
the far end, and for problems with our own gear, Juniper support
determined that the cause appeared to be how the ATM interface buffers
operate. So we tweaked the queue lengths and the latency problem went
away. Now we get packet drops instead, but that is more normal and
we'll handle it in other ways (the CoS and ATM2 PIC thread today is
very relevant for us, as you might guess). Below is some information
some of you may find useful. It was apparently written by a Juniper
escalation engineer and hasn't made it into any available documentation
yet.

ATM1 PICs contain a transmit buffer pool of 16,382 buffers, which
are shared by all PVCs currently configured on the PIC. Even on
multi-PHY ATM PICs, there is still a single buffer pool shared
by all the PHYs.

By default, the PIC allows PVCs to consume as many buffers as they
require. If the sustained traffic rate for a PVC exceeds its shaped
rate, buffers will be consumed. Eventually, all buffers on the PIC
will be used, which will starve the other PVCs. This results in
head-of-line blocking.

The queue-length parameter can be set (on a per-PVC basis) to prevent
this situation. It sets a limit on the number of transmit packets
(and ultimately buffers) that can be queued to a PVC. New packets
that would exceed this limit are dropped (i.e. tail-dropping).
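
To illustrate the failure mode, here is a toy model in Python (entirely
my own sketch, nothing Juniper-specific): a shared pool of transmit
buffers, one PVC that oversubscribes its shaped rate and one that
behaves. The pool size is the default-MTU packet budget worked out
further down; the arrival and drain rates are invented for the
illustration. Without a per-PVC limit the busy PVC eventually hoards
the whole pool and its neighbour starts dropping; with a queue-length
cap the drops stay on the busy PVC.

# Toy model of a shared transmit buffer pool (illustration only;
# assumes one buffer per packet -- see the MTU arithmetic further down).

POOL = 1638   # packets the pool can hold at the default MTU

def simulate(queue_limit=None, steps=2000):
    pool_free = POOL
    q_busy = q_quiet = 0              # packets currently queued per PVC
    drops_busy = drops_quiet = 0

    for _ in range(steps):
        # Arrivals: the "busy" PVC receives 3 packets per step but is
        # drained at only 1 per step, i.e. it oversubscribes its rate.
        for _ in range(3):
            if pool_free == 0 or (queue_limit and q_busy >= queue_limit):
                drops_busy += 1       # tail drop (limit hit or pool empty)
            else:
                q_busy += 1
                pool_free -= 1

        # The "quiet" PVC receives 1 packet per step and drains 1 per step.
        if pool_free == 0 or (queue_limit and q_quiet >= queue_limit):
            drops_quiet += 1          # starved: no buffers left for it
        else:
            q_quiet += 1
            pool_free -= 1

        # Each PVC transmits (frees) at most one packet per step.
        if q_busy:
            q_busy -= 1
            pool_free += 1
        if q_quiet:
            q_quiet -= 1
            pool_free += 1

    return q_busy, drops_busy, drops_quiet

print("no limit     :", simulate())                # quiet PVC gets starved
print("queue-len 25 :", simulate(queue_limit=25))  # drops stay on the busy PVC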

queue-length, configured under the shaping hierarchy, represents the
maximum number of packets which can be queued for the PVC using the
global buffers. It should be configured for all PVCs when more than
one PVC is configured on an ATM1 PIC. It performs two functions.

1) It prevents head-of-line blocking, since it limits the number of
packets, and hence buffers, that can be consumed by each configured
PVC.

2) It bounds the maximum lifetime of packets queued to the PVC when
traffic has oversubscribed the configured shaping contract.

The total of all the queue-length settings must not be greater than
the total number of packets which can be held in the buffer space
available on the PIC.

The total number of packets which can be held by the buffers depends
on the MTU setting for the interfaces on the PIC. The MTU used should
include all encapsulation overhead, and hence is the physical interface
MTU. The following formula can be used to calculate the total number
of packets the buffer space can hold:

16,382 / ( Round Up ( MTU / 480 ) )

For example, with the default MTU settings for ATM1 PIC interfaces,
the total number of packets which can be held is:

16,382 / ( Round Up ( 4,482 / 480 ) ) = 1638 packets.

Thus, when configuring the queue-lengths for each of the PVCs
configured on an ATM1 PIC using default MTU settings, they must not
total to more than 1638. They can total to less.
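
As a quick sanity check, here is that calculation as a small Python
sketch (my own, not Juniper's; the 16,382-buffer pool and the 480-byte
buffer size are the figures above):

import math

BUFFERS = 16382        # shared transmit buffers on an ATM1 PIC
BUFFER_BYTES = 480     # bytes per buffer

def pic_packet_capacity(mtu):
    # Total packets the buffer pool can hold at a given physical MTU.
    buffers_per_packet = math.ceil(mtu / BUFFER_BYTES)
    return BUFFERS // buffers_per_packet

print(pic_packet_capacity(4482))   # 1638 -> budget for the sum of all queue-lengths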

Setting a queue-length to a very low value is possible, but doing so
risks not being able to buffer small bursts of packets transiting
the PVC.

The maximum lifetime which can be sustained by packets transiting
a PVC depends on the shaping rate configured for the PVC, the
queue-length setting, and the MTU. The following formula can be used:

( PVC queue-length in packets x MTU ) /
( PVC shaping rate in bits per second / 8 )

For example, say a PVC is configured on an ATM1 PIC interface with the
default MTU and a CBR shaping rate of 3,840,000 bps (10,000 cells per
second). The queue-length has been set to 25 packets. The maximum
lifetime is:

( 25 x 4,482 ) / ( 3,840,000 / 8 ) = 233ms.

This is the worst case lifetime assuming all packets in the queue are
MTU sized and the traffic using the PVC is oversubscribing its
configured shaping contract.

In general it's a good design practice to keep this maximum lifetime to
a value under 500ms.
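
The lifetime formula again, as a rough Python sketch of my own (the
500ms figure is the design guideline mentioned above, not a hard
limit):

def worst_case_lifetime_ms(queue_length, mtu, shaping_bps):
    # Worst-case queueing delay when the PVC is oversubscribed and
    # every queued packet is MTU sized.
    bytes_per_second = shaping_bps / 8
    return (queue_length * mtu) / bytes_per_second * 1000

lifetime = worst_case_lifetime_ms(25, 4482, 3840000)
print(round(lifetime), "ms")   # ~233 ms, matching the example above
print(lifetime < 500)          # True -> within the suggested 500ms budget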

So, if you've got high load and high latency over your ATM1 PIC, you may
need to tweak your queue-lengths using the info above.

John
Re: ATM PIC and congestion
As a follow-up to this issue: it is not possible to check buffer usage
per PVC, but you can check buffer usage per PIC. According to a
Juniper tech, the following is unsupported and will not be publicly
documented.

To check PIC buffer usage, go into the FPC CPU itself and run some
commands from there. For example, if you have an ATM PIC at position
at-0/0/0, here is how to check it (some lines may wrap):

1. start shell
2. su
3. vty fpc0
4. show atm-pic 0 txrx

<-- sample output

FPC0(vty)# show atm-pic 0 txrx

PIC 0 K chip information:

SDH support is disabled, Sonet support is enabled
Monitoring is disabled
Line: Dual OC3
IFD count is 2, interrupt_count is 0
error interrupt count is 0

TXRX Section Registers:
(0x850c0080) TX FREE RING BASE : 0x0002
(0x850c0082) TX FREE TAIL : 0x0250
(0x850c0084) TX FREE HEAD : 0x0268
(0x850c0090) TX REQ RING BASE : 0x0002
(0x850c0092) TX REQ RING TAIL : 0x04b0
(0x850c009c) TX BUF0 END : 0x01e4
(0x850c009e) TX BUFN END : 0x01e4
(0x850c00a0) RX IND RING BASE : 0x0003
(0x850c00a2) RX IND TAIL : 0x004c
(0x850c00a4) RX IND HEAD : 0x004c
(0x850c00a6) RX IND DONE : 0x004c
(0x850c00bc) RX BUF0 END : 0x01e4
(0x850c00be) RX BUFN END : 0x01e4

TX FREE BUF TAIL/HEAD : 0x0250 / 0x0268 (16372 bufs) << Look at this
^^^^^^^^^^^^
RX IND Q TAIL/HEAD/DONE : 0x004c / 0x004c / 0x004c (0 queued)

sample output -->

John