Mailing List Archive

Bottlenecks and link upgrades
At what point do commercial ISPs upgrade links in their backbone as well as peering and transit links that are congested?  At 80% capacity?  90%?  95%? 




Thanks,
Hank




Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer
Re: Bottlenecks and link upgrades [ In reply to ]
On Wed, 12 Aug 2020 at 10:35, Hank Nussbacher <hank@interall.co.il> wrote:

> At what point do commercial ISPs upgrade links in their backbone as well as peering and transit links that are congested? At 80% capacity? 90%? 95%?

I've worked for employers where the policy has been anywhere from 50% to
80%. And I know this isn't the complete range. Most do not subscribe to
any single simple rule but act more tactically.

Personally, I think that if the link is in a growth market you should
upgrade really early; 50% seems late, and the cost is negligible if you
anticipate growth to continue. If it's not a growth market, the cost may
become less than negligible.

Sometimes networks congest links, particularly their edge interfaces,
strategically due to poor incentives: a wholesale arm with irrelevant
revenue might see some benefit from strategic congestion while
significantly hurting the money-printing mobile arm, reducing the
company-wide bottom line while improving the wholesale arm's bottom line.


--
++ytti
Re: Bottlenecks and link upgrades [ In reply to ]
On 12/Aug/20 09:31, Hank Nussbacher wrote:

> At what point do commercial ISPs upgrade links in their backbone as
> well as peering and transit links that are congested?  At 80%
> capacity?  90%?  95%? 
>

We start the process at 50% utilization, and work toward completing the
upgrade by 70% utilization.

The period between 50% - 70% is just internal paperwork.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
On 12/Aug/20 09:44, Saku Ytti wrote:

> Personally if the link is in a growth market, you should upgrade
> really early, 50% seems late, cost is negligible if you anticipate
> growth to continue. If it's not a growth market cost may become less
> than negligible.

The problem you have is defining "what is a growth market", especially
over time as it stabilizes and sees new entrants, while growth moves into
a phase where you need massive scale to keep playing.

You then shift from "sales are guaranteed on Day 1" to "build it and
hope for the best". Many commercial teams get fearful at that point,
because of the temptation to link capacity to guaranteed sales.


> Sometimes networks congest particularly their edge interfaces
> strategically due to poor incentives, where irrelevant revenue
> wholesale arm might see some benefit from strategic congestion while
> also significantly hurting their money printing mobile arm reducing
> company wide bottom line while improving wholesale arm bottom line.

I know a few :-).

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
Just out of curiosity: may I ask how we measure link capacity loading?
What does 50%, 70%, or 90% capacity loading mean? Is the load sampled and
measured instantaneously, or averaged over a certain period of time
(granularity)?

These questions have bothered me for a long time. I don't know if I can
ask about them here, by the way. I take care of radio access network
performance at work, and find many things in the transport network
unfamiliar.

Thanks and best regards,
Taichi


On Wed, Aug 12, 2020 at 3:54 PM Mark Tinka <mark.tinka@seacom.com> wrote:

>
>
> On 12/Aug/20 09:31, Hank Nussbacher wrote:
>
> At what point do commercial ISPs upgrade links in their backbone as well
> as peering and transit links that are congested? At 80% capacity? 90%?
> 95%?
>
>
> We start the process at 50% utilization, and work toward completing the
> upgrade by 70% utilization.
>
> The period between 50% - 70% is just internal paperwork.
>
> Mark.
>
Re: Bottlenecks and link upgrades [ In reply to ]
When I worked for an ISP, it was about 70%, not sure if that is the case
with the other ones.


On 8/12/2020 3:31 AM, Hank Nussbacher wrote:
>
> At what point do commercial ISPs upgrade links in their backbone as
> well as peering and transit links that are congested?  At 80%
> capacity?  90%?  95%?
>
>
> Thanks,
> Hank
>
>
> Caveat: The views expressed above are solely my own and do not express
> the views or opinions of my employer
>
Re: Bottlenecks and link upgrades [ In reply to ]
On 12/Aug/20 17:08, m.Taichi wrote:

>
> Just my curiosity. May I ask how we can measure the link capacity
> loading? What does it mean by a 50%, 70%, or 90% capacity loading?
> Load sampled and measured instantaneously, or averaging over a certain
> period of time (granularity)?
>
> These are questions have bothered me for long. Don't know if I can ask
> about these by the way. I take care of the radio access network
> performance at work. Found many things unknown in transport network.

For this, we look at simple 5-minute based SNMP data over the period.
Nothing too fancy. It's stable.

Mark.
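
For readers new to this, a 5-minute utilization figure is normally
derived from two readings of an interface octet counter. The following
is a minimal sketch of that arithmetic, assuming 64-bit counters such as
IF-MIB's ifHCInOctets/ifHCOutOctets are being polled; the thread does not
say which counters are used, and the function name is hypothetical.

```python
def utilization_percent(octets_start, octets_end, interval_s, link_bps,
                        counter_bits=64):
    """Average utilization (%) between two readings of an octet counter."""
    modulus = 1 << counter_bits
    # Modular subtraction copes with a single counter wrap between polls.
    delta_octets = (octets_end - octets_start) % modulus
    bits_sent = delta_octets * 8
    return 100.0 * bits_sent / (interval_s * link_bps)


# Illustrative numbers: a 10 Gbit/s link that moved 187.5 GB in 300 seconds.
sample = utilization_percent(0, 187_500_000_000, 300, 10_000_000_000)
print(f"{sample:.1f}%")
```

The result is an average over the whole poll interval, so anything
shorter than the interval is invisible in it.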
Re: Bottlenecks and link upgrades [ In reply to ]
On Wed, 12 Aug 2020, Hank Nussbacher wrote:

>
> At what point do commercial ISPs upgrade links in their backbone as well as peering and transit links that are congested?  At
> 80% capacity?  90%?  95%? 
>
>
> Thanks,
> Hank
>
>
> Caveat: The views expressed above are solely my own and do not express the views or opinions of my employer
>
>
>


Why upgrade when you can legislate the problem away instead?

Charter tries to convince FCC that broadband customers want data caps.

https://arstechnica.com/tech-policy/2020/08/charter-tries-to-convince-fcc-that-broadband-customers-want-data-caps/

Ted
Re: Bottlenecks and link upgrades [ In reply to ]
m Taichi writes:
> Just my curiosity. May I ask how we can measure the link capacity
> loading? What does it mean by a 50%, 70%, or 90% capacity loading?
> Load sampled and measured instantaneously, or averaging over a certain
> period of time (granularity)?

Very good question!

With tongue in cheek, one could say that measured instantaneously, the
load on a link is always either zero or 100% link rate...

ISPs typically sample link load in 5-minute intervals and look at graphs
that show load (at this 5-minute sampling resolution) over ~24 hours, or
longer-term graphs where the resolution has been "downsampled", which
usually smooths out short-term peaks.

From my own experience, upgrade decisions are made by looking at those
graphs and checking whether peak traffic (possibly ignoring "spikes" :-)
crosses the threshold repeatedly.

At some places this might be codified in terms of percentiles, e.g. "the
Nth percentile of the M-minute utilization samples exceeds X% of link
capacity over a Y-day period". I doubt that anyone uses such rules to
automatically issue upgrade orders, but maybe to generate alerts like
"please check this link, we might want to upgrade it".

I'd be curious whether other operators have such alert rules, and what
N/M/X/Y they use - might well be different values for different kinds of
links.
--
Simon.
PS. We use the "stare at graphs" method, but if we had automatic alerts,
I guess it would be something like "the 95th percentile of 5-minute
samples exceeds 50% over 30 days".
PPS. My colleagues remind me that we do alert on output queue drops.
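
The percentile rule quoted in the PS can be written down in a few lines.
This is a minimal sketch, assuming the utilization samples are already
available as percentages of link capacity; the names `percentile` and
`needs_review` are hypothetical, and nearest-rank is only one of several
common percentile definitions.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def needs_review(samples, pct=95, threshold=50.0):
    """True if the pct-th percentile of the samples exceeds the threshold."""
    return percentile(samples, pct) > threshold


# 30 days of 5-minute samples is 30 * 288 = 8640 values (made-up data).
quiet_link = [30.0] * 8640
busy_link = [30.0] * 8000 + [65.0] * 640   # about 7.4% of samples at 65%

print(needs_review(quiet_link), needs_review(busy_link))
```

N, M, X and Y from the rule map to `pct`, the sampling interval of the
input, `threshold`, and the length of the sample list respectively.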

> These are questions have bothered me for long. Don't know if I can ask
> about these by the way. I take care of the radio access network
> performance at work. Found many things unknown in transport network.

> Thanks and best regards,
> Taichi

> On Wed, Aug 12, 2020 at 3:54 PM Mark Tinka <mark.tinka@seacom.com> wrote:

> On 12/Aug/20 09:31, Hank Nussbacher wrote:

> At what point do commercial ISPs upgrade links in their backbone as well as peering and transit links that are congested? At 80%
> capacity? 90%? 95%?

> We start the process at 50% utilization, and work toward completing the upgrade by 70% utilization.

> The period between 50% - 70% is just internal paperwork.

> Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
On 13/Aug/20 11:56, Simon Leinen wrote:

> I'd be curious whether other operators have such alert rules, and what
> N/M/X/Y they use - might well be different values for different kinds of
> links.

We use alerts to tell us about links that hit a threshold, in our NMS.
But yes, this is based on 5-minute samples, not percentile data.

The alerts are somewhat redundant for any long-term planning. They are
more useful when problems happen out of the blue.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
>
> With tongue in cheek, one could say that measured instantaneously, the
> load on a link is always either zero or 100% link rate...
>

Actually, that's a first-class observation !

On Thu, Aug 13, 2020 at 12:00 PM Simon Leinen <simon.leinen@switch.ch>
wrote:

> m Taichi writes:
> > Just my curiosity. May I ask how we can measure the link capacity
> > loading? What does it mean by a 50%, 70%, or 90% capacity loading?
> > Load sampled and measured instantaneously, or averaging over a certain
> > period of time (granularity)?
>
> Very good question!
>
> With tongue in cheek, one could say that measured instantaneously, the
> load on a link is always either zero or 100% link rate...
>
> ISPs typically sample link load in 5-minute intervals and look at graphs
> that show load (at this 5-minute sampling resolution) over ~24 hours, or
> longer-term graphs where the resolution has been "downsampled", where
> downsampling usually smoothes out short-term peaks.
>
> From my own experience, upgrade decisions are made by looking at those
> graphs and checking whether peak traffic (possibly ignoring "spikes" :-)
> crosses the threshold repeatedly.
>
> At some places this might be codified in terms of percentiles, e.g. "the
> Nth percentile of the M-minute utilization samples exceeds X% of link
> capacity over a Y-day period". I doubt that anyone uses such rules to
> automatically issue upgrade orders, but maybe to generate alerts like
> "please check this link, we might want to upgrade it".
>
> I'd be curious whether other operators have such alert rules, and what
> N/M/X/Y they use - might well be different values for different kinds of
> links.
> --
> Simon.
> PS. We use the "stare at graphs" method, but if we had automatic alerts,
> I guess it would be something like "the 95th percentile of 5-minute
> samples exceeds 50% over 30 days".
> PPS. My colleagues remind me that we do alert on output queue drops.
>
> > These are questions have bothered me for long. Don't know if I can ask
> > about these by the way. I take care of the radio access network
> > performance at work. Found many things unknown in transport network.
>
> > Thanks and best regards,
> > Taichi
>
> > On Wed, Aug 12, 2020 at 3:54 PM Mark Tinka <mark.tinka@seacom.com>
> wrote:
>
> > On 12/Aug/20 09:31, Hank Nussbacher wrote:
>
> > At what point do commercial ISPs upgrade links in their backbone as
> well as peering and transit links that are congested? At 80%
> > capacity? 90%? 95%?
>
> > We start the process at 50% utilization, and work toward completing the
> upgrade by 70% utilization.
>
> > The period between 50% - 70% is just internal paperwork.
>
> > Mark.
>
>

--
Ing. Etienne-Victor Depasquale
Assistant Lecturer
Department of Communications & Computer Engineering
Faculty of Information & Communication Technology
University of Malta
Web. https://www.um.edu.mt/profile/etiennedepasquale
Re: Bottlenecks and link upgrades [ In reply to ]
On 12.08.2020 09:31, Hank Nussbacher wrote:
>
> At what point do commercial ISPs upgrade links in their backbone as
> well as peering and transit links that are congested?  At 80%
> capacity?  90%?  95%? 
>

Hi,


Wouldn't it be better to measure basic performance such as packet drop
rates and queue sizes?

These days live video is needed, and these parameters are essential to
its quality.

Queues build up in milliseconds, yet people average over minutes to
estimate quality.

If you measure queue delay with high-frequency one-way-delay
measurements, you will be able to advise better on what the
consequences of a highly loaded link are.

We are running a research project on end-to-end quality, and the enclosed
image is yesterday's report on queuesize(h_ddelay) in ms. It shows
statistics on delays between some peers.

I would look at the trends on the involved links to see if an upgrade
is necessary; 421 ms might be too much if it happens often.
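
One common way to turn one-way-delay samples into a queueing-delay
estimate is to subtract the minimum delay seen in the window, which
stands in for propagation and any constant clock offset. The sketch below
assumes exactly that and is not necessarily how the project's h_ddelay
figure is computed; the function name and the sample values are made up.

```python
def queueing_delay_ms(owd_ms):
    """Excess of each one-way-delay sample over the window minimum, in ms."""
    if not owd_ms:
        raise ValueError("need at least one sample")
    # The minimum approximates the unloaded path delay plus clock offset.
    baseline = min(owd_ms)
    return [delay - baseline for delay in owd_ms]


samples = [20.0, 20.5, 21.0, 60.0, 20.0]   # made-up one-way delays in ms
excess = queueing_delay_ms(samples)
print(max(excess))
```

The estimate only holds while the clock offset between the two hosts
stays constant over the window and the path itself does not change.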


Best regards


  Olav Kvittem


>
> Thanks,
> Hank
>
>
> Caveat: The views expressed above are solely my own and do not express
> the views or opinions of my employer
>
Re: Bottlenecks and link upgrades [ In reply to ]
On 13/Aug/20 12:23, Olav Kvittem via NANOG wrote:

> Wouldn't it be better to measure the basic performance like packet
> drop rates and queue sizes ?
>
> These days live video is needed and these parameters are essential to
> the quality.
>
> Queues are building up in milliseconds and people are averaging over
> minutes to estimate quality.
>
>
> If you are measuring queue delay with high frequent one-way-delay
> measurements
>
> you would then be able to advice better on what the consequences of a
> highly loaded link are.
>
>
> We are running a research project on end-to-end quality and the
> enclosed image is yesterdays report on
>
> queuesize(h_ddelay) in ms. It shows stats on delays between some peers.
>
> I would have looked at the trends on the involved links to see if
> upgrade is necessary - 
>
> 421 ms  might be too much ig it happens often.
>

I'm confident everyone (even the cheapest CFO) knows the consequences of
congesting a link and choosing not to upgrade it.

Optical issues, dirty patch cords, faulty line cards, and wrong
configurations will most likely lead to packet loss.  Link congestion
due to insufficient bandwidth will most certainly lead to packet loss.

It's great to monitor packet loss, latency, pps, etc. But packet loss
at 10% link utilization is not a foreign occurrence. No amount of
bandwidth upgrades will fix that.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
Mark Tinka wrote on 13/08/2020 11:31:
> It's great to monitor packet loss, latency, pps, e.t.c. But packet loss
> at 10% link utilization is not a foreign occurrence. No amount of
> bandwidth upgrades will fix that.

You could easily have 10% utilization and see packet loss due to
insufficient bandwidth if you have egress << ingress and proportionally
low buffering, e.g. UDP or iSCSI from a 40G/100G port with egress to a
low-buffer 1G port.

This sort of thing is less likely in the imix world, but it can easily
happen with high capacity CDN nodes injecting content where the
receiving port is small and subject to bursty traffic.

Nick
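
The effect described above is easy to reproduce with a toy model. The
following is a minimal sketch, assuming a tail-drop FIFO in front of a
1 Gbit/s egress port, 1 ms time steps, a 100 kB buffer and bursts that
arrive at 2 Gbit/s; all of these numbers are invented for illustration.

```python
def simulate(arrivals, drain_per_tick, buffer_bytes):
    """Return (bytes_sent, bytes_dropped) for a tail-drop FIFO."""
    queue = sent = dropped = 0
    for arriving in arrivals:
        queue += arriving
        out = min(queue, drain_per_tick)   # transmit at egress line rate
        queue -= out
        sent += out
        if queue > buffer_bytes:           # whatever does not fit is dropped
            dropped += queue - buffer_bytes
            queue = buffer_bytes
    return sent, dropped


TICKS = 1000          # 1 ms ticks, one second in total
DRAIN = 125_000       # 1 Gbit/s egress is 125,000 bytes per ms
BUFFER = 100_000      # assumed shallow egress buffer, in bytes
# A 250 kB burst every 20 ms: 2 Gbit/s for 1 ms, then silence.
arrivals = [250_000 if t % 20 == 0 else 0 for t in range(TICKS)]

sent, dropped = simulate(arrivals, DRAIN, BUFFER)
utilization = 100.0 * sent / (TICKS * DRAIN)
loss = 100.0 * dropped / sum(arrivals)
print(f"egress utilization {utilization:.1f}%, loss {loss:.1f}%")
```

With these inputs the offered load is 10% of the egress rate, the
measured egress utilization works out to 9%, and 10% of the offered
bytes are dropped.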
Re: Bottlenecks and link upgrades [ In reply to ]
On 13/Aug/20 13:00, Nick Hilliard wrote:

>
> you could easily have 10% utilization and see packet loss due to
> insufficient bandwidth if you have egress << ingress and
> proportionally low buffering, e.g. UDP or iSCSI from a 40G/100 port
> with egress to a low-buffer 1G port.
>
> This sort of thing is less likely in the imix world, but it can easily
> happen with high capacity CDN nodes injecting content where the
> receiving port is small and subject to bursty traffic.

Indeed.

The smaller the capacity gets toward egress, the closer you are getting
to an end-user, in most cases.

End-user link upgrades will always be the weakest link in the chain, as
the incentive is more on their side than on yours as their provider.
Your final egress port buffer sizing notwithstanding, of course.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
Hi Mark,


Just comments on your points below.

On 13.08.2020 12:31, Mark Tinka wrote:
>
> On 13/Aug/20 12:23, Olav Kvittem via NANOG wrote:
>
>> Wouldn't it be better to measure the basic performance like packet
>> drop rates and queue sizes ?
>>
>> These days live video is needed and these parameters are essential to
>> the quality.
>>
>> Queues are building up in milliseconds and people are averaging over
>> minutes to estimate quality.
>>
>>
>> If you are measuring queue delay with high frequent one-way-delay
>> measurements
>>
>> you would then be able to advice better on what the consequences of a
>> highly loaded link are.
>>
>>
>> We are running a research project on end-to-end quality and the
>> enclosed image is yesterdays report on
>>
>> queuesize(h_ddelay) in ms. It shows stats on delays between some peers.
>>
>> I would have looked at the trends on the involved links to see if
>> upgrade is necessary - 
>>
>> 421 ms  might be too much ig it happens often.
>>
> I'm confident everyone (even the cheapest CFO) knows the consequences of
> congesting a link and choosing not to upgrade it.
>
> Optical issues, dirty patch cords, faulty line cards, wrong
> configurations, will almost likely lead to packet loss. 
> Link congestion
> due to insufficient bandwidth will most certainly lead to packet loss.
Sure, but I guess the loss rate depends on the nature of the traffic.
>
> It's great to monitor packet loss, latency, pps, e.t.c. But packet loss
> at 10% link utilization is not a foreign occurrence. No amount of
> bandwidth upgrades will fix that.


I guess that having more reports would support the judgements better.

A basic question is: what is the effect on the quality perceived by
customers?

And the relation between that and the 5-minute load is not known to me.

Actually, one good indicator of the congestion loss rate is of course
the SNMP OutputDiscards counter.


Curves for queueing delay, link load and discard rate are surprisingly
different.
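
A discard rate can be derived from two readings of the discard counter
in the same way as link load. This is a minimal sketch, assuming the
counter meant here is IF-MIB's ifOutDiscards (a 32-bit counter) and that
at most one wrap happens between polls; the function names are
hypothetical and the readings are made up.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two readings of a wrapping SNMP counter."""
    return (current - previous) % (1 << bits)


def discards_per_second(previous, current, interval_s, bits=32):
    """Average discard rate over one poll interval."""
    return counter_delta(previous, current, bits) / interval_s


# Made-up readings 300 seconds apart, with a counter wrap in between.
rate = discards_per_second(4_294_967_000, 704, 300)
print(f"{rate:.2f} discards/s")
```

A counter reset after a reboot is indistinguishable from a wrap in this
sketch, so a real collector would also check for discontinuities.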


regards

 Olav



>
> Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
On 13/Aug/20 13:44, Olav Kvittem wrote:

> sure, but I guess the loss rate depends of the nature of the traffic.

Packet loss is packet loss.

Some applications are more sensitive to it (live video, live voice, for
example), while others are less so. However, packet loss always
manifests badly if left unchecked.


>> I guess that having more reports would support the judgements better.

For sure, yes. Any decent NMS can provide a number of data points so you
aren't shooting in the dark.


>>
>> A basic question is : what is the effect on the perceived quality of the
>> customers ?

Depends on the application.

Gamers tend to complain the most, so that's a great indicator.

Some customers that think bandwidth solves all problems will perceive
their inability to attain their advertised contract as a problem, if
packet loss is in the way.

Generally, other bad things, including unruly human beings :-).


>>
>> And the relation between that and /5min load is not known to me.

For troubleshooting, being able to have a tighter resolution is more
important. 5-minute averages are for day-to-day operations, and
long-term planning.


>>
>> Actually one good indicator of the congestion loss rate are of course
>> the SNMP OutputDiscards.
>>
>>
>> Curves for  queueing delay, link load and discard rate are surprisingly
>> different.

Yes, that then gets into the guts of the router hardware and its design.

That's the case where your 100Gbps link is peaking and causing packet
loss, and you haven't realized that the forwarding chip behind it is
only good for 60Gbps, for example.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
Is it possible to monitor metrics such as max queue length in 5-minute
intervals, and is anyone doing so? It might be a better metric than
average load in 5-minute intervals.

Regards

Baldur
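
To make the difference between the two metrics concrete, here is a
minimal sketch that reduces high-frequency queue-depth samples to a mean
and a maximum per 5-minute bucket. It assumes the samples exist at all,
for example from a hypothetical per-second export of queue depth; whether
a given platform offers that is exactly the open question.

```python
from collections import defaultdict


def summarize(samples, bucket_s=300):
    """Map bucket start time to (mean depth, max depth).

    samples is an iterable of (unix_time_s, queue_depth) pairs.
    """
    buckets = defaultdict(list)
    for ts, depth in samples:
        buckets[ts - ts % bucket_s].append(depth)
    return {start: (sum(v) / len(v), max(v))
            for start, v in sorted(buckets.items())}


# Made-up data: 300 one-second samples, empty queue except a 3-second burst.
samples = [(t, 900 if 100 <= t < 103 else 0) for t in range(300)]
print(summarize(samples))
```

In this made-up bucket the mean depth is 9 packets while the maximum is
900, which is the kind of microburst an average hides.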
Re: Bottlenecks and link upgrades [ In reply to ]
I suppose it would depend on if your hardware has an OID for what you want to monitor.




-----
Mike Hammett
Intelligent Computing Solutions

Midwest Internet Exchange

The Brothers WISP

----- Original Message -----

From: "Baldur Norddahl" <baldur.norddahl@gmail.com>
To: nanog@nanog.org
Sent: Thursday, August 13, 2020 8:20:26 AM
Subject: Re: Bottlenecks and link upgrades




Is it possible to do and is anyone monitoring metrics such as max queue length in 5 minutes intervals? Might be a better metric than average load in 5 minutes intervals.


Regards


Baldur
Re: Bottlenecks and link upgrades [ In reply to ]
I expect my hardware does not have such a metric, but maybe it should.
Max queue length tells us how full the link is with respect to microbursts.


tor. 13. aug. 2020 15.28 skrev Mike Hammett <nanog@ics-il.net>:

> I suppose it would depend on if your hardware has an OID for what you want
> to monitor.
>
>
>
> -----
> Mike Hammett
> Intelligent Computing Solutions <http://www.ics-il.com/>
> Midwest Internet Exchange <http://www.midwest-ix.com/>
> The Brothers WISP <http://www.thebrotherswisp.com/>
> ------------------------------
> *From: *"Baldur Norddahl" <baldur.norddahl@gmail.com>
> *To: *nanog@nanog.org
> *Sent: *Thursday, August 13, 2020 8:20:26 AM
> *Subject: *Re: Bottlenecks and link upgrades
>
> Is it possible to do and is anyone monitoring metrics such as max queue
> length in 5 minutes intervals? Might be a better metric than average load
> in 5 minutes intervals.
>
> Regards
>
> Baldur
>
>
Re: Bottlenecks and link upgrades [ In reply to ]
>
> Wouldn't it be better to measure the basic performance like packet drop
> rates and queue sizes ?
>

Those values should be a standard part of monitoring and data collection,
but whether they MATTER in a given situation very much depends.

The traffic profile traversing the link may be such that the observed drop
% and buffer depths are acceptable for that traffic, and there is no need
for further tuning or changes. In other scenarios they may not be, in which
case either network or application adjustments are warranted.

There is rarely a one-size-fits-all answer when it comes to these things.


On Thu, Aug 13, 2020 at 6:25 AM Olav Kvittem via NANOG <nanog@nanog.org>
wrote:

>
> On 12.08.2020 09:31, Hank Nussbacher wrote:
>
> At what point do commercial ISPs upgrade links in their backbone as well
> as peering and transit links that are congested? At 80% capacity? 90%?
> 95%?
>
>
> Hi,
>
>
> Wouldn't it be better to measure the basic performance like packet drop
> rates and queue sizes ?
>
> These days live video is needed and these parameters are essential to the
> quality.
>
> Queues are building up in milliseconds and people are averaging over
> minutes to estimate quality.
>
>
> If you are measuring queue delay with high frequent one-way-delay
> measurements
>
> you would then be able to advice better on what the consequences of a
> highly loaded link are.
>
>
> We are running a research project on end-to-end quality and the enclosed
> image is yesterdays report on
>
> queuesize(h_ddelay) in ms. It shows stats on delays between some peers.
>
> I would have looked at the trends on the involved links to see if upgrade
> is necessary -
>
> 421 ms might be too much ig it happens often.
>
>
> Best regards
>
>
> Olav Kvittem
>
>
>
> Thanks,
> Hank
>
>
> Caveat: The views expressed above are solely my own and do not express the
> views or opinions of my employer
>
>
Re: Bottlenecks and link upgrades [ In reply to ]
It is possible to gather a lot of information about buffers and queues, at
least with the vendors we work with. That can be very helpful in a lot of
ways. :)

On Thu, Aug 13, 2020 at 9:21 AM Baldur Norddahl <baldur.norddahl@gmail.com>
wrote:

> Is it possible to do and is anyone monitoring metrics such as max queue
> length in 5 minutes intervals? Might be a better metric than average load
> in 5 minutes intervals.
>
> Regards
>
> Baldur
>
Re: Bottlenecks and link upgrades [ In reply to ]
>
> There is rarely a one sized fits all answer when it comes to these
> things.
>

Absolutely true: every application has characteristic QoS parameters.

Unfortunately, it seems that 5-minute averages of data rates through links
are the one-size-fits-all answer ... which doesn't fit all.

Etienne

On Thu, Aug 13, 2020 at 5:37 PM Tom Beecher <beecher@beecher.cc> wrote:

> Wouldn't it be better to measure the basic performance like packet drop
>> rates and queue sizes ?
>>
>
> Those values should be a standard part of monitoring and data collection,
> but if they happen to MATTER or not in a given situation very much depends.
>
> The traffic profile traversing the link may be such that the observed drop
> % and buffer depths is acceptable for that traffic, and there is no need
> for further tuning or changes. In other scenarios it may not be, in which
> case either network or application adjustments are warranted.
>
> There is rarely a one sized fits all answer when it comes to these things.
>
>
> On Thu, Aug 13, 2020 at 6:25 AM Olav Kvittem via NANOG <nanog@nanog.org>
> wrote:
>
>>
>> On 12.08.2020 09:31, Hank Nussbacher wrote:
>>
>> At what point do commercial ISPs upgrade links in their backbone as well
>> as peering and transit links that are congested? At 80% capacity? 90%?
>> 95%?
>>
>>
>> Hi,
>>
>>
>> Wouldn't it be better to measure the basic performance like packet drop
>> rates and queue sizes ?
>>
>> These days live video is needed and these parameters are essential to the
>> quality.
>>
>> Queues are building up in milliseconds and people are averaging over
>> minutes to estimate quality.
>>
>>
>> If you are measuring queue delay with high frequent one-way-delay
>> measurements
>>
>> you would then be able to advice better on what the consequences of a
>> highly loaded link are.
>>
>>
>> We are running a research project on end-to-end quality and the enclosed
>> image is yesterdays report on
>>
>> queuesize(h_ddelay) in ms. It shows stats on delays between some peers.
>>
>> I would have looked at the trends on the involved links to see if upgrade
>> is necessary -
>>
>> 421 ms might be too much ig it happens often.
>>
>>
>> Best regards
>>
>>
>> Olav Kvittem
>>
>>
>>
>> Thanks,
>> Hank
>>
>>
>> Caveat: The views expressed above are solely my own and do not express
>> the views or opinions of my employer
>>
>>

--
Ing. Etienne-Victor Depasquale
Assistant Lecturer
Department of Communications & Computer Engineering
Faculty of Information & Communication Technology
University of Malta
Web. https://www.um.edu.mt/profile/etiennedepasquale
Re: Bottlenecks and link upgrades [ In reply to ]
On Wed, Aug 12, 2020 at 12:33 AM Hank Nussbacher <hank@interall.co.il> wrote:
> At what point do commercial ISPs upgrade links in their backbone as well as peering and transit links that are congested? At 80% capacity? 90%? 95%?

Hi Hank,

As others have noted, the answer is rarely that simple.

First, what is your consumption? Usually the 90th or 95th percentile.
After all, 100% between 9 and 5 is 100%, not 33%, but 100% for two
minutes is not 100%. It gets more complicated if any kind of QoS is in
play because, capacity-wise, QoS essentially gives you not a single
fixed-speed line but many interdependent variable-speed lines.
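
The 9-to-5 point is simple arithmetic and can be checked quickly. This
is a minimal sketch, assuming 5-minute samples (288 per day) and a
nearest-rank 95th percentile; both the sample data and the helper name
`percentile` are made up for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# A link at 100% for 8 hours (96 samples) and idle for the other 16 hours.
office_hours = [100.0] * 96 + [0.0] * 192
daily_mean = sum(office_hours) / len(office_hours)
print(f"mean {daily_mean:.1f}%, p95 {percentile(office_hours, 95):.1f}%")

# Two minutes at line rate inside one 5-minute sample averages to 40%.
two_minutes = [0.0] * 287 + [100.0 * 2 / 5]
print(f"max {max(two_minutes):.1f}%, p95 {percentile(two_minutes, 95):.1f}%")
```

The daily mean of the first link is about 33% while its 95th percentile
is 100%; the two-minute burst never shows up as more than a single 40%
sample, and the 95th percentile ignores it entirely.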

Next, capacity is not the only question. Here are some of the other factors:

1) A residential customer on the cheapest plan does not merit as clean
a channel as a high-paying business customer you'd like to keep
milking.

2) Upgrades can take months of planning so the capacity now is beside
the point. You'll use your best-guess projection for the capacity at
the time an upgrade can be complete.

3) Some upgrades tend to be significantly more expensive than others.
Lit service to dark fiber, for example. It's pretty ordinary to run
closer to the limit before making an expensive upgrade than a modest
upgrade.

4) A dirty link merits replacement sooner than a clean one. If the
higher-capacity service also clears up packet loss, you'll want to
trigger the decision at a lower consumption threshold.

5) Switching a single path to two paths is more valuable than
switching two paths to three. It has priority at a lower level of
consumption.

Regards,
Bill Herrin

--
William Herrin
bill@herrin.us
https://bill.herrin.us/
Re: Bottlenecks and link upgrades [ In reply to ]
On Wed, Aug 12, 2020, at 09:31, Hank Nussbacher wrote:
> At what point do commercial ISPs upgrade links in their backbone as
> well as peering and transit links that are congested? At 80% capacity?
> 90%? 95%?

Some reflections about link capacity:
At 90% and over, you should panic.
Between 80% and 90% you should be (very) scared.
Between 70% and 80% you should be worried.
Between 60% and 70% you should seriously consider speeding up the upgrades that you effectively started at 50%, and started planning since 40%.

Of course, that differs from one ISP to another. Some only upgrade after several months with at least 4 hours a day, every day (or almost) at over 95%. Others deploy 10x expected capacity, and upgrade well before 40%.
Re: Bottlenecks and link upgrades [ In reply to ]
On Thu, Aug 13, 2020, at 12:31, Mark Tinka wrote:
> I'm confident everyone (even the cheapest CFO) knows the consequences of
> congesting a link and choosing not to upgrade it.

I think you're over-confident.

> It's great to monitor packet loss, latency, pps, e.t.c. But packet loss
> at 10% link utilization is not a foreign occurrence. No amount of
> bandwidth upgrades will fix that.

That, plus the fact that by the time delay becomes an indication of congestion, it's way too late to start an upgrade. That event should not occur.
Re: Bottlenecks and link upgrades [ In reply to ]
Beyond a pure percentage, you might want to account for how long you can
stay below a certain threshold. If you want a certain link to keep its
95th percentile peaks below 70%, then first get an understanding of your
traffic growth and try to project when you will reach that number.
You have to decide whether you care about the occasional peak, or the
consistent peak, or somewhere in between, like weekdays vs weekends, etc.
Now you know how much lead time you will have.

Then consider how long it will take you to upgrade that link. If it's a
matter of adding a couple of crossconnects, then you might just need a
week. If you have to ship and install optics, modules, a card, then add
another week. If you have to get a sales order signed by senior management,
add another week. If you have to put it through legal and finance, add a
month. (kidding) If you are doing your annual re-negotiation, well...good
luck.
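
The projection described above can be sketched as follows. This is a
minimal sketch, assuming steady compound growth per 30-day month, which
real traffic will not follow exactly; the function names and the example
figures are hypothetical.

```python
import math


def days_until(current_pct, target_pct, monthly_growth):
    """Days until utilization reaches target_pct under compound growth."""
    if current_pct >= target_pct:
        return 0.0
    if monthly_growth <= 0:
        return math.inf
    months = math.log(target_pct / current_pct) / math.log(1.0 + monthly_growth)
    return months * 30.0


def slack_days(current_pct, target_pct, monthly_growth, lead_time_days):
    """Days left before the upgrade has to be started; negative means late."""
    return days_until(current_pct, target_pct, monthly_growth) - lead_time_days


# Made-up example: 95th percentile at 50% today, 5% growth per month,
# 70% target, 60 days to get the upgrade delivered.
print(f"{days_until(50, 70, 0.05):.0f} days to threshold")
print(f"{slack_days(50, 70, 0.05, 60):.0f} days of slack")
```

A negative slack value means the upgrade should already have been
ordered.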

It's always good to ask your circuit vendors what the lead times are, then
double them and add 5.

And sometimes, if you need a low latency connection, traffic utilization
levels might not even be something you look at.

Louie
Peering Coordinator at a start-up ISP


On Fri, Aug 14, 2020 at 4:13 PM Radu-Adrian Feurdean <
nanog@radu-adrian.feurdean.net> wrote:

> On Wed, Aug 12, 2020, at 09:31, Hank Nussbacher wrote:
> > At what point do commercial ISPs upgrade links in their backbone as
> > well as peering and transit links that are congested? At 80% capacity?
> > 90%? 95%?
>
> Some reflections about link capacity:
> At 90% and over, you should panic.
> Between 80% and 90% you should be (very) scared.
> Between 70% and 80% you should be worried.
> Between 60% and 70% you should seriously consider speeding up the
> upgrades that you effectively started at 50%, and started planning since
> 40%.
>
> Of course, that differs from one ISP to another. Some only upgrade after
> several months with at least 4 hours a day, every day (or almost) at over
> 95%. Others deploy 10x expected capacity, and upgrade well before 40%.
>
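
The thresholds quoted above can be written down as a simple lookup, for example to label links in a report. This is only a sketch of that cheat sheet: the label strings and the function name are invented here, the handling of exact boundary values is an assumption, and the quoted text says nothing about links below 40%.

```python
def capacity_action(utilization_pct):
    """Map a link utilization (percent) to the action from the quoted cheat sheet."""
    if utilization_pct >= 90:
        return "panic"
    if utilization_pct >= 80:
        return "be (very) scared"
    if utilization_pct >= 70:
        return "be worried"
    if utilization_pct >= 60:
        return "speed up the upgrade"
    if utilization_pct >= 50:
        return "upgrade should be under way"
    if utilization_pct >= 40:
        return "start planning"
    # Below 40% is not covered by the cheat sheet; this label is an assumption.
    return "no action stated"


for pct in (35, 45, 55, 65, 75, 85, 95):
    print(pct, capacity_action(pct))
```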
Re: Bottlenecks and link upgrades [ In reply to ]
I've seen the weekly profiles of traffic sourced from caches for the major
global services (video, social media, search and general) for a specific
metro area.

For all services, the weekly profile is a repetition of the daily profile,
within +/- 20%.
That is: the weekly profile is obtained from the daily profile within +/-
20% of the average daily profile height.

Given this regularity, it seems that growth projections of the kind
suggested by Louie Lee are meaningful.
That is, the weekly profile data seem to provide a sound empirical basis
for link upgrades.
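
One possible way to test the +/- 20% regularity on your own data is sketched below in Python. It assumes that "profile height" can be approximated by each day's peak rate, which is an interpretation and not something stated above; the function name and sample numbers are hypothetical.

```python
from statistics import mean


def week_is_regular(daily_peaks, tolerance=0.20):
    """True if every daily peak lies within +/- tolerance of the average daily peak."""
    average = mean(daily_peaks)
    return all(abs(peak - average) <= tolerance * average for peak in daily_peaks)


# Hypothetical daily peaks in Gbps, Monday to Sunday.
steady_week = [8.0, 8.5, 9.0, 8.2, 8.8, 10.0, 9.5]
spiky_week = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 12.0]

print(week_is_regular(steady_week))
print(week_is_regular(spiky_week))
```

A week that fails this check is a hint that a simple trend line through daily peaks may not describe the link well.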

Since I'm not an operator, my comments need to be taken with a pinch of
salt :)

Cheers,

Etienne

--
Ing. Etienne-Victor Depasquale
Assistant Lecturer
Department of Communications & Computer Engineering
Faculty of Information & Communication Technology
University of Malta
Web. https://www.um.edu.mt/profile/etiennedepasquale
Re: Bottlenecks and link upgrades [ In reply to ]
On Sat, Aug 15, 2020, at 02:39, Louie Lee wrote:

> get an understanding of your traffic growth and try to project when you
> will reach that number. You have to decide whether you care about the
> occasional peak, or the consistent peak, or somewhere in between, like
> weekday vs weekends, etc. Now you know how much lead time you will have.

Get an understanding, and try to make a longer-term plan (like 2-3 years) if you can. If you're reaching some important milestones (e.g. need to buy expensive hardware), make a presentation for management.
You will definitely need adjustments during the timespan covered (some things will need to be done sooner, others may leave you some extra time), but it should reduce the amount of surprise.

That is valid if you have visibility. If you don't (that may happen), the cheatsheet I described previously is a good start. It could be applied at $job[-1], where I applied it to grow the network from almost zero to 35 Gbps, and it is kind of applied at $job[$now] where long term visibility is kind of missing and we need to be ready for rapid capacity variations.

> And sometimes, if you need a low latency connection, traffic
> utilization levels might not even be something you look at.

This goes to the "understand your traffic" chapter. All the traffic (since sometimes there may be a mix, e.g. regular eyeball traffic + voice traffic).
Re: Bottlenecks and link upgrades [ In reply to ]
No plan survives contact with the enemy. Your carefully made growth
projection was fine until the brass made a deal with some major customer,
which caused a traffic spike. Or any of the countless other events that
could and eventually will happen to you.

One hard thing, that almost everyone will get wrong at some point, is
simulating load in the event multiple outages take some links out, causing
excessive traffic to reroute onto links that previously seemed fine.
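
To make the failure case concrete, here is a deliberately simplified Python sketch. It assumes a set of parallel links between the same two points, with traffic from failed links spreading over the survivors in proportion to capacity. Real networks reroute according to topology and IGP metrics, so this toy model only shows the shape of the exercise; the function name and numbers are hypothetical.

```python
from itertools import combinations


def worst_case_utilization(links, max_failures):
    """Return (utilization, failed_links) for the worst combination of failures.

    links maps a link name to (capacity_gbps, load_gbps). Every combination
    of 1..max_failures failed links is tried, and the total load is assumed
    to spread over the surviving capacity.
    """
    total_load = sum(load for _, load in links.values())
    worst = (0.0, ())
    for count in range(1, max_failures + 1):
        for failed in combinations(sorted(links), count):
            surviving = sum(cap for name, (cap, _) in links.items()
                            if name not in failed)
            utilization = float("inf") if surviving == 0 else total_load / surviving
            if utilization > worst[0]:
                worst = (utilization, failed)
    return worst


# Four hypothetical 100 Gbps links, each carrying 45 Gbps (45% utilization).
bundle = {name: (100.0, 45.0) for name in ("a", "b", "c", "d")}
print(worst_case_utilization(bundle, 1))
print(worst_case_utilization(bundle, 2))
```

Links that look comfortable at 45% end up far less comfortable once two of four are gone, which is the effect described above.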

Regards,

Baldur


Re: Bottlenecks and link upgrades [ In reply to ]
On Sat, Aug 15, 2020, at 11:35, Baldur Norddahl wrote:
> No plan survives contact with the enemy. Your carefully made growth
> projection was fine until the brass made a deal with some major
> customer, which caused a traffic spike.

Capacity planning also includes keeping an eye on what is being sold and what is being prepared.
Having the traffic more than double within a 48h timespan (until day X, peaks at N Gbps; from day X+2, peaks at 2.5*N Gbps) was done successfully when the correct information ("partner X will change delivery system") arrived 4 months in advance.

Having multiple 200 Mbps and 500 Mbps connections sold over an already-used 1 Gbps port and pretending that "everything's gonna be all right": in that case you should confront your enemy.
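
The arithmetic behind that second case is simple enough to sketch in Python. The existing peak of 400 Mbps and the particular mix of sold connections are made-up numbers, since the text only says the port is "already used", and the function name is hypothetical.

```python
def port_headroom_mbps(port_capacity_mbps, existing_peak_mbps, sold_rates_mbps):
    """Capacity left if every sold connection runs at its full rate at once.

    A negative result means the port is oversubscribed in that worst case.
    """
    return port_capacity_mbps - existing_peak_mbps - sum(sold_rates_mbps)


# Hypothetical: 1 Gbps port, 400 Mbps existing peak, two 200 Mbps and one 500 Mbps sold.
print(port_headroom_mbps(1000, 400, [200, 200, 500]))
```

Customers rarely all peak together, so some oversubscription can be a deliberate choice; the point is to know the number rather than assume it will be fine.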

> Or any of the countless other events that could and eventually will happen to you.

Among which you try to protect yourself against the most realistic ones.

> One hard thing, that almost everyone will get wrong at some point, is
> simulating load in the event multiple outages take some links out,
> causing excessive traffic to reroute onto links that previously seemed
> fine.

You should scale the network to absorb a certain degree of "surprise"/damage, and clearly explain that beyond that certain level, service will be degraded (or even absent) and there is nothing that can and nothing that will be done immediately.

Every network fails at a certain moment in time. You just need to make sure you know how to make it work again within a reasonable time frame. Or have a good run-away plan (sometimes this is the best solution).
Re: Bottlenecks and link upgrades [ In reply to ]
+1

You can't foresee everything, but no plan means foreseeing nothing:
you are blindfolded.

Cheers,

Etienne


--
Ing. Etienne-Victor Depasquale
Assistant Lecturer
Department of Communications & Computer Engineering
Faculty of Information & Communication Technology
University of Malta
Web. https://www.um.edu.mt/profile/etiennedepasquale
Re: Bottlenecks and link upgrades [ In reply to ]
On 15/Aug/20 01:45, Radu-Adrian Feurdean wrote:

>
> I think you're over-confident.

If you can resist the "let me make a plan" offer that CFOs would want
you to give them, you can be confident :-). Because when it hits the
fan, the CFO will say, "But Feurdean said he would make a plan. If he
thought the situation was urgent, he didn't make it known clearly enough".

Better to say, "CFO, if you don't do this upgrade, the network breaks".

And walk away.

Don't accept risk on behalf of someone else, because at the end of the
day, no one will blame the network... but those that operate it.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
On 15/Aug/20 10:47, Etienne-Victor Depasquale wrote:

> I've seen the weekly profiles of traffic sourced from caches for the
> major global services (video, social media, search and general) for a
> specific metro area.
>
> For all services, the weekly profile is a repetition of the daily
> profile, within +/- 20%. 
> That is: the weekly profile is obtained from the daily profile
> within +/- 20% of the average daily profile height.
>
> Given this regularity, as suggested by Louie Lee, then it seems that
> growth projections are meaningful.
> That is, the weekly profile data seem to provide a sound empirical
> basis for link upgrades.
>
> Since I'm not an operator, my comments need to be sprinkled with a
> pinch of salt :)

Provided your NMS has been stable over any period of time, you can
extract historical data over 1 year or more and see how linearly things
grew.

It's sometimes difficult to see the growth rate when you are close to
the daily action.
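
As a sketch of what "see how linearly things grew" could look like in practice, the Python below fits a straight line to a series of exported samples and reports R squared as a rough linearity measure. It assumes evenly spaced samples (for example monthly peaks pulled from the NMS) and at least two of them; the function name and data are hypothetical.

```python
def linear_fit(values):
    """Least-squares line through evenly spaced samples.

    Returns (slope_per_sample, intercept, r_squared). An r_squared close
    to 1 means growth over the period was close to linear.
    """
    n = len(values)
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, values))
    r_squared = 1.0 if ss_tot == 0 else 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical monthly peak utilization in Gbps.
monthly_peaks = [10, 12, 14, 16, 18, 20]
slope, intercept, r_squared = linear_fit(monthly_peaks)
print(slope, intercept, r_squared)
```

A low R squared over a year of data suggests the growth was step-like or accelerating, in which case a straight-line forecast deserves less trust.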

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
On 15/Aug/20 11:35, Baldur Norddahl wrote:

> No plan survives contact with the enemy. Your carefully made growth
> projection was fine until the brass made a deal with some major
> customer, which caused a traffic spike. Or any of the countless other
> events that could and eventually will happen to you.

That's why your operations teams cannot work separately from the Sales
teams. If a big deal is in the pipeline, there should be someone
operational to do a simple feasibility check to see if the segment in
question will handle the traffic. If not, defer to standard lead times
to deliver. Or even extended ones if the deal is larger than usual.


>
> One hard thing, that almost everyone will get wrong at some point, is
> simulating load in the event multiple outages take some links out,
> causing excessive traffic to reroute onto links that previously seemed
> fine.

So rather than simulate, insure, I say. By insure, I mean upgrade each
and every backbone link when it hits 50%, and you'll have less to worry
about when things start crumbling all over the place.

Mark.
Re: Bottlenecks and link upgrades [ In reply to ]
On 15/Aug/20 12:32, Etienne-Victor Depasquale wrote:

> +1
>
> You can't foresee everything, but no plan means foreseeing nothing:
> you are blindfolded.

In the absence of guidance from your Sales team on a forecast, keep the
50% threshold trigger, and standardize on lead times if urgent
feasibilities don't immediately pass.

The more you do this, the more you will encourage better planning on the
Sales side. It just happens automatically.

The worst thing you can do for yourself and your team is try to be the
hero.

Mark.