Mailing List Archive

Internet monitoring in case of general issues
Many times we notice issues on the Internet: customers ask why they are
experiencing additional delays, why it takes so long to access services,
why "this afternoon is slow"; we see fresh BGP updates, and so on.

Everybody should know that the Internet is cheap but unreliable. Customers
often want to save money with an IPsec VPN but then ask for penalties if
the service is not reachable, there is exposure to DDoS, and so on.

The question: once you notice issues on the Internet and your upstreams are
fine, what instrument, service, commands or web site do you use to try to
find out where the problem is and who is experiencing it (e.g. a tier 1
carrier)?

Cheers
James
_______________________________________________
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
Re: Internet monitoring in case of general issues [ In reply to ]
On 14/Mar/20 15:24, james list wrote:

> Many times we notice issues on the Internet: customers ask why they are
> experiencing additional delays, why it takes so long to access services,
> why "this afternoon is slow"; we see fresh BGP updates, and so on.
>
> Everybody should know that the Internet is cheap but unreliable. Customers
> often want to save money with an IPsec VPN but then ask for penalties if
> the service is not reachable, there is exposure to DDoS, and so on.
>
> The question: once you notice issues on the Internet and your upstreams are
> fine, what instrument, service, commands or web site do you use to try to
> find out where the problem is and who is experiencing it (e.g. a tier 1
> carrier)?

It's helpful if you know some people in the industry for very obscure
faults far beyond your network. Or at least, someone who knows the
person you need to find.

Beyond checking your own network, checking your direct peers, checking
your direct transit providers, and using the regular tools (ping,
traceroute, remote looking glasses, etc.), if a problem is beyond your
usual boundaries and you need to get that looked at, being on the right
mailing list or knowing the right people will be one of your biggest tools.

I know I've helped a lot of people running networks in places I've never
heard of, and vice versa, because we've met on a mailing list, at a beer
session after a conference, or something in between.

So next time you're at a peering forum or similar conference, don't skip
the beer session :-).

Mark.
Re: Internet monitoring in case of general issues [ In reply to ]
* james list

> The question: once you notice issues on the Internet and your upstreams are
> fine, what instrument, service, commands or web site do you use to try to
> find out where the problem is and who is experiencing it (e.g. a tier 1
> carrier)?

We find that being an NLNOG RING (https://ring.nlnog.net/) participant is
very useful in diagnosing this kind of issue. We can start pings or
traceroutes towards our own network from 500+ locations all over the globe
with a single command, for example. Furthermore, there is a tool (ring-sqa)
that does pretty much this continuously and alerts us if a partial outage
is detected.
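
As a rough illustration of that kind of check (a sketch only, not ring-sqa's
actual logic): given ping summaries collected from many vantage points, flag
a partial outage when some but not all of them see heavy loss. The
ProbeResult record, the function name and both thresholds are assumptions
made for this example.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Ping summary from one vantage point (hypothetical record layout)."""
    node: str
    sent: int
    received: int


def classify_reachability(results, loss_threshold=0.5, partial_fraction=0.1):
    """Classify reachability as seen from many vantage points.

    A node counts as failing when its packet loss is at or above
    loss_threshold. Returns (state, failing_nodes), where state is one of
    "unknown", "ok", "partial" or "outage".
    """
    # Ignore nodes that did not send anything: they tell us nothing.
    measured = [r for r in results if r.sent > 0]
    if not measured:
        return "unknown", []

    failing = sorted(
        r.node for r in measured
        if (r.sent - r.received) / r.sent >= loss_threshold
    )
    share = len(failing) / len(measured)

    if share < partial_fraction:
        # A few noisy nodes are tolerated; they are still reported.
        return "ok", failing
    if len(failing) == len(measured):
        return "outage", failing
    return "partial", failing
```

The list of failing nodes is the interesting part in practice: if they all
sit behind the same transit provider or in the same region, that points at
where to look next.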

Tore
Re: Internet monitoring in case of general issues [ In reply to ]
RIPE Atlas (https://atlas.ripe.net) is also quite useful, alongside NLNOG RING.

--
*Marcel Bößendörfer*
Re: Internet monitoring in case of general issues [ In reply to ]
All,

It seems that most answers, and in fact the question itself, assume that all
we can do here is to be reactive. In my book that is an indication that we
have already failed.

I do think that anyone who has more than one Internet upstream ISP (full
table or even defaults out) can do performance routing in real time by
evaluating the quality of TCP sessions across two (or more) of them. Based
on that data, they can intelligently shift exit traffic on a per-prefix basis.
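
A minimal sketch of such a per-prefix decision, assuming you already have an
RTT and a retransmission ratio per prefix and per upstream from passive
measurement. The cost formula, the 20% hysteresis and all names are
hypothetical, chosen only to illustrate the idea; how the result is pushed
to the routers is out of scope here.

```python
def path_cost(sample):
    """Hypothetical cost: RTT in ms, inflated by the retransmission ratio."""
    rtt_ms, retrans_ratio = sample
    return rtt_ms * (1.0 + 10.0 * retrans_ratio)


def choose_exits(metrics, current, min_gain=0.2):
    """Decide which prefixes should move to another upstream.

    metrics: {prefix: {upstream: (rtt_ms, retrans_ratio)}}
    current: {prefix: upstream currently used as exit}
    min_gain: required relative improvement before moving (hysteresis,
              so that prefixes do not flap between similar paths).

    Returns {prefix: new_upstream} only for prefixes that should move.
    """
    moves = {}
    for prefix, per_upstream in metrics.items():
        best = min(per_upstream, key=lambda u: path_cost(per_upstream[u]))
        cur = current.get(prefix)

        # No known exit, or no measurement for it: take the best one.
        if cur is None or cur not in per_upstream:
            moves[prefix] = best
            continue

        best_cost = path_cost(per_upstream[best])
        cur_cost = path_cost(per_upstream[cur])
        if best != cur and best_cost <= (1.0 - min_gain) * cur_cost:
            moves[prefix] = best
    return moves
```

The hysteresis matters more than the exact cost formula: without it, two
upstreams with near-identical quality would cause constant route changes.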

Folks like Google and Facebook have been using such home-grown tools for a
long time (Espresso, Edge Fabric). Cisco PfR had, at least originally,
single-sided Internet edge OER. Of course, with a bit of automation skill
anyone can build their own tool too - the only real requirement is tapped
traffic, so that you can passively measure the TCP quality to your user
destinations.
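
One way to get a passive measurement from tapped traffic is to time the TCP
handshake: on a tap close to the clients, the gap between an outgoing SYN
and the matching SYN-ACK approximates the round-trip time to the
destination. A minimal sketch, assuming packets have already been decoded
into simple tuples (the tuple layout and function name are made up for this
example, not taken from any particular analyzer):

```python
def handshake_rtts(packets):
    """Estimate per-connection RTT from the TCP three-way handshake.

    packets: iterable of (timestamp_s, src, sport, dst, dport, flags)
             in capture order, where flags is a set such as {"SYN"} or
             {"SYN", "ACK"}.

    Returns {(client, client_port, server, server_port): rtt_seconds}.
    """
    pending = {}  # connection key -> timestamp of the first SYN seen
    rtts = {}
    for ts, src, sport, dst, dport, flags in packets:
        if "SYN" in flags and "ACK" not in flags:
            # Keep the first SYN: a retransmitted SYN then shows up as a
            # longer handshake, which is what the user experiences.
            pending.setdefault((src, sport, dst, dport), ts)
        elif "SYN" in flags and "ACK" in flags:
            # The SYN-ACK travels in the reverse direction of the SYN.
            key = (dst, dport, src, sport)
            if key in pending:
                rtts[key] = ts - pending.pop(key)
    return rtts
```

Connections whose SYN is never answered stay out of the result; counting
those separately would give a crude reachability signal per destination.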

For TCP analysis I have been using the https://palermotec.net/ analyzer for
a few years now. The TCP analytics it offers are simply fantastic. The GUI
still needs improvement if someone is to rely on it. Just FYI ... I am also
working with that team to build a Smart Edge Routing (SER) controller - they
already have an alpha version, and hopefully in the coming months there will
be more progress towards beta and EFT. The assumption is of course that
all interaction with routers (any vendor) is over standard protocols (BGP
or static).

As we all know, any decent SD-WAN or Cisco IWAN has the ability to monitor
performance over the mesh of endpoints and choose more optimal paths. But
that is slightly different, as it relies on owning both ends. Here I
assume we are talking about just single-sided exit routing, where we have
zero control over the destination.

All of the above is about exit. Doing the analogous thing inbound is also
possible to some extent if you are advertising prefixes for your services.
But there the issue is much more difficult from the perspective of
aggregating different services - so at most one could just average which
uplinks are best for a given prefix for *most* of the users.

Kind regards,
R.


Re: Internet monitoring in case of general issues [ In reply to ]
On 15/Mar/20 12:56, Robert Raszuk wrote:
> All,
>
> It seems that most answers, and in fact the question itself, assume that all
> we can do here is to be reactive. In my book that is an indication that we
> have already failed.
>
> I do think that anyone who has more than one Internet upstream ISP (full
> table or even defaults out) can do performance routing in real time by
> evaluating the quality of TCP sessions across two (or more) of them. Based
> on that data, they can intelligently shift exit traffic on a per-prefix basis.
>
> Folks like Google and Facebook have been using such home-grown tools for a
> long time (Espresso, Edge Fabric). Cisco PfR had, at least originally,
> single-sided Internet edge OER. Of course, with a bit of automation skill
> anyone can build their own tool too - the only real requirement is tapped
> traffic, so that you can passively measure the TCP quality to your user
> destinations.
>
> For TCP analysis I have been using the https://palermotec.net/ analyzer for
> a few years now. The TCP analytics it offers are simply fantastic. The GUI
> still needs improvement if someone is to rely on it. Just FYI ... I am also
> working with that team to build a Smart Edge Routing (SER) controller - they
> already have an alpha version, and hopefully in the coming months there will
> be more progress towards beta and EFT. The assumption is of course that
> all interaction with routers (any vendor) is over standard protocols (BGP
> or static).
>
> As we all know, any decent SD-WAN or Cisco IWAN has the ability to monitor
> performance over the mesh of endpoints and choose more optimal paths. But
> that is slightly different, as it relies on owning both ends. Here I
> assume we are talking about just single-sided exit routing, where we have
> zero control over the destination.
>
> All of the above is about exit. Doing the analogous thing inbound is also
> possible to some extent if you are advertising prefixes for your services.
> But there the issue is much more difficult from the perspective of
> aggregating different services - so at most one could just average which
> uplinks are best for a given prefix for *most* of the users.

My thing is you probably have much better insight and control for
on-net. That leaves off-net as the main issue, as your direct upstream
and peering may be fine, but beyond that is anyone's guess.

If you are monitoring a specific off-net target for some reason, you can
easily control for that. But if you are looking at a generalized
situation, that's a lot harder.

Doubly worse for operators who connect to only one or two upstream
providers/peering points.

I'm not saying we should resign ourselves to, "Ah well, I can't fix what
I can't touch"; but time and resources are limited, so given your
circumstances, spend some time finding out how best to deploy them as
you also seek elegant ways to solve this particular issue itself.

Mark.
Re: Internet monitoring in case of general issues [ In reply to ]
Hey Mark,

Just to clarify

> My thing is you probably have much better insight and control for
> on-net. That leaves off-net as the main issue, as your direct upstream
> and peering may be fine, but beyond that is anyone's guess.
>

Ahh no. See, with a decent TCP analyzer I am monitoring end-to-end network
behaviour regardless of the point where I set up the TAP between src and
dst. Clearly the easiest is to insert the TAP on my ISP peering links. So no
guessing - real data only :).


> If you are monitoring a specific off-net target for some reason, you can
> easily control for that. But if you are looking at a generalized
> situation, that's a lot harder.
>

Ahh, that is a fundamental misunderstanding of what I was apparently not
describing well. I am not monitoring any targets. I am monitoring and
performing real-time TCP analytics (using said analyzer) on all my in-out
data. And here I have a few options. I can focus on optimizing the most
active sessions by volume. I can focus on optimizing sessions per src/dst
port, I can focus on optimizing the sessions experiencing the most
retransmissions, or I could just try to improve RTT, jitter etc ...
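
Those options can be sketched as different sort keys over per-session
statistics. The FlowStats fields, the criterion names and pick_candidates
are hypothetical, and a real analyzer would export much richer data:

```python
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Per-session counters as a TCP analyzer might export them (assumed)."""
    dst_prefix: str
    dst_port: int
    bytes_total: int
    segments_sent: int
    segments_retransmitted: int
    rtt_ms: float


def retransmission_ratio(flow):
    """Share of sent segments that were retransmissions (0.0 if none sent)."""
    if flow.segments_sent == 0:
        return 0.0
    return flow.segments_retransmitted / flow.segments_sent


# Each optimization goal is just a different ranking of the same sessions.
CRITERIA = {
    "volume": lambda f: f.bytes_total,
    "retransmissions": retransmission_ratio,
    "rtt": lambda f: f.rtt_ms,
}


def pick_candidates(flows, criterion, top_n=10, port=None):
    """Return the top_n sessions worth optimizing under one criterion.

    port, if given, restricts the selection to one destination port.
    """
    if port is not None:
        flows = [f for f in flows if f.dst_port == port]
    return sorted(flows, key=CRITERIA[criterion], reverse=True)[:top_n]
```

The destination prefixes of the selected sessions are then the ones to
consider moving to a different exit.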


> Doubly worse for operators who connect to only one or two upstream
> providers/peering points.
>

I understand that if you do not have the right tools the problem is hard to
solve. But that is like everything else :) Go cut a piece of wood with even
the best Japanese kitchen knife ....


> I'm not saying we should resign ourselves to, "Ah well, I can't fix what
> I can't touch"; but time and resources are limited, so given your
> circumstances, spend some time finding out how best to deploy them as
> you also seek elegant ways to solve this particular issue itself.
>

Fair. And the only point of my note is just to share a slightly different
perspective. I am very well aware that we as a networking industry are really
in the stone age as far as various aspects of performance routing are
concerned. Or, for that matter, dual disjoint path routing without any static
configuration or building two topologies.

All networking today - Internet, intradomain, DC - cares about reachability.
The quality of that reachability is pushed to the application layer. Sure,
this is OK if you have a good app which can build multiple connections to
different destinations, like say torrent. But not all apps are like that, and
some would like the network to be a little bit smarter :)

Best,
R.

Re: Internet monitoring in case of general issues [ In reply to ]
On 15/Mar/20 17:25, Robert Raszuk wrote:

>
> Ahh no. See, with a decent TCP analyzer I am monitoring end-to-end
> network behaviour regardless of the point where I set up the TAP
> between src and dst. Clearly the easiest is to insert the TAP on my ISP
> peering links. So no guessing - real data only :).

Yes - I'm not talking about performance between two targets. I'm talking
about what you can do to fix those issues once you've found them.


> Ahh, that is a fundamental misunderstanding of what I was apparently
> not describing well. I am not monitoring any targets. I am
> monitoring and performing real-time TCP analytics (using said
> analyzer) on all my in-out data. And here I have a few options. I can
> focus on optimizing the most active sessions by volume. I can focus
> on optimizing sessions per src/dst port, I can focus on optimizing
> the sessions experiencing the most retransmissions, or I could just
> try to improve RTT, jitter etc ...

Practically, I'd be very keen to understand how you can optimize for
volume, beyond the obvious.


> I understand that if you do not have the right tools the problem is
> hard to solve. But that is like everything else :) Go cut a piece of
> wood with even the best Japanese kitchen knife ....

Again, my issue isn't the tools. Those are plenty.

My issue is how to effectively fix issues that are so far apart between
source and target.


> Fair. And the only point of my note is just to share a slightly
> different perspective. I am very well aware that we as a networking
> industry are really in the stone age as far as various aspects of
> performance routing are concerned. Or, for that matter, dual disjoint
> path routing without any static configuration or building two
> topologies.
>
> All networking today - Internet, intradomain, DC - cares about
> reachability. The quality of that reachability is pushed to the
> application layer. Sure, this is OK if you have a good app which can
> build multiple connections to different destinations, like say torrent.
> But not all apps are like that, and some would like the network to be a
> little bit smarter :)

So for me, this is my biggest problem with the theme of this thread.

Identifying the problem between source and target isn't the problem.
Getting it fixed is.

It's great that we can optimize for volume as best as possible. But most
of the tickets customers send through are, "I am having a problem
getting from my network A to network B, or network B getting to me,
network A." In many cases, network B is some obscure customer of some
obscure network who probably has no personal or commercial relationship
with me or my customer, and trying to find whomever operates the network
that serves network B is much harder than verifying there is an issue to
begin with.

For me, improving the communication between operators and their
downstream customers to get problems fixed is what really matters.

I am regularly contacted by various people I know or don't know to help
fix issues between the network I operate and the one looking for some
help. And vice versa. And I am sure several folk on this and other lists
are going through the same. It's rather rudimentary, but for better or
worse, it's the best we have at the moment.

As we've all seen over the decades, most people just gave up and left
the problem to CDNs to fix :-).

Mark.