Mailing List Archive

Netflow vs SNMP
Running ASR9906 w/ IOS-XR version 7.5.2 and doing 1:15 Netflow export on
all interfaces (ingress only).

When comparing traffic stats with SNMP, Netflow stats always appear too
low (see attachment).

Opened a TAC case and their recommendation is to do 1:1 and I quote:

"Irrespective of the rate at which the NP punts the records to CPU,
exporter picks up a maximum of 2000 records at a time from the cache
that are eligible for export (timers, network/TCP session events, etc).
This is basically to avoid NetIO dropping the packets due to lack of
b/w. When the exporter wakes up again, it repeats the same."


Does it make sense to go 1:1, which will only increase the number of
Netflow records to export?  For everyone that does 1:1000 or 1:10000
sampling: do you also see a discrepancy between Netflow stats and SNMP
stats?


Thanks,

Hank
_______________________________________________
cisco-nsp mailing list cisco-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-nsp
archive at http://puck.nether.net/pipermail/cisco-nsp/
Re: Netflow vs SNMP
Hi,

On Mon, Oct 02, 2023 at 09:13:55AM +0300, Hank Nussbacher via cisco-nsp wrote:
> When comparing traffic stats with SNMP, Netflow stats always appear too low
> (see attachment).
>
> Opened a TAC case and their recommendation is to do 1:1 and I quote:
>
> "Irrespective of the rate at which the NP punts the records to CPU, exporter
> picks up a maximum of 2000 records at a time from the cache that are
> eligible for export (timers, network/TCP session events, etc). This is
> basically to avoid NetIO dropping the packets due to lack of b/w. When the
> exporter wakes up again, it repeats the same."

I fail to see why it would make sense to increase the number of flow
exports if their reasoning is "$machinery is busy, so, flow exports are
exported slowly"...

I do like 1:1 netflow, but the ASR9k (at least the linecards we have)
are not suitable to do that, alas - flow cache does not go high enough,
and NPU PPS is limited.

We currently do 1:10, which mostly works OK for our load, but we still
see a few

LC/0/0/CPU0:Oct 2 08:14:24.825 MEDST: nfsvr[280]: %MGBL-NETFLOW-6-INFO_CACHE_SIZE_EXCEEDED : Cache size of 1000000 for monitor v4mon has been exceeded

every day... (from what I understand, there should be enough LC memory
to go higher with that cache, but it cannot be configured).

gert
--
"If was one thing all people took for granted, was conviction that if you
feed honest figures into a computer, honest figures come out. Never doubted
it myself till I met a computer with a sense of humor."
Robert A. Heinlein, The Moon is a Harsh Mistress

Gert Doering - Munich, Germany gert@greenie.muc.de
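The cache-size log message above can be put into rough numbers with Little's law (occupancy = arrival rate x residency time). This is only a sketch: it assumes that each cache entry sees a single sampled packet and then sits in the cache until the inactive timeout expires, and it ignores the exporter limits mentioned by TAC. The function name and the timeout values in the loop are illustrative, not taken from any router configuration in this thread.

```python
def max_new_flow_rate(cache_entries: int, residency_s: float) -> float:
    """Little's law solved for arrival rate: entries / residency time.

    Assumption: every entry stays in the cache for about residency_s
    seconds (roughly the inactive timeout) before it is exported.
    """
    if residency_s <= 0:
        raise ValueError("residency_s must be positive")
    return cache_entries / residency_s


# Cache size reported in the INFO_CACHE_SIZE_EXCEEDED message above.
CACHE_ENTRIES = 1_000_000

# Hypothetical inactive timeouts, in seconds, for comparison.
for timeout_s in (15, 5, 1):
    rate = max_new_flow_rate(CACHE_ENTRIES, timeout_s)
    print(f"inactive timeout {timeout_s:>2}s -> about {rate:,.0f} new entries/s "
          "before the cache fills")
```

Under these assumptions a shorter residency time raises the rate of new entries the cache can absorb; whether the real cache behaves this way on a given linecard is not something this sketch can confirm.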
Re: Netflow vs SNMP
On 2 Oct 2023, at 13:13, Hank Nussbacher via cisco-nsp <cisco-nsp@puck.nether.net> wrote:

> Does it make sense to go 1:1, which will only increase the number of
> Netflow records to export? For everyone that does 1:1000 or 1:10000
> sampling: do you also see a discrepancy between Netflow stats and SNMP
> stats?

No and no.

Ensure that the active flow timer is set to 60s, that the inactive flow timer is set to 5s, and that the NetFlow capture/analysis system is configured with those values.

For SNMP, ensure that the counter tabulation values are set to 60s/1m, and that the SNMP polling/analysis system is configured with those values.
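
Once both systems work on the same 60s interval, the comparison itself is simple arithmetic: scale the sampled NetFlow bytes by the sampling rate and compare against the SNMP octet-counter delta. The following is a minimal sketch, not any collector's actual method; the function names and all numbers are made up for illustration, and it assumes a 64-bit octet counter such as ifHCInOctets and ingress-only sampling at 1:15 as in the original post.

```python
COUNTER64_MODULUS = 2**64  # ifHCInOctets is a 64-bit counter


def snmp_bits_per_second(octets_start: int, octets_end: int, interval_s: float) -> float:
    """Average bit rate from two octet-counter readings, tolerating one wrap."""
    delta = (octets_end - octets_start) % COUNTER64_MODULUS
    return delta * 8 / interval_s


def netflow_bits_per_second(sampled_bytes, sampling_rate: int, interval_s: float) -> float:
    """Estimated bit rate: byte counts of sampled flow records scaled by 1:N."""
    return sum(sampled_bytes) * sampling_rate * 8 / interval_s


# Hypothetical readings for one 60s interval on one interface.
snmp_bps = snmp_bits_per_second(1_000_000_000, 1_007_500_000, 60)
flow_bps = netflow_bits_per_second([200_000, 150_000, 100_000], 15, 60)

print(f"SNMP    : {snmp_bps:,.0f} bit/s")
print(f"NetFlow : {flow_bps:,.0f} bit/s")
print(f"NetFlow / SNMP ratio: {flow_bps / snmp_bps:.2f}")
```

A ratio that stays well below 1 across many intervals would match the symptom described in the original post; a ratio that only wobbles around 1 points more towards misaligned intervals.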
Re: Netflow vs SNMP
On Mon, 2 Oct 2023 at 09:14, Hank Nussbacher via cisco-nsp
<cisco-nsp@puck.nether.net> wrote:

> Does it make sense to go 1:1, which will only increase the number of
> Netflow records to export? For everyone that does 1:1000 or 1:10000
> sampling: do you also see a discrepancy between Netflow stats and SNMP
> stats?

Both 1:1000 and 1:10000 turn netflow into an expensive sflow. You will
see that almost all exported records contain exactly 1 packet of data.
You are spending a lot of resources storing that data and later
exporting it, when you only ever hit each flow exactly once.
This is because people have run the same configuration for decades,
while traffic has grown exponentially, so the probability of hitting
packets in the same flow twice has gone down exponentially. As the
amount of traffic grows, sampling needs to become more and more
aggressive to retain the same resolution. It is basically becoming
massively more expensive over time, and cache-based in-line netflow is
likely dead in the water; it will move to specialised in-line tap
devices for the few who can actually justify the cost.

Juniper has realised this, and PTX no longer uses cache at all, but
exports immediately after sampling.

IPFIX has newer sampling entities, which allow you to communicate that
out of every N packets you sample C packets. This would allow you to ensure
that once you fire sampling/export, you can sample enough packets to
fill the MTU on export, to have an ideal balance of resource use and
data density. Again entirely without cache, as cache does nothing
unless you have very very aggressive sampling.

--
++ytti
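
The claim that almost all exported records carry exactly 1 packet can be checked with a short calculation. This is a sketch under a simplifying assumption that is not from the thread: each packet is sampled independently with probability 1/N (deterministic every-Nth sampling behaves somewhat differently). The function name and the flow sizes in the loop are illustrative only.

```python
def single_packet_share(flow_packets: int, sampling_rate: int) -> float:
    """Among flows sampled at least once, the fraction sampled exactly once.

    Assumes independent random sampling with probability 1/sampling_rate
    per packet, so the number of sampled packets is binomial.
    """
    p = 1.0 / sampling_rate
    exactly_one = flow_packets * p * (1.0 - p) ** (flow_packets - 1)
    at_least_one = 1.0 - (1.0 - p) ** flow_packets
    return exactly_one / at_least_one


# Hypothetical flow sizes (in packets) against a few sampling rates.
for rate in (10, 1000, 10000):
    for packets in (10, 100, 1000):
        share = single_packet_share(packets, rate)
        print(f"1:{rate:<5} flow of {packets:>4} packets -> "
              f"{share:.1%} of its cache entries hold a single packet")
```

Under this model, flows that are short relative to N almost never get a second sampled packet, which is the situation where a flow cache has nothing to aggregate.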
Re: Netflow vs SNMP
On 02/10/2023 10:10, Dobbins, Roland wrote:

> Ensure that the active flow timer is set to 60s, that the inactive flow
> timer is set to 5s, and that the NetFlow capture/analysis system is
> configured with those values.
>
> For SNMP, ensure that the counter tabulation values are set to 60s/1m,
> and that the SNMP polling/analysis system is configured with those values.

We have set:
cache timeout inactive 15
Kentik recommends 15s:
https://github.com/kentik/config-snippets/blob/master/Cisco/IOS-XR/netflow-9.conf
but I will try 5s based on your feedback.


Regards,
Hank
Re: Netflow vs SNMP
On 2 Oct 2023, at 17:10, Hank Nussbacher <hank@interall.co.il> wrote:

> cache timeout inactive 15
> Kentik recommends 15s:

This is an old, out-of-date recommendation from Cisco that should be retired.

5s is plenty of time for inactive flows.
Re: Netflow vs SNMP
On Mon, 2 Oct 2023 at 13:22, Dobbins, Roland via cisco-nsp
<cisco-nsp@puck.nether.net> wrote:

> cache timeout inactive 15
> Kentik recommends 15s:
>
> This is an old, out-of-date recommendation from Cisco that should be retired.
>
> 5s is plenty of time for inactive flows.

What is the basis for this recommendation? With 1:10k or 1:1k, either
way you'll have 1 packet per cache item. So 15, 5, 1, 0 would allow an
equal amount of cache row re-use, which is none.

--
++ytti