Mailing List Archive: Few general question on using nprobe as a collector with Kafka

I am fairly new to nprobe and have been experimenting with the many
command-line options. I have a few general questions on which I would
appreciate any clarification.

nprobe -v

Welcome to nProbe v.8.0.171020 (r5797) for x86_64-unknown-linux-gnu
with native PF_RING acceleration.
Copyright 2002-17 ntop.org

Build OS: CentOS Linux release 7.3.1611 (Core)
SystemID: 68A2B43E76056A7E
GIT rev: 8.0-stable:478c52c6ce70feaf6c65fe4806be05f75fe0e196:20171020
License: Invalid nProbe license (/etc/nprobe.license) [Missing license file]

Q1. When running on a multi-core host, will nprobe utilize all cores?
Somewhere, I thought I saw something about it being single threaded but now
cannot find that reference. This question goes to sizing my HW. I am seeing
~5% CPU load for one router's flow (about 2500 flow records/sec). I will
ultimately need more than 20x this volume so I need to deploy N hosts
eventually in full production setup. I just want to know if there are any
settings needed to enable nprobe to fully utilize all cores on a given host.

Q2. I am running with this configuration:

[root@vmwdnacollector01 ~]# cat /etc/nprobe/nprobe.conf
--interface=none
--collector=none
--collector-port=2055
--verbose=1
--flow-version=9
--hash-size=262144
--kafka="kafka01:9092;netflow-raw;1"
--dump-stats=/var/log/nprobe/stats.txt
--event-log=/var/log/nprobe/events.txt
-T="%IPV4_SRC_ADDR %IPV4_DST_ADDR %L4_SRC_PORT %L4_DST_PORT %IPV4_SRC_MASK
%IPV4_DST_MASK %IPV4_NEXT_HOP %IN_PKTS %IN_BYTES %OUT_PKTS %OUT_BYTES
%FIRST_SWITCHED %LAST_SWITCHED %TCP_FLAGS %PROTOCOL %SRC_TOS %DIRECTION
%EXPORTER_IPV4_ADDRESS"

I am collecting netflow V9 records from a Cisco router. I was sort of
expecting that the record would include the IP address of the router
because I need that to know where the data came from for upstream
enrichment. I have nprobe publishing to Kafka. But, looking at the raw
flows coming from the router, there is no field that identifies the router
IP. So, I experimented and added a -T <template> definition that matches
the actual fields coming from the router. Then I added the
%EXPORTER_IPV4_ADDRESS field (which is NOT in the raw record from the
router) and voila, the IP address of the router shows up in that field. So,
I assume that nprobe is simply adding the source IP address of each
incoming flow record into that field, as well as mapping each field in the
incoming flow record into the matching field in my defined template - sort
of "cherry picking" the fields out of the source record and packing them
into my template.

So, my question on this point is, am I doing this correctly with defining
my own template? Seems like the only way I can figure it out.

Q3. It appears, for the mode I am operating in, that no license is required
to allow this to work. When I run in the mode where nprobe sniffs packets
from my local interface, it will only produce 25K flows then stops if there
is no license. However, in collector mode, where it just receives flows
from a router and forwards them as JSON to Kafka, it runs for millions of
flows. So, question here is, do I need a license for this sort of use case?

Q4. The Kafka producer has a boatload of configuration options but nprobe
only exposes a couple of basic options (topic, acks, brokers). Is that it or
is there some way to provide additional configuration information to the
embedded producer? For example, to properly aggregate data flows, I would
like to partition the topic on the IPV4_SRC_ADDR. I am running in a
multi-tenant environment where each tenant can have overlapping private IP
addresses that we see in the flows. So, I need to aggregate the flows by
TENANT_ID + IPV4_SRC_ADDR, for example. I see no way to configure this with
nprobe + kafka mode.

Q5. Is there any way to bind nprobe to a specific interface when used as a
collector in my use case? Meaning, I might need to run multiple instances
on a single host but I want to be able to configure routers to direct their
flow records to a specific IP address so that I can load-balance the flows
over N instances of nprobe running on a single host. I cannot find any
configuration option that will bind the UDP listening port to a specific
interface on a single host.

Thanks for any insights into my questions.

Hi Mark,
please see below, but first of all please move to 8.2, as we have fixed
many issues and made many improvements, in particular when collecting flows:
https://www.ntop.org/category/nprobe/

On 12/12/2017 06:09 PM, Mark Petronic wrote:
> I am fairly new to nprobe and have been experimenting with the many
> command-line options. I have a few general questions on which I would
> appreciate any clarification.
>
> nprobe -v
>
> Welcome to nProbe v.8.0.171020 (r5797) for x86_64-unknown-linux-gnu
> with native PF_RING acceleration.
> Copyright 2002-17 ntop.org <http://ntop.org>
>
> Build OS: CentOS Linux release 7.3.1611 (Core)
> SystemID: 68A2B43E76056A7E
> GIT rev:
> 8.0-stable:478c52c6ce70feaf6c65fe4806be05f75fe0e196:20171020
> License: Invalid nProbe license (/etc/nprobe.license) [Missing
> license file]
>
>
> Q1. When running on a multi-core host, will nprobe utilize all cores?
> Somewhere, I thought I saw something about it being single threaded
> but now cannot find that reference. This question goes to sizing my
> HW. I am seeing ~5% CPU load for one router's flow (about 2500 flow
> records/sec). I will ultimately need more than 20x this volume so I
> need to deploy N hosts eventually in full production setup. I just
> want to know if there are any settings needed to enable nprobe to
> fully utilize all cores on a given host.

nprobe will use one core per instance, because with RSS you can spawn
one instance per core. From our tests, in collection mode we should be
able to handle ~20k flows/sec per core.
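
As a rough sizing aid, the figures in this thread can be turned into an
instance count. This is only a back-of-the-envelope sketch: it assumes the
~20k figure is flows per second per core, that load scales linearly, and
that one instance runs per core. The function name is made up for
illustration.

```python
import math


def instances_needed(target_flows_per_sec: float, flows_per_core: float) -> int:
    """Number of single-core nprobe instances needed for a given flow rate."""
    return math.ceil(target_flows_per_sec / flows_per_core)


per_router_rate = 2500    # flows/sec from one router (from the question)
scale_factor = 20         # "more than 20x this volume" (from the question)
flows_per_core = 20_000   # approximate collection rate quoted in the reply

target_rate = per_router_rate * scale_factor
print(target_rate, instances_needed(target_rate, flows_per_core))
# prints: 50000 3
```

Since the question says "more than 20x", three instances is a lower bound;
real deployments would want headroom on top of that.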
>
> Q2. I am running with this configuration:
>
> [root@vmwdnacollector01 ~]# cat /etc/nprobe/nprobe.conf
> --interface=none
> --collector=none
> --collector-port=2055
> --verbose=1
> --flow-version=9
> --hash-size=262144
> --kafka="kafka01:9092;netflow-raw;1"
> --dump-stats=/var/log/nprobe/stats.txt
> --event-log=/var/log/nprobe/events.txt
> -T="%IPV4_SRC_ADDR %IPV4_DST_ADDR %L4_SRC_PORT %L4_DST_PORT
> %IPV4_SRC_MASK %IPV4_DST_MASK %IPV4_NEXT_HOP %IN_PKTS %IN_BYTES
> %OUT_PKTS %OUT_BYTES %FIRST_SWITCHED %LAST_SWITCHED %TCP_FLAGS
> %PROTOCOL %SRC_TOS %DIRECTION %EXPORTER_IPV4_ADDRESS"
>
> I am collecting netflow V9 records from a Cisco router. I was sort of
> expecting that the record would include the IP address of the router
> because I need that to know where the data came from for upstream
> enrichment.

> I have nprobe publishing to Kafka. But, looking at the raw flows
> coming from the router, there is no field that identifies the router
> IP. So, I experimented and added a -T <template> definition that
> matches the actual fields coming from the router. Then I added
> the %EXPORTER_IPV4_ADDRESS field (which is NOT in the raw record from
> the router) and voila, the IP address of the router shows up in that
> field. So, I assume that nprobe is simply adding the source IP address
> of each incoming flow record into that field, as well as mapping each
> field in the incoming flow record into the matching field in my
> defined template - sort of "cherry picking" the fields out of the
> source record and packing them into my template.
>
> So, my question on this point is, am I doing this correctly with
> defining my own template? Seems like the only way I can figure it out.
This is correct. In >= 8.2 this is done automatically when using ZMQ,
and as of this email we'll extend it to Kafka as well.
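
To show how the added exporter field can be used for upstream enrichment,
here is a minimal sketch that groups records by router. It assumes each
Kafka message is a JSON object keyed by the template field names; the
sample messages and their values are invented for illustration, and the
key names in a real deployment may differ depending on how JSON export is
configured.

```python
import json
from collections import defaultdict

# Hypothetical sample messages; real records would be read from the Kafka topic.
sample_messages = [
    '{"IPV4_SRC_ADDR": "10.0.0.5", "IN_BYTES": 1200, "EXPORTER_IPV4_ADDRESS": "192.0.2.1"}',
    '{"IPV4_SRC_ADDR": "10.0.0.6", "IN_BYTES": 800, "EXPORTER_IPV4_ADDRESS": "192.0.2.1"}',
    '{"IPV4_SRC_ADDR": "10.0.0.5", "IN_BYTES": 500, "EXPORTER_IPV4_ADDRESS": "192.0.2.2"}',
]


def bytes_per_exporter(messages):
    """Sum IN_BYTES per exporting router, keyed by EXPORTER_IPV4_ADDRESS."""
    totals = defaultdict(int)
    for raw in messages:
        record = json.loads(raw)
        # Records without the exporter field are grouped under "unknown".
        exporter = record.get("EXPORTER_IPV4_ADDRESS", "unknown")
        totals[exporter] += int(record.get("IN_BYTES", 0))
    return dict(totals)


print(bytes_per_exporter(sample_messages))
```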
>
> Q3. It appears, for the mode I am operating in, that no license is
> required to allow this to work. When I run in the mode where nprobe
> sniffs packets from my local interface, it will only produce 25K flows
> then stops if there is no license. However, in collector mode, where
> it just receives flows from a router and forwards them as JSON to
> Kafka, it runs for millions of flows. So, question here is, do I need
> a license for this sort of use case?
>
Yes, you need a license.

> Q4. The Kafka producer has a boatload of configuration options but
> nprobe only exposes a couple of basic options (topic, acks, brokers). Is
> that it or is there some way to provide additional configuration
> information to the embedded producer? For example, to properly
> aggregate data flows, I would like to partition the topic on the
> IPV4_SRC_ADDR. I am running in a multi-tenant environment where each
> tenant can have overlapping private IP addresses that we see in the
> flows. So, I need to aggregate the flows by TENANT_ID + IPV4_SRC_ADDR,
> for example. I see no way to configure this with nprobe + kafka mode.

This is not currently possible, but Simone is the Kafka expert: if you
can agree on what type of extensions are needed, we'll implement them.
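
To make the request concrete, the sketch below shows what partitioning on
a composite TENANT_ID + IPV4_SRC_ADDR key means: the same tenant and
source address always map to the same partition, while overlapping private
addresses from different tenants get distinct keys. This is not nprobe
functionality and not Kafka's default partitioner; the function name, the
"|" separator, and the SHA-256-based hash are assumptions chosen for
illustration, and how TENANT_ID is derived is left open.

```python
import hashlib


def partition_for(tenant_id: str, src_addr: str, num_partitions: int) -> int:
    """Map a (tenant, source address) pair to a stable partition number."""
    key = f"{tenant_id}|{src_addr}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


# Two tenants using the same private address produce different keys, so
# their flows are aggregated separately even if the partitions coincide.
for tenant in ("tenant-a", "tenant-b"):
    print(tenant, partition_for(tenant, "10.0.0.5", 12))
```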
>
> Q5. Is there any way to bind nprobe to a specific interface when used as
> a collector in my use case? Meaning, I might need to run multiple
> instances on a single host but I want to be able to configure routers
> to direct their flow records to a specific IP address so that I can
> load-balance the flows over N instances of nprobe running on a single
> host. I cannot find any configuration option that will bind the UDP
> listening port to a specific interface on a single host.

The -n option supports the format IP:port. So you would need this also
for the -3 option, correct?
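
For reference, the behaviour being asked about corresponds to binding the
UDP listening socket to one local address instead of all addresses. The
sketch below shows that idea with a plain socket. It is not nprobe code,
and the function name is made up for illustration.

```python
import socket


def open_collector_socket(bind_ip: str, port: int) -> socket.socket:
    """Open a UDP socket that only accepts datagrams sent to bind_ip:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Binding to a specific address (rather than "0.0.0.0") restricts the
    # socket to traffic addressed to that local IP.
    sock.bind((bind_ip, port))
    return sock


if __name__ == "__main__":
    # Port 0 asks the OS for any free port; a collector would use e.g. 2055.
    sock = open_collector_socket("127.0.0.1", 0)
    print(sock.getsockname())
    sock.close()
```

With one such socket per local IP address, several collector instances on
the same host could share the same port number, and each router could be
pointed at a different address.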

Regards Luca

>
> Thanks for any insights into my questions.
>
>
>
>
> _______________________________________________
> Ntop-misc mailing list
> Ntop-misc@listgateway.unipi.it
> http://listgateway.unipi.it/mailman/listinfo/ntop-misc