Hello list,
I'm testing nProbe listening on two different 10Gb interfaces (using the
PF_RING ZC i40e driver). Since we need to merge the traffic of these two
links, we use a custom application (built on the PF_RING sources) that
creates a virtual interface carrying the traffic of both physical ones.
The total traffic is about 3 Gbps (and we expect more), but we are seeing
around 50-60% packet drops between our application and nProbe. When
testing with zcount/pfcount instead of nProbe, we see 0% drops.
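For context, the forwarding core of our application follows the usual
PF_RING ZC pattern. This is a minimal sketch rather than our exact code:
device names, the cluster id and queue sizes are illustrative, and the ZC
calls use the 7.x API, so check pfring_zc.h for your version.

#include <stdio.h>
#include <pfring_zc.h>

#define CLUSTER_ID  1      /* nProbe attaches to queue 0 as zc:1@0 */
#define NUM_BUFFERS 65536
#define QUEUE_LEN   8192

int main(void) {
  const char *dev[2] = { "zc:p2p1", "zc:p2p2" };
  pfring_zc_cluster *zc;
  pfring_zc_queue *in[2], *out;
  pfring_zc_pkt_buff *buf;
  int i;

  /* One hugepages-backed cluster holding all packet buffers.
     (Signature as in PF_RING 7.0; later releases added a flags arg.) */
  zc = pfring_zc_create_cluster(CLUSTER_ID, 1536 /* buffer len */,
                                0 /* metadata len */, NUM_BUFFERS,
                                -1 /* NUMA node */, NULL);
  if (zc == NULL) { perror("pfring_zc_create_cluster"); return 1; }

  /* RX queues on the two physical 10Gb ports (i40e ZC driver). */
  for (i = 0; i < 2; i++)
    if ((in[i] = pfring_zc_open_device(zc, dev[i], rx_only, 0)) == NULL)
      return 1;

  /* SPSC queue 0 plus a buffer pool, so an external consumer
     (nProbe with -i=zc:1@0) can attach to this cluster. */
  if ((out = pfring_zc_create_queue(zc, QUEUE_LEN)) == NULL) return 1;
  if (pfring_zc_create_buffer_pool(zc, 32) == NULL) return 1;

  if ((buf = pfring_zc_get_packet_handle(zc)) == NULL) return 1;

  /* Poll both inputs and funnel everything into the single output.
     When the consumer is slower than the wire, the send fails and the
     packet is dropped: that is the "Out Local" drop counter in the
     statistics below. */
  while (1) {
    for (i = 0; i < 2; i++)
      if (pfring_zc_recv_pkt(in[i], &buf, 0 /* don't block */) > 0)
        if (pfring_zc_send_pkt(out, &buf, 0 /* no flush */) < 0)
          /* output queue full: packet dropped, handle is reused */ ;
  }

  /* not reached */
  pfring_zc_destroy_cluster(zc);
  return 0;
}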
I've tried tuning several nProbe parameters (hash-size, max-num-flows,
idle-timeout, ...), but noticed no significant change.
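For reference, this is how I understand those knobs map onto the short
options in the config below (worth double-checking against nprobe -h for
your build):

-w=4048000    # --hash-size: flow cache hash size
-M=10000000   # --max-num-flows: limit on concurrent active flows
-t=60         # --lifetime-timeout: active flow timeout (seconds)
-d=30         # --idle-timeout: idle flow timeout (seconds)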
Drops are lower when we disable the Kafka export and all the plugins
(GTPv1, GTPv2 and HTTP), but even then we always see some (around 1-2%).
I'm pasting my nProbe configuration and some traffic statistics below.
Do you have any recommendations we could follow to improve this performance?
Thanks a lot in advance.
-- System:
nProbe: nprobe-8.0.171020-5797.x86_64
System RAM: 64GB
System CPU: 12 cores
System OS: CentOS Linux release 7.4.1708 (Core)
Linux Kernel: 3.10.0-693.11.6.el7.x86_64 #1 SMP Thu Jan 4 01:06:37 UTC
2018 x86_64 x86_64 x86_64 GNU/Linux
-- nProbe configuration:
-n=none
-i=zc:1@0
-s=128
-t=60
-d=30
-a=0
-e=1
-B=10
-w=4048000
-M=10000000
-z=0
-S=1:1
-E=0:0
-g=/var/run/nprobe-zc1-0.pid
--vlanid-as-iface-idx=none
-V=9
-T="%IPV4_SRC_ADDR %IPV4_DST_ADDR %IN_PKTS %IN_BYTES %OUT_PKTS %OUT_BYTES
%FIRST_SWITCHED %LAST_SWITCHED %L4_SRC_PORT %L4_DST_PORT %TCP_FLAGS
%PROTOCOL %SRC_TOS %SRC_AS %DST_AS %L7_PROTO %L7_PROTO_NAME %SRC_IP_COUNTRY
%SRC_IP_CITY %SRC_IP_LONG %SRC_IP_LAT %DST_IP_COUNTRY %DST_IP_CITY
%DST_IP_LONG %DST_IP_LAT %SRC_VLAN %DST_VLAN %DOT1Q_SRC_VLAN
%DOT1Q_DST_VLAN %DIRECTION %SSL_SERVER_NAME %SRC_AS_MAP %DST_AS_MAP
%HTTP_METHOD %HTTP_RET_CODE %HTTP_REFERER %HTTP_UA %HTTP_MIME %HTTP_HOST
%HTTP_SITE %UPSTREAM_TUNNEL_ID %UPSTREAM_SESSION_ID %DOWNSTREAM_TUNNEL_ID
%DOWNSTREAM_SESSION_ID %UNTUNNELED_PROTOCOL %UNTUNNELED_IPV4_SRC_ADDR
%UNTUNNELED_L4_SRC_PORT %UNTUNNELED_IPV4_DST_ADDR %UNTUNNELED_L4_DST_PORT
%GTPV2_REQ_MSG_TYPE %GTPV2_RSP_MSG_TYPE %GTPV2_C2S_S1U_GTPU_TEID
%GTPV2_C2S_S1U_GTPU_IP %GTPV2_S2C_S1U_GTPU_TEID %GTPV2_S5_S8_GTPC_TEID
%GTPV2_S2C_S1U_GTPU_IP %GTPV2_C2S_S5_S8_GTPU_TEID
%GTPV2_S2C_S5_S8_GTPU_TEID %GTPV2_C2S_S5_S8_GTPU_IP
%GTPV2_S2C_S5_S8_GTPU_IP %GTPV2_END_USER_IMSI %GTPV2_END_USER_MSISDN
%GTPV2_APN_NAME %GTPV2_ULI_MCC %GTPV2_ULI_MNC %GTPV2_ULI_CELL_TAC
%GTPV2_ULI_CELL_ID %GTPV2_RESPONSE_CAUSE %GTPV2_RAT_TYPE %GTPV2_PDN_IP
%GTPV2_END_USER_IMEI %GTPV2_C2S_S5_S8_GTPC_IP %GTPV2_S2C_S5_S8_GTPC_IP
%GTPV2_C2S_S5_S8_SGW_GTPU_TEID %GTPV2_S2C_S5_S8_SGW_GTPU_TEID
%GTPV2_C2S_S5_S8_SGW_GTPU_IP %GTPV2_S2C_S5_S8_SGW_GTPU_IP
%GTPV1_REQ_MSG_TYPE %GTPV1_RSP_MSG_TYPE %GTPV1_C2S_TEID_DATA
%GTPV1_C2S_TEID_CTRL %GTPV1_S2C_TEID_DATA %GTPV1_S2C_TEID_CTRL
%GTPV1_END_USER_IP %GTPV1_END_USER_IMSI %GTPV1_END_USER_MSISDN
%GTPV1_END_USER_IMEI %GTPV1_APN_NAME %GTPV1_RAT_TYPE %GTPV1_RAI_MCC
%GTPV1_RAI_MNC %GTPV1_RAI_LAC %GTPV1_RAI_RAC %GTPV1_ULI_MCC %GTPV1_ULI_MNC
%GTPV1_ULI_CELL_LAC %GTPV1_ULI_CELL_CI %GTPV1_ULI_SAC %GTPV1_RESPONSE_CAUSE
%SRC_FRAGMENTS %DST_FRAGMENTS %CLIENT_NW_LATENCY_MS %SERVER_NW_LATENCY_MS
%APPL_LATENCY_MS %RETRANSMITTED_IN_BYTES %RETRANSMITTED_IN_PKTS
%RETRANSMITTED_OUT_BYTES %RETRANSMITTED_OUT_PKTS %OOORDER_IN_PKTS
%OOORDER_OUT_PKTS %FLOW_ACTIVE_TIMEOUT %FLOW_INACTIVE_TIMEOUT %MIN_TTL
%MAX_TTL %IN_SRC_MAC %OUT_DST_MAC %PACKET_SECTION_OFFSET %FRAME_LENGTH
%SRC_TO_DST_MAX_THROUGHPUT %SRC_TO_DST_MIN_THROUGHPUT
%SRC_TO_DST_AVG_THROUGHPUT %DST_TO_SRC_MAX_THROUGHPUT
%DST_TO_SRC_MIN_THROUGHPUT %DST_TO_SRC_AVG_THROUGHPUT
%NUM_PKTS_UP_TO_128_BYTES %NUM_PKTS_128_TO_256_BYTES
%NUM_PKTS_256_TO_512_BYTES %NUM_PKTS_512_TO_1024_BYTES
%NUM_PKTS_1024_TO_1514_BYTES %NUM_PKTS_OVER_1514_BYTES %LONGEST_FLOW_PKT
%SHORTEST_FLOW_PKT %NUM_PKTS_TTL_EQ_1 %NUM_PKTS_TTL_2_5 %NUM_PKTS_TTL_5_32
%NUM_PKTS_TTL_32_64 %NUM_PKTS_TTL_64_96 %NUM_PKTS_TTL_96_128
%NUM_PKTS_TTL_128_160 %NUM_PKTS_TTL_160_192 %NUM_PKTS_TTL_192_224
%NUM_PKTS_TTL_224_255 %DURATION_IN %DURATION_OUT %TCP_WIN_MIN_IN
%TCP_WIN_MAX_IN %TCP_WIN_MSS_IN %TCP_WIN_SCALE_IN %TCP_WIN_MIN_OUT
%TCP_WIN_MAX_OUT %TCP_WIN_MSS_OUT %TCP_WIN_SCALE_OUT"
-A=/usr/share/ntopng/httpdocs/geoip/GeoIPASNum.dat
--city-list=/usr/share/ntopng/httpdocs/geoip/GeoLiteCity.dat
--kafka "127.0.0.1:9092;TEST"
-D=t
--tunnel
-b=1
--smart-udp-frags
-f="udp"
-- nProbe traffic statistics:
19/Jan/2018 18:49:15 [nprobe.c:3106] Average traffic: [227.07 K pps][All
Traffic 1.18 Gb/sec][IP Traffic 1.05 Gb/sec][ratio 0.89]
19/Jan/2018 18:49:15 [nprobe.c:3114] Current traffic: [222.87 K pps][1.14
Gb/sec]
19/Jan/2018 18:49:15 [nprobe.c:3120] Current flow export rate: [5350.3
flows/sec]
19/Jan/2018 18:49:15 [nprobe.c:3123] Flow drops: [export queue too
long=0][too many flows=0][ELK queue flow drops=0]
19/Jan/2018 18:49:15 [nprobe.c:3128] Export Queue: 510035/4048000 [12.6 %]
19/Jan/2018 18:49:15 [nprobe.c:3133] Flow Buckets:
[active=541656][allocated=1051691][toBeExported=510035]
19/Jan/2018 18:49:15 [nprobe.c:3139] Kafka [flows exported=9071081/5350.3
flows/sec][msgs sent=9071081/1.0 flows/msg][send errors=0]
19/Jan/2018 18:49:15 [nprobe.c:2956] Processed packets: 388520559 (max
bucket search: 19)
19/Jan/2018 18:49:15 [nprobe.c:2939] Fragment queue length: 0
19/Jan/2018 18:49:15 [nprobe.c:2962] WARNING: Your bucket search is too
slow (19): expect drops
19/Jan/2018 18:49:15 [nprobe.c:2965] Flow export stats: [0 bytes/0 pkts][0
flows/0 pkts sent]
19/Jan/2018 18:49:15 [nprobe.c:2975] Flow drop stats: [1241630041
bytes/273259031 pkts][0 flows]
19/Jan/2018 18:49:15 [nprobe.c:2980] Total flow stats: [1241630041
bytes/273259031 pkts][0 flows/0 pkts sent]
19/Jan/2018 18:49:15 [nprobe.c:2991] Kafka [flows exported=9071083][msgs
sent=9071083/1.0 flows/msg][send errors=0]
-- Our application statistics:
19/Jan/2018 18:49:15 [zcluster.c:173] Absolute Stats: Recv 5'604'403'857
pkts (0 drops) - Forwarded 2'197'198'847 pkts (3'407'205'010 drops)
19/Jan/2018 18:49:15 [zcluster.c:205] (In p2p1) RX 1176494979 pkts
Dropped 0 pkts (0.0 %)
19/Jan/2018 18:49:15 [zcluster.c:205] (In p2p2) RX 4427908920 pkts
Dropped 0 pkts (0.0 %)
19/Jan/2018 18:49:15 [zcluster.c:219] (Out Local) num:0 RX
2197166603 pkts Dropped 3407205052 pkts (60.8 %)
19/Jan/2018 18:49:15 [zcluster.c:239] Actual Stats: Recv 538'086.38 pps
(0.00 drops) - Forwarded 209'878.02 pps (328'208.35 drops)
--
Regards,
David Notivol
dnotivol@gmail.com