Mailing List Archive

nProbe performance, zbalance packet drops
Hello list,

We're using nProbe to export flow information to Kafka. We're capturing
from two 10Gb interfaces that we merge with zbalance_ipc and split into 16
queues, one per nprobe instance (16 in total).

The problem is that zbalance_ipc reports about 40% packet drops, so it looks
like nprobe is not able to read and process all the traffic. CPU usage is
really high, and the load average is over 25-30.

Merging both interfaces we're getting up to 5.5 Gbps and 1.2 million packets
per second, and we're using the i40e ZC driver.
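(For context, that works out to an average packet size of roughly 570 bytes,
and split across 16 queues it is only about 75,000 packets per second per
nprobe instance.)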

Do you have any advice on how to improve this performance?
Is it expected to see packet drops at this traffic rate because we're reaching
the server's limits, or is there any configuration we could tune to improve it?

Thanks in advance.



-- System:

nProbe: nProbe v.8.5.180625 (r6185)
System RAM: 64GB
System CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz, 12 logical cores (6
physical cores, 2 threads per core)
System OS: CentOS Linux release 7.4.1708 (Core)
Linux Kernel: 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC
2018 x86_64 x86_64 x86_64 GNU/Linux

-- zbalance configuration:

zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l /var/tmp/zbalance.log -v -w

-- nProbe configuration:

--interface=zc:1@0
--pid-file=/var/run/nprobe-zc1-00.pid
--dump-stats=/var/log/nprobe/zc1-00_flows_stats.txt
--kafka "192.168.0.1:9092,192.168.0.2:9092,192.168.0.3:9092;topic"
--collector=none
--idle-timeout=60
--snaplen=128
--aggregation=0/1/1/1/0/0/0
--all-collectors=0
--verbose=1
--dump-format=t
--vlanid-as-iface-idx=none
--hash-size=1024000
--flow-delay=1
--count-delay=10
--min-flow-size=0
--netflow-engine=0:0
--sample-rate=1:1
--as-list=/usr/share/ntopng/httpdocs/geoip/GeoIPASNum.dat
--city-list=/usr/share/ntopng/httpdocs/geoip/GeoLiteCity.dat
--flow-templ="%IPV4_SRC_ADDR %IPV4_DST_ADDR %IN_PKTS %IN_BYTES %OUT_PKTS
%OUT_BYTES %FIRST_SWITCHED %LAST_SWITCHED %L4_SRC_PORT %L4_DST_PORT
%TCP_FLAGS %PROTOCOL %SRC_TOS %SRC_AS %DST_AS %L7_PROTO %L7_PROTO_NAME
%SRC_IP_COUNTRY %SRC_IP_CITY %SRC_IP_LONG %SRC_IP_LAT %DST_IP_COUNTRY
%DST_IP_CITY %DST_IP_LONG %DST_IP_LAT %SRC_VLAN %DST_VLAN %DOT1Q_SRC_VLAN
%DOT1Q_DST_VLAN %DIRECTION %SSL_SERVER_NAME %SRC_AS_MAP %DST_AS_MAP
%HTTP_METHOD %HTTP_RET_CODE %HTTP_REFERER %HTTP_UA %HTTP_MIME %HTTP_HOST
%HTTP_SITE %UPSTREAM_TUNNEL_ID %UPSTREAM_SESSION_ID %DOWNSTREAM_TUNNEL_ID
%DOWNSTREAM_SESSION_ID %UNTUNNELED_PROTOCOL %UNTUNNELED_IPV4_SRC_ADDR
%UNTUNNELED_L4_SRC_PORT %UNTUNNELED_IPV4_DST_ADDR %UNTUNNELED_L4_DST_PORT
%GTPV2_REQ_MSG_TYPE %GTPV2_RSP_MSG_TYPE %GTPV2_C2S_S1U_GTPU_TEID
%GTPV2_C2S_S1U_GTPU_IP %GTPV2_S2C_S1U_GTPU_TEID %GTPV2_S5_S8_GTPC_TEID
%GTPV2_S2C_S1U_GTPU_IP %GTPV2_C2S_S5_S8_GTPU_TEID
%GTPV2_S2C_S5_S8_GTPU_TEID %GTPV2_C2S_S5_S8_GTPU_IP
%GTPV2_S2C_S5_S8_GTPU_IP %GTPV2_END_USER_IMSI %GTPV2_END_USER_MSISDN
%GTPV2_APN_NAME %GTPV2_ULI_MCC %GTPV2_ULI_MNC %GTPV2_ULI_CELL_TAC
%GTPV2_ULI_CELL_ID %GTPV2_RESPONSE_CAUSE %GTPV2_RAT_TYPE %GTPV2_PDN_IP
%GTPV2_END_USER_IMEI %GTPV2_C2S_S5_S8_GTPC_IP %GTPV2_S2C_S5_S8_GTPC_IP
%GTPV2_C2S_S5_S8_SGW_GTPU_TEID %GTPV2_S2C_S5_S8_SGW_GTPU_TEID
%GTPV2_C2S_S5_S8_SGW_GTPU_IP %GTPV2_S2C_S5_S8_SGW_GTPU_IP
%GTPV1_REQ_MSG_TYPE %GTPV1_RSP_MSG_TYPE %GTPV1_C2S_TEID_DATA
%GTPV1_C2S_TEID_CTRL %GTPV1_S2C_TEID_DATA %GTPV1_S2C_TEID_CTRL
%GTPV1_END_USER_IP %GTPV1_END_USER_IMSI %GTPV1_END_USER_MSISDN
%GTPV1_END_USER_IMEI %GTPV1_APN_NAME %GTPV1_RAT_TYPE %GTPV1_RAI_MCC
%GTPV1_RAI_MNC %GTPV1_RAI_LAC %GTPV1_RAI_RAC %GTPV1_ULI_MCC %GTPV1_ULI_MNC
%GTPV1_ULI_CELL_LAC %GTPV1_ULI_CELL_CI %GTPV1_ULI_SAC %GTPV1_RESPONSE_CAUSE
%SRC_FRAGMENTS %DST_FRAGMENTS %CLIENT_NW_LATENCY_MS %SERVER_NW_LATENCY_MS
%APPL_LATENCY_MS %RETRANSMITTED_IN_BYTES %RETRANSMITTED_IN_PKTS
%RETRANSMITTED_OUT_BYTES %RETRANSMITTED_OUT_PKTS %OOORDER_IN_PKTS
%OOORDER_OUT_PKTS %FLOW_ACTIVE_TIMEOUT %FLOW_INACTIVE_TIMEOUT %MIN_TTL
%MAX_TTL %IN_SRC_MAC %OUT_DST_MAC %PACKET_SECTION_OFFSET %FRAME_LENGTH
%SRC_TO_DST_MAX_THROUGHPUT %SRC_TO_DST_MIN_THROUGHPUT
%SRC_TO_DST_AVG_THROUGHPUT %DST_TO_SRC_MAX_THROUGHPUT
%DST_TO_SRC_MIN_THROUGHPUT %DST_TO_SRC_AVG_THROUGHPUT
%NUM_PKTS_UP_TO_128_BYTES %NUM_PKTS_128_TO_256_BYTES
%NUM_PKTS_256_TO_512_BYTES %NUM_PKTS_512_TO_1024_BYTES
%NUM_PKTS_1024_TO_1514_BYTES %NUM_PKTS_OVER_1514_BYTES %LONGEST_FLOW_PKT
%SHORTEST_FLOW_PKT %NUM_PKTS_TTL_EQ_1 %NUM_PKTS_TTL_2_5 %NUM_PKTS_TTL_5_32
%NUM_PKTS_TTL_32_64 %NUM_PKTS_TTL_64_96 %NUM_PKTS_TTL_96_128
%NUM_PKTS_TTL_128_160 %NUM_PKTS_TTL_160_192 %NUM_PKTS_TTL_192_224
%NUM_PKTS_TTL_224_255 %DURATION_IN %DURATION_OUT %TCP_WIN_MIN_IN
%TCP_WIN_MAX_IN %TCP_WIN_MSS_IN %TCP_WIN_SCALE_IN %TCP_WIN_MIN_OUT
%TCP_WIN_MAX_OUT %TCP_WIN_MSS_OUT %TCP_WIN_SCALE_OUT"
--flow-version=9
--tunnel
--smart-udp-frags
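
The other 15 instances would differ only in the queue index (zc:1@1 ... zc:1@15)
and the per-instance file names. For illustration, a minimal launcher sketch
(hypothetical; the actual deployment may use systemd units or one config file
per instance instead):

#!/bin/bash
# Hypothetical launcher: one nprobe per zbalance_ipc egress queue (0..15).
# COMMON_OPTS stands in for the shared options listed above (collector,
# snaplen, flow template, Kafka brokers, ...).
COMMON_OPTS=(--collector=none --snaplen=128 --flow-version=9)
for q in $(seq 0 15); do
  id=$(printf "%02d" "$q")
  nprobe --interface="zc:1@${q}" \
         --pid-file="/var/run/nprobe-zc1-${id}.pid" \
         --dump-stats="/var/log/nprobe/zc1-${id}_flows_stats.txt" \
         "${COMMON_OPTS[@]}" &
done
wait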




--
Regards,
David Notivol
dnotivol@gmail.com
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi David
please also provide statistics from zbalance_ipc (output or log file)
and nprobe (you can get live stats from /proc/net/pf_ring/stats/)

Thank you
Alfredo

> On 26 Jun 2018, at 15:32, David Notivol <dnotivol@gmail.com> wrote:
> [quoted text trimmed]
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi Alfredo,
Thanks for replying.
This is an excerpt of the zbalance and nprobe statistics:

26/Jun/2018 17:29:58 [zbalance_ipc.c:265] =========================
26/Jun/2018 17:29:58 [zbalance_ipc.c:266] Absolute Stats: Recv 1'285'430'239 pkts (1'116'181'903 drops) - Forwarded 1'266'272'285 pkts (19'157'949 drops)
26/Jun/2018 17:29:58 [zbalance_ipc.c:305] p2p1,p2p2 RX 1285430267 pkts Dropped 1116181981 pkts (46.5 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 0  RX 77050882 pkts Dropped 1127883 pkts (1.4 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 1  RX 70722562 pkts Dropped 756409 pkts (1.1 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 2  RX 76092418 pkts Dropped 1017335 pkts (1.3 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 3  RX 75088386 pkts Dropped 896678 pkts (1.2 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 4  RX 91991042 pkts Dropped 2114739 pkts (2.2 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 5  RX 81384450 pkts Dropped 1269385 pkts (1.5 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 6  RX 84310018 pkts Dropped 1801848 pkts (2.1 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 7  RX 84554242 pkts Dropped 1487329 pkts (1.7 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 8  RX 84090370 pkts Dropped 1482864 pkts (1.7 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 9  RX 73642498 pkts Dropped 732237 pkts (1.0 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 10 RX 76481026 pkts Dropped 1000496 pkts (1.3 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 11 RX 72496642 pkts Dropped 929049 pkts (1.3 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 12 RX 79386626 pkts Dropped 1122169 pkts (1.4 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 13 RX 79418370 pkts Dropped 1187172 pkts (1.5 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 14 RX 80284162 pkts Dropped 1195559 pkts (1.5 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 15 RX 79143426 pkts Dropped 1036797 pkts (1.3 %)
26/Jun/2018 17:29:58 [zbalance_ipc.c:338] Actual Stats: Recv 369'127.51 pps (555'069.74 drops) - Forwarded 369'129.51 pps (0.00 drops)
26/Jun/2018 17:29:58 [zbalance_ipc.c:348] =========================


# cat /proc/net/pf_ring/stats/*
ClusterId: 1
TotQueues: 16
Applications: 1
App0Queues: 16
Duration: 0:00:41:18:386
Packets: 1191477340
Forwarded: 1174033613
Processed: 1173893301
IFPackets: 1191477364
IFDropped: 1036448041

Duration: 0:00:41:15:587
Bytes: 42626434538
Packets: 71510530
Dropped: 845465

Duration: 0:00:41:15:557
Bytes: 40686677370
Packets: 65656322
Dropped: 533675

Duration: 0:00:41:15:534
Bytes: 41463519299
Packets: 70565378
Dropped: 804282

Duration: 0:00:41:15:523
Bytes: 42321923225
Packets: 69566978
Dropped: 650333

Duration: 0:00:41:14:659
Bytes: 45415334638
Packets: 85479938
Dropped: 1728521

Duration: 0:00:41:14:597
Bytes: 42615821825
Packets: 75445250
Dropped: 951386

Duration: 0:00:41:14:598
Bytes: 44722410915
Packets: 78252409
Dropped: 1479387

Duration: 0:00:41:14:613
Bytes: 44788855334
Packets: 78318926
Dropped: 1202905

Duration: 0:00:41:14:741
Bytes: 43950263720
Packets: 77821954
Dropped: 1135693

Duration: 0:00:41:14:608
Bytes: 41211162757
Packets: 68241354
Dropped: 496494

Duration: 0:00:41:14:629
Bytes: 43064091353
Packets: 70834104
Dropped: 712427

Duration: 0:00:41:14:551
Bytes: 42072869897
Packets: 67360770
Dropped: 696460

Duration: 0:00:41:14:625
Bytes: 44323715294
Packets: 73420290
Dropped: 851818

Duration: 0:00:41:14:625
Bytes: 43018671083
Packets: 73651110
Dropped: 917985

Duration: 0:00:41:14:600
Bytes: 42730057210
Packets: 74312500
Dropped: 799922

Duration: 0:00:41:14:611
Bytes: 42519248547
Packets: 73394690
Dropped: 771941
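
In case it is useful, the per-queue counters above can be summed with a quick
one-liner over the same files (rough sketch only; note that the first entry,
the zbalance_ipc cluster, reports IFDropped rather than Dropped):

# total packets and drops across the nprobe queue entries, plus drop ratio
awk '/^Packets:/ { p += $2 } /^Dropped:/ { d += $2 }
     END { printf "pkts=%d dropped=%d (%.2f%% drop)\n", p, d, 100*d/(p+d) }' /proc/net/pf_ring/stats/*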



On Tue, 26 Jun 2018 at 16:25, Alfredo Cardigliano (<cardigliano@ntop.org>)
wrote:

> Hi David
> please also provide statistics from zbalance_ipc (output or log file)
> and nprobe (you can get live stats from /proc/net/pf_ring/stats/)
>
> Thank you
> Alfredo
>
> On 26 Jun 2018, at 15:32, David Notivol <dnotivol@gmail.com> wrote:
>
> [quoted text trimmed]



--
Regards,
David Notivol
dnotivol@gmail.com
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi David
it seems that you have packet loss on both zbalance_ipc and nprobe.
I recommend that you:
1. Set the core affinity for both zbalance_ipc and the nprobe instances,
trying to use a different core for each (at the very least, do not share the
zbalance_ipc physical core with the nprobe instances); see the sketch below.
2. Did you try using the ZC drivers for capturing traffic from the interfaces
(zc:p2p1,zc:p2p2)?
Please also provide the top output (press 1 to see all cores) with the current
configuration; I guess the kernel is using some of the available CPU with this
configuration.
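
For example, the pinning could look like this (illustrative only; the core IDs
depend on your topology, and plain taskset is shown here, although zbalance_ipc
and nprobe also provide their own affinity options in recent versions):

# zbalance_ipc alone on physical core 0 (keep its HT sibling idle if possible)
taskset -c 0 zbalance_ipc -i p2p1,p2p2 -c 1 -n 16 -m 4 -a -p -l /var/tmp/zbalance.log -v -w
# each nprobe instance on its own core, none of them on core 0 or its sibling
taskset -c 1 nprobe --interface=zc:1@0 ...
taskset -c 2 nprobe --interface=zc:1@1 ...
# ...and so on for the remaining queues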

Alfredo

> On 26 Jun 2018, at 16:31, David Notivol <dnotivol@gmail.com> wrote:
>
> [quoted text trimmed]
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi Alfredo,
Thanks for your recommendations.

I tested setting core affinity as you suggested, and the input drops in
zbalance disappeared. The output (queue) drops persist, but the absolute drops
are lower than before.
I had actually tried core affinity before, but I hadn't taken the physical
cores into account. Now I have zbalance on one physical core, and 10 nprobe
instances that do not share that physical core with zbalance.
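
(For reference, the hyper-threading sibling pairs can be checked from sysfs,
for example:

# each line lists the logical CPUs that share one physical core (HT siblings)
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | sort -u

so zbalance and the nprobe instances can be kept off each other's siblings.)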

About your point 2: if I use the ZC drivers, how could I run several nprobe
instances to share the load? I'm testing with a single instance:
-i zc:p2p1,zc:p2p2

Attached you can find:
- 0.log = top output for the scenario in my previous email.
- 1.log = scenario in your point 1, including top, zbalance output, and
nprobe stats.

On Wed, 27 Jun 2018 at 12:13, Alfredo Cardigliano (<cardigliano@ntop.org>)
wrote:

> Hi David
> it seems that you have packet loss both on zbalance and nprobe,
> I recommend you to:
> 1. set the core affinity for both zbalance_ipc and the nprobe instances,
> trying to
> use a different core for each (at least do not share the zbalance_ipc
> physical core
> with nprobe instances)
> 2. did you try using zc drivers for capturing traffic from the interfaces?
> (zc:p2p1,zc:p2p2)
> Please also provide the top output (press 1 to see all cores) with the
> current configuration,
> I guess the kernel is using some of the available CPU with this configuration.
>
> Alfredo
>
> On 26 Jun 2018, at 16:31, David Notivol <dnotivol@gmail.com> wrote:
>
> [quoted text trimmed]



--
Regards,
David Notivol
dnotivol@gmail.com
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi David

> On 27 Jun 2018, at 14:20, David Notivol <dnotivol@gmail.com> wrote:
>
> Hi Alfredo,
> Thanks for your recommendations.
>
> I tested using core affinity as you suggested, and the in drops disappeared in zbalance. The output drops persist, but the absolute drops are less than before.
> Actually I had tested the core affinity, but I didn't have in mind the physical cores. Now I put zbalance in one physical core, and 10 nprobe instances not sharing the physical core with zbalance.
>
> About your point 2, by using zc drivers, how could I run several nprobe instances to share the load? I'm testing with one instance: -i zc:p2p1,zc:p2p2

You can keep using zbalance_ipc (-i zc:p2p1,zc:p2p2), or you can use RSS (running nprobe on -i zc:p2p1@<id>,zc:p2p2@<id>)
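
For the RSS case, a rough sketch (the RSS queue count is set when loading the
ZC driver; the exact module parameter below is an assumption, please check the
i40e ZC README for your driver version):

# load the i40e ZC driver with e.g. 8 RSS queues per port (assumed parameter)
insmod ./i40e.ko RSS=8,8
# then run one nprobe per RSS queue pair, without zbalance_ipc
nprobe -i zc:p2p1@0,zc:p2p2@0 ...
nprobe -i zc:p2p1@1,zc:p2p2@1 ...
# ...one instance per queue id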

> Attached you can find:
> - 0.log = top output for the scenario in my previous email.
> - 1.log = scenario in your point 1, including top, zbalance output, and nprobe stats.


I do not see the attachments, did you forget to enclose them?

Alfredo

>
> [quoted text trimmed]
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi Alfredo,

Sorry, I forgot to attach the files, as you said. I sent them a while ago,
but it seems the mail size is over the limit and the message got held for
approval. I'm trying again now, removing some info from my first email and
pasting one file at a time.

- 0.log = top output for the scenario in my first email.



On Wed, 27 Jun 2018 at 14:30, Alfredo Cardigliano (<cardigliano@ntop.org>)
wrote:

> Hi David
>
> On 27 Jun 2018, at 14:20, David Notivol <dnotivol@gmail.com> wrote:
>
> Hi Alfredo,
> Thanks for your recommendations.
>
> I tested using core affinity as you suggested, and the in drops
> disappeared in zbalance. The output drops persist, but the absolute drops
> are less than before.
> Actually I had tested the core affinity, but I didn't have in mind the
> physical cores. Now I put zbalance in one physical core, and 10 nprobe
> instances not sharing the physical core with zbalance.
>
> About your point 2, by using zc drivers, how could I run several nprobe
> instances to share the load? I'm testing with one instance: -i
> zc:p2p1,zc:p2p2
>
>
> You can keep using zbalance_ipc (-i zc:p2p1,zc:p2p2), or you can use RSS
> (running nprobe on -i zc:p2p1@<id>,zc:p2p2@<id>)
>
> Attached you can find:
> - 0.log = top output for the scenario in my previous email.
> - 1.log = scenario in your point 1, including top, zbalance output, and
> nprobe stats.
>
>
>
> I do not see the attachments, did you forget to enclose them?
>
> Alfredo
>
>
> El mié., 27 jun. 2018 a las 12:13, Alfredo Cardigliano (<
> cardigliano@ntop.org>) escribió:
>
>> Hi David
>> it seems that you have packet loss both on zbalance and nprobe,
>> I recommend you to:
>> 1. set the core affinity for both zbalance_ipc and the nprobe instances,
>> trying to
>> use a different core for each (at least do not share the zbalance_ipc
>> physical core
>> with nprobe instances)
>> 2. did you try using zc drivers for capturing traffic from the
>> interfaces? (zc:p2p1,zc:p2p2)
>> Please also provide the top output (press 1 to see all cored) with the
>> current configuration,
>> I guess kernel is using some of the available cpu with this configuration.
>>
>> Alfredo
>>
>> On 26 Jun 2018, at 16:31, David Notivol <dnotivol@gmail.com> wrote:
>>
>> Hi Alfredo,
>> Thanks for replying.
>> This is an excerpt of the zbalance and nprobe statistics:
>>
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:265] =========================
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:266] Absolute Stats: Recv 1'285'430'239 pkts (1'116'181'903 drops) - Forwarded 1'266'272'285 pkts (19'157'949 drops)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:305] p2p1,p2p2 RX 1285430267 pkts Dropped 1116181981 pkts (46.5 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 0 RX 77050882 pkts Dropped 1127883 pkts (1.4 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 1 RX 70722562 pkts Dropped 756409 pkts (1.1 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 2 RX 76092418 pkts Dropped 1017335 pkts (1.3 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 3 RX 75088386 pkts Dropped 896678 pkts (1.2 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 4 RX 91991042 pkts Dropped 2114739 pkts (2.2 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 5 RX 81384450 pkts Dropped 1269385 pkts (1.5 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 6 RX 84310018 pkts Dropped 1801848 pkts (2.1 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 7 RX 84554242 pkts Dropped 1487329 pkts (1.7 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 8 RX 84090370 pkts Dropped 1482864 pkts (1.7 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 9 RX 73642498 pkts Dropped 732237 pkts (1.0 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 10 RX 76481026 pkts Dropped 1000496 pkts (1.3 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 11 RX 72496642 pkts Dropped 929049 pkts (1.3 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 12 RX 79386626 pkts Dropped 1122169 pkts (1.4 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 13 RX 79418370 pkts Dropped 1187172 pkts (1.5 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 14 RX 80284162 pkts Dropped 1195559 pkts (1.5 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:319] Q 15 RX 79143426 pkts Dropped 1036797 pkts (1.3 %)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:338] Actual Stats: Recv 369'127.51 pps (555'069.74 drops) - Forwarded 369'129.51 pps (0.00 drops)
>> 26/Jun/2018 17:29:58 [zbalance_ipc.c:348] =========================
>>
>>
>> # cat /proc/net/pf_ring/stats/*
>> ClusterId: 1
>> TotQueues: 16
>> Applications: 1
>> App0Queues: 16
>> Duration: 0:00:41:18:386
>> Packets: 1191477340
>> Forwarded: 1174033613
>> Processed: 1173893301
>> IFPackets: 1191477364
>> IFDropped: 1036448041
>>
>> Duration: 0:00:41:15:587
>> Bytes: 42626434538
>> Packets: 71510530
>> Dropped: 845465
>>
>> [removed to make the mail smaller]
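[Side note: a simple way to watch the counters above while changing the
configuration is to re-read the same path periodically, for example:]

watch -n 5 'cat /proc/net/pf_ring/stats/*'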

>
>> On Tue, 26 Jun 2018 at 16:25, Alfredo Cardigliano (<
>> cardigliano@ntop.org>) wrote:
>>
>>> Hi David
>>> please also provide statistics from zbalance_ipc (output or log file)
>>> and nprobe (you can get live stats from /proc/net/pf_ring/stats/)
>>>
>>> Thank you
>>> Alfredo
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi,
And now:
- 1.log = scenario in your point 1, including top, zbalance output, and
nprobe stats.


--
Saludos,
David Notivol
dnotivol@gmail.com
Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi David
your template is huge. Can you please omit (just for troubleshooting) "--flow-templ ..." and report whether you see changes in load?

Thanks Luca
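
[In practice this test amounts to removing one line from each per-instance
configuration and restarting nprobe. A minimal sketch, assuming the options
are kept one per line as shown earlier in the thread; with --flow-templ
removed, nProbe should fall back to its built-in default template:]

# troubleshooting run: keep every existing per-instance option,
# but leave out the --flow-templ="..." line entirely
nprobe --interface=zc:1@0 <all other existing per-instance options>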

Re: nProbe performance, zbalance packet drops [ In reply to ]
Hi Luca,

Sorry for the delay replying to the list. Thanks for your advice.

I tested reducing the template to just the minimal fields, and the
performance was way better.

Right now I have left the template with ~50 fields (instead of ~150) and
kept zbalance on a dedicated core (not sharing threads with any nProbe
instance). With this scenario, zbalance reports around 1-2% packet loss.
When I remove the output to Kafka (which runs on a different server) and
export to TCP instead, or don't export at all, the packet loss is 0%.
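
[For reference, a cut-down template of this kind, built only from basic flow
fields, looks roughly like the line below. The field names are standard nProbe
template elements; this is an illustration, not the exact ~50-field list used
here.]

--flow-templ="%IPV4_SRC_ADDR %IPV4_DST_ADDR %L4_SRC_PORT %L4_DST_PORT %PROTOCOL %IN_PKTS %IN_BYTES %OUT_PKTS %OUT_BYTES %FIRST_SWITCHED %LAST_SWITCHED %TCP_FLAGS %L7_PROTO"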

However, I don't see any error in the logs saying the buffers are full or
pointing to a Kafka bottleneck. (I'm pasting stats from one of the
instances below.) I guess I am misreading the stats... should the problem
be reflected in these stats?

By the way, I'm testing this on a server without any license yet. When I
export the output to TCP, it stops working after 25000 flows as expected,
but this doesn't happen when exporting to Kafka; it keeps working
indefinitely.

11/Jul/2018 03:54:35 [nprobe.c:3177] ---------------------------------
11/Jul/2018 03:54:35 [nprobe.c:3181] Average traffic: [73.50 K pps][All Traffic 352.69 Mb/sec][IP Traffic 309.88 Mb/sec][ratio 0.88]
11/Jul/2018 03:54:35 [nprobe.c:3189] Current traffic: [19.48 K pps][82.17 Mb/sec]
11/Jul/2018 03:54:35 [nprobe.c:3196] Current flow export rate: [1100.4 flows/sec]
11/Jul/2018 03:54:35 [nprobe.c:3199] Flow drops: [export queue too long=0][too many flows=0][ELK queue flow drops=0]
11/Jul/2018 03:54:35 [nprobe.c:3204] Export Queue: 0/1024000 [0.0 %]
11/Jul/2018 03:54:35 [nprobe.c:3209] Flow Buckets: [active=88328][allocated=88328][toBeExported=0]
11/Jul/2018 03:54:35 [nprobe.c:3218] Kafka producer #0 [msgs produced: 1937609474][msgs delivered: 1937609474][bytes delivered: -481499953][msgs failed: 0][msgs/s: 2579][MB/s: 1.58][produce failures: 0][queue len: 2]
11/Jul/2018 03:54:35 [nprobe.c:3026] Processed packets: 55205507976 (max bucket search: 4)
11/Jul/2018 03:54:35 [nprobe.c:3009] Fragment queue length: 0
11/Jul/2018 03:54:35 [nprobe.c:3035] Flow export stats: [0 bytes/0 pkts][0 flows/0 pkts sent]
11/Jul/2018 03:54:35 [nprobe.c:3045] Flow drop stats: [4021444583 bytes/127743259 pkts][0 flows]
11/Jul/2018 03:54:35 [nprobe.c:3050] Total flow stats: [4021444583 bytes/127743259 pkts][0 flows/0 pkts sent]
11/Jul/2018 03:54:35 [nprobe.c:3064] Kafka producer #0 [msgs produced: 1937609474][msgs delivered: 1937609474][bytes delivered: -481499953][msgs failed: 0][msgs/s: 2579][MB/s: 1.58][produce failures: 0][queue len: 2][1 msg == 1 flow]
11/Jul/2018 03:55:35 [nprobe.c:3177] ---------------------------------



Regards,
David.


--
Saludos,
David Notivol
dnotivol@gmail.com