Mailing List Archive

zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
I'm testing out a new server (36 cores, 72 with HT) using
zbalance_ipc, and it seems occasionally some packets are
getting sent to multiple processes.

I'm currently running zbalance_ipc like so:

/usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 72,1 -c 99 -g 0 -S 1

with 72 snorts like so:

/usr/sbin/snort -D -i zc:99@$i --daq-dir=/usr/lib64/daq \
--daq-var clusterid=99 --daq-var bindcpu=$i --daq pfring_zc \
-c /etc/snort/ufirt-snort-pf-ewan.conf -l /var/log/snort69 -R $((i + 1))
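
(A wrapper loop along these lines, with the queue index range assumed to run 0 through 71, would generate those 72 command lines.)

for i in $(seq 0 71); do
  /usr/sbin/snort -D -i zc:99@$i --daq-dir=/usr/lib64/daq \
    --daq-var clusterid=99 --daq-var bindcpu=$i --daq pfring_zc \
    -c /etc/snort/ufirt-snort-pf-ewan.conf -l /var/log/snort69 -R $((i + 1))
done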

I've got a custom HTTP rule to catch GETs with a particular
user-agent. I run 100 GETs, and each GET request has the run
number and timestamp in the URL (GET /1/<ts>, GET /2/<ts>, etc.),
and this is what I end up with when I check the GETs:

1 GET /11
1 GET /2
1 GET /30
1 GET /34
1 GET /37
1 GET /5
1 GET /59
1 GET /62
1 GET /70
1 GET /8
1 GET /83
1 GET /84
1 GET /9
1 GET /90
1 GET /94
1 GET /95
16 GET /97
20 GET /12
20 GET /38

Obviously I'm still running into packet loss, but several of the
GETs are getting sent to multiple processes:

ens5f0.33 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.53 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.42 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.44 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.46 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.35 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.67 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.34 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.36 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.62 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.70 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.65 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.57 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.63 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.68 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.38 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.49 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.61 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.32 GET /12/2016-10-13.14:04:49 HTTP/1.1
ens5f0.72 GET /12/2016-10-13.14:04:49 HTTP/1.1

Is this an issue with the zbalance_ipc hash? I tried using

-m 1

but it seemed like I ended up dropping even more packets.

Any advice/pointers appreciated.
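
(For reference, test traffic of that shape can be generated with a small loop along these lines; the target host and user-agent below are placeholders, not the ones actually used.)

for n in $(seq 1 100); do
  ts=$(date +%Y-%m-%d.%H:%M:%S)
  curl -s -A "custom-test-agent/1.0" "http://testhost.example/$n/$ts" > /dev/null
done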

--
Jim Hranicky
Data Security Specialist
UF Information Technology
105 NW 16TH ST Room #104 GAINESVILLE FL 32603-1826
352-273-1341
Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Hi Jim
please note that when distributing to multiple applications (using a comma-separated list in -n),
the fan-out API is used, which supports up to 32 egress queues total. In your case you are using 73 queues,
so I guess only the first 32 instances are receiving traffic (and maybe duplicated traffic due to a wrong
egress mask). I will add a check for this in zbalance_ipc to avoid this kind of misconfiguration.

Alfredo
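
(To stay within that limit while keeping the extra application queue, a layout along these lines, i.e. 31 balanced queues plus 1, would fit; all flags other than -n are unchanged, and with the usual sequential queue numbering the extra application would presumably read from zc:99@31.)

/usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 31,1 -c 99 -g 0 -S 1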

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
How difficult would it be to add a hashing algorithm based
on the 5-tuple that can support more cores? Is that even
feasible?

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Another question. It seems that Suricata can go into ZC mode
without using zbalance_ipc; however, the card I have (82599)
only supports RSS values of up to 16. Would I be able to take
advantage of all the cores I have with Suri in this case
if I moved to a card that supports more RSS entries
(e.g., fm10k: 128)?

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
And one more, sorry. I tried to stop zbalance_ipc in order to move to
32 queues and am getting these errors:

Message from syslogd@host at Oct 14 12:05:23 ...
kernel:BUG: soft lockup - CPU#17 stuck for 22s! [migration/17:237]

Message from syslogd@host at Oct 14 12:05:23 ...
kernel:BUG: soft lockup - CPU#34 stuck for 22s! [zbalance_ipc:6496]

Message from syslogd@host at Oct 14 12:05:26 ...
kernel:BUG: soft lockup - CPU#1 stuck for 23s! [migration/1:157]

Message from syslogd@host at Oct 14 12:05:27 ...
kernel:BUG: soft lockup - CPU#13 stuck for 23s! [migration/13:217]

kill -9 has no effect. Is this a result of using too many queues?

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Hi Jim,

I faced a similar problem with the same NIC some time ago. I found the same
upper bound on a 32-physical-core (64 cores with HT) server. The point is
that this server had two CPUs (16 physical cores, 32 with HT, each), and,
maybe I'm wrong, but I seem to remember that RSS only applies to physical
cores, and only on the first CPU.

How many CPUs does your server have?

Best regards,
Manuel
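
(Socket and core topology can be double-checked with something along these lines.)

lscpu | grep -E 'Socket|Core|Thread|NUMA node'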

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
I have 2 CPUs, 18 cores each; HT gives 36 cores per CPU for a total
of 72 "cpus" as per /proc/cpuinfo.

I only have a 10G feed currently, but it appears the fm10k can take
10G SFPs and supports RSS values up to 128 according to fm10k_type.h:

#define FM10K_MAX_RSS_INDICES 128

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Hi Jim
please note that the hashing algorithm and the distribution function themselves can handle
more than 32 queues; the limit is in the fan-out support (multiple applications), which
uses a 32-bit mask. In essence, if you use -n 72 in place of -n 72,1 you are able
to handle 72 instances. Changing the fan-out API is feasible, but it requires some
internal changes (besides affecting performance, especially if a mask larger than
64 bits is needed).

Alfredo
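
(Concretely, the single-application form of the earlier command would look like the line below, with each Snort instance still attaching to zc:99@0 through zc:99@71.)

/usr/local/pf/bin/zbalance_ipc -i zc:ens5f0 -m 4 -n 72 -c 99 -g 0 -S 1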

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Yes, RSS on the 82599 supports up to 16 queues; if you need more, moving to an fm10k could be an option.

Alfredo
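
(For checking or adjusting the RSS queue count with the in-kernel ixgbe driver, something along these lines applies; with the PF_RING ZC driver the queue count is normally given as a module parameter at load time instead, so the commented line is an assumption to be checked against the driver README.)

# Show the current RX channel (RSS queue) configuration
ethtool -l ens5f0

# With the in-kernel driver: set the combined channel count (82599 tops out at 16)
ethtool -L ens5f0 combined 16

# With the PF_RING ZC driver (assumed syntax), set it at load time instead:
# insmod ixgbe.ko RSS=16,16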

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Uhm, hard to say. Could you also provide the dmesg output?

Alfredo

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Logs attached.

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
This seems to be driver-related. You were using ixgbe in this test, right? Did you perhaps do something like putting the interface down or unloading the driver? I'm trying to figure out what caused this.

Thank you
Alfredo

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Yes, this is the ixgbe driver. All I did was run my version of
/etc/init.d/{suri,snortd} stop (testing both at the moment), and
both hosts locked up. I had to do a hard reset as the reboot hung
as well.

All the script does is kill zbalance_ipc and then snort/suri. Should
I do that in reverse order, so zbalance_ipc is killed last?

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Hi Jim
ideally you should kill snort/suri first and then zbalance_ipc; however, killing zbalance_ipc
first should not lead to issues, and it is something we usually do in our tests.
Is this always reproducible? Could you try changing the order (killing
zbalance_ipc after snort/suri) to check whether it’s related to that?

Thank you
Alfredo
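
(A stop sequence along these lines, consumers first and the balancer last, follows that ordering; the process names and the use of pkill are illustrative.)

#!/bin/sh
# Stop the consumers first so nothing is still attached to the cluster queues
pkill -TERM snort
pkill -TERM suricata
sleep 5
# ...then stop the balancer
pkill -TERM zbalance_ipc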

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Changing the order seems to have solved the problem.

Jim

Re: zbalance_ipc Hash Mode 4 Sending Packets to Multiple Apps
Ok, at least now we know where to look.

Thank you
Alfredo
