Mailing List Archive

CPU xeon all Cores 100% ospf
>
> Friends,
>
> This is my first post here.
>
> We are moving away from MikroTik as a PPPoE server and are testing Debian 8 64-bit with the configuration described below.
>
> We are running Quagga with IP summarization, accel-ppp, and IPv6.
>
> We have 3,000 PPPoE sessions. The problem is that the CPU normally sits between 6% and 9%, then for a few seconds it suddenly jumps to 90-100%, accel-ppp drops all the users, and then everything returns to normal.
>
> What can I do to improve this? The spikes happen at long intervals; sometimes three hours pass before the next one.
>
> Server Configuration:
>
>
> Xeon v3 3.4 GHz quad-core, 8 MB cache
> 32 GB DDR3 1600 MHz
> 2x 3 TB SAS 6 Gb/s drives in RAID 1 via an off-board Intel controller with 1 GB of cache.
> 8 Intel gigabit ports, but we are only using 4: a 2 Gb/s LACP bond for the WAN and another 2 Gb/s bond for the LAN.
>
> Traffic averages 1.1 Gb/s and reaches about 1.5 Gb/s at peak times. Traffic is not the problem, because today the spike happened with only 500 Mb/s of traffic.
>
>
>
> Fernando Galvão
> Wantel Telecom
> +55 87 3866-5200
>
>
>
Re: CPU xeon all Cores 100% ospf [ In reply to ]
Which process shows the high CPU usage? If it is not in any of the Quagga processes, then it isn't a Quagga issue.

So it is probably not Quagga. But you need to separate cause and effect. Is the jump to 100% CPU the cause of the dropped PPPoE sessions, or is the 100% CPU the result of all of the sessions being torn down?

And if it is not a Quagga issue, you should take this to the accel-ppp mailing list.
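
As a rough sketch of one way to separate them (the log path and interval are arbitrary), snapshot the heaviest processes continuously and check, after the next event, whether zebra, ospfd or accel-pppd climbed first:

while true; do
    date
    top -b -n 1 | head -n 15    # one batch-mode top iteration, sorted by %CPU by default
    sleep 1
done >> /var/log/cpu-watch.log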


> On Jan 29, 2017, at 4:37 AM, Fernando Galvão <fernando@wantel.com.br> wrote:
>
>> We have 3,000 PPPoE sessions. The problem is that the CPU normally sits between 6% and 9%, then for a few seconds it suddenly jumps to 90-100%, accel-ppp drops all the users, and then everything returns to normal.
Re: CPU xeon all Cores 100% ospf [ In reply to ]
Attached are screenshots of the moment the CPU spikes and of the normal state, where it stays below 10% most of the time. The process is zebra.


Sent from my iPhone

> On Jan 29, 2017, at 19:39, Tom Samplonius <tom@samplonius.org> wrote:
>
> Which process shows the high CPU usage? If it is not in any of the Quagga processes, then it isn't a Quagga issue.
>
> So it is probably not Quagga. But you need to separate cause and effect. Is the jump to 100% CPU the cause of the dropped PPPoE sessions, or is the 100% CPU the result of all of the sessions being torn down?
>
> And if it is not a Quagga issue, you should take this to the accel-ppp mailing list.
Re: CPU xeon all Cores 100% ospf [ In reply to ]
zebra 20%

Router# show thread cpu
CPU (user+system): Real (wall-clock):
Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0.000 3 0 0 59 77 R vty_accept
1494296.000 48529220 30 200000 30 386589 B work_queue_run
332.000 37437 8 36000 7 3979 R zebra_client_read
5484.000 26577 206 28000 202 22775 R vtysh_read
0.000 20 0 0 64 325 W zserv_flush_data
0.000 114 0 0 21 179 R vty_read
0.000 115 0 0 14 213 W vty_flush
68.000 5318 12 12000 10 372 R vtysh_accept
78180.000 870403 89 72000 90 76227 R kernel_read
4.000 2 2000 4000 626 1244 R zebra_accept
1578364.000 49469209 31 200000 32 386589 RWTEXB TOTAL
Fernando Galvão
Wantel Telecom
+55 87 3866-5200



> On Jan 30, 2017, at 08:54, Fernando Galvão <fernando@wantel.com.br> wrote:
>
>
> Fernando Galvão
> Wantel Telecom
> +55 87 3866-5200
>
>
>
>> Begin forwarded message:
>>
>> From: Paul Jakma <paul@jakma.org>
>> Subject: Re: CPU xeon all Cores 100% ospf
>> Date: January 30, 2017, 04:23:29 BRT
>> To: Fernando Galvão <fernando@wantel.com.br>
>> Cc: quagga-users-owner@lists.quagga.net
>>
>> On Sun, 29 Jan 2017, Fernando Galvão wrote:
>>
>>> What information do you need to help me?
>>
>> If you can check that you can access the telnet interface (I recommend the -A 127.1 argument to restrict access to localhost), then, once this event has happened, go into the telnet interface and get the output from:
>>
>> "show thread cpu"
>>
>> That will give us a rough idea what part of ospfd is blocking for long enough to cause adjacencies to drop (assuming that's the problem).
>>
>> regards,
>> --
>> Paul Jakma | paul@jakma.org | @pjakma | Key ID: 0xD86BF79464A2FF6A
>> Fortune:
>> Anyone can do any amount of work provided it isn't the work he is supposed
>> to be doing at the moment.
>> -- Robert Benchley
>
Re: CPU xeon all Cores 100% ospf [ In reply to ]
zebra 90%

Router# show thread cpu
CPU (user+system): Real (wall-clock):
Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0.000 4 0 0 57 77 R vty_accept
1638604.000 53380874 30 200000 30 386589 B work_queue_run
360.000 40561 8 36000 7 3979 R zebra_client_read
5992.000 28970 206 28000 201 22775 R vtysh_read
0.000 20 0 0 64 325 W zserv_flush_data
4.000 162 24 4000 91 1927 R vty_read
0.000 164 0 0 16 213 W vty_flush
0.000 1 0 0 47 47 T vty_timeout
72.000 5796 12 12000 10 372 R vtysh_accept
85764.000 934452 91 72000 92 76227 R kernel_read
4.000 2 2000 4000 626 1244 R zebra_accept
1730800.000 54391006 31 200000 31 386589 RWTEXB TOTAL








zebra ~ 80%

Router# show thread cpu
CPU (user+system): Real (wall-clock):
Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0.000 4 0 0 57 77 R vty_accept
1639268.000 53403255 30 200000 30 386589 B work_queue_run
360.000 40561 8 36000 7 3979 R zebra_client_read
5992.000 28971 206 28000 201 22775 R vtysh_read
0.000 20 0 0 64 325 W zserv_flush_data
4.000 164 24 4000 97 1927 R vty_read
0.000 166 0 0 16 213 W vty_flush
0.000 1 0 0 47 47 T vty_timeout
72.000 5796 12 12000 10 372 R vtysh_accept
85804.000 934531 91 72000 92 76227 R kernel_read
4.000 2 2000 4000 626 1244 R zebra_accept
1731504.000 54413471 31 200000 31 386589 RWTEXB TOTAL
Fernando Galvão
Wantel Telecom
+55 87 3866-5200



> On Jan 30, 2017, at 10:16, Fernando Galvão <fernando@wantel.com.br> wrote:
>
> zebra 20%
>
> Router# show thread cpu
> CPU (user+system): Real (wall-clock):
> Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
> 0.000 3 0 0 59 77 R vty_accept
> 1494296.000 48529220 30 200000 30 386589 B work_queue_run
> 332.000 37437 8 36000 7 3979 R zebra_client_read
> 5484.000 26577 206 28000 202 22775 R vtysh_read
> 0.000 20 0 0 64 325 W zserv_flush_data
> 0.000 114 0 0 21 179 R vty_read
> 0.000 115 0 0 14 213 W vty_flush
> 68.000 5318 12 12000 10 372 R vtysh_accept
> 78180.000 870403 89 72000 90 76227 R kernel_read
> 4.000 2 2000 4000 626 1244 R zebra_accept
> 1578364.000 49469209 31 200000 32 386589 RWTEXB TOTAL
> Fernando Galvão
> Wantel Telecom
> +55 87 3866-5200
>
>
>
>> On Jan 30, 2017, at 08:54, Fernando Galvão <fernando@wantel.com.br> wrote:
>>
>>
>> Fernando Galvão
>> Wantel Telecom
>> +55 87 3866-5200
>>
>>
>>
>>> Begin forwarded message:
>>>
>>> From: Paul Jakma <paul@jakma.org>
>>> Subject: Re: CPU xeon all Cores 100% ospf
>>> Date: January 30, 2017, 04:23:29 BRT
>>> To: Fernando Galvão <fernando@wantel.com.br>
>>> Cc: quagga-users-owner@lists.quagga.net
>>>
>>> On Sun, 29 Jan 2017, Fernando Galvão wrote:
>>>
>>>> What information do you need to help me?
>>>
>>> If you can check that you can access the telnet interface (I recommend the -A 127.1 argument to restrict access to localhost), then, once this event has happened, go into the telnet interface and get the output from:
>>>
>>> "show thread cpu"
>>>
>>> That will give us a rough idea what part of ospfd is blocking for long enough to cause adjacencies to drop (assuming that's the problem).
>>>
>>> regards,
>>> --
>>> Paul Jakma | paul@jakma.org | @pjakma | Key ID: 0xD86BF79464A2FF6A
>>> Fortune:
>>> Anyone can do any amount of work provided it isn't the work he is supposed
>>> to be doing at the moment.
>>> -- Robert Benchley
>>
>
> _______________________________________________
> Quagga-users mailing list
> Quagga-users@lists.quagga.net
> https://lists.quagga.net/mailman/listinfo/quagga-users
Re: CPU xeon all Cores 100% ospf [ In reply to ]
Hi Fernando,

Thanks for that. Could you get the output of the same command from
ospfd, after the event you describe has happened? The telnet for ospfd
is on port 2604.
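
From the box itself that would look something like this (a sketch; it assumes the default vty ports and the vty password set in your ospfd.conf):

$ telnet 127.0.0.1 2604      # ospfd's vty; zebra's is on 2601
Password:                    # the vty password from ospfd.conf
Router> enable
Router# show thread cpu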

regards,

Paul

On Mon, 30 Jan 2017, Fernando Galvão wrote:

> zebra 20%
>
> Router# show thread cpu
> CPU (user+system): Real (wall-clock):
> Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
> 0.000 3 0 0 59 77 R vty_accept
> 1494296.000 48529220 30 200000 30 386589 B work_queue_run
> 332.000 37437 8 36000 7 3979 R zebra_client_read
> 5484.000 26577 206 28000 202 22775 R vtysh_read
> 0.000 20 0 0 64 325 W zserv_flush_data
> 0.000 114 0 0 21 179 R vty_read
> 0.000 115 0 0 14 213 W vty_flush
> 68.000 5318 12 12000 10 372 R vtysh_accept
> 78180.000 870403 89 72000 90 76227 R kernel_read
> 4.000 2 2000 4000 626 1244 R zebra_accept
> 1578364.000 49469209 31 200000 32 386589 RWTEXB TOTAL
> Fernando Galvão
> Wantel Telecom
> +55 87 3866-5200
>
>
>
>> On Jan 30, 2017, at 08:54, Fernando Galvão <fernando@wantel.com.br> wrote:
>>
>>
>> Fernando Galvão
>> Wantel Telecom
>> +55 87 3866-5200
>>
>>
>>
>>> Begin forwarded message:
>>>
>>> From: Paul Jakma <paul@jakma.org>
>>> Subject: Re: CPU xeon all Cores 100% ospf
>>> Date: January 30, 2017, 04:23:29 BRT
>>> To: Fernando Galvão <fernando@wantel.com.br>
>>> Cc: quagga-users-owner@lists.quagga.net
>>>
>>> On Sun, 29 Jan 2017, Fernando Galvão wrote:
>>>
>>>> What information do you need to help me?
>>>
>>> If you can check that you can access the telnet interface (I recommend the -A 127.1 argument to restrict access to localhost), then, once this event has happened, go into the telnet interface and get the output from:
>>>
>>> "show thread cpu"
>>>
>>> That will give us a rough idea what part of ospfd is blocking for long enough to cause adjacencies to drop (assuming that's the problem).
>>>
>>> regards,
>>> --
>>> Paul Jakma | paul@jakma.org | @pjakma | Key ID: 0xD86BF79464A2FF6A
>>> Fortune:
>>> Anyone can do any amount of work provided it isn't the work he is supposed
>>> to be doing at the moment.
>>> -- Robert Benchley
>>
>
>

--
Paul Jakma | paul@jakma.org | @pjakma | Key ID: 0xD86BF79464A2FF6A
Fortune:
Ask not what's inside your head, but what your head's inside of.
-- J.J. Gibson
Re: CPU xeon all Cores 100% ospf [ In reply to ]
Dear Tom,



I am now on another, more robust server for PPPoE. It has 2x Xeon 2.4 GHz quad-core processors with 12 MB of cache each. Even so, I still have issues with the zebra and ospfd processes. They push the CPU to 100% and my accel-ppp PPPoE sessions drop. What can I do? I can see that I have unused CPU cores. Doesn't Quagga use all the cores (16 in total) and balance the load between them? Because that is not happening. Today I had to restart the server and cut back to 2,000 PPPoE sessions to keep Quagga at 5% CPU usage.
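
A quick way to check that (a sketch; it assumes procps and that the daemon is simply named zebra) is to list the daemon's threads and the core each one last ran on:

# PSR is the CPU the thread last ran on; a single LWP line means the
# daemon is single-threaded and cannot spread across cores on its own
ps -L -p $(pidof zebra) -o pid,lwp,psr,pcpu,comm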



Fernando Galvão
Wantel Telecom
+55 87 3866-5200



> On Jan 29, 2017, at 20:10, Fernando Galvão <fernando@wantel.com.br> wrote:
>


> Attached are screenshots of the moment the CPU spikes and of the normal state, where it stays below 10% most of the time. The process is zebra.
> <image1.jpeg><image2.jpeg>
>
> Sent from my iPhone
>
> On Jan 29, 2017, at 19:39, Tom Samplonius <tom@samplonius.org> wrote:
>
>> Which process shows the high CPU usage? If it is not in any of the Quagga processes, then it isn't a Quagga issue.
>>
>> So it is probably not Quagga. But you need to separate cause and effect. Is the jump to 100% CPU the cause of the dropped PPPoE sessions, or is the 100% CPU the result of all of the sessions being torn down?
>>
>> And if it is not a Quagga issue, you should take this to the accel-ppp mailing list.
> _______________________________________________
> Quagga-users mailing list
> Quagga-users@lists.quagga.net
> https://lists.quagga.net/mailman/listinfo/quagga-users