Mailing List Archive

[Bug 421] bgpd gets unstable
Please do not reply directly to this email. All additional
comments should be made in the comments box of this bug
report.

http://bugzilla.quagga.net/show_bug.cgi?id=421





------- Additional Comments From arnold@nipper.de 2007-11-26 12:13 -------
Do you really want to see the config?

rs2:~> wc /etc/quagga/bgpd.conf
183376 1479133 11962693 /etc/quagga/bgpd.conf

This machine is running bgpd only, and the problems described happened during
normal operation, i.e. without reconfiguring, adding, or dropping a peer.


Arnold

------- Additional Comments From arnold@nipper.de 2007-12-23 11:56 -------
This is the same machine, which becomes unstable every now and then. I don't see
warnings, but a large subset of peers starts to miss keepalives.

gawk '/Hold Timer Expired/{hte[substr($2,1,4)]++}END{for(i in hte)printf"%s0 %d\n",i,hte[i]}' rs2.log | sort
06:30 1
06:40 1
06:50 1
07:00 1
07:10 1
07:30 1
07:40 1
07:50 1
08:00 1
08:10 1
08:30 1
08:40 1
08:50 1
09:00 1
09:10 1
09:20 1
09:40 2
09:50 1
10:00 2
10:10 64
10:20 362
10:30 287
10:40 314
10:50 287
11:00 274
11:10 281
11:20 275
11:30 278
11:40 278
11:50 272
12:00 273
12:10 264
12:20 236
12:30 16
12:40 1

My assumption is that, due to the large number of peers, bgpd is unable to walk
through all peers to send keepalives (for whatever reason). A restart helps.




Arnold




------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
_______________________________________________
Quagga-bugs mailing list
Quagga-bugs@lists.quagga.net
http://lists.quagga.net/mailman/listinfo/quagga-bugs

------- Additional Comments From paul@dishone.st 2007-12-23 12:17 -------
How many peers, out of curiosity?

Can you get 'show thread cpu' output?

Is the machine swapping or is there some other, external explanation for why
wall-clock and bgpd-cpu-time seem to differ so greatly:

2007/11/17 10:32:36 warnings: BGP: SLOW THREAD: task bgp_read (807ca38) ran for 80782ms (cpu time 1086ms)

What's the memory usage of bgpd at this time? If it's large (as seen by system
tools) please get 'show memory' output.




------- Additional Comments From arnold@nipper.de 2007-12-23 12:27 -------
ipv4: 232
ipv6: 68

rs2# show thread cpu
                      CPU (user+system):   Real (wall-clock):
Runtime(ms)  Invoked  Avg uSec  Max uSecs  Avg uSec  Max uSecs  Type    Thread
      0.000        3         0          0        56        111  R       smux_read
      0.000        2         0          0      1110       1244  TE      smux_connect
      5.999       75        79       1000       130       1473  W       vty_flush
  17593.327       93    189175     231965    193836     370422  T       bgp_scan_timer
    799.881       74     10809     650901     14523     926214  R       vty_read
      3.000      182        16       1000        42        142  T       bgp_import
  39849.976   116099       343    1389789       359    1403998  E       bgp_event
 241275.317   115552      2088    1952704      2155    1952240  R       bgp_read
    391.944      355      1104     237964      1368     340290  R       bgp_accept
     53.996      374       144       1000       190      15820  T       bgp_start_timer
     49.994     1175        42       1000       185        342  T       bgp_connect_timer
  27219.905   217024       125      35994       133     171725  W       bgp_write
      0.000        1         0          0        23         23  E       zlookup_connect
      0.000        2         0          0       146        149  T       bgp_holdtime_timer
      1.000       10       100       1000       149        276  TE      zclient_connect
  20784.855    15779      1317     242963      1413     382253  T       bgp_keepalive_timer
      0.000        2         0          0        90        108  R       vty_accept
    645.890    25215        25       1999        58       1204  T       bgp_routeadv_timer
 278927.573    12779     21827    1315801     22224    1455591  B       work_queue_run
 627602.657   504796      1243    1952704      1282    1952240  RWTEXB  TOTAL
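
To see at a glance which thread types dominate the runtime in output like the above, a small sketch along these lines may help. It assumes each data row has exactly eight whitespace-separated fields with the runtime in milliseconds first and the thread name last, and that a TOTAL row is present. The function name is hypothetical, and the sample rows are a subset copied from the table above.

```python
def runtime_shares(rows):
    """Return (thread, percent of TOTAL runtime) pairs, largest first."""
    runtimes = {}
    for row in rows:
        fields = row.split()
        # Header lines and rows whose thread name wrapped onto the next
        # line do not have exactly eight fields and are skipped.
        if len(fields) != 8:
            continue
        try:
            runtimes[fields[7]] = float(fields[0])
        except ValueError:
            continue
    # Assumption: the output ends with a TOTAL row (KeyError otherwise).
    total = runtimes.pop("TOTAL")
    shares = [(name, 100.0 * ms / total) for name, ms in runtimes.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)


# A few rows copied from the 'show thread cpu' output above.
SAMPLE_ROWS = [
    "17593.327 93 189175 231965 193836 370422 T bgp_scan_timer",
    "39849.976 116099 343 1389789 359 1403998 E bgp_event",
    "241275.317 115552 2088 1952704 2155 1952240 R bgp_read",
    "278927.573 12779 21827 1315801 22224 1455591 B work_queue_run",
    "627602.657 504796 1243 1952704 1282 1952240 RWTEXB TOTAL",
]

for name, percent in runtime_shares(SAMPLE_ROWS):
    print(f"{name:20s} {percent:5.1f}%")
```

With these rows, work_queue_run and bgp_read together account for most of the accumulated CPU runtime.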


rs2# show memory
System allocator statistics:
Total heap allocated: 512 MiB
Holding block headers: 132 KiB
Used small blocks: 0 bytes
Used ordinary blocks: 479 MiB
Free small blocks: 16 bytes
Free ordinary blocks: 33 MiB
Ordinary blocks: 2135514
Small blocks: 1
Holding blocks: 1
(see system documentation for 'mallinfo' for meaning)
-----------------------------
Temporary memory : 14
String vector : 23987
Vector : 11472
Vector index : 11472
Link List : 16
Link Node : 326
Thread : 1235
Thread master : 1
Thread stats : 19
Thread function name : 1254
VTY : 3
VTY history : 1
Buffer : 3
Buffer data : 1
Stream : 631
Stream data : 631
Stream FIFO : 312
Hash : 2503
Hash Bucket : 59910
Hash Index : 2503
Access List : 1
Access List Str : 1
Access Filter : 2
Prefix List : 307
Prefix List Entry : 185122
Prefix List Str : 307
Route map : 198
Route map name : 198
Route map index : 408
Route map rule : 428
Route map rule str : 428
Route map compiled : 821
Command desc : 12001
Socket union : 494
Privilege information : 2
Logging : 1
Zclient : 2
Work queue : 9
Work queue name string : 9
Host config : 5
-----------------------------
BGP instance : 1
BGP peer : 312
BGP peer hostname : 310
Peer group : 1
Peer description : 299
BGP attribute : 32911
BGP aspath : 23344
BGP aspath seg : 23346
BGP aspath segment data : 23346
BGP aspath str : 23344
-----------------------------
BGP table : 31
BGP node : 146530
BGP route : 179125
BGP synchronise : 2496
BGP adj in : 18
BGP adj out : 5323475
-----------------------------
BGP AS list : 258
BGP AS filter : 10522
BGP AS filter str : 10522
-----------------------------
community : 4430
community val : 4430
community str : 4418
-----------------------------
extcommunity : 8
extcommunity val : 8
extcommunity str : 8
-----------------------------
community-list : 203
community-list name : 203
community-list entry : 813
community-list config : 21
community-list handler : 1
-----------------------------
BGP transit attr : 2
BGP transit val : 2
-----------------------------
BGP regexp : 10543


top - 13:25:52 up 292 days, 16:01, 2 users, load average: 0.15, 0.21, 0.24
Tasks: 57 total, 2 running, 55 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3% us, 0.3% sy, 0.0% ni, 98.3% id, 0.0% wa, 0.0% hi, 1.0% si
Mem: 1026116k total, 939948k used, 86168k free, 166276k buffers
Swap: 1164704k total, 388k used, 1164316k free, 136192k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32628 quagga 15 0 517m 514m 1996 S 0.3 51.3 10:56.10 bgpd


> Is the machine swapping or is there some other, external explanation for why
> wall-clock and bgpd-cpu-time seem to differ so greatly:

Please help, as I don't understand that question :-(



Thank you for your swift response,



Arnold



------- Additional Comments From paul@dishone.st 2007-12-23 12:54 -------
Ok. That instance of bgpd hasn't had this problem yet, as the longest wall-clock
time (i.e. real, elapsed time, as shown by the clock on your wall) consumed by any
thread is 1952240µs, or just under 2s.

The problem in your first comment seems to be due to:

2007/11/17 10:32:36 warnings: BGP: SLOW THREAD: task bgp_read (807ca38) ran for 80782ms (cpu time 1086ms)

Here bgp_read took 1s of CPU, but *80*s of real, elapsed time. I.e. bgpd only
got to run for 1s over 80s of elapsed time. This implies the system was very
busy doing something other than running bgpd. Typically that means it was
swapping.
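
The arithmetic behind that reading can be sketched as follows. The regular expression is only an assumption based on the one SLOW THREAD line quoted above, and the function name is hypothetical; other bgpd versions may format the message differently.

```python
import re

# Pattern inferred from the single log line quoted above (an assumption).
SLOW_THREAD_RE = re.compile(
    r"SLOW THREAD:\s+task\s+(?P<task>\S+)\s+\((?P<addr>[0-9a-fA-F]+)\)\s+"
    r"ran\s+for\s+(?P<wall_ms>\d+)ms\s+\(cpu\s+time\s+(?P<cpu_ms>\d+)ms\)"
)


def slow_thread_stats(log_line):
    """Return task name, wall/CPU ratio and off-CPU share, or None."""
    match = SLOW_THREAD_RE.search(log_line)
    if match is None:
        return None
    wall_ms = int(match.group("wall_ms"))
    cpu_ms = int(match.group("cpu_ms"))
    if wall_ms == 0 or cpu_ms == 0:
        return None  # avoid dividing by zero on degenerate input
    return {
        "task": match.group("task"),
        "wall_to_cpu": wall_ms / cpu_ms,
        # Share of the elapsed time in which the thread was not on the CPU.
        "off_cpu_percent": 100.0 * (wall_ms - cpu_ms) / wall_ms,
    }


LINE = (
    "2007/11/17 10:32:36 warnings: BGP: SLOW THREAD: task bgp_read "
    "(807ca38) ran for 80782ms (cpu time 1086ms)"
)
print(slow_thread_stats(LINE))
```

For the quoted line the wall-clock time is roughly 74 times the CPU time, so bgpd was off the CPU for nearly 99% of those 80 seconds.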

This 'fresh' instance of bgpd is consuming about 512MB of RAM. The machine has
1GB of RAM.

Is the machine swapping when the problem occurs, and what size is bgpd when it
happens? If it has grown significantly larger than 512MB, please get the output of
'show memory'. If it's not bgpd causing the swapping, are there other processes?
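
One way to check for swapping while the problem is happening, assuming the machine runs Linux (the 'top' output suggests it does), is to compare the pswpin and pswpout counters from /proc/vmstat at two points in time. The sketch below works on the file's text so that it can be tried anywhere; the function names and the sample counter values are invented for illustration.

```python
def swap_counters(vmstat_text):
    """Extract the pswpin/pswpout page counters from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before_text, after_text):
    """Pages swapped in and out between two /proc/vmstat snapshots."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


# Invented snapshots; on a real system, read /proc/vmstat twice,
# a few seconds apart, while the problem is occurring.
BEFORE = "pgpgin 1000\npswpin 30\npswpout 500\n"
AFTER = "pgpgin 1800\npswpin 150\npswpout 5000\n"

print(swap_delta(BEFORE, AFTER))
```

Deltas that stay at zero while peers are timing out would point away from swapping as the cause; steadily growing values would support it.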

I suspect bgpd is leaking memory. I'll have to check whether any leaks were
fixed since 0.99.7.




------- Additional Comments From arnold@nipper.de 2007-12-23 13:07 -------
IIRC bgpd's memory usage was on the order of 570 MB. On rs1 we still have 0.98.3,
which has a memory leak.

Besides bgpd there is only a bigbrother client, ntp and exim. None of them
consumes significant CPU time or memory.




