Mailing List Archive

Re: Problem with revision of Route Server patch
Paul,

From looking at the history of Zebra (before Quagga), I saw several
references to frequent alignment problems on Solaris. Looking at the patch
code, though, I see that the main system programming for BGPd and Zebra
should not have this issue.

When BGPD dies, it does not write an error to bgpd.log, nor does it dump
core. The only real "error" I get is "Bus Error" when it finally dies, and
only if I run it without the -d option. Lacking a core file, I ran
truss on the daemon while it was running. I am trying to get it to do
something via GDB to get some form of core dump for more tracking.
Thoughts?

From the last entry of the truss output (when BGPD crashes), I gathered
that it was an alignment issue. See below:

12192: poll(0xFFBFD758, 11, 7982) (sleeping...)
12192: fd=5 ev=POLLRDNORM rev=0
12192: fd=8 ev=POLLRDNORM rev=0
12192: fd=9 ev=POLLRDNORM rev=0
12192: fd=10 ev=POLLRDNORM rev=0
12192: fd=12 ev=POLLOUT|POLLRDNORM rev=0
12192: fd=13 ev=POLLRDNORM rev=0
12192: fd=14 ev=POLLRDNORM rev=POLLRDNORM
12192: fd=15 ev=POLLRDNORM rev=0
12192: fd=16 ev=POLLOUT|POLLRDNORM rev=0
12192: fd=17 ev=POLLRDNORM rev=0
12192: fd=18 ev=POLLOUT|POLLRDNORM rev=0
12192: 1935.5612 poll(0xFFBFD758, 11, 7982) = 1
12192: fd=5 ev=POLLRDNORM rev=0
12192: fd=8 ev=POLLRDNORM rev=0
12192: fd=9 ev=POLLRDNORM rev=0
12192: fd=10 ev=POLLRDNORM rev=0
12192: fd=12 ev=POLLOUT|POLLRDNORM rev=0
12192: fd=13 ev=POLLRDNORM rev=POLLRDNORM
12192: fd=14 ev=POLLRDNORM rev=0
12192: fd=15 ev=POLLRDNORM rev=0
12192: fd=16 ev=POLLOUT|POLLRDNORM rev=0
12192: fd=17 ev=POLLRDNORM rev=0
12192: fd=18 ev=POLLOUT|POLLRDNORM rev=0
12192: 1935.5621 getpid() =
12192 [1]
12192: 1935.5623 open("/proc/12192/usage", O_RDONLY) = 19
12192: 1935.5626 read(19, "\0\0\0\0\0\0\001\0\0 7E6".., 256) =
256
12192: 1935.5628 close(19) = 0
12192: 1935.5630 fcntl(13, F_GETFL, 0x00000000) = 2
12192: 1935.5631 fstat64(13, 0xFFBFF648) = 0
12192: d=0x03C00000 i=50408 m=0140666 l=0 u=0 g=0 sz=0
12192: at = Mar 3 23:44:02 GMT 2004 [ 1078357442 ]
12192: mt = Mar 3 23:43:58 GMT 2004 [ 1078357438 ]
12192: ct = Mar 3 22:33:26 GMT 2004 [ 1078353206 ]
12192: bsz=8192 blks=0 fs=ufs
12192: 1935.5635 getsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748,
0xFFBFF740, 16711680) = 0
12192: 1935.5637 fstat64(13, 0xFFBFF648) = 0
12192: d=0x03C00000 i=50408 m=0140666 l=0 u=0 g=0 sz=0
12192: at = Mar 3 23:44:02 GMT 2004 [ 1078357442 ]
12192: mt = Mar 3 23:43:58 GMT 2004 [ 1078357438 ]
12192: ct = Mar 3 22:33:26 GMT 2004 [ 1078353206 ]
12192: bsz=8192 blks=0 fs=ufs
12192: 1935.5641 getsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748,
0xFFBFF744, 16711680) = 0
12192: 1935.5643 setsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748, 4,
16711680) = 0
12192: 1935.5645 fcntl(13, F_SETFL, 0x00000082) = 0
12192: 1935.5647 read(13, "FFFFFFFFFFFFFFFFFFFFFFFF".., 19) = 19
12192: 1935.5649 fstat64(13, 0xFFBFF648) = 0
12192: d=0x03C00000 i=50408 m=0140666 l=0 u=0 g=0 sz=0
12192: at = Mar 3 23:44:06 GMT 2004 [ 1078357446 ]
12192: mt = Mar 3 23:43:58 GMT 2004 [ 1078357438 ]
12192: ct = Mar 3 22:33:26 GMT 2004 [ 1078353206 ]
12192: bsz=8192 blks=0 fs=ufs
12192: 1935.5653 getsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748,
0xFFBFF744, 0) = 0
12192: 1935.5655 setsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748, 4, 0)
= 0
12192: 1935.5656 fcntl(13, F_SETFL, 0x00000002) = 0
12192: 1935.5658 fcntl(13, F_GETFL, 0x00000000) = 2
12192: 1935.5659 fstat64(13, 0xFFBFF648) = 0
12192: d=0x03C00000 i=50408 m=0140666 l=0 u=0 g=0 sz=0
12192: at = Mar 3 23:44:06 GMT 2004 [ 1078357446 ]
12192: mt = Mar 3 23:43:58 GMT 2004 [ 1078357438 ]
12192: ct = Mar 3 22:33:26 GMT 2004 [ 1078353206 ]
12192: bsz=8192 blks=0 fs=ufs
12192: 1935.5663 getsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748,
0xFFBFF740, 0) = 0
12192: 1935.5665 fstat64(13, 0xFFBFF648) = 0
12192: d=0x03C00000 i=50408 m=0140666 l=0 u=0 g=0 sz=0
12192: at = Mar 3 23:44:06 GMT 2004 [ 1078357446 ]
12192: mt = Mar 3 23:43:58 GMT 2004 [ 1078357438 ]
12192: ct = Mar 3 22:33:26 GMT 2004 [ 1078353206 ]
12192: bsz=8192 blks=0 fs=ufs
12192: 1935.5668 getsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748,
0xFFBFF744, 0) = 0
12192: 1935.5670 setsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748, 4, 0)
= 0
12192: 1935.5671 fcntl(13, F_SETFL, 0x00000082) = 0
12192: 1935.5673 read(13, "\0031086F2\0 2 @0101\0 @".., 61) = 61
12192: 1935.5675 fstat64(13, 0xFFBFF648) = 0
12192: d=0x03C00000 i=50408 m=0140666 l=0 u=0 g=0 sz=0
12192: at = Mar 3 23:44:06 GMT 2004 [ 1078357446 ]
12192: mt = Mar 3 23:43:58 GMT 2004 [ 1078357438 ]
12192: ct = Mar 3 22:33:26 GMT 2004 [ 1078353206 ]
12192: bsz=8192 blks=0 fs=ufs
12192: 1935.5680 getsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748,
0xFFBFF744, 0) = 0
12192: 1935.5682 setsockopt(13, SOL_SOCKET, 0x2000, 0xFFBFF748, 4, 0)
= 0
12192: 1935.5684 fcntl(13, F_SETFL, 0x00000002) = 0
12192: 1935.5685 time() =
1078357446
12192: 1935.5687 Incurred fault #5, FLTACCESS %pc = 0x0003E5D8
12192: siginfo: SIGBUS BUS_ADRALN addr=0x00200C41
12192: 1935.5690 Received signal #10, SIGBUS [default]
12192: siginfo: SIGBUS BUS_ADRALN addr=0x00200C41
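
For readers following along: BUS_ADRALN means the faulting load or store used an address that is not a multiple of the access size, and SPARC requires natural alignment for multi-byte accesses. A minimal sketch of that check, using a hypothetical helper `is_aligned` that is not part of Quagga:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Quagga code): returns nonzero when addr is a
 * multiple of align. align must be a power of two, e.g. 2 for a 16-bit
 * access or 4 for a 32-bit access. */
static int is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (uintptr_t)(align - 1)) == 0;
}
```

The address in the siginfo line, 0x00200C41, is odd, so `is_aligned(0x00200C41, 2)` is false: any 2-byte or wider access at that address would be misaligned.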



-----Original Message-----
From: Paul Jakma [mailto:Paul.Jakma@Sun.COM]
Sent: Thu 3/4/2004 8:18 PM
To: Gibbs, Michael
Cc: quagga-dev@lists.quagga.net
Subject: Re: [quagga-dev 952] Re: Problem with revision of Route Server
patch



On Thu, 4 Mar 2004, Gibbs, Michael wrote:

> This looks like an address alignment issue,

What makes you think that?

> in which I see several references to a problem based on ipv6 code, but
> no patches suggesting the ipv4 code in quagga or Zebra has this issue
> on Solaris. Is this a known bug?

Can you be more specific? What references?

Also, you are running on Solaris, right?

> Mike Gibbs

--paulj
Re: Problem with revision of Route Server patch
--On Thursday, March 4, 2004 3:33 pm -0500 "Gibbs, Michael"
<MGibbs@thexchange.com> wrote:

> I am trying to get it to do
> something via GDB to get some form of core dump for more tracking.
> Thoughts?

Run it under gdb with logging turned on. When it crashes, get a backtrace
('bt') and 'info local'. Providing that, the logs, and the config would be a
good start.

Rick
Re: Problem with revision of Route Server patch
Rick,

Here is the initial output from gdb when it crashed:


Starting program: /usr/local/quagga/sbin/bgpd -n
Program received signal SIGSEGV, Segmentation fault.
aspath_key_make (aspath=0x200c41) at bgp_aspath.c:1127
1127 key += *pnt++;
(gdb)
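
The faulting line reads through pnt, and the address passed in (0x200c41) is odd. If pnt is a 16-bit pointer (an assumption from the trace alone), that read would be misaligned on SPARC. The sketch below shows one common way to sum 16-bit words from a byte buffer without a misaligned load. `key_sum16` is a hypothetical helper, not Quagga code, and it does not explain why the pointer is bad in the first place.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Quagga code): sum a byte buffer as host-order
 * 16-bit words. Each word is copied into an aligned local with memcpy, so
 * the buffer itself may start at any address. A trailing odd byte is
 * ignored. */
static unsigned int key_sum16(const unsigned char *data, size_t len)
{
    unsigned int key = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        uint16_t word;
        memcpy(&word, data + i, sizeof word);
        key += word;
    }
    return key;
}
```

Copying through memcpy trades a little speed for safety on strict-alignment machines; compilers typically turn it into a plain load where the hardware allows unaligned access.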

I will run the other command in the next test and get back to the group.
And yes, we are running Solaris 9.

Re: Problem with revision of Route Server patch
Ok,

For those still awake, here is more of a trace of the problem via GDB.

(gdb) run
Starting program: /usr/local/quagga/sbin/bgpd -n
Program received signal SIGSEGV, Segmentation fault.
aspath_key_make (aspath=0x200c43) at bgp_aspath.c:1127
1127 key += *pnt++;
(gdb) bt
#0 aspath_key_make (aspath=0x200c43) at bgp_aspath.c:1127
#1 0x0007cae0 in hash_get (hash=0x146dd8, data=0xffbfede8,
alloc_func=0x3d624 <aspath_hash_alloc>) at hash.c:73
#2 0x0003d6d8 in aspath_parse (
pnt=0x200c43 "\002\005\rÝ\002½\016Ê\016ÊGw@\003\004?ûÙï\200\004\004",
length=12) at bgp_aspath.c:352
#3 0x0004011c in bgp_attr_aspath (peer=0x200c43, length=0, attr=0x8cc00,
flag=64 '@', startp=0x0) at bgp_attr.c:666
(gdb)
(gdb) info local
key = 0
length = 6
pnt = (short unsigned int *) 0x200c43
(gdb)
(gdb) p aspath->length
$1 = 45944522
(gdb)
(gdb) p aspath->data
$2 = 0x4003043f <Address 0x4003043f out of bounds>
(gdb)
#1 0x0007cae0 in hash_get (hash=0x146dd8, data=0xffbfede8,
alloc_func=0x3d624 <aspath_hash_alloc>) at hash.c:73
73 key = (*hash->hash_key) (data);
(gdb)
(gdb) up
#1 0x0007cae0 in hash_get (hash=0x146dd8, data=0xffbfede8,
alloc_func=0x3d624 <aspath_hash_alloc>) at hash.c:73
73 key = (*hash->hash_key) (data);
(gdb) p/x*attr
No symbol "attr" in current context.
(gdb) up
#2 0x0003d6d8 in aspath_parse (
pnt=0x200c43 "\002\005\rÝ\002½\016Ê\016ÊGw@\003\004?ûÙï\200\004\004",
length=12) at bgp_aspath.c:352
352 find = hash_get (ashash, &as, aspath_hash_alloc);
(gdb) p/x*attr
No symbol "attr" in current context.
(gdb) up
#3 0x0004011c in bgp_attr_aspath (peer=0x200c43, length=0, attr=0x8cc00,
flag=64 '@', startp=0x0) at bgp_attr.c:666
666 attr->aspath = aspath_parse (stream_pnt (peer->ibuf), length);
(gdb) p/x*attr
$3 = {refcnt = 0x455f6d65, flag = 0x73736167, origin = 0x65, nexthop = {
S_un = {S_un_b = {s_b1 = 0x0, s_b2 = 0x0, s_b3 = 0x0, s_b4 = 0x0},
S_un_w = {s_w1 = 0x0, s_w2 = 0x0}, S_addr = 0x0}}, med = 0x52656365,
local_pref = 0x6976655f, aggregator_as = 0x4f50, aggregator_addr = {S_un =
{
S_un_b = {s_b1 = 0x5f, s_b2 = 0x6d, s_b3 = 0x65, s_b4 = 0x73}, S_un_w
= {
s_w1 = 0x5f6d, s_w2 = 0x6573}, S_addr = 0x5f6d6573}},
weight = 0x73616765, originator_id = {S_un = {S_un_b = {s_b1 = 0x0,
s_b2 = 0x0, s_b3 = 0x0, s_b4 = 0x0}, S_un_w = {s_w1 = 0x0,
s_w2 = 0x0}, S_addr = 0x0}}, cluster = 0x4b656570,
mp_nexthop_len = 0x41, mp_nexthop_global_in = {S_un = {S_un_b = {
s_b1 = 0x65, s_b2 = 0x5f, s_b3 = 0x74, s_b4 = 0x69}, S_un_w = {
s_w1 = 0x655f, s_w2 = 0x7469}, S_addr = 0x655f7469}},
mp_nexthop_local_in = {S_un = {S_un_b = {s_b1 = 0x6d, s_b2 = 0x65,
s_b3 = 0x72, s_b4 = 0x5f}, S_un_w = {s_w1 = 0x6d65, s_w2 = 0x725f},
S_addr = 0x6d65725f}}, aspath = 0x65787069, community = 0x72656400,
ecommunity = 0x486f6c64, transit = 0x5f54696d}
(gdb)
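
A possible sanity check on the p/x *attr dump: many of the field values look like printable ASCII. The sketch below decodes a 32-bit value as four characters, most significant byte first, which matches a big-endian machine such as SPARC. `word_to_ascii` is a hypothetical helper written for this note, not part of Quagga or gdb.

```c
#include <stdint.h>

/* Hypothetical helper: write the four bytes of word, most significant
 * first, into out as characters. Bytes outside the printable ASCII range
 * are shown as '.'. out must have room for 5 chars (4 plus the NUL). */
static void word_to_ascii(uint32_t word, char out[5])
{
    int i;

    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(word >> (24 - 8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[4] = '\0';
}
```

Applied to the dump, med (0x52656365) and local_pref (0x6976655f) decode to "Rece" and "ive_", weight (0x73616765) to "sage", cluster (0x4b656570) to "Keep", aspath (0x65787069) to "expi", and ecommunity (0x486f6c64) to "Hold". That suggests the attr pointer shown in frame #3 (0x8cc00) may be pointing at string data rather than a real attribute structure, so the arguments gdb prints for that frame may not be trustworthy.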


Re: Problem with revision of Route Server patch
Ok,

This looks like it is a problem in the quagga-0.96.4 code and not Jose's
Route Server patch. I removed his patch and included only Rick's IOVEC
patch when compiling. I have tested this on the stable release and the
current CVS snapshot. I am running Solaris 9 with 2 GB of RAM. This
happens when I run it with one or more neighbors sending at least 110k per
neighbor. Any thoughts on what could cause this, based on the GDB and truss
output? From tracing further, it looks like the stack is blown out between
frames #2 and #1 below, but I cannot see why that would happen. The aspath
size shown below is just insane, but it parses to the last 5 and crashes.
The pointer actually loops while doing this.

Mike Gibbs

!
! Zebra configuration saved from vty
! 2004/03/06 01:53:30
!
password ************
enable password *************
log file /usr/local/quagga/logs/bgpd.log
!
bgp multiple-instance
!
router bgp 65010 view Route-Server
bgp log-neighbor-changes
neighbor 10.8.60.71 remote-as 65024
neighbor 10.8.60.71 shutdown
neighbor 10.8.60.71 soft-reconfiguration inbound
neighbor 10.8.60.71 attribute-unchanged as-path next-hop
neighbor 63.251.217.236 remote-as 65025
neighbor 63.251.217.236 soft-reconfiguration inbound
neighbor 63.251.217.236 route-map 65025-export out
neighbor 63.251.217.236 attribute-unchanged as-path next-hop
neighbor 63.251.217.239 remote-as 65026
neighbor 63.251.217.239 soft-reconfiguration inbound
neighbor 63.251.217.239 route-map 65026-comcon in
neighbor 63.251.217.239 route-map 65026-export out
neighbor 63.251.217.239 attribute-unchanged as-path next-hop
neighbor 63.251.217.240 remote-as 65027
neighbor 63.251.217.240 soft-reconfiguration inbound
neighbor 63.251.217.240 route-map 65027-export out
neighbor 63.251.217.240 attribute-unchanged as-path next-hop
neighbor 63.251.217.241 remote-as 65028
neighbor 63.251.217.241 soft-reconfiguration inbound
neighbor 63.251.217.241 route-map 65028-comcon in
neighbor 63.251.217.241 route-map 65028-export out
neighbor 63.251.217.241 attribute-unchanged as-path next-hop
neighbor 63.251.217.242 remote-as 65029
neighbor 63.251.217.242 soft-reconfiguration inbound
neighbor 63.251.217.242 route-map 65029-export out
neighbor 63.251.217.242 attribute-unchanged as-path next-hop
neighbor 63.251.217.243 remote-as 65030
neighbor 63.251.217.243 soft-reconfiguration inbound
neighbor 63.251.217.243 route-map 65030-export out
neighbor 63.251.217.243 attribute-unchanged as-path next-hop
neighbor 63.251.217.244 remote-as 65031
neighbor 63.251.217.244 soft-reconfiguration inbound
neighbor 63.251.217.244 route-map 65031-export out
neighbor 63.251.217.244 attribute-unchanged as-path next-hop
!
ip community-list standard 65025-65024 permit 65028:300
ip community-list expanded 65025-customer permit 65028:410
ip community-list standard 65026-65024 permit 65026:300
ip community-list expanded 65026-customer permit 65026:4...
ip community-list expanded 65026-peer permit 65026:[123]...
ip community-list standard 65027-65024 permit 65027:300
!
route-map 65026-comcon permit 10
match community 65026-customer
set community 65026:300 additive
!
route-map 65026-comcon permit 20
match community 65026-peer
set community 65026:300 additive
!
route-map 65028-comcon permit 10
match community 65028-customer
set community 65028:300 additive
!
route-map 65028-comcon permit 20
set community 65028:300 additive
!
line vty
!


-----Original Message-----
From: Gibbs, Michael
Sent: Friday, March 05, 2004 11:11 AM
To: quagga-dev@lists.quagga.net
Subject: RE: [quagga-dev 954] Re: Problem with revision of Route Server
patch

Ok,

For those still awake, here is more of a trace of the problem via GDB.

(gdb) run
Starting program: /usr/local/quagga/sbin/bgpd -n
Program received signal SIGSEGV, Segmentation fault.
aspath_key_make (aspath=0x200c43) at bgp_aspath.c:1127
1127 key += *pnt++;
(gdb) bt
#0 aspath_key_make (aspath=0x200c43) at bgp_aspath.c:1127
#1 0x0007cae0 in hash_get (hash=0x146dd8, data=0xffbfede8,
alloc_func=0x3d624 <aspath_hash_alloc>) at hash.c:73
#2 0x0003d6d8 in aspath_parse (
pnt=0x200c43 "\002\005\rÝ\002½\016Ê\016ÊGw@\003\004?ûÙï\200\004\004",
length=12) at bgp_aspath.c:352
#3 0x0004011c in bgp_attr_aspath (peer=0x200c43, length=0, attr=0x8cc00,
flag=64 '@', startp=0x0) at bgp_attr.c:666
(gdb)
(gdb) info local
key = 0
length = 6
pnt = (short unsigned int *) 0x200c43
(gdb)
(gdb) p aspath->length
$1 = 45944522
(gdb)
(gdb) p aspath->data
$2 = 0x4003043f <Address 0x4003043f out of bounds>
(gdb)
#1 0x0007cae0 in hash_get (hash=0x146dd8, data=0xffbfede8,
alloc_func=0x3d624 <aspath_hash_alloc>) at hash.c:73
73 key = (*hash->hash_key) (data);
(gdb)
(gdb) up
#1 0x0007cae0 in hash_get (hash=0x146dd8, data=0xffbfede8,
alloc_func=0x3d624 <aspath_hash_alloc>) at hash.c:73
73 key = (*hash->hash_key) (data);
(gdb) p/x*attr
No symbol "attr" in current context.
(gdb) up
#2 0x0003d6d8 in aspath_parse (
pnt=0x200c43 "\002\005\rÝ\002½\016Ê\016ÊGw@\003\004?ûÙï\200\004\004",
length=12) at bgp_aspath.c:352
352 find = hash_get (ashash, &as, aspath_hash_alloc);
(gdb) p/x*attr
No symbol "attr" in current context.
(gdb) up
#3 0x0004011c in bgp_attr_aspath (peer=0x200c43, length=0, attr=0x8cc00,
flag=64 '@', startp=0x0) at bgp_attr.c:666
666 attr->aspath = aspath_parse (stream_pnt (peer->ibuf), length);
(gdb) p/x*attr
$3 = {refcnt = 0x455f6d65, flag = 0x73736167, origin = 0x65, nexthop = {
S_un = {S_un_b = {s_b1 = 0x0, s_b2 = 0x0, s_b3 = 0x0, s_b4 = 0x0},
S_un_w = {s_w1 = 0x0, s_w2 = 0x0}, S_addr = 0x0}}, med = 0x52656365,
local_pref = 0x6976655f, aggregator_as = 0x4f50, aggregator_addr = {S_un =
{
S_un_b = {s_b1 = 0x5f, s_b2 = 0x6d, s_b3 = 0x65, s_b4 = 0x73}, S_un_w
= {
s_w1 = 0x5f6d, s_w2 = 0x6573}, S_addr = 0x5f6d6573}},
weight = 0x73616765, originator_id = {S_un = {S_un_b = {s_b1 = 0x0,
s_b2 = 0x0, s_b3 = 0x0, s_b4 = 0x0}, S_un_w = {s_w1 = 0x0,
s_w2 = 0x0}, S_addr = 0x0}}, cluster = 0x4b656570,
mp_nexthop_len = 0x41, mp_nexthop_global_in = {S_un = {S_un_b = {
s_b1 = 0x65, s_b2 = 0x5f, s_b3 = 0x74, s_b4 = 0x69}, S_un_w = {
s_w1 = 0x655f, s_w2 = 0x7469}, S_addr = 0x655f7469}},
mp_nexthop_local_in = {S_un = {S_un_b = {s_b1 = 0x6d, s_b2 = 0x65,
s_b3 = 0x72, s_b4 = 0x5f}, S_un_w = {s_w1 = 0x6d65, s_w2 = 0x725f},
S_addr = 0x6d65725f}}, aspath = 0x65787069, community = 0x72656400,
ecommunity = 0x486f6c64, transit = 0x5f54696d}
(gdb)