Hello Rob/Dave
Thanks for the pointers. I figured out the issue. My C stub was able to list all interfaces without crashing because it checks ifa_addr for NULL before dereferencing it:
    if (getifaddrs(&ifaddr) == -1) {
        perror("getifaddrs failed");
        exit(1);
    }
    struct ifaddrs *ifa = ifaddr;
    for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr != NULL) {    /* <-- check for ifa_addr */
            int family = ifa->ifa_addr->sa_family;
I was looking into the ifaddrs structure only when the interface address is set.
The stub_if_getaddr code, by contrast, is as follows:
    ret = getifaddrs(&ifaddrs);
    if (ret < 0)
        caml_failwith("cannot get interface address");
    for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
        sock = tmp->ifa_addr;           /* <-- assigned here */
        netmask = tmp->ifa_netmask;
        if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6) {   /* <-- dereferenced here without a NULL check */
            name = caml_copy_string(tmp->ifa_name);
            <snip>
In my case, there were two internal interfaces for which no interface address was set up, so iterating through the list hit a NULL pointer dereference.
It might look like defensive coding, but can we ignore the interfaces for which ifa_addr is not set? I can open a bug and fix it if there is consensus that this needs to be fixed.
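To make the proposal concrete, here is a minimal sketch of the guard I have in mind. It is not the actual addr_stubs.c code: the helper names count_inet_addrs and count_host_inet_addrs are hypothetical, and the sketch only counts matching entries, whereas the real stub builds OCaml values. It assumes a platform that provides getifaddrs().

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <ifaddrs.h>

/* Hypothetical helper, not part of xen-api-libs: counts the entries in a
 * getifaddrs() list that carry an IPv4 or IPv6 address. Entries whose
 * ifa_addr is NULL are skipped instead of being dereferenced. */
int count_inet_addrs(const struct ifaddrs *list)
{
    const struct ifaddrs *tmp;
    int count = 0;

    for (tmp = list; tmp != NULL; tmp = tmp->ifa_next) {
        const struct sockaddr *sock = tmp->ifa_addr;

        if (sock == NULL)
            continue;   /* interface without an address: ignore it */

        if (sock->sa_family == AF_INET || sock->sa_family == AF_INET6)
            count++;
    }
    return count;
}

/* Same count for the interfaces of the running host; returns -1 if
 * getifaddrs() itself fails. */
int count_host_inet_addrs(void)
{
    struct ifaddrs *list;
    int count;

    if (getifaddrs(&list) == -1)
        return -1;
    count = count_inet_addrs(list);
    freeifaddrs(list);
    return count;
}
```

The only difference from the loop in stub_if_getaddr is the early "continue" when ifa_addr is NULL, so interfaces with an address should be handled exactly as before.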
Ranjeet
-----Original Message-----
From: Rob Hoes [mailto:Rob.Hoes@citrix.com]
Sent: Wednesday, March 26, 2014 4:44 AM
To: Ranjeet R; Dave Scott
Cc: xen-api@lists.xen.org
Subject: RE: [Xen-API] Debugging XAPI daemon crash
Hi Ranjeet,
> It seems to be crashing in the same point as you had mentioned. Please
> find the SEGV backtrace attached.
>
> (gdb) c
> Program received signal SIGSEGV, Segmentation fault.
> 0x085bc2d6 in stub_if_getaddr ()
> (gdb) bt
> #0 0x085cca90 in segv_handler ()
> #1 <signal handler called>
> #2 0x085bc2d6 in stub_if_getaddr ()
> #3 0x0850ef8c in camlNetdev__get_all_ipv4_1325 ()
>
> You had mentioned that this could be because of a bad C function binding.
> I wrote a small C stub to see whether it works for the xenbr0
> interface and it seems to be working fine. How should I verify the binding.
The function that is failing seems to be this one:
https://github.com/xapi-project/xen-api-libs/blob/clearwater/netdev/addr_stubs.c#L74
It has:
int ret;
struct ifaddrs *ifaddrs, *tmp;
[...]
ret = getifaddrs(&ifaddrs);
[...]
for (tmp = ifaddrs; tmp; tmp = tmp->ifa_next) {
sock = tmp->ifa_addr;
netmask = tmp->ifa_netmask;
[...]
Could it be that the getifaddrs function does not set ifaddrs correctly? You should be able to test this with a small C program. Or is this what you have already done?
Cheers,
Rob
> Appreciate your help.
>
> -Ranjeet
>
> -----Original Message-----
> From: David Scott [mailto:dave.scott@eu.citrix.com]
> Sent: Monday, March 24, 2014 3:46 AM
> To: Ranjeet R
> Cc: xen-api@lists.xen.org
> Subject: Re: [Xen-API] Debugging XAPI daemon crash
>
> On 24/03/14 10:30, Ranjeet R wrote:
> > Hello Dave
> >
> > The binaries did not have debug symbols but I managed to rebuild the
> binaries with debug enabled.
>
> Great.
>
> > I tried starting the xapi process as it was started in the init.d
> scripts under gdb. However, in gdb, the xapi process forks another
> process and I am not able to debug it further (I tried setting
> detach_on_fork to off in gdb, but the primary process just goes to end of execution).
> >
> > I am using the following gdb command to debug
> >
> > gdb --args /usr/sbin/xapi -daemon -writeinitcomplete
> /var/run/xapi_init_complete.cookie -writereadyfile
> /var/run/xapi_startup.cookie -onsystemboot"
> >
> > Can you please help me in the steps that you use in debugging the
> > XAPI
> process.
>
> Ah, I think xapi forks a "watchdog" process near the start -- this is
> probably what you're seeing.
>
> Try adding a "-nowatchdog" option to the command-line.
>
> Dave
>
> >
> > Thanks for your help,
> >
> > -Ranjeet
> >
> >
> >
> > -----Original Message-----
> > From: Dave Scott [mailto:Dave.Scott@citrix.com]
> > Sent: Saturday, March 22, 2014 12:36 PM
> > To: Ranjeet R
> > Cc: xen-api@lists.xen.org
> > Subject: Re: [Xen-API] Debugging XAPI daemon crash
> >
> > Hi,
> >
> > I suspect the segfault is being caused by a bad C function binding.
> > I've
> seen a similar crash before when querying an interface IP via
> getifaddrs (I think that was the function name) Could you run xapi in
> gdb and reproduce the crash? Printing the call stack would help to
> confirm this hypothesis. Provided the xapi binary still has debug
> symbols (ie hasn't been stripped) the ocaml functions (with fairly
> obvious mangled names) should also be on the stack too.
> >
> > Cheers,
> > Dave
> >
> >> On Mar 22, 2014, at 3:47 AM, "Ranjeet R" <rranjeet@juniper.net> wrote:
> >>
> >> Hello all
> >>
> >> I am trying to bring a DevCloud setup which has an XCP Kronos based
> XAPI daemon. I had changed the underlying network implementation (it
> is not a bridge, but an openvswitch-like network implementation) and
> the XAPI daemon crashes during bootup. Please find the XAPI logs below.
> >>
> >>
> >> starting up database engine D:72969b3eaf8e|redo_log] Flushing
> >> database to all active redo-logs starting up database engine
> >> D:72969b3eaf8e|xapi] About to flush database: /var/lib/xcp/state.db
> >> starting up database engine D:72969b3eaf8e|redo_log] Flushing
> >> database to all active redo-logs starting up database engine
> >> D:72969b3eaf8e|xapi] Performing initial DB GC thread_zero|dbsync
> >> (update_env) D:fd0aec7399c9|dbsync] Sync: sync_create_localhost
> >> dbsync
> >> (update_env) D:fd0aec7399c9|dbsync] creating localhost
> >>
> >> dmesg logs seem to suggest that xapi is crashing during startup.
> >>
> >> [ 9.092377] xapi[2813]: segfault at 0 ip 085bc286 sp bf80ae30 error
> 4 in xapi[8048000+59f000]
> >> [ 9.869971] xapi[2943]: segfault at 0 ip 085bc286 sp bf8ec450 error
> 4 in xapi[8048000+59f000]
> >>
> >> I looked the XAPI code to see where it fails and I don't see any
> >> logs after the following code point in ocaml / xapi /
> >> dbsync_slave.ml
> >>
> >> let create_localhost ~__context info =
> >> let ip = get_my_ip_addr ~__context in
> >>
> >> I confirmed to see that "ifconfig xenbr0" has a valid management IP
> address and should not fail.
> >>
> >> How do I debug this crash further. Are there any ways to look at
> >> the
> stack trace where XAPI crashed. Any pointers to debug this further
> will be very helpful.
> >>
> >> -Ranjeet
> >>
> >>
> >> _______________________________________________
> >> Xen-api mailing list
> >> Xen-api@lists.xen.org
> >> http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api