Mailing List Archive

recent ospfd silently exiting?
I updated sources yesterday, after this patchset (cvsps):

PatchSet 576
Date: 2004/07/26 20:27:51
Author: paul
Branch: HEAD
Tag: (none)
Log:
2004-07-26 Paul Jakma <paul@dishone.st>

* configure.ac: reenable tests/Makefile
* tests/Makefile.am: automake file for tests dir
* tests/.cvsignore: update

Members:
configure.ac:1.57->1.58
tests/.cvsignore:1.2->1.3
tests/Makefile.am:INITIAL->1.1


Twice now ospfd has silently exited; the system still has the routes
(showing a lack of keepalive and cleanup in the zebra/ospfd connection!),
and there are no log messages. I've attached gdb to the restarted
process now. This is on NetBSD/i386 1.6.2 with two ethernets and ppp.
There are some ASE routes, but no opaque LSAs AFAIK.

Looking at the cvsps output, this is the only thing that looks
complicated enough to cause trouble:

PatchSet 567
Date: 2004/07/23 16:13:48
Author: paul
Branch: HEAD
Tag: (none)
Log:
2004-07-23 Paul Jakma <paul@dishone.st>

* ospf_network.c: Replace PKTINFO/RECVIF with call to
setsockopt_pktinfo
* ospf_packet.c: Use getsockopt_pktinfo_ifindex and
SOPT_SIZE_CMSG_PKTINFO_IPV4.

Members:
ospfd/ChangeLog:1.32->1.33
ospfd/ospf_network.c:1.4->1.5
ospfd/ospf_packet.c:1.29->1.30

But on reading the diff, it really looks like pulling common code into
a function, and not any significant changes. And, ospfd works fine
before it dies.

Is anyone else running really recent ospfd? Does it work for you or
not?

--
Greg Troxel <gdt@ir.bbn.com>
Re: recent ospfd silently exiting?
I found that ospfd segfaulted.


(gdb) bt
#0 ospf_ls_upd_send_queue_event (thread=0xbfbfd5bc) at ospf_packet.c:3128
#1 0x480ec89e in thread_call (thread=0xbfbfd5bc) at thread.c:858
#2 0x80493bd in main (argc=2, argv=0xbfbfd698) at ospf_main.c:307
#3 0x8048f90 in ___start ()
(gdb) i fr
Stack level 0, frame at 0xbfbfd4e8:
eip = 0x48085380 in ospf_ls_upd_send_queue_event (ospf_packet.c:3128);
saved eip 0x480ec89e
called by frame at 0xbfbfd578
source language c.
Arglist at 0xbfbfd4e8, args: thread=0xbfbfd5bc
Locals at 0xbfbfd4e8, Previous frame's sp is 0x0
Saved registers:
ebx at 0xbfbfd4c0, ebp at 0xbfbfd4e8, esi at 0xbfbfd4c4, edi at 0xbfbfd4c8,
eip at 0xbfbfd4ec


3126 for (tn = update->head; tn; tn = nn)
3127 {
3128 nn = tn->next;
3129 ospf_ls_upd_queue_send (oi, update, rn->p.u.prefix4);
3130 }


(gdb) print update->head
$15 = (struct listnode *) 0x0
(gdb) print *update
$16 = {head = 0x0, tail = 0x0, count = 0, cmp = 0, del = 0}
(gdb) print tn
$17 = (struct listnode *) 0x6670736f
(gdb) print *tn
Error accessing memory address 0x6670736f: Invalid argument.
(gdb) print nn
$18 = (struct listnode *) 0x6670736f

So it seems update->head got bashed, since otherwise we wouldn't be at
line 3128 at all.


I skimmed ospf_ls_upd_queue_send, and it doesn't seem to be modifying
update.


I'm going to make clean and rebuild and see if that helps.

--
Greg Troxel <gdt@ir.bbn.com>
Re: recent ospfd silently exiting?
On Tue, 27 Jul 2004, Greg Troxel wrote:

> Looking at the cvsps output, this is the only thing that looks
> complicated enough to cause trouble:
>
> PatchSet 567
> Date: 2004/07/23 16:13:48
> Author: paul
> Branch: HEAD
> Tag: (none)
> Log:
> 2004-07-23 Paul Jakma <paul@dishone.st>
>
> * ospf_network.c: Replace PKTINFO/RECVIF with call to
> setsockopt_pktinfo
> * ospf_packet.c: Use getsockopt_pktinfo_ifindex and
> SOPT_SIZE_CMSG_PKTINFO_IPV4.
>
> Members:
> ospfd/ChangeLog:1.32->1.33
> ospfd/ospf_network.c:1.4->1.5
> ospfd/ospf_packet.c:1.29->1.30
>
> But on reading the diff, it really looks like pulling common code into
> a function, and not any significant changes. And, ospfd works fine
> before it dies.
>
> Is anyone else running really recent ospfd? Does it work for you or
> not?

Heh, well, you're the first. I'm still working on that, and it's not
tested yet. ;) Though it only cleans things up; unless I've done
something stupid, the end result should be exactly the same.

I'll update a few machines to CVS.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
What's all this brouhaha?
Re: recent ospfd silently exiting?
On Tue, 27 Jul 2004, Greg Troxel wrote:

> I skimmed ospf_ls_upd_queue_send, and it doesn't seem to be modifying
> update.

It doesn't, but it calls ospf_make_ls_upd(), which does. That list loop
is bogus... sigh. By the time ospf_ls_upd_queue_send() returns, the list
will most likely be empty.

(Unless you have a giant area and a router with so many active OSPF
links that its router-LSAs are larger than the MTU. Those will never be
sent, in which case the LSA will remain on the LSA list forever as
things stand.)

> I'm going to make clean and rebuild and see if that helps.

Err... cvs update first. I just fixed it, I think.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
"Absolutely nothing should be concluded from these figures except that
no conclusion can be drawn from them."
(By Joseph L. Brothers, Linux/PowerPC Project)
Re: recent ospfd silently exiting?
> I found that ospfd segfaulted.

With Paul's fix from yesterday, ospfd has been up for 18 hours and is
still going - thanks!

--
Greg Troxel <gdt@ir.bbn.com>