Mailing List Archive

repeated bgpd hangs
Hi,

bgpd keeps hanging every few days. All peerings are dropped, and
telnetting to the bgpd port doesn't give any output at all. It's done
that about once every three days for the few months we have had it running.

I've seen this bug since zebra 0.93b (the first version I tried). In fact,
I downgraded this box to zebra to make sure it wasn't just a quagga
problem. quagga 0.96.3 has exactly the same problem.

This is on a box with about 15 peerings, one eBGP and 14 iBGP.

Has this been fixed in 0.96.4?

I'd be happy to run debug patches.


cheers,
Lennert
Re: repeated bgpd hangs
On Fri, 7 Nov 2003, Lennert Buytenhek wrote:

> Hi,
Hi,

> bgpd keeps hanging every few days. All peerings are dropped, and
> telnetting to the bgpd port doesn't give any output at all. It's done
> that about once every three days for the few months we have had it running.

What CPU do you have in your router? AFAIK Zebra/Quagga has an "issue":
when a peer goes down, bgpd spends quite a long time performing route
deletion/change without sending BGP KeepAlives or reading data from the
telnet connection. On my P3 550 it takes about 30-50 sec to remove one
neighbor. So, after a while, other peers may decide to drop the BGP
connection since they do not receive BGP packets from your router.
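
To make the timing argument concrete, here is a minimal Python sketch (not
taken from the bgpd source). It assumes a single-threaded daemon that sends
nothing while it is busy removing routes, and a hold time of 180 seconds on
the remote side, which is an assumed value rather than anything read from a
real configuration. The function names are hypothetical.

```python
def seconds_blocked(peers_down, seconds_per_peer):
    """Total time a single-threaded daemon spends removing routes."""
    return peers_down * seconds_per_peer


def remote_peer_expires(blocked_seconds, since_last_keepalive, hold_time=180):
    """True if the remote hold timer runs out before we can send again.

    hold_time=180 is an assumption for illustration only.
    """
    return since_last_keepalive + blocked_seconds > hold_time


# Compare one neighbor going away with four going away back to back,
# using the 50 s per neighbor figure quoted above.
one = remote_peer_expires(seconds_blocked(1, 50), since_last_keepalive=0)
four = remote_peer_expires(seconds_blocked(4, 50), since_last_keepalive=0)
print(one, four)
```

Under these assumptions a single slow removal stays inside the hold time,
while several in a row can push the silence past it.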

Best regards,

Krzysztof Olędzki
Re: repeated bgpd hangs
On Fri, Nov 07, 2003 at 01:44:10PM +0100, Krzysztof Oledzki wrote:

> > bgpd keeps hanging every few days. All peerings are dropped, and
> > telnetting to the bgpd port doesn't give any output at all. It's done
> > that about once every three days for the few months we have had it running.
>
> What CPU do you have in your router?

It's a P4 Xeon 2.4GHz.


> AFAIK Zebra/Quagga has an "issue":
> when a peer goes down, bgpd spends quite a long time performing route
> deletion/change without sending BGP KeepAlives or reading data from the
> telnet connection. On my P3 550 it takes about 30-50 sec to remove one
> neighbor. So, after a while, other peers may decide to drop the BGP
> connection since they do not receive BGP packets from your router.

That makes sense. But when that happens, bgpd is supposed to recover,
right? In this case it never recovers.


thanks,
Lennert
Re: repeated bgpd hangs
On Fri, 7 Nov 2003, Lennert Buytenhek wrote:

> On Fri, Nov 07, 2003 at 01:44:10PM +0100, Krzysztof Oledzki wrote:
>
> > > bgpd keeps hanging every few days. All peerings are dropped, and
> > > telnetting to the bgpd port doesn't give any output at all. It's done
> > > that about once every three days for the few months we have had it running.
> >
> > What CPU do you have in your router?
>
> It's a P4 Xeon 2.4GHz.
>
>
> > AFAIK Zebra/Quagga has an "issue":
> > when a peer goes down, bgpd spends quite a long time performing route
> > deletion/change without sending BGP KeepAlives or reading data from the
> > telnet connection. On my P3 550 it takes about 30-50 sec to remove one
> > neighbor. So, after a while, other peers may decide to drop the BGP
> > connection since they do not receive BGP packets from your router.
>
> That makes sense. But when that happens, bgpd is supposed to recover,
> right? In this case it never recovers.

Which OS do you use? On Linux, for example, you can attach strace to the
process using the -p switch and check what it is doing.
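
As a rough sketch of that suggestion, the Python helper below builds an
strace command line for an already-running process. The PID 1234 and the
output path are made-up placeholders, and the helper names are hypothetical.
-p, -tt and -o are standard strace options, but check the man page of your
strace version before relying on them.

```python
import subprocess


def strace_command(pid, out_path):
    """Build an strace invocation that attaches to a running process.

    -tt adds timestamps, -p selects the PID, -o writes the trace to a file.
    """
    return ["strace", "-tt", "-p", str(pid), "-o", out_path]


def attach(pid, out_path):
    """Run strace until it is interrupted (this usually needs root)."""
    return subprocess.call(strace_command(pid, out_path))


# Placeholder PID and path; substitute the real bgpd PID before running.
cmd = strace_command(1234, "/tmp/bgpd.strace")
print(" ".join(cmd))
```

If the trace shows the process sitting in one system call, it is probably
blocked in the kernel; if it shows no system calls at all, a loop in user
space is more likely.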

Best regards,

Krzysztof Olędzki
Re: repeated bgpd hangs
On Fri, Nov 07, 2003 at 02:13:16PM +0100, Krzysztof Oledzki wrote:

> > > AFAIK Zebra/Quagga has an "issue":
> > > when a peer goes down, bgpd spends quite a long time performing route
> > > deletion/change without sending BGP KeepAlives or reading data from the
> > > telnet connection. On my P3 550 it takes about 30-50 sec to remove one
> > > neighbor. So, after a while, other peers may decide to drop the BGP
> > > connection since they do not receive BGP packets from your router.
> >
> > That makes sense. But when that happens, bgpd is supposed to recover,
> > right? In this case it never recovers.
>
> Which OS do you use? On Linux, for example, you can attach strace to the
> process using the -p switch and check what it is doing.

I'm on Linux. I'm fairly proficient in C, so I should be able to figure
out what's going wrong; it's just that when it goes down, I need to have
it back up again as soon as I can.

I wondered if this is a known issue -- if it is not, I shall investigate.


--L