Hello,
For some time now we've had problems with the zebra daemon making some
valid iBGP routes inactive. This is a problem in Zebra 0.93b (which
we're currently running), but I have just verified that the problem
still exists in Quagga 0.96.5 (which we're in the process of upgrading
to). To illustrate it, I have some real data from our routers:
One of our BGP routers (212.97.129.4) has an eBGP peering with
192.38.7.1, and this peer announces the 192.38.0.0/17 prefix to us
(and several others, but this is the only prefix from this peer with
the problem). The prefix is properly added to the kernel routing table
on 212.97.129.4. But on our other BGP routers, the following happens:
bgpd# show ip bgp 192.38.0.0/17
BGP routing table entry for 192.38.0.0/17
Paths: (1 available, best #1, table Default-IP-Routing-Table)
Not advertised to any peer
1835
192.38.7.1 (metric 1) from 212.97.129.4 (212.97.129.4)
Origin IGP, metric 0, localpref 200, valid, internal, best
Last update: Mon Apr 26 10:21:22 2004
zebra# show ip route 192.38.0.0/17
Routing entry for 192.38.0.0/17
Known via "bgp", distance 200, metric 0
Last update 03w5d12h ago
192.38.7.1 inactive
All other prefixes from the peer are fine, but this particular one is
marked as inactive by the zebra daemon. As far as I can tell, this
happens because 192.38.7.1 (the peer address) is inside 192.38.0.0/17
(the prefix). The same happens for every other prefix received from a
peer whose address is inside the prefix.
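To make the condition concrete, here is a minimal sketch of the containment test described above. The helper names ipv4() and prefix_contains() are made up for this illustration and are not taken from the zebra source.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (hypothetical helper). */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Return 1 if addr lies inside network/prefix_len, else 0 (hypothetical helper). */
static int prefix_contains(uint32_t network, unsigned prefix_len, uint32_t addr)
{
    assert(prefix_len <= 32);
    /* A /0 mask is all zeroes; shifting a 32-bit value by 32 is undefined in C. */
    uint32_t mask = (prefix_len == 0) ? 0 : (~(uint32_t)0 << (32 - prefix_len));
    return (addr & mask) == (network & mask);
}
```

With these helpers, prefix_contains(ipv4(192, 38, 0, 0), 17, ipv4(192, 38, 7, 1)) is true, which is exactly the situation of the problematic route: its next hop falls inside the prefix being installed.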
I did some digging in the source code and traced this behaviour to the
nexthop_active_ipv4 function in zebra/zebra_rib.c. The BGP routes were
made inactive because of the following lines in this function:
  /* If lookup self prefix return immidiately. */
  if (rn == top)
    return 0;
As far as I can tell, this check is only done to speed things up and
avoid running through all the other tests, so I tried to disable it.
This made the problematic routes active, while other routes that were
previously (correctly) filtered by this check now fell through to the
final "return 0" in the function after one more iteration of the while loop.
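The following toy model is meant to show the effect of the check. It is not the zebra code. It assumes that the lookup starts at the most specific route covering the next hop, skips BGP routes, and walks to the next less specific covering route until it finds a non-BGP one. The struct, the enum, the function names and the self_check switch are all invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum route_type { ROUTE_CONNECTED, ROUTE_IGP, ROUTE_BGP };

struct route {
    uint32_t network;      /* host byte order */
    unsigned prefix_len;   /* 0..32 */
    enum route_type type;
};

/* Return 1 if route r covers addr. */
static int covers(const struct route *r, uint32_t addr)
{
    assert(r->prefix_len <= 32);
    uint32_t mask = (r->prefix_len == 0) ? 0 : (~(uint32_t)0 << (32 - r->prefix_len));
    return (addr & mask) == (r->network & mask);
}

/* Most specific route covering addr whose prefix length is below `limit`,
 * or NULL if there is none. Stands in for "match, then go to the parent". */
static const struct route *best_match(const struct route *tbl, size_t n,
                                      uint32_t addr, unsigned limit)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].prefix_len < limit && covers(&tbl[i], addr) &&
            (best == NULL || tbl[i].prefix_len > best->prefix_len))
            best = &tbl[i];
    }
    return best;
}

/* Toy version of the next-hop check: 1 = active, 0 = inactive.
 * self is the route being installed; self_check models "if (rn == top)". */
static int nexthop_active_model(const struct route *tbl, size_t n,
                                const struct route *self, uint32_t nexthop,
                                int self_check)
{
    const struct route *rn = best_match(tbl, n, nexthop, 33);
    while (rn != NULL) {
        if (self_check && rn == self)
            return 0;               /* next hop resolved to the route itself */
        if (rn->type != ROUTE_BGP)
            return 1;               /* resolved via a non-BGP route */
        rn = best_match(tbl, n, nexthop, rn->prefix_len); /* less specific */
    }
    return 0;
}
```

In this model, a table holding a hypothetical IGP default route plus the BGP route 192.38.0.0/17 marks the next hop 192.38.7.1 inactive with the check enabled and active with it disabled. A BGP route with no other covering route stays inactive either way, after one extra pass through the loop.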
I'm a bit surprised that others apparently haven't hit (or at least
noticed) this bug, so I'm wondering whether this is caused by some
weird configuration on our routers, or whether it is in fact a bug in
the zebra daemon.
Also, if it is a bug, is disabling the code above the proper fix, or
does it have a purpose that I'm missing?
Finally, if this is the proper fix, the code should probably be removed
from nexthop_active_ipv6 too.
Regards,
Anders K. Pedersen
--
The From: and Reply-To: addresses are internal news2mail gateway addresses.
Reply to the list or to "Anders K. Pedersen" <akp@cohaesio.com>