Mailing List Archive

weird BGP cisco-ism?
I have a Cisco 7505 which is advertising about 50 routes to about 40
peers at mae-west, and a few others. One set of customers has been complaining
that their connectivity is going away right at that router, and then coming
back. Narrowed the set of customers down to a single CIDR block, at
204.147.224.0/20.

So, some of our peers are claiming that the route is flapping... that's
weird, we have them all nailed up to static routes... especially the CIDR
blocks. So I wrote a tool which you can peer a router with, and it watches
the BGP traffic and prints anything it gets, formatted, to standard out.

My Cisco is sending fresh advertisements every 10-30 minutes for that route,
and not for any other of the routes it has, and it appeared to be all the same,
but on careful examination, it appears that each advertisement reflects a
change in the MULTI_EXIT_DISC from 0x00000000 to 0x00000014 and then back
again in the next advertisement.
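
For readers following along, here is a minimal sketch of how a monitor like the one described could pull the MULTI_EXIT_DISC out of an UPDATE's path attributes. It is not the tool mentioned above. The attribute layout (flags, type code, length, value) and type code 4 follow RFC 4271. The function name `find_med` and the two sample attribute blocks are hypothetical, built only to mirror the 0x00000000 and 0x00000014 values seen here.

```python
import struct

MULTI_EXIT_DISC = 4      # path attribute type code (RFC 4271)
EXTENDED_LENGTH = 0x10   # flag bit: the length field is two octets


def find_med(path_attrs: bytes):
    """Return the MED found in a raw path-attribute block, or None."""
    offset = 0
    while offset < len(path_attrs):
        flags, type_code = struct.unpack_from("!BB", path_attrs, offset)
        offset += 2
        if flags & EXTENDED_LENGTH:
            (length,) = struct.unpack_from("!H", path_attrs, offset)
            offset += 2
        else:
            length = path_attrs[offset]
            offset += 1
        value = path_attrs[offset:offset + length]
        offset += length
        if type_code == MULTI_EXIT_DISC and length == 4:
            return struct.unpack("!I", value)[0]
    return None


# Hypothetical sample data: ORIGIN (IGP) followed by a MED attribute.
origin = bytes([0x40, 0x01, 0x01, 0x00])
update_a = origin + bytes([0x80, 0x04, 0x04]) + bytes.fromhex("00000000")
update_b = origin + bytes([0x80, 0x04, 0x04]) + bytes.fromhex("00000014")

print(find_med(update_a), find_med(update_b))
```

0x00000014 is decimal 20, which is the "MED of 20" referred to later in the thread.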

What the heck am I seeing here? Is someone's flap damping code seeing the
repeated advertisements and suppressing me? Is my Cisco going crazy?

-matthew kaufman
matthew@scruz.net
Re: weird BGP cisco-ism?
Original message <199707110721.HAA01250@ice.genuity.net>
From: Danny McPherson <danny@genuity.net>
Date: Jul 11, 0:21
Subject: Re: weird BGP cisco-ism?
>
>
> if the primary route becomes unavailable and routing falls over to the "nailed
> up" route, a bgp update is still sent. (if dampening is enabled) this update
> is recorded by ebgp peers as a "flap". i'd guess from your message below that
> when this occurs your router(s) are also changing the med attached to the
> prefix, which seems normal to me...

WHAT?!?
This is ridiculous. There *must* be a way to have routes truly "nailed up"
such that flaps do not occur. I have static routes in there precisely
because I want to be friendly and reduce the number of routing announcements
which occur when interfaces transition (I have dozens and dozens of
people with class C networks attached via unreliable dialup lines)

> i'd suggest you have a look at the stability of the interface to which the
> primary route is attached (?carrier transitions, interface resets, etc...?).
> you might also consider breaking the primary route (/20) into 2 /21 blocks
> internally and allowing the longer "nailed up" route to be the permanent
> source of the /20 advertisement. for example:

However, in this case, I don't have *anything* as the "primary route".
In fact, the very first /24 of the route is currently unused and goes
nowhere... and the rest of the block goes as /24's, /29's and the like to
various people who connect and disconnect all the time... just like I have
on several other CIDR blocks I've got. But *this* one, and *only* this one,
insists on sending a readvertisement.

So, to readdress my first paragraph, apparently these static routes as
I have them configured do just the trick, since if they didn't, I'd see
lots of other things flapping on my BGP monitor software I just wrote,
but instead I only see this one. Sounds like a real bug of some sort.

-matthew
Re: weird BGP cisco-ism?
> add a default-metric under router bgp.. I think the source of the
> route in the routing table is changing for some reason and the box is
> conveying the metric as MED...
>
> --ravi

Did that. It went from advertising it with an MED of 0 to an MED of 20 and
back, to just readvertising it over and over with the fixed default MED
I'd set (1).

The sad thing is that until flap damping code was added, this sort of
bug was masked... but now I have a few dozen customers totally offline.

-matthew
Re: weird BGP cisco-ism?
if the primary route becomes unavailable and routing falls over to the "nailed
up" route, a bgp update is still sent. (if dampening is enabled) this update
is recorded by ebgp peers as a "flap". i'd guess from your message below that
when this occurs your router(s) are also changing the med attached to the
prefix, which seems normal to me...

i'd suggest you have a look at the stability of the interface to which the
primary route is attached (?carrier transitions, interface resets, etc...?).
you might also consider breaking the primary route (/20) into 2 /21 blocks
internally and allowing the longer "nailed up" route to be the permanent
source of the /20 advertisement. for example:

rather than:

ip route 204.147.224.0 255.255.240.0 <interface> !"primary" route
ip route 204.147.224.0 255.255.240.0 null0 254 !"nailed up" route
!
router bgp <as>
network 204.147.224.0 mask 255.255.240.0

try this:

ip route 204.147.224.0 255.255.248.0 <interface> !"primary" route
ip route 204.147.232.0 255.255.248.0 <interface> !"primary" route
ip route 204.147.224.0 255.255.240.0 null0 <admin. distance> !"nailed up" route
!
router bgp <as>
network 204.147.224.0 mask 255.255.240.0

although correcting the stability problem is the correct solution.
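
The split suggested above can be checked with a short sketch using Python's standard `ipaddress` module. It only illustrates the prefix arithmetic and longest-match behaviour. The `lookup` helper, the toy routing table, and the "interface" / "null0" labels are hypothetical stand-ins, not router output.

```python
import ipaddress

aggregate = ipaddress.ip_network("204.147.224.0/20")
halves = list(aggregate.subnets(prefixlen_diff=1))  # the two /21 blocks

for net in halves + [aggregate]:
    print(net, net.netmask)


def lookup(dest, table):
    """Longest-prefix match over a dict of {network: next_hop}."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda n: n.prefixlen)]


# Toy table: two "primary" /21 routes plus the "nailed up" /20 to null0.
table = {halves[0]: "interface", halves[1]: "interface", aggregate: "null0"}
print(lookup("204.147.230.1", table))   # follows the more specific /21

# Simulate the first /21 going away (e.g. the interface drops).
del table[halves[0]]
print(lookup("204.147.230.1", table))   # falls through to the /20
print(aggregate in table)               # the /20 itself never left the table
```

The point of the sketch is that the /20 entry matched by the network statement stays in place whether or not the /21 routes come and go.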

-danny


>
> I have a Cisco 7505 which is advertising about 50 routes to about 40
> peers at mae-west, and a few others. One set of customers has been complaining
> that their connectivity is going away right at that router, and then coming
> back. Narrowed the set of customers down to a single CIDR block, at
> 204.147.224.0/20.
>
> So, some of our peers are claiming that the route is flapping... that's
> weird, we have them all nailed up to static routes... especially the CIDR
> blocks. So I wrote a tool which you can peer a router with, and it watches
> the BGP traffic and prints anything it gets, formatted, to standard out.
>
> My Cisco is sending fresh advertisements every 10-30 minutes for that route,
> and not for any other of the routes it has, and it appeared to be all the same,
> but on careful examination, it appears that each advertisement reflects a
> change in the MULTI_EXIT_DISC from 0x00000000 to 0x00000014 and then back
> again in the next advertisement.
>
> What the heck am I seeing here? Is someone's flap damping code seeing the
> repeated advertisements and suppressing me? Is my Cisco going crazy?
>
> -matthew kaufman
> matthew@scruz.net
Re: weird BGP cisco-ism?
In cisco.external.nanog you write:


>I have a Cisco 7505 which is advertising about 50 routes to about 40
>peers at mae-west, and a few others. One set of customers has been complaining
>that their connectivity is going away right at that router, and then coming
>back. Narrowed the set of customers down to a single CIDR block, at
>204.147.224.0/20.

>So, some of our peers are claiming that the route is flapping... that's
>weird, we have them all nailed up to static routes... especially the CIDR
>blocks. So I wrote a tool which you can peer a router with, and it watches
>the BGP traffic and prints anything it gets, formatted, to standard out.

>My Cisco is sending fresh advertisements every 10-30 minutes for that route,
>and not for any other of the routes it has, and it appeared to be all the same,
>but on careful examination, it appears that each advertisement reflects a
>change in the MULTI_EXIT_DISC from 0x00000000 to 0x00000014 and then back
>again in the next advertisement.


add a default-metric under router bgp.. I think the source of the
route in the routing table is changing for some reason and the box is
conveying the metric as MED...

--ravi


>What the heck am I seeing here? Is someone's flap damping code seeing the
>repeated advertisements and suppressing me? Is my Cisco going crazy?

>-matthew kaufman
> matthew@scruz.net
Re: weird BGP cisco-ism?
> > add a default-metric under router bgp.. I think the source of the
> > route in the routing table is changing for some reason and the box is
> > conveying the metric as MED...
> >
> > --ravi
>
> Did that. It went from advertising it with an MED of 0 to an MED of 20 and
> back, to just readvertising it over and over with the fixed default MED
> I'd set (1).


This indicates that your flap is fixed.. the neighboring AS will look
at these advertisements as duplicates and discard them (and hence no
dampening).

To avoid readvertising the same prefix, you might want to check if
there is any other dynamic protocol that is installing that prefix in
the routing table and removing it (could be an OSPF subnet...)


>
> The sad thing is that until flap damping code was added, this sort of
> bug was masked... but now I have a few dozen customers totally offline.
>


As soon as the reuse interval kicks in they should be fine..
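
To make the reuse behaviour concrete, here is a rough sketch of exponential-decay flap damping. Every number in it is an assumption rather than something stated in this thread: a 15 minute half-life, a penalty of 1000 per flap, a suppress limit of 2000 and a reuse limit of 750 are commonly cited defaults, and a peer may be configured differently. The simulation is also simplified, in that it only re-evaluates suppression when a new flap arrives.

```python
import math

# Assumed damping parameters (commonly cited defaults, not from the thread).
HALF_LIFE_MIN = 15.0
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0


def decay(penalty, minutes):
    """Penalty remaining after `minutes` of exponential decay."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def minutes_until_reuse(penalty):
    """Quiet time needed for `penalty` to decay to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


def simulate(flap_times):
    """Return (time, penalty, suppressed) after each flap, times in minutes."""
    penalty, last, suppressed = 0.0, None, False
    history = []
    for t in flap_times:
        if last is not None:
            penalty = decay(penalty, t - last)
            if suppressed and penalty < REUSE_LIMIT:
                suppressed = False
        penalty += PENALTY_PER_FLAP
        if penalty >= SUPPRESS_LIMIT:
            suppressed = True
        history.append((t, penalty, suppressed))
        last = t
    return history


for t, penalty, suppressed in simulate([0, 10, 20, 30]):
    print(f"t={t:>3} min  penalty={penalty:7.1f}  suppressed={suppressed}")
print(f"quiet time from penalty 1500: {minutes_until_reuse(1500.0):.1f} min")
```

Under these assumed values, updates counted as flaps every 10 minutes cross the suppress limit on the third one, while updates 30 minutes apart never do, which is one way the 10-30 minute readvertisements could leave the prefix suppressed at some peers and not others.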

--ravi


> -matthew
>
>