Mailing List Archive

Re: [zebra 20304] Re: ANNOUNCE: Quagga Routing Suite release0.96
On Wed, 20 Aug 2003, Kunihiro Ishiguro wrote:

> For collision detection, if you want to fix it, please read spec and
> just concentrate on Zebra code.

Ok. I've been looking at this...

6.8 states:

Upon receipt of an OPEN message, the local system must examine all
of its connections that are in the OpenConfirm state. A BGP speaker
may also examine connections in an OpenSent state if it knows the BGP
Identifier of the peer by means outside of the protocol. If among
these connections there is a connection to a remote BGP speaker whose
BGP Identifier equals the one in the OPEN message, then the local
system performs the following collision resolution procedure:

bgp_collision_detect() loops through the peer loop and looks for:

if (peer != new
&& (peer->status == OpenConfirm || peer->status == OpenSent)
&& sockunion_same (&peer->su, &new->su))

Firstly:

- the RFC states

" A BGP speaker may also examine connections in an OpenSent state if
it knows the BGP Identifier of the peer by means outside of the
protocol."

I'm not sure of the implications of this, but I dont see how bgpd
fulfills "by means outside of the protocol".

OpenSent should not match, unless there's an explanation for the
implication of the comment in the RFC and why bgpd chooses to also
match on OpenSent connections.

OpenSent state connections will time out via the connect retry timer
if nothing further happens and retry, further the /other/ peer (if in
OpenSent) will be receiving an Open and go into OpenConfirm and then
do collision detection and close one of its connections - so the
OpenSent connection will either receive an Open (other side closes
its OpenSent connection, sends Open on this connection and state goes
to OpenConfirm) or a BGP_NOTIFY_CEASE if i understand correctly (this
side closes its OpenSent).

Secondly:

- the code checks for /connections/ which match, however, the RFC
states:

"whose BGP Identifier equals the one in the OPEN message"

I do not see why the connections should match. The whole point is to
run the collision detection mechanism on connections which possibly
are /not/ the same connection as the connection which caused the peer
state to go to OpenConfirm.

So, from my reading of it, the loop in bgp_collision_detect() should
be looking for:

if (peer != new
&& peer->status == OpenConfirm
&& peer->remote_id.s_addr == new->remote_id.s_addr)

unfortunately i do not have the resources to test such a change.

Also, bgp_open_receive() (which uses bgp_collision_detect()) has the
following:

/* Hack part. */
if (CHECK_FLAG (peer->sflags, PEER_STATUS_ACCEPT_PEER))
{
if (ret == 0 && realpeer->status != Active
&& realpeer->status != OpenSent
&& realpeer->status != OpenConfirm)
{
if (BGP_DEBUG (events, EVENTS))
zlog_info ("%s [Event] peer's status is %s close connection",
realpeer->host, LOOKUP (bgp_status_msg, peer->status));
return -1;
}

AFAICT, it is not valid to receive an OPEN message when state is
Active. It is valid to receive a /TCP/ Open event. The FSM should
take care of this.

Upon receiving an Open when in state Active, resources should
released and state should go back to Idle. So, I suspect the
'realpeer->status != Active' should be dropped - also, the log
message says "close connection", but that isnt done, nor is it done
for connections closed in collision detection, we seem to rely on the
remote side closing the connection.

I think part of the problem is possibly the fact bgpd creates a dummy
peer for tcp connection open (ie connections accepted by bgpd). This
means the bgp peer list potentially has /two/ peer structures for a
given peer, the true peer structure (created by bgpd to connect out
to the peer) and the dummy peer (created by inbound connection). I
think this should die and instead we should stick the "peer
information" part of struct peer into a seperate struct
(peer_information?) and add 2 instances of struct peer_information to
struct peer: info_accept and info_local with a final pointer to point
to the authoratitive peer_information struct (info).

This could get rid of the hacky copy of the peer structure in
bgp_open_receive(), instead it can just update the info pointer to
point to whichever of info_accept and info_local is to be used.

Also, shouldnt nearly all of the bgp_ignore's in the FSM state table
actually be bgp_stop? :)

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
USENET would be a better laboratory is there were more labor and less oratory.
-- Elizabeth Haley
Re: [zebra 20304] Re: ANNOUNCE: Quagga Routing Suite release0.96 [ In reply to ]
On 23 Aug 2003, Ruud de Rooij wrote:

> Hello Paul,
>
> In our network, we've seen some rare cases with Zebra 0.93b where
> after a BGP session flap, the session would not come back up.
> Doing a "show ip bgp nei" would show TWO neighbor entries for the
> peer IP. Removing the neighbor from the configuration only removes
> one of them. The only way to get the situation back to normal is
> to kill and restart bgpd. It sounds like this behaviour is the
> result of what you describe above.

yes, it does. In theory the collision detection should close the
'dummy' peer connection down, but obviously collision avoidance is a
bit borked unfortunately.

i think i know pretty much what has to be done to fix it, and i could
put a patch together tomorrow - but i have no means of testing it.

> Regards,
>
> Ruud de Rooij

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
IBM Advanced Systems Group -- a bunch of mindless jerks, who'll be first
against the wall when the revolution comes...
-- with regrets to D. Adams