Mailing List Archive

bgpd dump patches
Hi,

for some time, the RIS project (www.ripe.net/ris) has been running
patched versions of quagga on its route collectors. These (very simple)
patches have been available at http://www.ris.ripe.net/source/ for a few
months but have not yet been merged into quagga.

We are also testing another patch which improves the quality of IPv6
dumps by including the IPv6 nexthop address in "dump bgp routes-mrt"
dumps and by providing the correct IPv6 addresses of peers using IPv6
peerings in "dump bgp updates" dumps.

So we would like to submit all the patches that we use for inclusion in
the main quagga codebase. The patches have been ported to the latest CVS
and have the following effects:

quagga-bgp_dump-rollup.patch
----------------------------

1. Dumps are produced at fixed times during the day and not at fixed
time intervals after bgpd is started.
2. If a dump file can't be opened, bgpd will retry at the next scheduled
time instead of giving up forever.
3. RIB dumps are closed as soon as they are not needed any more.

quagga-dump-v6-nexthop-and-peers.patch
--------------------------------------
1. The IP addresses of peers in "dump bgp updates" dumps are correct: if
it's an IPv4 peering, the peer's IPv4 address is dumped; if it's an IPv6
peering, the IPv6 address is dumped.
2. IPv6 next hop values are included in "dump bgp routes-mrt" dumps.
This is done by constructing an artificial MP_REACH_NLRI attribute with
this information and adding it to each IPv6 prefix dumped.


Regards,
Lorenzo
Re: bgpd dump patches [ In reply to ]
Hi Lorenzo,

On Tue, 14 Oct 2003, Lorenzo Colitti wrote:

> Hi,
>
> for some time, the RIS project (www.ripe.net/ris) has been running
> patched versions of quagga on its route collectors. These (very
> simple) patches have been available at
> http://www.ris.ripe.net/source/ for a few months but have not yet
> been merged into quagga.

Sounds interesting.

> We are also testing another patch which improves the quality of
> IPv6 dumps by including the IPv6 nexthop address in "dump bgp
> routes-mrt" dumps and by providing the correct IPv6 addresses of
> peers using IPv6 peerings in "dump bgp updates" dumps.

Sounds useful, but are there people out there with tools/scripts/etc
who depend on the existing format? And how would they be affected?
AFAICT, it doesnt change AFI_IP dumps.

> So we would like to submit all the patches that we use for inclusion in
> the main quagga codebase. The patches have been ported to the latest CVS
> and have the following effects:
>
> quagga-bgp_dump-rollup.patch
> ----------------------------
>
> 1. Dumps are produced at fixed times during the day and not at fixed
> time intervals after bgpd is started.

sounds like a good idea, one question though - what is interval2 for?
it doesnt appear to be used for anything.

> 2. If a dump file can't be opened, bgpd will retry at the next scheduled
> time instead of giving up forever.
> 3. RIB dumps are closed as soon as they are not needed any more.

good ideas.

I'll accept this patch once i figure what interval2 is for. :)

> quagga-dump-v6-nexthop-and-peers.patch

> --------------------------------------
> 1. The IP addresses of peers in "dump bgp updates" dumps are correct: if
> it's an IPv4 peering, the peer's IPv4 address is dumped; if it's an IPv6
> peering, the IPv6 address is dumped.

as opposed to present where IPv4 is dumped regardless?

> 2. IPv6 next hop values are included in "dump bgp routes-mrt" dumps.
> This is done by constructing an artificial MP_REACH_NLRI attribute with
> this information and adding it to each IPv6 prefix dumped.

sounds reasonable.

aside from wondering whether anyone would presently rely on IPv4
nexthops for IPv6 routes being dumped - eg IPv6 address family
prefixes can be carried over IPv4 peerings surely? - the patch looks
grand.

Though, related to the IPv6 BGP prefixes over IPv4 peerings:

@@ -332,7 +332,7 @@
stream_putw (obuf, peer->as);
stream_putw (obuf, peer->local_as);

- if (peer->afc[AFI_IP][SAFI_UNICAST])
+ if (peer->su.sa.sa_family == AF_INET)
{
stream_putw (obuf, peer->ifindex);
stream_putw (obuf, AFI_IP);
@@ -345,7 +345,7 @@
stream_put (obuf, empty, IPV4_MAX_BYTELEN);
}
#ifdef HAVE_IPV6
- else if (peer->afc[AFI_IP6][SAFI_UNICAST])
+ else if (peer->su.sa.sa_family == AF_INET6)
{
/* Interface Index and Address family. */
stream_putw (obuf, peer->ifindex);

Surely testing for configured address-family, rather than the socket
family, is the more correct behaviour?

> Regards,
> Lorenzo

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
All the evidence concerning the universe has not yet been collected,
so there's still hope.
Re: bgpd dump patches [ In reply to ]
Paul Jakma wrote:
>>We are also testing another patch which improves the quality of
>>IPv6 dumps by including the IPv6 nexthop address in "dump bgp
>>routes-mrt" dumps and by providing the correct IPv6 addresses of
>>peers using IPv6 peerings in "dump bgp updates" dumps.
>
> Sounds useful, but are there people out there with tools/scripts/etc
> who depend on the existing format? And how would they be affected?
> AFAICT, it doesnt change AFI_IP dumps.

That's correct, the patch only adds the extra attribute to AFI_IP6
dumps. We have tested route_btoa on the resulting dumps, since that is
what many people use, and the results are as follows:

If route_btoa is compiled with IPv6 support (e.g. on Solaris), it works
fine. It even picks up the extra info added by our patches.

If route_btoa is compiled without IPv6 support, it has two problems:

1. Sometimes (depends on context) it quits halfway through the
IPv6 portion of RIB dumps with a "MSG_NULL" message. We don't
see this as a big problem, since:quagga dumps all AFI_IP routes
before starting on AFI_IP6, soo route_btoa will already have
decoded all IPv4 routes when this occurs. Also, if route_btoa
is compiled without IPv6 support, it can't decode AFI_IP6
records anyway (it just prints empty records). So you're
actually not losing any information unless you are simply
counting the number of INET6 records.

2. It wrongly decodes all IPv6 updates as NULL messages with
garbage FROM: and TO: IPv4 addresses. Without the patch, it
does marginally better, decoding IPv6 updates as
empty UPDATE messages with no FROM: or TO:.
Again, it can't decode the IPv6 stuff anyway, so the only
way someone can run into problems is if he is counting the
number of updates or if he is parsing the IPv4 addresses
from NULL records, which he probably shouldn't be doing.

So, by and large, if people only look at IPv4 and do a minimum of sanity
checking (e.g. not blindly supposing that everything is an IPv4 update),
everything is fine.

We have a simple patch for route_btoa that fixes these problems. The
patch is only cosmetic though; route_btoa still can't decode IPv6 dumps
unless it is compiled with IPv6 support. it is available at:

http://www.ris.ripe.net/source/mrt-patches/mrt-2.2.2a-v6-wallpaper.patch

These problems seem to be due to the fact that route_btoa takes an "all
or nothing" approach to IPv6 and unknown attributes in general: instead
of skipping over them, it often attempts to decode them, only to abort
in the middle, get out of sync, and bail out. This is probably because
the BGP library that route_btoa uses is written with a routing daemon,
not a decoding tool, in mind.

Probably, purpose-written decoding tools are more robust in the face of
unknown but well-formed input and cleanly skip over the parts they don't
understand.


>>1. Dumps are produced at fixed times during the day and not at fixed
>>time intervals after bgpd is started.
>
> sounds like a good idea, one question though - what is interval2 for?
> it doesnt appear to be used for anything.

Sorry, I made a mistake when forward porting the patch. The one attached
to this message should be correct.


>>quagga-dump-v6-nexthop-and-peers.patch
>
>>--------------------------------------
>>1. The IP addresses of peers in "dump bgp updates" dumps are correct: if
>>it's an IPv4 peering, the peer's IPv4 address is dumped; if it's an IPv6
>>peering, the IPv6 address is dumped.
>
> as opposed to present where IPv4 is dumped regardless?

I don't really understand the code, but it doesn't seem that way. For
example, my test environment consists of an IPv4 peering over which I
exchange IPv6 routes. The updates dumped are wrongly marked as AFI_IP6
and contain garbage IPv6 addresses in the source and destination IP
address fields.

I think that the code seems to be confusing the address family of the
NLRI information exchanged via BGP with the address family of the socket
the peering is run on, which of course may be different.


>>2. IPv6 next hop values are included in "dump bgp routes-mrt" dumps.
>>This is done by constructing an artificial MP_REACH_NLRI attribute with
>>this information and adding it to each IPv6 prefix dumped.
>
>
> sounds reasonable.
>
> aside from wondering whether anyone would presently rely on IPv4
> nexthops for IPv6 routes being dumped - eg IPv6 address family
> prefixes can be carried over IPv4 peerings surely? - the patch looks
> grand.

Yes, any other address family's routes can be carried over an IPv4
peering. But the relevant next hop is the one contained in the MP_NLRI
attribute, and is of the same family as the routes exchanged. In fact,
if an IPv4 peering only exchanges IPv6 routes, that the IPv4 next hop
should not be included all, as stated in RFC 2858:

> An UPDATE message that carries no NLRI, other than the one encoded in
> the MP_REACH_NLRI attribute, should not carry the NEXT_HOP attribute.
> If such a message contains the NEXT_HOP attribute, the BGP speaker
> that receives the message should ignore this attribute.

An IPv4 next hop makes sense if an update contains both IPv4 and IPv6
routes, but here we're only dumping one IPv6 route, so only the IPv6
next hop is needed.

A different matter, of course, is the address of the peer, which may be
an IPv4 address even for an IPv6 prefix. But we don't store the IP
address of the peer in RIB dumps anyway, not even for IPv4.


> - else if (peer->afc[AFI_IP6][SAFI_UNICAST])
> + else if (peer->su.sa.sa_family == AF_INET6)
> {
> /* Interface Index and Address family. */
> stream_putw (obuf, peer->ifindex);
>
> Surely testing for configured address-family, rather than the socket
> family, is the more correct behaviour?

Hmmm... but from what I see in the code, peer->su _is_ determined
directly from the configuration. peer_remote_as_vty() calls
str2sockunion() on the neighbor address string, and obtains su. Then it
passes su to peer_remote_as(), which passes it to peer_create, which
actually creates the peer and stuffs su into peer->su. The peer's
configured address family doesn't seem to be stored anywhere else. Am I
missing something?



Regards,
Lorenzo
Re: bgpd dump patches [ In reply to ]
On Thu, 16 Oct 2003, Lorenzo Colitti wrote:

> > Sounds useful, but are there people out there with tools/scripts/etc
> > who depend on the existing format? And how would they be affected?
> > AFAICT, it doesnt change AFI_IP dumps.
>
> That's correct, the patch only adds the extra attribute to AFI_IP6
> dumps. We have tested route_btoa on the resulting dumps, since that
> is what many people use, and the results are as follows:
>
> If route_btoa is compiled with IPv6 support (e.g. on Solaris), it works
> fine. It even picks up the extra info added by our patches.
>
> If route_btoa is compiled without IPv6 support, it has two problems:
>
> 1. Sometimes (depends on context) it quits halfway through the
> IPv6 portion of RIB dumps with a "MSG_NULL" message. We don't
> see this as a big problem, since:quagga dumps all AFI_IP routes
> before starting on AFI_IP6, soo route_btoa will already have
> decoded all IPv4 routes when this occurs. Also, if route_btoa
> is compiled without IPv6 support, it can't decode AFI_IP6
> records anyway (it just prints empty records). So you're
> actually not losing any information unless you are simply
> counting the number of INET6 records.
>
> 2. It wrongly decodes all IPv6 updates as NULL messages with
> garbage FROM: and TO: IPv4 addresses. Without the patch, it
> does marginally better, decoding IPv6 updates as
> empty UPDATE messages with no FROM: or TO:.
> Again, it can't decode the IPv6 stuff anyway, so the only
> way someone can run into problems is if he is counting the
> number of updates or if he is parsing the IPv4 addresses
> from NULL records, which he probably shouldn't be doing.
>
> So, by and large, if people only look at IPv4 and do a minimum of
> sanity checking (e.g. not blindly supposing that everything is an
> IPv4 update), everything is fine.

Fair enough. I suspect anyone who's vaguely interested in IPv6 and
actually has IPv6 address-family peerings wouldnt object to compiling
IPv6 support into mrt. Anyone who deliberately does not compile in
IPv6 support is unlikely to have IPv6 address-family enabled on
peerings.


> These problems seem to be due to the fact that route_btoa takes an
> "all or nothing" approach to IPv6 and unknown attributes in
> general: instead of skipping over them, it often attempts to decode
> them, only to abort in the middle, get out of sync, and bail out.
> This is probably because the BGP library that route_btoa uses is
> written with a routing daemon, not a decoding tool, in mind.

maybe, though even a routing daemon must be able to deal with bgp
attributes it doesnt recognise (transitive at least, origin ->
incomplete).

> Sorry, I made a mistake when forward porting the patch. The one attached
> to this message should be correct.

thank you :)

> I don't really understand the code, but it doesn't seem that way.
> For example, my test environment consists of an IPv4 peering over
> which I exchange IPv6 routes. The updates dumped are wrongly marked
> as AFI_IP6 and contain garbage IPv6 addresses in the source and
> destination IP address fields.

> I think that the code seems to be confusing the address family of
> the NLRI information exchanged via BGP with the address family of
> the socket the peering is run on, which of course may be different.

thats basically the case i was wondering about. but its wrong at
present anyway, you've noted.

> An IPv4 next hop makes sense if an update contains both IPv4 and
> IPv6 routes, but here we're only dumping one IPv6 route, so only
> the IPv6 next hop is needed.

ok, makes sense now.

> A different matter, of course, is the address of the peer, which
> may be an IPv4 address even for an IPv6 prefix. But we don't store
> the IP address of the peer in RIB dumps anyway, not even for IPv4.

good.

> > - else if (peer->afc[AFI_IP6][SAFI_UNICAST])
> > + else if (peer->su.sa.sa_family == AF_INET6)
> > {
> > /* Interface Index and Address family. */
> > stream_putw (obuf, peer->ifindex);
> >
> > Surely testing for configured address-family, rather than the socket
> > family, is the more correct behaviour?
>
> Hmmm... but from what I see in the code, peer->su _is_ determined
> directly from the configuration. peer_remote_as_vty() calls
> str2sockunion() on the neighbor address string, and obtains su.
> Then it passes su to peer_remote_as(), which passes it to
> peer_create, which actually creates the peer and stuffs su into
> peer->su. The peer's configured address family doesn't seem to be
> stored anywhere else. Am I missing something?

ah.. I just read the patch and didnt have enough context - i assumed
these changes were for something like deciding which address family
of updates to dump. but its for dumping peer information. your patch
make sense - you want to dump the information of the actual peering
connection.

thanks for your patches!

> Regards,
> Lorenzo

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
There are new messages.
Re: bgpd dump patches [ In reply to ]
Paul Jakma wrote:
> Fair enough. I suspect anyone who's vaguely interested in IPv6 and
> actually has IPv6 address-family peerings wouldnt object to compiling
> IPv6 support into mrt. Anyone who deliberately does not compile in
> IPv6 support is unlikely to have IPv6 address-family enabled on
> peerings.

Actually, it may not be a matter of user choice. I have never been able
to compile route_btoa with IPv6 support on any Linux machine. But then,
neither have I have ever been able to compile route_btoa at all on
FreeBSD. So it might be my fault. :)

>>This is probably because the BGP library that route_btoa uses is
>>written with a routing daemon, not a decoding tool, in mind.
>
> maybe, though even a routing daemon must be able to deal with bgp
> attributes it doesnt recognise (transitive at least, origin ->
> incomplete).

Yes, indeed. I wonder if anyone has ever *really* used mrt as a routing
daemon... :)

> thanks for your patches!

Don't mention it. Thanks for your review and for applying them, and let
me know if something breaks... :)


Regards,
Lorenzo
Re: bgpd dump patches [ In reply to ]
On Tue, 21 Oct 2003, Lorenzo Colitti wrote:

> Actually, it may not be a matter of user choice. I have never been
> able to compile route_btoa with IPv6 support on any Linux machine.
> But then, neither have I have ever been able to compile route_btoa
> at all on FreeBSD. So it might be my fault. :)

Nah, i have a vague memory of trying to get mrt working myself, and
running into compile issues.

> Yes, indeed. I wonder if anyone has ever *really* used mrt as a
> routing daemon... :)

Have a look through the 6bone archives - iirc mrt was the cause of
many problems, particularly ghost routes.

> Don't mention it. Thanks for your review and for applying them, and
> let me know if something breaks... :)

Will do :)

> Regards,
> Lorenzo

Thanks!

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
It is impossible to defend perfectly against the attack of those who want
to die.