Mailing List Archive

ripd status
Hello,

What about the other ripd-related patches? I have been working with the ripd
code for some time now - my patch is about 80% ready. It is quite a big
modification, but ripd is now able to keep more than just one route and
switch between them when necessary. Some parts of the code were totally
rewritten, so I expect that something may be broken. It would be nice to get
ripd in quagga working before my big changes. For most ripd bugs we have fixes
on this list; can they be committed into cvs before quagga 0.96.5, Paul?

There is one bug left, but I hope Sowmini will fix it soon (see
[quagga-dev 490] Re: Problems with ripd in quagga-0.96.4).

Best regards,

Krzysztof Olędzki
Re: ripd status
On Thu, 25 Dec 2003, Krzysztof Oledzki wrote:

> Hello,
>
> What about the other ripd-related patches? I have been working with
> the ripd code for some time now - my patch is about 80% ready. It is
> quite a big modification, but ripd is now able to keep more than just
> one route and switch between them when necessary. Some parts of the
> code were totally rewritten, so I expect that something may be broken.
> It would be nice to get ripd in quagga working before my big changes.
> For most ripd bugs we have fixes on this list; can they be committed
> into cvs before quagga 0.96.5, Paul?

The DISTANCE patch is the main outstanding patch, correct? It looks
sane - {i,we}'ll test it out early in the new year and incorporate it.
Ditto for further patches.

> There is one bug left, but I hope Sowmini will fix it soon (see
> [quagga-dev 490] Re: Problems with ripd in quagga-0.96.4).

Right yes.

> Best regards,
>
> Krzysztof Olędzki

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
"It runs like _x, where _x is something unsavory"
-- Prof. Romas Aleliunas, CS 435
Re: ripd status
>
> There is one bug left, but I hope Sowmini will fix it soon (see
> [quagga-dev 490] Re: Problems with ripd in quagga-0.96.4).
>

Thanks for the reminder. I'd forgotten to follow up on this one.
As it turns out, I see that [quagga-dev 428] has not been committed.

As far as I understood, the patch in [quagga-dev 428] reverted
the code to where it used to be in, say, 0.96.2.
I'll have to reproduce your problem to fix it- were you using ripv1?
Was this working with quagga 0.96.2?

--Sowmini
Re: ripd status
> From sowmini@quasimodo.East.Sun.COM Thu Dec 25 20:40:27 2003
>
> >
> > There is one bug left, but I hope Sowmini will fix it soon (see
> > [quagga-dev 490] Re: Problems with ripd in quagga-0.96.4).
> >
>

Found out what the problem was: I was always closing send_sock,
whereas I should only have closed it in the (!to) case. Here's the new patch,
which worked for me on linux with ripv1. Can you please try it and
confirm that it works, so that Paul can commit it?

--Sowmini


===================================================================
RCS file: ripd/rip_interface.c,v
retrieving revision 1.13
diff -uwb -r1.13 ripd/rip_interface.c
--- ripd/rip_interface.c 2003/10/15 23:20:17 1.13
+++ ripd/rip_interface.c 2003/11/07 16:07:38
@@ -146,13 +146,18 @@
struct in_addr addr;
struct prefix_ipv4 *p;

+ if (connected != NULL)
+ {
if (if_pointopoint)
p = (struct prefix_ipv4 *) connected->destination;
else
p = (struct prefix_ipv4 *) connected->address;
-
addr = p->prefix;
-
+ }
+ else
+ {
+ addr.s_addr = INADDR_ANY;
+ }

if (setsockopt_multicast_ipv4 (sock, IP_MULTICAST_IF,
addr, 0, 0) < 0)
@@ -173,7 +178,10 @@

/* Address shoud be any address. */
from.sin_family = AF_INET;
+ if (connected)
addr = ((struct prefix_ipv4 *) connected->address)->prefix;
+ else
+ addr.s_addr = INADDR_ANY;
from.sin_addr = addr;
#ifdef HAVE_SIN_LEN
from.sin_len = sizeof (struct sockaddr_in);
@@ -182,7 +190,6 @@
if (ripd_privs.change (ZPRIVS_RAISE))
zlog_err ("rip_interface_multicast_set: could not raise privs");

- bind (sock, NULL, 0); /* unbind any previous association */
ret = bind (sock, (struct sockaddr *) & from, sizeof (struct sockaddr_in));
if (ret < 0)
{
===================================================================
RCS file: ripd/ripd.c,v
retrieving revision 1.11
diff -uwb -r1.11 ripd/ripd.c
--- ripd/ripd.c 2003/10/15 23:20:17 1.11
+++ ripd/ripd.c 2004/01/08 15:46:45
@@ -42,6 +42,11 @@
#include "ripd/ripd.h"
#include "ripd/rip_debug.h"

+/*
+ * The source address to be used when sending the packet
+ */
+struct connected *source_address;
+
extern struct zebra_privs_t ripd_privs;

/* RIP Structure. */
@@ -1237,7 +1242,7 @@
rip_send_packet (caddr_t buf, int size, struct sockaddr_in *to,
struct interface *ifp)
{
- int ret;
+ int ret, send_sock;
struct sockaddr_in sin;

/* Make destination address. */
@@ -1252,6 +1257,7 @@
{
sin.sin_port = to->sin_port;
sin.sin_addr = to->sin_addr;
+ send_sock = rip->sock;
}
else
{
@@ -1259,11 +1265,32 @@
sin.sin_port = htons (RIP_PORT_DEFAULT);
sin.sin_addr.s_addr = htonl (INADDR_RIP_GROUP);

- /* caller has set multicast interface */
+ /*
+ * we have to open a new socket for each packet because
+ * this is the most portable way to bind to a different
+ * source ipv4 address.
+ */
+ send_sock = socket(AF_INET, SOCK_DGRAM, 0);
+ if (send_sock < 0)
+ {
+ zlog_warn("could not create socket %s", strerror(errno));
+ return -1;
+ }
+
+ sockopt_broadcast (send_sock);
+ sockopt_reuseaddr (send_sock);
+ sockopt_reuseport (send_sock);
+#ifdef RIP_RECVMSG
+ setsockopt_pktinfo (send_sock);
+#endif /* RIP_RECVMSG */
+ rip_interface_multicast_set(send_sock, source_address,
+ if_is_pointopoint(ifp));
+ /* reset source address */
+ source_address = NULL;

}

- ret = sendto (rip->sock, buf, size, 0, (struct sockaddr *)&sin,
+ ret = sendto (send_sock, buf, size, 0, (struct sockaddr *)&sin,
sizeof (struct sockaddr_in));

if (IS_RIP_DEBUG_EVENT)
@@ -1273,6 +1300,8 @@
if (ret < 0)
zlog_warn ("can't send packet : %s", strerror (errno));

+ if (!to)
+ close(send_sock);
return ret;
}

@@ -1839,7 +1868,7 @@
return len;
}

-/* Make socket for RIP protocol. */
+/* Make socket for RIP protocol and bind it to the inaddr. */
int
rip_create_socket ()
{
@@ -1848,7 +1877,6 @@
struct sockaddr_in addr;
struct servent *sp;

- memset (&addr, 0, sizeof (struct sockaddr_in));

/* Set RIP port. */
sp = getservbyname ("router", "udp");
@@ -2358,8 +2386,7 @@
if (ifaddr->family != AF_INET)
continue;

- rip_interface_multicast_set(rip->sock, connected,
- if_is_pointopoint(ifp));
+ source_address = connected;
if (vsend & RIPv1)
rip_update_interface (ifp, RIPv1, route_type, ifaddr);
if (vsend & RIPv2)
@@ -2588,8 +2615,7 @@
if (p->family != AF_INET)
continue;

- rip_interface_multicast_set(rip->sock, connected,
- if_is_pointopoint(ifp));
+ source_address = connected;
if (rip_send_packet ((caddr_t) &rip_packet, sizeof (rip_packet),
to, ifp) != sizeof (rip_packet))
return -1;
Re: ripd status
Could you summarize the original problem that led to this proposed
change? That would perhaps be good to include in a ChangeLog entry.

+struct connected *source_address;

This is a global, and it seems to get set in rip_send_packet. I don't
follow this.

How do the sockets get cleaned up when interfaces are removed?

- bind (sock, NULL, 0); /* unbind any previous association */

I know this is a deleted line, but this raises the question of the
portability of 'unbind' - is that failing what led to this?

(At first glance, the patch also looks like it isn't whitespace-clean,
too.)

--
Greg Troxel <gdt@ir.bbn.com>
Re: ripd status
On Fri, 9 Jan 2004, Greg Troxel wrote:

> +struct connected *source_address;
>
> This is a global, and it seems to get set in rip_send_packet. I don't
> follow this.

I think this ties in with the potential cleanup that could be done by
collecting various bits of interface/subnet related information ripd
maintains and making use of the ->info field in struct interface as a
place to store ripd specific interface information, as discussed
previously.

i think sowmini's intention was to do that cleanup at some stage when
she had the time.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Re: ripd status
> From gdt@ir.bbn.com Fri Jan 9 10:32:09 2004
>
> Could you summarize the original problem that led to this proposed
> change? That would perhaps be good to include in a ChangeLog entry.
>
> +struct connected *source_address;
>
> This is a global, and it seems to get set in rip_send_packet. I don't
> follow this.

It's a long story- the thread to look for has the subject
"Re: Problems with ripd in quagga-0.96.4" in quagga-dev

The basic issue is: the ripv2 spec requires that ripd send out
a rip response/request packet for each connected network. For mcast
packets sent out on an interface that is on multiple networks,
this is done by setting the source address appropriately. The function
that knows the source address (the one that packages the rip payload,
and knows how to do split horizon, poison reverse etc.) is several
levels higher in the stack than the actual sending function. Rather
than tow the source address through all those functions, I'd
originally tried to do it by setting the source address on rip->sock
but that proved to be non-portable.
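
As a minimal sketch of the open/bind/send/close approach described above: the helper name `send_from_source` is hypothetical (it is not a ripd function), and the sketch binds to an ephemeral port rather than the RIP port. It only illustrates how the source address of a UDP datagram can be chosen by binding a temporary socket before sending.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of ripd: send one UDP datagram from an
 * explicitly chosen source address.  'src' may be NULL, in which case the
 * socket is bound to INADDR_ANY and the kernel picks the source address. */
ssize_t
send_from_source (const struct in_addr *src, const struct sockaddr_in *to,
                  const void *buf, size_t len)
{
  struct sockaddr_in from;
  ssize_t ret;
  int sock;

  assert (to != NULL && buf != NULL);

  /* Temporary socket, opened for this packet only. */
  sock = socket (AF_INET, SOCK_DGRAM, 0);
  if (sock < 0)
    return -1;

  memset (&from, 0, sizeof from);
  from.sin_family = AF_INET;
  from.sin_port = 0;            /* assumption: ephemeral port for the sketch */
  from.sin_addr.s_addr = src ? src->s_addr : htonl (INADDR_ANY);

  /* Binding fixes the source address of everything sent on this socket. */
  if (bind (sock, (struct sockaddr *) &from, sizeof from) < 0)
    {
      close (sock);
      return -1;
    }

  ret = sendto (sock, buf, len, 0, (const struct sockaddr *) to, sizeof *to);
  close (sock);                 /* the temporary socket is always closed */
  return ret;
}
```

Passing the source address as a parameter keeps the choice visible at the call site; callers with no meaningful source address can pass NULL.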

>
> How do the sockets get cleaned up when interfaces are removed?
>
> - bind (sock, NULL, 0); /* unbind any previous association */
>
> I know this is a deleted line, but this raises the question of the
> portability of 'unbind' - is that failing what led to this?

It wasn't portable. See [quagga-dev 427].

The only portable solution was to revert back to the original
clumsy way of open/closing a socket for each packet sent.

> (At first glance, the patch also looks like it isn't whitespace-clean,
> too.)

Unfortunately, the patch in [quagga-dev 428] did not get committed,
and I was trying to recreate from an old source-base that I had lying
around. If Kryzstof can confirm that the patch fixes all of his problems,
I can try to generate a cleaner patch.

--Sowmini
Re: ripd status
> The only portable solution was to revert back to the original
> clumsy way of open/closing a socket for each packet sent.

This should be in the ChangeLog entry.

> The basic issue is: the ripv2 spec requires that ripd send out
> a rip response/request packet for each connected network. For mcast
> packets sent out on an interface that is on multiple networks,
> this is done by setting the source address appropriately. The function
> that knows the source address (the one that packages the rip payload,
> and knows how to do split horizon, poison reverse etc.) is several
> levels higher in the stack than the actual sending function. Rather
> than tow the source address through all those functions, I'd
> originally tried to do it by setting the source address on rip->sock
> but that proved to be non-portable.

This should be explained in comments in the code. It sounds like the
situation that is troublesome is an interface with multiple prefixes
configured on it?

Using a global to pass an address in several levels really seems
unclean. If the source address really is needed by the sending
function (which it sounds like), I would really prefer a global-free
solution. I know quagga is not thread-safe, but global usage like
this is hard to understand (especially without a big comment
explaining the rules) and I'm afraid it is likely to lead to trouble
with future maintenance.
Changing a few signatures to carry a source address doesn't sound like
that bad a change.

> The only portable solution was to revert back to the original
> clumsy way of open/closing a socket for each packet sent.

One could keep the sockets in the interface structure rather than
open/closing.

How do you ensure that these sockets don't receive rip traffic that
might then get dropped when they are closed?
(Please add a comment rather than answering here :-)

> > (At first glance, the patch also looks like it isn't whitespace-clean,
> > too.)
>
> Unfortunately, the patch in [quagga-dev 428] did not get committed,
> and I was trying to recreate from an old source-base that I had lying
> around. If Kryzstof can confirm that the patch fixes all of his problems,
> I can try to generate a cleaner patch.

OK - patches submitted for application really need to be fully clean
and against the head of CVS. Please also include a ChangeLog entry --
see HACKING at top level for my take on patch submission guidelines.
While there isn't yet established consensus about these, no negative
comments have been received either.
Re: ripd status
>
> Using a global to pass an address in several levels really seems
> unclean.

Yes, I know. See Paul's comments about keeping things in ifp->info.
I figured that the global was no worse than doing the unbind()
(which stashes the source address info in the socket internals).

> Changing a few signatures to carry a source address doesn't sound like
> that bad a change.

this gets complicated because the functions are called from several
places, only a few of which have a meaningful source address to send.

> One could keep the sockets in the interface structure rather than
> open/closing.

Yes, see Paul's comments.

> How do you ensure that these sockets don't receive rip traffic that
> might then get dropped when they are closed?
> (Please add a comment rather than answering here :-)

e.g.,
/*
* question for kunihiro: How do you ensure that these
* sockets don't receive rip traffic that might then get dropped
* when they are closed?
*/

no, seriously, that's how the code originally was written. There's
a separate socket (rip->sock) that picks up incoming traffic.

--Sowmini
Re: ripd status
> > Changing a few signatures to carry a source address doesn't sound like
> > that bad a change.
>
> this gets complicated because the functions are called from several
> places, only a few of which have a meaningful source address to send.

Then NULL can be passed. On a quick glance, it looks like only
rip_send_packet needs to get an additional argument; the 'connected'
structure or its addr seems to be passed down the others.

> e.g.,
> /*
> * question for kunihiro: How do you ensure that these
> * sockets don't receive rip traffic that might then get dropped
> * when they are closed?
> */
>
> no, seriously, that's how the code originally was written. There's
> a separate socket (rip->sock) that picks up incoming traffic.

OK, but that calls for either a comment when the socket is created
that explains why it won't get traffic, or

/* XXX This socket might receive rip traffic which we will then drop. */
Re: ripd status
On Fri, 9 Jan 2004 sowmini.varadhan@sun.com wrote:

> The only portable solution was to revert back to the original
> clumsy way of open/closing a socket for each packet sent.

at which point we might as well permanently open a socket for each
subnet (for lifetime of being active on that subnet) - if it were to
simplify things. (your original argument was socket/subnet might have
scalability problems, but surely socket/message would be worse?)
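
A minimal sketch of the socket-per-subnet idea, for comparison with opening a socket per message. All names here (`src_sock_table`, `src_sock_get`, `src_sock_drop`) are hypothetical and not part of ripd; the fixed-size table and the ephemeral port are assumptions made to keep the example short. A real version would presumably hang this state off the interface structure.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define MAX_SRC_SOCKS 8         /* arbitrary limit for the sketch */

struct src_sock
{
  struct in_addr addr;          /* source address the socket is bound to */
  int fd;                       /* -1 when the slot is unused */
};

struct src_sock_table
{
  struct src_sock slot[MAX_SRC_SOCKS];
};

void
src_sock_table_init (struct src_sock_table *t)
{
  int i;

  assert (t != NULL);
  for (i = 0; i < MAX_SRC_SOCKS; i++)
    t->slot[i].fd = -1;
}

/* Return a socket bound to 'addr', opening it on first use only. */
int
src_sock_get (struct src_sock_table *t, struct in_addr addr)
{
  struct sockaddr_in from;
  int i, fd, free_slot = -1;

  assert (t != NULL);
  for (i = 0; i < MAX_SRC_SOCKS; i++)
    {
      if (t->slot[i].fd >= 0 && t->slot[i].addr.s_addr == addr.s_addr)
        return t->slot[i].fd;   /* reuse the existing socket */
      if (t->slot[i].fd < 0 && free_slot < 0)
        free_slot = i;
    }
  if (free_slot < 0)
    return -1;                  /* table full */

  fd = socket (AF_INET, SOCK_DGRAM, 0);
  if (fd < 0)
    return -1;

  memset (&from, 0, sizeof from);
  from.sin_family = AF_INET;
  from.sin_addr = addr;         /* port 0: ephemeral, an assumption here */
  if (bind (fd, (struct sockaddr *) &from, sizeof from) < 0)
    {
      close (fd);
      return -1;
    }

  t->slot[free_slot].addr = addr;
  t->slot[free_slot].fd = fd;
  return fd;
}

/* Close the socket for 'addr', e.g. when the address is removed. */
void
src_sock_drop (struct src_sock_table *t, struct in_addr addr)
{
  int i;

  assert (t != NULL);
  for (i = 0; i < MAX_SRC_SOCKS; i++)
    if (t->slot[i].fd >= 0 && t->slot[i].addr.s_addr == addr.s_addr)
      {
        close (t->slot[i].fd);
        t->slot[i].fd = -1;
      }
}
```

The cost is one descriptor per active source address, and the drop path has to be called when an address goes away, which is the cleanup question raised earlier in the thread.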

> Unfortunately, the patch in [quagga-dev 428] did not get committed,

Ah, that can be done. It's just that it wasn't very clear at that stage what
was what. You could have included this patch in your own patch and
submitted it too.

> --Sowmini

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Re: ripd status
> > The only portable solution was to revert back to the original
> > clumsy way of open/closing a socket for each packet sent.
>
> at which point we might as well permanently open a socket for each
> subnet (for lifetime of being active on that subnet) - if it were to
> simplify things. (your original argument was socket/subnet might have
> scalability problems, but surely socket/message would be worse?)

true.

> > Unfortunately, the patch in [quagga-dev 428] did not get committed,
>
> Ah, that can be done. It's just that it wasn't very clear at that stage what
> was what. You could have included this patch in your own patch and
> submitted it too.

Let's wait for Kryzstof to confirm that this patch solves all the
technical problems and then I'll submit one big patch which addresses
the comment/cosmetic/software-engineering issues.

Kryzstof?

--Sowmini
Re: ripd status
On Fri, 9 Jan 2004 sowmini.varadhan@sun.com wrote:

> Let's wait for Kryzstof to confirm that this patch solves all the
> technical problems and then I'll submit one big patch which addresses
> the comment/cosmetic/software-engineering issues.
>
> Kryzstof?

Krzysztof ;-)


Eh :) Tested.


My small testing environment:

eth0 interface with 1 address: 192.168.0.33/24
eth3 interface with 3 addresses: 192.168.200.10/24, 192.168.200.11/24, 192.168.200.12/24

-- ripd.conf begin --
router rip
network eth3

timers basic 10 45 120

redistribute connected
-- ripd.conf end --


1. RIPv2 Multicast still works:

20:07:40.416018 192.168.200.10.520 > 224.0.0.9.520: RIPv2-req 24 (DF) [ttl 1]
20:07:40.418367 192.168.200.11.520 > 224.0.0.9.520: RIPv2-req 24 (DF) [ttl 1]
20:07:40.420541 192.168.200.12.520 > 224.0.0.9.520: RIPv2-req 24 (DF) [ttl 1]

20:07:41.415871 192.168.200.10.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]
20:07:41.418270 192.168.200.11.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]
20:07:41.420580 192.168.200.12.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]

20:07:54.416156 192.168.200.10.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]
20:07:54.418620 192.168.200.11.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]
20:07:54.420922 192.168.200.12.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]

20:08:03.426351 192.168.200.10.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]
20:08:03.428804 192.168.200.11.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]
20:08:03.431101 192.168.200.12.520 > 224.0.0.9.520: RIPv2-resp [items 1]: {192.168.0.0/255.255.255.0}(1) (DF) [ttl 1]


2. RIPv1 Unicast does not work properly:

20:09:38.056789 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.057499 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.058045 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.058483 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.058835 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.059392 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.059819 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.060164 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:09:38.060540 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)

20:09:38.061290 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.062060 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.062705 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.063416 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.064038 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.064838 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.065504 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.066237 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:38.066909 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)

20:09:43.056739 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.057471 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.058197 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.058828 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.059538 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.060163 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.061029 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.061646 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:43.062284 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)


20:09:56.066942 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.067626 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.068353 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.069224 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.069833 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.070450 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.071164 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.071791 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:09:56.072643 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)


Please notice that there are 9 (3^2 probably) instead of 3 requests and 9
(3^2 probably - again) instead of 3 announcements. All packets come from the
primary IP address (192.168.200.10).

With 2 addresses on eth3 (192.168.200.10/24, 192.168.200.11/24) there are
4 (2^2 probably) requests and 4 announcements:

20:11:57.019394 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:11:57.020135 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:11:57.020702 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)
20:11:57.021083 192.168.200.10.520 > 192.168.200.255.520: RIPv1-req 24 (DF)


20:11:58.019318 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:11:58.020013 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:11:58.020751 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)
20:11:58.021452 192.168.200.10.520 > 192.168.200.255.520: RIPv1-resp [items 1]: {192.168.0.0}(1) (DF)



Best regards,


Krzysztof Olędzki
Re: ripd status
> > Ok.. I don't know too much about linux administration, so I used
> > the sequence of commands that seemed most obvious to me.
>
> But this creates new interfaces, each one with one IP address. IMHO this
> is not the same like one interface with many addresses. So, test results
> may vary.

[Also cc-ing the list, to get some other thoughts about this]

Ok, I'll have to test both cases. But as for the problem itself,
I spent some time thinking about it, and talking to colleagues, and
it appears that the right thing to do, when you have multiple addresses
configured on the same subnet, is to send out exactly *one* request/response
using *one* source address for the message (else the listeners'
routing tables are going to balloon up).

To do this, ripd must probably maintain some sort of "DUP ADDR" in
struct connected, and this makes the change more complex.

Stay tuned.. I'll work on the fix for us both to test.

--Sowmini
Re: ripd status
> Ok, I'll have to test both cases. But as for the problem itself,
> I spent some time thinking about it, and talking to colleagues, and
> it appears that the right thing to do, when you have multiple addresses
> configured on the same subnet, is to send out exactly *one* request/response
> using *one* source address for the message (else the listeners'
> routing tables are going to balloon up).

I'll toss out that when I first heard about the situation it seemed
like ripd trying to cope with a perhaps broken world. I find the word
subnet confusing above. If one uses the IPv6 term link to refer to a
medium that can exchange packets, then I think you are talking about
the situation of multiple IPv4 prefixes configured on the same link.

It would seem a bit odd in this circumstance to send multiple rip
announcements, but on the other hand I'd expect rip to require that
only packets from an address falling within a configured prefix be
accepted. So if there are some routers with prefix A and some with
prefix B, and one with A and B (call it router Z), then perhaps Z
should be sending two announcements, and the other ones will route
through Z.

Still, this seems like a very oddly configured network. But odd does
not necessarily equal "broken, we don't support".


In any case, I think it is important to keep the concept of an
interface (a device which can send and receive packets on a link)
separate from a configured prefix.
So if one had two prefixes on one link, there would be one interface
structure which had two prefixes associated with it in the data
structures.

--
Greg Troxel <gdt@ir.bbn.com>
Re: ripd status
> like ripd trying to cope with a perhaps broken world. I find the word
> subnet confusing above.

Confusing, but perfectly valid, and not broken at all.
I can have one interface with addresses:
10.0.0.1/24, 10.0.0.2/24, 10.0.0.3/24, for all sorts of reasons e.g.,
failover when one address fails, use a specific address for DNS etc.

> If one uses the IPv6 term link to refer to a
> medium that can exchange packets, then I think you are talking about
> the situation of multiple IPv4 prefixes configured on the same link.

no, I'm talking about multiple *addresses* for one interface on
the same link.

Note that this is also different from the linux concept of "secondary"
address, or bsd's "alias" address, where the secondary/alias may or
may not be on the same prefix/subnet as the "primary" address.

For example, consider a node like this (with 3 physical interfaces,
A, B, C, but the problem wrt ripd is the same)


------------
|          |
|  A  B  C |      listening router R
------------              |
   |  |  |                |
   |  |  |                |
------------------------------- N


A is the "primary" address in that it is used as source addr for
outgoing packets, with the intention that source address should fail over to
the address configured on B if the interface A fails (marked deprecated).

The way ripd is currently designed (I'm not sure what the other daemons
do) it will send out 3 packets, with source A, B, C. And R will think
that there are 3 routers, A, B, C in the network, its routing
tables will quickly balloon up, and it will be doing all sorts of
gymnastics timing out and managing this ballooned up routing table.

One possibility is for zebra to notice that B and C are really duplicate
addresses on the network N, and to flag them as ZEBRA_IFA_DUP
(leaving ZEBRA_IFA_SECONDARY as designed). This would require some changes
to functions like connected_up_ipv4 (which I'm investigating currently)-
they will have to call prefix_match instead of prefix_same, and do some
work to identify the difference between an exact match and a prefix match
(so that addresses don't get configured twice etc.)

This solution is non-trivial, because the daemons now have to recognize
when to fail-over, when to promote something from DUP to primary etc.
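
To make the exact-match versus prefix-match distinction concrete, here is a small standalone sketch. The names `classify_addr`, `netmask_of` and the `V4` macro are hypothetical helpers written for this example; they are not the library's prefix_same or prefix_match, and the sketch only handles IPv4 addresses with equal prefix lengths.

```c
#include <assert.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from four octets. */
#define V4(a, b, c, d) \
  (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
   ((uint32_t) (c) << 8) | (uint32_t) (d))

enum addr_relation
{
  ADDR_UNRELATED,       /* different networks: both are real connections */
  ADDR_SAME_NETWORK,    /* different addresses on one network: a duplicate */
  ADDR_IDENTICAL        /* the very same address configured twice */
};

/* Netmask for a prefix length of 0..32, in host byte order. */
static uint32_t
netmask_of (unsigned plen)
{
  assert (plen <= 32);
  return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Compare two configured addresses (host byte order). */
enum addr_relation
classify_addr (uint32_t a, unsigned alen, uint32_t b, unsigned blen)
{
  if (alen != blen)
    return ADDR_UNRELATED;      /* simplification made for this sketch */
  if (a == b)
    return ADDR_IDENTICAL;      /* exact match */
  if (((a ^ b) & netmask_of (alen)) == 0)
    return ADDR_SAME_NETWORK;   /* network bits match, host bits differ */
  return ADDR_UNRELATED;
}
```

With the addresses from the example above, 10.0.0.1/24 and 10.0.0.2/24 classify as ADDR_SAME_NETWORK, which is the case a DUP-style flag would mark.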

--Sowmini
Re: ripd status
> no, I'm talking about multiple *addresses* for one interface on
> the same link.

Right, so you have multiple subnets on the same link in this case,
since subnet is a prefix-based concept, not a link-based concept.
This is what I meant was potentially confusing.

> Note that this is also different from the linux concept of "secondary"
> address, or bsd's "alias" address, where the secondary/alias may or
> may not be on the same prefix/subnet as the "primary" address.

Good points.

> The way ripd is currently designed (I'm not sure what the other daemons
> do) it will send out 3 packets, with source A, B, C. And R will think
> that there are 3 routers, A, B, C in the network, its routing
> tables will quickly balloon up, and it will be doing all sorts of
> gymnastics timing out and managing this ballooned up routing table.

Before talking about how to change the code, we really need to have a
very clear understanding of what the specs say and what the correct
behavior is.

> One possibility is for zebra to notice that B and C are really duplicate
> addresses on the network N, and to flag them as ZEBRA_IFA_DUP
> (leaving ZEBRA_IFA_SECONDARY as designed). This would require some changes

Perhaps, but we really need to understand the usages and meanings of
those flags, and fix up the docs for them. Right now things are a bit
underdocumented, but do work for the most part.
Re: ripd status
>
> The way ripd is currently designed (I'm not sure what the other daemons
> do) it will send out 3 packets, with source A, B, C. And R will think
> that there are 3 routers, A, B, C in the network, its routing
> tables will quickly balloon up, and it will be doing all sorts of
> gymnastics timing out and managing this ballooned up routing table.
>
> Before talking about how to change the code, we really need to have a
> very clear understanding of what the specs say and what the correct
> behavior is.
>

My understanding of the RIP specs (ipv4 and ipv6) is that you should
send out one packet on each connected network, i.e., on each connected
prefix/subnet. So, in the case discussed earlier (both the A/B/C case
and the one where there's one interface with addresses
{10.0.0.1/24, 10.0.0.2/24, 10.0.0.3/24}), you have *one* connection
to the 10.0.0.0/24 network and you should send out *one* packet on this
network.
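
A minimal sketch of the "one packet per connected prefix" rule, under this reading of the spec. The types and names (`struct ifaddr4`, `pick_one_source_per_prefix`, `IP4`) are hypothetical and written for this example only; which address should represent a prefix is a policy choice, and the sketch simply takes the first one listed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from four octets. */
#define IP4(a, b, c, d) \
  (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
   ((uint32_t) (c) << 8) | (uint32_t) (d))

struct ifaddr4
{
  uint32_t addr;        /* configured address, host byte order */
  unsigned plen;        /* prefix length, 0..32 */
};

static uint32_t
mask4 (unsigned plen)
{
  return plen == 0 ? 0 : 0xFFFFFFFFu << (32 - plen);
}

/* Store in 'out' the index of the first address seen for each distinct
 * prefix and return how many were stored.  'out' needs room for n entries. */
size_t
pick_one_source_per_prefix (const struct ifaddr4 *a, size_t n, size_t *out)
{
  size_t i, j, count = 0;

  assert (a != NULL && out != NULL);
  for (i = 0; i < n; i++)
    {
      int dup = 0;

      assert (a[i].plen <= 32);
      for (j = 0; j < count; j++)
        {
          const struct ifaddr4 *b = &a[out[j]];

          /* Same prefix length and same network bits: already covered. */
          if (a[i].plen == b->plen
              && ((a[i].addr ^ b->addr) & mask4 (b->plen)) == 0)
            {
              dup = 1;
              break;
            }
        }
      if (!dup)
        out[count++] = i;
    }
  return count;
}
```

For the three /24 addresses in the earlier test setup this yields a single source, so one request and one response would be sent instead of three (or nine).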

Seems like ospfd does something to this effect too- Paul tells
me that it will not send packets on ZEBRA_IFA_SECONDARY networks,
which strikes me as behavior in the same spirit?

--Sowmini
Re: ripd status
> My understanding of the RIP specs (ipv4 and ipv6) is that you should
> send out one packet on each connected network, i.e., on each connected
> prefix/subnet.

This is what I meant was confusing. By 'network', do you mean 'link'
or 'prefix'? From the 'i.e.', I gather you mean prefix.

> {10.0.0.1/24, 10.0.0.2/24, 10.0.0.3/24}), you have *one* connection
> to the 10.0.0.0/24 network and you should send out *one* packet on this
> network.

Is your case of interest like this, or with different prefixes?

Having things marked ZEBRA_IFA_SECONDARY for (all but one of)
prefixes/addresses which are prefix_cmp-equal to another address sounds
like it might well be the right thing, but I'd like to see the
semantics for ZEBRA_IFA_SECONDARY documented.

find . -name \*.[ch]|xargs egrep IFA_SECONDARY

./lib/if.h:#define ZEBRA_IFA_SECONDARY (1 << 0)
./ospfd/ospfd.c: if (CHECK_FLAG(co->flags,ZEBRA_IFA_SECONDARY))
./zebra/interface.c: if (CHECK_FLAG (connected->flags, ZEBRA_IFA_SECONDARY))
./zebra/interface.c: SET_FLAG (ifc->flags, ZEBRA_IFA_SECONDARY);
./zebra/interface.c: SET_FLAG (ifc->flags, ZEBRA_IFA_SECONDARY);
./zebra/interface.c: if (CHECK_FLAG (ifc->flags, ZEBRA_IFA_SECONDARY))
./zebra/rt_netlink.c: SET_FLAG (flags, ZEBRA_IFA_SECONDARY);
./zebra/rt_netlink.c: if (CHECK_FLAG (ifc->flags, ZEBRA_IFA_SECONDARY))

It seems only ospfd uses this, but I only see that it gets set via
netlink or the ip_address_secondary command.

#ifdef HAVE_NETLINK
DEFUN (ip_address_secondary,
       ip_address_secondary_cmd,
       "ip address A.B.C.D/M secondary",
       "Interface Internet Protocol config commands\n"
       "Set the IP address of an interface\n"
       "IP address (e.g. 10.0.0.1/8)\n"
       "Secondary IP address\n")
{
  return ip_address_install (vty, vty->index, argv[0], NULL, NULL, 1);
}
#endif /* HAVE_NETLINK */
Re: ripd status [ In reply to ]
> send out one packet on each connected network, i.e., on each connected
> prefix/subnet.
>
> This is what I meant was confusing. By 'network', do you mean 'link'
> or 'prefix'? From the 'i.e.', I gather you mean prefix.

By your definitions, Prefix.

> {10.0.0.1/24, 10.0.0.2/24, 10.0.0.3/24}), you have *one* connection
> to the 10.0.0.0/24 network and you should send out *one* packet on this
> network.
>
> Is your case of interest like this, or with different prefixes?

My case of interest is all of the above, but oleq's particular case is
like the set I indicate above.

>
> Having things marked ZEBRA_IFA_SECONDARY for (all but one of)
> prefixes/addresses which are prefix_cmp-equal to another address sounds

No, it does not. How can ripd know the difference between the case
when a secondary address is a duplicated prefix (don't send packets)
and the case when it is not (2 connected networks/links/prefixes: send
a packet)?

> It seems only ospfd uses this, but I only see that it gets set via
> netlink or the ip_address_secondary command.

I think the linux kernel also sets it, but afaik BSD does not. And of course,
there's no concept of alias/secondary on Solaris. I don't know what the
other flavors of unix do.

--Sowmini
Re: ripd status [ In reply to ]
> Having things marked ZEBRA_IFA_SECONDARY for (all but one of)
> prefixes/addresses which are prefix_cmp-equal to another address sounds

No, it does not. How can ripd know the difference between the case
when a secondary address is a duplicated prefix (don't send packets)
and the case when it is not (2 connected networks/links/prefixes: send
a packet)?

This is why the semantics of ZEBRA_IFA_SECONDARY need to be defined
precisely, or perhaps something else, and then interface and address
adding/deleting modified to respect those invariants. Clearly one can
tell the difference between 2 addrs in one prefix and 2 addrs in
different prefixes.

It remains to be seen whether this is what ZEBRA_IFA_SECONDARY really
means, though.

I think the linux kernel also sets it, but afaik BSD does not. And of course,
there's no concept of alias/secondary on Solaris. I don't know what the
other flavors of unix do.

You mean on solaris you can't ifconfig 2 ip addrs on the same prefix
on an interface?
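
The distinction claimed above (two addresses in one prefix versus two
addresses in different prefixes) can be checked mechanically. A minimal
sketch, assuming Python's standard ipaddress module rather than quagga's
own prefix code; same_prefix is a hypothetical helper, not a quagga
function:

```python
import ipaddress


def same_prefix(a: str, b: str) -> bool:
    """True if two address/mask strings fall in the identical prefix."""
    return ipaddress.ip_interface(a).network == ipaddress.ip_interface(b).network


# Two addresses in one prefix (the duplicated-prefix case).
dup_case = same_prefix("10.0.0.1/24", "10.0.0.2/24")
# Two addresses in different prefixes (two connected networks).
distinct_case = same_prefix("10.0.0.1/24", "172.16.3.33/24")
```

Note that the comparison includes the mask length, so 10.0.0.33/25 and
10.0.0.34/24 count as different prefixes here.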
Re: ripd status [ In reply to ]
>
> I think the linux kernel also sets it, but afaik BSD does not. And of course,
> there's no concept of alias/secondary on Solaris. I don't know what the
> other flavors of unix do.
>
> You mean on solaris you can't ifconfig 2 ip addrs on the same prefix
> on an interface?

No, that would be a pretty gross deficiency, and one that would irk customers.

I mean, the concept of a kernel-defined IFF_SECONDARY flag is pretty unique
to linux.

--Sowmini
Re: ripd status [ In reply to ]
Greg Troxel wrote:

> > Having things marked ZEBRA_IFA_SECONDARY for (all but one of)
> > prefixes/addresses which are prefix_cmp-equal to another address sounds
>
> no, it does not.. how can ripd know the difference between the case
> when a secondary address is a duplicated prefix (don't send packets)
> as compared to when it is not (2 connected networks/links/prefixes- send
> packet).
>
> This is why the semantics of ZEBRA_IFA_SECONDARY need to be defined
> precisely, or perhaps something else, and then interface and address
> adding/deleting modified to respect those invariants. Clearly one can
> tell the difference between 2 addrs in one prefix and 2 addrs in
> different prefixes.
>
> It remains to be seen whether this is what ZEBRA_IFA_SECONDARY really
> means, though.

In fact, it does not: this goes back to historical discussions about the
meaning of the term "secondary" in zebra/quagga (for example, see
[quagga-dev 72] and follow-ups). Briefly speaking, I believe the
original need emerged from Cisco compatibility, where a secondary
address is any non-primary address assigned to an interface, and each
interface may have exactly one primary address. On the other hand, a
secondary address in other contexts (Linux for sure, I believe
BSD-variants as well) means subsequent addresses of a single subnet
prefix assigned to a single interface; nonetheless, in Linux for
example, it's the kernel's exclusive role to mark addresses as
secondary. Absurdly enough, zebra/quagga attempts to *set* the secondary
flag for addresses assigned via netlink, obviously in vain... Thus, we
get a considerable inconsistency with regard to what "secondary" and
ZEBRA_IFA_SECONDARY really stand for.

How can it be changed? IMO, there are two possible paths, both
incomplete in some sense:

#1, make 'secondary' a configuration property only, in order to retain
the Cisco-like API, and refrain from any attempt to address forwarding
related issues affected by this attribute as applied by the kernel.
That's approximately what things are today, IMO.

#2, stick to a finer scheme, by which secondary is a kernel derived
attribute, which is aimed to address actual forwarding issues (namely,
tie break for which source address to use for outbound packets). Not a
bad idea, but requires extensive modifications, as the zebra daemon must
trace 'secondary' flags as they are received from the underlying kernel,
on the one hand, and on the other hand disable user intervention in the
form of Cisco-like 'secondary' keyword. Is it worth the effort?

Note that these issues, and whatever decisions taken, may have further
effect that we aren't quite aware of: for example, how should
link-oriented protocols like RIP and OSPF treat a chain of
primary-secondaries addresses? Should they be treated as a single
"bundle" for the sake of transmitting/receiving route updates? What
happens when a primary address is deleted? (this is also tightly
hand-in-hand with the behavior of the specific underlying kernel) And so
on. In other words, we need to do careful thinking and come up with the
most generalized approach that addresses all such issues.

Still want to talk about secondaries in quagga?... ;->

Gilad


(PS: AFAICT, an "interface" has nothing to do with "link" with regard to
this particular discussion.)
Re: ripd status [ In reply to ]
>
> Still want to talk about secondaries in quagga?... ;->

yeah!

I have no familiarity with linux and I'm playing with it, and I see
that when I configure ip address using some /sbin/ip incantations
I picked up from Paul and Krzysztof, I get:

# ip -4 add
1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue
inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
4: eth1: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
inet 10.0.0.33/25 brd 10.255.255.255 scope global eth1
inet 172.16.3.33/24 scope global eth1
inet 10.0.0.34/25 scope global secondary eth1

(why can't good old 'ifconfig -a' display these addresses? but that's
another problem)

So I was wrong in my assumption earlier that "secondary" is set
by the kernel for all non-primary addresses. It seems like "secondary"
is only set for (what I've been calling) duplicate addresses.
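
The observed marking can be modelled roughly as follows. This is only a
sketch of the behavior seen in the listing, assuming that an address
becomes secondary when an earlier address on the interface already has
the same prefix; mark_secondary is a hypothetical name and the kernel's
actual logic is not reproduced here:

```python
import ipaddress


def mark_secondary(addrs):
    """Return (address, is_secondary) pairs, in the order given.

    An address is treated as secondary when an earlier address in the
    list already covers the same prefix (same network and mask length).
    """
    seen = set()
    result = []
    for text in addrs:
        net = ipaddress.ip_interface(text).network
        result.append((text, net in seen))
        seen.add(net)
    return result


# Addresses from the eth1 listing above, in assumed configuration order.
eth1 = mark_secondary(["10.0.0.33/25", "172.16.3.33/24", "10.0.0.34/25"])
```

Under that assumption only 10.0.0.34/25 ends up flagged, which matches
the 'secondary' keyword in the output above.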

What does BSD do with the interface flags(if anything at all)?

On Solaris, if you just add another address (as opposed to explicitly
configuring IPMP groups) via 'ifconfig <intf> addif <....>', it does
nothing remarkable:

eri0: flags=1104843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,ROUTER,IPv4> ...
inet 10.0.0.102 netmask ffffff80 broadcast <...>
eri0:1: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> ...
inet 10.0.0.103 netmask ffffff80 broadcast <...>


> (PS: AFAICT, an "interface" has nothing to do with "link" with regard to
> this particular discussion.)

In my dictionary, yes.. To me, an interface is a physical card,
link is what Greg calls "prefix/subnet". Therefore, you can
have multiple interfaces on the same link/prefix/subnet, one
interface on multiple links/prefixes/subnets.

But that's all a matter of definition. I think we are all talking
about the same thing here.

--Sowmini
Re: ripd status [ In reply to ]
sowmini.varadhan@Sun.COM wrote:

> So I was wrong in my assumption earlier that "secondary" is set
> by the kernel for all non-primary addresses. It seems like "secondary"
> is only set for (what I've been calling) duplicate addresses.

Yes, and there's more: for example, try to delete the primary address
and see what happens...


> On Solaris, if you just add another address (as opposed to explicitly
> configuring IPMP groups) via 'ifconfig <intf> addif <....>', does
> nothing remarkable:
>
> eri0: flags=1104843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,ROUTER,IPv4> ...
> inet 10.0.0.102 netmask ffffff80 broadcast <...>
> eri0:1: flags=1100843<UP,BROADCAST,RUNNING,MULTICAST,ROUTER,IPv4> ...
> inet 10.0.0.103 netmask ffffff80 broadcast <...>

So, what happens when you do 'ping 10.0.0.105'? What source address do
the packets carry? (my guess: the non-alias "interface" is used) What
happens when you remove 10.0.0.102? (my guess: you lose all subsequent
aliases) If I'm right here -- and bear in mind that I've never used
Solaris or the like -- then the "primality" property is implied to hold
for non-alias interface, and otherwise for aliases. Am I mistaken?

Gilad
Re: ripd status [ In reply to ]
> So, what happens when you do 'ping 10.0.0.105'? What source address do
> the packets carry?

it would use the first configured address..

> happens when you remove 10.0.0.102?

what happens? nothing remarkable.. eri0:1 is now the address for eri0
and is used for outgoing packets..

> Solaris or the like -- then the "primality" property is implied to hold
> for non-alias interface, and otherwise for aliases. Am I mistaken?

Most of the primality property is an emulation of BSD behavior, simply
because BSD got there first. The cisco behavior you described was
also exactly the same as bsd behavior.

The secondary flag in linux sounds pretty useful (though I don't
know all the details.. what happens when the primary address is
deleted? does something else get promoted to primary status? Are
routing socket listeners informed about changes?)

Maybe zebra should do some internal management and set the
SECONDARY flag (after we agree on some definition for it) for
OS-es where the kernel does not help us.

Is anyone familiar with the linux kernel code around secondary
flag management, to understand linux semantics for this flag?

--Sowmini
Re: ripd status [ In reply to ]
sowmini.varadhan@Sun.COM wrote:

> The secondary flag in linux sounds pretty useful (though I don't
> know all the details.. what happens when the primary address is
> deleted? does something else get promoted to primary status? Are
> routing socket listeners informed about changes?)

When you delete a primary address, it flushes the whole chain of
subsequent secondaries as well. Netlink subscribers are informed of this
bulk address deletion, of course.


> Maybe zebra should do some internal management and set the
> SECONDARY flag (after we agree on some definition for it) for
> OS-es where the kernel does not help us.

On what basis should it set the secondary flag? What about OS's where it
cannot set this flag at all (eg, Linux, and probably BSD as well)? What
would be the meaning for an address to be "secondary" in the first
place? And how does it all fit into the restrictive Cisco-like syntax
and semantics? I'm not sure that the zebra daemon can set primary
attributes to addresses in a single, coherent, all-in-one interface.
Therefore, IMO, the better choice would be accurately sticking to what
the underlying kernel defines to be primary/secondary, and for that
sense you'll need to slightly extend the specialized kernel layers for
each such variant, in order to support describing these semantics to zebra.


> Is anyone familiar with the linux kernel code around secondary
> flag management, to understand linux semantics for this flag?

Is there anything unclear beyond what I've stated so far?...

Gilad
Re: ripd status [ In reply to ]
>
> > Maybe zebra should do some internal management and set the
> > SECONDARY flag (after we agree on some definition for it) for
> > OS-es where the kernel does not help us.
>
> On what basis should it set the secondary flag?

On the same basis as Linux?

> What about OS's where it
> cannot set this flag at all (eg, Linux, and probably BSD as well) What
> would be the meaning for an address to be "secondary" in the first
> place?

I think there's some misunderstanding here.. I was thinking of just
setting a flag that's internal to zebra [#], so that routing daemons
can behave consistently in sending behavior. As things stand now,
my understanding is that ospfd on bsd would send out packets with
source addresses that would have been marked "secondary" by linux.

[#] Obviously, you can't set/change flags in the kernel if it won't let
you, and you can't (or shouldn't be able to) outsmart the kernel.

--Sowmini
Re: ripd status [ In reply to ]
sowmini.varadhan@Sun.COM wrote:

>>On what basis should it set the secondary flag?
>
> On the same basis as Linux?

Okay, I get your point, but that isn't necessarily correct under all
circumstances. Take for instance the case of primary (a-la Linux)
address removal: does it imply deletion of all dependent secondaries, or
something else? (in BSD, according to your account, it isn't mandatory) So, my
conclusion would be that zebra must stay consistent with the kernel in order
to get this configuration right.

I generally agree that correctly observing "secondary" addresses (again,
the Linux way, meaning subsequent addresses of a single subnet) and
reporting to protocols is important, in order to avoid duplicate,
unnecessary advertisements.


> I think there's some misunderstanding here.. I was thinking of just
> setting a flag that's internal to zebra [#], so that routing daemons
> can behave consistently in sending behavior. As things stand now,
> my understanding is that ospfd on bsd would send out packets with
> source addresses that would have been marked "secondary" by linux.

Not only that, it would do the same on Linux as well... ;-> (as ospfd uses
raw sockets, it can transmit with whatever source address it chooses)
Furthermore, zebra's ospfd implements an even more complicated approach
of abstract "OSPF interfaces", which stand for <prefix,real-interface>
pairs, and applies a separate state machine to each; in this case, we'd
probably need to revise this abstraction so as to correctly handle
primary/secondary bundles, wouldn't we?...

(Is there anybody else listening here? Paul?)


> [#] Obviously, you can't set/change flags in the kernel if it won't let
> you, and you can't (or shouldn't be able to) outsmart the kernel.

Elementary... but what about correctly understanding your kernel? ;->

Gilad
Re: ripd status [ In reply to ]
In my dictionary, yes.. To me, an interface is a physical card,
link is what Greg calls "prefix/subnet". Therefore, you can
have multiple interfaces on the same link/prefix/subnet, one
interface on multiple links/prefixes/subnets.

interface is a port on a card.

link is a medium that a port plugs into, such as a 100 Mb/s
hub/switch.

It is possible to plug two or more interfaces into a link.

prefix/subnet is an address/mask (with no bits set in the 'host'
portion) that denotes a logical network. Each subnet is associated
with one link, but a link may have more than one subnet.

At least that's how I use the terms.
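
As a small illustration of the "no bits set in the 'host' portion"
definition, the prefix can be derived from any address on the subnet by
masking. A sketch in Python, IPv4 only; prefix_of is a hypothetical
helper, not something from the quagga tree:

```python
import ipaddress


def prefix_of(address: str, masklen: int) -> str:
    """Clear the host bits of an IPv4 address and return 'network/masklen'."""
    addr_int = int(ipaddress.IPv4Address(address))
    # Build a 32-bit mask with the top `masklen` bits set.
    mask_int = (0xFFFFFFFF << (32 - masklen)) & 0xFFFFFFFF
    network_int = addr_int & mask_int
    return f"{ipaddress.IPv4Address(network_int)}/{masklen}"


example = prefix_of("10.0.0.33", 25)
```

With these definitions 10.0.0.33 and 10.0.0.34 under a /25 mask both
reduce to the same prefix, while 10.0.0.200 under the same mask lands in
10.0.0.128/25.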
Re: ripd status [ In reply to ]
> It remains to be seen whether this is what ZEBRA_IFA_SECONDARY really
> means, though.

In fact, it does not: this goes back to historical discussions about the
meaning of the term "secondary" in zebra/quagga (for example, see
[quagga-dev 72] and follow-ups). Briefly speaking, I believe the
original need emerged from Cisco compatibility, where a secondary
address is any non-primary address assigned to an interface, and each
interface may have exactly one primary address.

So we have an architectural choice. Do we declare that, of all
addresses with the 'prefix_cmp'-same prefixes, all but the first
one we saw (or whatever) are 'secondfooary', for some term foo, and
choose to assign a new flag to denote this? Or do we let each
protocol which has to make such tests (which ripd apparently should)
do this test on its own?

I don't see this as intimately connected with any particular kernel
flags, and it is sensible in the absence of special kernel support.
Re: ripd status [ In reply to ]
Out of curiosity I tried this umm, pathological case on a spare nic on
linux 2.4.23:

# ip addr add 10.0.0.1/24 dev eth1
# ip addr add 10.0.0.2/24 dev eth1
# ip addr add 10.0.0.3/25 dev eth1
# ip addr add 10.0.0.4/23 dev eth1
# ip addr show dev eth1

5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:02:44:4c:04:c8 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/24 scope global eth1
inet 10.0.0.3/25 scope global eth1
inet 10.0.0.4/23 scope global eth1
inet 10.0.0.2/24 scope global secondary eth1

# ip addr del 10.0.0.1/24 dev eth1
# ip addr show dev eth1

5: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop qlen 1000
link/ether 00:02:44:4c:04:c8 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.3/25 scope global eth1
inet 10.0.0.4/23 scope global eth1

Further tests confirmed that the netmask must match and that the
secondary flag is set for all except the first-added address with
that netmask, independent of the IP address.
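
The transcript above can be replayed against a toy model. This is a
sketch only: the Interface class is hypothetical, and it assumes that
"secondary" means "an earlier address already has the same prefix" and
that deleting a primary flushes the secondaries in that prefix, as the
output suggests for this kernel version:

```python
import ipaddress


class Interface:
    """Toy model of per-interface addresses with Linux-like secondaries."""

    def __init__(self):
        self.addrs = []  # list of (ip_interface, is_secondary), in add order

    def add(self, text):
        new = ipaddress.ip_interface(text)
        # Secondary if some existing address shares network and mask length.
        secondary = any(a.network == new.network for a, _ in self.addrs)
        self.addrs.append((new, secondary))

    def delete(self, text):
        gone = ipaddress.ip_interface(text)
        was_primary = (gone, False) in self.addrs
        if was_primary:
            # Deleting a primary also flushes the secondaries in its prefix.
            self.addrs = [(a, s) for a, s in self.addrs
                          if a.network != gone.network]
        else:
            self.addrs = [(a, s) for a, s in self.addrs if a != gone]

    def show(self):
        return [(str(a), s) for a, s in self.addrs]


eth1 = Interface()
for text in ["10.0.0.1/24", "10.0.0.2/24", "10.0.0.3/25", "10.0.0.4/23"]:
    eth1.add(text)
before = eth1.show()
eth1.delete("10.0.0.1/24")
after = eth1.show()
```

In this model only 10.0.0.2/24 is flagged before the delete, and only
the /25 and /23 addresses survive it, as in the listing.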

--
Frank
Re: ripd status [ In reply to ]
Frank van Maarseveen wrote:

> Further tests confirmed that the netmask must comply and that the
> secondary flag is up for all except the firstly added address with
> that netmask, independent of the IP address.

This was reported and discussed previously, see [zebra 18516] and [zebra
18531]. In fact, it's quite understandable behavior, if we assume
that secondary is meant to break the tie as to which address
to use as source address: for addresses (and therefore implied connected
routes) of different netmask, it will be chosen by longest-prefix
match, as it is for any outbound IP packet; so, the problem
only exists for addresses residing on the same IP subnet with the same
netmask.

(You also implicitly confirmed the flushing behavior of secondary chains
when a primary address is deleted.)
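
The tie-break argument can be sketched as follows. This is a simplified
model of the reasoning in this message, not the kernel's actual source
address selection; pick_source is a hypothetical helper and the address
list is borrowed from Frank's test:

```python
import ipaddress


def pick_source(dest: str, configured):
    """Pick the configured address whose prefix best matches dest.

    Returns the address string, or None if no connected prefix covers dest.
    Ties between equal-length prefixes go to the earliest entry, which is
    where a primary/secondary distinction would be needed.
    """
    target = ipaddress.ip_address(dest)
    best = None
    for text in configured:
        iface = ipaddress.ip_interface(text)
        if target not in iface.network:
            continue
        if best is None or iface.network.prefixlen > best.network.prefixlen:
            best = iface
    return None if best is None else str(best.ip)


addrs = ["10.0.0.1/24", "10.0.0.2/24", "10.0.0.3/25", "10.0.0.4/23"]
```

For a destination such as 10.0.0.5 the /25 address wins on prefix
length alone; for 10.0.0.200 both /24 addresses match equally, and only
then does an ordering between them matter.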

Gilad
Re: ripd status [ In reply to ]
Greg Troxel wrote:

> So we have an architectural choice. Do we declare that of all
> addresses with the 'prefix_cmp'-same prefixes, that all but the first
> one we saw (or whatever) are 'secondfooary', for some term foo, and
> choose to assign a new flag to denote this? Or, do we let each
> protocol which has to make such tests (which ripd apparently should)
> do this test on its own.

As long as a protocol doesn't require any notion of which of a subnet's
addresses is the "primary" one (meaning, the one that is used as source
address for sending packets), then it's probably a good solution that is
independent of what zebra and/or kernel believes those addresses to be.


> I don't see this as intimately connected with any particular kernel
> flags, and is sensible in the absence of special kernel support.

I think this is generally a healthy approach, but one thing still needs
to be addressed sometime -- fixing zebra's 'secondary' flag for IP
addresses, so as to correlate with some real kernel property applied to
those addresses (currently it has nothing to do with actual forwarding,
although the code attempts to set those secondary attributes). My own
proposal: eliminate the use of 'secondary' in VTY, and that's it... what
do you think?

Gilad
Re: ripd status [ In reply to ]
As long as a protocol doesn't require any notion of which of a subnet's
addresses is the "primary" one (meaning, the one that is used as source
address for sending packets), then it's probably a good solution that is
independent of what zebra and/or kernel believes those addresses to be.

I think one just has to be chosen and it doesn't matter which.
We can do this cheesily by making the first found address the one that
doesn't get marked as a dup; this will do 'the right' thing by most
people's standards.

I think this is generally a healthy approach, but one thing still needs
to be addressed sometime -- fixing zebra's 'secondary' flag for IP
addresses, so as to correlate with some real kernel property applied to
those addresses (currently it has nothing to do with actual forwarding,
although the code attempts to set those secondary attributes). My own
proposal: eliminate the use of 'secondary' in VTY, and that's it... what
do you think?

Someone needs to understand exactly what that command does on a cisco
- or is it there to support the Linux secondary notion? I would be ok
with removing it, or at least making it conditional on
--enable-secondary.
Re: ripd status [ In reply to ]
Greg Troxel wrote:

> As long as a protocol doesn't require any notion of which of a subnet's
> addresses is the "primary" one (meaning, the one that is used as source
> address for sending packets), then it's probably a good solution that is
> independant of what zebra and/or kernel believes those addresses to be.
>
> I think one just has to be chosen and it doesn't matter which.
> We can do this cheesily by making the first found address the one that
> doesn't get marked as a dup; this will do 'the right' thing by most
> people's stadards.

In fact I was promoting your idea of letting each protocol daemon manage
its own bundling of IP addresses on an interface, thus totally letting
go of any "secondary" property that may apply with regard to
forwarding... (did I misunderstand?)


> Someone needs to understand exactly what that command does on a cisco
> - or is it there to support the Linux secondary notion? I would be ok
> with removing it, or at least making it conditional on
> --enable-secondary.

I think that all relevant information is known, isn't it?

Gilad
Re: ripd status [ In reply to ]
> Greg Troxel wrote:
>
> > As long as a protocol doesn't require any notion of which of a subnet's
> > addresses is the "primary" one (meaning, the one that is used as source
> > address for sending packets), then it's probably a good solution that is
> > independant of what zebra and/or kernel believes those addresses to be.
> >
> > I think one just has to be chosen and it doesn't matter which.
> > We can do this cheesily by making the first found address the one that
> > doesn't get marked as a dup; this will do 'the right' thing by most
> > people's stadards.
>
> In fact I was promoting your idea of letting each protocol daemon manage
> its own bundling of IP addresses on an interface, thus totally letting
> go of any "secondary" property that may apply with regard to
> forwarding... (did I misunderstand?)

This solution is unavoidably OS-dependent. Sometimes, as in the case
of Linux or Solaris (when IP multipathing is enabled), the OS will
pass up flags to indicate which address is "primary" (the rest
being secondary/deprecated/back-up) and the daemon has to be in step
with the kernel. Otherwise, the daemon could end up picking a deprecated
address as source address, and misleading/confusing listeners into
using the deprecated address for the destination.

> > Someone needs to understand exactly what that command does on a cisco
> > - or is it there to support the Linux secondary notion? I would be ok
> > with removing it, or at least making it conditional on
> > --enable-secondary.

that sounds reasonable.

--Sowmini
Re: ripd status [ In reply to ]
sowmini.varadhan@sun.com wrote:

> This solution is unavoidably OS-dependant. Sometimes, as in the case
> of Linux or Solaris (when IP multipathing is enabled), the OS will
> pass up flags to indicate which address is "primary" (the rest
> being secondary/deprecated/back-up) and the daemon has to be in step
> with the kernel. Otherwise, the daemon could end up picking a deprecated
> address as source address, and misleading/confusing listeners into
> using the deprecated address for the destination.

I agree it would be ideal given that one comes up with an approach
that's general enough to cover all possible breeds of kernel behaviors,
and one can also implement it successfully in quagga (I believe it takes
some change in architecture, since it will require tying all address
redistribution to the event of address reflection from the kernel layer,
eg netlink).

By the way, for the sake of this particular problem -- what could be the
harm in using a secondary/deprecated/back-up address as destination by
neighbors? I mean, any secondary address is always valid for receiving,
isn't it?

Gilad
Re: ripd status [ In reply to ]
>
> By the way, for the sake of this particular problem -- what could be the
> harm in using a secondary/deprecated/back-up address as destination by
> neighbors? I mean, any secondary address is always valid for receiving,
> isn't it?
>

Let's take the deprecated case.. the kernel is probably marking the
address as deprecated because the address is going to be timed out/deleted
soon, and should not be used as the source addr for new connections..
the daemons should pay attention to that.. another case is when
there are multiple connections/interfaces to the link, and some interfaces
are "better" (faster? more reliable?) than others and therefore
being marked primary, while the "worse" interfaces are backup. If
we use the address on the backup as the source address, the return
packet is going to potentially take the wrong path back (down the slow
connection). To make this clearer, consider the setup below ("..." represents
some random set of routers):


-------- connection 1 --------
| |-------...----------| |
| A | | B |
| |-------...----------| |
-------- connection 2 --------

If connection_1 is the "better" link, and A marks A1 (its addr on
connection_1) as primary, but sends out packets with source A2, then
B, in its returned packets could use A2 as the dest addr, and end up
sending the packets down the "worse" link.

No harm is done, but it is likely to cause some dissatisfaction in the audience.

--Sowmini
Re: ripd status [ In reply to ]
sowmini.varadhan@Sun.COM wrote:

> -------- connection 1 --------
> | |-------...----------| |
> | A | | B |
> | |-------...----------| |
> -------- connection 2 --------
>
> If connection_1 is the "better" link, and A marks A1 (its addr on
> connection_1) as primary, but sends out packets with source A2, then
> B, in its returned packets could use A2 as the dest addr, and end up
> sending the packets down the "worse" link.

I don't quite seem to get it: are A1 and A2 in your example associated
with different physical links? Or the other way round, if those are two
addresses assigned to a single "interface" (the Linux way, some layer-2
entity that's used to send/receive datagrams), don't they inherit the
same link characteristics? (as they are going down the same driver with
the same control block, etc?)

Otherwise, if they are associated with different interfaces, they aren't
primary/secondary related in the first place.

Gilad
Re: ripd status [ In reply to ]
>
> > -------- connection 1 --------
> > | |-------...----------| |
> > | A | | B |
> > | |-------...----------| |
> > -------- connection 2 --------
> >
> I don't quite seems to get it: are A1 and A2 in your example associated
> with different physical links? Or the other way round, if those are two

Yes, I was talking about something like Solaris multipathing, or
when you have multiple interfaces on the same link.

> Otherwise, if they are associated with different interfaces, they aren't
> primary/secondary related in the first place.

True, but with Solaris multipathing, the kernel is doing failover
management and has its notion of a preferred interface. If that is
the case, the daemons should probably be in sync with the kernel.

--Sowmini
Re: ripd status [ In reply to ]
On Tue, 13 Jan 2004 sowmini.varadhan@Sun.COM wrote:

> One possibility is for zebra to notice that B and C are really
> duplicate addresses on the network N, and to flag them as
> ZEBRA_IFA_DUP (leaving ZEBRA_IFA_SECONDARY as designed).

What would be the difference between ZEBRA_IFA_DUP and
ZEBRA_IFA_SECONDARY though? Aren't they the same thing to all intents and
purposes?

I had this discussion with Gilad a long time ago. I argued that
perhaps zebra should do detection of duplicate and included subnets
and set the secondary flag accordingly. Discussion was inconclusive
(though in the case of linux, the kernel sets the secondary flag, and it's
reflected in zebra).

> This would require some changes to functions like connected_up_ipv4
> (which I'm investigating currently)- they will have to call
> prefix_match instead of prefix_same, and do some work to identify
> the difference between an exact match and a prefix match (so that
> addresses don't get configured twice etc.)

What about included subnets? Eg,

eth0:
1.1.1.0/24
1.1.1.x/25
1.1.1.y/32

> This solution is non-trivial, because the daemons now have to
> recognize when to fail-over, when to promote something from DUP to
> primary etc.

Hmmm... this brings us back to kernel behaviour:

# ip -4 addr show dev usb0
# ip add add 192.168.100.1/24 dev usb0
# ip -4 addr show dev usb0
2: usb0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.100.1/24 scope global usb0
# ip add add 192.168.100.2/24 dev usb0
# ip -4 addr show dev usb0
2: usb0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 192.168.100.1/24 scope global usb0
inet 192.168.100.2/24 scope global secondary usb0
# ip add del 192.168.100.1/24 dev usb0
# ip -4 addr show dev usb0
#

ie, in the case of linux, removing the primary address removes the
DUP (or secondary) addresses too.

I would argue the issue of promotion or removal of addresses is one
for the kernel; zebra simply needs to be kept informed.

> --Sowmini

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Your computer account is overdrawn. Please see Big Brother.
Re: ripd status [ In reply to ]
On Tue, 13 Jan 2004, Greg Troxel wrote:

> like it might well be the right thing, but I'd like to see the
> semantics for ZEBRA_IFA_SECONDARY documented.

At the moment it reflects the netlink IFA_F_SECONDARY flag.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
A diplomat's life consists of three things: protocol, Geritol, and alcohol.
-- Adlai Stevenson
Re: ripd status [ In reply to ]
What would be the difference between ZEBRA_IFA_DUP and
ZEBRA_IFA_SECONDARY though? Aren't they the same thing to all intents and
purposes?

They might be. It remains to add comments to the code that define
precisely the semantics of the flag _without_ using the words "is set
if Linux kernel says so". Or, IFA_SECONDARY can be defined to be set
if Linux says so, but then that becomes a Linux-specific feature, and
we can define DUP to be 'there is another address on this interface
which is not marked DUP which has the prefix-cmp-same prefix'. I
suppose SECONDARY can be defined just like that, with the added
restriction that the address which doesn't get SECONDARY is chosen to
be the one that the kernel does not label secondary, on kernels which
do any such kind of secondary labeling.
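
One way to read that definition, as a sketch only: the function name
mark_dup and the kernel_secondary hint are hypothetical, standing in for
whatever a platform layer could report. Exactly one address per prefix
stays unmarked, preferring one the kernel does not label secondary:

```python
import ipaddress


def mark_dup(entries):
    """entries: list of (address_text, kernel_secondary) in the order seen.

    kernel_secondary is True/False where the kernel labels addresses, or
    None where it gives no hint. Returns {address_text: is_dup}.
    """
    chosen = {}  # network -> (address_text, hint) kept as the non-DUP one
    for text, kernel_secondary in entries:
        net = ipaddress.ip_interface(text).network
        if net not in chosen:
            chosen[net] = (text, kernel_secondary)
        elif kernel_secondary is False and chosen[net][1] is not False:
            # Prefer the address the kernel explicitly treats as primary.
            chosen[net] = (text, kernel_secondary)
    keepers = {text for text, _ in chosen.values()}
    return {text: text not in keepers for text, _ in entries}


no_hints = mark_dup([("10.0.0.1/24", None), ("10.0.0.2/24", None),
                     ("172.16.3.33/24", None)])
with_hints = mark_dup([("10.0.0.2/24", True), ("10.0.0.1/24", False)])
```

Without kernel hints the first address seen in each prefix is kept;
with hints, the kernel's primary is kept even if it was seen later.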

I had this discussion with Gilad a long time ago. I argued that
perhaps zebra should do detection of duplicate and included subnets
and set the secondary flag accordingly. Discussion was inconclusive
(though in the case of linux, the kernel sets the secondary flag, and it's
reflected in zebra).

It's really a question of whether zebra should provide this
abstraction. Since it is a bit hard, it might make sense to
centralize it.


I would argue the issue of promotion or removal of addresses is one
for the kernel; zebra simply needs to be kept informed.

I agree.

eth0:
1.1.1.0/24
1.1.1.x/25
1.1.1.y/32

I'd say we should look at all the routing protocols which deal with
this, and see what def would be most useful for them.
I consider the above example pathological; while two nonoverlapping
prefixes are sometimes reasonble, as is having two addrs in the same
prefix, having overlapping nonequal prefixes to me is a sign of a
confused network design. This case is likely to be underspecified in
routing daemons. So it may not matter much how we handle it (but of
course we have to be clear and consistent).
Re: ripd status [ In reply to ]
On Wed, 14 Jan 2004, Greg Troxel wrote:

> What would be the difference between ZEBRA_IFA_DUP and
> ZEBRA_IFA_SECONDARY though? Aren't they the same thing to all intents and
> purposes?
>
> They might be. It remains to add comments to the code that define
> precisely the semantics of the flag _without_ using the words "is set
> if Linux kernel says so".

Sorry, the exact meaning at the moment is "Linux kernel set the
secondary flag".

What I'm asking is the more abstract question of whether the IFA_DUP
flag, which some BSD kernels have, is notionally equivalent to
Linux's IFA_F_SECONDARY flag? If so, ZEBRA_IFA_SECONDARY is then what
we should set internally inside zebra. (ospfd already looks for this
flag).

Further questions:

- are there kernels which can have multiple addresses attached to an
interface which do not have or do not set a 'duplicate' or
'secondary' flag?

- for kernels which set such flags, do these kernels properly notify
zebra of events relating to change in status of duplicate/secondary
addresses? eg, if the primary address is removed, what does the
kernel do for secondary addresses and does it inform zebra
appropriately?


> Or, IFA_SECONDARY can be defined to be set if Linux says so, but
> then that becomes a Linux-specific feature, and we can define DUP
> to be 'there is another address on this interface which is not
> marked DUP which has the prefix-cmp-same prefix'.

Do BSD kernels have IFA_F_SECONDARY? :)

> I suppose SECONDARY can be defined just like that, with the added
> restriction that the address which doesn't get SECONDARY is chosen
> to be the one that the kernel does not label secondary, on kernels
> which do any such kind of secondary labeling.

Absence of 'secondary' implies 'primary', at least on linux.

> It's really a question of whether zebra should provide this
> abstraction. Since it is a bit hard, it might make sense to
> centralize it.

Well, first we should ask ourselves whether kernels already handle
this for us, eg linux does.

One other thing we do need to consider is 'included' subnets - linux
allows this and there is no flag to distinguish such.

> eth0:
> 1.1.1.0/24
> 1.1.1.x/25
> 1.1.1.y/32
>
> I'd say we should look at all the routing protocols which deal with
> this, and see what def would be most useful for them. I consider
> the above example pathological;

So do I. I would consider it admin error. Another example is:

eth0:
1.1.1.x/24
eth1:
1.1.1.y/25

where the two interfaces are on completely separate networks, if you must
associate an address (eg source addr of a packet) with an interface
and that address falls within 1.1.1.y/25 which interface do you pick?

ospfd gets very confused by such setups, it'd be nice if it could
detect such cases. However it is still a case of admin error - don't
do that.

> having two addrs in the same prefix, having overlapping nonequal
> prefixes to me is a sign of a confused network design.

yep.

> This case is likely to be underspecified in routing daemons. So it
> may not matter much how we handle it (but of course we have to be
> clear and consistent).

at present ospfd gets confused. So we might wish to consider adding a
ZEBRA_IFA_INCLUDED flag and having zebra do inclusion detection for
interface addresses and setting the flag appropriately.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
If God had intended Man to program, we'd be born with serial I/O ports.
Re: ripd status [ In reply to ]
On Tue, 13 Jan 2004 sowmini.varadhan@sun.com wrote:

> Seems like ospfd does something to this effect too-

Yes.

> Paul tells me that it will not send packets on ZEBRA_IFA_SECONDARY
> networks, which strikes me as behavior in the same spirit?

Yep, however, ZEBRA_IFA_SECONDARY at present is only set on netlink
platforms (ie linux).

> --Sowmini

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
The founding fathers tried to set up a judicial system where the accused
received a fair trial, not a system to insure an acquittal on technicalities.
Re: ripd status [ In reply to ]
> Sorry, the exact meaning at the moment is "Linux kernel set the
> secondary flag".

Fine for now, but before we can go setting it in other systems we need
a real semantics def.

> What I'm asking is the more abstract question of whether the IFA_DUP
> flag, which some BSD kernels have, is notionally equivalent to
> Linux's IFA_F_SECONDARY flag? If so, ZEBRA_IFA_SECONDARY is then what
> we should set internally inside zebra. (ospfd already looks for this
> flag).

NetBSD does not seem to have IFA_DUP. There is a notion of aliases,
but I think that is really to differentiate 'change the addr' from
'add this addr'. It seems to be a v4 thing only; I think v6 always
adds a new one. /sbin/ifconfig prints alias, but this is AFAICT not a
property of the address.

So yes, I think that we can define a sensible general behavior.

I had one addr, and added an alias. ifconfig showed the new one as an
alias. I deleted the first, and added it back as an alias. Now the
original address shows as an alias.

> - are there kernels which can have multiple addresses attached to an
> interface which do not have or do not set a 'duplicate' or
> 'secondary' flag?

NetBSD does this, I think. ifconfig loops over the addrs, and
addresses which are not equal to the 'primary' addr (what SIOCGIFADDR
returns) are printed with the word alias.

So zebra would have to keep track of such designations. This is
probably hairy, but it is pretty easy to write an invariant checker
and call it often.
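
One possible shape for such an invariant checker, as a hedged Python
sketch rather than anything from the zebra sources. The invariant
assumed here is "every prefix configured on an interface has exactly
one address that is not marked secondary"; the function name and the
(address, flag) input format are invented for the example.

```python
import ipaddress
from collections import defaultdict

def check_secondary_invariant(addrs):
    """addrs: iterable of (cidr_text, is_secondary) for a single interface.

    Returns a list of human-readable problems; an empty list means the
    assumed invariant holds.
    """
    primaries = defaultdict(int)  # network -> number of non-secondary addrs
    seen = set()
    for text, secondary in addrs:
        net = ipaddress.ip_interface(text).network
        seen.add(net)
        if not secondary:
            primaries[net] += 1
    problems = []
    # Sort by string form so mixed IPv4/IPv6 input does not raise TypeError.
    for net in sorted(seen, key=str):
        count = primaries[net]
        if count != 1:
            problems.append(f"{net}: expected 1 primary address, found {count}")
    return problems

good = check_secondary_invariant([("10.0.0.1/24", False), ("10.0.0.2/24", True)])
bad = check_secondary_invariant([("10.0.0.1/24", True), ("10.0.0.2/24", True)])
print(good, bad)
```

Because the check is a pure function of the address list, it is cheap
to call after every address add or delete.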

> - for kernels which set such flags, do these kernels properly notify
> zebra of events relating to change in status of duplicate/secondary
> addresses? eg, if the primary address is removed, what does the
> kernel do for secondary addresses and does it inform zebra
> appropriately?

On Linux, the secondary address seems to vanish too.

> Do BSD kernels have IFA_F_SECONDARY? :)

no

> > I suppose SECONDARY can be defined just like that, with the added
> > restriction that the address which doesn't get SECONDARY is chosen
> > to be the one that the kernel does not label secondary, on kernels
> > which do any such kind of secondary labeling.
>
> Absence of 'secondary' implies 'primary', at least on linux.

sure, but I was trying to say how labels could get assigned when the
kernel doesn't.

> > It's really a question of whether zebra should provide this
> > abstraction. Since it is a bit hard, it might make sense to
> > centralize it.
>
> Well, first we should ask ourselves whether kernels already handle
> this for us, eg linux does.

Some do, but it isn't clear that this is a service that the kernel
needs to provide. AFAICT, the only thing the kernel does with the
flag is delete the secondary addrs when the primary is deleted.

Is it an error (on Linux) to have two addrs in the same prefix with
neither marked secondary?

> One other thing we do need to consider is 'included' subnets - linux
> allows this and there is no flag to distinguish such.

Right, so we would need another flag that says that this address is on
a subnet that is included in another (larger) subnet. If anyone
cares, which I'm not clear on.

> So do i. I would consider it admin error. Another example is:
>
> eth0:
> 1.1.1.x/24
> eth1:
> 1.1.1.y/25

This is also a broken setup.

> where the two interfaces are on completely separate networks, if you must
> associate an address (eg source addr of a packet) with an interface
> and that address falls within 1.1.1.y/25 which interface do you pick?

Do you mean use the source addr to find the interface? One shouldn't
do that, but instead set IP_RECVIF so that the kernel will tell you
the interface that the packet came in on.

> ospfd gets very confused by such setups, it'd be nice if it could
> detect such cases. However it is still a case of admin error - don't
> do that.

One could add code to zebra to intersect all the configured prefixes
and if a, b are found s.t. a != b and intersect(a,b) then issue a
warning.
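
The intersection check described above could look roughly like this in
Python. This is only a sketch under assumptions: the (ifname, address)
input format and the function name are hypothetical, and
intersect(a, b) is taken to mean "the two prefixes share at least one
address", which is what ipaddress's overlaps() tests.

```python
import ipaddress
from itertools import combinations

def overlapping_prefixes(configured):
    """configured: iterable of (ifname, cidr_text) across all interfaces.

    Returns one warning string per pair of distinct prefixes that intersect.
    """
    nets = [(ifname, ipaddress.ip_interface(text).network)
            for ifname, text in configured]
    warnings = []
    for (if_a, a), (if_b, b) in combinations(nets, 2):
        if a.version != b.version:
            continue  # an IPv4 prefix never intersects an IPv6 one
        if a != b and a.overlaps(b):
            warnings.append(f"{if_a} {a} overlaps {if_b} {b}")
    return warnings

found = overlapping_prefixes([
    ("eth0", "1.1.1.1/24"),
    ("eth1", "1.1.1.129/25"),
    ("eth2", "10.0.0.1/8"),
])
print(found)
```

Equal prefixes are deliberately skipped, since that is the
secondary/DUP case rather than the overlapping-nonequal one.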

> at present ospfd gets confused. So we might wish to consider adding a
> ZEBRA_IFA_INCLUDED flag and having zebra do inclusion detection for
> interface addresses and setting the flag appropriately.

across a single interface, or across all configured addresses? (two
flags: INCLUDED_IF and INCLUDED .... :-)
Re: ripd status [ In reply to ]
Many of the questions that Paul asked were answered (or discussed)
in this thread yesterday, but

> where the two interfaces are on completely separate networks, if you must
> associate an address (eg source addr of a packet) with an interface
> and that address falls within 1.1.1.y/25 which interface do you pick?

To find the source address of the packet, it can use IP_RECVIF, right?
For outgoing packets, there are multiple choices- either first available,
or explicitly set outgoing interface with some suitable system
call.. right?

--Sowmini
Re: ripd status [ In reply to ]
On Wed, 14 Jan 2004 sowmini.varadhan@sun.com wrote:

> To find the source address of the packet, it can use IP_RECVIF,
> right?

Not on linux, it's IP_PKTINFO, slightly different. Iirc, ospfd uses the
source address of incoming packets to look up the appropriate ospf
interface struct.

> --Sowmini

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
"The identical is equal to itself, since it is different."
-- Franco Spisani
Re: ripd status [ In reply to ]
> Not on linux, it's IP_PKTINFO, slightly different.

I suppose we should have an abstraction layer to hide this then.

> Iirc, ospfd uses source address of incoming packets to lookup
> appropriate ospf interface struct.

That's too bad, since it can be wrong and is more subject to spoofing.
Re: ripd status [ In reply to ]
On Wed, 14 Jan 2004, Greg Troxel wrote:

> Not on linux, it's IP_PKTINFO, slightly different.
>
> I suppose we should have an abstraction layer to hide this then.

Quite probably.

> That's too bad, since it can be wrong and is more subject to
> spoofing.

Well, that's why OSPF MD5 auth exists :)

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
I don't want to live on in my work, I want to live on in my apartment.
-- Woody Allen
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> I had this discussion with Gilad a long time ago. I argued that
> perhaps zebra should do detection of duplicate and included subnets
> and set the secondary flag accordingly. Discussion was inconclusive
> (though in the case of linux, the kernel sets the secondary flag, and
> it's reflected on).

Not accurate: indeed rt_netlink.c sets the secondary flag for addresses
that are not configured by zebra, but internally configured addresses
carry a secondary attribute that obeys the user statement (ie, if the
user said 'ip address <addr>/<plen> secondary' then it's secondary, and
otherwise non-secondary, from zebra's pov).


> What about included subnets? Eg,
>
> eth0:
> 1.1.1.0/24
> 1.1.1.x/25
> 1.1.1.y/32

What about them? They don't imply any forwarding ambiguity, since the
longest prefix match wins, so what's the problem?


> ie, in the case of linux, removing the primary address removes the
> DUP (or secondary) addresses too.

Yeah, and IMO it makes sense and doesn't cause any trouble for zebra, as
this flushing of addresses is well captured via netlink reflections.

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

>>Paul tells me that it will not send packets on ZEBRA_IFA_SECONDARY
>>networks, which strikes me as behavior in the same spirit?
>
> Yep, however, ZEBRA_IFA_SECONDARY at present is only set on netlink
> platforms (ie linux).

Not true, for addresses configured from within zebra's VTY, it is set
for any address which the user has stated to be 'secondary' in the
command line. I think that's one basic inconsistency with zebra's
handling of the secondary attribute, and some decision should be taken
here, and my personal beliefs about this say that

#1, secondary in VTY is redundant and is there for historical (ie,
Cisco) reasons; nonetheless, although attempting to affect address
properties in the kernel, it's totally useless in this regard

#2, a true secondary policy should be kernel driven; this however,
requires a more sophisticated address redistribution process, one that
waits for a kernel reflection (to see what attributes the kernel has
attached to an address) and responds accordingly.

#3, even if #2 above is not currently implemented, it may be a good
practice for the protocols to determine their own "overlapping
avoidance" policy, without counting on zebra to mark secondaries for
them -- healthy and modular. That is, if secondaries are marked, let's
use those flags, otherwise we can do without it (because it means that
zebra/kernel don't really care for primary/secondary hierarchy).

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

>>like it might well be the right thing, but I'd like to see the
>>semantics for ZEBRA_IFA_SECONDARY documented.
>
> At the moment it reflects the netlink IFA_F_SECONDARY flag.

Inaccurate, see my previous posts.

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> One other thing we do need to consider is 'included' subnets - linux
> allows this and there is no flag to distinguish such.

Obviously, because it does not imply any forwarding ambiguity (see
previous replies).


>> eth0:
>> 1.1.1.0/24
>> 1.1.1.x/25
>> 1.1.1.y/32
>>
>>I'd say we should look at all the routing protocols which deal with
>>this, and see what def would be most useful for them. I consider
>>the above example pathological;
>
> So do i. I would consider it admin error.

Why's that? It is perfectly valid from IP forwarding pov...


> Another example is:
>
> eth0:
> 1.1.1.x/24
> eth1:
> 1.1.1.y/25
>
> where the two interfaces are on completely separate networks, if you must
> associate an address (eg source addr of a packet) with an interface
> and that address falls within 1.1.1.y/25 which interface do you pick?

The one whose implied connected route holds to be the longest prefix
match for the destination: given that x=1 and y=129, if your destination
is 5 then your longest prefix match is this connected route

1.1.1.0/24 via eth0 src 1.1.1.1

and if it's 150 then your longest prefix match is this connected route

1.1.1.128/25 via eth1 src 1.1.1.129

Note that, y may even imply a connected route which overlaps with x
itself, eg y=5; then again, the kernel follows the same forwarding
decision scheme as demonstrated above.
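
The forwarding decision described above can be reproduced with a short
Python sketch. It only models longest-prefix match over the two
connected routes from the example (x=1, y=129); the table layout and
function name are assumptions for illustration, not kernel or zebra
code.

```python
import ipaddress

# Connected routes implied by the example addresses (x=1, y=129).
CONNECTED = [
    ("eth0", ipaddress.ip_interface("1.1.1.1/24")),
    ("eth1", ipaddress.ip_interface("1.1.1.129/25")),
]

def pick_connected_route(dest_text, connected=CONNECTED):
    """Return (ifname, prefix, source address) of the longest matching
    connected route, or None if no connected route covers the destination."""
    dest = ipaddress.ip_address(dest_text)
    matches = [(ifname, iface) for ifname, iface in connected
               if dest in iface.network]
    if not matches:
        return None
    ifname, iface = max(matches, key=lambda m: m[1].network.prefixlen)
    return ifname, str(iface.network), str(iface.ip)

print(pick_connected_route("1.1.1.5"))    # covered only by the /24
print(pick_connected_route("1.1.1.150"))  # covered by both; the /25 wins
```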


>>having two addrs in the same prefix, having overlapping nonequal
>>prefixes to me is a sign of a confused network design.
>
> yep.

Clearly, a reasonable sysadmin wouldn't set those addresses in the first
place, however they seem perfectly valid to me.

Gilad
Re: ripd status [ In reply to ]
Greg Troxel wrote:

> So yes, I think that we can define a sensible general behavior.
>
> I had one addr, and added an alias. ifconfig showed the new one as an
> alias. I deleted the first, and added it back as an alias. Now the
> original address shows as an alias.

I didn't quite get it -- can you list the commands and corresponding
ifconfig output for this sequence?


> NetBSD does this, I think. ifconfig loops over the addrs, and
> addresses which are not equal to the 'primary' addr (what SIOCGIFADDR
> returns) are printed with the word alias.

So, an 'alias' in NetBSD (or *BSD?) is any address past the first
address configured for an interface? If so, then it's a rougher scheme
than Linux, which allows finer chains in the sense that only addresses
implying the very same subnet are primary/secondary related.

Is this a correct prognosis?


> So zebra would have to keep track of such designations. This is
> probably hairy, but it is pretty easy to write an invariant checker
> and call it often.

It should only be called upon address change reflection from the kernel;
on the other hand, if BSD has a netlink interface, then any such address
change should be reported either way (namely, if a non-alias is deleted
and some alias is promoted to non-alias, then those two changes should
be reported back to zebra, shouldn't they?)


> - for kernels which set such flags, do these kernels properly notify
> zebra of events relating to change in status of duplicate/secondary
> addresses? eg, if the primary address is removed, what does the
> kernel do for secondary addresses and does it inform zebra
> appropriately?
>
> On Linux, the secondary address seems to vanish too.

And it seems to distribute the corresponding sequence of netlink
messages to listeners (haven't really checked on that, but it seems
obvious).


> Do BSD kernels have IFA_F_SECONDARY? :)
>
> no

But they have alias labels, I understand from what you're saying? (should be
installing a BSD soon... ;->)


> Some do, but it isn't clear that this is a service that the kernel
> needs to provide. AFAICT, the only thing the kernel does with the
> flag is delete the secondary addrs when the primary is deleted.

No, its original intent is to break ambiguity when assigning a source
address to an outbound packet. Flushing the chain of secondaries in
Linux is just one choice out of many (eg, promoting some secondary to be
a primary, as I understand your NetBSD does).


> Is it an error (on Linux) to have two addrs in the same prefix with
> neither marked secondary?

It's impossible, as the second address of this kind would be
automatically marked secondary by the kernel. Try it out.


> One other thing we do need to consider is 'included' subnets - linux
> allows this and there is no flag to distinguish such.
>
> Right, so we would need another flag that says that this address is on
> a subnet that is included in another (larger) subnet. If anyone
> cares, and I'm not clear.

I don't think it's required, nested addresses/subnets do not involve any
forwarding ambiguity, IMO (see previous posting).


> So do i. I would consider it admin error. Another example is:
>
> eth0:
> 1.1.1.x/24
> eth1:
> 1.1.1.y/25
>
> This is also a broken setup.

Haa? I don't think so (see previous posting). It may be "illegal" in the
eyes of a Cisco user, but it is perfectly valid from IP forwarding pov.


> where the two interfaces are on completely separate networks, if you must
> associate an address (eg source addr of a packet) with an interface
> and that address falls within 1.1.1.y/25 which interface do you pick?
>
> Do you mean use the source addr to find the interface? One shouldn't
> do that, but instead set IP_RECVIF so that the kernel will tell you
> the interface that the packet came in on.

Okay, that's the correct answer to Paul's question (I answered a
different question that I figured he was referring to... sorry).


> ospfd gets very confused by such setups, it'd be nice if it could
> detect such cases. However it is still a case of admin error - don't
> do that.
>
> One could add code to zebra to intersect all the configured prefixes
> and if a, b are found s.t. a != b and intersect(a,b) then issue a
> warning.

Again, I don't see a reason to do so.

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

>>To find the source address of the packet, it can use IP_RECVIF,
>>right?
>
> Not on linux, it's IP_PKTINFO, slightly different. Iirc, ospfd uses
> source address of incoming packets to lookup appropriate ospf
> interface struct.

So apparently that's another problem with ospfd... ;-> Now seriously,
why should the daemon assume any restriction on the address scheme of
its underlying kernel? Why not use some solid, guaranteed way to
conclude the inbound interface? Clearly, it should be embedded into the
kernel layer API, but that's not too much of a problem, IMO.

Gilad
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> Not accurate: indeed rt_netlink.c sets the secondary flag for
> addresses that are not configured by zebra, but internally
> configured addresses carry a secondary attribute that obeys the user
> statement (ie, if the user said 'ip address <addr>/<plen>
> secondary' then it's secondary, and otherwise non-secondary, from
> zebra's pov).

right yes, they can be manually specified. fair enough.

> What about them? They don't imply any forwarding ambiguity, since
> the longest prefix match wins, so what's the problem?

Not a forwarding ambiguity, of course. But for connected subnets it's
ambiguous. Routers don't just forward packets, they also communicate
(eg for routing protocols :) ).

If you have overlapping connected subnets and you wish to send a
packet to an address within the overlap, where does it go? You argue
to the most specific prefix subnet, but that host could be on the
other subnet. Anyway, such a case is admin error.

> Yeah, and IMO it makes sense and doesn't cause any trouble for
> zebra, as this flushing of addresses is well captured via netlink
> reflections.

Yes, it makes sense.

> Gilad

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Atlanta makes it against the law to tie a giraffe to a telephone pole
or street lamp.
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> > At the moment it reflects the netlink IFA_F_SECONDARY flag.
>
> Inaccurate, see my previous posts.

It means either netlink set IFA_F_SECONDARY or the admin manually set
the flag.

> Gilad

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
If built in great numbers, motels will be used for nothing but illegal
purposes.
-- J. Edgar Hoover
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> Not a forwarding ambiguity, of course. But for connected subnets it's
> ambiguous. Routers don't just forward packets, they also communicate
> (eg for routing protocols :) ).

What's the difference to that matter?...


> If you have overlapping connected subnets and you wish to send a
> packet to an address within the overlap, where does it go? You argue
> to the most specific prefix subnet, but that host could be on the
> other subnet. Anyway, such a case is admin error.

Clearly if the destination resides in the nested portion of the larger
subnet, then someone made a mistake. But someone can also make a mistake
by adding a default route to an unknown nexthop, or keep an interface in
shutdown status, and so on. Errors happen, but the model remains the
same ;-> (and has to be as generalized as possible yet consistent)

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> It means either netlink set IFA_F_SECONDARY or the admin manually set
> the flag.

And in the latter case, there's absolutely no correlation between
zebra's secondary flag and that of the kernel.

Gilad
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> Obviously, because it does not imply any forwarding ambiguity (see
> previous replies).

Right, but it does imply a connected subnet ambiguity. Overlapping
routes are normal and healthy, overlapping interface addresses are
not (other than as secondary interfaces).

> Why's that? It is perfectly valid from IP forwarding pov...

absolutely.

> The one whose implied connected route holds to be the longest prefix
> match for the destination: given that x=1 and y=129, if your destination
> is 5 then your longest prefix match is this connected route
>
> 1.1.1.0/24 via eth0 src 1.1.1.1
>
> and if it's 150 then your longest prefix match is this connected route
>
> 1.1.1.128/25 via eth1 src 1.1.1.129
>
> Note that, y may even imply a connected route with overlaps with x
> itself, eg y=5; then again, the kernel follows the same forwarding
> decision scheme as demonstrated above.

Yes, I agree. As I had stated, I consider it admin error.

> Clearly, a reasonable sysadmin wouldn't set those addresses in the
> first place, however they seem perfectly valid to me.

WRT reasonable admin: quite clearly, yes. Wrt valid, technically
yes, but as you admit, not reasonable. And having admined a network
where a previous admin had decided to number one network using a /25
from an in use /24, i know funny things happen. Also, admins
sometimes make mistakes or sometimes they just don't realise all the
ins and outs (they can be a hard worked bunch often), it'd be nice to
give them a hand and have ospfd do the right thing rather than have
them scratch their heads wondering why ospfd misbehaves so and
eventually figure out it's because of that host address they added (or
their HA software added) - if they ever figure it out.

The problem is both OSPF and RIP expect to work on basis of subnets.
And I think if it's reasonable that they work out what connected nets
are secondary and which are included, then it's reasonable for zebra
to do it (do it one place, rather than separately in every daemon
surely?).

I argued this before, to keep a master table of connected prefixes
and mark addresses, but you didn't seem agreeable to the idea iirc.
The flags can be purely informational, for use by the daemons at
their discretion (eg, as ospfd currently uses).

After the last discussion, i made ospfd check for SECONDARY and not
enable ospf on those addresses (and made sure to tell a certain user
to have his HA software configure 'failover' addresses with the same
netmask as his 'primary' address to ensure the kernel set secondary).
It'd be nice if there was also an INCLUDED flag, so ospfd could do the
same. And it would help ripd too, it appears.

> Gilad

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Dreams are free, but you get soaked on the connect time.
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> What's the difference to that matter?...

The OSPF and RIP RFCs like to think of connected subnets? :)

> keep an interface in shutdown status, and so on. Errors happen, but
> the model remains the same ;-> (and has to be as generalized as
> possible yet consistent)

Agreed. See my previous email for more obtuse discussion.
Specifically manual setting of the flag doesn't help where external
software adds addresses (HA software was what brought up previous
ospfd problem (heartbeat and amir's SRRD)).

Detecting secondary addresses and setting a flag thus would be
consistent.

Detecting included subnets and setting a flag thus would be
consistent.

What daemons might make of the flags is down to them.

Is that not consistent?

> Gilad

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
To understand a program you must become both the machine and the program.
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> And in the latter case, there's absolutely no correlation between
> zebra's secondary flag and that of the kernel.

Yep. But then again, I do not argue that people should not have
enough rope to hang themselves :)

> Gilad

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
Biology grows on you.
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> Detecting secondary addresses and setting a flag thus would be
> consistent.

Yes, as long as it goes hand-in-hand with the kernel's concept of
secondary. Generally speaking, I believe that secondary (the Linux way,
or the Cisco way which is probably equivalent to BSD's aliases) means
that the router should not use that address as a source address, but
rather the primary one. I presume that routing protocols should better
be following this paradigm, so as not to confuse neighbors. And so, I
conclude that such link-oriented protocols (ripd, ospfd) should
implement bundling of secondary network prefixes into the same abstract
"interface" object indexed by the corresponding primary address (thus
not advertising more than once for each such bundle). The point is, that
this "bundle" is different from OS to OS, as Linux probably allows more
delicate bundling than others, for example.
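
To make the "bundle" idea concrete, here is a small Python sketch under
the Linux-style assumption that a secondary belongs to the primary
configured in exactly the same subnet. The function name and input
format are hypothetical; a BSD-style alias scheme would need a
different grouping rule.

```python
import ipaddress

def bundle_by_primary(addrs):
    """addrs: (cidr_text, is_secondary) pairs for one interface.

    Returns {primary_ip: [secondary_ip, ...]}. Secondaries with no primary
    in the same subnet are left out of the result.
    """
    parsed = [(ipaddress.ip_interface(text), sec) for text, sec in addrs]
    primary_of = {}  # network -> primary address (as text)
    bundles = {}
    for iface, secondary in parsed:
        if not secondary:
            primary_of[iface.network] = str(iface.ip)
            bundles[str(iface.ip)] = []
    for iface, secondary in parsed:
        if secondary:
            primary = primary_of.get(iface.network)
            if primary is not None:
                bundles[primary].append(str(iface.ip))
    return bundles

bundles = bundle_by_primary([
    ("10.0.0.1/24", False),
    ("10.0.0.2/24", True),
    ("192.168.1.1/24", False),
])
print(bundles)
```

A protocol daemon following this scheme would then open one abstract
interface per key of the returned mapping and source its packets from
that primary address.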


> Detecting included subnets and setting a flag thus would be
> consistent.

Consistent it is, but how would you benefit from this flag? How does it
promote correct behavior of routing protocols (or else)?


> What daemons might make of the flags is down to them.

Don't agree: since secondary is aimed to solve a forwarding problem, and
thus induces constraints on the way routers can use their IP addresses
as source address for outbound packets, routing protocols should follow
this paradigm quite strictly, IMO (that is, given that they have an
accurate notion of what's primary and what isn't with their addresses).
In any case, they should be able to comply with the most general known
scheme (probably the Linux one) with respect to this "bundling" feature
I mentioned above.


> Is that not consistent?

Depends on your scope... ;-> (see above)

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

>>And in the latter case, there's absolutely no correlation between
>>zebra's secondary flag and that of the kernel.
>
> Yep. But then again, I do not argue that people should not have
> enough rope to hang themselves :)

If 'ip addr <addr>/<plen> secondary' is completely useless other than
allowing sysadmins to hang themselves, then I'd say it should be gone.

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> WRT reasonable admin: quite clearly, yes. Wrt valid, technically
> yes, but as you admit, not reasonable. And having admined a network
> where a previous admin had decided to number one network using a /25
> from an in use /24, i know funny things happen. Also, admins
> sometimes make mistakes or sometimes they just don't realise all the
> ins and outs (they can be a hard worked bunch often), it'd be nice to
> give them a hand and have ospfd do the right thing rather than have
> them scratch their heads wondering why ospfd misbehaves so and
> eventually figure out it's because of that host address they added (or
> their HA software added) - if they ever figure it out.

Wasn't it you who mentioned the rope's length a few minutes ago?... ;->


> The problem is both OSPF and RIP expect to work on basis of subnets.
> And I think if it's reasonable that they work out what connected nets
> are secondary and which are included, then it's reasonable for zebra
> to do it (do it one place, rather than separately in every daemon
> surely?).

Secondary I agree, as long as it sticks to the OS's notion of secondary.
Included -- well, what exactly would you expect your protocol to do
given that it notices that some connected network is in fact "included"?
If it isn't supposed to advertise (into) it, you're saying, wouldn't that
cause those clueless admins to raise an eyebrow trying to figure out why
the hell ospfd doesn't work for that subnet? I say it's crooked either
way, and I still don't see the benefit in concluding "included" network.

Let alone, that computing "included" may be problematic: suppose there
are tens of /24 addresses on a router (no overlapping or so), but
there's a single /16 which contains them all. Clearly, all of them are
marked "included"; however, when that /16 is deleted (the admin has
finally discovered why ospfd helped him by not advertising on his /24
networks), zebra should do some exhaustive work in changing that
attribute, distributing to all protocols, and those in turn should
change their whole configuration by opening new abstract interfaces for
each subnet, state machine, etc... I don't really seem to see the
utility in this.
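
To make the bookkeeping concern concrete, here is a rough sketch of how an "included" attribute could be derived and how much of it changes when the covering /16 goes away. It assumes Python's standard ipaddress module; the helper name included_prefixes and the 10.1.x.0/24 addresses are made up for illustration and are not zebra code.

```python
import ipaddress

def included_prefixes(connected):
    """Return the connected networks that lie inside another connected one."""
    nets = [ipaddress.ip_network(p) for p in connected]
    return {
        str(n)
        for n in nets
        if any(
            n != other and n.version == other.version and n.subnet_of(other)
            for other in nets
        )
    }

# Ten /24s plus one covering /16 (hypothetical addresses).
connected = [f"10.1.{i}.0/24" for i in range(10)] + ["10.1.0.0/16"]
before = included_prefixes(connected)

# Deleting the /16 changes the attribute on every /24 at once.
connected.remove("10.1.0.0/16")
after = included_prefixes(connected)
changed = before - after
```

Every entry in changed is a prefix whose attribute would have to be recomputed and redistributed to the daemons after a single address deletion.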


> After the last discussion, I made ospfd check for SECONDARY and not
> enable ospf on those addresses (and made sure to tell a certain user
> to have his HA software configure 'failover' addresses with the same
> netmask as his 'primary' address to ensure the kernel set secondary).
> It'd be nice if there was also an INCLUDED flag, so ospfd could do the
> same. And it would help ripd too, it appears.

Obviously, you're more pragmatic than I am... ;->

However, I'm looking for a consistent solution, and I believe it should
be as generalized as possible: to this extent I'm not really convinced
that marking "included" is of any benefit.

Gilad
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> If 'ip addr <addr>/<plen> secondary' is completely useless other
> than allowing sysadmins to hang themselves, then I'd say it should
> be gone.

No, it's useful to force the secondary flag.

> Gilad

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
warning: do not ever send email to spam@dishone.st
Fortune:
linux: because a PC is a terrible thing to waste
(ksh@cis.ufl.edu put this on Tshirts in '93)
Re: ripd status [ In reply to ]
Here is a trace of doing some alias configuration.

2#> ifconfig ex0
ex0: flags=8822<BROADCAST,NOTRAILERS,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:50:da:[redacted]
media: Ethernet autoselect
status: no carrier
3#> ifconfig ex0 10.0.0.1/24
4#> ifconfig ex0
ex0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:50:da:[redacted]
media: Ethernet autoselect (none)
status: no carrier
inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
inet6 fe80::[redacted]%ex0 prefixlen 64 scopeid 0x5
5#> ifconfig ex0 10.0.0.2/24 alias
6#> ifconfig ex0
ex0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:50:da:[redacted]
media: Ethernet autoselect (none)
status: no carrier
inet 10.0.0.1 netmask 0xffffff00 broadcast 10.0.0.255
inet alias 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
inet6 fe80::[redacted]%ex0 prefixlen 64 scopeid 0x5

now we have 2 addrs

7#> ifconfig ex0 10.0.0.1/24 delete
8#> ifconfig ex0
ex0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:50:da:[redacted]
media: Ethernet autoselect (none)
status: no carrier
inet 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
inet6 fe80::[redacted]%ex0 prefixlen 64 scopeid 0x5

alias no longer shows up, since .2 is now the first v4 addr.

9#> ifconfig ex0 10.0.0.2/24 alias
10#> ifconfig ex0
ex0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:50:da:[redacted]
media: Ethernet autoselect (none)
status: no carrier
inet 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
inet6 fe80::[redacted]%ex0 prefixlen 64 scopeid 0x5

I mistyped here; had I typed .1, there would have been a second inet
line with .1 and alias.

11#> ifconfig ex0 10.0.0.3/16
12#> ifconfig ex0
ex0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:50:da:[redacted]
media: Ethernet autoselect (none)
status: no carrier
inet 10.0.0.3 netmask 0xffff0000 broadcast 10.0.255.255
inet6 fe80::[redacted]%ex0 prefixlen 64 scopeid 0x5

Without alias, the address is replaced.


> So, an 'alias' in NetBSD (or *BSD?) is any address past the first
> address configured for an interface? If so, then it's a rougher scheme
> than Linux, which allows finer chains in the sense that only addresses
> implying the very same subnet are primary/secondary related.

Right, except that I'd say simpler rather than rougher; there is
simply no concept of related enforced by the kernel.

In NetBSD (and I think the others), interfaces simply may have
more than one address (within any given address family). The notion
of alias is IPv4 only, and is purely for interacting with the user in
order to distinguish between the following two requests:

# basic setup for both requests
ifconfig le0 inet 10.0.0.1/24

# request A:
ifconfig le0 inet 10.0.0.2/24
# now le0 has just 10.0.0.2/24 (or any other addr; the netmask of
# the second address is used of course but the replacement happens
# regardless of its value).

# request B:
ifconfig le0 inet 10.0.0.3/24 alias
# now le0 has two addresses. The second one is second because
# addresses are put at the end of the list.

I am pretty sure that in case B the first address will be used for
source selection.

When you do 'ifconfig le0', the second address (and succeeding) are
printed as alias, but that's because ifconfig has logic to get the
first address and prints the word alias on all addresses that aren't
the same. There is no alias flag in the kernel. This means that
alias is printed regardless of whether the subnets match.
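
The labelling rule described here can be sketched in a few lines. This is only a model of the behaviour as described in this message, not ifconfig's actual source; the function name format_inet_lines is invented for the example.

```python
def format_inet_lines(addrs):
    """Label every IPv4 address after the first one as 'alias'.

    addrs is an ordered list of (address, netmask) pairs, in the order
    the kernel reports them. No subnet comparison is done, which is why
    'alias' appears even when the subnets do not match.
    """
    lines = []
    for index, (addr, netmask) in enumerate(addrs):
        label = "inet" if index == 0 else "inet alias"
        lines.append(f"{label} {addr} netmask {netmask}")
    return lines

example = format_inet_lines([
    ("10.0.0.1", "0xffffff00"),
    ("10.0.0.2", "0xffffff00"),
])
```

Because the label depends only on position, deleting the first address makes the former alias print as a plain inet line, as in the trace above.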

I found a local machine with a non-matching subnet on an interface.
After anonymizing, this is what it looks like:

le0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
address: 08:00:20:[redacted]
media: Ethernet autoselect (10baseT)
status: active
inet 10.0.1.1 netmask 0xffffff00 broadcast 10.0.1.255
inet alias 10.0.2.5 netmask 0xfffffff8 broadcast 10.0.2.7
inet6 fe80::[redacted]%le0 prefixlen 64 scopeid 0x1

This is a broken setup; the 10.0.2.0/29 block is for vhosts and there
is no reason to expect to find say .4 on the same ethernet; the point
of the scheme is to distribute all of those addresses to their
servers.
Really this address should be configured as a /32, and I think best
practices call for it to be on lo0 too. (This way only packets that
are routed to the box are answered, and we are very sure that it will
not be used for a source address of a new sendto()/connect().)

Note that this is all for IPv4. On IPv6 the default behavior is to
add a new address, not replace. If you want to replace, it is delete
and add (or add and delete). This is because in IPv6 it is normal to
have more than one address, esp. since there is always a link-local
address.

So yes, this lacks the ability of Linux to configure primary and
secondary addresses in arbitrary order, and to then have source
selection use the primary one. It also doesn't auto-delete chained
addresses. In practice, people put the primary address first in
/etc/ifconfig.le0, and then the others, and the lack of this feature
does not cause pain that I have either noticed or heard about.

Also, if you want a 'virtual host' type address that isn't on the
subnet that your main interface is on, typically you put it

Does Linux have the same secondary notion for IPv6?
Re: ripd status [ In reply to ]
Greg Troxel wrote:

> Here is a trace of doing some alias configuration.

Thanks for that, it demonstrates well what you've explained previously.


> So, an 'alias' in NetBSD (or *BSD?) is any address past the first
> address configured for an interface? If so, then it's a rougher scheme
> than Linux, which allows finer chains in the sense that only addresses
> implying the very same subnet are primary/secondary related.
>
> Right, except that I'd say simpler rather than rougher; there is
> simply no concept of related enforced by the kernel.

When I said rougher, I referred to the implication on source address
selection: it seems that a single address is used to access all
connected networks on a given interface. For instance, assume your
interface has 10.0.0.1/24 and 10.0.1.1/24 (alias), following this scheme
the former will be used to access neighbor 10.0.1.5/24, although the
natural choice, and the one taken in Linux kernels, would be 10.0.0.1/24.


> I am pretty sure that in case B the first address will be used for
> source selection.
>
> When you do 'ifconfig le0', the second address (and succeeding) are
> printed as alias, but that's because ifconfig has logic to get the
> first address and prints the word alias on all addresses that aren't
> the same. There is no alias flag in the kernel. This means that
> alias is printed regardless of whether the subnets match.

You're saying that 'alias' is just an ifconfig notation, but I doubt
that: assume a multihomed interface with 10.0.0.1/24, 10.0.1.1/24
(alias) and 10.0.2.1/24 (alias); what happens when you issue 'ifconfig
ifname 10.0.3.1/24'? According to your explanation, 10.0.3.1/24 would
replace 10.0.0.1/24 as the primary (non-alias) address, while keeping
the other two intact. However, if you claim that alias is just a user
syntactic sugar, how does ifconfig actually remove the old address and
install the new one such that it is the *first* in the list of IPv4
addresses? If I accept your saying that 'alias' has no meaning other
than "not first", then how can ifconfig set an address to be "first"
(meaning, non-alias)?


> This is a broken setup; the 10.0.2.0/29 block is for vhosts and there
> is no reason to expect to find say .4 on the same ethernet; the point
> of the scheme is to distribute all of those addresses to their
> servers.
> Really this address should be configured as a /32, and I think best
> practices call for it to be on lo0 too. (This way only packets that
> are routed to the box are answered, and we are very sure that it will
> not be used for a source address of a new sendto()/connect().

I'm afraid I didn't quite understand the moral of this example -- ?


> Note that this is all for IPv4. On IPv6 the default behavior is to
> add a new address, not replace. If you want to replace, it is delete
> and add (or add and delete). This is because in IPv6 it is normal to
> have more than one address, esp. since there is always a link-local
> address.

Or the other way round: in IPv4, it is legacy to have a single address,
and so legacy-derived schemes use all kind of tricks (like aliases)... ;->


> So yes, this lacks the ability of Linux to configure primary and
> secondary addresses in arbitrary order, and to then have source
> selection use the primary one. It also doesn't auto-delete chained
> addresses. In practice, people put the primary address first in
> /etc/ifconfig.le0, and then the others, and the lack of this feature
> does not cause pain that I have either noticed or heard about.

On the other hand, a simple extension to Linux's ioctl/netlink can imply
different fallback policy for a deleted primary address, like promoting
the first (FIFO order) secondary in the chain.


> Does Linux have the same secondary notion for IPv6?

Didn't ever check, however I believe it does (otherwise can't tell how
source address ambiguity problem is resolved).

Gilad
Re: ripd status [ In reply to ]
> When I said rougher, I referred to the implication on source address
> selection: it seems that a single address is used to access all
> connected networks on a given interface. For instance, assume your
> interface has 10.0.0.1/24 and 10.0.1.1/24 (alias), following this scheme
> the former will be used to access neighbor 10.0.1.5/24, although the
> natural choice, and the one taken in Linux kernels, would be 10.0.0.1/24.

Sorry, I wasn't thinking about that case. The source address is
indeed chosen as the first one matching the route that led to the
network.

e.g., I did

# add alias that is unrelated to the 'real' prefix
ifconfig ex0 192.168.100.210 alias

leading to that showing up second/alias on ifconfig, and then

ping 192.168.100.x

10:45:10.650652 arp who-has 192.168.100.x tell 192.168.100.210
10:45:10.651745 arp reply 192.168.100.x is-at [redacted]
10:45:11.659325 192.168.100.210 > 192.168.100.x: icmp: echo request
10:45:11.660388 192.168.100.x > 192.168.100.210: icmp: echo reply

ssh 192.168.100.x

10:46:51.654199 192.168.100.210.59867 > 192.168.100.x.22: S 642066818:642066818(0) win 16384 <mss 1460,nop,wscale 0,nop,nop,timestamp 12 0>
10:46:51.655281 192.168.100.x.22 > 192.168.100.210.59867: R 0:0(0) ack 642066819 win 0
Re: ripd status [ In reply to ]
> You're saying that 'alias' is just an ifconfig notation, but I doubt
> that: assume a multihomed interface with 10.0.0.1/24, 10.0.1.1/24
> (alias) and 10.0.2.1/24 (alias); what happens when you issue 'ifconfig
> ifname 10.0.3.1/24'? According to your explanation, 10.0.3.1/24 would
> replace 10.0.0.1/24 as the primary (non-alias) address, while keeping
> the other two intact. However, if you claim that alias is just a user
> syntactic sugar, how does ifconfig actually remove the old address and
> install the new one such that it is the *first* in the list of IPv4
> addresses?

No, I've read the code. It really is in sbin/ifconfig.c.

I did the experiment you described, and got this:

inet 10.0.1.1 netmask 0xffffff00 broadcast 10.0.1.255
inet alias 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
inet alias 10.0.3.1 netmask 0xffffff00 broadcast 10.0.3.255

So it deleted the first one and then added one.
The new one showed up at the end.

This is arguably odd behavior.
But, either you do manual remove and add with 'alias', or you just
have one.

> If I accept your saying that 'alias' has no meaning other
> than "not first", then how can ifconfig set an address to be "first"
> (meaning, non-alias)?

Well, it could use some mythical add-first ioctl, or remove all and
readd, or something. Right now it doesn't.

> > This is a broken setup; the 10.0.2.0/29 block is for vhosts and there
> > is no reason to expect to find say .4 on the same ethernet; the point
> > of the scheme is to distribute all of those addresses to their
> > servers.
> > Really this address should be configured as a /32, and I think best
> > practices call for it to be on lo0 too. (This way only packets that
> > are routed to the box are answered, and we are very sure that it will
> > not be used for a source address of a new sendto()/connect().
>
> I'm afraid I didn't quite understand the moral of this example -- ?

With the (real) setup I described, attempts to connect to 10.0.2.4/29
may fail, since that address isn't necessarily on a host on that LAN,
since the setup is to assign the /29 to virtual use, assign individual
addresses (e.g. www.ir.bbn.com) to specific hosts, and then inject
/32 routes to the 'real' hosts for each virtual address.
So it should be set up not to assume that the other virtual addrs are
reachable directly, because they might not be.

> Or the other way round: in IPv4, it is legacy to have a single address,
> and so legacy-derived schemes use all kind of tricks (like aliases)... ;->

Sure. The natural thing would be to drop the 'alias' flag and the
delete-on-new-addr behavior of ifconfig.

> On the other hand, a simple extension to Linux's ioctl/netlink can imply
> different fallback policy for a deleted primary address, like promoting
> the first (FIFO order) secondary in the chain.

Sounds fine, but there's always the 'does the kernel need to do this,
or can ifconfig' debate (that isn't useful to have).

> Didn't ever check, however I believe it does (otherwise can't tell how
> source address ambiguity problem is resolved).

BSD resolves it without flags. Look at the route that got us here,
find a matching prefix, and take the first one. Arguably this is less
flexible/configurable, but it basically works.
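
As a rough model of that rule (walk the interface's addresses in order and take the first one whose prefix covers the destination), the following sketch may help. It uses Python's ipaddress module; pick_source is a name made up here, and the real kernel works from the route entry rather than from a list of strings.

```python
import ipaddress

def pick_source(if_addrs, dest):
    """Return the first configured address whose prefix covers dest, else None."""
    target = ipaddress.ip_address(dest)
    for text in if_addrs:                      # list order is configuration order
        iface = ipaddress.ip_interface(text)
        if target in iface.network:
            return str(iface.ip)
    return None

# An alias on a different subnet is picked for destinations on that subnet.
src_other_subnet = pick_source(["10.0.0.1/24", "10.0.1.1/24"], "10.0.1.5")

# Two addresses on the same subnet: the earlier one wins, with no flag needed.
src_same_subnet = pick_source(["10.0.0.1/24", "10.0.0.2/24"], "10.0.0.9")
```

Ordering alone decides between two addresses on the same prefix, which is why no primary/secondary flag is required in this model.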

Actually, v6 source selection is much more complicated, since if you
have an interface with only link-local addrs (e.g. gif in some cases,
ppp), locally-originated packets going over the ppp interface need to
get a global addr from someplace, and an addr (from another interface)
is chosen.
Re: ripd status [ In reply to ]
I've been distracted with some other things and I haven't been
keeping up with this thread carefully, so please bear with me
if I'm stating something that you guys have already figured out.

But, understanding the kernel's data structs and ioctls for bsd
will make the whole alias thing easier to understand. The kernel
has something pretty similar to zebra's data structs:


struct ifnet
     |
     `--> ifaddr --> ifaddr --> ifaddr etc.

The first ifaddr is usually the AF_LINK for the interface,
and then (assuming the interface has only IPv4) the ifaddr for the
primary IPv4 address, followed by the aliases.

If the user (e.g., ifconfig) does a SIOCSIFADDR, the primary IPv4 address
is added (modified, if it already exists). If ifconfig does a
SIOCAIFADDR, the aliases are added. The SIFADDR ioctl expects an
ifreq struct; the AIFADDR ioctl expects an in_aliasreq struct.

Does that straighten things out wrt *bsd? Or was I talking about
something else than what's being discussed? :-(

--Sowmini
Re: ripd status [ In reply to ]
> Does that straighten things out wrt *bsd? Or was I talking about
> something else than what's being discussed? :-(

Not redundant, and your diagram is quite helpful.

But on NetBSD 1.6.1, ifconfig always does SIOCAIFADDR, and it does a
SIOCDIFADDR first if the alias keyword is not given.
But I think the SIOCSIFADDR ioctl works as you describe.
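
A small model of the sequence described above for NetBSD 1.6.1, treating the interface's IPv4 addresses as an ordered list. The function name ifconfig_inet is invented; the pop and append merely stand in for the SIOCDIFADDR and SIOCAIFADDR ioctls, and the assumption that the delete hits the first address is taken from the experiment earlier in this thread.

```python
def ifconfig_inet(addrs, new_addr, alias=False):
    """Return the IPv4 address list after 'ifconfig IF new_addr [alias]'."""
    result = list(addrs)
    if not alias and result:
        result.pop(0)          # stands in for SIOCDIFADDR on the first address
    result.append(new_addr)    # stands in for SIOCAIFADDR; new entries go last
    return result

start = ["10.0.0.1/24", "10.0.1.1/24", "10.0.2.1/24"]
replaced = ifconfig_inet(start, "10.0.3.1/24")
aliased = ifconfig_inet(start, "10.0.3.1/24", alias=True)
```

Under this model the new address always lands at the end, so a non-alias request does not produce a new first address; it promotes whatever was second.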
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Greg Troxel wrote:

> Does Linux have the same secondary notion for IPv6?

Doesn't appear to:

# ip -6 ad sho dev eth0
7: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet6 fe80::260:97ff:fe54:1ec9/64 scope link
inet6 2001:770:105:1:260:97ff:fe54:1ec9/64 scope global
# ip -6 addr add 2001:770:105:1::22/64 dev eth0
# ip -6 ad sho dev eth0
7: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet6 2001:770:105:1::22/64 scope global tentative
inet6 fe80::260:97ff:fe54:1ec9/64 scope link
inet6 2001:770:105:1:260:97ff:fe54:1ec9/64 scope global

That's the Fedora 2.4.22-1.2140.nptl kernel; 2.6.0 also does not set the
secondary flag.

--paulj
Re: ripd status [ In reply to ]
On Thu, 15 Jan 2004, Gilad Arnold wrote:

> When I said rougher, I referred to the implication on source address
> selection: it seems that a single address is used to access all
> connected networks on a given interface.

No, it's whatever the route table determines for source address
selection; IIRC it's typically the primary address for that subnet, or at
least the ip util would lead one to believe so. And src selection can be
influenced by adding routes, e.g.:

# ip addr add 192.168.24.1/24 dev dummy0
# ip addr add 192.168.25.1/24 dev dummy0
# ip addr add 192.168.24.2/24 dev dummy0
# ip -4 ad sho dev dummy0
17: dummy0: <BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
inet 192.168.24.1/24 scope global dummy0
inet 192.168.25.1/24 scope global dummy0
inet 192.168.24.2/24 scope global secondary dummy0

# ip ro get 192.168.25.0/24
broadcast 192.168.25.0 dev dummy0 src 192.168.25.1
cache <local,brd> mtu 1500 advmss 1460

# ip ro get 192.168.24.0/24
broadcast 192.168.24.0 dev dummy0 src 192.168.24.1
cache <local,brd> mtu 1500 advmss 1460

# # add a route to specify a different source:
# ip ro add 192.168.25.100/32 dev dummy0 src 192.168.24.2

# ip ro get 192.168.25.100
192.168.25.100 dev dummy0 src 192.168.24.2
cache mtu 1500 advmss 1460
# ip ro ge 192.168.25.10
192.168.25.10 dev dummy0 src 192.168.25.1
cache mtu 1500 advmss 1460

> neighbor 10.0.1.5/24, although the natural choice, and the one taken
> in Linux kernels, would be 10.0.0.1/24.

Sure? That'd mean the kernel doesn't do what ip would lead one to
believe (which'd be annoying).

> Didn't ever check, however I believe it does (otherwise can't tell how
> source address ambiguity problem is resolved).

It doesn't appear to. It has the secondary flag, but it isn't set. I don't
know if that's intentional or whether it's a buglet in the IPv6
implementation.

There doesn't appear to be a way to influence src address selection
either. (Then again, this may be intentional. It was my vague
understanding that IPv6 possibly does not allow multiple subnets/addresses
per link. But I can't remember why I have this vague understanding, nor
can I find any handy reference to answer the question.)

> Gilad

regards,

--paulj
Re: ripd status [ In reply to ]
Paul Jakma wrote:

>>When I said rougher, I referred to the implication on source address
>>selection: it seems that a single address is used to access all
>>connected networks on a given interface.
>
> No.. its whatever the route table determines for source address
> selection, iirc its typically the primary address for that subnet, or at
> least the ip util would lead one to believe so.

In fact, I was referring to Greg's example of a BSD stack (and you
probably figured I was talking about Linux?)


> And src selection can be influenced by adding routes, eg:

[ snip iproute2 trace ]

Yes, you can do that, however in order to change the src for the whole
connected network (ie, replace the current, kernel derived connected
route with one of your own) you'll need to explicitly remove the current
connected route and install a new one, otherwise you get an error adding
this new route. But anyway, you don't really want to do that in most
cases, as the kernel still considers the first address of a subnet to be
the primary one, with all the associated implications (like secondary
chain flushing, etc).
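
The Linux behaviour discussed in this thread (the first address of a subnet is primary, later ones on the very same subnet are secondary, and deleting a primary flushes its chain) can be modelled roughly as below. This is a sketch under those stated assumptions, using Python's ipaddress module; flag_addresses and delete_address are names invented for the example, not kernel or iproute2 interfaces.

```python
import ipaddress

def flag_addresses(added):
    """Mark an address secondary if an earlier one shares its exact subnet."""
    primaries = set()
    flagged = []
    for text in added:                         # order of 'ip addr add' calls
        network = ipaddress.ip_interface(text).network
        secondary = network in primaries
        if not secondary:
            primaries.add(network)
        flagged.append((text, "secondary" if secondary else "primary"))
    return flagged

def delete_address(added, victim):
    """Delete victim; if it was a primary, flush its secondaries as well."""
    flags = dict(flag_addresses(added))
    network = ipaddress.ip_interface(victim).network
    if flags[victim] == "primary":
        return [a for a in added if ipaddress.ip_interface(a).network != network]
    return [a for a in added if a != victim]

added = ["192.168.24.1/24", "192.168.25.1/24", "192.168.24.2/24"]
flags = flag_addresses(added)
after_primary_delete = delete_address(added, "192.168.24.1/24")
```

With the three addresses from the dummy0 trace, only 192.168.24.2/24 comes out as secondary, matching the ip output quoted earlier.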


> sure? that'd mean the kernel doesnt do what ip would lead one to
> believe. (which'd be annoying).

Don't be annoyed! (I was talking about BSD, see above ;->)

Gilad
Re: ripd status [ In reply to ]
Paul Jakma wrote:

> # ip -6 addr add 2001:770:105:1::22/64 dev eth0
> # ip -6 ad sho dev eth0
> 7: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
> inet6 2001:770:105:1::22/64 scope global tentative
> inet6 fe80::260:97ff:fe54:1ec9/64 scope link
> inet6 2001:770:105:1:260:97ff:fe54:1ec9/64 scope global

Seems like your new address is added with that "tentative" flag --
what's that? Could it be another way of saying "secondary", in IPv6-ish?

Gilad
Re: ripd status [ In reply to ]
On Sun, 18 Jan 2004, Gilad Arnold wrote:

>
> Paul Jakma wrote:
>
> > # ip -6 addr add 2001:770:105:1::22/64 dev eth0
> > # ip -6 ad sho dev eth0
> > 7: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
> > inet6 2001:770:105:1::22/64 scope global tentative
> > inet6 fe80::260:97ff:fe54:1ec9/64 scope link
> > inet6 2001:770:105:1:260:97ff:fe54:1ec9/64 scope global
>
> Seems like your new address is added with that "tentative" flag --
> what's that? Could it be another way of saying "secondary", in IPv6-ish?

No. The tentative flag means that this address is being checked for
availability, i.e. that no other host already has this address. After the
check, which takes about 2 sec, the flag is removed.


best regards,

Krzysztof Olędzki
Re: ripd status [ In reply to ]
> # ip -6 addr add 2001:770:105:1::22/64 dev eth0
> # ip -6 ad sho dev eth0
> 7: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
> inet6 2001:770:105:1::22/64 scope global tentative
> inet6 fe80::260:97ff:fe54:1ec9/64 scope link
> inet6 2001:770:105:1:260:97ff:fe54:1ec9/64 scope global

> Seems like your new address is added with that "tentative" flag --
> what's that? Could it be another way of saying "secondary", in IPv6-ish?

No, tentative means that Duplicate Address Detection has not yet
completed.
It is interesting that the new address shows up earlier, and in
particular that it is ahead of the Link Local address.