link-detect issues (not quite working)
Hi,

quagga 0.96.5-20040810 (from cvs webpage download)
kernel 2.6.6
interface card 8139too drivers (stock from kernel)
zebra config below

I saw odd behavior with link-detect enabled.

The card driver correctly reported the IFF_RUNNING status of the
ethernet link; ip monitor agreed, and ifconfig correctly showed
RUNNING or not.

quagga would correctly modify its RIB to match the card status,
i.e. if there was good link on eth0, a "show ip route" within quagga
would show the route to the net as available and connected. And if the
link was bad, "show ip route" within quagga would not show a
route to the net, and "show int eth0" would say
Interface eth0 is up, line protocol is down

But the kernel FIB would not be modified. Even if the link was not showing
as running (via ifconfig, or "show int eth0" saying line protocol down),
the actual kernel FIB (via netstat -r -n or ip route list on the command
line) would still show the route to the network that was instantiated
on eth0. And the kernel would still try to send packets out that interface
if they were destined to that network.
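
A minimal sketch of the two states described above, assuming a Linux
host that exposes the flags word in /sys/class/net/<if>/flags. The
helper names link_state and read_flags are made up for illustration;
the IFF_UP and IFF_RUNNING values are the ones from <linux/if.h>.

```python
# Sketch only: decode the interface flags word the way "show int" reports it.
IFF_UP = 0x1        # administratively up
IFF_RUNNING = 0x40  # driver reports the link as operational


def link_state(flags: int) -> str:
    """Return a zebra-style description for a raw flags word."""
    admin = "up" if flags & IFF_UP else "down"
    line = "up" if flags & IFF_RUNNING else "down"
    return f"is {admin}, line protocol is {line}"


def read_flags(ifname: str) -> int:
    """Read the flags word for ifname from sysfs (Linux only)."""
    with open(f"/sys/class/net/{ifname}/flags") as fh:
        # sysfs prints the value as hex with a 0x prefix, e.g. "0x1003"
        return int(fh.read().strip(), 16)
```

Calling link_state(read_flags("eth0")) with the cable unplugged should
give the "is up, line protocol is down" case: the interface is still
UP, only RUNNING has been cleared.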

Some of these problems were discussed in quagga-user back in May/June 2004
under the subject: Problem with link-detect. No real resolution
came out of that thread.


I did some poking around in the code this afternoon. It seems quagga
doesn't even try to modify the kernel FIB for RIB_SYSTEM_ROUTE routes,
RIB_SYSTEM_ROUTE routes being ZEBRA_ROUTE_KERNEL or ZEBRA_ROUTE_CONNECT
routes.

So I took what I thought was a sledgehammer to it and changed the
RIB_SYSTEM_ROUTE macro to match only ZEBRA_ROUTE_KERNEL routes. But the
netlink calls to remove the CONNECTED routes failed:

2004/08/10 17:18:12 ZEBRA: netlink_parse_info: netlink-listen type RTM_NEWLINK(16), seq=0, pid=0
2004/08/10 17:18:12 ZEBRA: MESSAGE: ZEBRA_INTERFACE_DOWN eth0
2004/08/10 17:18:12 ZEBRA: netlink_route_multipath(): (single hop)RTM_DELROUTE 10.1.1.128/26 via 10.1.1.128 if 12, type Directly connected
2004/08/10 17:18:12 ZEBRA: netlink_talk: netlink-cmd type RTM_DELROUTE(25), seq=17
2004/08/10 19:18:12 ZEBRA: netlink-cmd error: No such process, type=RTM_DELROUTE(25), seq=17, pid=0


My knowledge of the netlink interface is pretty slim, so I may be
(probably am) missing subtleties with my brute force approach.
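
One small thing that can be checked about the log above: "No such
process" is simply the C library text for an errno value, not anything
process related. The sketch below maps such a string back to its
symbolic name; explain_netlink_error is a hypothetical helper, not
part of zebra. On a Linux/glibc system the string in the log should
come back as ESRCH. Reading that as "the kernel found no route
matching every attribute in the delete request" is an assumption
here, not something confirmed in this thread.

```python
import errno
import os


def explain_netlink_error(text: str) -> str:
    """Map an error string from a log line back to an errno name."""
    for code, name in errno.errorcode.items():
        # os.strerror() returns the platform's message for this errno
        if os.strerror(code) == text:
            return name
    return "unknown"


if __name__ == "__main__":
    print(explain_netlink_error("No such process"))
```

If that reading holds, it would fit with the manual "ip route delete"
succeeding: the route exists, but the request zebra built (note the
"via 10.1.1.128" in the log) may not describe it the way the kernel
stored it.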

If I "ip route delete" the route from the command line, it goes away just
fine. Even though the interface is still UP (and not RUNNING :)

So. My questions are:

1. Why doesn't zebra mess with the CONNECTED routes?
Shouldn't link-detect cause even CONNECTED routes (maybe especially
CONNECTED routes) to be removed from the kernel FIB as well as
the zebra RIB?

2. Why did my brute force attempt to let it do that fail?
Can netlink not be used to remove CONNECTED routes, or was the
netlink command to do it not correctly formed? Wrong FIB? Wrong
type? As I said, my netlink experience is limited to what I have learned
today, so I may be completely misunderstanding something.

I'm going to spend even more time tomorrow researching this and
the code, but if there is a clue bat, please whap me upside the
head with it.

Thanks for any pointers or comments
E

(zebra.conf included)
[root@localhost hda1]# more zebra.conf
!
! Zebra configuration saved from vty
! 2004/08/09 18:27:07
!
hostname g1-r3
password zeb1
enable password zeb1
log file /mnt/hda1/zebra.log
debug zebra events
debug zebra kernel
debug zebra packet
!
interface eth0
description link to area 2 net 2
link-detect
ip address 10.1.1.130/26
ipv6 nd suppress-ra
!
interface eth1
description link to area 2 net 3
link-detect
ip address 10.1.1.193/26
ipv6 nd suppress-ra
!
interface eth2
description link to area 0 net 0
link-detect
ip address 10.1.0.4/24
ipv6 nd suppress-ra
!
interface gre0
shutdown
ipv6 nd suppress-ra
!
interface lo
!
interface shaper0
ipv6 nd suppress-ra
!
interface sit0
shutdown
ipv6 nd suppress-ra
!
interface tunl0
shutdown
ipv6 nd suppress-ra
!
!
line vty
!
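
For reference, the connected prefixes that follow from the interface
addresses in the config above can be worked out with Python's
ipaddress module. This is only arithmetic on the configured
addresses; connected_prefix is a name made up for the sketch.

```python
import ipaddress

# Interface addresses copied from the zebra.conf above.
CONFIGURED = {
    "eth0": "10.1.1.130/26",
    "eth1": "10.1.1.193/26",
    "eth2": "10.1.0.4/24",
}


def connected_prefix(address: str) -> str:
    """Return the network an interface address belongs to."""
    return str(ipaddress.ip_interface(address).network)


for ifname, addr in CONFIGURED.items():
    print(f"{ifname}: {addr} -> connected route {connected_prefix(addr)}")
```

The eth0 result, 10.1.1.128/26, is the prefix that shows up in the
RTM_DELROUTE log line earlier in the message.
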
Re: link-detect issues (not quite working)
On Tue, 10 Aug 2004, Eric S. Johnson wrote:

> So. My questions are:
>
> 1. Why doesn't zebra mess with the CONNECTED routes?
> Shouldn't link-detect cause even CONNECTED routes (maybe especially
> CONNECTED routes) to be removed from the kernel FIB as well as
> the zebra RIB?

Well, why should it?

Or to put it another way.. if you think the correct thing to do is to
remove the connected route when the link is not running, why not
try to persuade the kernel people to have the kernel remove the route
which it created?

> 2. Why did my brute force attempt to let it do that fail? Can
> netlink not be used to remove CONNECTED routes, or was the netlink
> command to do it not correctly formed? Wrong FIB? Wrong type? As I
> said, my netlink experience is limited to what I have learned today,
> so I may be completely misunderstanding something.

Not sure, output of 'ip monitor' (the ip tool, not quagga command)
while zebra tries to remove this route would be informative.

> I'm going to spend even more time tomorrow researching this and the
> code, but if there is a clue bat, please whap me upside the head
> with it.

The kernel stuck it there.. why not get the kernel to remove it if
removing connected routes is the correct thing to do?

If zebra removes it, zebra will also have to reinstall it when link
comes back.

> Thanks for any pointers or comments
> E

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
If it has syntax, it isn't user friendly.
Re: link-detect issues (not quite working)
Paul Jakma wrote:

> On Tue, 10 Aug 2004, Eric S. Johnson wrote:
>
>> So. My questions are:
>>
>> 1. Why doesn't zebra mess with the CONNECTED routes?
>> Shouldn't link-detect cause even CONNECTED routes (maybe especially
>> CONNECTED routes) to be removed from the kernel FIB as well as
>> the zebra RIB?
>
>
> Well, why should it?
>
> Or to put it another way.. if you think the correct thing to do is to
> remove the connected route when the link is not running, why not
> try to persuade the kernel people to have the kernel remove the route
> which it created?

It is too complicated to manage these states in a kernel (whether BSD
or Linux). It is easy to remove the connected routes, but it is more
complicated to restore them.
In fact, I think that the best approach is to stop redistributing the
connected routes to the protocols when the link goes down.
That means the connected route should not carry the ">" (selected
route) flag. Then the routing protocols will stop announcing it.
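
A toy model of that idea, not zebra code: every name here (Route,
apply_link_state, redistributable) is hypothetical. The connected
route stays in the table, but loses its selected flag while the link
is not running, so only selected routes are handed to the protocols.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    ifname: str
    kind: str              # "connected", "ospf", ...
    selected: bool = True  # the ">" flag in "show ip route"


def apply_link_state(routes, ifname, running):
    """Select connected routes on ifname only while the link is running."""
    for route in routes:
        if route.kind == "connected" and route.ifname == ifname:
            route.selected = running


def redistributable(routes):
    """Prefixes a routing protocol would be told about: selected ones only."""
    return [route.prefix for route in routes if route.selected]


rib = [
    Route("10.1.1.128/26", "eth0", "connected"),
    Route("10.1.1.192/26", "eth1", "connected"),
]
apply_link_state(rib, "eth0", running=False)
print(redistributable(rib))
```

Nothing is deleted in this model, so bringing the link back is just a
matter of setting the flag again, which is the restore problem this
approach is meant to avoid.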

Regards,
Vincent
Re: link-detect issues (not quite working)
On Wed, 11 Aug 2004, Vincent Jardin wrote:

> It is too complicated to manage these states in a kernel (whether BSD or
> Linux). It is easy to remove the connected routes, but it is more complicated
> to restore them.

Hmm, well the events are there, the kernel could remove the connected
routes. I suspect it's more to do with possible breakage that would
occur with non-link-running aware apps that expect to be able to send
packets out a specific interface by specifying the appropriate
connected address, possibly.

> In fact, I think that the best approach is to stop redistributing
> the connected routes to the protocols when the link is removed. It
> means that the connected route should not have the ">" (selected
> route). Then the routing protocols will stop announcing them.

Could do that, yes.

Note that OSPF, for active links, already takes care of dealing with
dead-but-up links in the protocol.

> Regards,
> Vincent

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Second-system effect.
Re: link-detect issues (not quite working)
paul@clubi.ie said:
> Well, why should it?

> Or to put it another way.. if you think the correct thing to do is
> to remove the connected route when the link is not running, why
> not try to persuade the kernel people to have the kernel remove the
> route which it created?

Heh. Well, yeah. That seems to be the thought of some folks.
Looks like http://www.linuxvirtualserver.org/~acassen/ did just that
(came up with some kernel patches for 2.4.23 that did that).
Don't know why they didn't make it into the 2.6 world.

But, really, the kernel didn't create the route. Zebra did. ifconfig
shows nothing up but lo0 before zebra starts up.

The "connected" routes are added by the kernel as zebra brings the interface
up due to my interface eth0 ip address commands.

Get this: If I run my zebra that has the RIB_SYSTEM_ROUTE macro (in zebra_rib.c)
changed to allow zebra to mess with connected routes, after startup of zebra
I have two entries in the route table for each interface. One kernel and one
zebra. If I delete the kernel routes (manually, outside of zebra) then
all seems to work well.


> The kernel stuck it there.. why not get the kernel to remove it if
> removing connected routes is the correct thing to do?

Well, the kernel stuck it there at the request of zebra bringing up the
interface.

I'm not looking to place blame here :) Just get the proper functionality
out of a quagga router. The RIB is all correct, but because the FIB/kernel
still thinks the interface is usable, packets get blackholed.

Since zebra knows the link is down, can't it help the broken kernel out ;)


> If zebra removes it, zebra will also have to reinstall it when link
> comes back.

zactly. Just like it does in the RIB.


I'm playing around some more with it. More later.

(side note, would just moving to BSD make my life happier? Does it
do the right thing kernel wise when link goes up/down? I'm burning
a BSD live cd right now as I type... :)

E
Re: link-detect issues (not quite working)
On Wed, 11 Aug 2004, Eric S. Johnson wrote:

> Heh. Well, yeah. That seems to be the thought of some folks.
> Looks like http://www.linuxvirtualserver.org/~acassen/ did just that
> (came up with some kernel patches for 2.4.23 that did that).
> Don't know why they didn't make it into the 2.6 world.

Hmm..

> But, really, the kernel didn't create the route. Zebra did.
> ifconfig shows nothing up but lo0 before zebra starts up.

Well, zebra added the address. The connected route is added by the
kernel as it adds the address.

> The "connected" routes are added by the kernel as zebra brings the
> interface up due to my interface eth0 ip address commands.
>
> Get this: If I run my zebra that has the RIB_SYSTEM_ROUTE macro (in
> zebra_rib.c) changed to allow zebra to mess with connected routes,
> after startup of zebra I have two entries in the route table for
> each interface. One kernel and one zebra.

In zebra's route-table? Possibly because you affected the logic that
allows zebra to recognise that its own connected route and the
kernel's are one and the same.

> Well, the kernel stuck it there at the request of zebra bringing up
> the interface.

Right, as it would do if you had used ifconfig or 'ip address add' to
add the address.

> I'm not looking to place blame here :) Just get the proper
> functionality out of a quagga router. The RIB is all correct, but
> because the FIB/kernel still thinks the interface is usable,
> packets get blackholed.
>
> Since zebra knows the link is down, can't it help the broken kernel
> out ;)

Possibly, it could be done, yes. But is it wise? Why do you want to be
able to rely on being able to connect to link-specific addresses? You
could just as easily use link-independent addresses (i.e. attached to
loopback or dummy) and rely on those as your always-routable
addresses.

Removing the connected route is *not* going to ensure connectivity,
only link-independent addresses can ensure connectivity so long as a
path exists.

I question the wisdom of wanting to remove the connected route. It's
a bad solution to what you want, and the better solution already
exists.

> (side note, would just moving to BSD make my life happier? Does it
> do the right thing kernel wise when link goes up/down? I'm burning a
> BSD live cd right now as I type... :)

No idea, be interested to hear what it does.

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Cheops' Law:
Nothing ever gets built on schedule or within budget.
Re: link-detect issues (not quite working)
On Wed, Aug 11, 2004 at 10:56:53AM -0400, Eric S. Johnson wrote:
> Heh. Well, yeah. That seems to be the thought of some folks.
> Looks like http://www.linuxvirtualserver.org/~acassen/ did just that
> (came up with some kernel patches for 2.4.23 that did that).
> Don't know why they didn't make it into the 2.6 world.

LinkWatch? It's in the 2.6 kernel; see net/core/link_watch.c.

Regards,

Bill Rugolsky
Re: link-detect issues (not quite working)
brugolsky@telemetry-investments.com said:
> LinkWatch? It's in the 2.6 kernel; see net/core/link_watch.c.

Right it is. I misunderstood what that linkwatch code did.

paul@clubi.ie said:
> I question the wisdom of wanting to remove the connected route.

Well, if ospf can provide a better (i.e. working) path...

But my further investigation has me confused, as a better path
*should have* come from ospf and has not anyway. Once I have a
plausible demo of the problem, I'll post more.

(did I see something in other messages about netlink messages getting
lost or delayed?)

E
Re: link-detect issues (not quite working)
On Wed, 11 Aug 2004, Eric S. Johnson wrote:

> paul@clubi.ie said:
>> I question the wisdom of wanting to remove the connected route.
>
> Well, if ospf can provide a better (i.e. working) path...
>
> But my further investigation has me confused, as a better path
> *should have* come from ospf and has not anyway. Once I have a
> plausible demo of the problem, I'll post more.

The connected network will *always* be routed out the local
interface, if the interface is still 'UP'. What you want to do,
presumably is to remove the connected route for so long as the link
is not running, and obviously reinstall it if the connected link
comes back, so that if an alternate path exists to that
connected-but-not-running network on that link you can access it via
the alternate path.

The problem is, will the reply ever get back to you? The remote host
can not know that you have lost your link, presuming the packet has a
source address in the connected-but-not-running network, the remote
host will use its connected route to reach you.

So, unless the original packet is originated from an INADDR_ANY bound
socket[1], removing the connected route simply is a pretty poor
solution. TCP connections will not survive such an event. It won't
work if you're trying to ensure connectivity to a service listening
on, or published as being available on, the IP on that
connected-but-not-running network.

I.e. removing the connected route is just a very *poor* solution in
general, and will not buy you much.

If you want an address that is independent of link-failures, assign a
*separate* /32 address to loopback.
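
The reply-path argument can be sketched as a longest-prefix-match
lookup from the remote host's point of view. The routing table, the
gateway 10.1.1.129 and the loopback address 10.255.0.1 are invented
for illustration; only 10.1.1.130 and 10.1.1.128/26 come from this
thread.

```python
import ipaddress


def lookup(table, destination):
    """Longest-prefix match: return (prefix, how it is reached) or None."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, via in table:
        net = ipaddress.ip_network(prefix)
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, via)
    return None if best is None else (str(best[0]), best[1])


# Hypothetical routing table of a remote host on the 10.1.1.128/26 segment.
REMOTE_HOST = [
    ("10.1.1.128/26", "connected, eth0"),
    ("0.0.0.0/0", "via 10.1.1.129"),
]

# Reply to the router's link address: stays on the (dead) segment.
print(lookup(REMOTE_HOST, "10.1.1.130"))
# Reply to a loopback /32 on the router: follows whatever path routing offers.
print(lookup(REMOTE_HOST, "10.255.0.1"))
```

Whatever the router does to its own connected route, the first lookup
on the remote host still picks the connected /26, so the reply heads
for the broken link. Only the second address leaves the choice of
path to the routing system.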

> (did I see something in other messages about netlink messages getting
> lost or delayed?)

They can, with large numbers of interfaces and/or a high rate of
netlink messages, given the typical receive buffer size(s). Hasso's
patch goes a long way toward fixing it, but you won't know whether you
need a bigger receive buffer size until zebra starts losing messages.
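
For anyone wanting to experiment with the buffer size, a minimal
sketch using Python's socket module follows. It is not Hasso's patch
and not zebra code; grow_receive_buffer and open_route_listener are
names made up here, and the 1 MiB default is an arbitrary choice.

```python
import socket


def grow_receive_buffer(sock: socket.socket, size: int) -> int:
    """Ask the kernel for a larger receive buffer; return what it granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def open_route_listener(size: int = 1 << 20) -> socket.socket:
    """Open a NETLINK_ROUTE socket (Linux only) with an enlarged buffer."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    grow_receive_buffer(sock, size)
    return sock
```

On Linux the granted size is capped by the net.core.rmem_max sysctl,
so the value returned by grow_receive_buffer is worth checking rather
than assuming the request was honoured in full.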

> E

regards,
--
Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A
Fortune:
Moneyliness is next to Godliness.
-- Andries van Dam