Here's a bug in the code that's been bothering me: it's sometimes
possible to start multiple copies of these daemons without even a
whimper. For example,
# ./ospfd -f /usr/local/etc/ospfd.conf &
[1] 6173
# 2003/10/14 15:25:28 OSPF: Redistribute[Kernel]: Start Type[1], Metric[20]
2003/10/14 15:25:28 OSPF: ASBR[Status:1]: Update
2003/10/14 15:25:28 OSPF: Redistribute[Connected]: Start Type[1], Metric[20]
2003/10/14 15:25:28 OSPF: ASBR[Status:2]: Update
2003/10/14 15:25:28 OSPF: ASBR[Status:2]: Already ASBR
2003/10/14 15:25:28 OSPF: Redistribute[Static]: Start Type[1], Metric[20]
2003/10/14 15:25:28 OSPF: ASBR[Status:3]: Update
2003/10/14 15:25:28 OSPF: ASBR[Status:3]: Already ASBR
# ./ospfd -f /usr/local/etc/ospfd.conf &
[2] 6174
# 2003/10/14 15:25:29 OSPF: Redistribute[Kernel]: Start Type[1], Metric[20]
2003/10/14 15:25:29 OSPF: ASBR[Status:1]: Update
2003/10/14 15:25:29 OSPF: Redistribute[Connected]: Start Type[1], Metric[20]
2003/10/14 15:25:29 OSPF: ASBR[Status:2]: Update
2003/10/14 15:25:29 OSPF: ASBR[Status:2]: Already ASBR
2003/10/14 15:25:29 OSPF: Redistribute[Static]: Start Type[1], Metric[20]
2003/10/14 15:25:29 OSPF: ASBR[Status:3]: Update
2003/10/14 15:25:29 OSPF: ASBR[Status:3]: Already ASBR
The second one actually failed to open the raw socket, but notice
the lack of any warnings or errors. Also, if you had a conf file that
asks for things to be logged to a file, it's not really possible to
figure out which incarnation is printing what. Another annoying effect
is that doing a 'ps' leaves you wondering which is the useful daemon,
and which ones are just hanging out.
So, a simple solution, in the case of routing protocol daemons,
is to bail out if they can't open the raw socket: if they can't
communicate with their peers in the domain, there's no point sitting
around and doing nothing.
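
As a rough sketch of that bail-out (this is not the actual ospfd code;
open_raw_sock_checked() and start_or_bail() are names made up for
illustration, and I'm assuming the failure shows up as a -1 return from
socket()):

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define OSPF_IP_PROTO 89 /* IP protocol number assigned to OSPF */

/* Hypothetical helper: open a raw socket and report the reason on
 * failure instead of failing silently. Returns the fd, or -1. */
int open_raw_sock_checked(int family, int protocol)
{
    int fd = socket(family, SOCK_RAW, protocol);

    if (fd < 0) {
        fprintf(stderr, "cannot open raw socket (proto %d): %s\n",
                protocol, strerror(errno));
        return -1;
    }
    return fd;
}

/* Hypothetical startup hook: a routing daemon that cannot talk to its
 * peers exits right away rather than sitting around. */
int start_or_bail(int family, int protocol)
{
    int fd = open_raw_sock_checked(family, protocol);

    if (fd < 0)
        exit(1);
    return fd;
}
```

A daemon would call something like start_or_bail(AF_INET, OSPF_IP_PROTO)
early in main(), before it daemonizes, so the error is still visible on
the terminal.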
In the case of zebra itself, one doesn't really have a raw socket,
but if we are using a TCP/IP vty_port, the call to bind() from
vty_serv_sock() will fail, and we can bail out at that point. This
solution would also kick in for the routing daemons, and I don't really
see the harm in that (comments?).
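
To make the bind() case concrete, here is a minimal sketch, not the
real vty_serv_sock(). The function names are invented, the socket is
bound to the loopback address only to keep the example harmless, and
no address-reuse socket options are set. The second attempt to bind
the same port is expected to fail, and that failure is where a daemon
could bail out.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on a TCP port for vty connections.
 * Returns the listening fd, or -1 after reporting why it failed. */
int vty_listen_checked(unsigned short port)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        fprintf(stderr, "vty socket: %s\n", strerror(errno));
        return -1;
    }

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* sketch only */
    sin.sin_port = htons(port);

    /* A second copy of the daemon on the same port fails here. */
    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) < 0) {
        fprintf(stderr, "vty port %u: bind failed: %s\n",
                (unsigned)port, strerror(errno));
        close(fd);
        return -1;
    }
    if (listen(fd, 3) < 0) {
        fprintf(stderr, "vty port %u: listen failed: %s\n",
                (unsigned)port, strerror(errno));
        close(fd);
        return -1;
    }
    return fd;
}

/* Report which port a bound socket ended up on (useful with port 0). */
int bound_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;

    if (getsockname(fd, (struct sockaddr *)&sin, &len) < 0)
        return -1;
    return ntohs(sin.sin_port);
}
```

The caller would treat a -1 from vty_listen_checked() as fatal and
exit, the same way as for the raw socket.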
But that still leaves me with 2 unsolved cases:
1. what if I start up 2 copies of zebra,
# zebra -P port1
# zebra -P port2
My opinion is that this is probably ok: whoever's doing it
must have a good (though mysterious) reason, and they shouldn't
be forbidden from it.
2. What if vtysh is being used instead of TCP/IP? The call to bind()
from vty_serv_un() is not going to fail for the second incarnation.
One possible solution is to use some form of file locking on the
serv.sun_path used in vty_serv_un(). Is this a good idea? My thought
is something like
bind();
/* try to write-lock the file and bail out if unsuccessful */
What's the most portable system call for file locking? fcntl? flock?
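
For what it's worth, a sketch of the locking idea using fcntl(), which
is the locking interface specified by POSIX (flock() comes from BSD and
is not in POSIX). Two things here are my own assumptions and not part
of the proposal above: the lock is taken on a separate regular lock
file instead of on serv.sun_path itself, and both function names are
invented. Note that fcntl() locks belong to the process, so a conflict
only shows up when a different process tries to take the lock, which
is why the demo helper forks.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: take a write lock on the whole lock file.
 * Returns an fd that must stay open for the life of the daemon,
 * or -1 if the file cannot be opened or another process holds it. */
int lock_instance_file(const char *path)
{
    struct flock fl;
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0) {
        fprintf(stderr, "%s: open failed: %s\n", path, strerror(errno));
        return -1;
    }

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "to end of file" */

    /* F_SETLK does not block; it fails if the lock is already held. */
    if (fcntl(fd, F_SETLK, &fl) < 0) {
        fprintf(stderr, "%s: already locked? %s\n", path, strerror(errno));
        close(fd);
        return -1;
    }
    return fd;
}

/* Demo helper: fork a child that plays the second incarnation.
 * Returns 1 if the child was refused the lock, 0 if it got it,
 * -1 on error. */
int second_instance_refused(const char *path)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(lock_instance_file(path) < 0 ? 0 : 1);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

The lock goes away automatically when the holding process exits or
closes the file, so a crashed daemon would not leave a stale lock
behind.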
--Sowmini