Mailing List Archive

patches for pauses between parallel RANCID runs
I needed to control how fast RANCID starts up jobs in parallel: when
using one-time password logins, I had multiple routers trying
to log in with the same sequence number, and only one of them could
finish logging in.

It turns out "par" already supports such a feature, but there's no easy
hook to turn it on. So here's an addition to /etc/rancid.conf:

# How long to pause (in seconds) between parallel RANCID runs
# This is important when using the same S/Key account on multiple
# routers, otherwise all the routers will receive the same challenge
# and only one will actually be able to log in. Default is zero.
# PAR_PAUSE=3; export PAR_PAUSE

And a simple patch to control_rancid (see attached) to use that environment
variable.
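
For anyone who wants to see the effect of the pause without running par, here
is a rough sh sketch of starting jobs with a delay between starts. It is not
par's implementation and does not cap the number of concurrent jobs the way
PAR_COUNT does; start_staggered and the echo stand-in for rancid-fe are made
up for illustration.

```sh
#!/bin/sh
# Illustration only: start one background job per device, sleeping
# PAR_PAUSE seconds between starts so no two logins begin together.
PAR_PAUSE=${PAR_PAUSE:-0}

# start_staggered is a hypothetical helper, not part of RANCID.
start_staggered() {
    for dev in "$@"; do
        # Stand-in for "rancid-fe $dev"
        ( echo "start $dev" ) &
        sleep "$PAR_PAUSE"
    done
    # Wait for every background job before returning
    wait
}

start_staggered router1 router2 router3
```

With PAR_PAUSE=3 each job starts three seconds after the previous one, which
should give each login time to consume its challenge before the next begins.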

I'll send in my S/Key patches in a few days, after they've proved to be
stable. If anyone else wants to try them out, please write to me off-list.

-- Ed
-------------- next part --------------
--- bin/control_rancid 2005-06-10 20:49:46.000000000 -0400
+++ ../rancid-panix-1/libexec/rancid/control_rancid 2005-06-14 11:35:21.000000000 -0400
@@ -89,6 +89,9 @@
# Number of things par should run in parallel.
PAR_COUNT=${PAR_COUNT:-5}

+# How many seconds to sleep between each run
+PAR_PAUSE=${PAR_PAUSE:-0}
+
# Bail if we do not have the necessary info to run
if [ ! -d $DIR ]
then
@@ -304,7 +307,7 @@
# tailored to the specific installation.
echo ""
echo "Trying to get all of the configs."
-par -q -n $PAR_COUNT -c "rancid-fe \{}" $devlistfile
+par -q -n $PAR_COUNT -p $PAR_PAUSE -c "rancid-fe \{}" $devlistfile

# This section will generate a list of missed routers
# and try to grab them again. It will run through
@@ -334,7 +337,7 @@
if [ -f $DIR/routers.up.missed ]; then
echo "====================================="
echo "Getting missed routers: round $round."
- par -q -n $PAR_COUNT -c "rancid-fe \{}" $DIR/routers.up.missed
+ par -q -n $PAR_COUNT -p $PAR_PAUSE -c "rancid-fe \{}" $DIR/routers.up.missed
rm -f $DIR/routers.up.missed
round=`expr $round + 1`
else
patches for pauses between parallel RANCID runs
Tue, Jun 14, 2005 at 12:10:58PM -0400, Ed Ravin:
> I needed to control how fast RANCID starts up jobs in parallel: when
> using one-time password logins, I had multiple routers trying
> to log in with the same sequence number, and only one of them could
> finish logging in.
>
> It turns out "par" already supports such a feature, but there's no easy
> hook to turn it on. So here's an addition to /etc/rancid.conf:
>
> # How long to pause (in seconds) between parallel RANCID runs
> # This is important when using the same S/Key account on multiple
> # routers, otherwise all the routers will receive the same challenge
> # and only one will actually be able to log in. Default is zero.
> # PAR_PAUSE=3; export PAR_PAUSE
>
> And a simple patch to control_rancid (see attached) to use that environment
> variable.
>
> I'll send in my S/Key patches in a few days, after they've proved to be
> stable. If anyone else wants to try them out, please write to me off-list.
>
> -- Ed

I don't think that is a reliable solution. You really need to write-lock the
file you are reading the keys from. The process will have to hold that lock
until it manages to get its key accepted (login, then again for enable) or
gives up, and the others will have to block waiting for the lock.
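
A minimal sh sketch of that kind of serialization, using a lock directory
rather than a true write lock on the key file (a swapped-in technique, chosen
because mkdir is atomic and portable). LOCKDIR, acquire_lock and release_lock
are hypothetical names, and the echo stands in for the real login.

```sh
#!/bin/sh
# Sketch: serialize logins with a lock directory. mkdir either creates
# the directory or fails, so only one process can hold the lock.
# LOCKDIR and the helper names are assumptions for illustration.
LOCKDIR=${LOCKDIR:-${TMPDIR:-/tmp}/skey.lock}

# Try to take the lock, polling once a second; give up after $1 tries.
acquire_lock() {
    tries=$1
    while ! mkdir "$LOCKDIR" 2>/dev/null; do
        tries=$((tries - 1))
        [ "$tries" -le 0 ] && return 1
        sleep 1
    done
    return 0
}

release_lock() {
    rmdir "$LOCKDIR"
}

if acquire_lock 30; then
    # Stand-in for: read the next key, log in, then enable.
    echo "holding lock, logging in"
    release_lock
else
    echo "gave up waiting for lock" >&2
fi
```

A process that dies while holding the lock leaves the directory behind, so a
real version would need a trap or a stale-lock check.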

patches for pauses between parallel RANCID runs
On Tue, Jun 14, 2005 at 03:16:25PM -0700, john heasley wrote:
> Tue, Jun 14, 2005 at 12:10:58PM -0400, Ed Ravin:
> > I needed to control how fast RANCID starts up jobs in parallel: when
> > using one-time password logins, I had multiple routers trying
> > to log in with the same sequence number, and only one of them could
> > finish logging in.
> >
> > It turns out "par" already supports such a feature, but there's no easy
> > hook to turn it on. So here's an addition to /etc/rancid.conf:
> >
> > # How long to pause (in seconds) between parallel RANCID runs
> > # This is important when using the same S/Key account on multiple
> > # routers, otherwise all the routers will receive the same challenge
> > # and only one will actually be able to log in. Default is zero.
> > # PAR_PAUSE=3; export PAR_PAUSE
[...]
> I don't think that is a reliable solution. You really need to write-lock the
> file you are reading the keys from. The process will have to hold that lock
> until it manages to get its key accepted (login, then again for enable) or
> gives up, and the others will have to block waiting for the lock.

I agree that it's not 100% reliable, but it will probably be good enough.
Note that this is a general issue with S/Key, not a RANCID-specific thing.
I don't like the idea of locking files, as it only solves the problem for
RANCID and only when RANCID is running on just one machine. Also, when
you add locking code you add the possibility of bugs that deadlock, which
is no fun.

I'd rather do what normally happens when an S/Key collision occurs: try
the login again. The catch is, I'd like to sleep a random amount so that
a flock of clogins don't all retry at the same time. How do you get
random numbers in expect?
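
To make the retry idea concrete, here is a rough sketch in sh (not expect, so
it would still need translating for clogin). try_login is a hypothetical
stand-in for the real login, and MAX_TRIES, MAX_SLEEP, random_delay and
retry_login are names invented for this example.

```sh
#!/bin/sh
# Sketch: retry a login a few times, sleeping a random 1..MAX_SLEEP
# seconds between attempts so colliding processes spread out.
MAX_TRIES=${MAX_TRIES:-3}
MAX_SLEEP=${MAX_SLEEP:-5}

# Print a pseudo-random integer in 1..$1. The seed mixes in the PID
# because the default awk seed is the time of day, which would be the
# same for processes started in the same second.
random_delay() {
    awk -v max="$1" -v pid="$$" 'BEGIN {
        srand(); now = srand()   # second call returns the time-based seed
        srand(now + pid)
        print int(rand() * max) + 1
    }'
}

# Hypothetical stand-in: replace with the real login attempt.
# A non-zero exit status means the challenge collided.
try_login() {
    false
}

retry_login() {
    n=1
    while [ "$n" -le "$MAX_TRIES" ]; do
        try_login "$@" && return 0
        # Sleep only if another attempt is coming
        [ "$n" -lt "$MAX_TRIES" ] && sleep "$(random_delay "$MAX_SLEEP")"
        n=$((n + 1))
    done
    return 1
}

echo "would sleep $(random_delay "$MAX_SLEEP")s before retrying"
```

Calling retry_login with a device name would then make up to MAX_TRIES
attempts, with a different delay in each process.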

What do you think of conditionally skipping the 1-second sleep in
clogin before sending the password? I think that's part of the problem,
since any clogins using the same account that try another router in that
1-second interval will get a duplicate challenge that will be stale by the
time they finish their 1-second sleeps...

-- Ed