Mailing List Archive

Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x
Hello all,

I'm seeing a very strange problem with rancid-2.2.2 (and 2.3-rc1).
I've been talking privately with some of the developers but this has
all of us stumped. Part of the problem is that no one has managed to
duplicate the symptoms I'm seeing, so I'm sending this to the
-discuss list in the hopes that someone out there has seen something
similar.

I recently installed rancid-2.2.2 on a freshly-installed FreeBSD-5.1
system after installing tcl83 and expect (5.38.0) from /usr/ports.
After setting up a small group of (mostly) Cisco 26xx and 72xx
routers to be polled, on subsequent runs of rancid I saw oscillating
sets of diffs like these:

RUN1:
[ on router1 ]
- snmp-server contact <email>
+ snmp-server contaact <email>

[on router2]
- tacacs-server host 644.124.X.Y
+ tacacs-server host 64.124.X.Y

[on router3]
- taacacs-server host <foo>
+ tacacs-server host <foo>

RUN2:
[on router1]
- snmp-server contaact <foo>
+ snmp-server contact <foo>

[on router2]
- tacacs-server host 64.124.X.Y
+ tacacs-server host 644.124.X.Y

[on router3]
- tacacs-server host <foo>
+ taacacs-server host <foo>
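
Every pair in the diffs above differs by exactly one character that repeats
its left-hand neighbour. A minimal sketch of a check for that pattern, which
could help separate this symptom from real config changes when reading
rancid diff mail. The function names are hypothetical and are not part of
rancid:

```python
from typing import Optional


def doubled_char_index(short: str, long: str) -> Optional[int]:
    """Return the index in `long` of one extra character that repeats the
    character before it, or None if the lines do not differ that way."""
    if len(long) != len(short) + 1:
        return None
    # Walk the common prefix to find the first position where they differ.
    i = 0
    while i < len(short) and short[i] == long[i]:
        i += 1
    # Removing long[i] must give back the short line exactly.
    if long[:i] + long[i + 1:] != short:
        return None
    # The extra character must duplicate its left-hand neighbour.
    if i > 0 and long[i] == long[i - 1]:
        return i
    return None


def classify_pair(removed: str, added: str) -> Optional[int]:
    """Check a '-'/'+' diff pair in both directions."""
    if len(added) > len(removed):
        return doubled_char_index(removed, added)
    return doubled_char_index(added, removed)


# The router1 pair from RUN1 above.
print(classify_pair("snmp-server contact", "snmp-server contaact"))
```

A non-None result only says that the pair looks like a single doubled
character; it does not say where in the collection path the extra byte was
introduced.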

Again, this is a stock FreeBSD-5.1-RELEASE system with a GENERIC
kernel and very little in the way of additional software added... a
couple things like apache and MRTG are about it.

I was unable to duplicate this behavior by running:

% clogin -c 'wr term' $router

no matter how many times I tried. I also tried running 'rancid -d'
in case it had something to do with the sequence of commands run
by rancid and not just the 'wr term' itself, but did not see the
same problem. I tried setting NOPIPE in rancid.conf but still saw
the duplicate characters. I played with PAR_COUNT, to no avail.
Thinking that it was something particular to the version of tcl
and/or expect we're using, I tried several different combinations
of those (including the latest, tcl8.4.5 and expect 5.40.0) but got
the same results.
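
For anyone trying to reproduce this, the repeated manual test described above
can be automated roughly as follows. This is only a sketch: it assumes clogin
is on the PATH and uses the same arguments as the command shown above, and
the helper names are made up for illustration. Hashing the raw output makes
a single stray character show up as a second digest.

```python
import hashlib
import subprocess
from collections import Counter
from typing import Callable


def clogin_capture(router: str) -> bytes:
    """Run clogin -c 'wr term' against one router and return raw stdout."""
    result = subprocess.run(
        ["clogin", "-c", "wr term", router],
        capture_output=True,
        check=False,
    )
    return result.stdout


def tally_outputs(capture: Callable[[], bytes], runs: int) -> Counter:
    """Call `capture` `runs` times and count how often each output occurs,
    keyed by SHA-256 digest."""
    tally: Counter = Counter()
    for _ in range(runs):
        tally[hashlib.sha256(capture()).hexdigest()] += 1
    return tally


# Example (needs a reachable router, so it is left commented out):
# counts = tally_outputs(lambda: clogin_capture("router1"), runs=20)
# for digest, seen in counts.most_common():
#     print(seen, digest)
```

More than one digest is not proof of the problem by itself, since some
devices print lines that legitimately change between runs; the differing
captures would still need to be compared by hand.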

I then loaded 2.2.2 on a freshly-installed FreeBSD-5.2-RELEASE
system and got the same sort of behavior. Again, I tried different
versions of everything (2.3-rc1, newer and older tcl, newer and
older expect) but none of those made a difference. I also changed
the set of routers I was polling to one that includes Cisco GSRs
and Juniper M160s and saw the same sort of thing on both types of
router, ruling out anything vendor-specific.

After all of this, I loaded 2.3-rc1 on the system we currently use
to poll the routers (an ancient FreeBSD-3.4 system) and it works as
expected, with no duplicate characters. At this point I don't know
what to think... it could be a 5.x issue, or a problem with newer
tcl+expect, or even rancid itself, although that seems unlikely.

Thanks for any suggestions,

--Jeff
Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x [ In reply to ]
On Thu, Feb 19, 2004 at 09:43:36AM -0500, Jeff Aitken wrote:
> Hello all,
>
> I'm seeing a very strange problem with rancid-2.2.2 (and 2.3-rc1).
> I've been talking privately with some of the developers but this has
> all of us stumped. Part of the problem is that no one has managed to
> duplicate the symptoms I'm seeing, so I'm sending this to the
> -discuss list in the hopes that someone out there has seen something
> similar.
From time to time I have duplicate lines in config files, which disappear
on the following run. I use rancid-2.2.2 with expect-5.38.0 and tcl8.4.1
on Solaris8. The occurrence is arbitrary, so I am at a bit of a loss as to
how to debug this problem.

--
erik at code.de

"I am not a Geek! I shower."
Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x [ In reply to ]
Thu, Mar 11, 2004 at 10:27:16AM +0100, Erik Wenzel:
> On Thu, Feb 19, 2004 at 09:43:36AM -0500, Jeff Aitken wrote:
> > Hello all,
> >
> > I'm seeing a very strange problem with rancid-2.2.2 (and 2.3-rc1).
> > I've been talking privately with some of the developers but this has
> > all of us stumped. Part of the problem is that no one has managed to
> > duplicate the symptoms I'm seeing, so I'm sending this to the
> > -discuss list in the hopes that someone out there has seen something
> > similar.
> From time to time I have duplicate lines in config files, which disappear
> on the following run. I use rancid-2.2.2 with expect-5.38.0 and tcl8.4.1
> on Solaris8. The occurrence is arbitrary, so I am at a bit of a loss as to
> how to debug this problem.

Could you supply an example, please.

What type of device is being collected? what connection method, telnet or
ssh?
Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x [ In reply to ]
On Thu, Mar 11, 2004 at 10:31:16AM -0800, john heasley wrote:
> Thu, Mar 11, 2004 at 10:27:16AM +0100, Erik Wenzel:
> > On Thu, Feb 19, 2004 at 09:43:36AM -0500, Jeff Aitken wrote:
> > > Hello all,
> > >
> > > I'm seeing a very strange problem with rancid-2.2.2 (and 2.3-rc1).
> > > I've been talking privately with some of the developers but this has
> > > all of us stumped. Part of the problem is that no one has managed to
> > > duplicate the symptoms I'm seeing, so I'm sending this to the
> > > -discuss list in the hopes that someone out there has seen something
> > > similar.
> > From time to time I have duplicate lines in config files, which disappear
> > on the following run. I use rancid-2.2.2 with expect-5.38.0 and tcl8.4.1
> > on Solaris8. The occurrence is arbitrary, so I am at a bit of a loss as to
> > how to debug this problem.
>
> Could you supply an example, please.
>
> What type of device is being collected? what connection method, telnet or
> ssh?
This is an example output. There are multiple occurrences when it
happens:
---snip---
arp 1.1.1.1 0000.0000.0001 ARPA
+ arp 1.1.1.1 0000.0000.0001 ARPA
arp 1.1.1.2 0000.0000.0002 ARPA
arp 1.1.1.3.0000.0000.0003 ARPA
---snip---

On the next do-diffs run it disappears with:
---snip---
arp 1.1.1.1 0000.0000.0001 ARPA
- arp 1.1.1.1 0000.0000.0001 ARPA
arp 1.1.1.2 0000.0000.0002 ARPA
arp 1.1.1.3.0000.0000.0003 ARPA
---snip---

This is collected with telnet from a Cisco Catalyst 6500 running IOS
Version 12.1(19)E1.
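
A duplicated line like the one in the example above can be picked out of a
saved config with a check along these lines. This is only a sketch; the
function name is hypothetical, and it skips blank lines and the "!"
separator lines that IOS configs repeat legitimately:

```python
from typing import Iterable, List, Tuple


def adjacent_duplicates(
    lines: List[str], ignore: Iterable[str] = ("", "!")
) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line identical to the line
    directly above it. Line numbers are 1-based."""
    skip = set(ignore)
    found = []
    for number, (previous, current) in enumerate(zip(lines, lines[1:]), start=2):
        if current.strip() in skip:
            continue
        if current == previous:
            found.append((number, current))
    return found


config = [
    "arp 1.1.1.1 0000.0000.0001 ARPA",
    "arp 1.1.1.1 0000.0000.0001 ARPA",
    "arp 1.1.1.2 0000.0000.0002 ARPA",
]
print(adjacent_duplicates(config))
```

Some configurations may contain genuinely repeated lines, so a hit is a
candidate to compare against the device, not a confirmed collection error.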

--
erik at code.de

"I am not a Geek! I shower."
Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x [ In reply to ]
Fri, Mar 12, 2004 at 03:26:36PM +0100, Erik Wenzel:
> On Thu, Mar 11, 2004 at 10:31:16AM -0800, john heasley wrote:
> > Thu, Mar 11, 2004 at 10:27:16AM +0100, Erik Wenzel:
> > > On Thu, Feb 19, 2004 at 09:43:36AM -0500, Jeff Aitken wrote:
> > > > Hello all,
> > > >
> > > > I'm seeing a very strange problem with rancid-2.2.2 (and 2.3-rc1).
> > > > I've been talking privately with some of the developers but this has
> > > > all of us stumped. Part of the problem is that no one has managed to
> > > > duplicate the symptoms I'm seeing, so I'm sending this to the
> > > > -discuss list in the hopes that someone out there has seen something
> > > > similar.
> > > From time to time I have duplicate lines in config files, which disappear
> > > on the following run. I use rancid-2.2.2 with expect-5.38.0 and tcl8.4.1
> > > on Solaris8. The occurrence is arbitrary, so I am at a bit of a loss as to
> > > how to debug this problem.
> >
> > Could you supply an example, please.
> >
> > What type of device is being collected? what connection method, telnet or
> > ssh?
> This is an example output. There are multiple occurrences when it
> happens:
> ---snip---
> arp 1.1.1.1 0000.0000.0001 ARPA
> + arp 1.1.1.1 0000.0000.0001 ARPA
> arp 1.1.1.2 0000.0000.0002 ARPA
> arp 1.1.1.3.0000.0000.0003 ARPA
> ---snip---
>
> On the next do-diffs run it disappears with:
> ---snip---
> arp 1.1.1.1 0000.0000.0001 ARPA
> - arp 1.1.1.1 0000.0000.0001 ARPA
> arp 1.1.1.2 0000.0000.0002 ARPA
> arp 1.1.1.3.0000.0000.0003 ARPA
> ---snip---
>
> This is collected with telnet from a Cisco Catalyst 6500 running IOS
> Version 12.1(19)E1.

We have one of these, but no arp commands in the config. Neither it nor
any of the other 6500s exhibits this behavior. Does this occur only with
arp commands, by chance?
Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x [ In reply to ]
On Thu, Feb 19, 2004 at 09:43:36AM -0500, Jeff Aitken wrote:
> I recently installed rancid-2.2.2 on a freshly-installed FreeBSD-5.1
> system after installing tcl83 and expect (5.38.0) from /usr/ports.
> After setting up a small group of (mostly) Cisco 26xx and 72xx
> routers to be polled, on subsequent runs of rancid I saw oscillating
> sets of diffs like these:
>
> RUN1:
> [ on router1 ]
> - snmp-server contact <email>
> + snmp-server contaact <email>
>
> [on router2]
> - tacacs-server host 644.124.X.Y
> + tacacs-server host 64.124.X.Y
>
> [on router3]
> - taacacs-server host <foo>
> + tacacs-server host <foo>
>
> RUN2:
> [on router1]
> - snmp-server contaact <foo>
> + snmp-server contact <foo>
>
> [on router2]
> - tacacs-server host 64.124.X.Y
> + tacacs-server host 644.124.X.Y
>
> [on router3]
> - tacacs-server host <foo>
> + taacacs-server host <foo>


I have an update on this problem. As you may remember, I saw this
behavior on both FreeBSD-5.1 and -5.2 systems. On the -5.2 system,
I forced rancid to use ssh instead of telnet. Since making that
change, I have not seen any of these duplicate characters, and I've
run rancid 15-20 times now which should have been enough.

Reading this:

http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.bin/telnet/Makefile

makes me wonder if the changes mentioned in 1.22 might have something
to do with this issue. This could (in theory) explain why no one
could duplicate the problem we saw; perhaps due to our choices during
installation we got a different version of /usr/bin/telnet.

Anyway, the next step is to build a different telnet binary and see
if the behavior changes. Obviously we can run rancid with ssh instead
of telnet without any problems, but it'd be nice to know *why* this
happened in the first place.


--Jeff
Very strange problem with 2.2.2 (and 2.3-rc1) on FreeBSD-5.x [ In reply to ]
Thu, Apr 01, 2004 at 10:33:28AM -0500, Jeff Aitken:
> I have an update on this problem. As you may remember, I saw this
> behavior on both FreeBSD-5.1 and -5.2 systems. On the -5.2 system,
> I forced rancid to use ssh instead of telnet. Since making that
> change, I have not seen any of these duplicate characters, and I've
> run rancid 15-20 times now which should have been enough.
>
> Reading this:
>
> http://www.freebsd.org/cgi/cvsweb.cgi/src/usr.bin/telnet/Makefile
>
> makes me wonder if the changes mentioned in 1.22 might have something
> to do with this issue. This could (in theory) explain why no one
> could duplicate the problem we saw; perhaps due to our choices during
> installation we got a different version of /usr/bin/telnet.
>
> Anyway, the next step is to build a different telnet binary and see
> if the behavior changes. Obviously we can run rancid with ssh instead
> of telnet without any problems, but it'd be nice to know *why* this
> happened in the first place.

I'd try building a telnet without the crypto goo, but I suspect your
problem is more likely to be with its buffer (ring) handling.
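
To make the ring-buffer suspicion concrete, here is a toy model of how a
consumer that under-counts what it has consumed could deliver one byte
twice. It is purely hypothetical: the class and its acknowledge parameter
are invented for illustration, it is not the FreeBSD telnet code, and
nothing in this thread confirms that this is the actual cause.

```python
from typing import Optional


class Ring:
    """Toy fixed-size ring buffer with separate write and read positions."""

    def __init__(self, size: int) -> None:
        self.buf = bytearray(size)
        self.size = size
        self.head = 0   # next position to write
        self.tail = 0   # next position to read
        self.count = 0  # bytes currently stored

    def put(self, data: bytes) -> None:
        for byte in data:
            if self.count == self.size:
                raise BufferError("ring is full")
            self.buf[self.head] = byte
            self.head = (self.head + 1) % self.size
            self.count += 1

    def take(self, n: int, acknowledge: Optional[int] = None) -> bytes:
        """Return up to n bytes. `acknowledge` is how many of them are
        marked as consumed; anything less than n models the bug."""
        n = min(n, self.count)
        data = bytes(self.buf[(self.tail + i) % self.size] for i in range(n))
        consumed = n if acknowledge is None else min(acknowledge, n)
        self.tail = (self.tail + consumed) % self.size
        self.count -= consumed
        return data


ring = Ring(16)
ring.put(b"contact")
# Five bytes are handed out, but only four are marked consumed,
# so the fifth byte is handed out again by the next read.
print(ring.take(5, acknowledge=4) + ring.take(3))
```

With the default acknowledge the same two reads return the original bytes
unchanged, which fits the observation that the duplication is intermittent
and does not show up on every run.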