Mailing List Archive

rancid hangs due to expect, ssh, or cisco?
SunOS netadmin 5.8 Generic_108528-07 sun4u sparc SUNW,UltraSPARC-IIi-cEngine
rancid-2.2.2, expect-5.38, tcl-8.4.1, tk-8.4.1
openssh-3.4p1

Rancid over telnet runs well.

Rancid over ssh hangs, clogin -c "show run" <hostname> hangs, clogin <hostname>
and then "show run" at the enable prompt also hangs. Below is the tail portion
of clogin -c "show run" ecdc2ibgp with expect -d. Pointers will be appreciated.

expect: does "ntp server 158.81.250.130\r\nend\r\n\r\necdc2ibgp#Received disconnect from 158.81.248.251: Time-out activated\r\n" (spawn_id exp4) match regular expression "\u0008+"? no
"^[^\n\r *]*ecdc2ibgp(\([^\r\n]+\))?#"? no
"^[^\n\r]*ecdc2ibgp(\([^\r\n]+\))?#."? no
"[\n\r]+"? yes
expect: set expect_out(0,string) "\r\n"
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) "ntp server 158.81.250.130\r\n"
ntp server 158.81.250.130
expect: continuing expect

expect: does "end\r\n\r\necdc2ibgp#Received disconnect from 158.81.248.251: Time-out activated\r\n" (spawn_id exp4) match regular expression "\u0008+"? no
"^[^\n\r *]*ecdc2ibgp(\([^\r\n]+\))?#"? no
"^[^\n\r]*ecdc2ibgp(\([^\r\n]+\))?#."? no
"[\n\r]+"? yes
expect: set expect_out(0,string) "\r\n\r\n"
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) "end\r\n\r\n"
end

expect: continuing expect

expect: does "ecdc2ibgp#Received disconnect from 158.81.248.251: Time-out activated\r\n" (spawn_id exp4) match regular expression "\u0008+"? no
"^[^\n\r *]*ecdc2ibgp(\([^\r\n]+\))?#"? yes
expect: set expect_out(0,string) "ecdc2ibgp#"
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) "ecdc2ibgp#"
ecdc2ibgp#send: sending "exit\r" to { exp4 }

expect: does "Received disconnect from 158.81.248.251: Time-out activated\r\n" (spawn_id exp4) match glob pattern "Do you wish to save your configuration changes"? no
"\n"? yes
expect: set expect_out(0,string) "\n"
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) "Received disconnect from 158.81.248.251: Time-out activated\r\n"
expect: continuing expect

expect: does "" (spawn_id exp4) match glob pattern "Do you wish to save your configuration changes"? no
"\n"? no
expect: read eof
expect: set expect_out(spawn_id) "exp4"
expect: set expect_out(buffer) ""
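The trace above can be replayed with a rough Python sketch, which is not part of rancid. It assumes Python's `re` treats these four simple patterns the same way Tcl's regexp does. It also mimics expect's rule that the first pattern in list order to match anywhere in the buffer wins, with the buffer consumed through the end of that match. The pattern labels and the `replay` helper are hypothetical.

```python
import re

# Patterns in the order clogin tried them, copied from the expect -d trace.
# Assumption: Python's re handles these the same way Tcl's regexp does.
# The labels on the left are made up for readability.
PATTERNS = [
    ("backspaces", r"\x08+"),
    ("prompt", r"^[^\n\r *]*ecdc2ibgp(\([^\r\n]+\))?#"),
    ("prompt-plus-char", r"^[^\n\r]*ecdc2ibgp(\([^\r\n]+\))?#."),
    ("line-end", r"[\n\r]+"),
]

# Buffer contents at the first "expect: does" line of the trace.
TRACE = ("ntp server 158.81.250.130\r\nend\r\n\r\n"
         "ecdc2ibgp#Received disconnect from 158.81.248.251: "
         "Time-out activated\r\n")


def replay(buffer):
    """Hypothetical helper: return (steps, leftover), where steps lists
    (pattern label, consumed text) up to and including the prompt match."""
    steps = []
    while buffer:
        for name, pattern in PATTERNS:
            match = re.search(pattern, buffer)
            if match:
                # First pattern in list order wins; consume through its end.
                steps.append((name, buffer[:match.end()]))
                buffer = buffer[match.end():]
                break
        else:
            # Nothing matched: real expect would now block until more
            # data arrives, EOF is read, or the timeout expires.
            break
        if steps[-1][0] == "prompt":
            break
    return steps, buffer


for name, text in replay(TRACE)[0]:
    print(f"{name:17} {text!r}")
```

Against this buffer it reports two line-end matches and then the prompt, leaving "Received disconnect from 158.81.248.251: Time-out activated\r\n" unconsumed. In the trace, clogin sends `exit` at that point, and the disconnect text is what its next `expect` reads.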
rancid hangs due to expect, ssh, or cisco?
are you saying that after the last line of the debug output below, you do
not receive a shell prompt? or, are you referring to the 'time-out
activated' that appears in the output?

if the latter, i suspect it is the cisco that is disconnecting and the fix
would be to increase your vty session and/or exec timeouts. the cisco does
not reset its timer on output, only input. it is conceivable that it
could take long enough to generate and display the configuration to activate
the timer.
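A toy Python model of the timer behaviour described above may make the point concrete. It is an assumption-level sketch, not IOS code: only input restarts the idle timer, so a single "show run" followed by a long stream of output can reach the timeout even though the line is busy. The function name and the (time, kind) event format are hypothetical, and times are in seconds.

```python
def idle_timeout_at(events, timeout, end):
    """Return when an input-only idle timer expires, or None if it
    does not expire by time `end`.

    events: (time, kind) pairs sorted by time, kind "input" or "output".
    Only "input" restarts the timer; "output" is ignored.
    """
    last_input = 0.0
    for t, kind in events:
        if t > end:
            break
        if t - last_input >= timeout:
            # The timer ran out before this event arrived.
            return last_input + timeout
        if kind == "input":
            last_input = t
    deadline = last_input + timeout
    return deadline if deadline <= end else None


# "show run" typed at t=0, then output arriving every second.
busy_output = [(0, "input")] + [(t, "output") for t in range(1, 120)]
print(idle_timeout_at(busy_output, timeout=60, end=120))
```

With only output arriving, the model's timer fires at t=60 even though the router is still sending; with periodic "input" events it stays quiet.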

if the former, maybe the expect_before is not remaining active. i would
have to research that, since it works here (sparc, netbsd, exp 5.33,
tcl 8.3.2).

Mon, Dec 23, 2002 at 02:58:26PM -0600, Zhang, Anchi:
> Rancid over ssh hangs, clogin -c "show run" <hostname> hangs, clogin <hostname> and then "show run" at the enable prompt also hangs. Below is the tail portion of clogin -c "show run" ecdc2ibgp with expect -d. Pointers will be appreciated.
rancid hangs due to expect, ssh, or cisco?
I do get the shell prompt back right after the last line of the debug but it
takes a long time to get there. Rancid over ssh does work with some of my
routers but not all. For example, it had worked with one router until I
added a few more lines to its existing ACL.

The strange thing is that if I login using clogin <hostname> and issue "show
run" at the router's command prompt, the display will be fine. However, if
I do "term len 0" and then "show run" the display hangs when it gets close
to the very end of the config.

Anchi
rancid hangs due to expect, ssh, or cisco?
Mon, Dec 23, 2002 at 05:00:29PM -0600, Zhang, Anchi:
> I do get the shell prompt back right after the last line of the debug but it
> takes a long time to get there.

about 45 seconds? that is the timeout period. but, in the debug output
you sent, it should have immediately returned since it matched the prompt
and then EOF.

however, if the output hung prior to receiving the prompt (when we are
not expecting EOF), then it will wait for the timeout period.

> Rancid over ssh does work with some of my routers but not all. For
> example, it had worked with one router until I added a few more lines to
> its existing ACL.

can you share the lines that were added?

> The strange thing is that if I login using clogin <hostname> and issue
> "show run" at the router's command prompt, the display will be fine.
> However, if I do "term len 0" and then "show run" the display hangs when
> it gets close to the very end of the config.

when clogin <hostname> is used, clogin takes care of the login process
and then uses interact. this should in essence (fingers crossed) connect
your terminal directly to the pty until EOF. thus i suspect this is a
cisco bug. try telnet (or ssh) without clogin.
rancid hangs due to expect, ssh, or cisco?
The hang is much longer than 45 seconds:

log2% date; clogin -c "sho run" rri2uunet>/dev/null; date
Thu Dec 26 10:02:58 CST 2002
Thu Dec 26 10:13:01 CST 2002
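To put a number on "much longer", here is a small Python sketch that turns two `date` lines into elapsed seconds. This can be compared with the roughly 45-second timeout mentioned earlier in the thread. The helper name is hypothetical, and it assumes both stamps come from the same time zone, since the zone field is dropped before parsing.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two `date` outputs such as
    'Thu Dec 26 10:02:58 CST 2002'. The zone field is dropped, so both
    stamps are assumed to be in the same zone."""
    def parse(stamp):
        fields = stamp.split()
        del fields[4]  # drop the zone name, e.g. "CST"
        return datetime.strptime(" ".join(fields), "%a %b %d %H:%M:%S %Y")
    return int((parse(end) - parse(start)).total_seconds())


hang = elapsed_seconds("Thu Dec 26 10:02:58 CST 2002",
                       "Thu Dec 26 10:13:01 CST 2002")
print(hang, "seconds, about", round(hang / 45, 1), "times a 45 s timeout")
```

For the run above this gives 603 seconds (10 minutes 3 seconds).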

I changed the line to "set timeout 10" in clogin but noticed no difference in
the hang duration.

The lines below were added to ACL "ip access-list extended ingress", bringing
it to 181 lines:

permit esp any host 158.81.250.11
permit udp any host 158.81.250.11 eq 10000
permit udp any host 158.81.250.11 eq isakmp

I can email you the whole ACL in private if you wish to see it.

What is even more strange is the fact that I have three edge routers each with
the identical ACLs and adding the three lines to the other two routers did not
affect Rancid's access to them.

ssh or telnet access without clogin presents no problem. clogin via ssh is
successful all the time on many routers, once in a while on some routers, and
never on a few others.

Anchi
rancid hangs due to expect, ssh, or cisco?
Thu, Dec 26, 2002 at 10:33:59AM -0600, Zhang, Anchi:
> The hang is much longer than 45 seconds:
>
> log2% date; clogin -c "sho run" rri2uunet>/dev/null; date
> Thu Dec 26 10:02:58 CST 2002
> Thu Dec 26 10:13:01 CST 2002
>
> I changed the line to "set timeout 10" in clogin but noticed no difference
> in the hang duration.

see the -t option.

> The lines below added to ACL "ip access-list extended ingress" to make it
> 181 lines:
>
> permit esp any host 158.81.250.11
> permit udp any host 158.81.250.11 eq 10000
> permit udp any host 158.81.250.11 eq isakmp
>
> I can email you the whole ACL in private if you wish to see it.
>
> What is even more strange is the fact that I have three edge routers each
> with the identical ACLs and adding the three lines to the other two routes
> did not affect Rancid's access to them.
>
> ssh or telnet access without clogin presents no problem. clogin via ssh is
> successful all the time on many routers, once a while on some routers, and
> never on a few others.

is it possible that the version of ios running on those suspect routers
has a bug related to this ACL? try reproducing the problem with the
ACL removed.

otherwise, i'm at a loss. perhaps you can send a successful rancid
collection from one of the misbehaving routers to me directly.
rancid hangs due to expect, ssh, or cisco?
My repeated tests show that the problem is related to the length of the
config. The Rancid collection that I mailed you privately was successful
because I did

UU-Cisco-gw(config)# no ntp server 158.81.250.130

before I ran rancid -d <hostname>. In fact, shortening the config by just one
line, any line, would render Rancid successful.

Anchi
rancid hangs due to expect, ssh, or cisco?
Fri, Dec 27, 2002 at 05:31:42PM -0600, Zhang, Anchi:
> My repeated tests show that the problem is related to the length of the
> config. The Rancid collection that I mailed you privately was successful
> because I did
>
> UU-Cisco-gw(config)# no ntp server 158.81.250.130
>
> before I ran rancid -d <hostname>. In fact, shortening the config by just
> one line, any line, would render Rancid successful.

hmm, i smell crack. could you try the following on the router:

conf t
lin v 0 15
exec-time 0
session-time 0
^Z

then try rancid again.
rancid hangs due to expect, ssh, or cisco?
I tried it and the only difference it seems to have made was to extend the
hang period indefinitely.

With "exec-timeout 1" the hang is about a minute:

log2% date; clogin -c "sho run" ecdc2ibgp>/dev/null; date
Mon Dec 30 17:16:17 CST 2002
Mon Dec 30 17:17:20 CST 2002

Today, I got a successful Rancid collection from a router that had never
collected successfully before, simply by removing a few unimportant lines from
the config. I was also able to cause a failed collection on a router by
simply adding enough lines after

ip access-list extended testing

Anchi
rancid hangs due to expect, ssh, or cisco?
Mon, Dec 30, 2002 at 05:38:25PM -0600, Zhang, Anchi:
> I tried it and the only difference it seems to have made was to extend the hang period indefinitely.
>
> With "exec-timeout 1" the hang is about a minute:
>
> log2% date; clogin -c "sho run" ecdc2ibgp>/dev/null; date
> Mon Dec 30 17:16:17 CST 2002
> Mon Dec 30 17:17:20 CST 2002
>
> Today, I was able to have a successful Rancid collection on a router that I had never been able to just by simply removing a few unimportant lines from the config. I was also able to cause a failed collection on a router by simply adding enough lines after
>
> ip access-list extended testing

could you try the patch for expect that is on www.shrubbery.net/rancid/?
rancid hangs due to expect, ssh, or cisco?
I am experiencing this problem as well, and the patch for expect did
not help in my case. My setup is Cisco routers and a Debian 3.0 installation
using kernel 2.4.18 and expect 5.32.2.

I have for now worked around this by using telnet instead of SSH, but the
problem still remains the same. I am also experiencing this issue sometimes
when I run commands with large output without paging on the routers,
so I do think it might be something in IOS.

Currently playing around to see what I can come up with.

Regards,
Johan
rancid hangs due to expect, ssh, or cisco?
Patch just applied but results remain the same.
rancid hangs due to expect, ssh, or cisco?
> I have for now worked around this by using telnet instead of SSH, but the
> still remains the same. I am also expriencing this issue sometimes when I
> run commands with large output without paging on the routers as well,
> so I do think it might be something in IOS.

i think this happens for me on a non-cisco router with a cisco cli

randy

---

The following routers have not been successfully contacted for
more than 4 hours.
-rw-r----- 1 randy staff 6928 Nov 18 00:15 psg2.psg.com
rancid hangs due to expect, ssh, or cisco?
On Tue, Dec 31, 2002 at 09:43:32AM -0800, Randy Bush wrote:
> i think this happens for me on a non-cisco router with a cisco cli
> ---
> The following routers have not been successfully contacted for
> more than 4 hours.
> -rw-r----- 1 randy staff 6928 Nov 18 00:15 psg2.psg.com

That is a failure to log in or connect to the router.

A rancid hang would get this message to rancid-admin-$GROUP:
Subject: rancid hung - $GROUP

echo hourly config diffs failed: $LOCKFILE exists

and you would have to go kill -9 some of the rancid programs
(typically a hung expect) to get things running again.
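The lockfile behaviour described above can be sketched as a hypothetical Python analogue. This is not rancid's own code, and the function names and lock path are made up for illustration. A run that hangs never reaches `finish_run`, so every later run is refused until the stale lock and the hung process are cleaned up by hand.

```python
import os
import tempfile


def start_run(lockfile):
    """Create the lock atomically. Return None on success, or a
    failure message if an earlier run still holds the lock."""
    try:
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return f"hourly config diffs failed: {lockfile} exists"
    with os.fdopen(fd, "w") as lock:
        lock.write(f"{os.getpid()}\n")  # record which process holds the lock
    return None


def finish_run(lockfile):
    """Release the lock; a hung run never gets here."""
    os.remove(lockfile)


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "example.lock")
    print(start_run(lock))  # first run takes the lock
    print(start_run(lock))  # overlapping run is refused
    finish_run(lock)
```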

--asp
rancid hangs due to expect, ssh, or cisco?
Tue, Dec 31, 2002 at 09:43:32AM -0800, Randy Bush:
> > I have for now worked around this by using telnet instead of SSH, but the
> > still remains the same. I am also expriencing this issue sometimes when I
> > run commands with large output without paging on the routers as well,
> > so I do think it might be something in IOS.
>
> i think this happens for me on a non-cisco router with a cisco cli

almost certainly. if the device_type field of the router.db file is
incorrect, the wrong login script might ("might" because some types
use the same script) be used. it would be nice to merge all the
scripts, but that is difficult, and we don't want to jeopardize their
stability.

but, i am at a loss as to what might be wrong with Johan's or Anchi's
collections. i know of rancid users with both solaris and linux that
have not had problems. what i've seen thus far points to either an
IOS or telnet/ssh/expect problem. someone experiencing the problem will
have to figure it out or one of them will have to offer a login and
router access to me. sorry guys.
rancid hangs due to expect, ssh, or cisco?
My temporary workaround to this problem is

log2# diff clogin clogin.orig
457c457
< send "term length 100\r"
---
> send "term length 0\r"

Anchi
rancid hangs due to expect, ssh, or cisco?
On Tue, Dec 31, 2002 at 02:49:48PM -0600, Zhang, Anchi wrote:
> My temporary workaround to this problem is
>
> log2# diff clogin clogin.orig
> 457c457
> < send "term length 100\r"
> ---
> > send "term length 0\r"

Well that is whacko. Is 'term length 0' not working on your router?
--asp
rancid hangs due to expect, ssh, or cisco?
Yes, 'term len 0' works on my router. Strange as it is, the workaround does
work for me. I was hoping others who have similar problems would try it and
confirm.

Anchi