Mailing List Archive

Rancid hangs
Hi all,

Has anyone seen behavior similar to the following?

Recently, when querying our routers for configurations, clogin seems to
hang while downloading the configuration. For instance, running clogin
manually with the command line:

./clogin -t 90 -c "show version;show install active;show env all;show gsr chassis;show boot;show bootvar;show variables boot;show flash;dir /all nvram:;dir /all bootflash:;dir /all slot0:;dir /all disk0:;dir /all slot1:;dir /all disk1:;show controllers;show controllers cbus;show diagbus;show diag;show module;show c7200;show vtp status;show vlan;write term" <router>

it logs in properly and begins running commands. When it gets to "write
term", it starts printing the configuration but then hangs (the exact
point at which it hangs varies with each run). The do-diffs script (which
also runs the above) hangs on several routers.

I am running Rancid 2.1 and had no issues until the past week or so.
I am using both SSH and telnet as my connection methods...

Thanks!
David LaPorte
Rancid hangs
Have you upgraded expect lately? I had the same problem, and it turned out
that the version of expect was causing the hang. A patch was posted to
this list; I have attached it below:

disclaimer: i am not at all sure that this is the proper way to fix
this problem (where rancid's *logins hang while collecting info from
devices on linux platforms with tcl8.3* and expect 5.32*) or whether it
will have adverse effects on other expect scripts.

what is happening (usually during write term or show config, i.e. the
cisco-ism or the juniper-ism) is this: the last chunk of data before the
prompt has already been read into tcl's internal ("channel") buffer, and
expect asks for more data. instead of reading from that buffer, or
checking whether the file descriptor is actually ready for reading, tcl
just calls read() via expect's ExpInputProc(), where it blocks while the
router sits waiting for input (until the router's session timeout
expires).

i believe this is due to Tcl_WaitForEvent() not properly differentiating
between a timeout and "ready_for_read", but i did not untangle the maze of
callbacks within tcl and don't have time to right now.
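For comparison, here is a minimal sketch of how a program can tell "timed out" from "ready_for_read" before calling read(). It asks the kernel with POSIX poll() and an explicit timeout, so it can never block past that timeout. The names `read_with_timeout()` and `READ_TIMED_OUT` are hypothetical. This is not how expect or Tcl actually implement it. It also only inspects the kernel descriptor: bytes already sitting in Tcl's channel buffer would still have to be checked separately.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sentinel: returned when no data arrived in time. */
#define READ_TIMED_OUT (-2)

/*
 * Hypothetical helper (not part of expect or Tcl): wait at most
 * timeout_ms milliseconds for fd to become readable, then read.
 * Returns the byte count (> 0), 0 on EOF, -1 on error, or
 * READ_TIMED_OUT if the timeout expired first.
 */
ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if a signal interrupts poll(); note this restarts the timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == -1)
        return -1;              /* poll() itself failed */
    if (rc == 0)
        return READ_TIMED_OUT;  /* timeout: fd never became readable */

    /* POLLHUP without POLLIN means EOF; read() will then return 0. */
    if (!(pfd.revents & (POLLIN | POLLHUP)))
        return -1;              /* POLLERR or POLLNVAL */

    /* Data (or EOF) is pending, so this read() does not block. */
    return read(fd, buf, len);
}
```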

the (inefficient) patch below makes sure the file descriptor is set
non-blocking, so the read returns immediately if the fd is not ready for
reading. that gives the tcl timer functions the opportunity to time out
an operation (and, apparently, to look at the internal buffer for more
data). i have no idea why this doesn't happen on (or affect) netbsd.
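The behaviour the patch relies on can be seen in isolation. Once O_NONBLOCK is set, read() on a descriptor with no pending data fails at once with EAGAIN (or EWOULDBLOCK) instead of sleeping until the peer sends something. That returns control to the caller. The helper names `set_nonblocking()`, `read_now()` and `READ_WOULD_BLOCK` are hypothetical. Unlike the one-line patch, this sketch reads the current flags first, because F_SETFL replaces the descriptor's whole set of status flags.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sentinel: the fd has no data available right now. */
#define READ_WOULD_BLOCK (-2)

/*
 * Turn on O_NONBLOCK while keeping the descriptor's other status
 * flags (F_SETFL overwrites them, so fetch them with F_GETFL first).
 * Returns -1 on error.
 */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * One read() on a non-blocking fd. Returns the byte count (> 0),
 * 0 on EOF, -1 on error, or READ_WOULD_BLOCK when nothing is
 * available yet (where a blocking read() would have waited).
 */
ssize_t read_now(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);

    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return READ_WOULD_BLOCK;
    return n;
}
```

A caller that gets READ_WOULD_BLOCK is free to check its own timers and try again later. This is the opportunity the patch gives Tcl's timer functions.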

it works for me with tcl8.3.b2 and expect 5.32.1 on the linux box i have
free access to; at least, do-diffs completed flawlessly 4 times, whereas
before it barely got out of the gate. i think this is Red Hat 6.1 or
so... uname says Linux 2.2.16-22, but i'm guessing that's just the kernel,
and i'm blissfully unaware of where all the other version info is hidden.

you'll have to apply this to your expect 5.32.1 source:

    cd expect-5.32; patch < patchfile; make install

- - - - - - - - - - - - - c u t h e r e - - - - - - - - - - - - - -

*** exp_chan.c.FCS Tue Aug 14 16:55:54 2001
--- exp_chan.c Tue Aug 14 16:59:25 2001
***************
*** 119,124 ****
--- 119,125 ----
* nonblocking, the read will never block.
*/

+ fcntl(esPtr->fdin, F_SETFL, O_NONBLOCK);
bytesRead = read(esPtr->fdin, buf, (size_t) toRead);
/*printf("ExpInputProc: read(%d,,) = %d\r\n",esPtr->fdin,bytesRead);*/
if (bytesRead > -1) {
Rancid hangs
note that patch(1) might get confused with the spacing below. you can
see the correct spacing on www.shrubbery.net/rancid.

Mon, Sep 24, 2001 at 09:57:03AM -0500, Mike Hyde:
> - - - - - - - - - - - - - c u t h e r e - - - - - - - - - - - - - -
>
> *** exp_chan.c.FCS Tue Aug 14 16:55:54 2001
> --- exp_chan.c Tue Aug 14 16:59:25 2001
> ***************
> *** 119,124 ****
> --- 119,125 ----
> * nonblocking, the read will never block.
> */
>
> + fcntl(esPtr->fdin, F_SETFL, O_NONBLOCK);
> bytesRead = read(esPtr->fdin, buf, (size_t) toRead);
> /*printf("ExpInputProc: read(%d,,) = %d\r\n",esPtr->fdin,bytesRead);*/
> if (bytesRead > -1) {