Mailing List Archive

[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578

Summary: SSH freezes on cluster machine
Product: Portable OpenSSH
Version: -current
Platform: ix86
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: ssh
AssignedTo: openssh-bugs@mindrot.org
ReportedBy: andrews@comp.nus.edu.sg


I am a user of a cluster where I test my distributed program, which consists
of several computing "worker" programs running on different nodes. The
workers communicate with one another over TCP on port 9900. I execute
the workers remotely using ssh.

OpenSSH_3.4p1, SSH protocols 1.5/2.0, OpenSSL 0x0090602f

The administrator has recently downgraded ssh to 3.4p1 after 3.5p1
also exhibited the same problem.

The cluster is running Rocks v2.2, Linux kernel 2.4.18-27.7.xsmp

I noticed that some (not all) of the nodes start having problems after I run
my program on them. When I try to ssh to those nodes, ssh freezes.
After a while (usually half a day to one day), ssh to the affected nodes
instead returns the message

ssh_exchange_identification: Connection closed by remote host
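
This message means the connection was closed before the server sent its
identification line, whereas a freeze means nothing came back at all. A minimal
Python sketch for telling the two cases apart from a client is shown here. The
function name probe_banner, the 5-second timeout and the 256-byte read size are
illustrative assumptions, not part of OpenSSH.

```python
import socket


def probe_banner(host, port=22, timeout=5.0):
    """Classify how a TCP service responds before any protocol exchange.

    Returns one of:
      ("banner", text)  - the server sent an identification line
      ("closed", "")    - the connection was accepted, then closed with no data
      ("timeout", "")   - connect or read did not finish within `timeout`
      ("error", msg)    - any other socket error (e.g. connection refused)
    """
    try:
        # The timeout applies to both the connect and the following recv.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)
    except socket.timeout:
        return ("timeout", "")
    except OSError as exc:
        return ("error", str(exc))
    if not data:
        # An empty read means the peer closed without sending anything.
        return ("closed", "")
    return ("banner", data.decode("ascii", errors="replace").strip())
```

For example, probe_banner("compute-0-14") could be run against an affected
node: a "timeout" result would correspond to the freeze, and a "closed" result
to the ssh_exchange_identification message.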

Thank you in advance for any help.


Andrew



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578





------- Additional Comments From andrews@comp.nus.edu.sg 2003-05-28 01:25 -------
Created an attachment (id=313)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=313&action=view)
ssh -vvv compute-0-14

This is what is printed on the screen before ssh hangs
when I run

ssh -vvv compute-0-14

where compute-0-14 is the name of an affected node in the cluster.




[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578





------- Additional Comments From djm@mindrot.org 2003-06-04 19:26 -------
Can you keep non-ssh TCP connections up for similar periods? That is, have you
ruled out network-level issues?

You might also want to set "ClientAliveInterval=120" in sshd_config to work
around these.



[Bug 578] SSH freezes on cluster machine
http://bugzilla.mindrot.org/show_bug.cgi?id=578

andrews@comp.nus.edu.sg changed:

      What|Removed                     |Added
----------------------------------------------------------------------------
    Status|NEW                         |RESOLVED
Resolution|                            |FIXED



------- Additional Comments From andrews@comp.nus.edu.sg 2003-06-04 22:04 -------
Thank you for your help.

The problem was actually caused by remote file access through NFS.
Remote execution of my program on a compute node causes some files
to be updated. I was not aware that the filesystem where these files
reside is mounted through NFS.

It seems that after a remote execution of my program finishes
and ssh returns, sshd on the remote side still has to deal with
the remote file access, which is somehow stuck, and this in turn makes
ssh hang. I have not encountered a similar problem since I removed the
remote file access from my program.
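
One way to check in advance whether a path is on an NFS mount is to look it up
in the kernel's mount table. The following Python sketch does this on Linux,
assuming the usual /proc/mounts format; the function names find_mount and
is_on_nfs are made up for illustration. It deliberately does not resolve
symlinks or stat the path, since touching a stuck NFS mount could itself block.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type) for the longest mount point containing path.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    Mount points containing spaces (escaped as \\040) are not handled.
    """
    path = os.path.abspath(path)  # purely lexical, no filesystem access
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest match; on ties, the later entry wins.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


def is_on_nfs(path, mounts_text=None):
    """True if path lies on a filesystem whose type starts with "nfs"."""
    if mounts_text is None:
        with open("/proc/mounts") as handle:  # Linux-specific
            mounts_text = handle.read()
    mount = find_mount(path, mounts_text)
    return mount is not None and mount[1].startswith("nfs")
```

Calling is_on_nfs() with the path of each file the program writes, before
running it on a compute node, would show which output files go through NFS.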

Andrew


