Now that the list is said to be open again, I'm resending this. I've
merged my changes into OpenSSH 2.1.0 as Kris imported it into FreeBSD over
the weekend.
---------- Forwarded message ----------
Date: Thu, 4 May 2000 08:40:22 -0500 (CDT)
From: Guy Helmer <ghelmer@cs.iastate.edu>
To: openssh-unix-dev@mindrot.org
Subject: OpenSSH (1.2.3) sshd hanging when using rsync over ssh
I have debugged a problem with OpenSSH's sshd (as found in FreeBSD, based
on OpenSSH 1.2.3) that has been bugging me ever since I switched from
ssh-1.2.27.
I use rsync (FreeBSD port ports/net/rsync) over ssh to synchronize and
back up my main home directory and development directories to other
systems. rsync always worked great with ssh-1.2.2[67].
Since I switched my machines to run OpenSSH's sshd, rsync over ssh would
randomly hang (although the hangs were very persistent when synchronizing
large files). I noticed from netstat that the connection to ssh on the
sshd server machine showed waiting data in the Recv-Q, but no waiting data
in the Send-Q, so I decided to look into sshd. I grabbed a core from sshd
when this hang happened, and gdb showed this stack trace:
#0 0x281e20c4 in write () from /usr/lib/libc.so.4
#1 0x804fb18 in process_output (writeset=0xbfbfed04)
at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/serverloop.c:366
#2 0x8050029 in server_loop (pid=43486, fdin_arg=9, fdout_arg=9, fderr_arg=11)
at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/serverloop.c:563
#3 0x8053b60 in do_exec_no_pty (
command=0x80750c0 "rsync --server --sender -vlgtpr --delete . /home/ghelmer/
", pw=0xbfbfef80, display=0x806c0a0 "mocha.cs.iastate.edu:10.0",
auth_proto=0x806c100 "MIT-MAGIC-COOKIE-1",
auth_data=0x8075000 "cdf4b6cb730310be3d51a8abf77303fc")
at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:2211
#4 0x805386c in do_authenticated (pw=0xbfbfef80)
at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:2037
#5 0x80527b4 in do_authentication ()
at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:1408
#6 0x8051b43 in main (ac=1, av=0xbfbff624)
at /usr/src/secure/usr.sbin/sshd/../../../crypto/openssh/sshd.c:970
#7 0x804aae1 in _start ()
The code around frame #1 was
361 {
362 int len;
363
364 /* Write buffered data to program stdin. */
365 if (fdin != -1 && FD_ISSET(fdin, writeset)) {
366 len = write(fdin, buffer_ptr(&stdin_buffer),
367 buffer_len(&stdin_buffer));
368 if (len <= 0) {
369 #ifdef USE_PIPES
370 close(fdin);
and stdin_buffer contains
$2 = {buf = 0x80b1000 "è\004\212D\204úc½", alloc = 45056, offset = 0,
end = 8192}
So, it appears sshd was stuck in a write() that wouldn't complete. (Even
when I kill the ssh client, sshd hangs around and never notices that the
connection has gone away.)
I figured this was probably something that was fixed in ssh-1.2.27, and
sure enough, fdin was set to be nonblocking and errno was checked for the
value EWOULDBLOCK in process_output. I added similar code to
serverloop.c, and now rsync over ssh works great.
I'm worried that my code is tainted, though, since I looked at the
ssh-1.2.27 sources. If you don't think it is a problem, and if you are
interested, I can send you my diffs... I don't have ties to OpenBSD, so
I'm not sure who in particular I should contact about this.
Thanks,
Guy
Guy Helmer, Ph.D. Candidate, Iowa State University Dept. of Computer Science
Research Assistant, Dept. of Computer Science --- ghelmer@cs.iastate.edu
http://www.cs.iastate.edu/~ghelmer