Mailing List Archive

[Bug 948] high CPU in sshd after tcp_wrappers deny
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-15 04:40 -------
(In reply to comment #12)
> Still can't reproduce it. Another thing to try: doing a system call trace the
> cpu-eating sshd and see what it's doing.
>
> That's truss on Solaris and tusc on HP-UX. (tusc is an unsupported tool
> supplied by HP: ftp://ftp.cup.hp.com/dist/networking/tools/)

I haven't found a way to reproduce the bug at will, though I suspect some
aggressive attack script may be involved, because of the timing of when I
started seeing the problems.

The patch doesn't seem to help me, on Solaris 8, either.

I was using CVS snapshorts of OpenSSL and OpenSSH:

openssl-0.9.7-stable-SNAP-20050110
openssh-SNAP-20050204

with this script to run configure:
- - -
#!/usr/bin/csh -vx
rm -f config.cache
set path=(/usr/local/openssl-r/bin $path)
setenv CC "/opt/SUNWspro/bin/cc"
#setenv CFLAGS "-I/usr/local/openssl-r/include -I/usr/local/zlib-1.2.1/include
-I/usr/local/include"
#setenv LFLAGS "-L/usr/local/openssl-r -L/usr/local/zlib-1.2.1/lib -L/usr/local/lib"
setenv CFLAGS "-I/usr/local/openssl-r/include -I/usr/local/include"
setenv LFLAGS "-L/usr/local/openssl-r -L/usr/local/lib"
nohup sh ./configure --prefix=/usr/local/openssh-r \
--sysconfdir=/etc/openssh \
--with-ssl-dir=/usr/local/openssl-r/lib \
--with-prngd-socket=/local-adm-pub/prngd/egd-pool \
--with-tcp-wrappers \
--with-4in6 \
--with-pam \
# --with-zlib=/usr/local/zlib-1.2.1/lib \
--with-xauth=/usr/openwin/bin/xauth \
--with-default-path="/usr/bin:/usr/ucb:/usr/openwin/bin:/usr/local/bin" \
>& my.configure.out.$$ &

#
echo my.configure.out.$$
#
ps -f
#
- - -




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-15 04:51 -------
Created an attachment (id=822)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=822&action=view)
trust output on high CPU process

I tried various truss options, seeing something more verbose/informative.

This is a sample of the output of
truss -f -rall -vall -D -p 4183 | & more

It seems to only be showing calls to read, a few of which take a lot of time,
and most of which don't take much.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-15 05:11 -------
Created an attachment (id=823)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=823&action=view)
ps -ef output for high CPU process used in truss

This is the output of ps -ef on the failing processes in the case I did the
truss on.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-15 05:30 -------
Created an attachment (id=824)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=824&action=view)
corresponding syslog messages from ssh, tcp_wrappers, prngd

These are the messages in syslog from about the time that the ps output seems
to imply the high-cpu processes started. I ran egrep '20:01:3' on the log file.


This file gets messages from prngd, sshd, and tcp_wrappers. I note there's an
error from prngd (which I'm using as a random number source), could that be a
factor in the problem? (I'm using prngd, because this server was a HP-UX box in
a previous life..)

The tcp_wrappers rules are first a number of exception rules of the form:

sshd,in.ftpd: SOME_ADDRESS : rfc931 15 : keepalive : nice 1 : allow

Where SOME_ADDRESS is an IP address, a DNS host address, or a domain
suffix(.foo.example.com) for which we want to allow traffic:

There's a generic allow rule for on-campus traffic:

sshd: .ourdomain.edu : nice 1 : allow

anything else falls thru to a default deny rule:

ALL: ALL : deny

Could the use of rfc931 lookups trigger problems?




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-15 05:40 -------
Created an attachment (id=825)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=825&action=view)
corresponding output of /usr/ucb/ps -auxwww

This is the output of /usr/ucb/ps -auxwww, around the time the problem started,
as captured by a cron job running at 15 minute intervals.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From dtucker@zip.com.au 2005-02-15 09:27 -------
(In reply to comment #14)
> It seems to only be showing calls to read, a few of which take a lot of time,
> and most of which don't take much.

This read loop is probably the cause of the CPU utilization. It would appears
that something, somewhere keeps retrying a read after a descriptor closes.

A couple of suggestions for more getting more info:
Can you use lsof or similar to find out what the descriptor is? (It's 3 in this
case but there's no guarantee it'll be the same, so check with truss first).

Attach a debugger to the looping process and get a backtrace to find out where
it's stuck. With gdb this is something like:
# gdb /path/to/sshd pid
(gdb) backtrace

If you use the sshd before it's stripped (ie from the build dir) then you will
get better info. You don't have to be running the one with the debug symbols,
you can be running the stripped sshd and use the one with the debug symbols with
gdb as long as the two sshd's match.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-19 11:34 -------
I was able to reproduce the problem, by creating a tcp_wrappers deny rule for a
local box, running a test server on a high port, and running the "ab" (apache
benchmark utility) against that port.

In fact it was even easier to mess it up by running "ab" against the port
without the tcp_wrappers deny rule.

This binary was build with the Sun compiler suite, not gcc, so I used the
corresponding "dpx" debugger traceback feature.

It looks like it's looping deep in the random number generation code; several
traces all looked similar.

I suspect the common factor between me and the other case is use of prngd or a
similar daemon, rather than the sun /dev/random patch. This set-up was brought
over from HP-UX which didn't have a /dev/random patch, and the other report is
from HP-UX. Plus, there were syslog errors from prngd.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From atlunde@panix.com 2005-02-19 11:38 -------
Created an attachment (id=830)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=830&action=view)
ps output and dbx traceback

This is the results of running

ab -n 100 -t 30 -c 50 http://HOSTNAME:4022/

on the test server port from a blocked host.

Afterwards I did a ps and a traceback with dbx on one of the runaway processes.




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From dtucker@zip.com.au 2005-02-19 13:55 -------
Created an attachment (id=831)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=831&action=view)
check for closed fd in openssl's RAND_query_egd_bytes

This appears to be a bug in OpenSSL's prngd interface.

I think what is happening is prngd is closing sockets (the "fairness" limits?
if so does the problem occur with prngd-0.9.27?) and OpenSSL does not handle
the the descriptor closing (read() will return zero) and repeatedly retries the
read.

Please try this patch to OpenSSL (against 0.9.7e but may apply to others),
rebuild then rebuild OpenSSH with the new OpenSSL.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948


dtucker@zip.com.au changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #831 is|0 |1
obsolete| |




------- Additional Comments From dtucker@zip.com.au 2005-02-19 18:41 -------
Created an attachment (id=832)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=832&action=view)
a better fix for closed fd in openssl's RAND_query_egd_bytes




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From dtucker@zip.com.au 2005-02-19 18:57 -------
Created an attachment (id=833)
--> (http://bugzilla.mindrot.org/attachment.cgi?id=833&action=view)
hack prngd to provoke failure at will

This is a nasty hack to prngd in case anyone wants to reproduce this: apply to
prngd-0.9.29 build and run "./prngd /var/run/egd-pool", then run "openssl rand
-out /dev/null 512" and watch the CPU burn...

I have submitted this and the fix upstream to OpenSSL's request tracker (via
email, no request number yet).



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From djm@mindrot.org 2005-02-19 19:26 -------
(From update of attachment 832)
> num = read(fd, egdbuf, 1);
>- if (num >= 0)
>+ if (num == 0)
>+ goto err; /* descriptor closed */
>+ else if (num > 0)

What about the (num == -1) case? Does it cope with EINTR and EAGAIN?




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948





------- Additional Comments From dtucker@zip.com.au 2005-02-19 19:33 -------
(In reply to comment #24)
> What about the (num == -1) case? Does it cope with EINTR and EAGAIN?

Yeah, that's already handled a bit further down, just out reach of the context.
The entire block of code (compacted to save space) is:

if (num == 0)
goto err; /* descriptor closed */
else if (num > 0)
numbytes += num;
else {
switch (errno) {
#ifdef EINTR
case EINTR:
#endif
#ifdef EAGAIN
case EAGAIN:
#endif
/* No error, try again */
break;
default:
ret = -1;
goto err; /* failure */
}



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948


dtucker@zip.com.au changed:

What |Removed |Added
----------------------------------------------------------------------------
URL| |http://www.aet.tu-
| |cottbus.de/rt2/Ticket/Displa
| |y.html?id=1014




------- Additional Comments From dtucker@zip.com.au 2005-02-19 21:28 -------
The OpenSSL folks seem to have accepted the bug and applied the patch:
http://www.aet.tu-cottbus.de/rt2/Ticket/Display.html?id=1014

If the patch fixes your problem, please close this bug as it's not a problem
with OpenSSH.





------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 948] high CPU in sshd after tcp_wrappers deny [ In reply to ]
http://bugzilla.mindrot.org/show_bug.cgi?id=948


atlunde@panix.com changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED




------- Additional Comments From atlunde@panix.com 2005-02-26 00:54 -------
With the patch 832 to OpenSSL, test loads that previously produced several
run-away processes, produced no extra lingering processes.

(This was based on the same CVS snapshots of OpenSSL and OpenSSH source used in
my truss and dbx debug tests, which includes the previously suggested patch from
bug 973.)




------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.