Mailing List Archive

Aha! snmp_portscan.nes lockups
All:

Aha! This is so typical: I work on this thing for a few days, think of
dozens of things, build test/debug code, etc., finally post to the
mailing list, and an hour after that I [seemingly] ascertain what the
problem is...

It SEEMS like the problem is a combination of the timeout not killing the
process, and a problem with the nessusd plugin code. To get this problem to
happen, I just set the timeout value of the snmp_portscan plugin to 15
seconds and kicked off a scan. About halfway through the third snmpwalk
execution, the test timed out, and that was where the problem happened. As
I mentioned before, sigterm_handler doesn't keep the latest snmpwalk child
pid, it only keeps the pid from the version discovery execution. Therefore,
this scan could not be killed.
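
To illustrate the pid-tracking point, here is a minimal sketch of recording
the pid of every child launched, not just the first one. It is not the actual
snmp_portscan.c code: current_child, spawn_tracked and wait_tracked are
hypothetical names, and it assumes pid_t fits in sig_atomic_t (true on Linux,
where both are int).

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: pid of the child currently running, or -1 if none. */
static volatile sig_atomic_t current_child = -1;

/* Fork and exec argv, recording the new pid on EVERY launch so that a
 * signal handler always sees the most recent child.
 * Returns the child's pid, or -1 if fork failed. */
static pid_t spawn_tracked(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }
    if (pid > 0)
        current_child = pid;        /* overwrite the previous record */
    return pid;
}

/* Wait for the child and clear the record so that a later signal
 * cannot be sent to a stale pid.
 * Returns the wait status, or -1 if waitpid failed. */
static int wait_tracked(pid_t pid)
{
    int status = 0;
    pid_t r;

    do {
        r = waitpid(pid, &status, 0);
    } while (r < 0 && errno == EINTR);

    current_child = -1;
    return r < 0 ? -1 : status;
}
```

There is still a small window between fork() returning and the assignment to
current_child; blocking SIGTERM around the fork would close it, which this
sketch leaves out for brevity.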

Next, I don't have a clue WHY (yet), but the nessusd child running
snmp_portscan.nes never exits. I ran gdb on this pid, and I got this:

#0 0x420dabc2 in recv () from /lib/i686/libc.so.6
#1 0x400537fa in comm_send_status () from /usr/lib/libnessus.so.2
#2 0x4020cdc6 in plugin_run (desc=0x8123750) at snmp_portscan.c:365
#3 0x080544e0 in nes_thread (args=0x8123750) at nes_plugins.c:310
#4 0x0804ebd1 in create_process (function=0x8054388 <nes_thread>,
    argument=0x8123750) at processes.c:108
#5 0x0805433f in nes_plugin_launch (globals=0x812df88, plugin=0x8123750,
    hostinfos=0x81af520, preferences=0x806e608, kb=0x81c1be0,
    name=0x81c1c6a "", soc=7) at nes_plugins.c:251
#6 0x08059e17 in plugin_launch (globals=0x812df88, plugin=0x8122ad0,
    hostinfos=0x81af520, preferences=0x806e608, key=0x81c1be0,
    name=0x81c1c40 "/usr/lib/nessus/plugins/snmp_portscan.nes",launcher=0xa7f)
    at pluginlaunch.c:503
#7 0x0804bcad in launch_plugin (globals=0x812df88, plugins=0x8122ad0,
    hostname=0xbfffd988 "10.1.2.9", cur_plug=0xbfffd878, num_plugs=1784,
    hostinfos=0x81af520, key=0x81c1be0, new_kb=1) at attack.c:271
#8 0x0804c05e in attack_host (globals=0x812df88, hostinfos=0x81af520,
    hostname=0xbfffd988 "10.1.2.9", sched=0x81591b0) at attack.c:423
#9 0x0804c261 in attack_start (args=0x81af520) at attack.c:524
#10 0x0804ebd1 in create_process (function=0x804c11c <attack_start>,
    argument=0xbfffd970) at processes.c:108
#11 0x0804cb4f in attack_network (globals=0x812df88) at attack.c:820
#12 0x08055267 in server_thread (globals=0x812df88) at nessusd.c:526
#13 0x0804ebd1 in create_process (function=0x8054d88 <server_thread>,
    argument=0x812df88) at processes.c:108
#14 0x080557b7 in main_loop () at nessusd.c:860
#15 0x0805624e in main (argc=0, argv=0xbfffe424, envp=0xbfffe438)
    at nessusd.c:1323
#16 0x420158d4 in __libc_start_main () from /lib/i686/libc.so.6

... SO the child appears to be waiting for the status from the parent
nessusd process. This completely locks up the works - nothing continues
until the child process (the snmp_portscan.nes process) is kill -9'd.

If the sigterm handler is modified so that it kills its own pid after
killing the snmpwalk child (and all the changes that go with that change),
this problem doesn't show up.

I personally would initialize the snmpwalk_process variable to something
like -1 or 0, then check that the value is > 0 before calling kill with it
as an argument. Otherwise, it seems like you'd have a race condition in
which nessusd and everything in its process group would be whacked
(assuming the user cancels the scan at exactly the right moment).

I don't know enough about the nessusd plugin scheduler or the control
connection (where it seems to be locked up) to suggest a definitive fix or
an accurate analysis of the problem, but it seems like nessusd isn't
expecting to have to ack something, but since the plugin isn't killed, it
ends up having to ack it.

So that left me with the problem of trying to figure out how on earth this
was triggered - snmpwalk's default timeout is around 6 seconds, and snmpwalk
is only run 4 times per scan (at least when WIN_INST_SOFT isn't defined).
It's run one more time to get the version, but that usually ends
up running for well under a second. Therefore, my best guess is that these
people were running several gigantic scans at once. The load must've been
extremely high, which caused snmpwalk to take 9+ seconds to execute, which
pushed the execution time over the limit.

So, for a resolution, I can submit my suggested fix (context diff patch?)
but I can't help thinking that there was a reason why the child pids were
ignored. I'll wait to see what other people say prior to submitting any
code.

Thank you,

Brian Costello
btx@calyx.net
Re: Aha! snmp_portscan.nes lockups
On Fri, 7 May 2004, Brian Costello wrote:

> It SEEMS like the problem is a combination of the timeout not killing the
> process, and a problem with the nessusd plugin code. To get this problem to
> happen, I just set the timeout value of the snmp_portscan plugin to 15
> seconds and kicked off a scan. About halfway through the third snmpwalk
> execution, the test timed out, and that was where the problem happened. As
> I mentioned before, sigterm_handler doesn't keep the latest snmpwalk child
> pid, it only keeps the pid from the version discovery execution. Therefore,
> this scan could not be killed.

....

> So, for a resolution, I can submit my suggested fix (context diff patch?)
> but I can't help thinking that there was a reason why the child pids were
> ignored. I'll wait to see what other people say prior to submitting any
> code.

Did you submit the code somewhere? I hit the same issue but without the
coding skills to get this far into analyzing and fixing it.

Hugo.

--
All email sent to me is bound to the rules described on my homepage.
hvdkooij@vanderkooij.org http://hvdkooij.xs4all.nl/
Don't meddle in the affairs of sysadmins,
for they are subtle and quick to anger.