Mailing List Archive

URGENT: Clamd is wedged on multiple installations
Hi,

Something went badly wrong with clamd recently; it's stuck with
hundreds/thousands of open files per process and interrupting mail flow.

When a scanning thread finishes, I see this in the strace output.
(I ran clamdscan /etc/hosts as a test):

[pid 3707] 02:11:01 sendto(295, "/etc/hosts: OK\n", 15, 0, NULL, 0) = 15
[pid 3707] 02:11:01 shutdown(295, SHUT_RDWR) = 0
[pid 3707] 02:11:01 close(295) = 0
[pid 3707] 02:11:01 futex(0x1933c3c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 387, {1516950691, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid 3707] 02:11:31 futex(0x1933c10, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 3707] 02:11:31 madvise(0x7fae6affe000, 8368128, MADV_DONTNEED) = 0
[pid 3707] 02:11:31 _exit(0) = ?
[pid 3707] 02:11:31 +++ exited with 0 +++

So it scans the file, says it's OK, and then hangs in the futex for 30
seconds.
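
For anyone reading a similar trace, here is a minimal sketch of how the stall can be located mechanically. It assumes the per-thread line format shown above ("[pid N] HH:MM:SS syscall(...)") and that each timestamp marks when the call started, so the delay is attributed to the call on the earlier line. The function name syscall_gaps is hypothetical, and the sketch ignores traces that cross midnight.

```python
import re
from datetime import datetime

# Matches lines such as: [pid 3707] 02:11:01 close(295) = 0
LINE = re.compile(r"^\[pid\s+(\d+)\]\s+(\d{2}:\d{2}:\d{2})\s+(\w+)")

def syscall_gaps(lines):
    """Return (syscall, seconds until the same pid's next syscall) pairs."""
    last = {}   # pid -> (timestamp, syscall name) of the previous line
    gaps = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skips "+++ exited +++" and anything unparseable
        pid, ts, name = m.groups()
        t = datetime.strptime(ts, "%H:%M:%S")
        if pid in last:
            prev_t, prev_name = last[pid]
            gaps.append((prev_name, int((t - prev_t).total_seconds())))
        last[pid] = (t, name)
    return gaps

if __name__ == "__main__":
    import sys
    # Print the five longest gaps from a trace given on standard input.
    for name, secs in sorted(syscall_gaps(sys.stdin), key=lambda g: -g[1])[:5]:
        print(f"{secs:4d}s  {name}")
```

Fed the trace above, the largest gap is the 30 seconds spent in the first futex call; every other call returns within the same second.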

HELP! This is causing major outages for many of our customers.

Regards,

Dianne.
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net

http://www.clamav.net/contact.html#ml
Re: URGENT: Clamd is wedged on multiple installations
On 26/01/2018 18:33, Dianne Skoll wrote:
> Hi,
>
> Something went badly wrong with clamd recently; it's stuck with
> hundreds/thousands of open files per process and interrupting mail flow.
>
> When a scanning thread finishes, I see this in the strace output.
> (I ran clamdscan /etc/hosts as a test):
>
> [pid 3707] 02:11:01 sendto(295, "/etc/hosts: OK\n", 15, 0, NULL, 0) = 15
> [pid 3707] 02:11:01 shutdown(295, SHUT_RDWR) = 0
> [pid 3707] 02:11:01 close(295) = 0
> [pid 3707] 02:11:01 futex(0x1933c3c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 387, {1516950691, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 3707] 02:11:31 futex(0x1933c10, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 3707] 02:11:31 madvise(0x7fae6affe000, 8368128, MADV_DONTNEED) = 0
> [pid 3707] 02:11:31 _exit(0) = ?
> [pid 3707] 02:11:31 +++ exited with 0 +++
>
> So it scans the file, says it's OK. and then hangs in the futex for 30
> seconds.
>
> HELP! This is causing major outages for many of our customers.
>
It might be useful to let us know some rather minor and truly
insignificant things, such as what version of ClamAV you are running,
and, perhaps, what Operating System you are running it on?

Alas, I fear that our collective crystal balls are currently cloudy, and
do not supply the information required...

Cheers,
Gary B-)
_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
Dianne Skoll wrote:

> Hi,
>
> Something went badly wrong with clamd recently; it's stuck with
> hundreds/thousands of open files per process and interrupting mail
> flow.
>
> When a scanning thread finishes, I see this in the strace output.
> (I ran clamdscan /etc/hosts as a test):
>
> [pid 3707] 02:11:01 sendto(295, "/etc/hosts: OK\n", 15, 0, NULL, 0) = 15
> [pid 3707] 02:11:01 shutdown(295, SHUT_RDWR) = 0
> [pid 3707] 02:11:01 close(295) = 0
> [pid 3707] 02:11:01 futex(0x1933c3c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 387, {1516950691, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
> [pid 3707] 02:11:31 futex(0x1933c10, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid 3707] 02:11:31 madvise(0x7fae6affe000, 8368128, MADV_DONTNEED) = 0
> [pid 3707] 02:11:31 _exit(0) = ?
> [pid 3707] 02:11:31 +++ exited with 0 +++
>
> So it scans the file, says it's OK. and then hangs in the futex for 30
> seconds.
>
> HELP! This is causing major outages for many of our customers.

We're seeing something similar - this morning with 0.99.2, later with
0.99.3. I am guessing there is something wrong in the database?


--
Per Jessen, Zürich (8.2°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.

_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
On 27/01/2018 00:50, Per Jessen wrote:
> Dianne Skoll wrote:
>
>> Hi,
>>
>> Something went badly wrong with clamd recently; it's stuck with
>> hundreds/thousands of open files per process and interrupting mail
>> flow.
>>
>> When a scanning thread finishes, I see this in the strace output.
>> (I ran clamdscan /etc/hosts as a test):
>>
>> [pid 3707] 02:11:01 sendto(295, "/etc/hosts: OK\n", 15, 0, NULL, 0) = 15
>> [pid 3707] 02:11:01 shutdown(295, SHUT_RDWR) = 0
>> [pid 3707] 02:11:01 close(295) = 0
>> [pid 3707] 02:11:01 futex(0x1933c3c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 387, {1516950691, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
>> [pid 3707] 02:11:31 futex(0x1933c10, FUTEX_WAKE_PRIVATE, 1) = 0
>> [pid 3707] 02:11:31 madvise(0x7fae6affe000, 8368128, MADV_DONTNEED) = 0
>> [pid 3707] 02:11:31 _exit(0) = ?
>> [pid 3707] 02:11:31 +++ exited with 0 +++
>>
>> So it scans the file, says it's OK. and then hangs in the futex for 30
>> seconds.
>>
>> HELP! This is causing major outages for many of our customers.
>
> We're seeing something similar - this morning with 0.99.2, later with
> 0.99.3. I am guessing there is something wrong in the database?
>
Not seeing any problems here, running 0.99.2 on Solaris 11.3, logs say:
--------------------------------------
Jan 26 23:39:45 paranoia freshclam[4271]: [ID 702911 local6.info]
Received signal: wake up
Jan 26 23:39:45 paranoia freshclam[4271]: [ID 702911 local6.info] ClamAV
update process started at Fri Jan 26 23:39:45 2018
Jan 26 23:39:45 paranoia freshclam[4271]: [ID 702911 local6.warning]
Your ClamAV installation is OUTDATED!
Jan 26 23:39:45 paranoia freshclam[4271]: [ID 702911 local6.warning]
Local version: 0.99.2 Recommended version: 0.99.3
Jan 26 23:39:45 paranoia freshclam[4271]: [ID 702911 local6.info] DON'T
PANIC! Read http://www.clamav.net/documents/upgrading-clamav
Jan 26 23:39:46 paranoia freshclam[4271]: [ID 702911 local6.info]
main.cld is up to date (version: 58, sigs: 4566249, f-level: 60,
builder: sigmgr)
Jan 26 23:39:46 paranoia freshclam[4271]: [ID 702911 local6.info]
daily.cld is up to date (version: 24257, sigs: 1835982, f-level: 63,
builder: neo)
Jan 26 23:39:46 paranoia freshclam[4271]: [ID 702911 local6.info]
bytecode.cld is up to date (version: 319, sigs: 75, f-level: 63,
builder: neo)
Jan 26 23:39:46 paranoia freshclam[4271]: [ID 702911 local6.info]
--------------------------------------

The local date and time there is around 20180126T0239 UTC.

I'm debating whether to update to 0.99.3. I get sick of seeing the
compiler warnings, and it takes me a while to decide that they're not a
problem; it's (probably) just sloppy coding.

Cheers,
Gary B-)
_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
Per Jessen wrote:

> Dianne Skoll wrote:
>
>> Hi,
>>
>> Something went badly wrong with clamd recently; it's stuck with
>> hundreds/thousands of open files per process and interrupting mail
>> flow.
>>
>> When a scanning thread finishes, I see this in the strace output.
>> (I ran clamdscan /etc/hosts as a test):
>>
>> [pid 3707] 02:11:01 sendto(295, "/etc/hosts: OK\n", 15, 0, NULL, 0) = 15
>> [pid 3707] 02:11:01 shutdown(295, SHUT_RDWR) = 0
>> [pid 3707] 02:11:01 close(295) = 0
>> [pid 3707] 02:11:01 futex(0x1933c3c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 387, {1516950691, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
>> [pid 3707] 02:11:31 futex(0x1933c10, FUTEX_WAKE_PRIVATE, 1) = 0
>> [pid 3707] 02:11:31 madvise(0x7fae6affe000, 8368128, MADV_DONTNEED) = 0
>> [pid 3707] 02:11:31 _exit(0) = ?
>> [pid 3707] 02:11:31 +++ exited with 0 +++
>>
>> So it scans the file, says it's OK. and then hangs in the futex for
>> 30 seconds.
>>
>> HELP! This is causing major outages for many of our customers.
>
> We're seeing something similar - this morning with 0.99.2, later with
> 0.99.3. I am guessing there is something wrong in the database?

We run our own daemon using the clamav libraries - in /tmp I see
directories like this:

clamav-168beed7241e57cd439e26cd5fd96eac.tmp/

they seem to be piling up and our daemon eventually runs out of file
descriptors.


--
Per Jessen, Zürich (8.1°C)

_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
Gary R. Schmidt wrote:

> It might be useful to let us know some rather minor and truly
> insignificant things, such as what version of ClamAV you are running,
> and, perhaps, what Operating System you are running it on?

ClamAV 0.99.2, various versions of Linux including Debian Jessie x86_64
and Debian Stretch x86_64.

> Alas, I fear that our collective crystal balls are currently cloudy, and
> do not supply the information required...

Sarcasm is immensely appreciated when dealing with a massive shitstorm.

I note that the ClamAV developers have *still* (10:05 Eastern) not
pulled the bad signature and *still* have not posted any sort of announcement.

I am attempting to build 0.99.3 with this patch:

https://gist.github.com/manuelm/dbc94001c77c07363cdcb5b390c2cb04

Regards,

Dianne.
_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
On 2018-01-26 15:26, Per Jessen wrote:
> Per Jessen wrote:
>
>> Dianne Skoll wrote:
>>
>>> Hi,
>>>
>>> Something went badly wrong with clamd recently; it's stuck with
>>> hundreds/thousands of open files per process and interrupting mail
>>> flow.
>>>
>>> When a scanning thread finishes, I see this in the strace output.
>>> (I ran clamdscan /etc/hosts as a test):
>>>
>>> [pid 3707] 02:11:01 sendto(295, "/etc/hosts: OK\n", 15, 0, NULL, 0) = 15
>>> [pid 3707] 02:11:01 shutdown(295, SHUT_RDWR) = 0
>>> [pid 3707] 02:11:01 close(295) = 0
>>> [pid 3707] 02:11:01 futex(0x1933c3c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 387, {1516950691, 0}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
>>> [pid 3707] 02:11:31 futex(0x1933c10, FUTEX_WAKE_PRIVATE, 1) = 0
>>> [pid 3707] 02:11:31 madvise(0x7fae6affe000, 8368128, MADV_DONTNEED) = 0
>>> [pid 3707] 02:11:31 _exit(0) = ?
>>> [pid 3707] 02:11:31 +++ exited with 0 +++
>>>
>>> So it scans the file, says it's OK. and then hangs in the futex for
>>> 30 seconds.
>>>
>>> HELP! This is causing major outages for many of our customers.
>>
>> We're seeing something similar - this morning with 0.99.2, later with
>> 0.99.3. I am guessing there is something wrong in the database?
>
> We run our own daemon using the clamav libraries - in /tmp I see
> directories like this:
>
> clamav-168beed7241e57cd439e26cd5fd96eac.tmp/
>
> they seem to be piling up and our daemon eventually runs out of file
> descriptors.

Same here. clamd 0.99.2

I have clamd processes running on separate servers.
Each running clamd keeps 1024 file descriptors open (all that it can):

# ls -l /proc/29866/fd
total 0
lr-x------ 1 root root 64 Jan 26 15:44 0 -> pipe:[4007352]
l-wx------ 1 root root 64 Jan 26 15:44 1 -> pipe:[4007353]
l-wx------ 1 root root 64 Jan 26 15:44 10 -> pipe:[4006371]
lrwx------ 1 root root 64 Jan 26 15:44 100 ->
/tmp/clamav-10911737c186fa7dc6ef20e590b66015.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1000 ->
/tmp/clamav-66ef5c3821ee0de410a09cb56158637c.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1001 ->
/tmp/clamav-35ef3a074c015870de87bca44ccbae69.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1002 ->
/tmp/clamav-400a243c437eb8f030a095b885586b45.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1003 ->
/tmp/clamav-509983ba9c3f50435f102bfe19b8149b.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1004 ->
/tmp/clamav-018c4e6e767f7d994fb343a4d0dd6a45.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1005 ->
/tmp/clamav-1381c37f81ff0a4426c4648f9b88cdd0.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1006 ->
/tmp/clamav-20bf11a62d664d167ccb743277404413.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1007 ->
/tmp/clamav-da1a0ea777e27fa22156e17c9ae082bd.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1008 ->
/tmp/clamav-ca777e11ae7a58542f5dfa07176ce58f.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1009 ->
/tmp/clamav-e0f872aed2725b9eeae6420ad6f11cb3.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 101 ->
/tmp/clamav-74354d2674a0e4003b6c781e0976f177.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1010 ->
/tmp/clamav-f28e4189d6933861c92a0f714bced25f.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1011 ->
/tmp/clamav-784cdb452ed3fb2acec5c02326cf45e7.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1012 ->
/tmp/clamav-6be671cee5cc4c7f159f88f3582bf745.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1013 ->
/tmp/clamav-55f5e509b4d6c995cda89cf68c02dad4.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1014 ->
/tmp/clamav-c60608f4e98ad8774bf0ced9c7d4c7b3.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1015 ->
/tmp/clamav-848055158702437590e6f30378576dac.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1016 ->
/tmp/clamav-0456073a9443ab3e3f6990306cc82666.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1017 ->
/tmp/clamav-9132c07058a0046bb845ed5887b77b92.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1018 ->
/tmp/clamav-9ba25f2151327a7daaf44914467d9fb2.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1019 ->
/tmp/clamav-55ac8c33daaf55d3dda527f55bfc71b3.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 102 ->
/tmp/clamav-9f49783a5c8797427c5ba00693c2c103.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1020 ->
/tmp/clamav-ca3051fda967d90873121fcc62023102.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1021 ->
/tmp/clamav-6b38a0aa9921810680a2b7dbcb1b1405.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1022 ->
/tmp/clamav-bb3ecaa7e3702cf0e828709df363a04d.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 1023 ->
/tmp/clamav-4da381032e4a1f382b67e234c33d8003.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 103 ->
/tmp/clamav-4646c34f92eb8dec155b96408a325e80.tmp (deleted)
lrwx------ 1 root root 64 Jan 26 15:44 104 ->
/tmp/clamav-179e91d62f159efb7d8c278a3724c17c.tmp (deleted)
...

clamd can't accept any more connections and is logging "-> ERROR: accept()
failed:".
The problem started today at about 6 am CET (with higher mail traffic).
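
A rough way to watch for this without reading the whole listing is to count the descriptors directly. The sketch below is only an illustration: the function name fd_usage is hypothetical, the "/tmp/clamav-" prefix is taken from the listing above, and it relies on Linux procfs marking unlinked targets with a trailing " (deleted)". Inspecting another user's process needs the same privileges as the ls command above.

```python
import os

def fd_usage(pid="self", prefix="/tmp/clamav-"):
    """Return (open descriptors, descriptors on unlinked files under prefix).

    Returns None when /proc/<pid>/fd cannot be read (no procfs, no such
    process, or insufficient permissions).
    """
    fd_dir = f"/proc/{pid}/fd"
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return None
    total = deleted = 0
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
        total += 1
        if target.startswith(prefix) and target.endswith(" (deleted)"):
            deleted += 1
    return total, deleted

if __name__ == "__main__":
    import sys
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(pid, fd_usage(pid))
```

Comparing the first number against the process's open-file limit (1024 in the case above) shows how close the daemon is to refusing connections.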


Regards,
Jacek
_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
Jacek Zapała wrote:

> Same here. clamd 0.99.2
>
> I have clamd processes running on separate servers.
> Each running clamd keeps 1024 file descriptors open (all that it
> can):
>
> # ls -l /proc/29866/fd
> total 0
> lr-x------ 1 root root 64 Jan 26 15:44 0 -> pipe:[4007352]
> l-wx------ 1 root root 64 Jan 26 15:44 1 -> pipe:[4007353]
> l-wx------ 1 root root 64 Jan 26 15:44 10 -> pipe:[4006371]
> lrwx------ 1 root root 64 Jan 26 15:44 100 ->
> /tmp/clamav-10911737c186fa7dc6ef20e590b66015.tmp (deleted)
> lrwx------ 1 root root 64 Jan 26 15:44 1000 ->
> /tmp/clamav-66ef5c3821ee0de410a09cb56158637c.tmp (deleted)
> lrwx------ 1 root root 64 Jan 26 15:44 1001 ->
> /tmp/clamav-35ef3a074c015870de87bca44ccbae69.tmp (deleted)
> lrwx------ 1 root root 64 Jan 26 15:44 1002 ->
> /tmp/clamav-400a243c437eb8f030a095b885586b45.tmp (deleted)
> lrwx------ 1 root root 64 Jan 26 15:44 1003 ->
> /tmp/clamav-509983ba9c3f50435f102bfe19b8149b.tmp (deleted)
> lrwx------ 1 root root 64 Jan 26 15:44 1004 ->
[snip]


In case this helps anyone, we have worked around this by having our
daemon check /proc/self/fd after each scan and look for entries marked
"(deleted)". File descriptors that match are then closed.
I appreciate this won't directly help much, but maybe it'll inspire
someone?
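
The idea can be sketched roughly as follows. This is not our daemon's code, just a minimal Python illustration of the same approach; the function name close_deleted_fds and the "/tmp/clamav-" prefix filter are assumptions, and it only works where Linux procfs is mounted. Closing descriptors behind a library's back is a blunt instrument: it is only safe when no scan is in progress, since a number closed here could be reused or closed again by the library.

```python
import os

def close_deleted_fds(prefix="/tmp/clamav-", fd_dir="/proc/self/fd"):
    """Close our own descriptors that point at unlinked files under prefix.

    Returns the list of descriptor numbers that were closed.
    """
    closed = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return closed  # no procfs available; nothing we can do
    for name in names:
        fd = int(name)
        if fd <= 2:
            continue  # never touch stdin/stdout/stderr
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # already gone (e.g. the descriptor used for listing)
        # The kernel appends " (deleted)" to the target of an unlinked file.
        if target.startswith(prefix) and target.endswith(" (deleted)"):
            try:
                os.close(fd)
            except OSError:
                continue
            closed.append(fd)
    return closed
```

Calling close_deleted_fds() after each scan returns the descriptors it reclaimed, which is also handy for logging how fast the leak is growing.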


--
Per Jessen, Zürich (8.1°C)
http://www.cloudsuisse.com/ - your owncloud, hosted in Switzerland.

_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
On Fri, Jan 26, 2018 at 04:51:26PM +0100, Per Jessen wrote:
>Jacek Zapała wrote:
>
>[snip]
>
>
>In case this helps anyone, we have worked around this by letting our
>daemon check /proc/self/fd after each scan and look for "(deleted)".
>File descriptors that match are then closed.
>I appreciate this won't directly help much, but maybe it'll inspire
>someone ?


We have found the signature causing the issue, have dropped it, and are building a new daily right now. This should resolve the immediate issue. We will look at handling the cause of the issue in the ClamAV engine in an upcoming version.

--
Joel Esler
Manager
Open Source, Design, Web, and Education
Talos Group
http://www.talosintelligence.com
_______________________________________________
Re: URGENT: Clamd is wedged on multiple installations
Joel Esler wrote:

> On Fri, Jan 26, 2018 at 04:51:26PM +0100, Per Jessen wrote:
>>Jacek Zapała wrote:
>>
>>[snip]
>>
>>
>>In case this helps anyone, we have worked around this by letting our
>>daemon check /proc/self/fd after each scan and look for "(deleted)".
>>File descriptors that match are then closed.
>>I appreciate this won't directly help much, but maybe it'll inspire
>>someone ?
>
>
> We have found the signature causing the issue, have dropped it, and
> are building a new daily right now. This should resolve the immediate
> issue.

Thanks for the update Joel, that's great!



--
Per Jessen, Zürich (8.1°C)
http://www.hostsuisse.com/ - virtual servers, made in Switzerland.

_______________________________________________