[clamav-users] Extremely slow PDF file scanning
Hello,

I'm investigating why it takes about five minutes for ClamAV 0.104.0 to scan a PDF file. Can someone help me, please? It looks like some sort of format/parsing defect to me, because the issue is not reproducible if I modify the file using pdftk, e.g. by adding a page in front.

Additional information below.

Execution time:

root@da6a7952db76:/tmp# time echo "nSCAN /tmp/1.pdf" | nc localhost 3310
/tmp/1.pdf: OK

real 4m53.823s
user 0m0.001s
sys 0m0.006s
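For repeatable timings outside an interactive shell, the same request can be sent from a script. The leading `n` in `nSCAN` tells clamd that the command is newline-terminated, and SCAN asks clamd to scan a path on the clamd host itself. A minimal Python sketch, assuming clamd listens on localhost:3310 as in the config below and can read the file; `build_command`, `parse_reply` and `timed_scan` are hypothetical helper names:

```python
import socket
import time


def build_command(command: str, argument: str = "") -> bytes:
    """Build an 'n'-prefixed (newline-terminated) clamd command."""
    text = f"n{command} {argument}" if argument else f"n{command}"
    return (text + "\n").encode()


def parse_reply(reply: bytes) -> tuple[str, str]:
    """Split a single-line reply such as b"/tmp/1.pdf: OK\\n" into (path, verdict)."""
    path, _, verdict = reply.decode().strip().rpartition(": ")
    return path, verdict


def timed_scan(path: str, host: str = "localhost", port: int = 3310,
               timeout: float = 900.0) -> tuple[str, str, float]:
    """Ask clamd to SCAN a path on the clamd host; return (path, verdict, seconds).

    The timeout is deliberately longer than the 10-minute MaxScanTime below.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_command("SCAN", path))
        chunks = []
        # clamd closes the connection after answering a single (non-session) command.
        while data := sock.recv(4096):
            chunks.append(data)
    elapsed = time.monotonic() - start
    scanned, verdict = parse_reply(b"".join(chunks))
    return scanned, verdict, elapsed
```

Against the setup described here, `timed_scan("/tmp/1.pdf")` should return the path, the verdict (`OK` in the run above) and a wall-clock time comparable to the `real` figure reported by `time`.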

File: https://storage.googleapis.com/upload-samples/Museum_26MB.pdf
Config file: https://storage.googleapis.com/upload-samples/clamd.conf
Debug log: https://storage.googleapis.com/upload-samples/debug.log
Extracted file size data: https://storage.googleapis.com/upload-samples/files.log

Thanks,
Nikolay


________________________________

CONFIDENTIALITY NOTICE: This e-mail and any files attached may contain confidential information of Five9 and/or its affiliated entities. Access by the intended recipient only is authorized. Any liability arising from any party acting, or refraining from acting, on any information contained in this e-mail is hereby excluded. If you are not the intended recipient, please notify the sender immediately, destroy the original transmission and its attachments and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Copyright in this e-mail and any attachments belongs to Five9 and/or its affiliated entities.
Re: [clamav-users] Extremely slow PDF file scanning
Hi there,

On Fri, 24 Sep 2021, Nikolay Belaevski via clamav-users wrote:

> I'm investigating why it takes about five minutes for ClamAV 0.104.0
> to scan PDF file. Can someone help me, please?

There's a note in NEWS.md that a fault in PDF scanning was fixed in
version 0.104. I don't know if it's relevant but it might be worth a
look. More importantly did you read the warnings in the documentation
about the settings that you've changed?

Snippets from 'clamconf -n' reporting about your clamd.conf, with my
comments added below each item:

8<----------------------------------------------------------------------

TCPSocket = "3310"
# TCP sockets are unprotected from remote access. Careful!

DisableCache = "yes"
# This will cause clamd to scan the file every time it sees it, and not
# to use its internal caching.

Foreground = "yes"
# I guess there's a reason for this?

Debug = "yes"
# This may affect performance, but I would not expect gross effects.

MaxScanTime = "600000"
# Ten minutes! *Fifty* times the default of 12 seconds. According to
# the man page, excessive times can cause DOS conditions.

MaxScanSize = "4194304000"
# 4GBytes. *Forty* times the default of 100M, and in the clamd.conf
# man page there is a dire warning about setting this limit too high.
# I'm also not sure how the absolute 2G limit on file size impinges.

MaxFileSize = "52428800"
# Twice the default. Another dire warning in the man page.

MaxFiles = "50000"
# Five times the default. Dire warning.

PCREMaxFileSize = "104857600"
# Four times the default. Specific warning about performance.

8<----------------------------------------------------------------------
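Ged's multipliers can be rechecked mechanically from `clamconf -n` output in the `Key = "value"` form shown above. A minimal Python sketch; the baselines are the defaults Ged cites for the four size/count limits, treated here as assumptions to confirm against clamd.conf(5) for your ClamAV version, and the function names are hypothetical:

```python
import re

# Assumed baselines: the 0.104 defaults cited in the message above.
# Verify them against `man clamd.conf` for the version you actually run.
DEFAULTS = {
    "MaxScanSize": 100 * 1024 * 1024,      # 100M
    "MaxFileSize": 25 * 1024 * 1024,       # 25M
    "MaxFiles": 10000,
    "PCREMaxFileSize": 25 * 1024 * 1024,   # 25M
}

# A setting line such as: MaxFiles = "50000"
SETTING = re.compile(r'^(\w+) = "([^"]*)"$')


def parse_clamconf(text: str) -> dict[str, str]:
    """Collect Key = "value" lines; comment lines and anything else are ignored."""
    settings = {}
    for raw in text.splitlines():
        match = SETTING.match(raw.strip())
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def multipliers(settings: dict[str, str]) -> dict[str, float]:
    """Ratio of each configured limit to its assumed default."""
    return {
        key: int(settings[key]) / base
        for key, base in DEFAULTS.items()
        if key in settings
    }
```

Fed the four size/count lines quoted above, `multipliers(parse_clamconf(text))` returns 40.0, 2.0, 5.0 and 4.0, in line with Ged's "Forty", "Twice", "Five" and "Four".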

I think you need to look carefully at the configuration changes which
you've made, perhaps do some testing to establish whether your system
can support scanning with those changes under conditions of plausible
stress.

Under normal circumstances my systems wouldn't scan the file you've
provided - I only use ClamAV to scan mail, and if this appeared in our
mail it would be rejected on principle by inspection of the message
parts. When I scanned it manually, with our usual configuration, it
just gave a warning about a limit being exceeded after 1m 54s of
scanning time. In my experience that's a longish but not an extreme
scan time. We log the scan time for most scans. Below is an extract
from our mail log, listing all the scan times this month which were
longer than 10 seconds. As you can see some of the scans of rather
small amounts of data took considerably longer than one might expect
given the performance of other scans of much larger amounts - the two
scans of more than 200kbytes of data performed this month each took a
shade over 30 seconds. I haven't really investigated scan times here
as they're well under anything which might cause problems.

Sep 1 19:30:15 mail6 xm 10.249s, 3339 bytes
Sep 1 20:04:29 mail6 xm 10.122s, 50390 bytes
Sep 9 11:42:33 mail6 xm 13.563s, 3337 bytes
Sep 9 21:56:56 mail6 x3 11.716s, 622 bytes
Sep 9 23:29:33 mail6 xm 10.355s, 3338 bytes
Sep 10 06:32:42 mail6 xm 10.153s, 3337 bytes
Sep 10 06:48:42 mail6 x3 12.189s, 853 bytes
Sep 10 21:14:38 mail6 xm 35.126s, 260193 bytes
Sep 10 23:24:55 mail6 xm 32.949s, 218614 bytes
Sep 15 16:10:04 mail6 x3 10.004s, 118 bytes
Sep 15 21:39:11 mail6 xm 11.149s, 53899 bytes
Sep 16 05:42:49 mail6 x3 15.139s, 175 bytes
Sep 16 05:43:09 mail6 x3 10.960s, 46 bytes
Sep 16 05:43:19 mail6 x3 16.012s, 195 bytes
Sep 16 05:43:46 mail6 x3 10.853s, 195 bytes
Sep 16 05:44:09 mail6 x3 16.696s, 1370 bytes
Sep 16 05:45:22 mail6 x3 12.048s, 950 bytes
Sep 16 05:48:41 mail6 x3 11.545s, 596 bytes
Sep 16 05:49:19 mail6 x3 12.399s, 1115 bytes
Sep 16 05:50:35 mail6 x3 10.489s, 474 bytes
Sep 16 10:49:13 mail6 x3 12.740s, 1147 bytes
Sep 18 05:55:10 mail6 x3 16.392s, 861 bytes
Sep 18 08:54:53 mail6 x3 10.325s, 861 bytes
Sep 18 18:29:43 mail6 xm 12.694s, 3342 bytes
Sep 18 20:36:44 mail6 xm 10.644s, 3340 bytes
Sep 18 23:32:09 mail6 x3 10.463s, 45 bytes
Sep 19 04:14:03 mail6 x3 15.265s, 950 bytes
Sep 19 06:29:11 mail6 x3 10.973s, 45 bytes
Sep 21 19:51:45 mail6 x3 22.834s, 712 bytes
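The observation that some tiny messages took longer than much larger ones is easier to see with a per-byte figure. A minimal Python sketch, assuming the log lines keep exactly the `<date> <host> <process> <seconds>s, <bytes> bytes` layout shown above; the names are hypothetical:

```python
import re
from typing import Iterable, Iterator, NamedTuple

# One log line, e.g. "Sep 10 21:14:38 mail6 xm 35.126s, 260193 bytes".
LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) (?P<proc>\S+) "
    r"(?P<secs>\d+(?:\.\d+)?)s, (?P<size>\d+) bytes$"
)


class Scan(NamedTuple):
    stamp: str
    proc: str
    seconds: float
    size: int


def parse(lines: Iterable[str]) -> Iterator[Scan]:
    """Yield a Scan for every line in the layout above; other lines are skipped."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            yield Scan(match["stamp"], match["proc"],
                       float(match["secs"]), int(match["size"]))


def slowest_per_byte(scans: Iterable[Scan], top: int = 3) -> list[Scan]:
    """Rank scans by seconds spent per byte scanned, worst first."""
    return sorted(scans, key=lambda s: s.seconds / max(s.size, 1),
                  reverse=True)[:top]
```

Run over the extract above, the 45- and 46-byte messages come out on top, far ahead of the two scans of more than 200 kB, which makes the point that scan time here is not proportional to size.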

These were all scanned using a TCP connection to a remote scanner on
the same (1Gbit/s Ethernet) network. The processes 'xm' and 'x3' are
two of the milters which our mail servers run, they pass data directly
to a remote clamd over TCP as I've said - we don't use clamav-milter.
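The message doesn't say which clamd command the milters use. For data held on another machine, the clamd command intended for this is INSTREAM: the client sends the command, then the data as chunks each prefixed by its length as a 4-byte unsigned integer in network byte order, then a zero-length chunk to finish. A minimal Python sketch of that framing, assuming a clamd reachable over TCP on port 3310 and a payload within clamd's `StreamMaxLength`; `frame_instream` and `scan_bytes` are hypothetical names, not part of any ClamAV library:

```python
import socket
import struct


def frame_instream(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    """Encode data as a 'z'-prefixed INSTREAM request ending in a zero-length chunk."""
    parts = [b"zINSTREAM\0"]
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Length prefix: 4-byte unsigned integer, network (big-endian) byte order.
        parts.append(struct.pack("!I", len(chunk)) + chunk)
    parts.append(struct.pack("!I", 0))  # zero-length chunk terminates the stream
    return b"".join(parts)


def scan_bytes(data: bytes, host: str, port: int = 3310,
               timeout: float = 120.0) -> str:
    """Send data to a remote clamd and return its reply, e.g. 'stream: OK'."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(frame_instream(data))
        reply = b""
        # With the 'z' prefix, clamd terminates its reply with a NUL byte.
        while not reply.endswith(b"\0"):
            more = sock.recv(4096)
            if not more:
                break
            reply += more
    return reply.rstrip(b"\0").decode()
```

The whole payload is framed in memory here, which suits mail-sized data; for a large file such as the PDF in this thread it would be better to read and send chunks directly from the file.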

HTH

--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Re: [clamav-users] Extremely slow PDF file scanning
Hi Ged,

Thank you for your response! You are right, the configuration below is not secure, and thanks again for pointing this out! For production we’ll use a different configuration that simply issues an alert for the provided file. The one attached here is more or less a copy of the default configuration, with the limits increased to extremely high values in an attempt to get the scan to complete, and then to tweak them down to see what exactly is causing the alert on the file.

The release notes for 0.104.0 mention “Fixed bytecode match evaluation for PDF bytecode hooks in PDF file scans.” It looks like something that’s been fixed, yes. Out of curiosity I tried disabling bytecode, and the scan completes faster, of course, but AFAIK it must be enabled for best results.

Looking at the total size of the temporary files created, I see that the scan produced 1.7 GB of temporary files extracted from the PDF, which may explain why scanning takes so long. For comparison, when I extract all images from the file using the pdfimages program, I see about 21 MB. As another comparison, when I use pdftk to concatenate the original file with a simple one-page text file and scan the result, the scan takes just two seconds and produces about 6 MB of temporary files. In other words, a slightly larger input scans much faster! That’s why I suspect there may be something in the original file that confuses the ClamAV parser/analyzer, and it would be worth the ClamAV developers checking. On a side note, I have tried a few large PDF files from the Google scanned library with no issues at all; they are scanned quickly.
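To reproduce a "total size of temporary files" measurement, clamd can be told to keep its temporary files (the `LeaveTemporaryFiles` and `TemporaryDirectory` options in clamd.conf), and the resulting tree summed after the scan. A minimal Python sketch; the directory and file paths are whatever you configured, and `tree_size` and `expansion_ratio` are hypothetical names:

```python
import os


def tree_size(root: str) -> int:
    """Total size in bytes of regular files under root; symlinks are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def expansion_ratio(extracted_root: str, original_file: str) -> float:
    """Bytes extracted per byte of input (keep original_file outside extracted_root)."""
    return tree_size(extracted_root) / os.path.getsize(original_file)
```

With the figures from this message, 1.7 GB of extracted data against about 21 MB of images from pdfimages is a factor of roughly 80.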

Best regards,
Nikolay

From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of G.W. Haywood via clamav-users <clamav-users@lists.clamav.net>
Date: Saturday, September 25, 2021 at 10:44
To: Nikolay Belaevski via clamav-users <clamav-users@lists.clamav.net>
Cc: G.W. Haywood <clamav@jubileegroup.co.uk>
Subject: Re: [clamav-users] Extremely slow PDF file scanning
Re: [clamav-users] Extremely slow PDF file scanning
Any feedback from the ClamAV developers, please: should I open a defect for the problem, or is it expected that PDF file scanning takes a few minutes?

Thanks,
Nikolay

From: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Date: Saturday, September 25, 2021 at 11:32
To: ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [clamav-users] Extremely slow PDF file scanning
Re: [clamav-users] Extremely slow PDF file scanning
Hi Nikolay,

Sorry this slipped by me. I'd be happy to take a look at the PDF you were having scan speed issues with. I see that it's no longer available with the URL you originally provided. If you could share it again, I'll spend some time with it to try to see what's going on.

As a heads up, we have a couple of patch versions coming out tomorrow which I hope will show some scan time and detection improvements as a result of work overhauling some scan recursion and embedded file type detection logic. I don't expect it will help in this particular case, but ... maybe! *shrugs*

Regards,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
________________________________
From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of Nikolay Belaevski via clamav-users <clamav-users@lists.clamav.net>
Sent: Monday, October 4, 2021 6:03 PM
To: ClamAV users ML <clamav-users@lists.clamav.net>
Cc: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Subject: Re: [clamav-users] Extremely slow PDF file scanning


Re: [clamav-users] Extremely slow PDF file scanning
Hi Micah,

Thank you very much for your attention to this matter! I have re-shared the files; the original links should work now:

File: https://storage.googleapis.com/upload-samples/Museum_26MB.pdf
Config file: https://storage.googleapis.com/upload-samples/clamd.conf
Debug log: https://storage.googleapis.com/upload-samples/debug.log
Extracted file size data: https://storage.googleapis.com/upload-samples/files.log

I will try patched versions when they are available.

Best regards,
Nikolay


From: Micah Snyder (micasnyd) <micasnyd@cisco.com>
Date: Tuesday, November 2, 2021 at 17:30
To: ClamAV users ML <clamav-users@lists.clamav.net>
Cc: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Subject: Re: [clamav-users] Extremely slow PDF file scanning
Hi Nikolay,

Sorry this slipped by me. I'd be happy to take a look at the PDF you were having scan speed issues with. I see that it's no longer available with the URL you originally provided. If you could share it again, I'll spend some time with it to try to see what's going on.

As a heads up, we have couple patch versions coming out tomorrow which I hope will show some scan time improvements and detection improvements as a result of work overhauling some scan recursion and embedded file type detection logic. I don't expect it will help in this particular case, but ... maybe! *shrugs*

Regards,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
________________________________
From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of Nikolay Belaevski via clamav-users <clamav-users@lists.clamav.net>
Sent: Monday, October 4, 2021 6:03 PM
To: ClamAV users ML <clamav-users@lists.clamav.net>
Cc: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Subject: Re: [clamav-users] Extremely slow PDF file scanning


Any feedback from ClamAV developers, please: should I open a defect for this problem, or is it expected that PDF file scanning takes a few minutes?

Thanks,
Nikolay



From: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Date: Saturday, September 25, 2021 at 11:32
To: ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [clamav-users] Extremely slow PDF file scanning

Hi Ged,

Thank you for your response! You are right, the configuration below is not secure, and thanks again for pointing this out! For production we’ll use a different configuration that simply issues an alert for the provided file. The one attached here is more or less a copy of the default configuration, with the limits increased to extremely high values in an attempt to get the scan to complete, and then to tweak them down to see what exactly is causing the alert on the file.

Release notes for 0.104.0 mention “Fixed bytecode match evaluation for PDF bytecode hooks in PDF file scans.” So it looks like something has been fixed there, yes. Out of curiosity I tried disabling bytecode, and the scan does complete faster, of course, but AFAIK it must be enabled for best results.

Looking at the total size of the temporary files created, I see that the scan produced 1.7 GB of temporary files extracted from the PDF, which may explain why scanning takes so long. For comparison, when I extract all images from the file using the pdfimages program, I see about 21 MB. As another comparison, when I use pdftk to concatenate the original file with a simple one-page text file and scan the result, the scan takes just two seconds and produces only about 6 MB of temporary files. In other words, for a slightly larger input the scan time is much better! That’s why I suspect there may be something in the original file that confuses the ClamAV parser/analyzer, and it would be worth ClamAV developers checking. On a side note, I have tried a few large PDF files from Google’s scanned library; there were no issues at all, and they were scanned quickly.
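The message does not say how the 1.7 GB total was collected. One way to repeat that kind of measurement, assuming the extracted files are kept on disk (for example with `LeaveTemporaryFiles yes` and a dedicated, initially empty `TemporaryDirectory` in clamd.conf), is to total every file under that directory. The function name below is hypothetical and only a sketch:

```python
import os


def total_bytes(root: str) -> tuple[int, int]:
    """Return (file_count, total_size_in_bytes) for files under root."""
    count = 0
    size = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so no file is counted twice or outside root.
            if os.path.islink(path):
                continue
            try:
                size += os.path.getsize(path)
            except OSError:
                # A file may disappear while the scanner is still cleaning up.
                continue
            count += 1
    return count, size
```

For example, `count, size = total_bytes("/var/tmp/clamav-scratch")` (a hypothetical path) followed by `print(f"{count} files, {size / 1024**2:.1f} MiB")` gives a figure comparable to the 1.7 GB, 21 MB and 6 MB numbers above.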



Best regards,

Nikolay



From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of G.W. Haywood via clamav-users <clamav-users@lists.clamav.net>
Date: Saturday, September 25, 2021 at 10:44
To: Nikolay Belaevski via clamav-users <clamav-users@lists.clamav.net>
Cc: G.W. Haywood <clamav@jubileegroup.co.uk>
Subject: Re: [clamav-users] Extremely slow PDF file scanning

Hi there,

On Fri, 24 Sep 2021, Nikolay Belaevski via clamav-users wrote:

> I'm investigating why it takes about five minutes for ClamAV 0.104.0
> to scan PDF file. Can someone help me, please?

There's a note in NEWS.md that a fault in PDF scanning was fixed in
version 0.104. I don't know if it's relevant but it might be worth a
look. More importantly did you read the warnings in the documentation
about the settings that you've changed?

Snippets from 'clamconf -n' reporting about your clamd.conf, with my
comments added below each item:

8<----------------------------------------------------------------------

TCPSocket = "3310"
# TCP sockets are unprotected from remote access. Careful!

DisableCache = "yes"
# This will cause clamd to scan the file every time it sees it, and not
# to use its internal caching.

Foreground = "yes"
# I guess there's a reason for this?

Debug = "yes"
# This may affect performance, but I would not expect gross effects.

MaxScanTime = "600000"
# Ten minutes! *Fifty* times the default of 12 seconds. According to
# the man page, excessive times can cause DOS conditions.

MaxScanSize = "4194304000"
# 4GBytes. *Forty* times the default of 100M, and in the clamd.conf
# man page there is a dire warning about setting this limit too high.
# I'm also not sure how the absolute 2G limit on file size impinges.

MaxFileSize = "52428800"
# Twice the default. Another dire warning in the man page.

MaxFiles = "50000"
# Five times the default. Dire warning.

PCREMaxFileSize = "104857600"
# Four times the default. Specific warning about performance.

8<----------------------------------------------------------------------

I think you need to look carefully at the configuration changes which
you've made, perhaps do some testing to establish whether your system
can support scanning with those changes under conditions of plausible
stress.
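The multipliers in the comments above can be checked mechanically. This is a minimal sketch that parses `Name = "value"` lines of the kind quoted from `clamconf -n` and divides each configured limit by a default. The defaults table is an assumption reconstructed from the comments themselves: 100M for MaxScanSize as stated, and 25M, 10000 and 25M implied by the "twice", "five times" and "four times" remarks. MaxScanTime is left out, so compare it against the default documented in the clamd.conf man page for your release.

```python
import re

# Defaults as implied by the comments above; verify them against
# `man clamd.conf` for the ClamAV release you actually run.
DEFAULTS = {
    "MaxScanSize": 100 * 1024 * 1024,     # 100M
    "MaxFileSize": 25 * 1024 * 1024,      # 25M
    "MaxFiles": 10000,
    "PCREMaxFileSize": 25 * 1024 * 1024,  # 25M
}

# Matches lines such as: MaxFiles = "50000"
LINE_RE = re.compile(r'^\s*(\w+)\s*=\s*"([^"]*)"\s*$')


def limit_ratios(clamconf_output: str) -> dict[str, float]:
    """Return configured/default ratios for the limits listed in DEFAULTS."""
    ratios = {}
    for line in clamconf_output.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        name, value = m.groups()
        # Only numeric values for known limits are compared.
        if name in DEFAULTS and value.isdigit():
            ratios[name] = int(value) / DEFAULTS[name]
    return ratios
```

Fed the four size and count lines quoted above, this yields ratios of 40, 2, 5 and 4, matching the comments.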

Under normal circumstances my systems wouldn't scan the file you've
provided - I only use ClamAV to scan mail, and if this appeared in our
mail it would be rejected on principle by inspection of the message
parts. When I scanned it manually, with our usual configuration, it
just gave a warning about a limit being exceeded after 1m 54s of
scanning time. In my experience that's a longish but not an extreme
scan time. We log the scan time for most scans. Below is an extract
from our mail log, listing all the scan times this month which were
longer than 10 seconds. As you can see some of the scans of rather
small amounts of data took considerably longer than one might expect
given the performance of other scans of much larger amounts - the two
scans of more than 200kbytes of data performed this month each took a
shade over 30 seconds. I haven't really investigated scan times here
as they're well under anything which might cause problems.

Sep 1 19:30:15 mail6 xm 10.249s, 3339 bytes
Sep 1 20:04:29 mail6 xm 10.122s, 50390 bytes
Sep 9 11:42:33 mail6 xm 13.563s, 3337 bytes
Sep 9 21:56:56 mail6 x3 11.716s, 622 bytes
Sep 9 23:29:33 mail6 xm 10.355s, 3338 bytes
Sep 10 06:32:42 mail6 xm 10.153s, 3337 bytes
Sep 10 06:48:42 mail6 x3 12.189s, 853 bytes
Sep 10 21:14:38 mail6 xm 35.126s, 260193 bytes
Sep 10 23:24:55 mail6 xm 32.949s, 218614 bytes
Sep 15 16:10:04 mail6 x3 10.004s, 118 bytes
Sep 15 21:39:11 mail6 xm 11.149s, 53899 bytes
Sep 16 05:42:49 mail6 x3 15.139s, 175 bytes
Sep 16 05:43:09 mail6 x3 10.960s, 46 bytes
Sep 16 05:43:19 mail6 x3 16.012s, 195 bytes
Sep 16 05:43:46 mail6 x3 10.853s, 195 bytes
Sep 16 05:44:09 mail6 x3 16.696s, 1370 bytes
Sep 16 05:45:22 mail6 x3 12.048s, 950 bytes
Sep 16 05:48:41 mail6 x3 11.545s, 596 bytes
Sep 16 05:49:19 mail6 x3 12.399s, 1115 bytes
Sep 16 05:50:35 mail6 x3 10.489s, 474 bytes
Sep 16 10:49:13 mail6 x3 12.740s, 1147 bytes
Sep 18 05:55:10 mail6 x3 16.392s, 861 bytes
Sep 18 08:54:53 mail6 x3 10.325s, 861 bytes
Sep 18 18:29:43 mail6 xm 12.694s, 3342 bytes
Sep 18 20:36:44 mail6 xm 10.644s, 3340 bytes
Sep 18 23:32:09 mail6 x3 10.463s, 45 bytes
Sep 19 04:14:03 mail6 x3 15.265s, 950 bytes
Sep 19 06:29:11 mail6 x3 10.973s, 45 bytes
Sep 21 19:51:45 mail6 x3 22.834s, 712 bytes

These were all scanned using a TCP connection to a remote scanner on
the same (1Gbit/s Ethernet) network. The processes 'xm' and 'x3' are
two of the milters which our mail servers run; they pass data directly
to a remote clamd over TCP as I've said - we don't use clamav-milter.
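The point that some small inputs took far longer than much larger ones can be made concrete by ranking the logged scans by throughput. This sketch assumes the exact line format of the extract above, which comes from a site-specific milter log rather than from ClamAV, so the regular expression is tied to that sample; the function names are hypothetical.

```python
import re
from typing import NamedTuple

# Pattern for the milter log lines quoted above, e.g.
#   "Sep 1 19:30:15 mail6 xm 10.249s, 3339 bytes"
LOG_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d+)\s+(?P<time>\S+)\s+(?P<host>\S+)\s+"
    r"(?P<proc>\S+)\s+(?P<secs>\d+(?:\.\d+)?)s,\s+(?P<size>\d+) bytes$"
)


class Scan(NamedTuple):
    stamp: str
    proc: str
    seconds: float
    size: int

    @property
    def bytes_per_sec(self) -> float:
        return self.size / self.seconds if self.seconds else float("inf")


def parse(lines):
    """Turn matching log lines into Scan records; other lines are ignored."""
    scans = []
    for line in lines:
        m = LOG_RE.match(line.strip())
        if m:
            scans.append(Scan(f"{m['month']} {m['day']} {m['time']}",
                              m["proc"], float(m["secs"]), int(m["size"])))
    return scans


def lowest_throughput(scans, top=5):
    """Scans with the fewest bytes per second first (small but slow inputs)."""
    return sorted(scans, key=lambda s: s.bytes_per_sec)[:top]
```

Run over the extract above, the 45- and 46-byte scans that took 10 to 11 seconds come out near the top, while the two scans of more than 200 kbytes rank far lower despite their longer absolute times.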

HTH

--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml

Re: [clamav-users] Extremely slow PDF file scanning [ In reply to ]
Hi Micah,

FYI, I’ve tested with 0.104.1 and it still takes about 5 minutes to scan the file.


root@da6a7952db76:/# echo "nVERSION" | nc localhost 3310

ClamAV 0.104.1/26342/Wed Nov 3 01:22:37 2021



root@da6a7952db76:/# time echo "nSCAN /tmp/1.pdf" | nc localhost 3310

/tmp/1.pdf: OK



real 4m45.467s

user 0m0.000s

sys 0m0.004s
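The `nc` timing above can also be scripted, which helps when comparing releases or configurations. This sketch assumes clamd is listening on TCP port 3310, as in the quoted clamd.conf, and uses the `n`-prefixed, newline-terminated command form already seen in the `echo` lines. The function name and the generous default timeout (chosen to exceed the ten-minute MaxScanTime in that config) are illustrative. Note that the path in `SCAN` is resolved by clamd on its own host.

```python
import socket
import time


def clamd_command(cmd: str, host: str = "127.0.0.1", port: int = 3310,
                  timeout: float = 900.0) -> tuple[str, float]:
    """Send one 'n'-prefixed clamd command; return (reply, elapsed_seconds)."""
    payload = f"n{cmd}\n".encode()
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # clamd closes the connection after answering
                break
            chunks.append(data)
    elapsed = time.monotonic() - start
    return b"".join(chunks).decode(errors="replace").strip(), elapsed
```

For example, `reply, secs = clamd_command("SCAN /tmp/1.pdf")` returns the same `/tmp/1.pdf: OK` line, together with a wall-clock time comparable to the `real` figure reported by `time`.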

Thanks!

Best regards,
Nikolay


From: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Date: Tuesday, November 2, 2021 at 17:54
To: Micah Snyder (micasnyd) <micasnyd@cisco.com>, ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [clamav-users] Extremely slow PDF file scanning
Hi Micah,

Thank you very much for your attention to this matter! I have re-shared files; original file links should be working now:

File: https://storage.googleapis.com/upload-samples/Museum_26MB.pdf
Config file: https://storage.googleapis.com/upload-samples/clamd.conf
Debug log: https://storage.googleapis.com/upload-samples/debug.log
Extract file size data: https://storage.googleapis.com/upload-samples/files.log

I will try patched versions when they are available.

Best regards,
Nikolay


Re: [clamav-users] Extremely slow PDF file scanning [ In reply to ]
Hi Nikolay,

I also tested it today and observed the same thing. I am doing a little performance profiling to understand what's taking so long. As you noted it does finish up much faster if you disable bytecode signatures, which suggests that one of our bytecode signatures is eating up a lot of scan time.

Regards,
Micah



Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
________________________________
From: Nikolay Belaevski <Nikolay.Belaevski@five9.com>
Sent: Thursday, November 4, 2021 2:31 PM
To: Micah Snyder (micasnyd) <micasnyd@cisco.com>; ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [clamav-users] Extremely slow PDF file scanning


Hi Micah,



FYI, I’ve tested with 0.104.1 and it still takes about 5 minutes to scan the file.



root@da6a7952db76:/# echo "nVERSION" | nc localhost 3310

ClamAV 0.104.1/26342/Wed Nov 3 01:22:37 2021



root@da6a7952db76:/# time echo "nSCAN /tmp/1.pdf" | nc localhost 3310

/tmp/1.pdf: OK



real 4m45.467s

user 0m0.000s

sys 0m0.004s



Thanks!



Best regards,

Nikolay




