PDF scan
Hi, all.

I have a question about ClamAV 0.104.2 on an IBM AIX 7.3 system.
Scanning PDF files with clamdscan takes a long time.
It takes about 8 seconds to scan one PDF file (645 pages in total).
(A sample file is here: https://www.uinet.or.jp/LPBB0010-1000000000.pdf)

# /opt/freeware/sbin/clamd -V
ClamAV 0.104.2/26663/Mon Sep 19 03:56:35 2022
# clamdscan /home/test/LPBB0010-1000000000.pdf
/home/test/LPBB0010-1000000000.pdf: OK

----------- SCAN SUMMARY -----------
Infected files: 0
Time: 8.503 sec (0 m 8 s)
Start Date: 2022:09:19 08:38:50
End Date: 2022:09:19 08:38:58
# cat /opt/freeware/etc/clamav/clamd.conf |egrep -v '^$|^#'
LocalSocket /tmp/clamd.socket
LocalSocketMode 660
User root
AlertBrokenExecutables yes
AlertBrokenMedia yes
AlertEncrypted yes
AlertEncryptedArchive yes
AlertEncryptedDoc yes

I think it takes too long to scan PDF files.
Could you tell me how to shorten the time?


_______________________________________________

Manage your clamav-users mailing list subscription / unsubscribe:
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation

https://docs.clamav.net/#mailing-lists-and-chat
Re: [ext] PDF scan
* Tsutomu Oyamada <oyamada@promark-inc.com>:
> Hi, all.
>
> I have a question about ClamAV 0.104.2 on an IBM AIX 7.3 system.
> Scanning PDF files with clamdscan takes a long time.
> It takes about 8 seconds to scan one PDF file (645 pages in total).

All files or just THIS file?
645 pages is quite long.

> (A sample file is here: https://www.uinet.or.jp/LPBB0010-1000000000.pdf)

Scanning it here:

# clamdscan -v /tmp/LPBB0010-1000000000.pdf
/tmp/LPBB0010-1000000000.pdf: OK

----------- SCAN SUMMARY -----------
Infected files: 0
Time: 6.818 sec (0 m 6 s)
Start Date: 2022:09:20 09:40:36
End Date: 2022:09:20 09:40:43

# clamdscan -V /tmp/LPBB0010-1000000000.pdf
ClamAV 0.105.1/26663/Mon Sep 19 09:56:35 2022

--
Ralf Hildebrandt
Charité - Universitätsmedizin Berlin
Geschäftsbereich IT | Abteilung Netzwerk

Campus Benjamin Franklin (CBF)
Haus I | 1. OG | Raum 105
Hindenburgdamm 30 | D-12203 Berlin

Tel. +49 30 450 570 155
ralf.hildebrandt@charite.de
https://www.charite.de
_______________________________________________

Re: PDF scan
Hi there,

On Tue, 20 Sep 2022, Tsutomu Oyamada wrote:

> I have a question about ClamAV 0.104.2 on an IBM AIX 7.3 system.

Version 0.104.2 is vintage January 2022. You really should upgrade:

https://blog.clamav.net/

> It takes about 8 seconds to scan one PDF file (645 pages in total).
> (A sample file is here: https://www.uinet.or.jp/LPBB0010-1000000000.pdf)
>
> # /opt/freeware/sbin/clamd -V
> ClamAV 0.104.2/26663/Mon Sep 19 03:56:35 2022

In case it isn't obvious, the date there is that of the signature
database, not that of the scanning engine.
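
To make the three fields of that line concrete, here is a minimal
Python sketch that splits it into engine version, database version and
database date. The function name parse_banner is made up for
illustration, and the layout (three '/'-separated fields after
'ClamAV ') is assumed from the output quoted in this thread rather than
taken from any specification. The date parsing also assumes English
day and month names.

```python
from datetime import datetime


def parse_banner(banner: str) -> dict:
    """Split 'ClamAV <engine>/<db version>/<db date>' into its parts.

    The field layout is an assumption based on the output shown in this
    thread; a line without all three fields raises ValueError.
    """
    prefix = "ClamAV "
    if not banner.startswith(prefix):
        raise ValueError("not a ClamAV version line")
    # Split at most twice, so the date field is kept whole.
    engine, db_version, db_date = banner[len(prefix):].strip().split("/", 2)
    return {
        "engine": engine,                # version of the scanning engine
        "db_version": int(db_version),   # version of the signature database
        # Date of the signature database, not of the engine.
        "db_date": datetime.strptime(db_date, "%a %b %d %H:%M:%S %Y"),
    }


if __name__ == "__main__":
    info = parse_banner("ClamAV 0.104.2/26663/Mon Sep 19 03:56:35 2022")
    print(info["engine"], info["db_version"], info["db_date"].date())
```

Only the first field says how old the scanner itself is; the other two
change every time the signatures are updated.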

> # clamdscan /home/test/LPBB0010-1000000000.pdf
> /home/test/LPBB0010-1000000000.pdf: OK
>
> ----------- SCAN SUMMARY -----------
> Infected files: 0
> Time: 8.503 sec (0 m 8 s)
> Start Date: 2022:09:19 08:38:50
> End Date: 2022:09:19 08:38:58
> ...

I guess we're all in a hurry, but assuming that the file is not in
fact malicious, and that you've scanned nearly a megabyte for ten
million threats, then overall the numbers don't seem too bad to me.
Have you tried other scanners to compare their performance?

> # cat /opt/freeware/etc/clamav/clamd.conf |egrep -v '^$|^#'

The output of 'clamconf -n' would be easier for us.

> ...
> User root

It would be better to choose a non-root user if you can.

> Could you tell me how to shorten the time?

First you should upgrade the scanner. This *may* improve performance
but there are no miracles.

You haven't said anything about the signatures database - is it only
the 'official' ClamAV database, or do you have third-party signatures
in addition?

I'm going to assume that you aren't using some horribly slow network
access, and that file read times are small compared with scan times:
8<----------------------------------------------------------------------
$ wget https://www.uinet.or.jp/LPBB0010-1000000000.pdf
...
$ time cat LPBB0010-1000000000.pdf > /dev/null

real 0m0.035s
user 0m0.016s
sys 0m0.012s
8<----------------------------------------------------------------------
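
The same comparison can be scripted. What follows is only a sketch,
assuming clamdscan is on the PATH and clamd is running; the helper
names (timed, read_all, read_vs_scan) are made up for illustration. It
times a plain read of the file and then a scan of the same file, so
that the two figures can be set side by side.

```python
import subprocess
import time


def timed(func):
    """Run func() and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def read_all(path, chunk_size=1 << 20):
    """Read the whole file in chunks and discard the data."""
    with open(path, "rb") as handle:
        while handle.read(chunk_size):
            pass


def read_vs_scan(path, scan_cmd=("clamdscan", "--no-summary")):
    """Return (read_seconds, scan_seconds) for one file.

    scan_cmd is the scanner command line without the file name.
    check=False because the scanner exits non-zero when it reports a
    detection, which is not an error for timing purposes.
    """
    read_s = timed(lambda: read_all(path))
    scan_s = timed(lambda: subprocess.run(
        [*scan_cmd, path], stdout=subprocess.DEVNULL, check=False))
    return read_s, scan_s


if __name__ == "__main__":
    import sys
    read_s, scan_s = read_vs_scan(sys.argv[1])
    print(f"read {read_s:.3f}s, scan {scan_s:.3f}s")
```

Bear in mind that the first read of a file may come from disk and later
ones from the operating system's cache, so run it more than once before
drawing conclusions.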

You could always put more horsepower into the scanner and/or make sure
that the scanning machine isn't doing anything else at the same time,
for example move it to a dedicated box. You could get someone else to
do the scans, so you don't have to. :) You could be more choosy with
the scanning (for example you could scan for fewer threats; or scan
smaller quantities of data; or even choose the scanner and the type of
scan, possibly on a file-by-file basis; and you could use things like
MD5 digests to keep a database of things which you've already scanned
and that you're confident won't need to be scanned again). You could
even scan while you're asleep in bed. We do almost all those things.
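
As a rough illustration of the digest idea, here is a minimal Python
sketch. It is not how we or anyone else necessarily do it: the names
file_digest and ScanCache are hypothetical, the cache is a plain JSON
file, and the actual scan is passed in as a function that returns True
for a clean file. It uses SHA-256 where I said MD5 above; any digest
would illustrate the point.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ScanCache:
    """Remember digests of content already scanned and found clean."""

    def __init__(self, db_path):
        self.db_path = Path(db_path)
        self.clean = set()
        if self.db_path.exists():
            self.clean = set(json.loads(self.db_path.read_text()))

    def scan_if_needed(self, path, scan):
        """Call scan(path) only for unseen content; return True if clean.

        'scan' is a caller-supplied function (hypothetical here) that
        returns True when the file is clean.
        """
        digest = file_digest(path)
        if digest in self.clean:
            return True  # identical content was scanned before
        is_clean = bool(scan(path))
        if is_clean:
            self.clean.add(digest)
            self.db_path.write_text(json.dumps(sorted(self.clean)))
        return is_clean
```

A cache like this is only as good as the signatures that were current
when each file was scanned, so you would want to empty it, or at least
re-scan periodically, after signature updates.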

From where do the PDF files come?

Do you have any control over their content?

How many of them are there?

Is there a lot of 'churn'?

What are the threats which concern you?

How often do you see malicious PDF files, as we say, 'in the wild'?

How often, in your experience, does ClamAV find one?

How often, in your experience, does ClamAV fail to find one?

--

73,
Ged.