Mailing List Archive

Help clamdscan faster
Hello,
We are running a file management project that stores its files in Amazon S3.

Our core architecture is this: every time a file is uploaded to or edited
on S3, an event triggers an ECS task. The task runs a container with
ClamAV, which scans that file for viruses.
The whole process of the task is: download the file from S3 -> scan the
file for viruses -> push the scan result to a webhook to display the message
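
The three steps above can be sketched as a small Python driver. This is
only an illustration under assumptions: the download, scan and notify
callables are hypothetical placeholders for the S3 download, the clamdscan
call and the webhook POST, and the result fields are invented for the
example.

```python
import os
import tempfile
from typing import Callable, Dict


def run_scan_task(
    key: str,
    download: Callable[[str, str], None],
    scan: Callable[[str], str],
    notify: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    """Download one object, scan it, report the verdict, and clean up.

    All three callables are hypothetical stand-ins:
      download(key, dest_path)   e.g. a wrapper around an S3 client
      scan(path) -> verdict      e.g. a wrapper around clamdscan
      notify(result)             e.g. an HTTP POST to the webhook
    """
    fd, path = tempfile.mkstemp(prefix="scan-")
    os.close(fd)
    try:
        download(key, path)
        verdict = scan(path)
    except Exception as exc:
        # Report failures to the webhook instead of silently dropping them.
        verdict = f"ERROR: {exc}"
    finally:
        # Remove the local copy whether or not the scan succeeded.
        if os.path.exists(path):
            os.remove(path)
    result = {"key": key, "verdict": verdict}
    notify(result)
    return result
```

Passing the steps in as callables keeps the flow testable without S3,
clamd or a real webhook.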

1. What do you think of this architecture?

2. Scanning takes a long time for large Excel files
(~20 min for a 2-3 GB file). Is there any way to make it faster?

3. We are using "clamdscan". Is there any limit on how many files (in a
folder) can be scanned at once, or on the maximum file size?

4. If we run the ECS Fargate task with 2 vCPU and 8 GB RAM, will we be able
to scan the maximum file size? Can a 20 GB file be scanned?

Best regards,
Harry Tran
Re: Help clamdscan faster
Hello,

Yes, as a result of the architecture there is a limit of 2 GB for each single file; if a file is larger than the limit, you will get a false-positive scan result for it.
-> Please feel free to read what you find interesting: https://docs.clamav.net/Introduction.html

Kind regards,
newcomer01


From: Clamav User Mailinglist <mailto:clamav-users@lists.clamav.net>
To: Newcomer01 <mailto:newcomer01@posteo.de>
CC: Nhat Tran Xuan <mailto:nhat.tran2@ntq-solution.com.vn>
Sent: Wednesday, August 23, 2023 at 08:25 PM +0200
Subject: [clamav-users] Help clamdscan faster

_______________________________________________

Manage your clamav-users mailing list subscription / unsubscribe:
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation

https://docs.clamav.net/#mailing-lists-and-chat
Re: Help clamdscan faster
On Thu, 24 Aug 2023, Nhat Tran Xuan via clamav-users wrote:

> Hello,
> We are running a file management project that stores its files in Amazon S3.
>
> Our core architecture is this: every time a file is uploaded to or edited
> on S3, an event triggers an ECS task. The task runs a container with
> ClamAV, which scans that file for viruses.
> The whole process of the task is: download the file from S3 -> scan the
> file for viruses -> push the scan result to a webhook to display the message
>
> 1. What do you think of this architecture?
>
> 2. Scanning takes a long time for large Excel files
> (~20 min for a 2-3 GB file). Is there any way to make it faster?

Make sure that ClamAV does not need to download the virus database
from scratch each time that a container is started.
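
One way to check this at container start is sketched below in Python.
It is an illustration only: it assumes the usual main/daily database
files in .cvd or .cld form, the directory is whatever the image uses
(/var/lib/clamav is a common default, not something stated in this
thread), and the 24-hour age threshold is an arbitrary choice.

```python
import os
import time
from typing import Optional


def database_is_usable(db_dir: str, max_age_hours: float = 24.0,
                       now: Optional[float] = None) -> bool:
    """Return True if main and daily databases exist and daily is recent.

    If this returns False, the container would have to fetch the
    database before it could scan anything.
    """
    now = time.time() if now is None else now
    newest_daily = None
    for name in ("main", "daily"):
        # Each database may be present as a .cvd or a .cld file.
        candidates = [os.path.join(db_dir, name + ext) for ext in (".cvd", ".cld")]
        found = [p for p in candidates if os.path.isfile(p)]
        if not found:
            return False
        if name == "daily":
            newest_daily = max(os.path.getmtime(p) for p in found)
    # Only the daily database is checked for age; main changes rarely.
    return (now - newest_daily) <= max_age_hours * 3600
```

A start-up script could call database_is_usable() on a mounted volume
or a directory baked into the image and skip the download when it
returns True.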

My instinct is to keep a scan service running rather than start a new
instance for each scan. The virus database takes up about 1.25GB of
RAM; unless the new container can copy or share that from existing
memory, you have two start-up delays: one for the container and one
for loading the database, similar to the difference between clamscan
and clamdscan.

Ideally you would run clamd and clamdscan on the machine that has the
file to avoid transferring over the network, but if the storage and
compute are on different services or VMs you may need one transfer.

However, I understand that clamd caches results (I guess by checksum
of the data) so if the same file is scanned more than once a network
clamdscan can be faster than a local clamscan ...

man clamdscan suggests that using --fdpass is faster than streaming
the file, but requires a unix socket, so clamd and clamdscan must be
on the same machine.
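
A minimal Python sketch of that choice follows. The function names
are hypothetical; the --fdpass, --stream and --no-summary options and
the exit codes (0 no virus found, 1 virus found, 2 error) are the ones
documented in man clamdscan.

```python
from typing import List


def build_clamdscan_command(path: str, same_host: bool) -> List[str]:
    """Build a clamdscan command line for one file."""
    cmd = ["clamdscan", "--no-summary"]
    # --fdpass hands clamd an open file descriptor over the unix socket,
    # so it only applies when clamd runs on the same machine.
    # --stream sends the file contents to clamd instead.
    cmd.append("--fdpass" if same_host else "--stream")
    cmd.append(path)
    return cmd


def interpret_exit_code(code: int) -> str:
    """Map clamdscan exit codes to a short verdict string."""
    if code == 0:
        return "clean"
    if code == 1:
        return "infected"
    return "error"
```

The list can be passed to subprocess.run() and the returncode fed to
interpret_exit_code(); treating every code other than 0 and 1 as an
error keeps an unexpected failure from being reported as clean.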

If Amazon charge for an idle process using >1GB RAM, that would
be a consideration.

> 3. We are using "clamdscan". Is there any limit on how many files (in a
> folder) can be scanned at once, or on the maximum file size?

The maximum file size that can be scanned is 2GB;
larger files are skipped.
If the file is an archive or disk image, e.g. a .zip or .iso,
then the ClamAV 1.2.0 release candidate (currently in pre-release)
can scan each of the files inside (up to the 2GB limit) even if the
.zip itself is larger.
This is true for clamscan and clamdscan.
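
Because larger files are skipped rather than scanned, a caller may
want to detect them up front and report them as unscanned instead of
clean. A small Python sketch, assuming "2GB" means 2 * 1024**3 bytes
(the exact cutoff may differ slightly) and using hypothetical names:

```python
import os

# Assumption: the 2GB limit is taken as 2 * 1024**3 bytes; files at or
# above it are treated as too large, which errs on the cautious side.
SCAN_LIMIT_BYTES = 2 * 1024 ** 3


def classify_by_size(size_bytes: int, limit: int = SCAN_LIMIT_BYTES) -> str:
    """Return "scan" if the file is under the limit, else "too-large"."""
    return "scan" if size_bytes < limit else "too-large"


def classify_file(path: str) -> str:
    """Classify a file on disk by its size."""
    return classify_by_size(os.path.getsize(path))
```

With this check a 20GB upload is labelled "too-large" before any scan
is attempted, so the webhook can say that the file was not scanned.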

> 4. If we run the ECS Fargate task with 2 vCPU and 8 GB RAM, will we be able
> to scan the maximum file size? Can a 20 GB file be scanned?



--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk