We have this bug report https://bugzilla.clamav.net/show_bug.cgi?id=12673
for 0.103.0+, which claims that files larger than 4 GB cause errors instead of obeying the MaxScanSize setting. I'll spend a little time on it today to see whether I can figure out what's going wrong.
From: clamav-users <firstname.lastname@example.org> On Behalf Of Michael Kyriacou via clamav-users
Sent: Tuesday, March 2, 2021 8:36 AM
To: ClamAV users ML <email@example.com>
Cc: Michael Kyriacou <firstname.lastname@example.org>
Subject: Re: [clamav-users] Can’t allocate memory error
I am scanning large data sets for a company. These file systems contain hundreds of thousands of files. Most files are small (<1 GB), while a few are large (>10 GB). Most are documents, archives, and executables, and I am scanning them to detect whether any of them contain malware.
These are virtual machines, running Ubuntu 20.04.
The CPU on the ESXi host is an Intel Xeon Platinum 828 CPU @ 2.70GHz. In total I have 112 logical processors and 512 GB of RAM available.
The output is as follows:
Got command FILDES(7,11) argument
RECVTH FILDES command complete
THMGR active jobs for ***********: 2
THRMGR: Contended, sleeping
Nothing appears after this message; it pauses, and after a couple of minutes it continues, repeating the same output.
On Tue, Mar 2, 2021 at 9:40 AM G.W. Haywood via clamav-users <email@example.com> wrote:
On Tue, 2 Mar 2021, Michael Kyriacou via clamav-users wrote:
> On Tue, Mar 2, 2021 at 4:08 AM G.W. Haywood via clamav-users wrote:
>> On Mon, 1 Mar 2021, Michael Kyriacou via clamav-users wrote:
>>> ... ClamAV 0.103.1 on Ubuntu 20.04. I am getting “can’t allocate
>>> memory” errors on very large files (10 GB+). I thought clamdscan
>>> was supposed to skip files that are larger than what you set
>>> MaxFileSize/MaxScanSize to.
>> Unfortunately this is a known issue:
>> Have you tried other ways to avoid scanning huge files?
> I was not aware of any other way to avoid scanning large files. Where can I
> find such solutions?
The operating system offers ways to avoid shooting yourself in the foot.
You could simply arrange for all the huge files to live in some corner of
the filesystem which you don't normally scan, which raises the
questions: what are you scanning, and why? There will of course be
pseudo-files on your system (for example under /proc and /sys) which you
should _never_ scan. The 'find' utility will let you specify size limits.
You will need to spend some quality time with the 'man' pages to become
familiar with using standard utilities in conjunction with something like
ClamAV. Using the 'man' pages is something of an acquired taste, but one
you do need to acquire if you're to get the most out of a Linux box. The
'man' page for clamd.conf contains information about resource usage. It
also contains some warnings which, to my mind, are perhaps a little over
the top, but they serve to remind us that the system's resources may be
shared between a large number of processes, that these processes compete
for resources, and that things can get ugly when there aren't enough to go
around.
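As a rough illustration of pre-filtering by size (similar in spirit to 'find ROOT -type f -size ...'), here is a minimal Python sketch. It walks a tree, prunes the /proc, /sys and /dev pseudo-filesystems, and yields only regular files below a byte limit. The function name `files_under_limit` and the 4 GiB limit are hypothetical choices, not anything ClamAV provides; pick a limit no larger than your MaxFileSize/MaxScanSize settings.

```python
import os
import stat
from typing import Iterator

# Hypothetical limit: 4 GiB. Pick a value no larger than the
# MaxFileSize/MaxScanSize you set in clamd.conf.
SIZE_LIMIT = 4 * 1024**3

# Pseudo-filesystems that should never be scanned (these absolute paths
# only match when the walk starts at "/").
SKIP_DIRS = {"/proc", "/sys", "/dev"}


def files_under_limit(root: str, limit: int = SIZE_LIMIT) -> Iterator[str]:
    """Yield paths of regular files under `root` smaller than `limit` bytes."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into pseudo-filesystems.
        dirnames[:] = [
            d for d in dirnames if os.path.join(dirpath, d) not in SKIP_DIRS
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Keep only regular files (no symlinks, sockets, FIFOs or
            # device nodes) that are below the size limit.
            if stat.S_ISREG(st.st_mode) and st.st_size < limit:
                yield path
```

The paths it yields could then be handed to clamdscan, so that the huge files never reach clamd in the first place.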
The concept of "not scanning a file larger than X bytes" is a bit too
simplistic when talking about a scanner like ClamAV, which (a) may use
different scanning approaches depending on the file type, and (b) can
extract the contents of container formats (e.g. Zip, RAR) which can hold
whole directory structures and also use compression, and which as a
result are subject to various, sometimes non-obvious, Denial-of-Service
attacks. So there are numerous clamd configuration options which permit
fine-tuning of the resource usage of the ClamAV tools. To make the best
use of these options you'll need to be familiar with your system's
resources and its constraints.
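To make point (b) concrete, here is a small Python sketch using only the standard-library `zipfile` module. A highly compressible payload produces an archive whose on-disk size says very little about how much data a scanner must process once it extracts the contents. The helper name `zip_expansion` and the 10 MiB payload are illustrative assumptions; this is not how ClamAV itself inspects archives. Settings such as MaxScanSize, MaxFileSize, MaxRecursion and MaxFiles in clamd.conf are the kind of options meant to bound that work.

```python
import io
import zipfile
from typing import Tuple


def zip_expansion(data: bytes) -> Tuple[int, int]:
    """Return (archive size, total declared uncompressed size) of a ZIP."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        # ZipInfo.file_size is the uncompressed size recorded in the archive
        # itself. A malicious archive can lie about it, which is why a
        # scanner must also enforce limits while it actually extracts.
        total = sum(info.file_size for info in zf.infolist())
    return len(data), total


# Illustrative payload: 10 MiB of zero bytes, which DEFLATE shrinks enormously.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zeros.bin", b"\0" * (10 * 1024 * 1024))
archive = buf.getvalue()

on_disk, expanded = zip_expansion(archive)
print(f"{on_disk} bytes on disk -> {expanded} bytes after extraction "
      f"(roughly {expanded // on_disk}:1)")
```

A size check on the archive file alone would wave this through, even though the scanner ends up handling roughly a thousand times as much data.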
How much memory does the box have? You'll probably need a gigabyte or
so to hold the signature database before you even start a scan, plus
whatever the scanner uses when it scans something, which depends a lot
on what it's scanning. Then if you keep the default configuration, which
permits scanning while the databases are reloading, another gigabyte
will be used (briefly) every time clamd reloads the database. Note that
the extra memory will not be released until every scan that was started
before the reload has completed. If you don't want to have to work on
memory management, I'd recommend about four gigabytes of RAM as the
minimum for a clamd server. The longer it takes to scan a file, the more
likely it is that a database reload will happen during a scan, so if
you're short on memory and want to scan files which take a long time to
scan, it's worth considering the option to scan only while no database
reload is taking place (ConcurrentDatabaseReload no in clamd.conf).
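The memory budget described above can be written down as simple arithmetic. In this hedged Python sketch, the 1 GB database figure comes from the message, while the 0.5 GB scan working set is a purely hypothetical placeholder, since real usage depends heavily on what is being scanned. The `concurrent_reload` flag models clamd's default behaviour of loading a second copy of the database during a reload. The function name `peak_clamd_memory_gb` is illustrative only.

```python
def peak_clamd_memory_gb(db_gb: float = 1.0,
                         scan_gb: float = 0.5,
                         concurrent_reload: bool = True) -> float:
    """Rough peak memory estimate for clamd, following the reasoning above.

    db_gb             -- memory held by the loaded signature database
    scan_gb           -- hypothetical working set of the scans in progress
    concurrent_reload -- True models the default: a second database copy is
                         loaded during a reload and is kept alongside the old
                         one until scans started before the reload finish
    """
    peak = db_gb + scan_gb
    if concurrent_reload:
        peak += db_gb  # old and new databases coexist during the reload
    return peak


print(peak_clamd_memory_gb())                         # default reload behaviour
print(peak_clamd_memory_gb(concurrent_reload=False))  # blocking reloads
```

With these placeholder figures the peak is about 2.5 GB with concurrent reloads and 1.5 GB without, which leaves headroom for the operating system and other processes within the four-gigabyte guideline.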
clamav-users mailing list
Help us build a comprehensive ClamAV guide: https://github.com/vrtadmin/clamav-faq http://www.clamav.net/contact.html#ml