Mailing List Archive

[clamav-users] Scanning a large file through HTTP
Hi All,

We are using an HTTP-enabled malware scanning service based on ClamAV.

The service is built along the lines of this project:
https://github.com/solita/clamav-rest

We have files, such as CAD files, which can run to several GB, and we
want to send them to this malware scanning service.

Is there a possibility to send the file in chunks and get it scanned on
the server side in chunks?

I observed that there is an INSTREAM command in clamd for this purpose,
and also that there is a 4GB size limit.
https://linux.die.net/man/8/clamd


I read somewhere that the full file size is mapped to memory. Is that
also the case for the INSTREAM command?

If so, then even if chunking is supported, the server side must have at
least 4GB of memory.

Best Regards,
Saurav
Re: [clamav-users] Scanning a large file through HTTP
On 07/04/2021 15:38, Saurav Sarkar via clamav-users wrote:
>
> We have files like CAD files which can go in GBs and want to send to
> this malware scanning service.

Why are you scanning CAD files?

Can your CAD files contain arbitrary executable code which is blindly
executed by the CAD software? If not, there's no reason to scan them. If
they can, then I'd consider getting different CAD software...


> Is there a possibility to send the file in chunks and get it scanned
> in the server side in chunks

That would depend on the HTTP scanning service software. ClamAV needs
the whole file at once to scan it, but the HTTP scanning service may be
able to accept the upload in chunks and reassemble it before sending it
to ClamAV.
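
To make the reassembly idea concrete, here is a minimal sketch, assuming
the HTTP service receives each part together with its byte offset. The
helper name reassemble_parts and the (offset, data) layout are made up
for this example; they are not part of clamav-rest or ClamAV.

```python
import os
import tempfile
from typing import Iterable, Tuple


def reassemble_parts(parts: Iterable[Tuple[int, bytes]]) -> str:
    """Write uploaded parts into one temporary file and return its path.

    Hypothetical helper: each part is an (offset, data) pair, so parts may
    arrive in any order and are written one at a time rather than being
    held in memory together.
    """
    fd, path = tempfile.mkstemp(prefix="upload-", suffix=".bin")
    with os.fdopen(fd, "wb") as out:
        for offset, data in parts:
            out.seek(offset)   # position at the part's place in the file
            out.write(data)
    return path
```

Only the finished file would then be handed to the scanner, so ClamAV
still sees the whole file at once. The caller is responsible for
deleting the temporary file afterwards.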


>
> I observed that there is a INSTREAM command in clamd for this purpose
> and also there is a 4GB size limit.
> https://linux.die.net/man/8/clamd <https://linux.die.net/man/8/clamd>

INSTREAM basically lets you send a file to clamd, which saves it as a
temporary file, scans it, and then deletes it. It lets you scan
files that don't exist on the same computer as the clamd daemon without
having to set up network shares etc. So, all the limits (e.g. the 4GB
limit) which apply to normal files also apply to INSTREAM.

--

Paul
Paul Smith Computer Services
support@pscs.co.uk - 01484 855800


_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Re: [clamav-users] Scanning a large file through HTTP
Hi there,

On Wed, 7 Apr 2021, Saurav Sarkar via clamav-users wrote:

> We are using a HTTP enabled malware scanning service based on Clam AV.

Perhaps you will get better answers if you address your questions to
the supplier of this service.

> We have files like CAD files which can go in GBs and want to send to this
> malware scanning service.

Does the service which you are using permit that?

> Is there a possibility to send the file in chunks and get it scanned in the
> server side in chunks.

Again you should ask your service because we on the mailing list know
nothing about it. I imagine that it might be possible, but I would
also guess that it would be pointless for your stated purpose.

> I observed that there is a INSTREAM command in clamd for this purpose

The clamd 'man' page doesn't exactly say that. And assuming that this
is related to your use of the service, do you know that your service
actually uses clamd?

The INSTREAM command is available so that you can send a stream of
data to the scanner instead of telling it to scan some file. If you
will read the clamd.conf 'man' page you will see that the stream of
data must not exceed the value of the configured 'StreamMaxLength'.
The default for that option is 25 Megabytes, a lot less than the GBs
that you're talking about. If the maximum is exceeded by the length
of the data stream sent after the INSTREAM command, clamd will return
an error and the scan will fail. If I were running a Web service of
the sort you've described I'd be very cautious about increasing the
default StreamMaxLength because of the potential for abuse.
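
For reference, the "chunks" in the clamd 'man' page are only wire
framing: each chunk is a 4-byte length in network byte order followed by
the data, and a zero-length chunk ends the stream. The sketch below
shows that framing. The function names, the 64 KiB chunk size, and the
127.0.0.1:3310 TCP address are assumptions for illustration; clamd must
be configured to listen on a TCP socket for scan_stream to work.

```python
import socket
import struct
from typing import BinaryIO, Iterator


def instream_frames(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the byte sequence of one INSTREAM session for `stream`."""
    yield b"zINSTREAM\0"                       # null-terminated command form
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # 4-byte unsigned length in network byte order, then the data itself
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)                 # zero-length chunk ends the stream


def scan_stream(stream: BinaryIO, host: str = "127.0.0.1", port: int = 3310) -> str:
    """Send `stream` to clamd over TCP and return its raw reply text."""
    with socket.create_connection((host, port)) as sock:
        for frame in instream_frames(stream):
            sock.sendall(frame)
        reply = b""
        while not reply.endswith(b"\0"):       # replies to z-commands end in NUL
            data = sock.recv(4096)
            if not data:
                break
            reply += data
    return reply.rstrip(b"\0").decode("utf-8", "replace")
```

If the total exceeds StreamMaxLength, clamd reports an error and closes
the connection, so a real client should also expect sendall to fail
part-way through a large upload.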

> and also there is a 4GB size limit.

A number of limits depend on the configuration, and can be much less
than that.

> Read somewhere

Where?

> the full file size is mapped to memory.

I do not know what that means.

The scanner will use whatever memory is available to it. It needs
around 1Gbyte for the current 'official' databases, and it can use
considerably more than that if you add some of the various third-party
databases. But this memory is used to store signatures (or rather the
compiled versions of them, which is what takes the time to start clamd
or clamscan), not to store the data being scanned. It is not easy to
predict how much memory will be used to scan a particular data stream.

> Is it the case for INSTREAM command also ?

See my previous answer.

> If it is the case then even if chunking is supported then the server side
> must have at least 4GB of memory.

Somewhere along your chain of logic you seem to have left me behind,
but I would recommend at least 4GB of memory for anything which will
be running the ClamAV scanner unless the user knows what he's doing.

--

73,
Ged.

Re: [clamav-users] Scanning a large file through HTTP
Thanks a lot Paul and Ged for your replies.

Perhaps I added too many confusing points in my question :).

Suppose I am simply the developer of an HTTP service for malware
scanning.

I would just like to know the maximum file size that ClamAV can
support.

Is it 4GB? Can this size be increased?

Is memory or persistent storage the limit on what ClamAV can scan? If it
is persistent storage, can I increase the limit by attaching external
NFS storage?

Best Regards,
Saurav

Re: [clamav-users] Scanning a large file through HTTP
Hi,

> Is it 4GB? Can this size be increased?

You can increase the maximum file size by setting the MaxFileSize option in clamd.conf. ClamAV’s option parser won’t allow you to set a maximum scan size higher than 4GB. In reality, the file size limit is 2GB. Anything larger than that will be automatically skipped and marked as “OK”.
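
As a rough illustration, the relevant clamd.conf settings might look
like the fragment below. The values are assumptions chosen to stay under
the 2GB effective limit, not recommendations. StreamMaxLength is the
option that governs INSTREAM, and raising it on a public-facing service
carries a risk of abuse.

```
# clamd.conf fragment (illustrative values only)
MaxFileSize 2000M
MaxScanSize 2000M
StreamMaxLength 2000M
```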

The reason for the 2GB file size limit is that in the past there were several bug reports for files larger than 2GB causing crashes. Rather than fix the parsers, the devs slipped in a 2GB file size limit to prevent crashes. I only just realized it a few weeks ago, and you can see my comments on this here: https://github.com/Cisco-Talos/clamav-devel/commit/1a3b784e1954e00b6463000a817da0c5092296cd

There’s a lot of technical work to be done to safely raise that limitation, as large files of various file types have never been tested. A large TAR, for example, may well work fine when a large ZIP might crash the program. We really have no idea. Basically it’s going to take a bunch of testing when someone goes to work on this.

A lot of folks seem to be unhappy with it saying “OK” when a file hasn’t been scanned (myself included). So we have been talking about changing the output to something like the following messages when files are not scanned or are only partially scanned:

* “SKIPPED (exceeded max file size)”
* “INCOMPLETE (exceeded max scan size)”

The exact wording is TBD. If anyone has any specific requests, I’d enjoy some help brainstorming.
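
Until the output changes, a calling service can guard against the silent
"OK" itself by checking the size before submitting a file. This is a
minimal sketch, assuming the effective limit is 2 GiB; the exact
boundary is an assumption based on the "2GB" figure above, and the
function names are made up for the example.

```python
import os

# Assumption: treat 2 GiB as the effective limit described in this thread.
EFFECTIVE_LIMIT = 2 * 1024**3


def precheck(size_bytes: int, limit: int = EFFECTIVE_LIMIT) -> str:
    """Return 'SCAN' if a file of this size can be scanned, else a SKIPPED label."""
    if size_bytes > limit:
        return "SKIPPED (exceeded max file size)"
    return "SCAN"


def precheck_path(path: str, limit: int = EFFECTIVE_LIMIT) -> str:
    """Same check, taking the size from the file on disk."""
    return precheck(os.path.getsize(path), limit)
```

A service would only submit files for which the check returns "SCAN",
and would report the SKIPPED label to its own callers instead of a clean
verdict.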

> Is memory or persistent storage the limit on what ClamAV can scan? If it is persistent storage, can I increase the limit by attaching external NFS storage?

Sorry, persistent storage is not the concern.

> I read somewhere that the full file size is mapped to memory. Is that also the case for the INSTREAM command?

Yes, INSTREAM is also limited to 4GB (or _really_ 2GB).

> If so, then even if chunking is supported, the server side must have at least 4GB of memory.

Scanning a file in chunks is a waste of CPU cycles. ClamAV was designed to process a whole file all at once. Some file formats, like PDF, DMG, and ZIP*, store metadata at the end of the file which is necessary to properly parse the file. Streaming scanners like the one in Snort struggle with these files or can’t process them at all. I put a * next to ZIP because ZIPs are actually pretty easy to parse in order even if the central directory is missing. Files like DMG, on the other hand, can’t even be identified as DMGs without reading the end of the file first, or trusting the “.dmg” file extension (which is dangerous).

In short, don’t send chunks of files as separate files to be scanned; it probably won’t catch any malware that way and may print lots of warnings or errors if it gets confused about the type of the file and starts processing it with the wrong parser.
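
A toy illustration of why independent chunks miss things: a byte pattern
that straddles a chunk boundary is never seen whole. The pattern and
function names below are made up for the example, and this is plain
substring matching, not how ClamAV's engine works; real scanning also
depends on parsing the file format, which chunking breaks as well.

```python
SIGNATURE = b"TOY-MALWARE-PATTERN"  # made-up pattern, not a real signature


def found_in_whole(data: bytes) -> bool:
    """Search the complete data for the pattern."""
    return SIGNATURE in data


def found_in_chunks(data: bytes, chunk_size: int) -> bool:
    """Search each chunk independently, as if each were a separate file."""
    return any(
        SIGNATURE in data[i:i + chunk_size]
        for i in range(0, len(data), chunk_size)
    )


sample = b"A" * 10 + SIGNATURE + b"B" * 10
print(found_in_whole(sample))        # True
print(found_in_chunks(sample, 16))   # False: no 16-byte chunk holds the pattern
```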

Regards,
-Micah


Micah Snyder
ClamAV
Talos
Cisco Systems, Inc.



Re: [clamav-users] Scanning a large file through HTTP
Seems to me that this behavior, advertising a 4GB limit while silently imposing a 2GB limit and reporting "OK" for anything in between, is a *major* security flaw: ClamAV *must* report that the file was too big to deal with (however worded).

Thus I've taken to using clamscan rather than clamdscan (slow though that is), because at least it reports how many bytes were read, and how many scanned, so I can see what's going on.

P.S. Recently I've downloaded some MP3s from Amazon and scanned them (as I do everything I download -- except updates from my Linux distros). But for a reason I saw on this list -- but can't remember -- MP3s are fully read, but not scanned. Is this going to be remedied?


On Wed, 7 Apr 2021 22:14:39 +0000
"Micah Snyder \(micasnyd\) via clamav-users" <clamav-users@lists.clamav.net> wrote:

> In reality, the file size limit is 2GB. Anything larger than that will be automatically skipped and marked as “OK”.

Re: [clamav-users] Scanning a large file through HTTP
Hi there,

On Wed, 7 Apr 2021, Micah Snyder (micasnyd) via clamav-users wrote:

> There’s a lot of technical work to be done to safely raise that
> limitation, as large files of various file types have never
> been tested.

In my milter I've a pretty general-purpose Perl harness which can send
data to clamd in flexible ways. It wouldn't take much effort to tweak
it to run tests on clamd - in fact I've used it for that kind of thing
in the past. If you'd like me to do some testing with large files and
especially if you have some candidate large files which would be worth
trying, I'd be happy to set a job running on an otherwise idle machine
and cook rice puddings while waiting on the results. I have machines
which I can cheerfully crash without worries. They're Pi4Bs, which if
you leave them running for long enough will crash all by themselves.

> A large TAR, for example, may well work fine when a large ZIP might
> crash the program. We really have no idea.

Do you have anything fuzzing the code, deliberately trying to break
it, any even semi-automatic analysis? Seems like if you could break
things into manageable blocks the community could help quite a bit.

What would help most is a design document explaining the structure of
the code, how it all hangs together, and the intended function of the
various parts. Then people who would otherwise be overwhelmed by it
all could get their teeth into it. It could pay enormous dividends if
something like that were available to the community. Help in testing
would be just the start.

> A lot of folks seem to be unhappy with it saying “OK” when a file
> hasn’t been scanned (myself included). So we have been talking
> about changing the output to something like the following messages
> when files are not scanned or are only partially scanned:
> * “SKIPPED (exceeded max file size)”
> * “INCOMPLETE (exceeded max scan size)”
> The exact wording is TBD. If anyone has any specific requests, I’d
> enjoy some help brainstorming.

Agreed it's perverse to report "OK" if a file was not properly scanned
but since it's been that way for decades I think you'll probably break
an awful lot of stuff Out There if you just go ahead and change that.
A compile-time option, initially defaulting to the current behaviour,
or a configuration option (the default behaviour as now) might prevent
a lot of angst. No issues with the suggested wordings that I can see,
as long as they don't turn out to be a moving target. There should be
another one, perhaps something like "DUNNO", for things nobody thought
of yet possibly including "SKIPPED (below minimum file size)". Please
also something in the docs reserving the right to add new replies, so
that coders get the habit of coding for the future or so at the, er,
barest minimum your @r$e is covered.
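
A minimal sketch of client code written with that in mind: it recognises
the reply shapes clamd uses today (OK, FOUND, ERROR) and treats anything
else, such as a future SKIPPED or INCOMPLETE, as unknown rather than as
clean. The Verdict type and parse_reply name are made up for this
example, and the reply layout is assumed to be "label: body".

```python
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    status: str            # "OK", "FOUND", "ERROR", or "UNKNOWN"
    detail: Optional[str]  # signature name, error text, or the raw reply


def parse_reply(reply: str) -> Verdict:
    """Classify one clamd reply line without assuming the set is closed."""
    text = reply.strip("\0\r\n ")
    # Drop a leading "stream: " or "<path>: " label if there is one.
    _, sep, rest = text.partition(": ")
    body = rest if sep else text
    if body == "OK":
        return Verdict("OK", None)
    if body.endswith(" FOUND"):
        return Verdict("FOUND", body[: -len(" FOUND")])
    if body.endswith("ERROR"):
        return Verdict("ERROR", body[: -len("ERROR")].strip() or None)
    return Verdict("UNKNOWN", text)    # new or unexpected reply: do not trust it
```

A caller would then accept a file only when status is exactly "OK", so a
reply added later cannot be mistaken for a clean result.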

> ... Some file formats, like PDF, DMG, and ZIP* store metadata at the
> end of the file ... zips are actually pretty easy to parse in-order
> ... Files like DMG, on the other hand, can’t even be identified as
> DMG’s without reading the end of the file first ...

Is there somewhere a document listing the file types of which ClamAV
is aware, how it parses them, and any specific limitations/issues?
Whenever I've delved into the code it's been pretty daunting to try to
work out some of that stuff.

> In short, don’t send chunks of files as separate files to be
> scanned; It probably won’t catch any malware that way and may print
> lots of warnings or errors if it gets confused about the type of the
> file and starts processing it with the wrong parser.

I think the OP was confused by the use of 'chunks' in the clamd 'man'
page, which refers to the API for streaming data to clamd rather than
any suggestion that files can be broken into parts which will then be
scanned separately. Clearly I can scan any known malicious file four
bytes at a time to guarantee a clean result.

--

73,
Ged.

Re: [clamav-users] Scanning a large file through HTTP
Hi there,

On Wed, 7 Apr 2021, Paul Kosinski via clamav-users wrote:

> Seems to me that this behavior, advertising a 4GB limit while
> silently imposing a 2GB limit and reporting "OK" for anything in
> between, is a *major* security flaw: ClamAV *must* report that the
> file was too big to deal with (however worded).

Don't get too excited about it. When ClamAV says "OK" it really means
"I didn't find anything in there", which if you're unlucky it will say
for maybe two out of three infected files anyway. Getting bent out of
shape about a couple of files which happen to give that result because
they're huge and the scanner gives up on them is simply not seeing the
Big Picture.

You will have problems if you believe everything ClamAV (or indeed any
other virus scanner) tells you. No scanner will give you an accurate
result every time. The best anyone can hope for, with ANY scanner and
ANY profile of data, is probably four out of five, so if you're seeing
thousands of malicious samples every day, and all you do is trust your
virus scanners to be right every time, you'll be accepting hundreds of
malicious samples daily at least.
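
The arithmetic behind that estimate, as a small sketch. The figures are
the illustrative ones from the paragraph above (a thousand malicious
samples a day, four out of five caught), not measurements, and the
function name is made up for the example.

```python
def expected_misses(samples_per_day: int, detection_rate: float) -> float:
    """Expected number of malicious samples per day that the scanner passes."""
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    return samples_per_day * (1.0 - detection_rate)


# Illustrative figures only: 1000 malicious samples a day, 80% detected.
print(round(expected_misses(1000, 0.8)))  # 200
```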

My take on it is that the way to use ClamAV is to try to have it give
you an estimate of the credibility of the data sources rather than to
try to whack all the moles, which is usually a fruitless exercise and
will inevitably lead to failure.

> Thus I've taken to using clamscan rather than clamdscan (slow though
> that is), because at least it reports how many bytes were read, and
> how many scanned, so I can see what's going on.

You can easily put something together which gives you that information
but still uses clamd. If anyone wants to take a project and run with
it I'll be happy to post some Perl code which sends a stream to clamd.
It would take care of the ugly inter-process communications, leaving
our hero to make it somehow useful. Perhaps on the development list,
or the ClamAV Bugzilla.
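
As a starting point in Python rather than Perl, the byte accounting can
be done by wrapping the stream that is sent to clamd. This is only a
sketch: CountingReader and report are made-up names, and the reply
string in the usage lines is a placeholder, not real clamd output.

```python
import io
from typing import BinaryIO


class CountingReader:
    """Wrap a binary stream and count the bytes handed to the caller."""

    def __init__(self, stream: BinaryIO) -> None:
        self._stream = stream
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        data = self._stream.read(size)
        self.bytes_read += len(data)
        return data


def report(name: str, bytes_sent: int, reply: str) -> str:
    """Combine the byte count with whatever clamd replied."""
    return f"{name}: {bytes_sent} bytes sent, clamd said: {reply}"


# Usage with an in-memory stream; a real client would send each block to clamd.
reader = CountingReader(io.BytesIO(b"example data"))
while reader.read(4):
    pass
print(report("example", reader.bytes_read, "stream: OK"))
```

Comparing bytes_read with the file's size on disk shows whether the
whole file actually went to the scanner.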

> P.S. Recently I've downloaded some MP3s from Amazon and scanned them
> (as I do everything I download -- except updates from my Linux
> distros). But for a reason I saw on this list -- but can't remember
> -- MP3s are fully read, but not scanned. Is this going to be
> remedied?

See this thread:

https://marc.info/?l=clamav-users&m=150039601417286&w=2

See also the messages in 2014 from Steve Basford on Jul. 8 and Sep 17,
and Douglas Goddard on Sep 25:

https://marc.info/?l=clamav-users&w=2&r=1&s=MP3&q=b

See also

https://bugzilla.clamav.net/show_bug.cgi?id=11582

which tells me that there's plenty of work still to do but it isn't at
the top of anybody's priority list. The bottom line seems to be that
MP3 viruses are, if not non-existent, relatively rare and there's more
to be achieved looking for things which masquerade as MP3 but aren't.

--

73,
Ged.
