Hi there,
On Wed, 7 Apr 2021, Micah Snyder (micasnyd) via clamav-users wrote:
> There’s a lot of technical work to be done to safely raise that
> limitation, as large files of various file types have never
> been tested.
In my milter I've a pretty general-purpose Perl harness which can send
data to clamd in flexible ways. It wouldn't take much effort to tweak
it to run tests on clamd - in fact I've used it for that kind of thing
in the past. If you'd like me to do some testing with large files and
especially if you have some candidate large files which would be worth
trying, I'd be happy to set a job running on an otherwise idle machine
and cook rice puddings while waiting on the results. I have machines
which I can cheerfully crash without worries. They're Pi4Bs, which if
you leave them running for long enough will crash all by themselves.
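
For candidate files, a minimal sketch (in Python for brevity, not the Perl
harness mentioned above) of how comparable large TAR and ZIP inputs could
be generated from one payload. The function name, file names and default
chunk size are made up for illustration, and random bytes are only one
possible choice of content:

```python
import os
import tarfile
import zipfile


def make_test_archives(directory, payload_size, chunk=1 << 20):
    """Write one random payload, then wrap it in a TAR and a ZIP.

    Returns (tar_path, zip_path). All names here are illustrative.
    """
    payload = os.path.join(directory, "payload.bin")
    with open(payload, "wb") as out:
        remaining = payload_size
        while remaining > 0:
            n = min(chunk, remaining)
            out.write(os.urandom(n))  # written in pieces to keep memory use flat
            remaining -= n

    tar_path = os.path.join(directory, "big.tar")
    with tarfile.open(tar_path, "w") as tar:  # "w" means no compression
        tar.add(payload, arcname="payload.bin")

    zip_path = os.path.join(directory, "big.zip")
    # ZIP_STORED keeps the archive roughly as large as the payload;
    # allowZip64 lets the member exceed 4 GiB.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED, allowZip64=True) as zf:
        zf.write(payload, arcname="payload.bin")

    return tar_path, zip_path
```

Calling it with a payload_size just above and just below the configured
limits would give matched pairs to feed to clamd.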
> A large TAR, for example, may well work fine when a large ZIP might
> crash the program. We really have no idea.
Do you have anything fuzzing the code, deliberately trying to break
it, or any even semi-automatic analysis? Seems like if you could break
things into manageable blocks the community could help quite a bit.
What would help most is a design document explaining the structure of
the code, how it all hangs together, and the intended function of the
various parts. Then people who would otherwise be overwhelmed by it
all could get their teeth into it. It could pay enormous dividends if
something like that were available to the community. Help in testing
would be just the start.
> A lot of folks seem to be unhappy with it saying “OK” when a file
> hasn’t been scanned (myself included). So we have been talking
> about changing the output to something like the following messages
> when files are not scanned or are only partially scanned:
> * “SKIPPED (exceeded max file size)”
> * “INCOMPLETE (exceeded max scan size)”
> The exact wording is TBD. If anyone has any specific requests, I’d
> enjoy some help brainstorming.
Agreed it's perverse to report "OK" if a file was not properly scanned
but since it's been that way for decades I think you'll probably break
an awful lot of stuff Out There if you just go ahead and change that.
A compile-time option, initially defaulting to the current behaviour,
or a configuration option (the default behaviour as now) might prevent
a lot of angst. No issues with the suggested wordings that I can see,
as long as they don't turn out to be a moving target. There should be
another one, perhaps something like "DUNNO", for things nobody thought
of yet, possibly including "SKIPPED (below minimum file size)". Please
also put something in the docs reserving the right to add new replies,
so that coders get into the habit of coding for the future, or so that
at the, er, barest minimum your @r$e is covered.
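
To illustrate what "coding for the future" might look like on the client
side, here is a minimal sketch in Python. It assumes replies of the usual
"name: ... STATUS" shape; SKIPPED and INCOMPLETE are only the wordings
proposed above (still TBD), and "DUNNO" and the function name are my own
invention, not anything clamd emits:

```python
import re

# Proposed in this thread only; the exact wording is still TBD.
PROPOSED = re.compile(r"^(SKIPPED|INCOMPLETE)(?: \((.*)\))?$")


def classify_reply(line):
    """Return (status, detail) for one reply line.

    Anything unrecognised becomes ("DUNNO", text) instead of raising,
    so a reply added later does not break the caller.
    """
    text = line.strip("\0\r\n \t")
    # Drop a leading "name: " if there is one. A name that itself
    # contains ": " would confuse this; good enough for a sketch.
    _, sep, rest = text.partition(": ")
    if not sep:
        rest = text
    if rest == "OK":
        return ("OK", "")
    for status in ("FOUND", "ERROR"):
        if rest.endswith(" " + status):
            return (status, rest[: -len(status) - 1])
    match = PROPOSED.match(rest)
    if match:
        return (match.group(1), match.group(2) or "")
    return ("DUNNO", rest)
```

The point is the last line: the caller decides what to do with a reply it
has never seen (defer, quarantine, log) rather than treating it as "OK".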
> ... Some file formats, like PDF, DMG, and ZIP* store metadata at the
> end of the file ... zips are actually pretty easy to parse in-order
> ... Files like DMG, on the other hand, can’t even be identified as
> DMG’s without reading the end of the file first ...
Is there somewhere a document listing the file types of which ClamAV
is aware, how it parses them, and any specific limitations/issues?
Whenever I've delved into the code it's been pretty daunting to try to
work out some of that stuff.
> In short, don’t send chunks of files as separate files to be
> scanned; it probably won’t catch any malware that way and may print
> lots of warnings or errors if it gets confused about the type of the
> file and starts processing it with the wrong parser.
I think the OP was confused by the use of 'chunks' in the clamd 'man'
page, which refers to the API for streaming data to clamd rather than
any suggestion that files can be broken into parts which will then be
scanned separately. Clearly I can scan any known malicious file four
bytes at a time to guarantee a clean result.
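
For anyone else puzzled by the 'chunks', a minimal sketch in Python of the
INSTREAM framing as I read the clamd man page: the command, then each chunk
preceded by its length as a 4-byte unsigned integer in network byte order,
then a zero-length chunk to finish. The function name and the 8192-byte
default are arbitrary choices of mine:

```python
import struct


def instream_frames(data, chunk_size=8192):
    """Yield the byte strings that make up ONE INSTREAM session for data."""
    yield b"zINSTREAM\0"  # null-terminated form of the command
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # 4-byte big-endian length, then the chunk itself
        yield struct.pack("!I", len(chunk)) + chunk
    yield struct.pack("!I", 0)  # zero-length chunk ends the stream
```

Everything between the command and the terminator is reassembled by clamd
and scanned as a single file, subject to StreamMaxLength, so the chunk size
affects only the transport and not what gets scanned. To use it, write each
frame to a connected socket with sendall() and then read the one reply.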
--
73,
Ged.