Mailing List Archive

Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool
In case anyone else is looking into this, I wanted to share some news.

We have been getting some help to create a tool to recursively unpack (or mount) and scan large archives (greater than 2000MB).

This effort has progressed to the point where we've started code review and writing documentation. I'm not entirely sure how we will package it for people to use. I'll share more when we go to open source it. I wanted to share the news now in case anyone else was planning to work on this, so they aren't frustrated later to find that we've done the same.

I don't have a specific release date in mind. It likely won't be until early next year. While we've started code review and testing, the developer who built the tool for us is now working on adding support for the allmatch-mode feature.

Best regards,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.

________________________________
From: Andrew C Aitchison <clamav@aitchison.me.uk>
Sent: Thursday, June 8, 2023 6:25 PM
To: Micah Snyder (micasnyd) <micasnyd@cisco.com>
Cc: ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [clamav-users] Question About MaxFileSize

On Thu, 8 Jun 2023, Micah Snyder (micasnyd) wrote:

> I agree with you. I suspect the majority of cases today involve
> people who have a large archive of files to scan.
>
> I think the best-case scenario for people who need to scan files
> larger than the present internal 2GB limit is that archives larger
> than 2GB are decompressed and the files inside are scanned,
> without actually scanning the very large outer archive.
>
> The way to do this today is to script something around clamscan
> or clamdscan that, if the file is too large, handles some assorted
> file types:
>
> 1. if file is a tar.gz, un-tar.gz it and then scan the files within.
> 2. if file is a zip, un-zip it and then scan the files within.
> 3. etc.
>
> I think everyone would like it if ClamAV could do this automatically
> for select archive types. And I think the advantage would be that we
> would perhaps keep the extracted files in memory, or else at least
> delete the temp files as we go without extracting all of it to disk
> before starting to scan.
>
> However, it would be far easier to make a shell script or a python
> script that wraps clamscan/clamdscan and uses native tools like
> "tar", "unzip", etc.

Good idea.

Simply untarring or unzipping into a pipe does not separate the packed files.
However, tar at least has an option that allows us to write a one-liner:
(tar xf ~/viruses.tar --to-command='clamdscan -v - || echo " found in $TAR_REALNAME\n\n---"' ) |& egrep -i found
stream: Eicar-Signature FOUND
found in viruses/EICAR.COM.TAR
stream: Eicar-Signature FOUND
found in viruses/eicar.com.txt
stream: Eicar-Signature FOUND
found in viruses/URLEICAR.COM.TAR
stream: Eicar-Signature FOUND
found in viruses/4DOSBOX/EICAR.COM
stream: Eicar-Signature FOUND
found in viruses/EICAR.COM

The echo is needed to show the name of the file inside the archive.
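
For readability, the one-liner above can be expanded into a small shell function. This is only a sketch, assuming GNU tar (--to-command and TAR_REALNAME are GNU extensions). The names scan_tar_members and SCAN_CMD and the INFECTED/ERROR labels are made up for illustration, and the scanner is assumed to follow the clamscan/clamdscan exit codes (0 clean, 1 infected, anything else an error).

```shell
#!/usr/bin/env bash
# Sketch only: stream each regular file in a tar archive to a scanner,
# one member at a time, without unpacking the archive to disk.
# Requires GNU tar. All names below are illustrative, not part of ClamAV.

# scan_tar_members ARCHIVE [SCANNER_COMMAND]
#   SCANNER_COMMAND must read one file on stdin and exit with
#   0 = clean, 1 = infected, anything else = error.
#   It defaults to "clamdscan --no-summary -".
scan_tar_members() {
    local archive="$1"
    local scanner="${2:-clamdscan --no-summary -}"

    # tar runs the --to-command string with /bin/sh once per regular file,
    # feeding the member's contents on stdin and exporting its name as
    # TAR_REALNAME. SCAN_CMD is passed through the environment so the
    # inner shell can see it.
    SCAN_CMD="$scanner" tar -xf "$archive" --to-command='
        $SCAN_CMD >/dev/null 2>&1
        case $? in
            0) ;;
            1) printf "INFECTED: %s\n" "$TAR_REALNAME" ;;
            *) printf "ERROR: %s\n" "$TAR_REALNAME" ;;
        esac
        exit 0'
}
```

Calling scan_tar_members ~/viruses.tar would print one INFECTED line per flagged member. Unlike the plain || form, a scanner error (for example, clamd not running) is reported separately instead of being counted as a detection.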

This appears not to write the unpacked files to disk.
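
The wrapper idea quoted earlier in this message (scan small files directly, unpack files that are too large and scan what is inside) might be sketched as follows. This is a rough illustration, not the tool announced above: scan_large_aware, SIZE_LIMIT_BYTES, and SCANNER are hypothetical names, the 2000 MiB default is an assumption based on the limit mentioned in this thread, and only tar and zip are handled.

```shell
#!/usr/bin/env bash
# Sketch only: scan FILE directly when it is under a size limit; otherwise
# unpack it into a temporary directory and scan the contents instead.
# All names and defaults here are assumptions made for illustration.

# Assumed threshold, based on the roughly 2GB limit discussed in the thread.
: "${SIZE_LIMIT_BYTES:=$((2000 * 1024 * 1024))}"
# Scanner command; it is given a file or a directory as its last argument.
: "${SCANNER:=clamscan --recursive --infected}"

scan_large_aware() {
    local file="$1"
    local size workdir rc

    size=$(wc -c < "$file") || return 2

    # Small enough: let the scanner handle the file as usual.
    if [ "$size" -le "$SIZE_LIMIT_BYTES" ]; then
        $SCANNER "$file"
        return $?
    fi

    workdir=$(mktemp -d) || return 2

    # Too large: unpack with native tools, chosen by file extension.
    case $file in
        *.tar|*.tar.gz|*.tgz) tar -xf "$file" -C "$workdir" ;;
        *.zip)                unzip -q "$file" -d "$workdir" ;;
        *) echo "unsupported large file: $file" >&2; false ;;
    esac || { rm -rf "$workdir"; return 2; }

    # Scan the extracted tree, then clean up and pass on the scanner's status.
    $SCANNER "$workdir"
    rc=$?
    rm -rf "$workdir"
    return "$rc"
}
```

Unlike the --to-command approach, this version extracts everything to disk before scanning, and it does not recurse into extracted members that are themselves over the limit.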

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk
Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool
Large archive files may be the most obvious case, especially if things like disk images and installation images are included, but make sure that large multimedia files are also handled.

In today's Internet environment, there are probably far, far more large video files floating around than traditional archives. And in some sense multimedia "container" files (like MP4, MOV, AVI etc.) are archives of their media streams (like H.264/5, AAC, etc.) -- but these archives are, of course, interleaved for real-time playback.

I might add that there have been recent reports of malformed (perhaps malicious) multimedia files causing crashes or unwanted code execution in software such as FFMPEG.


On Mon, 13 Nov 2023 20:32:38 +0000
"Micah Snyder \(micasnyd\) via clamav-users" <clamav-users@lists.clamav.net> wrote:

> In case anyone else is looking into this, I wanted to share some news.
_______________________________________________

Manage your clamav-users mailing list subscription / unsubscribe:
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation

https://docs.clamav.net/#mailing-lists-and-chat
Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool
Hi Micah,

Is it going to be part of clamav or a different application entirely?

Hong-Duc Vu


Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool
Hi,

It's going to be a Python script that depends on having ClamAV installed and has a few other dependencies for working with zips, tars, ISOs, and a few other archive formats. At this time, I'm expecting that we will publish it in a separate git repo and not bundle it directly with ClamAV.

Regards,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
________________________________
From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of Vu, Hong-Duc V. via clamav-users <clamav-users@lists.clamav.net>
Sent: Tuesday, November 14, 2023 10:49 AM
Cc: Vu, Hong-Duc V. <HD.Vu@jhuapl.edu>; ClamAV users ML <clamav-users@lists.clamav.net>
Subject: Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool


Hi Micah,

Is it going to be part of clamav or a different application entirely?

Hong-Duc Vu

Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool
We are primarily creating the large archive scanner to support the use case of scanning bundled collections of software, VM images, etc.

Large MP4/MOV/AVI/etc media files are not traditional archives even if they do technically archive media streams. But media streams are not a significant threat concern. As you mentioned, the biggest concern is probably a malicious media file exploiting a vulnerable application to get code execution. Media streams would not otherwise be executable.

Someone may add support later to extract and scan media streams, but without signature content or special logic coded in a custom media-stream parser written to detect exploits, scanning such files is pointless. We have some of that kind of logic to inspect some picture formats (JPEG, PNG, etc.) for correctness, but we don't have any support for H.265, AAC, or other video or audio formats.

Respectfully,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.

________________________________
From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of Paul Kosinski via clamav-users <clamav-users@lists.clamav.net>
Sent: Monday, November 13, 2023 7:28 PM
To: Micah Snyder (micasnyd) via clamav-users <clamav-users@lists.clamav.net>
Cc: Paul Kosinski <clamav-users@iment.com>
Subject: Re: [clamav-users] Question About MaxFileSize / news of upcoming Large Archive Scanner tool

Large archive files may be the most obvious case, especially if things like disk images and installation images are included, but make sure that large multimedia files are also handled.

In today's Internet environment, there are probably far, far more large video files floating around than traditional archives. And in some sense multimedia "container" files (like MP4, MOV, AVI etc.) are archives of their media streams (like H.264/5, AAC, etc.) -- but these archives are, of course, interleaved for real-time playback.

I might add that there have been recent reports of malformed (perhaps malicious) multimedia files causing crashes or unwanted code execution in software such as FFMPEG.

