Mailing List Archive

Pull request to add parallel scanning to clamscan
Hi,

I posted a rebased version of my patch to add parallel scanning to
clamscan here:

https://github.com/vrtadmin/clamav-devel/pull/78

As I haven't received any feedback on the first version I posted last
year, could somebody please have a look and comment this time? Even a
"clamscan is a test tool and we do not care about its performance" would
suffice, but I would of course love to see the patch merged. The
motivation for writing it was that we use clamscan in our build
environment. It's simpler to use than setting up a one-shot clamd &
clamdscan, except for the lack of "multiscan" in clamscan.

Thanks,
Michal
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net

http://www.clamav.net/contact.html#ml
Re: Pull request to add parallel scanning to clamscan
Hi,

I'm not on the ClamAV development team, so I have no say whatsoever in what's accepted and what isn't. That said, this is something I'd *love* to see implemented in clamscan. Thank you!

Disclaimer: I'm just having a look at this now but haven't fully read through the diff yet...

In the commit message you said "build a list of files first and then spawn N children to scan the files in parallel."

Does this actually iterate *all* the files and directories before starting the first scan? If you're scanning a large directory tree, how much overhead does this add prior to scanning the first file?

Alternatively, as clamscan already iterates through directories, does it maintain a count of the number of concurrent calls to 'scanfile()' and fire off another one at that point as necessary?
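
The alternative raised in this question can be sketched roughly as follows. This is only an illustration of the scheduling idea, not clamscan code: it is written in Python, it uses threads rather than child processes, and `fake_scan`, `scan_while_walking` and `max_in_flight` are hypothetical names made up for the sketch. The traversal hands each file to a worker as soon as it is found and blocks whenever the maximum number of scans is already running.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_scan(path):
    """Hypothetical stand-in for a real scan: flags file names containing 'eicar'."""
    return "FOUND" if "eicar" in os.path.basename(path) else "OK"


def scan_while_walking(root, max_in_flight=4, scan=fake_scan):
    """Scan files as the directory walk discovers them, never running
    more than max_in_flight scans at the same time."""
    slots = threading.BoundedSemaphore(max_in_flight)  # count of free scan slots
    results = {}
    results_lock = threading.Lock()

    def worker(path):
        try:
            verdict = scan(path)
            with results_lock:
                results[path] = verdict
        finally:
            slots.release()  # free the slot so the walk can continue

    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                slots.acquire()  # blocks the walk while all slots are busy
                pool.submit(worker, os.path.join(dirpath, name))
    # Leaving the "with" block waits for the remaining scans to finish.
    return results
```

With this shape the first scan starts as soon as the first file is found, at the cost of having to pass paths to the workers one at a time while the walk is still running.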

Will take a closer look later, and maybe even answer my own questions :)

Mark

> On 20 Jun 2017, at 9:46 am, Michal Marek <mmarek@suse.com> wrote:
>
> Hi,
>
> I posted a rebased version of my patch to add parallel scanning to
> clamscan here:
>
> https://github.com/vrtadmin/clamav-devel/pull/78
>
> As I haven't received any feedback on the first version I posted last
> year, could somebody please have a look and comment this time? Even a
> "clamscan is a test tool and we do not care about its performance" would
> suffice, but I would of course love to see the patch merged. The
> motivation for writing it was that we use clamscan in our build
> environment. It's simpler to use than setting up a one-shot clamd &
> clamdscan, except for the lack of "multiscan" in clamscan.
>
> Thanks,
> Michal

Re: Pull request to add parallel scanning to clamscan
On 2017-06-20 11:33, Mark Allan wrote:
> From the commit message you said "build a list of files first and
> then spawn N children to scan the files in parallel."
>
> Does this actually iterate *all* the files and directories before
> starting the first scan?

Yes.


> If you're scanning a large directory tree,
> how much overhead does this add prior to scanning the first file?

Unless you are scanning a really slow NFS or CIFS mount, it's negligible
compared to the time it takes to process the content. Initializing the
database does take noticeable time on startup.


> Alternatively, as clamscan already iterates through directories, does
> it maintain a count of the number of concurrent calls to 'scanfile()'
> and fire off another one at that point as necessary?

That would of course be an option, but it would require incrementally
passing paths to the children / threads. With the current approach, I
only need a pipe, which is the simplest synchronization primitive one
can think of :).
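
For readers who have not opened the pull request, the "build the list first, then spawn N children" approach can be sketched roughly as follows. This is not the code from the patch: it is a Python illustration for POSIX systems, and the round-robin split of the list, the use of the pipe to report verdicts back to the parent, and the names `fake_scan`, `collect_files` and `parallel_scan` are all assumptions made for the sketch.

```python
import os


def fake_scan(path):
    """Hypothetical stand-in for a real scan: flags file names containing 'eicar'."""
    return "FOUND" if "eicar" in os.path.basename(path) else "OK"


def collect_files(root):
    """Phase 1: walk the whole tree and build the complete file list."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)


def parallel_scan(root, nchildren=4, scan=fake_scan):
    """Phase 2: fork nchildren processes, each scanning its share of the list."""
    files = collect_files(root)
    read_fd, write_fd = os.pipe()
    pids = []
    for i in range(nchildren):
        pid = os.fork()
        if pid == 0:
            # Child: scan every nchildren-th file and report over the shared pipe.
            os.close(read_fd)
            try:
                for path in files[i::nchildren]:
                    # One short line per file. Writes of at most PIPE_BUF bytes
                    # to a pipe are atomic, so lines from different children
                    # do not interleave. Paths containing newlines or longer
                    # than PIPE_BUF are not handled in this sketch.
                    line = os.fsencode(path) + b"\t" + scan(path).encode() + b"\n"
                    os.write(write_fd, line)
            finally:
                os._exit(0)  # leave without running the parent's cleanup code
        pids.append(pid)

    # Parent: close its write end, then read until every child has exited.
    os.close(write_fd)
    results = {}
    with os.fdopen(read_fd, "rb") as reader:
        for raw in reader:
            path, verdict = raw.rstrip(b"\n").rsplit(b"\t", 1)
            results[os.fsdecode(path)] = verdict.decode()
    for pid in pids:
        os.waitpid(pid, 0)
    return results
```

Because every child inherits the complete list across fork(), the parent never has to send paths to the children; in this sketch the pipe only carries results back, and end-of-file on it tells the parent that all children are done.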

Michal
Re: Pull request to add parallel scanning to clamscan
On Tue, Jun 20, 2017 at 10:46:22AM +0200, Michal Marek wrote:
> Hi,
>
> I posted a rebased version of my patch to add parallel scanning to
> clamscan here:
>
> https://github.com/vrtadmin/clamav-devel/pull/78
>
> As I haven't received any feedback on the first version I posted last
> year, could somebody please have a look and comment this time? Even a
> "clamscan is a test tool and we do not care about its performance" would
> suffice, but I would of course love to see the patch merged. The
> motivation for writing it was that we use clamscan in our build
> environment. It's simpler to use than setting up a one-shot clamd &
> clamdscan, except for the lack of "multiscan" in clamscan.

Thanks for this. I applied it to 0.99.2 and it seems to work great on our old,
slow Solaris boxes. A dozen processes really helps there. :-)

Re: Pull request to add parallel scanning to clamscan
Hi Michal,

Thanks for your code submission. I opened a Bugzilla bug

https://bugzilla.clamav.net/show_bug.cgi?id=11856

to review for ClamAV 0.99.4.

Steve