Mailing List Archive

AC algorithm optimization
Hi:

I have been studying ClamAV's Aho-Corasick (AC) algorithm. Here are some ideas.

The current AC implementation:
1. It uses a trie structure, so loading a large-scale pattern set causes
serious memory expansion.
2. max_ac_depth is set to 3. Although this reduces memory usage, it causes
the ac_find_match function to be called frequently.
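As an illustration (a sketch, not ClamAV's actual code), here is a minimal Python model of a byte-wise trie with a full 256-entry child table per node. It shows why memory grows quickly with depth, and how a depth cap like max_ac_depth leaves the bytes beyond the cap to be verified later (cf. ac_find_match):

```python
# Hypothetical sketch: naive byte-trie node with a 256-slot child table.
class TrieNode:
    __slots__ = ("children", "patterns")

    def __init__(self):
        self.children = [None] * 256  # one slot per possible byte value
        self.patterns = []            # ids of patterns whose prefix ends here


def insert(root, pattern, pat_id, max_depth):
    """Insert only the first max_depth bytes of a pattern; the remaining
    bytes must be checked by an explicit verification pass later."""
    node = root
    for byte in pattern[:max_depth]:
        if node.children[byte] is None:
            node.children[byte] = TrieNode()
        node = node.children[byte]
    node.patterns.append(pat_id)


root = TrieNode()
insert(root, b"\x4d\x5a\x90\x00", 0, max_depth=3)  # only 4D 5A 90 stored
```

Every node allocates 256 child slots regardless of how many are used, which is why deeper tries explode in memory and why compressing the structure is attractive.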

With an algorithm that compresses the trie structure, the max_ac_depth
value could be increased.

The double-array trie algorithm fits this need exactly: it uses two arrays
to compress the trie structure very cleverly.
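For readers unfamiliar with it, the core transition rule of a double-array trie can be sketched in a few lines (a hand-built toy example, not tied to any particular implementation): two flat arrays, base and check, replace the per-node child tables, and a move from state s on input code c lands at t = base[s] + c only if check[t] == s.

```python
def da_step(base, check, state, c):
    """Follow state --c--> t in a double-array trie; None means no edge."""
    t = base[state] + c
    if 0 <= t < len(check) and check[t] == state:
        return t
    return None  # in full AC this is where a failure link would be taken


# Tiny hand-built automaton over input codes 1..2 ('a' -> 1, 'b' -> 2):
#   state 0 --a--> 1,  state 0 --b--> 2,  state 1 --a--> 3
base = [0, 2, 0, 0]
check = [-1, 0, 0, 1]

assert da_step(base, check, 0, 1) == 1      # 'a' from the root
assert da_step(base, check, 1, 1) == 3      # "aa"
assert da_step(base, check, 2, 1) is None   # no 'a' edge from state 2
```

The memory cost is two integers per occupied slot instead of a full child table per node, which is what makes a larger max_ac_depth affordable.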

I did an experiment on my PC: I separated the AC algorithm from the ClamAV
code so it can run independently, and implemented the double-array trie for
comparison at the same time. I extracted the original AC rules as the
experimental rule set (General: 8030, PE: 17241 substrings).

I scanned a set of sample files (1465 files; file types: HTML, PE, ASCII,
PDF, etc.; total size: 273 MB).

Test code flow: initialize the AC structure -> load the rules -> AC build
-> then, for each file in the scanned directory: read the file content into
a memory buffer -> call gettimeofday to record the start time -> call
ac_scan_buff -> call gettimeofday to record the end time -> calculate
throughput (Mbps) from the file size and scanning time.

Hardware and Software environment:

CPU: Intel i3-3220 @ 3.30 GHz

Memory: 8 GB DDR3-1333

OS version: Ubuntu 13.10

GCC version: 4.8.1

The result:

Scanning performance improved for most files, and for some files the
improvement was quite striking, up to 10 times.

Could the double-array trie (DAT) algorithm replace the original AC
algorithm to improve scanning performance?
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
Re: AC algorithm optimization
That sounds interesting. Is there a reference implementation of an AC trie
using double arrays that we could look at?

Thanks,
Steve


On Wed, Mar 26, 2014 at 7:34 AM, 刘申 <liu31380@gmail.com> wrote:

> [snip]