Mailing List Archive: Re: [QUESTION] How does clamAV updates the signature database on-the-fly?

Aug 13, 2010, 11:45 PM

Post #1 of 5 (2587 views)

On 7/28/2010 6:18 PM, thyago wrote:
> I'm researching ways of updating a signature database on-the-fly, so the way
> ClamAV does it can really help me out...
> I mean, what structures are there? How is it implemented?
> Is there a data structure used to store the signatures in memory? If so, how
> exactly is it updated?
> What type of data structure? Dynamic or static?
> I need to know if you guys use a pointer to the structure, and then just set
> it to point to the new updated structure,
> and if, for example, there's a condition that limits when this pointer can
> be changed... like a thread needing to finish first...
>
> I tried to look for the implementation in the code itself... but it's so
> big... I don't know in which file to look =/
>
>
>
> Thank you very much, for the help
>
> Thyago

Attached is my implementation. As you can see I use a RW lock to
minimize contention.

Re: [QUESTION] How does clamAV updates the signature database on-the-fly? [ In reply to ]

edwin at clamav

Aug 14, 2010, 1:19 AM

Post #2 of 5 (2512 views)

> /**
> * @file /magma/providers/external/clamav.c
> *
> * @brief Interface for the ClamAV library.
> *
> * $Author: Ladar Levison $
> * $Date: 2010/08/13 10:32:38 $
> * $Revision: ecaee526d4ba88a141c5b889dd023b13c05c2654 $
> // Scan the message. The OLE code has a bug in it that causes
> segfaults.

What bug ??

> // We ignore email that ClamAV thinks is a phishing based on scanner's
> // internal heuristic checks.
> else if (starts_ci_bl_bl("Phishing", 8, virname, ns_get_length(virname)) ||
>          starts_ci_bl_bl("Joke", 4, virname, ns_get_length(virname))) {
>     pthread_rwlock_unlock(&virus_lock);
>     stats_increment_by_name("provider.virus.scan.total");
>     stats_increment_by_name("provider.virus.scan.clean");
>     close(fd);
>     return 0;
> }

This is incorrect. If you want to match the heuristic phishing
detection, match on Heuristics.Phishing.
There are also signatures whose names contain *Phishing* and *Joke*, and
ClamAV stops on the first signature match.

So if you get a zip whose first element is something ClamAV detects as
Phishing/Joke, followed by real malware, it will only report the first
match (Phishing/Joke). Your code will mark it as clean when in fact it
could be infected.
(Note that this is not the case for Heuristics.Phishing, where ClamAV
keeps on scanning and only reports the heuristic if it didn't find
anything else.)

The proper way to deal with this is to not load the Phishing signatures
at all; there is an option you can pass to cl_load() for that.
For *Joke* there is no such flag, though.
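
As a rough illustration of the first point, a check that matches only the heuristic detection could look like the sketch below. The function name is_heuristic_phishing is hypothetical, and unlike the quoted starts_ci_bl_bl() calls this version is case-sensitive, which is an assumption about how the names are reported.

```c
#include <string.h>

/* Returns 1 only when the reported name begins with the
 * "Heuristics.Phishing" prefix, 0 otherwise. Signature names that merely
 * start with "Phishing" or "Joke" are deliberately not matched. */
int is_heuristic_phishing(const char *virname) {
    static const char prefix[] = "Heuristics.Phishing";
    if (virname == NULL)
        return 0;
    return strncmp(virname, prefix, sizeof(prefix) - 1) == 0;
}
```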

Best regards,
--Edwin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net

Re: [QUESTION] How does clamAV updates the signature database on-the-fly? [ In reply to ]

ladar at lavabit

Aug 14, 2010, 3:00 AM

Post #3 of 5 (2503 views)

On 8/14/2010 3:19 AM, Török Edwin wrote:
>>
>> // Scan the message. The OLE code has a bug in it that causes segfaults.
>
> What bug ??

That comment was related to a bug I found in Feb 2008 in v0.92.1, which
has long since been patched. See this email thread for details:

http://marc.info/?l=clamav-devel&m=120442553919615

I had an internal patch floating around for a while that fixed the issue
inside ole2_walk_property_tree() by incrementing rec_level. Somewhere
along the line the issue was fixed, but I never removed the comment. The
relevant lines in v0.96.2 increment rec_level just like my patch did. I
never submitted the patch because back in 2008 you indicated that it
wasn't the best solution.

>> // We ignore email that ClamAV thinks is a phishing based on scanner's
>> // internal heuristic checks.
>> else if (starts_ci_bl_bl("Phishing", 8, virname, ns_get_length(virname)) ||
>>          starts_ci_bl_bl("Joke", 4, virname, ns_get_length(virname))) {
>>     pthread_rwlock_unlock(&virus_lock);
>>     stats_increment_by_name("provider.virus.scan.total");
>>     stats_increment_by_name("provider.virus.scan.clean");
>>     close(fd);
>>     return 0;
>> }
>
> This is incorrect. If you want to match the heuristic phishing
> detection, match on Heuristics.Phishing.
> There are also signatures whose names contain *Phishing* and *Joke*, and
> ClamAV stops on the first signature match.
>
> So if you get a zip whose first element is something ClamAV detects as
> Phishing/Joke, followed by real malware, it will only report the first
> match (Phishing/Joke). Your code will mark it as clean when in fact it
> could be infected.
> (Note that this is not the case for Heuristics.Phishing, where ClamAV
> keeps on scanning and only reports the heuristic if it didn't find
> anything else.)
>
> The proper way to deal with this is to not load the Phishing signatures
> at all; there is an option you can pass to cl_load() for that.
> For *Joke* there is no such flag, though.

Is it possible to determine when ClamAV detects more than one virus and
iterate through the resulting names? I revisited the ex1.c file and the
clamscan/manager.c file, and they seem to suffer from the same issue. In
the case of clamscan, it only outputs the first virus name, which, as
you pointed out, could be innocuous compared to what else lies farther
along in the file.

If we are limited to only a single result, wouldn't it make more sense
to have a precedence order in place? Presumably malware would rate
ahead of phishing or jokes.

--
Ladar Levison
Lavabit LLC
http://lavabit.com


Re: [QUESTION] How does clamAV updates the signature database on-the-fly? [ In reply to ]

edwintorok at gmail

Aug 14, 2010, 3:30 AM

Post #4 of 5 (2503 views)

On Sat, 14 Aug 2010 05:00:46 -0500
Ladar Levison <ladar@lavabit.com> wrote:

> On 8/14/2010 3:19 AM, Török Edwin wrote:
> >>
> >> // Scan the message. The OLE code has a bug in it that causes
> >> segfaults.
> >
> > What bug ??
>
> That comment was related to a bug I found in Feb 2008 in v0.92.1,
> which has long since been patched. See this email thread for details:
>
> http://marc.info/?l=clamav-devel&m=120442553919615
>
> I had an internal patch floating around for a while that fixed the
> issue inside ole2_walk_property_tree() by incrementing rec_level.
> Somewhere along the line the issue was fixed, but I never removed the
> comment. The relevant lines in v0.96.2 increment rec_level just like
> my patch did. I never submitted the patch because back in 2008
> you indicated that it wasn't the best solution.
>
> >> // We ignore email that ClamAV thinks is a phishing based on scanner's
> >> // internal heuristic checks.
> >> else if (starts_ci_bl_bl("Phishing", 8, virname, ns_get_length(virname)) ||
> >>          starts_ci_bl_bl("Joke", 4, virname, ns_get_length(virname))) {
> >>     pthread_rwlock_unlock(&virus_lock);
> >>     stats_increment_by_name("provider.virus.scan.total");
> >>     stats_increment_by_name("provider.virus.scan.clean");
> >>     close(fd);
> >>     return 0;
> >> }
> >
> > This is incorrect. If you want to match the heuristic phishing
> > detection, match on Heuristics.Phishing.
> > There are also signatures whose names contain *Phishing* and *Joke*,
> > and ClamAV stops on the first signature match.
> >
> > So if you get a zip whose first element is something ClamAV detects
> > as Phishing/Joke, followed by real malware, it will only report the
> > first match (Phishing/Joke). Your code will mark it as clean when in
> > fact it could be infected.
> > (Note that this is not the case for Heuristics.Phishing, where ClamAV
> > keeps on scanning and only reports the heuristic if it didn't find
> > anything else.)
> >
> > The proper way to deal with this is to not load the Phishing
> > signatures at all; there is an option you can pass to cl_load() for
> > that. For *Joke* there is no such flag, though.
>
> Is it possible to determine when ClamAV detects more than one virus
> and iterate through the resulting names?

Currently no.

> I revisited the ex1.c file,
> and the clamscan/manager.c file and they seem to suffer from the same
> issue. In the case of clamscan, it only outputs the first virus name,
> which like you pointed out could be innocuous compared to what else
> lies farther along in the file.
>
> If we are limited to only a single result, wouldn't it make more
> sense to have a precedence order in place? Presumably malware would
> rate ahead of phishing or jokes.
>

Heuristics.Phishing.* detections do not stop the scan, and are reported
only if nothing else is found.
Other engine detections could be changed to behave the same way.
Signature-based detections, however, always stop on the first match, and
that is not configurable.
If you want to ignore certain signature categories, it is best to not
load them in the first place. To do that you can unpack the DBs and
remove the sigs you don't want.
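
The difference between the two behaviours can be pictured with a toy model like the one below. It is only an illustration of the reporting rule described above, not ClamAV's scanning code; the hit_kind, hit and toy_scan names are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical detection kinds for this toy model. */
enum hit_kind { HIT_NONE, HIT_SIGNATURE, HIT_HEURISTIC };

struct hit {
    enum hit_kind kind;
    const char *name;   /* NULL when kind == HIT_NONE */
};

/* Walks the per-item results of a container (e.g. files in a zip) in order.
 * A signature hit ends the scan at once; a heuristic hit is remembered and
 * reported only if the scan reaches the end without a signature hit.
 * Returns the reported name, or NULL if nothing was detected. */
const char *toy_scan(const struct hit *items, size_t count) {
    const char *pending_heuristic = NULL;
    for (size_t i = 0; i < count; i++) {
        if (items[i].kind == HIT_SIGNATURE)
            return items[i].name;                /* stop on first match */
        if (items[i].kind == HIT_HEURISTIC && pending_heuristic == NULL)
            pending_heuristic = items[i].name;   /* keep scanning */
    }
    return pending_heuristic;
}
```

In this model, a Phishing signature hit on the first item hides any malware in later items, while a heuristic hit on the first item does not.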

Best regards,
--Edwin

Re: [QUESTION] How does clamAV updates the signature database on-the-fly? [ In reply to ]

ladar at lavabit

Aug 14, 2010, 6:08 AM

Post #5 of 5 (2499 views)

On 8/14/2010 5:30 AM, Török Edwin wrote:
> Heuristics.Phishing.* detections do not stop the scan, and are reported
> only if nothing else is found.
> Other engine detections could be changed to behave the same way.
> Signature-based detections, however, always stop on the first match, and
> that is not configurable.
> If you want to ignore certain signature categories, it is best to not
> load them in the first place. To do that you can unpack the DBs and
> remove the sigs you don't want.
>

What I'm trying to do is let the user decide whether to enable a
specific category, so removing the signatures from the database isn't an
option for me. For a while now, our users have been able to use the
preferences portal on our website to enable/disable malware checks
and/or phishing checks. A malware detection is usually a reliable
reason to send a message to the bit bucket, while the
Phishing/Heuristic/Joke categories are more likely to generate false
positives. Perhaps it's just me, but I would consider the ability to
reliably determine what ClamAV found important.
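
To make the use case concrete, the per-user decision could be sketched along these lines. The preference bits, the category enum and should_reject are hypothetical names, and delivering Joke detections unconditionally is an assumption taken from the quoted code rather than a stated policy. The sketch is only as reliable as the single detection name it is given, which is exactly the limitation discussed above.

```c
/* Hypothetical per-user preference bits, mirroring the two switches
 * described above (malware checks and phishing checks). */
enum { PREF_MALWARE = 1 << 0, PREF_PHISHING = 1 << 1 };

/* Hypothetical categories a single reported detection is sorted into. */
enum category { CAT_CLEAN, CAT_MALWARE, CAT_PHISHING, CAT_JOKE };

/* Returns 1 if the message should be rejected for this user, else 0. */
int should_reject(enum category cat, unsigned int prefs) {
    switch (cat) {
    case CAT_MALWARE:
        return (prefs & PREF_MALWARE) != 0;
    case CAT_PHISHING:
        return (prefs & PREF_PHISHING) != 0;
    case CAT_JOKE:    /* assumption: no user switch, always delivered */
    case CAT_CLEAN:
    default:
        return 0;
    }
}
```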

--
Ladar Levison
Lavabit LLC
http://lavabit.com

