Mailing List Archive

[clamav-users] human friendly signatures
Hi all,

Sorry that this response comes so late that it is nearly a necro-thread. Things have been busy.

I've been thinking about some of the things you all have said. And we've talked about it a bit as a team.

We know there is a lot of interest in having better Yara support, not only because it is easier to use but because there is a wealth of Yara rules out there. Support for features from the 'pe' module is something that I believe is particularly desirable.

The KDL-based format that I'm brainstorming does indeed look very similar to Yara. I hope it's not too close. If we do this, we won't want the new language to be so close to Yara that people get confused between them. I certainly wouldn't want to have ours be the Yara format but missing some features and with other extra features. That would be very confusing.

We have a need to maintain our own signature language to support features unique to ClamAV. Starting with our own new language would let us do that while making it easier for new analysts to train up on ClamAV.
Another reason for tossing the baby out with the bathwater is that ClamAV's existing signature formats are difficult to extend. I say formats because the formats for different types of ClamAV signatures aren't all the same. We have a bunch: https://docs.clamav.net/manual/Signatures.html#database-formats I'm hoping we can unify them.

Ged's comment about needing the ability to reference one rule from another rule is something that resonates with us a lot. Being able to have an alert triggered by the combination of some weak indicators would be great. We have something like that in LDB signatures to leverage NDB signatures by way of "macro subsignatures" https://docs.clamav.net/manual/Signatures/LogicalSignatures.html#macro-subsignatures. Macro subsignatures are limited, though. You couldn't, for example, tie together a content-based signature with a certificate revocation signature.

What would be even cooler would be to have an archive alert because we found weak indicators in several of the contained files. Like a rule for "a ZIP that contains HTML that alerts with A, and an EXE that alerts with B". Indeed, the ability to have severity levels for signatures so we could even have weak indicators is the sort of thing that would be very difficult to add to our existing signature language. It probably wouldn't be so bad for logical signatures but extending this concept to other signature formats like PE section hash signatures, certificate trust and revocation signatures, etc. wouldn't be fun.
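
A minimal sketch of how such an archive rule might be evaluated, purely for illustration. This is Python, not ClamAV code: the `Indicator` type, the `archive_rule_matches` helper, and the alert names `A`, `B`, and `C` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    """A weak alert raised for one file inside an archive (hypothetical type)."""
    file_type: str  # e.g. "HTML" or "EXE"
    alert: str      # name of the weak signature that matched


def archive_rule_matches(indicators, required):
    """Return True when every (file_type, alert) pair in `required`
    was seen on at least one contained file."""
    seen = {(i.file_type, i.alert) for i in indicators}
    return all(pair in seen for pair in required)


# "A ZIP that contains HTML that alerts with A, and an EXE that alerts with B"
rule = [("HTML", "A"), ("EXE", "B")]
contained = [Indicator("HTML", "A"), Indicator("EXE", "B"), Indicator("PDF", "C")]
print(archive_rule_matches(contained, rule))
```

No single weak indicator raises an alert here; only the combination listed in the rule does.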

Ged's idea about using Yara's engine in ClamAV directly is something that has been brought up time and again. It is possible. My understanding is that the reason ClamAV's Yara support isn't done this way is that it would require a second pass over the file with Yara's pattern matcher, after ClamAV's pattern matcher, and that this performance concern made it more sensible to try to load Yara rules into ClamAV's matcher instead. I honestly don't have any numbers to back up this argument. It sounds reasonable, but I'd love to see the numbers.

Kris Deugau wrote that he wants embedded comments in signatures, and for ClamAV to ignore empty lines. End-of-line comments probably wouldn't be so bad to add to our current signature language(s). At least for some of the signature types. In-line comments would be more difficult. Ignoring empty lines would be trivial. We'd just have to add it for each of the signature types, which is fine. Honestly I don't know why we haven't done that sooner. We probably couldn't publish anything with empty lines because older versions would choke on it, but at least making it supported for new versions would be nice.

We do really want comments in the new signature format though, so seeing that KDL has 3 different types of comments felt really good.
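
Until something like that exists, annotated signature files can be cleaned up outside ClamAV before they are loaded. The sketch below is one possible preprocessing step, not how ClamAV parses anything. It assumes a comment starts at `#` and runs to the end of the line, and that `#` never occurs inside a real signature; the signature name is made up.

```python
def strip_comments_and_blanks(lines, comment_prefix="#"):
    """Yield only the lines that carry a signature.

    Assumption: a comment starts at `comment_prefix` and runs to the end
    of the line, and the prefix never occurs inside a real signature.
    """
    for line in lines:
        content = line.split(comment_prefix, 1)[0].strip()
        if content:
            yield content


annotated = [
    "# catches the test string used in our lab",
    "",
    "Lab.Test.String:0:*:68656c6c6f  # 'hello' in hex",
    "",
]
print(list(strip_comments_and_blanks(annotated)))
```

The annotated file stays the source of truth for the author, and only the stripped output is deployed.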

Anyways, I've been rambling a bit. I think it's pretty clear that we should work on a new signature language for Clam, whether it's KDL-based or something else. If anyone has any other ideas about it, I'd love to hear them. Like if you have any ideas on a different format, or maybe how the KDL-based one could express dependencies on other signature alerts or whatever.

Cheers,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.

________________________________
From: clamav-users <clamav-users-bounces@lists.clamav.net> on behalf of Laurent S. via clamav-users <clamav-users@lists.clamav.net>
Sent: Friday, March 4, 2022 5:43 AM
To: ClamAV users ML <clamav-users@lists.clamav.net>
Cc: Laurent S. <110ef9e3086d8405c2929e34be5b4340@protonmail.ch>
Subject: Re: [clamav-users] Minor bug or working as intended?

On Wednesday, March 2nd, 2022 at 18:37, Kris Deugau <kdeugau@vianet.ca> wrote:

> Kris Deugau wrote:
>
> > For some types of content, just allowing a plain ASCII string instead of
> > the hex-coded version of the same would be a big help. Or an
> > enhancement in the current file formats allowing embedded comments -
> > I've lost track of how many times I've created something complex, and
> > had to reconstruct whatever logic I used to create it to make a tweak or
> > refinement - or just gave up and created a new signature - because
> > there's no way to document it in-band. Ignoring empty lines -
> > especially at the end of the signature file! - instead of just claiming
> > "invalid signature" would ease editing.

We are using a small GUI where you can create (as strings) and view (as strings) all the hundreds of simple .ndb signatures with a database and automatic expiration. But yes, having a similar sig format where you can just input strings... that could be easier to manage.
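
For illustration, the string-to-hex step that such a tool performs can be sketched in a few lines of Python. It assumes the basic .ndb layout `Name:TargetType:Offset:HexSignature`; the helper name `ascii_to_ndb` and the signature name are hypothetical.

```python
def ascii_to_ndb(name, text, target_type=0, offset="*"):
    """Build a basic .ndb line from a plain ASCII string.

    target_type 0 means "any file" and offset "*" means "any offset".
    """
    hex_signature = text.encode("ascii").hex()
    return f"{name}:{target_type}:{offset}:{hex_signature}"


print(ascii_to_ndb("Lab.Phish.Example", "hello"))
```

This covers only literal strings. Wildcards and the other body-signature features would still have to be written by hand.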

Concerning KDL, I'd really prefer a reliable implementation of YARA for compatibility with other software. There are plenty of YARA rules on the web, and it would be awesome to be able to import them easily.

Best,
Laurent

PS: Sorry for replying late
Re: [clamav-users] human friendly signatures
On Tuesday, March 15th, 2022 at 00:36, Micah Snyder (micasnyd) <micasnyd@cisco.com> wrote:

> Starting with our own new language would let us maintain do that but make it easier for new analysts to train up on ClamAV.

I don't see at all the advantage of using a different, less used language. I don't know many people looking forward to learning a new language that is quite specific to one piece of software and used more or less nowhere else.

One big reason I like to use ClamAV is that it's possible to add other sources of signatures. Lots of people use the sanesecurity ones. I add a lot of my own. I suppose there's a large number of people who would love to add more (e.g. YARA) sources.

Is the goal for KDL to replace all of the existing ClamAV formats? I guess the transition would be a whole lot of effort from a LOT of people.

> What would be every more cool would be to be able to have an archive alert because we found weak indicators in several of the contained files.


I love the idea of weak indicators. But then I'd like a more fine-grained result in case of a hit: something less binary and more like a score, so that each site could choose how many false positives it is willing to tolerate. That way my paranoid customers could be as happy as the ones who hit the roof at the first FP.

Best regards,
Laurent S.
Re: [clamav-users] human friendly signatures
Hi there,

On Tue, 15 Mar 2022, Laurent S. via clamav-users wrote:
> On Tuesday, March 15th, 2022 at 00:36, Micah Snyder wrote:
>
>> Starting with our own new language would let us maintain do that
>> but make it easier for new analysts to train up on ClamAV.
>
> I don't see at all the advantage of using a different, less used
> language. I don't know many people looking forward to learn a new
> language that is quite specific to one software and used more or
> less nowhere else.

Well I can understand that features which are unique to ClamAV might
demand something more flexible than the Yara specification, although I
don't profess to have great insight into that. I wonder if this means
there's a case for "ClamAV *extensions* to the Yara language" or some
variation on that theme. I guess it wouldn't be too difficult to make
the extensions sufficiently non-Yara like to avoid clashes with future
developments of Yara itself. In case it isn't obvious we already have
a "ClamAV *version* of the Yara language" so this suggestion might not
be as outrageous as it seems.

>> using Yara's engine in clamav directly is something that has been
>> brought up time and again. It is possible. My understanding is that
>> the reason ClamAV's yara support isn't done this way is that it
>> would require a second pass over the file with a Yara's pattern
>> matcher, after ClamAV's pattern matcher, and that the performance
>> concern made it make more sense to try and load yara rules into
>> ClamAV's matcher instead.

Speaking selfishly I wouldn't be greatly inconvenienced by an increase
in the scan times (even if it doubles) caused by separating the Yara
engine from the ClamAV engine. That's because I only scan mail, and
the clamd server is well on top of it. I can understand that people
who scan filesystems might have a different point of view; maybe both
could be accommodated with a config option.

>> I honestly don't have any numbers to back up this argument. It
>> sounds reasonable, but I'd love to see the numbers.

I occasionally run more than one clamd instance and I've seriously
considered running a separate one purely so that Yara rules are
kept separate from the rest. I always log scan times. It will be a
bit fiddly, but when I get a minute I'll set something up to try to
give you some numbers.

> One big reason I like to use ClamAV is that it's possible to add
> other sources of signatures. Lots of people use the sanesecurity
> ones. I add a lot of my own.

+1

Finally, unashamed repetition:

(1) a plea for a way to test rules before they go live;

(2) another plea for a parser which is good at its job;

(3) a way to specify that a rule is to match in
(a) mail headers only or
(b) mail body only or
(c) both;

and lastly

(4) it would be great to have a way to reload rulesets separately so
it isn't necessary to reload ten million signatures when you've only
added one Yara rule, only then to find clamd crashes the first time it
tries to scan anything because you broke that rule. I understand this
might be asking a lot, and a decent parser which prevents attempts to
load garbage rules (point 2) would do a lot to alleviate this pain.

--

73,
Ged.

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Re: [clamav-users] human friendly signatures
On Tue, Mar 15, 2022 at 1:53 PM G.W. Haywood via clamav-users <
clamav-users@lists.clamav.net> wrote:

> Hi there,
>
> On Tue, 15 Mar 2022, Laurent S. via clamav-users wrote:
> >> using Yara's engine in clamav directly is something that has been
> >> brought up time and again. It is possible. My understanding is that
> >> the reason ClamAV's yara support isn't done this way is that it
> >> would require a second pass over the file with a Yara's pattern
> >> matcher, after ClamAV's pattern matcher, and that the performance
> >> concern made it make more sense to try and load yara rules into
> >> ClamAV's matcher instead.
>
> Speaking selfishly I wouldn't be greatly inconvenienced by an increase
> in the scan times (even if it doubles) caused by separating the Yara
> engine from the ClamAV engine. That's because I only scan mail, and
> the clamd server is well on top of it. I can understand that people
> who scan filesystems might have a different point of view; maybe both
> could be accommodated with a config option.


Anything that increases scan times would be prohibitive for me. We use
ClamAV to scan around a billion files per day and the primary thing
stopping us from using Yara is the increase in scan times.


> >> I honestly don't have any numbers to back up this argument. It
> >> sounds reasonable, but I'd love to see the numbers.
>
> I occasionally run more than one clamd instance and I've seriously
> considered running a separate one purely so that that Yara rules are
> kept separate from the rest. I always log scan times. It will be a
> bit fiddly, but when I get a minute I'll set something up to try to
> give you some numbers.
>
>
We run multiple clamd instances specifically to load different sets of
signatures for different purposes.

For example, if we have instance 1 with very specific signatures and
instance 2 with more general signatures and instance 3 with ClamAV / 3rd
party signatures, we would first scan against instance 1 and, if we don't
get a match, we then scan against instance 2 and, if still no match,
against instance 3.
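
A rough sketch of that cascade in Python, for anyone who wants to try the same layout. It assumes each instance listens on TCP and can read the file path itself. The helper names, the instance labels, and the fake scanners in the demo are illustrative only; the `nSCAN` command and the `<path>: <name> FOUND` reply are clamd's own protocol.

```python
import socket


def parse_reply(reply):
    """Extract the signature name from a '<path>: <name> FOUND' reply.

    Returns None for anything else; this sketch treats OK and ERROR alike.
    """
    reply = reply.strip()
    if reply.endswith(" FOUND"):
        return reply[: -len(" FOUND")].rsplit(": ", 1)[1]
    return None


def clamd_scan(host, port, path, timeout=30.0):
    """Ask one clamd instance to scan `path` using the newline-delimited SCAN command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(f"nSCAN {path}\n".encode())
        reply = sock.makefile("r", encoding="utf-8", errors="replace").readline()
    return parse_reply(reply)


def scan_cascade(path, scanners):
    """Try each (label, scan_function) in order and stop at the first match."""
    for label, scan in scanners:
        match = scan(path)
        if match is not None:
            return label, match
    return None


# Demo with stand-in scanners so the sketch runs without any clamd instance.
fake_scanners = [
    ("specific", lambda p: None),
    ("general", lambda p: "Local.General.Rule"),
    ("third-party", lambda p: "Never.Reached"),
]
print(scan_cascade("/tmp/sample", fake_scanners))
```

In real use each entry would wrap `clamd_scan` with that instance's host and port, for example with `functools.partial`.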


> > One big reason I like to use ClamAV is that it's possible to add
> > other sources of signatures. Lots of people use the sanesecurity
> > ones. I add a lot of my own.
>
> +1
>
>
For us, the attraction is the ease of creating our own signatures more than
the 3rd-party signatures, though 3rd-party signatures are a definite plus.


> Finally, unashamed repetition:
>
> (1) a plea for a way to test rules before they go live;
>

This is relatively straightforward to do on your own (save the signatures
in a temp location, create a file with something that you know will match,
and scan to make sure it is detected), so the fact that it's not built-in
is a bit confusing.
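
Those steps can be scripted. The sketch below is one way to do it in Python, assuming `clamscan` is on the PATH; the function names are made up for the example. It relies only on the `--database` and `--no-summary` options and on clamscan's exit codes (0 for no match, 1 for a match, 2 for an error).

```python
import subprocess


def build_command(sig_file, sample):
    """clamscan invocation that loads only `sig_file` and scans `sample`."""
    return ["clamscan", f"--database={sig_file}", "--no-summary", sample]


def interpret(returncode):
    """Map clamscan's exit code to a test verdict."""
    return {0: "no match", 1: "match"}.get(returncode, "error")


def rule_detects(sig_file, sample):
    """Return True if the candidate signature file flags the known-bad sample."""
    result = subprocess.run(build_command(sig_file, sample),
                            capture_output=True, text=True)
    verdict = interpret(result.returncode)
    if verdict == "error":
        # A rule that fails to load ends up here rather than in production.
        raise RuntimeError(result.stderr.strip() or "clamscan failed")
    return verdict == "match"


print(build_command("/tmp/candidate.ldb", "/tmp/known-bad.eml"))
```

Running the same check against a known-clean sample as well gives a basic guard against false positives.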


> (2) another plea for a parser which is good at its job;
>
> (3) a way to specify that a rule is to match in
> (a) mail headers only or
> (b) mail body only or
> (c) both;
>

This would be awesome for mail, but also for any file that has
differentiated parts. It would be great to have a better macro style that
would allow you to combine multiple signatures to produce a different
classification (sort of like logical signatures, but with the ability for
each sub-signature to hit different filetypes).

> and lastly
>
> (4) it would be great to have a way to reload rulesets separately so
> it isn't necessary to reload ten million signatures when you've only
> added one Yara rule, only then to find clamd crashes the first time it
> tries to scan anything because you broke that rule. I understand this
> might be asking a lot, and a decent parser which prevents attempts to
> load garbage rules (point 2) would do a lot to alleviate this pain.
>

100% this. Having the ability to load a diff rather than the complete
database would be an enormous boon.

--Maarten
Re: [clamav-users] human friendly signatures
The goal for the new sig format would be to include all the existing signature features currently spread across the existing ClamAV-specific signature file formats.
Right now we have different file formats for:

* NDB
* LDB
* CDB
* FTM
* CRB
* CFG
* PDB, WDB, HDB, HSB, MDB, MSB, FP, SFP, IGN2, and PWDB

These are multiple file formats that are hard to read, hard to write, and hard to extend. We would like to consolidate them down into one format that is easier for both the signature authors and the developers.
We want to make a sigtool feature that can transcode from the old to the new, though we have no plans to remove support for the old signature formats. We might say they're deprecated to encourage folks to develop new content in the new format, but they would continue to work for the foreseeable future.
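
As a rough idea of what the first step of such a transcoder might look like, here is a Python sketch that splits one logical signature into named fields. It is not sigtool code. It assumes the documented LDB layout `Name;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;...`, ignores offsets and modifiers inside subsignatures, and uses a made-up signature.

```python
def parse_ldb(line):
    """Split one logical signature into named fields.

    Assumes every target description item has the form Key:Value and that
    no subsignature contains an unescaped ';'.
    """
    name, tdb, expression, *subsigs = line.strip().split(";")
    target = dict(item.split(":", 1) for item in tdb.split(",") if item)
    return {
        "name": name,
        "target": target,
        "expression": expression,
        "subsignatures": subsigs,
    }


example = "Lab.Test.Rule;Engine:81-255,Target:0;(0&1);68656c6c6f;776f726c64"
for field, value in parse_ldb(example).items():
    print(f"{field}: {value}")
```

Once the fields have names, writing them back out in whatever new format is chosen becomes a separate and much simpler problem.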

New signature features would only be added to the new signature format.

The goal is not to do away with Yara rule support. We will continue to try to maintain the existing (limited) Yara rule support, and are still open to improving it.



Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.

Re: [clamav-users] human friendly signatures
Augh! Some hot-key combination just sent my email draft! Sorry! I was working on a list of the different distinct file formats we currently have, none of which are very easy to read.
I'm hoping to illustrate that if we can consolidate this down to something user-friendly it will be a big improvement.

Basing the file structure on the KDL language is just my initial proposal. My teammate Scott is brainstorming some other ideas. We have yet to make any hard decisions.

I agree with you about some sort of scoring. Some signatures might never indicate maliciousness and be very-weak indicators. Some might be very strong indicators; e.g. hash-based sigs for ransomware. I too would like to add some different levels. I don't know if a number-based scoring system makes sense, or if just a handful of different categories is sufficient. More research needed.
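
To make the two options concrete, here is a small Python sketch in which a handful of categories map onto numbers and each site picks its own threshold. The category names, weights, and thresholds are invented for the example and do not reflect any decided design.

```python
# Hypothetical category weights; nothing here reflects a decided design.
WEIGHTS = {"very-weak": 1, "weak": 3, "strong": 10}


def total_score(hits):
    """Sum the weights of the categories of all signatures that matched."""
    return sum(WEIGHTS[category] for category in hits)


def verdict(hits, threshold):
    """Alert only when the combined score reaches the site's threshold."""
    return "alert" if total_score(hits) >= threshold else "pass"


hits = ["weak", "weak", "very-weak"]  # combined score of 7
print(verdict(hits, threshold=5))     # a cautious site
print(verdict(hits, threshold=10))    # a site that cannot tolerate false positives
```

With this shape, signature authors only choose a category, while the numeric threshold stays a local configuration decision.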

Regards,
Micah



Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
Re: [clamav-users] human friendly signatures
> Well I can understand that features which are unique to ClamAV might
> demand something more flexible than the Yara specification, although I
> don't profess to have great insight into that. I wonder if this means
> there's a case for "ClamAV *extensions* to the Yara language" or some
> variation on that theme. I guess it wouldn't be too difficult to make
> the extensions sufficiently non-Yara like to avoid clashes with future
> developments of Yara itself. In case it isn't obvious we already have
> a "ClamAV *version* of the Yara language" so this suggestion might not
> be as outrageous as it seems.

The status quo is a subset of what's possible with Yara. I think that's vastly different from adding features that won't make any sense to real Yara. That said, ClamAV extensions to the Yara language isn't a terrible idea. It's an idea my boss kicked around a bit when we were chatting last week. Some context: He started the Yara signature support effort in ClamAV. And he has friends in the Yara community.
I don't personally think that Yara + ClamAV extensions will be sufficient for all the different features we'll need. But I don't really have a vision in mind for how that would look. I'd be happy to be proven wrong with some proof-of-concept work demonstrating each of the different features we have now.

> (1) a plea for a way to test rules before they go live;

If you mean "for personal use" then I'd say, "What Maarten said." But if you mean so that Cisco-Talos malware analysts can do more extensive testing with something like "hunting signatures" before publishing them as "malware signatures", then the answer is different. I'm probably not the best person to discuss what's in the works there. I'll leave that question open to my colleagues on the malware research side.

> (2) another plea for a parser which is good at its job;

I'm not sure what you mean here. Can you elaborate? If you simply want ClamAV to ignore garbage rules on load and continue with the rest of the file (see point #4), that's something we can easily improve regardless of what we do. And that's how our Yara rule loading logic works right now.

> (3) a way to specify that a rule is to match in
> (a) mail headers only or
> (b) mail body only or
> (c) both;

This is a neat idea. It is a new signature language feature request and is a great example of something that would be hard to implement in the current clamav signature language(s). If you have any ideas on how this may be expressed either in the "clamav yara extensions" idea or in the proposed "KDL-based signature language" or some other proposed format, I'd love some examples.

> (4) it would be great to have a way to reload rulesets separately so
> it isn't necessary to reload ten million signatures when you've only
> added one Yara rule, only then to find clamd crashes the first time it
> tries to scan anything because you broke that rule. I understand this
> might be asking a lot, and a decent parser which prevents attempts to
> load garbage rules (point 2) would do a lot to alleviate this pain.

Asking Clam to load additional rules into an existing engine while scans are ongoing is tricky, but potentially doable. It throws a wrench into the works for some hardening ideas I'm proposing for scan process sandboxing. Sort of. It's more to think about.

Asking Clam to unload specific rules from an active scanning engine has the same problem plus considerations about how to drop stuff from the trie structures without breaking anything. It's also potentially doable.

Asking Clam to reload a modified signature database in-place is a different story. Let's say you have a ClamD running that had database version A. You modify the database file so now it's version B and version A is gone. And you want ClamD to look at version B and figure out which signatures have been added/removed/changed and update accordingly. I don't think that's something we can do. When signatures are loaded, we store bits of patterns in a variety of structures like tries, lists, hashmaps, etc. Figuring out which bits to remove and which to keep would be a bit of a nightmare. I would imagine that you would have to build a reference as big as the loaded engine while doing it just to sort it all out.
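
For contrast, the file-level half of that problem is simple as long as both versions are still available as text. The Python sketch below is only an illustration with made-up signature lines. It says nothing about updating the tries and other in-memory structures, and it needs version A to be kept somewhere, which is exactly the extra reference copy described above.

```python
def diff_signatures(old_lines, new_lines):
    """Return (added, removed) signature lines between two database versions.

    A changed signature shows up as one removed line plus one added line.
    """
    old = {line.strip() for line in old_lines if line.strip()}
    new = {line.strip() for line in new_lines if line.strip()}
    return sorted(new - old), sorted(old - new)


version_a = ["Sig.One:0:*:aa11", "Sig.Two:0:*:bb22"]
version_b = ["Sig.One:0:*:aa11", "Sig.Two:0:*:bb33", "Sig.Three:0:*:cc44"]

added, removed = diff_signatures(version_a, version_b)
print("added:", added)
print("removed:", removed)
```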

Regards,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.

Re: [clamav-users] human friendly signatures
On 16 March 2022 20:29:19 "Micah Snyder (micasnyd) via clamav-users"
<clamav-users@lists.clamav.net> wrote:
> yara rule loading logic works right now.
>
>
>> (3) a way to specify that a rule is to match in
>> (a) mail headers only or
>> (b) mail body only or
>> (c) both;
Just a random early thought... could .ldb be extended... by reading the
whole message and processing as normal... but if it's a header line mark it
as h, body with a b...


So if the ldb could be extended with h/b... you could still use the normal
ldb logic...


Test;Engine:81-255,Target:0;(h0&b0=0);hex;hex


Test;Engine:81-255,Target:0;(b0);

h=headers only line
b=body only line

So h0 hex will only match if it's a header line
So b0 hex will only match if it's a body line
Sorry for the formatting.. on mobile.
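
The header/body tagging idea can be illustrated outside ClamAV with a few lines of Python. This is only a sketch of the concept, not a proposal for the engine. It assumes the headers end at the first blank line, and the `where` values `h`, `b`, and `hb` (either part) are invented for the example.

```python
def split_message(raw):
    """Split a raw mail message at the first blank line into (headers, body)."""
    normalized = raw.replace("\r\n", "\n")
    headers, _, body = normalized.partition("\n\n")
    return headers, body


def matches(raw, pattern, where):
    """Look for `pattern` in the headers ('h'), the body ('b'), or either ('hb')."""
    headers, body = split_message(raw)
    if where == "h":
        return pattern in headers
    if where == "b":
        return pattern in body
    if where == "hb":
        return pattern in headers or pattern in body
    raise ValueError(f"unknown scope: {where!r}")


message = (
    "Subject: payroll update\r\n"
    "From: someone@example.com\r\n"
    "\r\n"
    "Please update your direct deposit details.\r\n"
)
print(matches(message, "payroll", "h"))
print(matches(message, "payroll", "b"))
print(matches(message, "direct deposit", "b"))
```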


Cheers,

Steve
Twitter: @sanesecurity
Re: [clamav-users] human friendly signatures
Steve,

I like the idea, but why the hex; hex?
Just thinking about my recent issues with direct deposit phishing emails from gmail.com: they are probably written by people, so I can’t really hash them and have to regex them instead.

> On Mar 16, 2022, at 5:10 PM, Steve Basford <steveb_clamav@sanesecurity.com> wrote:
>
> On 16 March 2022 20:29:19 "Micah Snyder \(micasnyd\) via clamav-users" <clamav-users@lists.clamav.net <mailto:clamav-users@lists.clamav.net>> wrote:
>
>> yara rule loading logic works right now.
>>
>> > (3) a way to specify that a rule is to match in
>> > (a) mail headers only or
>> > (b) mail body only or
>> > (c) both;
>>
>>
>
> Just a random early thought... could .ldb be extended... by reading the whole message processing as normal... but if its a header line mark as h, body with a b...
>
> So if the ldb could be extended with h/b... you could still use the normal ldb logic...
>
> Test;Engine:81-255,Target:0;(h0&b0=0);hex;hex
>
> Test;Engine:81-255,Target:0;(b0);
>
> h=headers only line
> b=body only line
>
> So h0 hex will only match if it's a header line
> So b0 hex will only match if it's a body line
> Sorry for the formatting.. on mobile.
>
> Cheers,
>
> Steve
> Twitter: @sanesecurity
>
Re: [clamav-users] human friendly signatures [ In reply to ]
On 16 March 2022 22:16:05 Eric Tykwinski <eric-list@truenet.com> wrote:
> Steve,
>
> I like the idea, but why the hex; hex?
> Just thinking about my recent issues with direct deposit phishing emails
> from gmail.com and they are written probably by people, so I can’t really
> hash it, and have to regex it.





Cheers,

Steve
Twitter: @sanesecurity
Re: [clamav-users] human friendly signatures [ In reply to ]
On 16 March 2022 22:16:05 Eric Tykwinski <eric-list@truenet.com> wrote:
> Steve,
>
> I like the idea, but why the hex; hex?

Sorry, should have been clearer... not just hex but....

Test;Engine:81-255,Target:0;(b0&h1);0f0f0f*0b0b0b;0/blah*(?:[4-7]|[8003]\d)/
etc...

> Just thinking about my recent issues with direct deposit phishing
> emails from gmail.com and they are written probably by people, so I can’t
> really hash it, and have to regex it.


Cheers,

Steve
Twitter: @sanesecurity
Re: [clamav-users] human friendly signatures [ In reply to ]
Hi Micah,

On Wed, 16 Mar 2022, Micah Snyder (micasnyd) wrote:

>> (1) a plea for a way to test rules before they go live;
>
> If you mean "for personal use" then I'd say, "What Maarten said."

Er, no. Not "scan to make sure it detects things". What I meant was
"do something to make sure it won't e.g. crash clamd when it tries to
scan something after this rule has been loaded" - but see (2) below.

>> (2) another plea for a parser which is good at its job;
>
> I'm not sure what you mean here. Can you elaborate? If you simply
> want ClamAV to ignore garbage rules on load and continue with the rest
> of the file (see point #4) - that's something we can easily improve
> regardless of what we do. And that's how our yara rule loading logic
> works right now.

I strongly feel that if it finds a problem, rather than silently load
some sub-optimal ruleset the parser should abandon the reload of the
entire ruleset. Obviously it should warn when it does that. I guess
this might be an issue if it's running on a machine with too little
RAM to reload while simultaneously scanning with the previous ruleset,
but something like a --test-ruleset option could probably handle that.
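
To make the all-or-nothing policy concrete, here is a minimal Python sketch. It is not how clamd loads rules: `parse_rule` is a toy stand-in parser for a made-up "name = pattern" format, and `load_all_or_nothing` is a hypothetical name. The point is only that the previous ruleset stays in place unless every rule in the new one parses.

```python
def parse_rule(text: str):
    """Toy parser for a made-up 'name = pattern' format; raises ValueError if malformed."""
    name, sep, pattern = text.partition("=")
    if not sep or not name.strip() or not pattern.strip():
        raise ValueError(f"malformed rule: {text!r}")
    return name.strip(), pattern.strip()

def load_all_or_nothing(lines, current):
    """Return (ruleset, errors). Any error means `current` is returned untouched."""
    new, errors = {}, []
    for number, line in enumerate(lines, 1):
        try:
            name, pattern = parse_rule(line)
            new[name] = pattern
        except ValueError as exc:
            errors.append(f"line {number}: {exc}")
    if errors:
        return current, errors  # abandon the reload and report why
    return new, []

current = {"old_rule": "deadbeef"}
ruleset, errors = load_all_or_nothing(["a = 0f0f", "broken"], current)
print(ruleset is current, errors)
```

A --test-ruleset style option, as suggested above, would amount to running the same parse and reporting the errors without ever swapping the ruleset in.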

The following is from something I was doing back in June 2021, so it's
before your pull request of 2021.08.21:

https://github.com/Cisco-Talos/clamav/pull/261

and I haven't retested, but these are the sorts of things that were
driving me crazy around the middle of last year:

8<----------------------------------------------------------------------
$ diff -U3 Garbage_Rules.yar.~140~.NBG Garbage_Rules.yar.~141~.OK
--- Garbage_Rules.yar.~140~.NBG 2021-06-13 08:05:54.218256634 +0100
+++ Garbage_Rules.yar.~141~.OK 2021-06-13 08:08:11.025783287 +0100
@@ -30,7 +30,7 @@
     strings:
         $ = /update from GOV.{1,10}UK/ ascii nocase
     condition:
-        any of them and not Blacklist_1
+        not Blacklist_1 and any of them
 }
8<----------------------------------------------------------------------
$ cat does_not_notice_missing_curly_brace
private rule Email_marketing
{
    strings:
        $ = "email marketing" ascii nocase // Testing
    condition:
        any of them
}

// Test private rule
rule Garbage_spam_testing_Rule
    strings:
        $TLD_4_to_20_chars = /htt(p|ps):\/\/[-a-z0-9]{3,50}\.[a-z]{4,20}\/./ ascii nocase
        $ = "email marketing" ascii nocase
    condition:
        all of them
}
8<----------------------------------------------------------------------
$ cat does_not_notice_missing_dollar_symbol
--- Garbage_Rules.yar.~297~ 2021-07-30 14:43:26.540758502 +0100
+++ Garbage_Rules.yar 2021-07-30 14:46:30.277470587 +0100
@@ -29,7 +29,7 @@
 rule test_single_string
 {
     strings:
-        = /cc.{1,3}abuse@jubileegroup.co.uk/ ascii nocase
+        $ = /cc.{1,3}abuse@jubileegroup.co.uk/ ascii nocase
     condition:
         any of them
 }
8<----------------------------------------------------------------------
$ cat five_more_yara_bugs
See Garbage rules of late June to early July 2021.

1. It doesn't notice if you have more than one string with the same name.

2. If you have a string with a name that isn't referenced in the condition, it crashes.

3. It crashes if you mistakenly write (see 199-200) something like
condition:
Spam_trap and ( any of ($spammer_*) or any of ($warning_*) or (#publish_* > 4) )

4. It crashes if you mistakenly write something like .*{range} (for example see 200-201)
$ = /we.*{1,50}(sell|sale)/ ascii nocase
which should be
$ = /we.{1,50}(sell|sale)/ ascii nocase

5. If you want to match "Alfreton, Derbyshire" the string "Alfreton, Derbyshire" *does*
match if you use the form

$ = "Alfreton, Derbyshire" ascii nocase

but it does *not* match if you use the form

$ = "Alfreton, Derbyshire" ascii
8<----------------------------------------------------------------------
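
Several of the mistakes listed above are easy to catch with a pre-flight check before a rule file ever reaches clamd. The following is a rough Python sketch of such a lint, not something ClamAV or Yara provides; `lint_yara` and the regexes are assumptions made for illustration. It looks for a rule with no opening brace, a string definition missing its '$', a repeated string name, and a quantifier stacked on a quantifier. It is line-based and crude, so it will produce false positives on things like an escaped `\*{2}`.

```python
import re

RULE_DECL = re.compile(r"^(?:private\s+|global\s+)*rule\s+(\w+)")
STRING_DEF = re.compile(r"^(\$\w*)\s*=")
BARE_ASSIGN = re.compile(r"^=")
STACKED_QUANT = re.compile(r"[*+?]\{\d")

def lint_yara(source: str):
    """Return a list of (line_number, message) for a few easy-to-miss mistakes."""
    problems = []
    lines = source.splitlines()
    names = set()
    for idx, line in enumerate(lines):
        num = idx + 1
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        decl = RULE_DECL.match(stripped)
        if decl:
            names = set()  # string names are scoped to a single rule
            # Next non-blank, non-comment line after the declaration.
            rest = [l.strip() for l in lines[idx + 1:]
                    if l.strip() and not l.strip().startswith("//")]
            if "{" not in stripped and not (rest and rest[0].startswith("{")):
                problems.append((num, f"rule {decl.group(1)}: missing '{{'"))
            continue
        if BARE_ASSIGN.match(stripped):
            problems.append((num, "string definition is missing '$'"))
        m = STRING_DEF.match(stripped)
        if m:
            name = m.group(1)
            if name != "$" and name in names:  # anonymous '$' may repeat
                problems.append((num, f"duplicate string name {name}"))
            names.add(name)
        if STACKED_QUANT.search(stripped):
            problems.append((num, "quantifier applied to a quantifier, e.g. '.*{1,50}'"))
    return problems

sample = 'rule demo\nstrings:\n$ = /we.*{1,50}(sell|sale)/ ascii nocase\ncondition:\nany of them\n}'
for number, message in lint_yara(sample):
    print(number, message)
```

A check like this does not replace a parser that rejects bad input, but it can sit in front of one.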

See also e.g.

https://bugzilla.clamav.net/show_bug.cgi?id=12095

While I was looking at this I also came upon another quirk that can be
a bit of a nuisance. AFAICT Yara strings can only be delimited by one
of two characters, either a double-quote (for a literal string) or a
forward-slash (for a regex). It would help to be able to choose the
quote character like in Perl; if not, at least having more available
to choose from could make many expressions more readable, especially
those which target e.g. HTML and links in mail (both of which tend to
have many occurrences of double-quote or forward-slash characters).

>> (3) a way to specify that a rule is to match in
>> (a) mail headers only or
>> (b) mail body only or
>> (c) both;
>
> This is a neat idea. It is a new signature language feature request
> and is a great example of something that would be hard to implement
> in the current clamav signature language(s). If you have any ideas
> on how this may be expressed either in the "clamav yara extensions"
> idea or in the proposed "KDL-based signature language" or some other
> proposed format, I'd love some examples.

In Yara, something like

rule only_match_RFQ_if_found_in_subject_header
{
strings:
$a = /^Subject:\s*RFQ.*$/
condition:
mail_header and any of them
}

should do it. The 'mail_header' condition would mean "We're scanning
an RFC5322 mail message and additionally this match looks only at the
bit before the first blank line in the message text" - what in RFC6522
is called the "Full Header Section" (section 2, page 7). Of course a
'mail_body' condition would mean "... and additionally this match only
considers the bit *after* the first blank line in the message text" or
in the RFC6522 definition, the "Full Body". It's very simple to split
a mail message into these two parts. The delimiter is just the first
blank line in the text. The rule wouldn't match at all if we aren't
scanning something which ClamAV has decided is a mail message.
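
As a rough illustration of how little is needed for the split, here is a Python sketch. The `mail_header` and `mail_body` names are the hypothetical conditions proposed above, `is_mail` stands in for ClamAV's own decision that the input is a mail message, and `split_message` and `rule_matches` are made-up names. Folded headers and MIME parts are ignored.

```python
import re

# The regex from the example rule above; MULTILINE lets ^ and $ match per line.
SUBJECT_RFQ = re.compile(r"^Subject:\s*RFQ.*$", re.MULTILINE)

def split_message(raw: str):
    """Split a mail message at the first blank line into (header_section, body)."""
    text = raw.replace("\r\n", "\n")
    header, _, body = text.partition("\n\n")
    return header, body

def rule_matches(raw: str, is_mail: bool, scope: str) -> bool:
    """scope is 'mail_header' or 'mail_body'; never matches on non-mail input."""
    if not is_mail:
        return False
    header, body = split_message(raw)
    target = header if scope == "mail_header" else body
    return SUBJECT_RFQ.search(target) is not None

raw = "From: a@example.com\r\nSubject: RFQ 42\r\n\r\nHello\r\n"
print(rule_matches(raw, True, "mail_header"))
print(rule_matches(raw, True, "mail_body"))
```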

>> (4) it would be great to have a way to reload rulesets separately so
>> it isn't necessary to reload ten million signatures when you've only
>> added one Yara rule, only then to find clamd crashes the first time it
>> tries to scan anything because you broke that rule. I understand this
>> might be asking a lot ...
>
> Asking Clam to load additional rules to an existing engine while
> scans are ongoing is tricky, but potentially??? doable. It throws a
> wrench into the works for some hardening ideas I'm proposing ...

As I said, I understand it might be asking a lot. I hadn't considered
that it might militate against hardening, and I think hardening should
take priority.

--

73,
Ged.

Re: [clamav-users] human friendly signatures [ In reply to ]
G.W. Haywood via clamav-users wrote:
> Hi Micah,
>
> On Wed, 16 Mar 2022, Micah Snyder (micasnyd) wrote:
>> I'm not sure what you mean here.  Can you elaborate?  If you simply
>> want ClamAV to ignore garbage rules on load and continue with the rest
>> of the file (see point #4) - that's something we can easily improve
>> regardless of what we do. And that's how our yara rule loading logic
>> works right now.
>
> I strongly feel that if it finds a problem, rather than silently load
> some sub-optimal ruleset the parser should abandon the reload of the
> entire ruleset.  Obviously it should warn when it does that.  I guess
> this might be an issue if it's running on a machine with too little
> RAM to reload while simultaneously scanning with the previous ruleset,
> but something like a --test-ruleset option could probably handle that.

TBH I'd prefer if Clam *did* continue, just skipping malformed rules
(and also whinging loudly in the log).

Either would be better than just exiting (it's not a hard *crash*, it's
"just" refusing to load a file with a malformed signature - including
things like entirely blank lines).
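
For comparison, the skip-and-complain policy might look like the following Python sketch. The 'Name;Pattern' format and `parse_signature` are toy stand-ins rather than any real ClamAV signature format, and `load_skipping_malformed` is a hypothetical name. Blank lines are treated as malformed, as described above, but they are logged and skipped instead of stopping the load.

```python
import logging

log = logging.getLogger("ruleset")

def parse_signature(line: str):
    """Toy parser for 'Name;Pattern'; raises ValueError otherwise, blank lines included."""
    name, sep, pattern = line.partition(";")
    if not sep or not name or not pattern:
        raise ValueError("expected 'Name;Pattern'")
    return name, pattern

def load_skipping_malformed(lines):
    """Load every well-formed signature; log and skip the rest instead of exiting."""
    loaded, skipped = [], []
    for number, line in enumerate(lines, 1):
        try:
            loaded.append(parse_signature(line))
        except ValueError as exc:
            log.warning("skipping line %d (%r): %s", number, line, exc)
            skipped.append(number)
    return loaded, skipped

loaded, skipped = load_skipping_malformed(["Good.A;aabb", "", "no separator", "Good.B;ccdd"])
print(loaded)
print(skipped)
```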


> While I was looking at this I also came upon another quirk that can be
> a bit of a nuisance.  AFAICT Yara strings can only be delimited by one
> of two characters, either a double-quote (for a literal string) or a
> forward-slash (for a regex).  It would help to be able to choose the
> quote character like in Perl; if not, at least having more available
> to choose from could make many expressions more readable, especially
> those which target e.g. HTML and links in mail (both of which tend to
> have many occurrences of double-quote or forward-slash characters).

Strictly speaking, four characters (the {} delimiters for hex strings).
To my reading this is part of the upstream Yara spec, and I'd be wary of
extending this particular bit without at least requiring some blatant,
obvious flag in any such rule to clearly indicate that it's not stock
Yara syntax.

-kgd

Re: [clamav-users] human friendly signatures [ In reply to ]
Hi there,

On Mon, 21 Mar 2022, Kris Deugau wrote:

> TBH I'd prefer if Clam *did* continue, just skipping malformed rules
> (and also whinging loudly in the log).

I could live with that if it didn't *also* crash.

> Either would be better than just exiting (it's not a hard *crash*,
> it's "just" refusing to load a file with a malformed signature -
> including things like entirely blank lines).

No, Kris. It *is* a hard crash - and it doesn't happen when it loads
the rules, it happens when it tries to scan something *after* loading
a Yara file which contains a bad rule. Not necessarily any bad rule,
just one with any of a number of different kinds of badness which I've
found to be problematic. But as I said in my mail things may well be
different as a result of Micah's August PR. TBH I really haven't been
inclined for quite some time to crash clamd on purpose. :)

> Strictly speaking, four characters (the {} delimiters for hex
> strings). To my reading this is part of the upstream Yara spec, and
> I'd be wary of extending this particular bit without at least
> requiring some blatant, obvious flag in any such rule to clearly
> indicate that it's not stock Yara syntax.

Agreed it needs some thought. Maybe a different filename extension?
Not that I'm a great fan of systems which rely on filename extensions
to control the behaviour of executables. Or maybe persuade the folks
upstream to make some enhancements? That would be best, I think, but
it presupposes that the ClamAV Yara engine catches up - which IMHO is
a necessity in any case.

--

73,
Ged.

Re: [clamav-users] human friendly signatures [ In reply to ]
G.W. Haywood via clamav-users wrote:
> Hi there,
>
> On Mon, 21 Mar 2022, Kris Deugau wrote:
>
>> TBH I'd prefer if Clam *did* continue, just skipping malformed rules
>> (and also whinging loudly in the log).
>
> I could live with that if it didn't *also* crash.
>
>> Either would be better than just exiting (it's not a hard *crash*,
>> it's "just" refusing to load a file with a malformed signature -
>> including things like entirely blank lines).
>
> No, Kris.  It *is* a hard crash - and it doesn't happen when it loads
> the rules, it happens when it tries to scan something *after* loading
> a Yara file which contains a bad rule.  Not necessarily any bad rule,
> just one with any of a number of different kinds of badness which I've
> found to be problematic.  But as I said in my mail things may well be
> different as a result of Micah's August PR.  TBH I really haven't been
> inclined for quite some time to crash clamd on purpose. :)

Sorry, didn't see that, figured you were talking about the joy of
finding all those subtle little rules defining a well-formed signature.
To date I haven't managed to trip whatever bug(s) bit you, although I
*have* found relatively simple signatures that should have matched but
didn't.

I *have* pushed out "malformed" "signatures" (AKA "signature files with
a blank line or two at the end") that caused the production clamd
instances to shut down... after which I spent some time adding
validation to the SVN commit hook, and writing a local editing wrapper
to help make sure signatures were valid before committing.
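
The kind of check such a hook could run is sketched below in Python. This is not the actual hook described above, only an illustration; `blank_line_numbers` and `check_signature_text` are hypothetical names, and the only rule checked is "no blank or whitespace-only lines".

```python
def blank_line_numbers(text: str):
    """Return 1-based numbers of lines that are empty or whitespace-only."""
    return [n for n, line in enumerate(text.splitlines(), 1) if line.strip() == ""]

def check_signature_text(text: str) -> int:
    """Return 0 if the text is acceptable, 1 otherwise (usable as a hook exit status)."""
    bad = blank_line_numbers(text)
    for n in bad:
        print(f"line {n}: blank line in signature file")
    return 1 if bad else 0

print(check_signature_text("Sig.A;aabb\nSig.B;ccdd\n"))      # single trailing newline is fine
print(check_signature_text("Sig.A;aabb\n\nSig.B;ccdd\n\n"))  # blank lines are reported
```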

-kgd

Re: [clamav-users] human friendly signatures [ In reply to ]
Hi there,

This is a more or less random data point.

On Mon, 14 Mar 2022, Micah Snyder (micasnyd) via clamav-users wrote:

> Sorry that this response come so late that is nearly a necro-thread. ...

Er, ditto.

> ... If anyone has any other ideas about it, I'd love to hear them. ...

One thing has become much more obvious lately here and I felt the need
to get it written down somewhere.

We're seeing a lot more spam than ever we used to which is written in
CJKV (Chinese, Japanese, Korean, Vietnamese) using UTF-8 encoding.
It's mostly phishing of some sort.

We use UTF-8 text strings in Yara rules to catch a lot of this spam
for our automatic abuse reporting system.

Obviously to make things human friendly it helps a lot if the terminal
emulators, editors and other tools can render the text as appropriate,
but my point is that, however you manipulate Yara rules for ClamAV, as
things are they work fine for this purpose and I'd really hate to lose
that capability.
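
For anyone wondering why plain UTF-8 text strings work for this, a small Python sketch: a UTF-8 string is just a byte sequence, so a byte-wise search for it succeeds in any message encoded the same way. This says nothing about ClamAV's internals, `utf8_hex` is a made-up helper, and the two-character sample is arbitrary.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of `text` as a lowercase hex string."""
    return text.encode("utf-8").hex()

# Two CJK characters, written as escapes so this source stays ASCII.
needle = "\u65e5\u672c"
message = ("Subject: " + needle + " notice\n\nbody text\n").encode("utf-8")

print(utf8_hex(needle))                    # e697a5e69cac
print(needle.encode("utf-8") in message)   # byte-wise search finds the text
```

The hex form is the same byte sequence that a hex-style signature would have to spell out by hand, which is why keeping text strings usable matters for readability.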

--

73,
Ged.
_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/Cisco-Talos/clamav-documentation

https://docs.clamav.net/#mailing-lists-and-chat