Mailing List Archive

[clamav-users] Minor bug or working as intended?
After chasing docs back and forth and trying small variations, I think
I've found what's arguably a bug in Clam's YARA implementation.

These two YARA rules should match exactly the same content, but they
don't. The first only matches if the condition is relaxed to require a
single match (either "#a > 0" or just "$a" works).

rule data1 {
strings:
$a = /<script type="text\/javascript">functionsendemail.?\(\)\{/
condition:
#a > 3
}

rule data2 {
strings:
$a = { 3c 73 63 72 69 70 74 20 74 79 70 65 3d 22 74 65 78 74 2f 6a
61 76 61 73 63 72 69 70 74 22 3e 66 75 6e 63 74 69 6f 6e 73 65 6e 64 65
6d 61 69 6c [0-1] 28 29 7b }
condition:
#a > 3
}

While chasing this back and forth I confirmed that simple text string
repetition also works fine. I also confirmed that individual regex
matches on each observed character variation in the sample file also
worked, including when bundled into a single rule with a condition of
"all of them", so it's not that it couldn't match any particular
expected instance of the string.
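
For anyone who wants to sanity-check the expectation outside ClamAV, here is a minimal Python sketch. It is not ClamAV or YARA code: it only counts non-overlapping matches of both patterns, as an approximation of what "#a" reports. The sample bytes and the helper name count_matches are invented for illustration, and the hex jump [0-1] is approximated as "zero or one byte of any value".

```python
import re

# Regex from rule data1, written as a Python bytes pattern.
REGEX_A = re.compile(rb'<script type="text/javascript">functionsendemail.?\(\)\{')

# Hex string from rule data2, split around the [0-1] jump.
HEX_HEAD = bytes.fromhex(
    "3c 73 63 72 69 70 74 20 74 79 70 65 3d 22 74 65 78 74 2f 6a "
    "61 76 61 73 63 72 69 70 74 22 3e 66 75 6e 63 74 69 6f 6e 73 65 6e 64 65 "
    "6d 61 69 6c"
)
HEX_TAIL = bytes.fromhex("28 29 7b")

# [0-1] becomes "zero or one byte of any value"; [\s\S] also matches newlines.
HEX_A = re.compile(re.escape(HEX_HEAD) + rb"[\s\S]?" + re.escape(HEX_TAIL))


def count_matches(pattern, data):
    """Count non-overlapping matches (an approximation of YARA's #a)."""
    return sum(1 for _ in pattern.finditer(data))


# Invented sample: four occurrences, two with a byte before "()" and two without.
SAMPLE = (
    b'<script type="text/javascript">functionsendemail (){ }</script>\n'
    b'<script type="text/javascript">functionsendemail(){ }</script>\n'
) * 2

for name, pattern in (("data1", REGEX_A), ("data2", HEX_A)):
    n = count_matches(pattern, SAMPLE)
    print(name, n, n > 3)
```

Both patterns find four matches in this sample, so under these assumptions "#a > 3" would be expected to hold for either rule.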

-kgd

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Re: [clamav-users] Minor bug or working as intended?
Hi there,

On Thu, 24 Feb 2022, Kris Deugau wrote:

> After chasing docs back and forth and trying small variations, I think I've
> found what's arguably a bug in Clam's YARA implementation.
> ...

You too, huh?

In my experience ClamAV's Yara implementation is absolutely riddled
with bugs. It's so bad (and *years* out of date) that I don't think it
would be worth the effort of trying to fix it. I'd say start again
from scratch.

I've eventually settled on a way of living with it which is basically
"don't try anything fancy". If you're not careful it crashes clamd.
Most of the time it seems to manage simple regexes reasonably well,
but one example of fancy things not to try would be leaving out the
case-insensitive match modifier 'nocase'.

Having said that, once you get it settled it does do good work. Here,
with a few hundred well-chosen strings in a couple of dozen rules, it
catches far more spam than anything else. We don't see much malware
in our mail, so I haven't spent much time on non-text matching and
can't offer much insight into how well it might do there.

--

73,
Ged.

Re: [clamav-users] Minor bug or working as intended?
Pretty sure you can write what you’re trying to look for with an ldb signature anyway.


Re: [clamav-users] Minor bug or working as intended?
Hi there,

On Fri, 25 Feb 2022, Joel Esler via clamav-users wrote:

> Pretty sure you can write what you’re trying to look for with an ldb
> signature anyway.

One can write an LDB signature which might look like this:

8<----------------------------------------------------------------------
clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi
8<----------------------------------------------------------------------

or the same with Yara in something which looks a bit like this:

8<----------------------------------------------------------------------
rule AAA_and_hello
{
strings:
$A = "AAA"
$B = "hello"
condition:
all of them
}
8<----------------------------------------------------------------------

Efficiency/reliability aside, I know what I prefer for readability,
ease of maintenance and modification, combination with other rules
(e.g. for whitelisting), ...
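
To make the LDB line above a little less opaque, here is a rough Python sketch that splits it into its fields and decodes the hex. It assumes plain hex subsignatures with no wildcards, offsets or PCRE parts, and the names parse_ldb and LogicalSig are invented for this illustration; it is not how ClamAV itself parses signatures.

```python
from typing import List, NamedTuple, Tuple


class LogicalSig(NamedTuple):
    name: str
    tdb: dict
    expression: str
    subsigs: List[Tuple[bytes, str]]  # (decoded bytes, modifier letters)


def parse_ldb(line: str) -> LogicalSig:
    """Split a simple LDB line into name, target block, expression and subsigs."""
    name, tdb_raw, expression, *raw_subsigs = line.strip().split(";")
    tdb = dict(item.split(":", 1) for item in tdb_raw.split(","))
    subsigs = []
    for raw in raw_subsigs:
        # Anything after "::" is a set of modifier letters, e.g. "fi".
        body, _, modifiers = raw.partition("::")
        subsigs.append((bytes.fromhex(body), modifiers))
    return LogicalSig(name, tdb, expression, subsigs)


sig = parse_ldb("clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi")
print(sig.name, sig.tdb, sig.expression)
for index, (data, modifiers) in enumerate(sig.subsigs):
    print(index, data, modifiers or "-")
```

The two subsignatures decode to "AAA" and "hello", which lines up with the $A and $B strings in the Yara version.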

--

73,
Ged.

Re: [clamav-users] Minor bug or working as intended?
There's not a lot that you can do in Yara rules that you can't do in LDB
sigs... for what it's worth, here's a logical sig that detects the same
thing as the Yara rules...

mbroekman@lothlorien:~$ grep MJB.JS.SendEmail clamdb/javascript_sigs.ldb|
sigtool --decode-sigs
VIRUS NAME: MJB.JS.SendEmailFunc-0
TDB: Engine:90-255,Target:0
LOGICAL EXPRESSION: 0>3
* SUBSIG ID 0
+-> OFFSET: ANY
+-> SIGMOD: NOCASE
+-> DECODED SUBSIGNATURE:
<script{WILDCARD_ANY_STRING(LENGTH<=1)}type="text/javascript">{WILDCARD_ANY_STRING(LENGTH<=1)}function{WILDCARD_ANY_STRING(LENGTH<=1)}sendemail{WILDCARD_ANY_STRING(LENGTH<=1)}(){

mbroekman@lothlorien:~$ grep MJB.JS.SendEmail clamdb/javascript_sigs.ldb

MJB.JS.SendEmailFunc-0;Engine:90-255,Target:0;0>3;3c736372697074{-1}747970653d22746578742f6a617661736372697074223e{-1}66756e6374696f6e{-1}73656e64656d61696c{-1}28297b::i

mbroekman@lothlorien:~$ cat testfile
<script type="text/javascript">functionsendemail (){ }</script>
<script type="text/javascript">functionsendemail(){ }</script>
<script type="text/javascript">functionsendemail (){ }</script>
<script type="text/javascript">functionsendemail(){ }</script>

mbroekman@lothlorien:~$ clamscan --quiet testfile
mbroekman@lothlorien:~$ echo $?
1

mbroekman@lothlorien:~$ clamscan testfile
Loading: 10s, ETA: 0s [========================>] 8.61M/8.61M sigs

Compiling: 2s, ETA: 0s [========================>] 41/41 tasks

/Users/mbroekman/testfile: MJB.JS.SendEmailFunc-0.UNOFFICIAL FOUND

----------- SCAN SUMMARY -----------
Known viruses: 8606446
Engine version: 0.104.2
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 0.00 MB
Data read: 0.00 MB (ratio 0.00:1)
Time: 12.433 sec (0 m 12 s)
Start Date: 2022:02:25 10:54:32
End Date: 2022:02:25 10:54:45
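
As a rough illustration of what the {-1} wildcards in that signature mean, the following Python sketch translates a hex subsignature with brace wildcards into a bytes regex and counts hits in the same test file contents. The function name subsig_to_regex is invented, only hex pairs and {n}, {-n}, {n-} and {n-m} are handled, and none of ClamAV's file typing or normalization is modelled.

```python
import re

# One token is a hex byte pair or a brace wildcard such as {-1}, {0-1}, {2-} or {4}.
TOKEN = re.compile(r"([0-9a-fA-F]{2})|\{(\d*)-(\d*)\}|\{(\d+)\}")


def subsig_to_regex(subsig, nocase=False):
    """Translate a hex subsignature with brace wildcards into a bytes regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(subsig):
        if m.start() != pos:
            raise ValueError(f"unsupported syntax at offset {pos}")
        pos = m.end()
        hex_byte, low, high, exact = m.groups()
        if hex_byte is not None:
            parts.append(re.escape(bytes.fromhex(hex_byte)))
        elif exact is not None:
            parts.append(rb"[\s\S]{%d}" % int(exact))
        else:
            # {-n} means "n bytes or fewer", {n-} means "n bytes or more".
            parts.append(rb"[\s\S]{%s,%s}" % ((low or "0").encode(), high.encode()))
    if pos != len(subsig):
        raise ValueError(f"unsupported syntax at offset {pos}")
    return re.compile(b"".join(parts), re.IGNORECASE if nocase else 0)


SUBSIG = (
    "3c736372697074{-1}747970653d22746578742f6a617661736372697074223e{-1}"
    "66756e6374696f6e{-1}73656e64656d61696c{-1}28297b"
)
TESTFILE = (
    b'<script type="text/javascript">functionsendemail (){ }</script>\n'
    b'<script type="text/javascript">functionsendemail(){ }</script>\n'
) * 2

hits = len(subsig_to_regex(SUBSIG, nocase=True).findall(TESTFILE))
print(hits, hits > 3)  # the logical expression 0>3 needs more than three hits
```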


Re: [clamav-users] Minor bug or working as intended?
Dear Kris,

I've had the same issue. In the last two years, I was regularly writing YARA sigs in ClamAV and finding that it behaves in strange ways... Especially the regex integration.

I specifically remember that counting regex wasn't possible and that I had to write those sigs either in strings or HEX.

After too many timeouts and strange stuff, I decided to rewrite all of the sigs I had written to LDB. It's not easy to read, less fun to write... but damn it's much more reliable and fast.

Here's what your sig could look like:

KGD.LDB.JS.SENDEMAIL;Engine:81-255,Target:3;0>3;3c73637269707420747970653d22746578742f6a617661736372697074223e66756e6374696f6e73656e64656d61696c{0-1}28297b

I took the liberty to define Target:3 (HTML). You might need to change that. Adding more criteria might be good too.

Best,
Laurent

PS: This YARA might technically work, but might cost you lots of CPU:
$a3 = /(<script type="text\/javascript">functionsendemail.?\(\)\{.*){3}/
Re: [clamav-users] Minor bug or working as intended?
On Fri, 25 Feb 2022, Laurent S. via clamav-users wrote:

> I've had the same issue. In the last two years, I was regularly
> writing YARA sigs in ClamAV and finding that it behaves in strange
> ways... Especially the regex integration.
... ...
> After too many timeouts and strange stuff, I decided to rewrite all
> of the sigs I had written to LDB. It's not easy to read, less fun to
> write... but damn it's much more reliable and fast.

Did you build any tools to help with your rewrite?
If so they might be a good starting point for a YARA <-> LDB convertor,
which sounds like a useful project.

--
Andrew C. Aitchison Kendal, UK
andrew@aitchison.me.uk

Re: [clamav-users] Minor bug or working as intended?
Maarten Broekman via clamav-users wrote:
> There's not a lot that you can do in Yara rules that you can't do in LDB
> sigs... for what it's worth, here's a logical sig that detects the same
> thing as the Yara rules...
>
> mbroekman@lothlorien:~$ grep MJB.JS.SendEmail
> clamdb/javascript_sigs.ldb| sigtool --decode-sigs
> VIRUS NAME: MJB.JS.SendEmailFunc-0
> TDB: Engine:90-255,Target:0
> LOGICAL EXPRESSION: 0>3
>  * SUBSIG ID 0
>  +-> OFFSET: ANY
>  +-> SIGMOD: NOCASE
>  +-> DECODED SUBSIGNATURE:
> <script{WILDCARD_ANY_STRING(LENGTH<=1)}type="text/javascript">{WILDCARD_ANY_STRING(LENGTH<=1)}function{WILDCARD_ANY_STRING(LENGTH<=1)}sendemail{WILDCARD_ANY_STRING(LENGTH<=1)}(){
>
> mbroekman@lothlorien:~$ grep MJB.JS.SendEmail clamdb/javascript_sigs.ldb
> MJB.JS.SendEmailFunc-0;Engine:90-255,Target:0;0>3;3c736372697074{-1}747970653d22746578742f6a617661736372697074223e{-1}66756e6374696f6e{-1}73656e64656d61696c{-1}28297b::i

*nods* Thanks. As it was, I kept at it until I actually had a full
Yara signature that matched as intended, working around the broken
repetition condition by using the hex string instead of the regex.

.ldb signatures could definitely use more expansive documentation; the
examples in the PDF are really pretty simple. Earlier on I had also
tripped over (among other things) what might be the correct syntax for
multiple regex matches triggered by the same "hey, wake up!"
subsignature. (I'm not sure I understand why that's needed, it seems
rather awkward.)

I'll have to remember to try {-1} more often. This isn't the first time
I've wanted to match a character that may not be there, although I
usually also want to restrict matching to a subset of characters, not
"any byte" (which is why I reached for the regex match in both my
attempts at an ldb signature, and in the Yara signature).

-kgd

Re: [clamav-users] Minor bug or working as intended?
Laurent S. via clamav-users wrote:
> Dear Kris,
>
> I've had the same issue. In the last two years, I was regularly writing YARA sigs in ClamAV and finding that it behaves in strange ways... Especially the regex integration.
>
> I specifically remember that counting regex wasn't possible and that I had to write those sigs either in strings or HEX.
>
> After too many timeouts and strange stuff, I decided to rewrite all of the sigs I had written to LDB. It's not easy to read, less fun to write... but damn it's much more reliable and fast.
>
> Here's what your sig could look like:
>
> KGD.LDB.JS.SENDEMAIL;Engine:81-255,Target:3;0>3;3c73637269707420747970653d22746578742f6a617661736372697074223e66756e6374696f6e73656e64656d61696c{0-1}28297b
>
> I took the liberty to define Target:3 (HTML). You might need to change that. Adding more criteria might be good too.

*nod* I kept at it and the full Yara sig I eventually pushed live has
10 strings, requiring layered sets of multi-hit matches. (Finding a
valid syntax just for those conditions alone was a bit tedious; it's
not clear from the upstream Yara docs or Clam's brief commentary whether
you can nest conditions as pseudo-strings[1], but bumping the total
match count required and just and'ing the sub-count conditions was Good
Enough.)

-kgd

[1] Available indications say "you can't", although supposedly you can
reference other Yara signatures - tried, couldn't get that working either

Re: [clamav-users] Minor bug or working as intended?
On Friday, February 25th, 2022 at 17:39, Andrew C Aitchison <clamav@aitchison.me.uk> wrote:

> Did you build any tools to help with your rewrite?
> If so they might be a starting point for a YARA <-> LDB convertor,
> which sounds like a useful project.

Sorry, no I did it by hand. I took the opportunity to always add the corresponding target file, reorganize, optimize, remove obsolete stuff... It was only about 50 sigs.

I personally think a better project for the community would be to improve YARA in ClamAV instead of an automatic converter.

Best,
Laurent
Re: [clamav-users] Minor bug or working as intended?
Hi there,

On Fri, 25 Feb 2022, Laurent S. via clamav-users wrote:

> I've had the same issue. In the last two years, I was regularly
> writing YARA sigs in ClamAV and finding that it behaves in strange
> ways... Especially the regex integration.
>
> I specifically remember that counting regex wasn't possible and that
> I had to write those sigs either in strings or HEX.
>
> After too many timeouts and strange stuff ...

Sounds like you and I have been through the same pain.

> I decided to rewrite all of the sigs I had written to LDB. It's not
> easy to read, less fun to write... but damn it's much more reliable
> and fast.

Execution time will be important for scanning filesystems, less so for
scanning mail (at least for scanning low-volume mail) and readability
can be hugely important if you're writing a lot of rules. Perhaps we
should be asking the development team for readable LDB rules? :)

> PS: This YARA might technically work, but might cost you lots of CPU:
> $a3 = /(<script type="text\/javascript">functionsendemail.?\(\)\{.*){3}/

I think it's generally best to avoid things like '.*' in Yara rules,
and possibly in regexes in general for use in scanning. Even in mail
you can find yourself scanning fairly big base64-encoded texts which
are never going to match but still cost CPU, but in a filesystem there
may be files of gigabytes+ and some regexes will be *very* expensive.
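
One way to get a "more than N occurrences" test without a '.*' bridging the occurrences is to count individual matches and stop as soon as the threshold is passed. The Python sketch below only illustrates that idea; the function name more_than and the sample data are invented, and it says nothing about how ClamAV or Yara evaluate conditions internally. Note in passing that the quoted '{3}' form asks for three occurrences, whereas '#a > 3' asks for at least four.

```python
import re

# The same literal-heavy pattern as in the original rule, with no '.*' in it.
PATTERN = re.compile(rb'<script type="text/javascript">functionsendemail.?\(\)\{')


def more_than(pattern, data, threshold):
    """Return True as soon as more than `threshold` matches have been seen."""
    seen = 0
    for _ in pattern.finditer(data):
        seen += 1
        if seen > threshold:
            return True  # stop early, no need to look at the rest of the data
    return False


ONE = b'<script type="text/javascript">functionsendemail(){ }</script>\n'
print(more_than(PATTERN, ONE * 4, 3))
print(more_than(PATTERN, ONE * 3, 3))
```

With four copies of the line the first call returns True; with three copies the second returns False.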

> I personally think a better project for the community would be to
> improve YARA in ClamAV ...

+1

If I'd had the time I'd have done it myself already.

--

73,
Ged.

Re: [clamav-users] Minor bug or working as intended?
> Execution time will be important for scanning filesystems, less so for
> scanning mail (at least for scanning low-volume mail) and readability
> can be hugely important if you're writing a lot of rules. Perhaps we
> should be asking the development team for readable LDB rules? :)

Creating a new "human readable", or "human friendly", signature language is something that I've brought up many times this past 6 months in our team meetings. I think it's more feasible than trying to make Yara rules fully functional in ClamAV, or than trying to make our signatures look the same as Yara.

I toyed a bit with using the KDL document language (https://github.com/kdl-org/kdl) as a base for a new format. My thought is it could be "compiled" or converted to a more compact line of text prior to distribution, or unpacked/decompiled for readability as needed. I am hoping we can spend some time these next few months investigating it further, once 0.105 is out. With our Rust language integration working rather nicely these days, we should be able to leverage the language and library ecosystem for this effort, making it far easier to implement than with C.

A disclaimer: This is purely brainstorming, and I have no idea if we would continue with the KDL idea or find something else. Here are some examples from my short time spent brainstorming this a few months back.

// example logical signature
Win.Trojan.Badness-98 type="logical" {
Engine 81 255
Target 0
terms {
x1 "41414141" type="hex" nocase=true ascii=true
x2 "deadbeefcafe" type="hex"
a1 "evil"
fih "ff000d0d80ff" type="phash"
}
Condition "( x1 or x2 ) and a1 and fih"
}

// example .ign signature
Its.Not.So.Bad-99 type="ignore-signature" "Win.Trojan.Badness-98"

// example .crb trusted cert
Trusted.CA.Microsoft-9875186-0 type="trusted-certificate" {
Engine 81 255
Subject "6a7c2a3146b0335e9a3e1f5fb193338cd71c072d"
Serial "b9e065ba400a6eec327f5b8a4f47faa169a87d8e"
PubKey "dd0cbba2e42e09e3e7c5f79669bc0021bd693333efad04cb5480ee0683bbc52084d9f7d28bf338b0aba4ad2d7c627905ffe34a3f04352070e3c4e76be09cc03675e98a31dd8d70e5dc37b5744696285b8760232cbfdc47a567f751279e72eb07a6c9b91e3b53357ce5d3ec27b9871cfeb9c923096fa84691c16e963c41d3cba33f5d026a4dec691f25285c36fffd43150a94e019b4cfdfc212e2c25b27ee2778308b5b2a096b22895360162cc0681d53baec49f39d618c85680973445d7da2542bdd79f715cf355d6c1c2b5ccebc9c238b6f6eb526d93613c34fd627aeb9323b41922ce1c7cd77e8aa544ef75c0b048765b44318a8b2e06d1977ec5a24fa4803"
Exponent "010001"
CodeSign true
TimeSign true
CertSign true
NotBefore 0
Comment "Microsoft Windows Production PCA 2011 SHA256 2011-2026 61:07:76:56:00:00:00:00:00:08"
}

// Example .crb revoked certificate
Blocklist.CRT.GluptebaRootkit-7910250-2 type="revoked-certificate" {
Engine 81 255
Subject "18df2f83e03d73694d4981d9ed4ac0b59c60ca3b"
Serial "1ff8f31990f3244c29c955b3b56e340c43061807"
PubKey "948adeb891aa3cca7db3fa09947f68db105ab50fccbf77cf43207d9c005ca24ecd35bd52ac0b4f0e48c77af7937d7185cc0c958551cb2b971892139c548b54bb50c96781dd3c6ade0ac2a0686efd5816ba68c144e24ae6579860de3daf70ac15b2332a5ff2874807a04983554f8e95ce034ac05c414fdc3e3f9f5eee778da849d8390d27876425d039c5cd70c6e710677ce9f63427771413f2d425fc4aac323fb5bf8905fa5df1895ec447d4bbff36001c8fdfb69d251f17befdb4fa1baf2dd4379a11935f9b9a6a47e5eee9ca2e84c5f96da9027f54f51ae85e7c250f423ac8de44d1a99aef9a9be014ef9b42794b01a6f2b297896583096233081fac6b4541"
Exponent "010001"
CodeSign true
TimeSign false // could be omitted to ignore
CertSign false // could be omitted to ignore
NotBefore 0 // could be omitted to ignore
Comment "" // could be omitted to ignore
}

I haven't thought through all the implications of allowing plaintext logical condition terms (called "subsignatures" in LDB signatures), such as how to escape the special characters used for wildcarding, ranges, etc. There is a lot to think about, but I feel this project is doable.
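
To give a feel for the "compile to a compact line" step, here is a small Python sketch that turns a readable description into one LDB-style line. Everything about the mapping is an assumption made for illustration: the dict layout, the function name compile_logical, the translation of "and"/"or" to "&"/"|", and the mapping of nocase/ascii to the "i"/"a" modifier letters. The "phash" term from the example is left out because it is unclear how it would be encoded.

```python
import re


def compile_logical(name, engine, target, terms, condition):
    """Emit one LDB-style line from a readable description (hypothetical mapping)."""
    # Each term gets the index of its position, as in LDB logical expressions.
    index = {term["id"]: str(i) for i, term in enumerate(terms)}
    operators = {"and": "&", "or": "|", "(": "(", ")": ")"}
    expression = "".join(
        operators[tok] if tok in operators else index[tok]
        for tok in re.findall(r"[()]|\w+", condition)
    )
    subsigs = []
    for term in terms:
        # Hex terms pass through; plain text terms are hex-encoded.
        if term.get("type") == "hex":
            body = term["value"]
        else:
            body = term["value"].encode("ascii").hex()
        flags = "".join(
            letter for key, letter in (("nocase", "i"), ("ascii", "a")) if term.get(key)
        )
        subsigs.append(body + ("::" + flags if flags else ""))
    tdb = f"Engine:{engine[0]}-{engine[1]},Target:{target}"
    return ";".join([name, tdb, expression, *subsigs])


TERMS = [
    {"id": "x1", "value": "41414141", "type": "hex", "nocase": True, "ascii": True},
    {"id": "x2", "value": "deadbeefcafe", "type": "hex"},
    {"id": "a1", "value": "evil"},
]
line = compile_logical(
    "Win.Trojan.Badness-98", (81, 255), 0, TERMS, "( x1 or x2 ) and a1"
)
print(line)
```

Escaping of wildcard characters inside plaintext terms is deliberately not handled here.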

I am particularly interested in feedback from those of you who write ClamAV signatures regularly, and from those of you who are new to writing signatures and can more easily spot the sharp edges to which many of us have been desensitized.

Cheers,
Micah


Micah Snyder
ClamAV Development
Talos
Cisco Systems, Inc.
Re: [clamav-users] Minor bug or working as intended?
Hi Micah,

On Tue, 1 Mar 2022, Micah Snyder (micasnyd) via clamav-users wrote:

>> ... Perhaps we should be asking the development team for readable LDB rules? :)
>
> Creating a new "human readable", or "human friendly", signature
> language is something that I've brought up many times this past 6
> months in our team meetings. I think it's more feasible than trying
> to make Yara rules fully functional in ClamAV, or than trying to
> make our signatures look the same as Yara.

My feeling is that quite possibly you're right, but I would want to
investigate the possibilities, if any, of using some kind of Yara library.
With admittedly a few reservations, I like Yara rules a lot for their
simplicity and readability. I'd like a bit more flexibility, and the
odd feature - more on that later.

> I toyed a bit with using the KDL document language ...
> (https://github.com/kdl-org/kdl) as a base for a new format. My
> thought is it could be "compiled" or converted to more compact line
> of text prior to distribution, or unpacked/decompiled for
> readability as needed. I am hoping we can spend some time these
> next few months investigating it further, once 0.105 is out. With
> our Rust language integration working rather nicely these days, we
> should be able to leverage the language and library ecosystem for
> this effort making it far easier to implement than with C.

Heh, I used to write C libraries so that I could still use C. :/

> A disclaimer: This is purely brainstorming ...

Understood.

> // example logical signature
> ...
> // example .ign signature
> ...
> // example .crb trusted cert
> ...
> // Example .crb revoked certificate
> ...

Weeeellllll... an improvement, but _still_ butt ugly compared to Yara.

> I haven't thought through all the implications of allowing plaintext
> ... but I feel this project is doable.

:)

> I am particularly interested in feedback from those of you who write
> ClamAV signatures regularly ...

Count me in. You must know by now that I only scan emails. One of
the things I'd love to have from Yara/whatever which I don't have at
present is something to declare rules which will only match headers or
bodies in emails. If you draped me over a cauldron of boiling oil I
could do it with a regex but it would make life extremely tedious. I
split messages into header+body in a milter and scan them separately
anyway but to do what I really want ATM would need two copies of clamd
using one to scan the headers and the other to scan the bodies. I've
occasionally run more than one clamd for testing, but so far resisted
the temptation to do that in production. Another is some sort of test
facility so you could run samples through a new ruleset before it goes
live on the clamd server *and* get verbose feedback about the matching
process which you wouldn't want when it's live. Oh, and a parser that
actually notices if there are missing curly braces... :/
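
For illustration only, the header/body split and "rules that apply to one part only" idea could be sketched in Python like this. It is not the milter described above and does not involve clamd at all: the rule names, the regexes and the sample message are invented, and the split is simply made at the first blank line.

```python
import re


def split_message(raw):
    """Split a raw message at the first blank line into (headers, body)."""
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    headers = parts[0]
    body = parts[1] if len(parts) == 2 else b""
    return headers, body


def scan(raw, header_rules, body_rules):
    """Apply header-only rules to the headers and body-only rules to the body."""
    headers, body = split_message(raw)
    hits = [name for name, rx in header_rules.items() if rx.search(headers)]
    hits += [name for name, rx in body_rules.items() if rx.search(body)]
    return hits


# Invented example rules; the header rule uses a bounded gap rather than '.*'.
HEADER_RULES = {
    "hdr_subject_invoice": re.compile(rb"(?im)^Subject:[^\r\n]{0,200}invoice")
}
BODY_RULES = {"body_click_here": re.compile(rb"(?i)click here")}

MESSAGE = (
    b"From: someone@example.com\r\n"
    b"Subject: Your invoice\r\n"
    b"\r\n"
    b"Please click here.\r\n"
)
print(scan(MESSAGE, HEADER_RULES, BODY_RULES))
```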

--

73,
Ged.

Re: [clamav-users] Minor bug or working as intended?
Micah Snyder (micasnyd) via clamav-users wrote:
> G.W. Haywood wrote:
>> Execution time will be important for scanning filesystems, less so for
>> scanning mail (at least for scanning low-volume mail) and readability
>> can be hugely important if you're writing a lot of rules. Perhaps we
>> should be asking the development team for readable LDB rules? :)
>
> Creating a new "human readable", or "human friendly", signature language
> is something that I've brought up many times this past 6 months in our
> team meetings. I think it's more feasible than trying to make Yara
> rules fully functional in ClamAV, or than trying to make our signatures
> look the same as Yara.
>
> I toyed a bit with using the KDL document language
> (https://github.com/kdl-org/kdl) as a base for a new format. My thought
> is it could be "compiled" or converted to more compact line of text
> prior to distribution, or unpacked/decompiled for readability as
> needed. I am hoping we can spend some time these next few months
> investigating it further, once 0.105 is out. With our Rust language
> integration working rather nicely these days, we should be able to
> leverage the language and library ecosystem for this effort making it
> far easier to implement than with C.

For some types of content, just allowing a plain ASCII string instead of
the hex-coded version of the same would be a big help. Or an
enhancement in the current file formats allowing embedded comments -
I've lost track of how many times I've created something complex, and
had to reconstruct whatever logic I used to create it to make a tweak or
refinement - or just gave up and created a new signature - because
there's no way to document it in-band. Ignoring empty lines -
especially at the end of the signature file! - instead of just claiming
"invalid signature" would ease editing.

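Until something like that exists, one workaround is to keep an annotated working copy and strip it down before deployment. The Python sketch below shows the idea under loud assumptions: the "//" comment convention is invented here and is not a ClamAV feature, and the function names are made up for the example. It also includes a tiny helper for the ASCII-to-hex step.

```python
def text_to_hex(text):
    """Hex-encode an ASCII string the way a signature body expects it."""
    return text.encode("ascii").hex()


def strip_annotations(lines, comment_prefix="//"):
    """Yield only real signature lines, dropping blanks and whole-line comments."""
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            yield stripped


# Annotated working copy (hypothetical format), including a trailing blank line.
ANNOTATED = """\
// matches "AAA" and a case-insensitive full-word "hello"
clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi

"""

print(text_to_hex("hello"))
print(list(strip_annotations(ANNOTATED.splitlines())))
```

The cleaned output contains just the one signature line, so the deployed file never sees the comment or the empty line.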

> A disclaimer: This is purely brainstorming, and I have no idea if we
> would continue with the KDL idea or find something else. Here are some
> examples from my short time spent brainstorming this a few months back.
>
> // example logical signature
[snip]

TBH that looks almost identical to the Yara rule syntax at a quick look.
Hard to say whether it would be better to spend time spinning up yet
another signature format, or fixing edge cases in one that's already
present and in use.

-kgd

Re: [clamav-users] Minor bug or working as intended?
Hi there,

On Wed, 2 Mar 2022, Kris Deugau wrote:
> Micah Snyder (micasnyd) via clamav-users wrote:
>>
>> ... some examples from my short time spent brainstorming
>> this a few months back.
>>
>> // example logical signature
> [snip]
>
> TBH that looks almost identical to the Yara rule syntax at a quick look.

Very similar, but I don't know if you could refer to one rule from
another rule? I use that feature all the time with Yara. Very handy,
but in fact the 64 string-per-Yara-rule limit imposed by ClamAV makes
it essential.
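
As a sketch of living with that limit, a long word list can be split mechanically into rules of at most 64 strings each. The Python below only generates rule text; the rule prefix, the generated string names and the sample phrases are invented, and tying the generated rules together by reference is left out because reports of how well that works in ClamAV are mixed.

```python
def chunked(items, limit=64):
    """Split a list into consecutive pieces of at most `limit` items."""
    return [items[i:i + limit] for i in range(0, len(items), limit)]


def escape(text):
    """Escape backslashes and double quotes for a Yara text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_rules(prefix, strings, limit=64):
    """Return Yara source text with one rule per chunk of strings."""
    rules = []
    for n, chunk in enumerate(chunked(strings, limit)):
        lines = [f"rule {prefix}_{n}", "{", "strings:"]
        lines += [f'$s{i} = "{escape(s)}" nocase' for i, s in enumerate(chunk)]
        lines += ["condition:", "any of them", "}"]
        rules.append("\n".join(lines))
    return "\n\n".join(rules)


words = [f"example phrase {i}" for i in range(130)]  # invented strings
source = build_rules("spam_words", words)
print(source.count("rule spam_words_"), "rules generated")
```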

> Hard to say whether it would be better to spend time spinning up yet
> another signature format, or fixing edge cases in one that's already
> present and in use.

Exactly how I feel, it's hard to say. I'm torn between cutting/losses
and babies/bathwater. But if there's a plug-in Yara engine library of
some description that's anything like up to date and can be shoehorned
in easily it has to be worth a shot. Something like this

https://rustrepo.com/repo/Hugal31-yara-rust-rust-security-tools

given that Rust is where it's going?

Earlier today for this thread I was looking at some history. FWIW for
the past year I've averaged about 1.25 Yara rule edits per day.

Perhaps we should take this to the dev list.

--

73,
Ged.

Re: [clamav-users] Minor bug or working as intended? [ In reply to ]
Kris Deugau wrote:
> For some types of content, just allowing a plain ASCII string instead of
> the hex-coded version of the same would be a big help.  Or an
> enhancement in the current file formats allowing embedded comments -
> I've lost track of how many times I've created something complex, and
> had to reconstruct whatever logic I used to create it to make a tweak or
> refinement - or just gave up and created a new signature - because
> there's no way to document it in-band.  Ignoring empty lines -
> especially at the end of the signature file! - instead of just claiming
> "invalid signature" would ease editing.

One other pain point I've run into fairly regularly is that there's no
way to have a *specific signature* match on the raw file - either you
run your entire Clam instance without all of the content unpacking and
normalization, and *all* your signatures need to be based on the raw
files, or you run with the content unpacking enabled and have to bend
and contort to match a perfectly simple chunk of data that's been
variously mangled by one or both of the unpacking and normalization.

I've just found a new case - some malware spewer has embedded a
password-protected .zip as the base-64-encoded data attribute of an
iframe tag in a .html attachment. (Ow.) One of the chunks I want to
match on is:

<iframe src[equals]"data:application/x-zip-compressed;

(lightly obfuscated in case of someone else who's already been here),
but the entire unpacked/normalized "nocomment.html" from clamscan
--leave-temps is:

<head><title>img0457600xls</title></head><body><p>password is
52266</p><iframe src="data"style="border:none;
height:100%;width:100%;"</iframe></body></html>

The normalized HTML and the bit that indicates this is a .zip are in
complete separate files in the unpacked/normalized data, so matching all
the pieces I want to match at the same time is going to be tricky at best.
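
For comparison, this is roughly what the check looks like when it can run against the raw attachment, outside ClamAV. A Python sketch only: the function name is invented, and it assumes that ";base64," directly follows the MIME type in the data URI and that the payload begins with the usual zip local-file-header magic (PK\x03\x04):

```python
import base64
import re

# Raw (un-normalized) HTML: an iframe whose src is a data URI declared as a zip.
# \x27 is a single quote, so either quoting style is accepted.
IFRAME_ZIP = re.compile(
    rb'<iframe\s[^>]*src\s*=\s*["\x27]data:application/(?:x-zip-compressed|zip)'
    rb';base64,([A-Za-z0-9+/=\s]+)',
    re.IGNORECASE,
)


def embedded_zip(raw_html: bytes) -> bool:
    """Return True if the raw HTML has an iframe data URI whose payload starts like a zip."""
    m = IFRAME_ZIP.search(raw_html)
    if not m:
        return False
    payload = b"".join(m.group(1).split())  # drop line breaks inside the base64
    try:
        # 8 base64 characters decode to 6 bytes, enough to see the 4-byte magic.
        decoded = base64.b64decode(payload[:8])
    except ValueError:
        return False
    return decoded.startswith(b"PK\x03\x04")


sample = (
    b'<html><body><iframe src="data:application/x-zip-compressed;base64,'
    + base64.b64encode(b"PK\x03\x04" + b"\x00" * 20)
    + b'" style="border:none;"></iframe></body></html>'
)
print(embedded_zip(sample))
```

Both pieces (the iframe tag and the zip indicator) are visible in one buffer here, which is exactly what the unpacked/normalized view does not offer.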

This particular sample is small enough that the message would be passed
to SpamAssassin (the whole original message is about 26k), where I can
match what I want to match on quite easily. But that's not always the case.

-kgd

Re: [clamav-users] Minor bug or working as intended? [ In reply to ]
On Wednesday, March 2nd, 2022 at 18:37, Kris Deugau <kdeugau@vianet.ca> wrote:

> Kris Deugau wrote:
>
> > For some types of content, just allowing a plain ASCII string instead of
> > the hex-coded version of the same would be a big help. Or an
> > enhancement in the current file formats allowing embedded comments -
> > I've lost track of how many times I've created something complex, and
> > had to reconstruct whatever logic I used to create it to make a tweak or
> > refinement - or just gave up and created a new signature - because
> > there's no way to document it in-band. Ignoring empty lines -
> > especially at the end of the signature file! - instead of just claiming
> > "invalid signature" would ease editing.


We are using a small GUI where you can create (as strings) and view (as strings) all the hundreds of simple .ndb signatures with a database and automatic expiration. But yes, having a similar sig format where you can just input strings... that could be easier to manage.
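
The string-to-signature step of such a tool is small. A Python sketch of the idea, not the actual code of the GUI mentioned above: the helper names are invented, only plain ASCII strings without wildcards are handled, and the defaults (target type 0, offset "*") mean "any file, any position" in the extended .ndb layout Name:TargetType:Offset:HexSignature:

```python
def ndb_signature(name: str, text: str, target: int = 0, offset: str = "*") -> str:
    """Build one extended-signature (.ndb) line from a plain ASCII string."""
    if ":" in name:
        raise ValueError("signature name must not contain ':'")
    hex_body = text.encode("ascii").hex()
    return f"{name}:{target}:{offset}:{hex_body}"


def ndb_text(line: str) -> str:
    """Show the body of a wildcard-free .ndb line as text again."""
    hex_body = line.strip().split(":")[3]
    return bytes.fromhex(hex_body).decode("ascii", errors="replace")


sig = ndb_signature("Example.Sendemail", "sendemail")
print(sig)
print(ndb_text(sig))
```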

Concerning KDL, I'd really prefer a reliable implementation of YARA for compatibility with other software. There are plenty of YARA rules on the web, and it would be awesome to be able to import them easily.

Best,
Laurent

PS: Sorry for replying late