Mailing List Archive

[clamav-users] unexplainable tar behaviour
We're seeing really unexplainable behaviour related to clamdscan and tar.

There's a tree of subdirs and files.

If I tar the complete tree and scan it with 'clamdscan -v --fdpass all.tar' an infected file is reported: 'Java.Trojan.Agent-36975 FOUND'.

If I tar all subdirs of the first level in separate tars and scan them, all of them are reported OK. Same if I scan all files one by one.

So where is the infected file report coming from? Any ideas?

Environment:

# lsb_release -a
LSB Version: n/a
Distributor ID: openSUSE
Description: openSUSE Leap 15.1
Release: 15.1
Codename: n/a
# rpm -q -i clamav
Name : clamav
Version : 0.101.4
Release : lp151.205.1
Architecture: x86_64
Install Date: Mo 28 Okt 2019 16:03:42 CET
Group : Productivity/Security
Size : 2383988
License : GPL-2.0-only
Signature : RSA/SHA256, Fr 25 Okt 2019 16:59:46 CEST, Key ID 69d1b2aaee3d166a
Source RPM : clamav-0.101.4-lp151.205.1.src.rpm
Build Date : Fr 25 Okt 2019 16:59:23 CEST
Build Host : lamb53
Relocations : (not relocatable)
Vendor : obs://build.opensuse.org/security
URL : http://www.clamav.net
Summary : Antivirus Toolkit

_______________________________________________

clamav-users mailing list
clamav-users@lists.clamav.net
https://lists.clamav.net/mailman/listinfo/clamav-users


Help us build a comprehensive ClamAV guide:
https://github.com/vrtadmin/clamav-faq

http://www.clamav.net/contact.html#ml
Re: [clamav-users] unexplainable tar behaviour
All I can add to the discussion is a slightly obfuscated dump of the signature, which is in main.ndb and was added on Apr 13, 2016:

> VIRUS NAME: Java.Trojan.Agent-36975
> TARGET TYPE: ANY FILE
> OFFSET: *
> DECODED SIGNATURE:
> java*lang*String{WILDCARD_ANY_STRING}writeEmbeddedFile{WILDCARD_ANY_STRING}LPORT{WILDCARD_ANY_STRING}LHOST

I substituted "*" for "/" in the signature in order to prevent this message from being detected en route.

-Al-

On Tue, Oct 29, 2019 at 01:06 AM, Steffen Sledz wrote:
> We've a really unexplainable behaviour related to clamdscan and tar.
>
> There's a tree of subdirs and files.
>
> If I tar the complete tree and scan it with 'clamdscan -v --fdpass all.tar' an infected file is reported: 'Java.Trojan.Agent-36975 FOUND'.
>
> If I tar all subdirs of the first level in separate tars and scan them, all of them are reported OK. Same if I scan all files one by one.
>
> So where's the infected file report is coming from? Any ideas?
>
> Environment:
>
> # lsb_release -a
> LSB Version: n/a
> Distributor ID: openSUSE
> Description: openSUSE Leap 15.1
> Release: 15.1
> Codename: n/a
> # rpm -q -i clamav
> Name : clamav
> Version : 0.101.4
> Release : lp151.205.1
> Architecture: x86_64
> Install Date: Mo 28 Okt 2019 16:03:42 CET
> Group : Productivity/Security
> Size : 2383988
> License : GPL-2.0-only
> Signature : RSA/SHA256, Fr 25 Okt 2019 16:59:46 CEST, Key ID 69d1b2aaee3d166a
> Source RPM : clamav-0.101.4-lp151.205.1.src.rpm
> Build Date : Fr 25 Okt 2019 16:59:23 CEST
> Build Host : lamb53
> Relocations : (not relocatable)
> Vendor : obs://build.opensuse.org/security
> URL : http://www.clamav.net
> Summary : Antivirus Toolkit
Re: [clamav-users] unexplainable tar behaviour
On Tue, 29 Oct 2019, Steffen Sledz wrote:

> We've a really unexplainable behaviour related to clamdscan and tar.
>
> There's a tree of subdirs and files.
>
> If I tar the complete tree and scan it with 'clamdscan -v --fdpass all.tar' an infected file is reported: 'Java.Trojan.Agent-36975 FOUND'.
>
> If I tar all subdirs of the first level in separate tars and scan them, all of them are reported OK. Same if I scan all files one by one.
>
> So where's the infected file report is coming from? Any ideas?

Try bisection. Divide the tar file in half (roughly) and see which
half triggers the detection in clamdscan. (If neither half does, split
the file somewhere else, say the first 1/4 and last 3/4.) The two
pieces won't be valid tar files any more, but that's okay since all you
care about is whether the virus scanner objects.

Keep doing this until you have a minimal file, that is, until removing
anything from the beginning or end will cause clamdscan not to detect a
problem. Then see what's in the file and compare it to the original
files and directories in the tree.

If you want, you can be a little more careful about how this is done.
For instance, just remove parts from the end of the file until
clamdscan says the file is okay. Then you'll know that the last piece
you removed matches part of the signature. And the remaining initial
segment of the file will still be a semi-valid tar archive, so you can
list the contents and see what the final entry in the archive is.

Then start removing parts from the front of the original file until
clamdscan says the remainder is okay. You'll know that the part you
removed matches the beginning of the signature. Take the part that you
removed and have tar list its contents; the last entry will be where
the signature starts.
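
The trimming procedure described above can be automated. What follows is only a minimal Python sketch, not a tested ClamAV tool. The `is_detected` callback is hypothetical: in practice it would write the bytes to a temporary file and run clamdscan on it. The regex stand-in and its two marker strings are made up for illustration. The sketch also assumes detection is monotonic, i.e. once a prefix (or suffix) is flagged, every longer one is flagged too.

```python
import re
from typing import Callable

Detector = Callable[[bytes], bool]


def shortest_detected_prefix(data: bytes, is_detected: Detector) -> int:
    """Smallest n for which data[:n] is still detected (binary search)."""
    if not is_detected(data):
        raise ValueError("the full data is not detected")
    lo, hi = 0, len(data)  # data[:hi] is always detected
    while lo < hi:
        mid = (lo + hi) // 2
        if is_detected(data[:mid]):
            hi = mid          # match survives, try cutting more off the end
        else:
            lo = mid + 1      # cut too much, the match ends beyond mid
    return lo


def latest_detected_start(data: bytes, is_detected: Detector) -> int:
    """Largest s for which data[s:] is still detected (binary search)."""
    if not is_detected(data):
        raise ValueError("the full data is not detected")
    lo, hi = 0, len(data)  # data[lo:] is always detected
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if is_detected(data[mid:]):
            lo = mid          # match survives, try cutting more off the front
        else:
            hi = mid - 1      # cut too much, the match starts before mid
    return lo


# Stand-in detector: two made-up markers joined by an "any bytes" wildcard.
_DEMO = re.compile(rb"ALPHA.*?OMEGA", re.DOTALL)


def demo_detector(blob: bytes) -> bool:
    return _DEMO.search(blob) is not None


sample = b"x" * 100 + b"ALPHA" + b"y" * 200 + b"OMEGA" + b"z" * 50
end = shortest_detected_prefix(sample, demo_detector)
start = latest_detected_start(sample, demo_detector)
print(start, end)  # the minimal matching window is sample[start:end]
```

Because each search halves the remaining range, a file of about 160 MB needs on the order of 30 scans per search rather than one scan per removed chunk.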

Alan Stern


Re: [clamav-users] unexplainable tar behaviour
On 10/29/2019 3:06 AM, Steffen Sledz wrote:
> We've a really unexplainable behaviour related to clamdscan and tar.
>
> There's a tree of subdirs and files.
>
> If I tar the complete tree and scan it with 'clamdscan -v --fdpass all.tar' an infected file is reported: 'Java.Trojan.Agent-36975 FOUND'.
>
> If I tar all subdirs of the first level in separate tars and scan them, all of them are reported OK. Same if I scan all files one by one.
>
> So where's the infected file report is coming from? Any ideas?
>


There is no virus. You're creating a false positive from scanning a
large blob of data where the signature picks up random bits from
different files.

{random data}{part of signature}{random data}{other part of
signature}...{repeat as needed}
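
The effect can be reproduced in a few lines of Python. This is only an illustration under stated assumptions: the three fragments are made up (they are not the real signature), and a regular expression with non-greedy wildcards stands in for ClamAV's matcher, which works differently internally.

```python
import re

# Hypothetical fragments standing in for the literal parts of a signature.
FRAGMENTS = [b"ALPHA-PART", b"BRAVO-PART", b"CHARLIE-PART"]

# Join the fragments with a non-greedy "any bytes" wildcard, roughly what
# {WILDCARD_ANY_STRING} means in a decoded signature.
SIGNATURE = re.compile(b".*?".join(re.escape(f) for f in FRAGMENTS), re.DOTALL)


def is_detected(data: bytes) -> bool:
    """Return True if every fragment occurs, in order, somewhere in data."""
    return SIGNATURE.search(data) is not None


# Three unrelated "files", each holding only one fragment.
file_a = b"some header " + FRAGMENTS[0] + b" more bytes"
file_b = b"unrelated " + FRAGMENTS[1] + b" content"
file_c = b"other data " + FRAGMENTS[2] + b" trailer"

# Scanned one by one, none of them matches.
individual = [is_detected(f) for f in (file_a, file_b, file_c)]

# Concatenated, as an uncompressed tar effectively does, the blob matches.
blob = file_a + file_b + file_c
combined = is_detected(blob)

print(individual, combined)  # [False, False, False] True
```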

Re: [clamav-users] unexplainable tar behaviour
I thought ClamAV unpacked TARs (and other archives) and looked at the
contents. If it doesn't, it wouldn't be very effective in detecting
viruses in compressed files.

How big is your file? Since ClamAV doesn't like files bigger than 4 GB,
if your file is bigger, I don't know for sure what happens. Maybe then
it doesn't really unpack the file, and thus might detect a "virus" in a
random subsequence of bytes.


On Tue, 29 Oct 2019 09:45:16 -0500
Noel Jones <njones@megan.vbhcs.org> wrote:

> On 10/29/2019 3:06 AM, Steffen Sledz wrote:
> > We've a really unexplainable behaviour related to clamdscan and tar.
> >
> > There's a tree of subdirs and files.
> >
> > If I tar the complete tree and scan it with 'clamdscan -v --fdpass
> > all.tar' an infected file is reported: 'Java.Trojan.Agent-36975
> > FOUND'.
> >
> > If I tar all subdirs of the first level in separate tars and scan
> > them, all of them are reported OK. Same if I scan all files one by
> > one.
> >
> > So where's the infected file report is coming from? Any ideas?
> >
>
>
> There is no virus. You're creating a false positive from scanning a
> large blob of data where the signature picks up random bits from
> different files.
>
> {random data}{part of signature}{random data}{other part of
> signature}...{repeat as needed}


Re: [clamav-users] unexplainable tar behaviour
On 30.10.19 03:34, Paul Kosinski via clamav-users wrote:
> How big is your file? Since ClamAV doesn't like files bigger than 4 GB,
> if your file is bigger, I don't know for sure what happens. Maybe then
> it doesn't really unpack the file, and thus might detect a "virus" in a
> random subsequence of bytes.

It's about 160MB.

Re: [clamav-users] unexplainable tar behaviour
On 29.10.19 15:10, Alan Stern wrote:
> Try bisection...

That makes things even more confusing.

I have split the tar twice with different ratios, but the individual parts are all reported as clean.

# split -b 80M all.tar all
# ll
total 445768
-rw-r--r-- 1 root root 83886080 30. Okt 07:57 allaa
-rw-r--r-- 1 root root 80998400 30. Okt 07:57 allab
-rw-r--r-- 1 root root 164884480 29. Okt 08:00 all.tar
# clamdscan -v --fdpass all*
/root/clamcheck/allaa: OK
/root/clamcheck/allab: OK
/root/clamcheck/all.tar: Java.Trojan.Agent-36975 FOUND

----------- SCAN SUMMARY -----------
Infected files: 1
Time: 40.302 sec (0 m 40 s)

# split -b 77M all.tar all
# ll
total 445768
-rw-r--r-- 1 root root 80740352 30. Okt 08:15 allaa
-rw-r--r-- 1 root root 80740352 30. Okt 08:15 allab
-rw-r--r-- 1 root root 3403776 30. Okt 08:15 allac
-rw-r--r-- 1 root root 164884480 29. Okt 08:00 all.tar
# clamdscan -v --fdpass all*
/root/clamcheck/allaa: OK
/root/clamcheck/allab: OK
/root/clamcheck/allac: OK
/root/clamcheck/all.tar: Java.Trojan.Agent-36975 FOUND

----------- SCAN SUMMARY -----------
Infected files: 1
Time: 41.426 sec (0 m 41 s)

Re: [clamav-users] unexplainable tar behaviour
Hi there,

On Wed, 30 Oct 2019, Steffen Sledz wrote:
> On 29.10.19 15:10, Alan Stern wrote:
>> Try bisection...
>
> That makes things even more confusing.

I don't see what's confusing about this.

The match is just an expression. It isn't magic. You could do just
the same thing from the command line for example with 'grep' although
it might take a while and you might need to read up about expressions.
Then you'll see that the word 'unexplainable' is incorrect.

The replies from Mr. Varnell and Mr. Jones both point you in the right
direction, and Mr. Stern simply offered a methodical way of locating
the matching pieces in what might be an unwieldy file.

--

73,
Ged.

Re: [clamav-users] unexplainable tar behaviour
On 30.10.19 13:03, G.W. Haywood via clamav-users wrote:
> I don't see what's confusing about this.
>
> The match is just an expression.  It isn't magic.  You could do just
> the same thing from the command line for example with 'grep' although
> it might take a while and you might need to read up about expressions.
> Then you'll see that the word 'unexplainable' is incorrect.
>
> The replies from Mr. Varnell and Mr. Jones both point you in the right
> direction, and Mr. Stern simply offered a methodical way of locating
> the matching pieces in what might be an unwieldy file.

Yes, but ...

> # split -b 80M all.tar all
> # ll
> total 445768
> -rw-r--r-- 1 root root 83886080 30. Okt 07:57 allaa
> -rw-r--r-- 1 root root 80998400 30. Okt 07:57 allab
> -rw-r--r-- 1 root root 164884480 29. Okt 08:00 all.tar
> # clamdscan -v --fdpass all*
> /root/clamcheck/allaa: OK
> /root/clamcheck/allab: OK
> /root/clamcheck/all.tar: Java.Trojan.Agent-36975 FOUND

So "the expression" matches in all.tar, but not in allaa and not in allab. Hmmm?

The expression could be partly in allaa and in allab. That's why I tried a different separation.

> # split -b 77M all.tar all
> # ll
> total 445768
> -rw-r--r-- 1 root root 80740352 30. Okt 08:15 allaa
> -rw-r--r-- 1 root root 80740352 30. Okt 08:15 allab
> -rw-r--r-- 1 root root 3403776 30. Okt 08:15 allac
> -rw-r--r-- 1 root root 164884480 29. Okt 08:00 all.tar
> # clamdscan -v --fdpass all*
> /root/clamcheck/allaa: OK
> /root/clamcheck/allab: OK
> /root/clamcheck/allac: OK
> /root/clamcheck/all.tar: Java.Trojan.Agent-36975 FOUND

Here "the expression" matches in all.tar, but not in allaa, not in allab, and not in allac. Hmmm again?

For me this is confusing!

Regards,
Steffen

Re: [clamav-users] unexplainable tar behaviour
On 30/10/2019, 12:43, "clamav-users on behalf of Steffen Sledz" <clamav-users-bounces@lists.clamav.net on behalf of sledz@dresearch-fe.de> wrote:
> Here "the expression" matches in all.tar, but not in allaa, not in allab, and not in allac. Hmmm again?
>
> For me this is confusing!

If you look back at the response from Al Varnell, you'll see that the decoded signature has several parts, all joined together by wildcard matches.

It's quite plausible that the match is on the first few bytes, some bytes several megabytes later, some more bytes several megabytes later still, and then the last few bytes in the file.

If that's the case (and with a tar file that's reasonably plausible), then bisecting/dissecting your file means that the signature will never match. It will only match on the whole entire file.
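
To see which archive members contribute which part of a match, one could locate each fragment's offset and map it back to a tar member. The sketch below is an illustration only: the fragments and member names are made up, the archive is built in memory, and a regex stands in for ClamAV's matcher. It relies on the `offset`, `offset_data` and `size` attributes of Python's `tarfile.TarInfo`.

```python
import io
import re
import tarfile
from typing import List

# Hypothetical signature fragments; the real ones are in the decoded signature.
FRAGMENTS = [b"ALPHA-PART", b"BRAVO-PART", b"CHARLIE-PART"]


def build_demo_tar() -> bytes:
    """Create a small uncompressed tar in memory, one fragment per member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, frag in zip(["a/one.txt", "b/two.txt", "c/three.txt"], FRAGMENTS):
            payload = b"padding " + frag + b" padding"
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def fragment_offsets(data: bytes) -> List[int]:
    """Offsets of the fragments in the first in-order match, or [] if none."""
    pattern = b".*?".join(b"(" + re.escape(f) + b")" for f in FRAGMENTS)
    m = re.search(pattern, data, re.DOTALL)
    return [m.start(i + 1) for i in range(len(FRAGMENTS))] if m else []


def member_at(data: bytes, offset: int) -> str:
    """Name of the tar member whose header or data covers the byte offset."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
        for member in tar:
            if member.offset <= offset < member.offset_data + member.size:
                return member.name
    return "<padding or end-of-archive blocks>"


archive = build_demo_tar()
offsets = fragment_offsets(archive)
owners = [member_at(archive, off) for off in offsets]
print(list(zip(offsets, owners)))

# Cutting the archive anywhere between the first and last fragment leaves
# neither piece with the complete expression.
cut = offsets[1]
pieces_match = [bool(fragment_offsets(archive[:cut])),
                bool(fragment_offsets(archive[cut:]))]
print(pieces_match)
```

In this toy archive each fragment lands in a different member, which is the situation described above: the whole file matches, but no piece produced by a cut between the first and last fragment does.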

There's a form here: https://www.clamav.net/reports/fp

...through which you can report false positives, but you will need to provide your file.

Graeme


Re: [clamav-users] unexplainable tar behaviour
On 30.10.19 13:52, Graeme Fowler via clamav-users wrote:
> If you look back at the response from Al Varnell, you'll see that the decoded signature has several parts, all joined together by wildcard matches.
>
> It's quite plausible that the match is on the first few bytes, some bytes several megabytes later, some more bytes several megabytes later still, and then the last few bytes in the file.
>
> If that's the case (and with a tar file that's reasonably plausible), then bisecting/dissecting your file means that the signature will never match. It will only match on the whole entire file.

Thank you very much for the explanation. Now I got it. ;-)

Re: [clamav-users] unexplainable tar behaviour
> I thought ClamAV unpacked TARs (and other archives) and looked at the
> contents. If it doesn't, it wouldn't be very effective in detecting
> viruses in compressed files.

I've been wondering about this too during this particular discussion.
Is ClamAV scanning the archive as-is, then additionally (hopefully)
decompressing it and scanning individual files? Is there a way to
debug with more info to see exactly what is going on with the process?

Re: [clamav-users] unexplainable tar behaviour
Hi there,

On Thu, 31 Oct 2019, J.R. via clamav-users wrote:

> Is ClamAV scanning the archive as-is, then additionally (hopefully)
> decompressing it and scanning individual files?

man clamd.conf (search for 'ScanArchive')

> Is there a way to debug with more info to see exactly what is going
> on with the process?

More detail about the sort of thing you'd be looking for would help.

As described in the man pages, there are 'verbose' and 'debug' options
for the scanners and the libraries; I don't know how much help they'll
be to you. As has previously been mentioned, to investigate you can
always use the built-in OS tools to chop a file into parts (although
my preference would usually be to script something with Perl; that's
just because I'm very familiar with Perl's regexes, there's not much
that can't be done with them - nor, for that matter, with Perl.)

The bulk of the signatures are pretty simple, otherwise they'd tend to
be fragile; in my experience most of the time it's easy to understand
what they mean just by inspection. I don't often find myself doing it
but when I do it's usually something like

$ sigtool --datadir=... -fSanesecurity.ScamL.613 | sigtool --decode-sigs
VIRUS NAME: Sanesecurity.ScamL.613
TARGET TYPE: MAIL
OFFSET: *
DECODED SIGNATURE:
REFERENCE NoMBre{WILDCARD_ANY_STRING(LENGTH<=50)}BATCH NoMBre{WILDCARD_ANY_STRING}W1NN1NG
$

As you can see in this signature there are two variable length strings
with arbitrary content, and one of them can be any length, and the
entire expression can appear in the file at any offset. The word 'any'
in this usage means very approximately "less than 4GBytes". These are
the sorts of things which can give unexpected results in the likes of
mailbox files, database files and archives which can contain a bunch
of possibly unconnected things that are effectively concatenated. As
far as ClamAV is concerned, they're just long strings. So signature-
writing must be something of an art, one I'm happy to leave to others.

Obviously I changed the words in the command output above so it won't
trigger the match, and you'll get the chance to read this message if
you're using Steve's signatures. :)
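
For experimenting outside ClamAV, a decoded signature of the shape shown in this message can be turned into an ordinary regular expression. The following Python sketch is an approximation under stated assumptions: it handles only the two wildcard tokens that appear in this thread, the function name is made up, and a regex engine is not ClamAV's matcher, so results may differ in edge cases. Other tokens that sigtool may print are not handled.

```python
import re

# Matches the two wildcard tokens that appear in the decoded output in this
# thread; the optional group captures the length limit when one is given.
_TOKEN = re.compile(r"\{WILDCARD_ANY_STRING(?:\(LENGTH<=(\d+)\))?\}")


def decoded_signature_to_regex(decoded: str):
    """Build a bytes regex approximating a decoded signature (sketch only)."""
    parts = []
    pos = 0
    for token in _TOKEN.finditer(decoded):
        # Literal text before the token must match exactly.
        parts.append(re.escape(decoded[pos:token.start()].encode("latin-1")))
        limit = token.group(1)
        if limit is None:
            parts.append(b".*?")  # any number of arbitrary bytes
        else:
            parts.append(b".{0," + limit.encode("ascii") + b"}?")  # bounded gap
        pos = token.end()
    parts.append(re.escape(decoded[pos:].encode("latin-1")))
    return re.compile(b"".join(parts), re.DOTALL)


decoded = ("REFERENCE NoMBre{WILDCARD_ANY_STRING(LENGTH<=50)}"
           "BATCH NoMBre{WILDCARD_ANY_STRING}W1NN1NG")
pattern = decoded_signature_to_regex(decoded)

# The first gap is short enough; the second gap may be arbitrarily long.
close_together = (b"REFERENCE NoMBre: 12345 BATCH NoMBre: 678 "
                  + b"." * 5000 + b" W1NN1NG")
# The first gap is 51 bytes, one more than the limit allows.
too_far_apart = b"REFERENCE NoMBre" + b"x" * 51 + b"BATCH NoMBre W1NN1NG"

print(bool(pattern.search(close_together)), bool(pattern.search(too_far_apart)))
```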

--

73,
Ged.

Re: [clamav-users] unexplainable tar behaviour
Yessir, it does indeed scan the raw file and if nothing is found (or you're running in allmatch mode) it will decompress the archive and scan the files within. ClamAV has a default archive recursion depth of 16, so it will go pretty deep.

I don’t think it's been explicitly stated yet, but tar files are not compressed; they are just a bundle of files in one file. A compressed tarball (tar.gz or tar.bz2) is less likely to have the issue described by Steffen, where a signature matches various parts of different files within an archive.

If you want to see how ClamAV extracts files or other buffers for scanning, try out clamscan's --leave-temps and --tempdir options. I would also recommend trying the --gen-json option, if your ClamAV build was linked with libjson-c.

The --leave-temps option will force it to write extracted files and other buffers (like PDF streams) to disk, and --tempdir will direct it to a location of your choosing. I will admit, it's a bit of a bear to analyze because the file names (including the JSON metadata file created by --gen-json) are randomly generated and there's only some limited structure. We're working on making the output more readable / more valuable to analysts but for now it is a bit of work to interpret. The output from clamscan's --debug option may also help.
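
As a small convenience, the options mentioned above can be assembled from a script. This Python sketch only builds and prints the command line; it does not run clamscan. The helper name, the target file and the temp directory path are made up for illustration, and --gen-json is optional because it depends on how your ClamAV was built.

```python
import shlex


def clamscan_debug_command(target, tempdir, gen_json=False):
    """Return a clamscan argument list that keeps extracted temp files."""
    cmd = ["clamscan", "--debug", "--leave-temps", f"--tempdir={tempdir}"]
    if gen_json:
        cmd.append("--gen-json")  # only useful if built with libjson-c
    cmd.append(str(target))
    return cmd


# Hypothetical paths; adjust to your own archive and scratch directory.
cmd = clamscan_debug_command("all.tar", "/tmp/clamav-temps", gen_json=True)
print(" ".join(shlex.quote(part) for part in cmd))
```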

Best,
Micah

On 10/31/19, 10:46 AM, "clamav-users on behalf of J.R. via clamav-users" <clamav-users-bounces@lists.clamav.net on behalf of clamav-users@lists.clamav.net> wrote:

> > I thought ClamAV unpacked TARs (and other archives) and looked at the
> > contents. If it doesn't, it wouldn't be very effective in detecting
> > viruses in compressed files.
>
> I've been wondering about this too during this particular discussion.
> Is ClamAV scanning the archive as-is, then additionally (hopefully)
> decompressing it and scanning individual files? Is there a way to
> debug with more info to see exactly what is going on with the process?

Re: [clamav-users] unexplainable tar behaviour
On 30.10.19 at 03:34, Paul Kosinski via clamav-users wrote:
> I thought ClamAV unpacked TARs (and other archives) and looked at the
> contents. If it doesn't, it wouldn't be very effective in detecting
> viruses in compressed files.

Yes it does, but IIUC it matches signatures not only to the extracted
files but also to the raw archive in order to catch malware exploiting
vulnerabilities in archive unpackers.

