Mailing List Archive

Re: MyDoom E-mail
On Thu, 12 Feb 2004, Damon McMahon wrote:

> Thanks for the suggestions. Having done some further troubleshooting I'm
> convinced that the full body regexp search either isn't being run or
> isn't working as I would expect.
>
> Any further clues? Where would I find more info about the full body search?
>
> Thanks...

That's because of a feature of SA.

If you have a MIME component of "Content-type: application/octet-stream"
SA rips it out and discards it. EVERYTHING after that 'Content-type:'
declaration up until the end of that particular component/attachment
is discarded and not available for -any- type of match:
not "body", not "rawbody", and not "full".

Look at the def of a 'body' rule in the Mail::SpamAssassin::Conf man page. It says:

The 'body' in this case is the textual parts of the message body;
any non-text MIME parts are stripped, and the message decoded from
Quoted-Printable or Base-64-encoded format if necessary.

As application/octet-stream is clearly a non-text part, it is stripped.

If you look at the MIME headers of one of those critters, the
"filename=" declaration that you're looking for is after the
"Content-type: application/octet-stream" and thus made of unobtanium. ;(
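A minimal Python sketch (standard `email` package, not SpamAssassin's Perl code) of why the filename is out of reach. It assumes a made-up, trimmed-down message; `text_only_view` is a hypothetical helper that only approximates the documented 'body' behaviour by keeping text parts and dropping everything else, including the dropped part's own MIME headers. `declared_filenames` shows that a walk over all parts, text or not, still sees the name.

```python
from email import message_from_string

# Hypothetical, trimmed-down sample (not a real MyDoom message):
# one text/plain part followed by one application/octet-stream attachment.
RAW = """\
From: sender@example.com
To: victim@example.com
Subject: test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset=us-ascii

The message cannot be represented in 7-bit ASCII encoding.

--XYZ
Content-Type: application/octet-stream; name="document.pif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="document.pif"

TVqQAAMAAAAEAAAA
--XYZ--
"""


def text_only_view(msg):
    """Rough stand-in for a 'body'-style view: keep decoded text/* leaf
    parts and drop every other part, including that part's MIME headers."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        if part.get_content_maintype() == "text":
            chunks.append(part.get_payload(decode=True).decode("ascii", "replace"))
    return "\n".join(chunks)


def declared_filenames(msg):
    """Walk every leaf part, text or not, and collect declared file names."""
    return [p.get_filename() for p in msg.walk()
            if not p.is_multipart() and p.get_filename()]


msg = message_from_string(RAW)
print("filename= visible in text-only view:", "filename=" in text_only_view(msg))
print("file names found by walking all parts:", declared_filenames(msg))
```

Run as-is, the text-only view contains the decoy sentence but no `filename=` at all, while the full walk still finds `document.pif`.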

Hey Devs, are there any 'really-raw-full-body' type rules that will
let us look at -everything- in a message? Or is that so far from SA's
intended usage realm that it's not even possible?

--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: MyDoom E-mail
On Wed, 11 Feb 2004, Raquel Rice wrote:

> It may be better to filter viruses with an anti-virus filter, like
> ClamAV. Failing that, maybe filtering using procmail, like:
>
> :0B
> * name=.*(\.exe$|\.scr$|\.pif$|\.bat$)
> {
> :0 $Lock
> $VIRS_BOX
> }
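For comparison, here is a rough Python rendering of the procmail condition above, assuming procmail's default behaviour of case-insensitive matching with `$` anchoring at the end of each body line. One caveat worth checking: MIME headers often quote the value (`name="x.pif"`), and the original `\.pif$` anchor does not match when a closing quote follows the extension. `RISKY_NAME` is a hypothetical variant that tolerates an optional quote and trailing blanks; it is a sketch, not a tested production filter.

```python
import re

# The condition as posted, using procmail's defaults: case-insensitive
# matching, with '$' anchoring at the end of each line of the body.
ORIGINAL = re.compile(r"name=.*(\.exe$|\.scr$|\.pif$|\.bat$)",
                      re.IGNORECASE | re.MULTILINE)

# Variant (an assumption, not from the thread) that also accepts a closing
# quote and trailing blanks/CR after the extension, e.g. name="x.pif".
RISKY_NAME = re.compile(r"""name=.*\.(exe|scr|pif|bat)["']?[ \t\r]*$""",
                        re.IGNORECASE | re.MULTILINE)


def has_risky_attachment_name(body: str) -> bool:
    """True if any line of the raw message body declares a risky file name.
    'name=' also matches inside 'filename=' (Content-Disposition)."""
    return RISKY_NAME.search(body) is not None


quoted = 'Content-Type: application/octet-stream; name="document.pif"\n'
print(ORIGINAL.search(quoted) is not None)  # the posted pattern misses the quoted form
print(has_risky_attachment_name(quoted))
```

Because procmail's `B` flag matches against the whole message body, the attachment's MIME headers are visible to it, which is exactly the text SA strips.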

For the particular case of virus blocking, I agree that a real
anti-virus tool like ClamAV is the way to go. (Using ClamAV here.)

However, be aware of the limitation of signature-based anti-virus tools:
when a new breed of virus first arrives on the scene, it will slip
right through.

A heuristic-based filter may actually be better in some cases.
(If the attachment MIME type is 'text|audio|image|video' and file
extension == executable, KILL ;)
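The mismatch heuristic can be sketched in a few lines of Python. The type and extension sets below are illustrative assumptions, not a vetted list, and `looks_disguised` is a hypothetical helper that expects the Content-Type value and an already-unquoted filename from a MIME parser.

```python
import os

# Illustrative lists only; extend to taste.
BENIGN_MAJOR_TYPES = {"text", "audio", "image", "video"}
EXECUTABLE_EXTENSIONS = {".exe", ".scr", ".pif", ".bat", ".com", ".cmd", ".vbs"}


def looks_disguised(content_type: str, filename: str) -> bool:
    """Flag a part that declares a harmless-looking media type but carries
    an executable file extension."""
    major_type = content_type.split("/", 1)[0].strip().lower()
    extension = os.path.splitext(filename.strip().lower())[1]
    return major_type in BENIGN_MAJOR_TYPES and extension in EXECUTABLE_EXTENSIONS


print(looks_disguised("audio/x-wav", "song.exe"))
print(looks_disguised("image/jpeg", "photo.jpg"))
print(looks_disguised("application/octet-stream", "setup.exe"))
```

By design the last case is not flagged: an honestly labelled executable is left to the signature scanner, and the heuristic only catches the "lying" MIME type.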

However, for the general case, there may be other reasons why somebody
might want to use SA to filter on some arbitrary part of a message.

procmail is not an option for everybody, for example on a gateway
that is a front-end for some other kind of mail system.


--
Dave Funk
RE: MyDoom E-mail
> -----Original Message-----
> From: David B Funk [mailto:dbfunk@engineering.uiowa.edu]

> procmail is not an option for everybody, for example on a gateway

Your point is taken. However, a gateway is exactly where I use
procmail. The -m option turns procmail into a general-purpose mail filter.


Regards,
Larry
RE: MyDoom E-mail
> From: David B Funk
> Sent: Wednesday, February 11, 2004 8:52 PM
>
> On Thu, 12 Feb 2004, Damon McMahon wrote:
>
> > Thanks for the suggestions. Having done some further troubleshooting I'm
> > convinced that the full body regexp search either isn't being run or
> > isn't working as I would expect.
> >
> > Any further clues? Where would I find more info about the full body search?
> >
> > Thanks...
>
> Because of a feature of SA.
>
> If you have a MIME component of "Content-type: application/octet-stream"
> SA rips it out and discards it. EVERYTHING after that 'Content-type:'
> declaration up until the end of that particular component/attachment
> is discarded and not available for -any- types of matches,
> Not "body" "rawbody" nor "full"
>
> Look at the def of a 'body' rule in the Mail::SpamAssassin::Conf man page. It says:
>
> The 'body' in this case is the textual parts of the message body;
> any non-text MIME parts are stripped, and the message decoded from
> Quoted-Printable or Base-64-encoded format if necessary.
>
> As application/octet-stream is clearly a non-text part, it is stripped.
>

There is a recent discussion on the developers' list regarding the
current semantics of body, rawbody, and full:

http://thread.gmane.org/gmane.mail.spam.spamassassin.devel/20840

Excerpt:

From: Darryl Bleau <darrylbleau <at> submersion.com>
Subject: Proper formatting of different SA body parts.

[...]
What are the different 'parts' of a message supposed to be, exactly? The
pieces I'm most interested in are Rawbody, Body, and Full. Reading the
documentation in perldoc doesn't help a whole bunch, as that was written
for a user, not a developer. I've gone through the code with as
fine-toothed a comb as a non-perlite can, printed out the pieces (by
placing prints in PerMsgStatus.pm around line 150 for each part:
$decoded, $bodytext, $fulltext), and this is what I've discovered so
far; please let me know if it's close to the target:

Full: Is the complete, raw, untouched text of the RFC822 message, with
one exception; all of the parts of a message that aren't text parts are
removed and replaced with [skipped <ct> attachment], where <ct> was the
value of the content-type header. The removal takes out the entire part,
including all headers, but leaves the boundary for that part. I don't
know if leaving the boundary for the removed part was the desired
behavior. It also seems to remove the ending boundary. All \r\n
linebreaks are converted to just \n.

Rawbody: Is the exact same as Full, except each part that needs to be
decoded to text is decoded.

Body: Is only the textual parts of the message, decoded (for the most
part what you'd see in an email client; I'll get to that), with
these exceptions: Multiple whitespaces in a row are converted to just
one whitespace, single linebreaks are removed, and multiple linebreaks
in a row are converted to a single one. The value from the Subject
header is appended to the top. HTML-like tags are removed, and if they
look like they had a URL in them, are replaced with URI:whatever. If the
message is multi-part, the headers for the child parts are also
included, but no html-removal or linebreak-squishing or space-squishing
is done to them.

If that is close to right, I have some other questions. The answers make
a difference to us, because they determine whether we have to write
mostly new code to emulate the parts from SA, or can use our
already-built MIME parser.

Does leaving the boundaries for removed parts make a difference? Or is
this just a side-effect? Do the removed parts have to be explicitly
replaced with the [removed] line or does that not matter?

Child parts in Body are inserted with the complete header information.
Is that needed? Body also would include normally not-viewable lines,
such as 'This is a MIME-encapsulated message'. How important is that?

Any URLs in Body are replaced with URI:whatever. From what I can see,
there aren't any Body rules that depend on looking for URI:something,
and it looks like that's only in there (forgive me if I'm wrong) to make
it easier to build the URI list.

I suppose that's all my questions. I don't _think_ it would make a
difference to build the messages slightly differently from how they come
out of SA, but I wanted to make sure with the guys who would know best.
That would be you :).

The differences would be, there would be no boundaries left over from
removed parts, boundaries would be ended, removed parts wouldn't be
replaced with a [removed] line, non-viewable parts wouldn't be included
in Body, body child parts would only be inserted with the text part, not
the headers, HTML in body would be simply removed, any <a href part that
would have turned into URI:href won't be there.

My thoughts on what the parts are meant to be (for rules to run on) is
this:

Full: unaltered RFC822, minus non-text parts.
Rawbody: decoded RFC822, minus non-text parts.
Body: decoded text only, without html.
[...]
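Darryl's proposed definitions (Full, Rawbody, Body) can be sketched with Python's standard `email` package. This is only an illustration of his three-line summary, not SpamAssassin's implementation: the function names and the sample message are hypothetical, and HTML is removed with a crude regex rather than a real HTML parser.

```python
import re
from email import message_from_string

# Hypothetical test message: a QP-encoded text part, an HTML part and a
# non-text attachment.
RAW = """\
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="B"

--B
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: quoted-printable

a =3D b
--B
Content-Type: text/html; charset=us-ascii

<p>Hello   <b>world</b></p>
--B
Content-Type: application/octet-stream; name="x.pif"
Content-Transfer-Encoding: base64

TVqQAA==
--B--
"""


def _text_leaves(msg):
    """Yield the leaf parts whose major type is text; skip all others."""
    for part in msg.walk():
        if not part.is_multipart() and part.get_content_maintype() == "text":
            yield part


def _decode(part):
    raw = part.get_payload(decode=True)
    try:
        return raw.decode(part.get_content_charset() or "us-ascii", "replace")
    except LookupError:  # unknown charset name
        return raw.decode("latin-1")


def _headers(part):
    return "".join(f"{name}: {value}\n" for name, value in part.items())


def full_view(msg, decode=False):
    """'Full': top-level headers plus every text part (its headers and
    payload); non-text parts are omitted entirely."""
    out = [_headers(msg), "\n"]
    for part in _text_leaves(msg):
        if part is not msg:  # single-part messages: headers already emitted
            out += [_headers(part), "\n"]
        out += [_decode(part) if decode else part.get_payload(), "\n"]
    return "".join(out)


def rawbody_view(msg):
    """'Rawbody': the same as full_view, but with each text part decoded."""
    return full_view(msg, decode=True)


def body_view(msg):
    """'Body': decoded text only, HTML tags removed (crudely, by regex)
    and runs of whitespace collapsed to one space."""
    texts = []
    for part in _text_leaves(msg):
        text = _decode(part)
        if part.get_content_subtype() == "html":
            text = re.sub(r"<[^>]*>", " ", text)
        texts.append(text)
    return re.sub(r"\s+", " ", " ".join(texts)).strip()


msg = message_from_string(RAW)
print(body_view(msg))
```

In all three views the `x.pif` attachment, headers included, never appears, which matches the behaviour Dave describes earlier in the thread.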
RE: MyDoom E-mail
My original thought was to look for the phrase they put in the body. It
seems to only vary slightly, and after all, that's the "beauty" of the
virus, to some people it's a very inviting and convincing sentence to
fool them into opening the attachment.
Re: MyDoom E-mail
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


David B Funk writes:
> On Thu, 12 Feb 2004, Damon McMahon wrote:
>
> > Thanks for the suggestions. Having done some further troubleshooting I'm
> > convinced that the full body regexp search either isn't being run or
> > isn't working as I would expect.
> >
> > Any further clues? Where would I find more info about the full body search?
> >
> > Thanks...
>
> Because of a feature of SA.
>
> If you have a MIME component of "Content-type: application/octet-stream"
> SA rips it out and discards it. EVERYTHING after that 'Content-type:'
> declaration up until the end of that particular component/attachment
> is discarded and not available for -any- types of matches,
> Not "body" "rawbody" nor "full"
>
> Look at the def of a 'body' rule in the Mail::SpamAssassin::Conf man page. It says:
>
> The 'body' in this case is the textual parts of the message body;
> any non-text MIME parts are stripped, and the message decoded from
> Quoted-Printable or Base-64-encoded format if necessary.
>
> As application/octet-stream is clearly a non-text part, it is stripped.
>
> If you look at the MIME headers of one of those critters, the
> "filename=" declaration that you're looking for is after the
> "Content-type: application/octet-stream" and thus made of unobtanium. ;(
>
> Hey Devs, are there any 'really-raw-full-body' type rules that will
> let us look at -everything- in a message? Or is that so far from SA's
> intended usage realm that it's not even possible?

Not rules, no. It's a *spam* filter. ;)

You could, however, do it with a plugin in 3.0 -- access the "pristine"
message body.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAK8J+QTcbUG5Y7woRAvgxAKDfElS3Ye/dq1l9qE21kIQk3SNpWQCfe+Bk
PZqcMO70vETOFXrugZdQ8h8=
=lZuq
-----END PGP SIGNATURE-----
Re: MyDoom E-mail
Mark,

These are the SA rules I developed, which catch most (but not all) of
the MyDoom stuff going around:

###
# Custom antimalware tests and measures
###
# treat all messages containing Microsoft executables suspiciously
score MICROSOFT_EXECUTABLE 5.0
#
# test for W32.MyDoom malware
#
body MYDOOM_FAKE_SMTP_ERROR /Mail transaction failed. Partial message is available./
body MYDOOM_UNICODE_BINARY /The message contains Unicode characters and has been sent as a binary attachment./
body MYDOOM_7BIT_BINARY /The message cannot be represented in 7-bit ASCII encoding and has been sent as a binary attachment./
body MYDOOM_BODY_TEST /^test$/i
header MYDOOM_SUBJ_HI Subject =~ /^hi$/i
header MYDOOM_SUBJ_SERVER Subject =~ /^server report$/i
header MYDOOM_SUBJ_TEST Subject =~ /^test$/i
header MYDOOM_SUBJ_DELIVERY Subject =~ /^Mail Delivery System$/
header MYDOOM_SUBJ_TRANSACTION Subject =~ /^Mail Transaction Failed$/
header MYDOOM_SUBJ_STATUS Subject =~ /^status$/i
#
describe MYDOOM_FAKE_SMTP_ERROR Fake SMTP error message typically sent by W32.Mydoom malware
describe MYDOOM_UNICODE_BINARY Techno mumbo-jumbo typically sent by W32.Mydoom malware (1)
describe MYDOOM_7BIT_BINARY Techno mumbo-jumbo typically sent by W32.Mydoom malware (2)
describe MYDOOM_BODY_TEST Message with 'test' on single line typically sent by W32.Mydoom malware
describe MYDOOM_SUBJ_HI Message with 'Hi' as subject typically sent by W32.Mydoom malware
#
score MYDOOM_FAKE_SMTP_ERROR 5.0
score MYDOOM_UNICODE_BINARY 5.0
score MYDOOM_7BIT_BINARY 5.0
score MYDOOM_BODY_TEST 2.5
score MYDOOM_SUBJ_HI 2.5
score MYDOOM_SUBJ_SERVER 5.0
score MYDOOM_SUBJ_TEST 5.0
score MYDOOM_SUBJ_DELIVERY 5.0
score MYDOOM_SUBJ_TRANSACTION 5.0
score MYDOOM_SUBJ_STATUS 5.0
score MYDOOM_SUBJ_ERROR 5.0
#

Obviously you can tailor the descriptions and scores to your own taste.
My logic is that messages with the subject 'test' or 'hi' may be
legitimate, whereas there isn't a snowball's chance in hell that the
others are anything other than the malware.
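To see how these scores interact, here is a simplified Python model. It assumes SA's basic scoring behaviour (add up the scores of every rule that hits and flag the message once the total reaches the threshold, 5.0 by default), transcribes only four of the rules above, and approximates SA's line-by-line body matching with a multiline regex. It is not how SA evaluates rules internally.

```python
import re

# A few of the rules above, transcribed as (name, target, pattern, score).
RULES = [
    ("MYDOOM_FAKE_SMTP_ERROR", "body",
     re.compile(r"Mail transaction failed. Partial message is available."), 5.0),
    ("MYDOOM_BODY_TEST", "body", re.compile(r"^test$", re.IGNORECASE | re.MULTILINE), 2.5),
    ("MYDOOM_SUBJ_HI", "subject", re.compile(r"^hi$", re.IGNORECASE), 2.5),
    ("MYDOOM_SUBJ_STATUS", "subject", re.compile(r"^status$", re.IGNORECASE), 5.0),
]

REQUIRED_HITS = 5.0  # SpamAssassin's stock threshold


def score(subject: str, body: str):
    """Sum the scores of every rule that hits; return (total, rule names)."""
    hits = [(name, points) for name, target, rx, points in RULES
            if rx.search(subject if target == "subject" else body)]
    return sum(points for _, points in hits), [name for name, _ in hits]


for subject, body in [("Hi", "Lunch at noon?"), ("hi", "test")]:
    total, names = score(subject, body)
    print(subject, total, names, "SPAM" if total >= REQUIRED_HITS else "ham")
```

With these numbers a bare 'Hi' subject scores 2.5 and stays under the threshold, while 'hi' plus a lone 'test' line in the body reaches 5.0, which is the reasoning behind giving the ambiguous rules half weight.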


Mark A. DeMichele wrote:
> My original thought was to look for the phrase they put in the body. It
> seems to only vary slightly, and after all, that's the "beauty" of the
> virus, to some people it's a very inviting and convincing sentence to
> fool them into opening the attachment.
