The only spam that still gets through our SpamAssassin (SA) setup from
time to time has this format:
<html><body>
<center><!--e2we9aigF7oX1u9--><a href="http://www.xcvxc3w.com/ghr/"><img
src="http://www.sddewwx.com/g9.gif" border=0></a></center>
<body></html>
That's the HTML part of a standard multipart/alternative mail whose
text/plain part contains only a few line breaks. The Subject is encoded
although there is no need to encode it:
Subject: =?iso-8859-1?B?UmU6U29tYXRyb3BpbiBpbiBhIGNhcHN1bGU=?=
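To show that the encoding hides nothing but plain ASCII, here is a minimal sketch that decodes the Subject with Python's standard email.header module. The helper name decode_subject is just an illustration, not anything SA provides.

```python
from email.header import decode_header


def decode_subject(value):
    """Decode an RFC 2047 encoded header value into a plain string."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # Encoded words come back as bytes plus the declared charset.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # Unencoded runs may already be str.
            parts.append(data)
    return "".join(parts)


subject = "=?iso-8859-1?B?UmU6U29tYXRyb3BpbiBpbiBhIGNhcHN1bGU=?="
print(decode_subject(subject))  # Re:Somatropin in a capsule
```

Every character in the decoded text is 7-bit ASCII, so the Base64 wrapping is not needed for transport.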
The message gets tagged as follows:
2.10 BAYES_90 Bayesian spam probability is 90 to 99%
0.11 HTML_60_70 Message is 60% to 70% HTML
1.23 HTML_IMAGE_ONLY_02 HTML: images with 0-200 bytes of words
I'm wondering:
- why isn't the comment tagged at all? Shouldn't comments be used as an
indication of probable spam? (Note that the spammers could just remove
the comment; it seems to serve only as Bayes poison and doesn't
hide/obfuscate anything.)
- why isn't it tagged as MIME_HTML_ONLY or MIME_HTML_ONLY_MULTI? Or maybe
there should be a rule "MIME_NO_TEXT_PLAIN" which hits when a message
has text/plain parts but they have no content?
- why isn't an IMG wrapped in an A HREF link tagged as spammy? Or would
we get too many false positives with this approach?
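To make the three ideas concrete, here is a rough Python sketch of the checks I have in mind. It is not SpamAssassin rule syntax and says nothing about how SA works internally. The function name check_message and the flag names are hypothetical, and it assumes the MIME parts are not transfer-encoded (no base64 or quoted-printable bodies).

```python
import re
from email import message_from_string

# Any HTML comment in the HTML part.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# A link whose only content is an image: <a href=...><img ...></a>
IMG_IN_LINK_RE = re.compile(
    r"<a\b[^>]*\bhref\b[^>]*>\s*<img\b[^>]*>\s*</a>",
    re.IGNORECASE | re.DOTALL,
)


def check_message(raw):
    """Return hypothetical flags for the three patterns in question."""
    msg = message_from_string(raw)
    plain_parts, html_parts = [], []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            plain_parts.append(part.get_payload())
        elif ctype == "text/html":
            html_parts.append(part.get_payload())
    html = "\n".join(html_parts)
    return {
        "html_comment": bool(COMMENT_RE.search(html)),
        # text/plain parts exist, but hold nothing except whitespace
        "empty_text_plain": bool(plain_parts)
        and all(not p.strip() for p in plain_parts),
        "img_in_link": bool(IMG_IN_LINK_RE.search(html)),
    }
```

Run against the sample above, all three flags would come back true. Whether any of them is safe to score depends on how often legitimate HTML mail (newsletters, for example) shows the same patterns, which this sketch does not measure.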
Another lesson from this message is that the BigEvil rules are probably
going to fail more often in the future, since spammers just rotate
through random domains freshly registered for one batch of spam.
Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de & http://msie.winware.org