Keith C. Ivey wrote:
>Could you post the debug output from spamassassin for one of
>these messages? I'm very curious to see why you think the
>poison is defeating Bayes. It's certainly possible that every
>once in a while a spammer will randomly hit on a word that's a
>good nonspam indicator for you, but I don't believe it can
>happen for any substantial fraction of messages.
The output you asked for is posted at the end. I've already trained my
Bayes database on 2 messages similar to this one (the layout is
identical, but the URL is in a different domain in each one and, of
course, each has a completely different set of random poison words).
>The only SA change the message you posted seems to suggest is
>a modification of the rule for catching low-contrast font
>color, which has nothing to do with Bayes. Looking at the
>spam, it got BAYES_50, so the "poison" didn't affect Bayes at
>all. It had no strong spam or nonspam indicators even without
>the added words.
Understood. What I wanted to say is that Bayes isn't effective against
this sort of stuff, and currently the other SA mechanisms aren't
sufficient to catch this spam.
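
To make the effect of the poison words concrete, here is a rough sketch
of Fisher-style chi-squared combining of token probabilities, in the
form popularized by SpamBayes. It is an illustration only: I'm not
claiming it is SpamAssassin's exact code. The probabilities are copied
from the "bayes token" lines in the debug output at the end of this
message; the names chi2q, combine, SPAM_LEANING and HAM_LEANING are made
up for this example.

```python
import math

# Probabilities copied from the "bayes token" debug lines in this message.
SPAM_LEANING = [
    0.997358361790176, 0.996940397350993, 0.993492957746479,
    0.993492957746479, 0.990941176470588, 0.985096774193548,
    0.978, 0.978, 0.978, 0.978,
    0.96844194358858, 0.958964997782887,
    0.958, 0.958, 0.958, 0.958, 0.958, 0.958, 0.958,
    0.95257244243949, 0.947986086684282, 0.942476065364982,
    0.929714918635859, 0.928422130125509, 0.916960992788963,
    0.902075983318674, 0.885057514092106, 0.883537429955864,
    0.876766493636693, 0.864494012282618,
]
HAM_LEANING = (
    [0.00297237569060773, 0.00410687022900763]
    + [0.0256190476190476] * 2
    + [0.0489090909090909] * 9
)


def chi2q(x2, dof):
    """Upper-tail probability of a chi-squared variable (even dof only)."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities into one score in [0, 1]."""
    n = len(probs)
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_stat = -2.0 * sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(spam_stat, 2 * n)  # evidence for spam
    h = 1.0 - chi2q(ham_stat, 2 * n)   # evidence for ham
    return (s - h + 1.0) / 2.0


if __name__ == "__main__":
    print("spam-leaning tokens only:", combine(SPAM_LEANING))
    print("with ham-leaning tokens: ", combine(SPAM_LEANING + HAM_LEANING))
```

Comparing combine(SPAM_LEANING) with combine(SPAM_LEANING + HAM_LEANING)
shows how a dozen or so random words that happen to be ham-leaning in my
database pull the combined score back toward the middle.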
This is mainly because HTML.pm can be fooled by dangling attributes.
Ideally, the HTML parser should parse HTML the same way popular browsers
(IE, Mozilla) do. Unfortunately, I cannot fix this in HTML.pm myself;
the code is a bit too convoluted for me. I think the help of the
original author of HTML.pm is needed here.
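
As a way to experiment with the dangling-attribute trick outside of
SpamAssassin, the anchor tag from the spam can be fed to a tolerant
HTML parser to see which attribute names and values it extracts. The
sketch below uses Python's standard html.parser module, which is neither
HTML.pm nor a browser engine, so it only shows one parser's
interpretation. The names TAG, AnchorAttrs and anchor_attributes are
invented for this example.

```python
from html.parser import HTMLParser

# The anchor tag as it appears in the spam body, line breaks included.
TAG = (
    "<a hrefredrawnhref=\n"
    "http://multilayer.com href=\n"
    "\"\n"
    "http://goandtakeit.com/sv/index.php?pid=expert\">Website</a>"
)


class AnchorAttrs(HTMLParser):
    """Collect the (name, value) attribute pairs of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.attrs.extend(attrs)


def anchor_attributes(html):
    """Return the attribute pairs this parser sees on <a> tags."""
    parser = AnchorAttrs()
    parser.feed(html)
    parser.close()
    return parser.attrs


if __name__ == "__main__":
    for name, value in anchor_attributes(TAG):
        print(repr(name), "=>", repr(value))
```

A rule that wants to catch this spam would need to look at the value of
the real href attribute, not at the decoy value attached to the bogus
attribute name in front of it.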
--- BEGIN OUTPUT ---
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/kerberos/sbin', keeping.
debug: PATH included '/usr/kerberos/bin', keeping.
debug: PATH included '/usr/lib/courier/bin', keeping.
debug: PATH included '/usr/lib/courier/sbin', keeping.
debug: PATH included '/usr/local/sbin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/X11R6/bin', keeping.
debug: PATH included '/root/bin', keeping.
debug: Final PATH set to:
/usr/kerberos/sbin:/usr/kerberos/bin:/usr/lib/courier/bin:/usr/lib/courier/sbin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin" for user state dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 28543 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_toks
debug: bayes: 28543 tie-ing to DB file R/O /etc/mail/spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 3 chosen.
debug: Initialising learner
debug: is Net::DNS::Resolver available? yes
debug: trying (3) microsoft.com...
debug: looking up MX for 'microsoft.com'
debug: MX for 'microsoft.com' exists? 1
debug: MX lookup of microsoft.com succeeded => Dns available (set
dns_available to hardcode)
debug: is DNS available? 1
debug: all '*From' addrs: ninawrithed@beerbloat.com
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=0.5
debug: bayes corpus size: nspam = 1819, nham = 6265
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *p = "U*ninawrithed D*beerbloat.com
D*com"
debug: tokenize: header tokens for *M = " qsnlk 636881hohclfgayg
Kmorphynbkzderzfc com "
debug: tokenize: header tokens for *F = "U*ninawrithed D*beerbloat.com
D*com"
debug: tokenize: header tokens for To = "U*olo D*altkom.com.pl D*com.pl
D*pl"
debug: tokenize: header tokens for Mime-Version = "1.0"
debug: tokenize: header tokens for *c = "/html; charset=iso-8859-1"
debug: tokenize: header tokens for Content-Transfer-Encoding = "7bit"
debug: tokenize: header tokens for X-Mime-Autoconverted = "from 8bit to
7bit by courier 0.44"
debug: tokenize: header tokens for *r = " olo ([::ffff:202.196.220]) by
nmail.altkom.pl esmtp; "
debug: bayes token 'H*c:html' => 0.997358361790176
debug: bayes token 'disagree' => 0.00297237569060773
debug: bayes token 'Sale' => 0.996940397350993
debug: bayes token 'Bruno' => 0.00410687022900763
debug: bayes token 'H*r:olo' => 0.993492957746479
debug: bayes token 'beach' => 0.993492957746479
debug: bayes token 'CALLED' => 0.990941176470588
debug: bayes token 'adulthood' => 0.985096774193548
debug: bayes token 'CheapPharmacy' => 0.978
debug: bayes token 'CIADLIS' => 0.978
debug: bayes token 'VIAGDRA' => 0.978
debug: bayes token 'EFFECTIVE!' => 0.978
debug: bayes token 'tolerance' => 0.0256190476190476
debug: bayes token 'carelessly' => 0.0256190476190476
debug: bayes token 'URI' => 0.96844194358858
debug: bayes token 'REAL' => 0.958964997782887
debug: bayes token 'movements' => 0.958
debug: bayes token 'makeup' => 0.958
debug: bayes token 'chord' => 0.958
debug: bayes token 'downright' => 0.958
debug: bayes token 'sagebrush' => 0.958
debug: bayes token 'corpse' => 0.958
debug: bayes token 'aviary' => 0.958
debug: bayes token 'HTo:U*olo' => 0.95257244243949
debug: bayes token 'Hewlett' => 0.0489090909090909
debug: bayes token 'reopens' => 0.0489090909090909
debug: bayes token 'chronicle' => 0.0489090909090909
debug: bayes token 'cautions' => 0.0489090909090909
debug: bayes token 'discharged' => 0.0489090909090909
debug: bayes token 'compresses' => 0.0489090909090909
debug: bayes token 'notorious' => 0.0489090909090909
debug: bayes token 'defeat' => 0.0489090909090909
debug: bayes token 'smartly' => 0.0489090909090909
debug: bayes token 'credible' => 0.947986086684282
debug: bayes token 'ONLY' => 0.942476065364982
debug: bayes token 'employ' => 0.929714918635859
debug: bayes token 'SUPER' => 0.928422130125509
debug: bayes token 'dose' => 0.916960992788963
debug: bayes token 'href' => 0.902075983318674
debug: bayes token 'HTo:D*altkom.com.pl' => 0.885057514092106
debug: bayes token 'HTo:D*com.pl' => 0.883537429955864
debug: bayes token 'H*r:ffff' => 0.876766493636693
debug: bayes token 'Annual' => 0.864494012282618
debug: bayes: score = 0.622580654529175
debug: bayes: 28543 untie-ing
debug: bayes: 28543 untie-ing db_toks
debug: bayes: 28543 untie-ing db_seen
debug: Razor2 is not available
debug: running raw-body-text per-line regexp tests; score so far=1.92
debug: running uri tests; score so far=2.62
debug: uri tests: Done uriRE
debug: running full-text regexp tests; score so far=2.62
debug: Current PATH is:
/usr/kerberos/sbin:/usr/kerberos/bin:/usr/lib/courier/bin:/usr/lib/courier/sbin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin
debug: Pyzor is not available: pyzor not found
debug: Razor2 is not available
debug: DCCifd is not available: no r/w dccifd socket found.
debug: DCC is not available: no executable dccproc found.
debug: all '*To' addrs: olo@altkom.com.pl
debug: DNS MX records found: 1
debug: RBL: success for 1 of 1 queries
debug: running meta tests; score so far=2.62
debug: is spam? score=4.212 required=5
tests=BAYES_60,HTML_FONTCOLOR_UNKNOWN,HTML_MESSAGE,LOC_HTMLSPLITFONT,MIME_HTML_ONLY,SUBJECT_PHARMACY
Delivered-To: olo@altkom.com.pl
Return-Path: <ninawrithed@beerbloat.com>
Received: from olo ([::ffff:202.196.220.93])
by nmail.altkom.pl with esmtp; Tue, 17 Feb 2004 10:21:19 +0100
Message-ID: <qsnlk.636881hohclfgayg@Kmorphynbkzderzfc.com>
From: "Kmorphy" <ninawrithed@beerbloat.com>
Date: Tue, 17 Feb 2004 17:21:40 +0800
To: olo@altkom.com.pl
Subject: upholders CheapPharmacy acoustics
Mime-Version: 1.0
Content-Type: text/html; charset=iso-8859-1
Content-Transfer-Encoding: 7bit
X-Mime-Autoconverted: from 8bit to 7bit by courier 0.44
X-Spam-Level: ****
X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on
nmail.altkom.pl
X-Spam-Status: No, hits=4.2 required=5.0 tests=BAYES_60,
HTML_FONTCOLOR_UNKNOWN,HTML_MESSAGE,LOC_HTMLSPLITFONT,MIME_HTML_ONLY,
SUBJECT_PHARMACY autolearn=no version=2.60
<html>
<font color=
#fee4e8> sleepwalk negotiate sonic, diluting democrat humpback,
encamping palaces MacDonald Hewlett brainstem crops cautions chartering
discharged chronicle disagree presided accordion. sentiments transplant
corpse defeat downright, immersed Boltzmann skulk beatitudes espouse
planks palmer compresses populace almsman bivouac tolerance. cookery
Ridgway scalded. ribbing mockery Oakley glover reopens
satellites.</font><br>
ONLY REAL SUPER VIAGDRA CALLED CIADLIS IS EFFECTIVE! Annual Sale: ONLY
$3 per dose<br>
<br>convening<br>
<br><a hrefredrawnhref=
http://multilayer.com href=
"
http://goandtakeit.com/sv/index.php?pid=expert">Website</a>
<br><br>
<font color=
#ebeff4>whitely chord cowing gayety aviary, nostalgic glucose Hyannis
employ; subdued movements mischief smartly intonation reserved distaff
standoff terrifies. heavily acquirable beach adulthood invertible,
traversing vacuo enraged Dobbin Avogadro Agnes Bruno enfeeble credible
notorious carelessly octaves. negotiate makeup SIMULA. sagebrush
imaginably heiressesfalcons.</font><br>
</html>
--- END OUTPUT ---
--
Best Regards,
Aleksander Adamowski
GG#: 274614
ICQ UIN: 19780575
http://olo.ab.altkom.pl