Mailing List Archive

Example bayes hits on random words (was Re: test=none?? (Re: Spec-ial Offer! 60% 0ff Generic convulse))
OK, IGNORE the score for the message I just sent. I sent the wrong results.

HERE is how it scores on my system:

X-Spam-Level: *
X-Spam-Status: No, hits=1.0 required=5.0
tests=BAYES_50,LOCAL_DRUGS_MALEDYSFUNCTION
autolearn=no version=2.63


So, what we've got is a tricky spam, and my SA bayes did NOT recognize
it (yet). It's naturally going into training ASAP. However, two of the
four Bayes filters DID pick it out accurately (with bogofilter being my
favorite).

What's interesting about these is which words bayes is catching:

(bogofilter excerpt)

X-Spam-Bogosity: Yes, tests=bogofilter, spamicity=1.000000, version=0.17.2
                   n  pgood     pbad      fw        U
[...]
"Cialis"         102  0.001004  0.013583  0.931097  +
[...]
"canvasback"       8  0.000000  0.001168  0.999270  +
"confucian"        8  0.000000  0.001168  0.999270  +
"duress"           8  0.000000  0.001168  0.999270  +
"exaltation"       8  0.000000  0.001168  0.999270  +
"powers"           8  0.000000  0.001168  0.999270  +
"subj:Generic"    11  0.000000  0.001607  0.999469  +
"concerti"        13  0.000000  0.001899  0.999550  +
"crosslink"       13  0.000000  0.001899  0.999550  +

The second column (n) is how many times the word appears in the
database, the next two (pgood and pbad) are the fraction of ham and spam
messages it appears in, and fw is the overall "spamicity" associated
with the word. For some reason, Cialis is in my bogofilter database as
having appeared in non-spam at one point (hmm...) but other,
not-necessarily-spam words such as crosslink, concerti, exaltation, and
confucian have ONLY appeared in spam for my account. Without bayes,
these wouldn't raise flags. However, I rarely talk about those topics,
so they're "unusual." And of course, the word "generic" in a subject is
a flag.
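
To make the fw column a little less magic, here is a rough sketch of how a
per-word score like that can be derived from n, pgood and pbad. It assumes
Robinson's smoothing formula f(w) = (s*x + n*p) / (s + n). The formula and
the parameter values s=0.01 and x=0.415 are assumptions, not something shown
in the output above; they were picked because they reproduce the fw values
listed for the words that never appeared in ham. The function name is made up
for illustration.

```python
def robinson_fw(n, pgood, pbad, s=0.01, x=0.415):
    """Smoothed per-word spamicity, f(w) = (s*x + n*p) / (s + n).

    s (strength of the prior) and x (score assumed for an unseen word)
    are assumed values, not taken from the post.
    """
    total = pgood + pbad
    # p is the word's raw share of spam; fall back to x if it was never seen.
    p = pbad / total if total > 0 else x
    return (s * x + n * p) / (s + n)


# Rows copied from the bogofilter excerpt: word -> (n, pgood, pbad)
rows = {
    "canvasback": (8, 0.000000, 0.001168),
    "subj:Generic": (11, 0.000000, 0.001607),
    "concerti": (13, 0.000000, 0.001899),
}
for word, (n, pgood, pbad) in rows.items():
    print(f"{word:14s} {robinson_fw(n, pgood, pbad):.6f}")
# canvasback     0.999270
# subj:Generic   0.999469
# concerti       0.999550
```

With the same assumed parameters, Cialis comes out at roughly 0.9311, close
to the listed 0.931097 but not identical, since pgood and pbad are only
printed to six decimal places.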

Now for spamprobe, which also matches on word pairs:

Score: 1.0000000
Spam Prob  Count  Good  Spam  Word
0.9999900      1     0   172  cialis
0.9999852      1     0    16  get hard
0.9999817      1     0    13  crosslink
0.9999736      1     0     9  Hsubject_generic
0.9999661      1     0     7  concerti
0.9999661      1     0     7  duress
0.9999525      1     0     5  gene ric
[...]

Similar results here, but the word pairings pick up slightly different tokens. "get hard" and "gene ric" are dead giveaways. There's concerti again. Oh, and generic in the subject. And crosslink must be in a spammer's wordlist somewhere, because it's hit in 13 spam messages so far.
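
For anyone wondering how tokens like "get hard" and "gene ric" come about,
here is a minimal sketch of word-pair tokenizing: emit every single word, plus
every pair of adjacent words. This is only an illustration of the idea, not
spamprobe's actual tokenizer; the function name, the regex, and the sample
sentence are all made up.

```python
import re


def words_and_pairs(text):
    """Return lowercase single words followed by adjacent two-word phrases."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Pair each word with its right-hand neighbour.
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


# Hypothetical sample text, not taken from the actual spam.
tokens = words_and_pairs("Get hard with gene ric Cialis")
print(tokens)
```

A spammer who splits "generic" into "gene ric" to dodge a single-word match
ends up creating a pair token that almost never shows up in ham, which is
presumably why it scores so high here.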