
strange results: spamassassin assigns score 6.8, spamc -c assigns score 4.3
Hi

This may be related to the mistakes I made earlier, but I noticed some
strange results.
While trying to fix those mistakes, I told sa-learn to forget some
messages (the sent folders and a folder where I store jokes) and
trained it again with some new spam:
m8ram@linux:~> sa-learn --forget --showdots --no-rebuild --mbox /home/m8ram/evolution/local/kim/subfolders/sent/mbox
sa-learn warning: --forget requires read/write access to the database, and is incompatible with --no-rebuild
Learned from 0 message(s) (141 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/Bram/subfolders/jokes/mbox
Learned from 0 message(s) (115 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/Bram/subfolders/afterdawn/mbox
Learned from 0 message(s) (3 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/M8ram/subfolders/sent/mbox
Learned from 0 message(s) (6 message(s) examined).
m8ram@linux:~> sa-learn --forget --showdots --mbox /home/m8ram/evolution/local/gilbert/subfolders/sent/mbox
Learned from 0 message(s) (13 message(s) examined).
m8ram@linux:~> sa-learn --ham --showdots --no-rebuild --mbox /home/m8ram/evolution/local/SPAM/mbox
Learned from 676 message(s) (676 message(s) examined).
m8ram@linux:~> sa-learn --rebuild

The last output confused me a bit: I had already trained it on that mbox
yesterday, so I expected the "Learned from" count to be much lower.
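The forget-and-retrain steps in the transcript above can be scripted roughly as follows. This is a minimal sketch, not the procedure used in this post: the function names `forget_mboxes` and `relearn_spam` and the `SA_LEARN` override are hypothetical. One observation from the transcript itself: the SPAM folder was fed with `--ham` even though the stated intent was to train on spam, so the sketch uses sa-learn's `--spam` option for that step. It also leaves `--no-rebuild` off the `--forget` calls, since sa-learn 2.60 warns that the two are incompatible.

```shell
#!/bin/sh
# Sketch: forget misfiled folders, then retrain the spam folder as spam.
# SA_LEARN (hypothetical override) defaults to sa-learn; point it at a stub
# to preview the commands without touching the Bayes database.
SA_LEARN=${SA_LEARN:-sa-learn}

# Hypothetical helper: forget every mbox passed as an argument.
# No --no-rebuild here: sa-learn 2.60 says --forget is incompatible with it.
forget_mboxes() {
    for mbox in "$@"; do
        "$SA_LEARN" --forget --showdots --mbox "$mbox" || return 1
    done
}

# Hypothetical helper: learn one mbox as spam (--spam, not --ham),
# skip the rebuild during the scan, then rebuild once at the end.
relearn_spam() {
    "$SA_LEARN" --spam --showdots --no-rebuild --mbox "$1" || return 1
    "$SA_LEARN" --rebuild
}
```

After sourcing the file, usage would look like `forget_mboxes /home/m8ram/evolution/local/Bram/subfolders/jokes/mbox` followed by `relearn_spam /home/m8ram/evolution/local/SPAM/mbox`.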

After this I ran SPAM/mbox back through SA (using the spamc -c
command in Evolution) and only about 100 of the 600+ messages were sorted
into the spamassassin folder (where I store the messages marked as spam).

I found it strange that a message with the subject "Reality F*ck Tour
Across America!" was still marked as ham.

So I saved the message to a text file and tested it with spamassassin:
m8ram@linux:~> spamassassin < spam.txt > spam.out
Looking at spam.out, the message is marked as spam:
X-Spam-Status: Yes, hits=6.8 required=5.0 tests=HTML_60_70,
	HTML_FONTCOLOR_UNKNOWN,HTML_FONT_BIG,HTML_FONT_INVISIBLE,HTML_MESSAGE,
	HTML_TAG_BALANCE_A,HTTP_EXCESSIVE_ESCAPES,MAILTO_TO_SPAM_ADDR,
	MIME_HTML_NO_CHARSET,MIME_HTML_ONLY,RCVD_IN_BL_SPAMCOP_NET,
	RCVD_IN_DSBL autolearn=no version=2.60

But when I run it through spamc -c I get the following:
m8ram@linux:~> spamc -c < spam.txt
4.3/5.0
m8ram@linux:~> cat spam.txt | spamc -c
4.3/5.0
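To put the two scores side by side for the same saved message, the extraction can be scripted. This is a sketch assuming a POSIX shell with sed and head; `spamc_score`, `header_score` and `compare_scores` are hypothetical names, and the header parsing assumes the `hits=` wording shown in the 2.60 output above.

```shell
#!/bin/sh
# Sketch: extract the score from each tool's output for one message.

# spamc -c prints "score/threshold", e.g. "4.3/5.0";
# keep only the part before the slash.
spamc_score() {
    printf '%s\n' "${1%%/*}"
}

# spamassassin adds "X-Spam-Status: ..., hits=6.8 required=5.0 ..."
# to the headers; print the first hits= value found on stdin.
header_score() {
    sed -n 's/^X-Spam-Status:.*[ ,]hits=\([-0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical wrapper: score one saved message both ways.
compare_scores() {
    msg=$1
    sa=$(spamassassin < "$msg" | header_score)
    sc=$(spamc_score "$(spamc -c < "$msg")")
    printf 'spamassassin: %s  spamc -c: %s\n' "$sa" "$sc"
}
```

With the functions sourced, `compare_scores spam.txt` prints both numbers on one line.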

Can anybody tell me how I can fix this?

I haven't yet told sa-learn to forget the mbox files I fed it without the
--mbox option. Do I have to train SA again after that?

TIA

Bram
--
# Mertens Bram "M8ram" <bram-mertens@linux.be> Linux User #249103 #
# SuSE Linux 8.2 (i586) kernel 2.4.20-4GB i686 256MB RAM #
# 1:45pm up 26 days 17:23, 9 users, load average: 0.00, 0.02, 0.00 #