Mailing List Archive

spamc and spamassassin differences?
I'm having difficulty getting spamc and spamassassin to behave
identically.

Even when I run spamd without --nouser-config and --local [1], I get different
scoring for a test mail. The result is that if I try to use spamc instead of
spamassassin in my .procmailrc, a lot of spam gets through. Spamc is a whole
lot faster, though. Is there some explicit documentation on what one should
remember when migrating from spamassassin to spamc?


-- v --

v@iki.fi

[1] I'd actually like to run spamd with those options for most users and
turn them off only for the few who care.
Re: spamc and spamassassin differences?
At 07:04 AM 2/13/2004, Ville Herva wrote:
>I'm having difficulty getting spamc and spamassassin to behave
>identically.
>
>Even when I run spamd without --nouser-config and --local, I get different
>scoring for a test mail. The result is that if I try to use spamc instead of
>spamassassin in my .procmailrc, a lot of spam gets through. Spamc is a whole
>lot faster, though. Is there some explicit documentation on what one should
>remember when migrating from spamassassin to spamc?

Suggestion to help diagnose the problem:

Grab an email or two in mbox format.
First, feed the email through spamassassin --local and note which
rules hit in X-Spam-Status.
Second, feed the email through spamc (with spamd started with
--local) and compare the X-Spam-Status for differences.

Generally the difference is the result of bayes training, and you'll get
different BAYES_ tests.

If that's not the issue, post some of the results from the above test.
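
As an illustration of the comparison described above, here is a minimal shell sketch. It assumes both tools write the filtered message to stdout with an X-Spam-Status header in the 2.6x "tests=..." format; the function names spam_tests and compare_tests and the file names in the usage comment are made up for this example.

```shell
# Print the rule names listed in a message's X-Spam-Status header,
# one per line, sorted. Folded continuation lines are joined first.
spam_tests() {
    awk '
        /^X-Spam-Status:/   { in_hdr = 1; buf = $0; next }
        in_hdr && /^[ \t]/  { buf = buf $0; next }   # folded header line
        in_hdr              { exit }                 # header finished
        END {
            gsub(/,[ \t]+/, ",", buf)                # undo folds inside the list
            if (match(buf, /tests=[^ \t]*/)) {
                list = substr(buf, RSTART + 6, RLENGTH - 6)
                n = split(list, names, ",")
                for (i = 1; i <= n; i++)
                    if (names[i] != "") print names[i]
            }
        }
    ' "$1" | sort
}

# Show which rules hit in only one of two filtered copies of a message.
compare_tests() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    spam_tests "$1" > "$tmp_a"
    spam_tests "$2" > "$tmp_b"
    echo "only in $1:"; comm -23 "$tmp_a" "$tmp_b"
    echo "only in $2:"; comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example use (file names are placeholders):
#   spamassassin --local < msg > msg.spamassassin
#   spamc < msg > msg.spamc        # spamd started with --local
#   compare_tests msg.spamassassin msg.spamc
```

Comparing rule names rather than scores keeps the output short: a BAYES_ or RAZOR2_ rule that shows up on only one side points straight at the configuration that differs.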
Re: spamc and spamassassin differences?
On Fri, Feb 13, 2004 at 10:51:13AM -0500, you [Matt Kettler] wrote:
>
> Suggestion to help diagnose the problem:
>
> Grab an email or two in mbox format.
> First, feed the email through spamassassin --local and note which
> rules hit in X-Spam-Status.
> Second, feed the email through spamc (with spamd started with
> --local) and compare the X-Spam-Status for differences.

That is essentially what I've been doing, except that I'd like to have
the non-local tests and user config enabled. Here are results for one
spam:

spamc < new.spam.29004.17779 > new.spam.29004.17779.spamc
spamassassin < new.spam.29004.17779 > new.spam.29004.17779.spamassassin

diff -Nau new.spam.29004.17779 new.spam.29004.17779.spamc
--- new.spam.29004.17779 Fri Feb 13 22:05:22 2004
+++ new.spam.29004.17779.spamc Fri Feb 13 22:30:59 2004
@@ -26,6 +26,33 @@
boundary="--068420412694249159"
X-CS-IP: 12.40.132.210
+X-Spam-Flag: YES
+X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on herkules.viasys.com
+X-Spam-Report:
+ * 2.2 EXCUSE_4 BODY: Claims you can be removed from the list
+ * 0.6 CLICK_BELOW_CAPS BODY: Asks you to click below (in capital letters)
+ * 0.1 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
+ * 0.0 HTML_MESSAGE BODY: HTML included in message
+ * 0.5 HTML_LINK_CLICK_CAPS BODY: HTML link text says "CLICK"
+ * 0.1 HTML_FONT_BIG BODY: HTML has a big font
+ * 0.5 HTML_40_50 BODY: Message is 40% to 50% HTML
+ * 0.1 HTML_LINK_CLICK_HERE BODY: HTML link text says "click here"
+ * 1.6 FRONTPAGE BODY: Frontpage used to create the message
+ * 0.1 RCVD_IN_SORBS RBL: SORBS: sender is listed in SORBS
+ * [24.2.142.152 listed in dnsbl.sorbs.net]
+ * 1.1 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
+ * [<http://dsbl.org/listing?ip=24.2.142.152>]
+ * 2.2 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
+ * [Blocked - see <http://www.spamcop.net/bl.shtml?24.2.142.152>]
+ * 2.5 RCVD_IN_DYNABLOCK RBL: Sent directly from dynamic IP address
+ * [24.2.142.152 listed in dnsbl.sorbs.net]
+ * 1.1 MIME_HTML_ONLY_MULTI Multipart message only has text/html MIME parts
+X-Spam-Status: Yes, hits=12.8 required=5.0 tests=CLICK_BELOW_CAPS,EXCUSE_4,
+ FRONTPAGE,HTML_40_50,HTML_FONT_BIG,HTML_LINK_CLICK_CAPS,
+ HTML_LINK_CLICK_HERE,HTML_MESSAGE,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,
+ RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_DSBL,RCVD_IN_DYNABLOCK,RCVD_IN_SORBS
+ autolearn=no version=2.63
+X-Spam-Level: ************

----068420412694249159
Content-Type: text/html;

diff -Nau new.spam.29004.17779 new.spam.29004.17779.spamassassin
--- new.spam.29004.17779 Fri Feb 13 22:05:22 2004
+++ new.spam.29004.17779.spamassassin Fri Feb 13 22:22:37 2004
@@ -26,6 +26,37 @@
boundary="--068420412694249159"
X-CS-IP: 12.40.132.210
+X-Spam-Flag: YES
+X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on host
+X-Spam-Report:
+ * 2.5 EXCUSE_4 BODY: Claims you can be removed from the list
+ * 0.5 CLICK_BELOW_CAPS BODY: Asks you to click below (in capital letters)
+ * 5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
+ * [score: 1.0000]
+ * 1.1 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence between 51 and 100
+ * [cf: 100]
+ * 0.3 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
+ * 0.1 HTML_MESSAGE BODY: HTML included in message
+ * 0.5 HTML_LINK_CLICK_CAPS BODY: HTML link text says "CLICK"
+ * 0.3 HTML_FONT_BIG BODY: HTML has a big font
+ * 0.1 HTML_LINK_CLICK_HERE BODY: HTML link text says "click here"
+ * 2.5 FRONTPAGE BODY: Frontpage used to create the message
+ * 1.0 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
+ * 0.1 RCVD_IN_SORBS RBL: SORBS: sender is listed in SORBS
+ * [24.2.142.152 listed in dnsbl.sorbs.net]
+ * 0.7 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
+ * [<http://dsbl.org/listing?ip=24.2.142.152>]
+ * 1.5 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
+ * [Blocked - see <http://www.spamcop.net/bl.shtml?24.2.142.152>]
+ * 2.6 RCVD_IN_DYNABLOCK RBL: Sent directly from dynamic IP address
+ * [24.2.142.152 listed in dnsbl.sorbs.net]
+ * 1.1 MIME_HTML_ONLY_MULTI Multipart message only has text/html MIME parts
+X-Spam-Status: Yes, hits=20.3 required=5.0 tests=BAYES_99,CLICK_BELOW_CAPS,
+ EXCUSE_4,FRONTPAGE,HTML_FONT_BIG,HTML_LINK_CLICK_CAPS,
+ HTML_LINK_CLICK_HERE,HTML_MESSAGE,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,
+ RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,RCVD_IN_BL_SPAMCOP_NET,
+ RCVD_IN_DSBL,RCVD_IN_DYNABLOCK,RCVD_IN_SORBS autolearn=no version=2.63
+X-Spam-Level: ********************


spamd is running with these options:
/usr/bin/perl -T -w /usr/bin/spamd --max-children=30 --user-config -u spamc

So even with --user-config turned on and --local turned off spamd/spamc is
not giving the same result as spamassassin. Clearly it is not executing the
bayes tests, but it also seems to omit razor2, for example.

Also, I'd like to turn off --local only for selected users (the few that
execute spamc from their procmail), not for everybody. (spamc is called
from qmail-scanner-queue.pl, and it's quite heavy to do all tests for *all*
mail.)



-- v --

v@iki.fi
Re: spamc and spamassassin differences?
At 03:37 PM 2/13/2004, Ville Herva wrote:
>That is essentially what I've been doing, except that I'd like to have
>the non-local tests and user config enabled. Here are results for one
>spam:
>
>spamc < new.spam.29004.17779 > new.spam.29004.17779.spamc
>spamassassin < new.spam.29004.17779 > new.spam.29004.17779.spamassassin
>

<snip>

Um.. dude.. take a close look at that output... your runs via spamassassin
and spamc are different because you don't have bayes enabled when running
via spamc.

Try running the spamassassin command again, but this time su to the "spamc"
user first.
Re: spamc and spamassassin differences?
On Fri, 13 Feb 2004, Ville Herva wrote:

> spamc < new.spam.29004.17779 > new.spam.29004.17779.spamc
> spamassassin < new.spam.29004.17779 > new.spam.29004.17779.spamassassin
>
> spamd is running with these options:
> /usr/bin/perl -T -w /usr/bin/spamd --max-children=30 --user-config -u spamc

OK, so spamd is running as the user "spamc" and spamassassin is running as
the current user (whoever that is). So in all likelihood they don't have
the same Bayes databases, and therefore won't generate the same score.

In fact, it seems likely that the spamc user's Bayes database hasn't yet
been trained on enough ham/spam for spamd to make use of it.
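
One way to check this is to look at the message counts stored in that user's Bayes database. The sketch below is hedged: it assumes that `sa-learn --dump magic` is available in the installed version and prints lines of the form "0.000 0 763 0 non-token data: nspam" (as later releases do), and that Bayes stays inactive until at least 200 spam and 200 ham have been learned. The function name bayes_ready is made up for this example.

```shell
# Read `sa-learn --dump magic` output on stdin, print the learned
# spam/ham message counts, and return success only if both reach
# the assumed minimum of 200 messages.
bayes_ready() {
    awk -v min=200 '
        /non-token data: nspam/ { nspam = $3 }
        /non-token data: nham/  { nham  = $3 }
        END {
            printf "nspam=%d nham=%d\n", nspam, nham
            exit !(nspam >= min && nham >= min)
        }
    '
}

# Example use, run as the user spamd runs as (here "spamc"; having
# sudo available is an assumption):
#   sudo -u spamc sa-learn --dump magic | bayes_ready
```

If the counts are below the threshold for the spamd user but not for your own account, that alone would explain a BAYES_ rule appearing only in the spamassassin run.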

> Also, I'd like to turn off --local only for selected users (the few that
> execute spamc from their procmail), not for everybody.

Run two instances of spamd on different ports, one with --local and one
without. Instruct the users calling spamc from procmail to supply the
appropriate -p option.
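
A rough sketch of that two-instance setup follows. The second port number (1783) and the function name start_spamd_pair are arbitrary choices for illustration; 783 is spamd's default port. The remaining options mirror the spamd command line quoted above, with -d (daemonize) and -p (port) added.

```shell
# Placeholders: adjust the binary path and ports to the local setup.
SPAMD=${SPAMD:-spamd}
LOCAL_PORT=783     # spamd default port; local tests only
FULL_PORT=1783     # hypothetical second port; network tests enabled

start_spamd_pair() {
    # Instance 1: --local, for the bulk mail handed over by qmail-scanner
    "$SPAMD" -d --local --max-children=30 -u spamc -p "$LOCAL_PORT"
    # Instance 2: no --local, so RBL and Razor checks can run
    "$SPAMD" -d --max-children=30 --user-config -u spamc -p "$FULL_PORT"
}

# Start both daemons (typically from an init script, as root):
#   start_spamd_pair
#
# Users who want the full tests would then point spamc at the second
# port from their ~/.procmailrc:
#   :0fw
#   | spamc -p 1783
```

Note that both instances still run as the "spamc" user here, so they share that user's Bayes database and preferences; only the network tests differ between them.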
Re: spamc and spamassassin differences?
On Fri, Feb 13, 2004 at 12:47:27PM -0800, you [Bart Schaefer] wrote:
> > spamd is running with these options:
> > /usr/bin/perl -T -w /usr/bin/spamd --max-children=30 --user-config -u spamc
>
> OK, so spamd is running as the user "spamc" and spamassassin is running as
> the current user (whoever that is). So in all likelihood they don't have
> the same Bayes databases, and therefore won't generate the same score.

Ok, I understand.

My mistake was that I thought the spamd flag "--user-config" meant it would
execute with the config of the user that calls spamc.

> In fact, it seems likely that the spamc user's Bayes database hasn't yet
> been trained on enough ham/spam for spamd to make use of it.

Clearly, if the config is different, then the results are, too. I just
wasn't sure why the config wasn't the same.

> > Also, I'd like to turn off --local only for selected users (the few that
> > execute spamc from their procmail), not for everybody.
>
> Run two instances of spamd on different ports, one with --local and one
> without. Instruct the users calling spamc from procmail to supply the
> appropriate -p option.

Ok, that sounds good. I think that is the way to go.


thanks,

-- v --

v@iki.fi
Re: spamc and spamassassin differences?
On Fri, 13 Feb 2004, Ville Herva wrote:

> My mistake was that I thought the spamd flag "--user-config" meant it would
> execute with the config of the user that calls spamc.

You aren't wrong, exactly -- but the user that spamd runs as (in this
case, "spamc") must have sufficient privileges to setuid() to the user
identity passed by spamc to spamd.

That usually means you either have to run spamd as root, or use virtual
user configs (--virtual-config-dir).
Re: spamc and spamassassin differences?
On Fri, Feb 13, 2004 at 01:43:01PM -0800, you [Bart Schaefer] wrote:
> On Fri, 13 Feb 2004, Ville Herva wrote:
>
> > My mistake was that I thought the spamd flag "--user-config" meant it would
> > execute with the config of the user that calls spamc.
>
> You aren't wrong, exactly -- but the user that spamd runs as (in this
> case, "spamc") must have sufficient privileges to setuid() to the user
> identity passed by spamc to spamd.
>
> That usually means you either have to run spamd as root, or use virtual
> user configs (--virtual-config-dir).

Yep. I wasn't sure how exactly it works. Of course, it's quite naive to
think spamc would send the whole configuration over to spamd, and it
naturally is not enough for spamd to be able to read $USER/.spamassassin
if it wants to update the bayes db etc.

Thanks for clearing this up.


-- v --

v@iki.fi