Calling spamassassin directly yields very different results than calling spamassassin via amavis-new
About five months ago, I experienced a problem that I *thought* I had
resolved, but I am observing similar behavior after retraining the Bayes
database. While the symptoms are similar, the root cause seems to be
different (thankfully). The original problem is documented at
http://spamassassin.1065346.n5.nabble.com/Very-spammy-messages-yield-BAYES-00-1-9-td101167.html
.

In any case, I am again seeing SA scores that seem way too low for the
message content in question. My "glue", as it were, is Amavis-New.

In particular, certain messages that are clearly SPAM are scored between
0 and 3 when processed via Amavis. However, if I process the same
messages with the "spamassassin" binary, directly, the scores are much
higher and much more in-line with what one would expect.

The X-Spam-Status header, when processed via Amavis, looks like this:

X-Spam-Status: No, score=0.8 tagged_above=-999 required=2
tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=disabled

When I process the same message with spamassassin, directly
(spamassassin -t -D < /tmp/msg.txt), the header looks like this:

----------------------------------------------------------------------
X-Spam-Status: Yes, score=7.5 required=5.0
tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
autolearn=disabled version=3.3.1

[...]

Content analysis details: (7.5 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5000]
 1.2 MISSING_MID            Missing Message-Id: header
 1.3 MISSING_SUBJECT        Missing Subject: header
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.8 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822 headers
----------------------------------------------------------------------

In short, my question is, how the **** is the message scoring 0.8 in one
case and 7.5 in another? That is a massive discrepancy.

From what I can tell, the same tests aren't even being performed in each
case.

I have to assume that the options that are passed to SA are wildly
different in each case.
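
One way to make "the same tests aren't being performed" concrete is to diff the rule names in the two X-Spam-Status headers. The following is only a minimal sketch: the helper names `parse_tests` and `diff_tests` are made up for illustration, and the parsing assumes the two header layouts quoted above (Amavis writes `tests=[NAME=score, ...]`, the spamassassin binary writes `tests=NAME,NAME,...`).

```python
import re

# The two header values quoted earlier in this message.
AMAVIS = """No, score=0.8 tagged_above=-999 required=2
tests=[BAYES_50=0.8, HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=disabled"""

DIRECT = """Yes, score=7.5 required=5.0
tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
autolearn=disabled version=3.3.1"""


def parse_tests(status):
    """Return the set of rule names listed after tests= in a header value."""
    flat = " ".join(status.split())          # unfold wrapped header lines
    _, _, rest = flat.partition("tests=")    # empty string if tests= is absent
    rest = rest.split("autolearn=")[0]       # drop the trailing key=value pairs
    # Rule names are upper-case identifiers; scores such as "=0.8" never match.
    return set(re.findall(r"[A-Z][A-Z0-9_]*", rest))


def diff_tests(first, second):
    """Return (only in first, in both, only in second) as sorted lists."""
    a, b = parse_tests(first), parse_tests(second)
    return sorted(a - b), sorted(a & b), sorted(b - a)


only_amavis, both, only_direct = diff_tests(AMAVIS, DIRECT)
print("only via Amavis:", only_amavis)
print("in both scans:  ", both)
print("only direct:    ", only_direct)
```

A diff like this only shows which rules hit; it says nothing about why a rule was skipped in one of the scans.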

It bears mention that the server in question uses ISPConfig 3. ISPConfig
allows for SA policies to be configured per-domain and per-user, and
Amavis leverages MySQL to make that happen. If relevant, I can provide
more information about this aspect of my setup.

These are the only directives that I've added to /etc/spamassassin/local.cf:

----------------------------------------------------------------------
bayes_path /var/lib/amavis/.spamassassin/bayes

use_bayes 1
bayes_auto_expire 0
bayes_store_module Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn DBI:mysql:sa_bayes:localhost
bayes_sql_username sa_user
bayes_sql_password [scrubbed]
bayes_sql_override_username amavis
----------------------------------------------------------------------

Given the first directive, SA should always use the same Bayes database
(the one I've configured in MySQL), regardless of how SA is called, right?

For those curious about the state of the Bayes database, here's the
output from "sa-learn --dump magic" (sorry for the wrapping):

0.000          0          3          0  non-token data: bayes db version
0.000          0       2007          0  non-token data: nspam
0.000          0       6554          0  non-token data: nham
0.000          0     188379          0  non-token data: ntokens
0.000          0 1356345829          0  non-token data: oldest atime
0.000          0 1357769317          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0 1357727978          0  non-token data: last expiry atime
0.000          0    1382400          0  non-token data: last expire atime delta
0.000          0       3191          0  non-token data: last expire reduction count

Ultimately, it seems that I should be trying to figure out how, exactly,
Amavis is calling SpamAssassin in the course of normal operation.

Thanks for any help here, folks!

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Wed, Jan 09, 2013 at 05:14:05PM -0500, Ben Johnson wrote:
> Content analysis details: (7.5 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
> -0.0 NO_RELAYS              Informational: message was not relayed via SMTP
>  1.2 MISSING_HEADERS        Missing To: header
>  2.0 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                             [score: 0.5000]
>  1.2 MISSING_MID            Missing Message-Id: header
>  1.3 MISSING_SUBJECT        Missing Subject: header
> -0.0 NO_RECEIVED            Informational: message has no Received headers
>  1.8 MISSING_DATE           Missing Date: header
>  0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822 headers
> ----------------------------------------------------------------------

These hits indicate that the mail you're testing (/tmp/msg.txt) is
corrupted, as it is missing most email headers.

> In short, my question is, how the **** is the message scoring 0.8 in one
> case and 7.5 in another? That is a massive discrepancy.

In the second case, the mail you are testing is corrupted. Open /tmp/msg.txt
in a text editor and check if it looks sane.
--
Marius Gavrilescu
(warnings) Do not dangle the mouse by its cable or throw the mouse at co-workers. --From a manual for an SGI computer.
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Wed, 09 Jan 2013 17:14:05 -0500
Ben Johnson wrote:

> About five months ago, I experienced a problem that I *thought* I had
> resolved, but I am observing similar behavior after retraining the
> Bayes database. While the symptoms are similar, the root cause seems
> to be different (thankfully). The original problem is documented at
> http://spamassassin.1065346.n5.nabble.com/Very-spammy-messages-yield-BAYES-00-1-9-td101167.html
> ..
>
> In any case, I am again seeing SA scores that seem way too low for the
> message content in question. My "glue", as it were, is Amavis-New.
>
> In particular, certain messages that are clearly SPAM are scored
> between 0 and 3 when processed via Amavis. However, if I process the
> same messages with the "spamassassin" binary, directly, the scores
> are much higher and much more in-line with what one would expect.
>...
> When I process the same message with spamassassin, directly
> (spamassassin -t -D < /tmp/msg.txt), the header looks like this:
>
> ----------------------------------------------------------------------
> X-Spam-Status: Yes, score=7.5 required=5.0
> tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
> autolearn=disabled version=3.3.1


This is not better, it indicates that SA didn't recognise it as an
email, not that it recognised it as a spam. Whatever /tmp/msg.txt was
it wasn't a properly formatted email.
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/9/2013 5:36 PM, RW wrote:
> On Wed, 09 Jan 2013 17:14:05 -0500
> Ben Johnson wrote:
>
>> About five months ago, I experienced a problem that I *thought* I had
>> resolved, but I am observing similar behavior after retraining the
>> Bayes database. While the symptoms are similar, the root cause seems
>> to be different (thankfully). The original problem is documented at
>> http://spamassassin.1065346.n5.nabble.com/Very-spammy-messages-yield-BAYES-00-1-9-td101167.html
>> ..
>>
>> In any case, I am again seeing SA scores that seem way too low for the
>> message content in question. My "glue", as it were, is Amavis-New.
>>
>> In particular, certain messages that are clearly SPAM are scored
>> between 0 and 3 when processed via Amavis. However, if I process the
>> same messages with the "spamassassin" binary, directly, the scores
>> are much higher and much more in-line with what one would expect.
>> ...
>> When I process the same message with spamassassin, directly
>> (spamassassin -t -D < /tmp/msg.txt), the header looks like this:
>>
>> ----------------------------------------------------------------------
>> X-Spam-Status: Yes, score=7.5 required=5.0
>> tests=BAYES_50,MISSING_DATE,MISSING_HEADERS,MISSING_MID,MISSING_SUBJECT,NO_HEADERS_MESSAGE,NO_RECEIVED,NO_RELAYS
>> autolearn=disabled version=3.3.1
>
>
> This is not better, it indicates that SA didn't recognise it as an
> email, not that it recognised it as a spam. Whatever /tmp/msg.txt was
> it wasn't a properly formatted email.
>

Thanks for the quick replies, Marius and RW.

I see; I saved the email message out of Thunderbird (with View ->
Headers -> All), as a plain text file. Apparently, that process butchers
the original message.

I'm reviewing SA's behavior using an email client to view the messages,
but I also have access to the mailbox on the server. I realize that this
question may seem amateurish, but how does one discern the "message ID"
from the email client and locate the corresponding file in the user's
"Maildir"? I'm using Dovecot 1.x.

The file names in the user's Maildir look like this:

1357762471.M952293P32429.example.com,S=4300,W=4381:2,

I assume that the first bit is a UNIX timestamp. Is there any means by
which to correlate the second bit (M952293P32429) to the message as I
see it in my email client (Thunderbird)? I don't see that string
anywhere in the headers (maybe that's by design).
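
For reference, the pieces of such a file name can be pulled apart as below. This is a sketch under the assumption that the name follows the usual Maildir convention as generated by Dovecot (delivery time, `M` microseconds, `P` process ID, host name, then `S=` file size, `W=` size with CRLF line endings, and flags after `:2,`); `parse_maildir_name` is a hypothetical helper. If that assumption holds, none of these fields comes from the message headers, which would explain why the string cannot be found there.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <time>.M<usec>P<pid>.<host>[,S=<size>][,W=<vsize>][:2,<flags>]
MAILDIR_NAME = re.compile(
    r"^(?P<time>\d+)\.M(?P<usec>\d+)P(?P<pid>\d+)\.(?P<host>[^,:]+)"
    r"(?P<fields>(?:,[A-Z]=\d+)*)"
    r"(?::2,(?P<flags>[A-Za-z]*))?$"
)


def parse_maildir_name(name):
    """Split a Maildir file name into its delivery-time components."""
    m = MAILDIR_NAME.match(name)
    if m is None:
        raise ValueError("unexpected Maildir file name: %r" % name)
    # ",S=4300,W=4381" becomes {"S": "4300", "W": "4381"}
    fields = dict(f.split("=") for f in m.group("fields").split(",") if f)
    return {
        "delivered": datetime.fromtimestamp(int(m.group("time")), tz=timezone.utc),
        "usec": int(m.group("usec")),
        "pid": int(m.group("pid")),
        "host": m.group("host"),
        "size": int(fields["S"]) if "S" in fields else None,
        "vsize": int(fields["W"]) if "W" in fields else None,
        "flags": m.group("flags") or "",
    }


info = parse_maildir_name("1357762471.M952293P32429.example.com,S=4300,W=4381:2,")
print(info["delivered"].isoformat(), info["pid"], info["size"])
```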

In other words, when I spot a message that SA seems to be scoring
incorrectly in my Inbox, how do I track-down the actual file on the
server that should be fed into "spamassassin"?

Is there some better method than doing something like

# grep -ir 20B2834E4242 /var/vmail/example.com/user/Maildir

where 20B2834E4242 is the ID in the "Received" header?
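
An alternative to grepping for the queue ID is to match on the Message-ID header, which Thunderbird shows in the message source. The sketch below assumes a standard Maildir layout with `cur` and `new` subdirectories; `find_by_message_id` is a hypothetical helper, and the path and ID in the usage comment are placeholders, not values from this server.

```python
import os
from email.parser import BytesHeaderParser


def find_by_message_id(maildir, message_id):
    """Return the paths of all messages under maildir with this Message-ID."""
    wanted = message_id.strip().strip("<>")
    parser = BytesHeaderParser()
    hits = []
    for root, _dirs, files in os.walk(maildir):
        # Delivered mail lives in cur/ and new/; tmp/ holds partial writes.
        if os.path.basename(root) not in ("cur", "new"):
            continue
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                headers = parser.parse(fh)   # reads the header block only
            found = str(headers.get("Message-ID", "")).strip().strip("<>")
            if found == wanted:
                hits.append(path)
    return sorted(hits)


# Usage (placeholder values):
#   for path in find_by_message_id("/var/vmail/example.com/user/Maildir",
#                                  "<some-id@example.org>"):
#       print(path)
```

The same message can show up more than once (for example in Inbox and Junk), so the function returns a list rather than a single path.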

In any case, I tracked-down the original message on the server and
repeated the process (spamassassin -t < /tmp/msg.txt):

----------------------------------------------------------------------
X-Spam-Status: Yes, score=9.3 required=5.0 tests=BAYES_50,HTML_MESSAGE,
RCVD_IN_BRBL_LASTEXT,RCVD_IN_CSS,RCVD_IN_PSBL,RCVD_IN_XBL,URIBL_DBL_SPAM,
URIBL_JP_SURBL autolearn=disabled version=3.3.1

[...]

Content analysis details: (9.3 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 0.4 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [188.165.126.107 listed in zen.spamhaus.org]
 1.0 RCVD_IN_CSS            RBL: Received via a relay in Spamhaus CSS
 2.7 RCVD_IN_PSBL           RBL: Received via a relay in PSBL
                            [188.165.126.107 listed in psbl.surriel.com]
 1.2 URIBL_JP_SURBL         Contains an URL listed in the JP SURBL blocklist
                            [URIs: ehylle.info]
 1.4 RCVD_IN_BRBL_LASTEXT   RBL: RCVD_IN_BRBL_LASTEXT
                            [188.165.126.107 listed in bb.barracudacentral.org]
 1.7 URIBL_DBL_SPAM         Contains an URL listed in the DBL blocklist
                            [URIs: ehylle.info]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
                            [score: 0.5428]
----------------------------------------------------------------------

So, if I've done this correctly, the score discrepancy is even larger.

Thanks, guys!

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 2013-01-10 01:03, Ben Johnson wrote:

> I see; I saved the email message out of Thunderbird (with View ->
> Headers -> All), as a plain text file. Apparently, that process
> butchers the original message.

In Thunderbird, rather use File > Save as to save the entire message.

> RCVD_IN_BRBL_LASTEXT,RCVD_IN_CSS,RCVD_IN_PSBL,RCVD_IN_XBL,URIBL_DBL_SPAM,
> URIBL_JP_SURBL autolearn=disabled version=3.3.1

Rules based on RBL/URIBL checks depend on DNS based blacklist queries.
And between the time you first receive an email and the time you
re-scan it, the originating client IP and/or URIs from the mail body
may have been added to the blacklists after you first received the
mail. Did you re-scan the mail with amavis, too, or did you post the
X-Spam header lines from the original amavis scan and re-scan the mail
with spamassassin significantly later?
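
To illustrate why the result depends on when the scan runs: a DNS blacklist check is just a DNS lookup of the relay's IP address, with its octets reversed, under the list's zone. The sketch below only builds that query name and performs no lookup; `dnsbl_query_name` is a hypothetical helper, it handles IPv4 only, and the IP and zones are the ones shown in the report above.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name that is looked up to test an IPv4 address on a list."""
    addr = ipaddress.ip_address(ip)          # raises ValueError on bad input
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return "%s.%s" % (reversed_octets, zone)


for zone in ("zen.spamhaus.org", "psbl.surriel.com", "bb.barracudacentral.org"):
    print(dnsbl_query_name("188.165.126.107", zone))
```

Whether such a name resolves is decided by the list operator at query time, so the same message can be clean at delivery and listed a few hours later.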

I am not familiar with amavis, but I know that it calls spamassassin in
a special way, depending on the amavis config. Wild guess: could it be
that RBL/URIBL queries are disabled in your amavis config?

Hope this helps.

Cheers,

wolfgang
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/9/2013 7:36 PM, wolfgang wrote:
> On 2013-01-10 01:03, Ben Johnson wrote:
>
>> I see; I saved the email message out of Thunderbird (with View ->
>> Headers -> All), as a plain text file. Apparently, that process
>> butchers the original message.
>
> In Thunderbird, rather use File > Save as to save the entire message.
>
>> RCVD_IN_BRBL_LASTEXT,RCVD_IN_CSS,RCVD_IN_PSBL,RCVD_IN_XBL,URIBL_DBL_SPAM,
>> URIBL_JP_SURBL autolearn=disabled version=3.3.1
>
> Rules based on RBL/URIBL checks depend on DNS based blacklist queries.
> And between the time you first receive an email and the time you
> re-scan it, the originating client IP and/or URIs from the mail body
> may have been added to the blacklists after you first received the
> mail. Did you re-scan the mail with amavis, too, or did you post the
> X-Spam header lines from the original amavis scan and re-scan the mail
> with spamassassin significantly later?
>
> I am not familiar with amavis, but I know that it calls spamassassin in
> a special way, depending on the amavis config. Wild guess: could it be
> that RBL/URIBL queries are disabled in your amavis config?
>
> Hope this helps.
>
> Cheers,
>
> wolfgang
>

Hi, Wolfgang,

Thanks for the reply.

What you say about the RBL/URIBL tests makes sense. I did not rescan the
message with amavis; I posted the X-Spam-Status header contents from the
original scan. The only reason for which I did not rescan the message
with Amavis is that I don't know how to perform a SpamAssassin scan
through Amavis in a manual capacity. And I can't find instructions
regarding the process.

All of that said, less than eight hours elapsed between the original
scan with Amavis and the manual scan with "spamassassin". But, that's
probably long enough for the IP addresses to be blacklisted.

If nobody knows how to scan messages through Amavis, maybe I need to
take this question over to the Amavis list for the time being.

Thanks again,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Wed, 9 Jan 2013, Ben Johnson wrote:

> On 1/9/2013 7:36 PM, wolfgang wrote:
>>
>>> RCVD_IN_BRBL_LASTEXT,RCVD_IN_CSS,RCVD_IN_PSBL,RCVD_IN_XBL,URIBL_DBL_SPAM,
>>> URIBL_JP_SURBL autolearn=disabled version=3.3.1
>>
>> I am not familiar with amavis, but I know that it calls spamassassin in
>> a special way, depending on the amavis config. Wild guess: could it be
>> that RBL/URIBL queries are disabled in your amavis config?
>
> Thanks for the reply.
>
> What you say about the RBL/URIBL tests makes sense.

Check your amavis configuration to see whether you have network tests
disabled. That's the simplest explanation.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
"They will be slaughtered as result of England's anti-gun laws
that concentrates power to the Government."
-- Shifty Powers (101 abn) observing British
subjects training to repel a German invasion
using rakes, hoes and pitchforks
-----------------------------------------------------------------------
8 days until Benjamin Franklin's 307th Birthday
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 10/01/13 00:03, Ben Johnson wrote:
>
>
> On 1/9/2013 5:36 PM, RW wrote:
>>
>>
>> This is not better, it indicates that SA didn't recognise it as an
>> email, not that it recognised it as a spam. Whatever /tmp/msg.txt was
>> it wasn't a properly formatted email.
>>
>
> Thanks for the quick replies, Marius and RW.
>
> I see; I saved the email message out of Thunderbird (with View ->
> Headers -> All), as a plain text file. Apparently, that process butchers
> the original message.
>

Ben,

In thunderbird, select the message and then press Ctrl-U (or from the
menus: View > Message Source) and select File > Save to save the email
including all headers in plain text format. You can then feed it to
spamassassin as above.

Hope that helps.
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/9/2013 9:13 PM, John Hardin wrote:
> On Wed, 9 Jan 2013, Ben Johnson wrote:
>
>> On 1/9/2013 7:36 PM, wolfgang wrote:
>>>
>>>> RCVD_IN_BRBL_LASTEXT,RCVD_IN_CSS,RCVD_IN_PSBL,RCVD_IN_XBL,URIBL_DBL_SPAM,
>>>> URIBL_JP_SURBL autolearn=disabled version=3.3.1
>>>
>>> I am not familiar with amavis, but I know that it calls spamassassin in
>>> a special way, depending on the amavis config. Wild guess: could it be
>>> that RBL/URIBL queries are disabled in your amavis config?
>>
>> Thanks for the reply.
>>
>> What you say about the RBL/URIBL tests makes sense.
>
> Check your amavis configuration to see whether you have network tests
> disabled. That's the simplest explanation.
>

Thanks, John.

On the surface, network tests appear to be enabled:

# grep -ir sa_local_tests_only /etc/amavis
/etc/amavis/conf.d/20-debian_defaults:$sa_local_tests_only = 0; # only tests which do not require internet access?

Also, some of the incoming messages do contain network test scoring data
in the X-Spam-Status header; here are two examples:

Yes, score=8.451 tagged_above=-999 required=2 tests=[BAYES_99=3.5,
RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_CSS=1, RDNS_NONE=0.793,
SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7]
autolearn=disabled

Yes, score=12.266 tagged_above=-999 required=2 tests=[.BAYES_50=0.8,
DATE_IN_FUTURE_12_24=3.199, DIET_1=0.001, HTML_MESSAGE=0.001,
RCVD_IN_BRBL_LASTEXT=1.449, RCVD_IN_PSBL=2.7, RCVD_IN_XBL=0.375,
RDNS_NONE=0.793, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001,
URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25] autolearn=disabled

(Several of those are network tests, right?)

What's strange is that another message was delivered at nearly the same
time as the above two, yet it shows no evidence of network tests being
performed (right?):

No, score=0.8 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
HTML_MESSAGE=0.001, SPF_PASS=-0.001] autolearn=disabled

It seems as though the SPAM that slips through never shows evidence of
network tests, whereas the SPAM that is caught (and usually has a high
score -- 10 or higher) always seems to show evidence of network tests.

This observation begs the question: why are network tests being
performed for some messages but not others? To my knowledge, no
white/gray/black listing has been done on this box.

Thanks again,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Thu, 10 Jan 2013 11:43:44 -0500
Ben Johnson wrote:


> This observation begs the question: why are network tests being
> performed for some messages but not others? To my knowledge, no
> white/gray/black listing has been done on this box.

As has already been said, the score from network tests is commonly a
lot higher on retesting because of all the reporting that happened
in-between.
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/10/2013 11:49 AM, RW wrote:
> On Thu, 10 Jan 2013 11:43:44 -0500
> Ben Johnson wrote:
>
>
>> This observation begs the question: why are network tests being
>> performed for some messages but not others? To my knowledge, no
>> white/gray/black listing has been done on this box.
>
> As has already been said, the score from network tests is commonly a
> lot higher on retesting because of all the reporting that happened
> in-between.
>

RW,

I understand that, but that doesn't explain why if I retest a given
message by calling SpamAssassin directly, and I *disable network tests*,
the score is sometimes *higher* than when the message was scanned
initially with AMaViS.

When this message came through initially, the X-Spam-Status header was:

No, score=1.593 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
HTML_MESSAGE=0.001, RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled

About an hour later, I fed the same message to the spamassassin
executable, while disabling network tests:

# spamassassin -L -t -D < /tmp/msg.txt

Content analysis details: (5.0 points, 5.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 3.8 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.0000]
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.2 RDNS_NONE              Delivered to internal network by a host with no rDNS

To restate the question, if network tests are not outright disabled in
Amavis, why is Amavis returning lower scores than the SA binary does
when called directly with network tests disabled? Shouldn't the SA score
with network tests disabled *always* be lower than or equal to the
Amavis score with network tests enabled (provided that all else is equal)?

Or am I way off-base here?

Thanks again,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/10/2013 12:18 PM, Ben Johnson wrote:
>
>
> On 1/10/2013 11:49 AM, RW wrote:
>> On Thu, 10 Jan 2013 11:43:44 -0500
>> Ben Johnson wrote:
>>
>>
>>> This observation begs the question: why are network tests being
>>> performed for some messages but not others? To my knowledge, no
>>> white/gray/black listing has been done on this box.
>>
>> As has already been said, the score from network tests is commonly a
>> lot higher on retesting because of all the reporting that happened
>> in-between.
>>
>
> RW,
>
> I understand that, but that doesn't explain why if I retest a given
> message by calling SpamAssassin directly, and I *disable network tests*,
> the score is sometimes *higher* than when the message was scanned
> initially with AMaViS.
>
> When this message came through initially, the X-Spam-Status header was:
>
> No, score=1.593 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
> HTML_MESSAGE=0.001, RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled
>
> About an hour later, I fed the same message to the spamassassin
> executable, while disabling network tests:
>
> # spamassassin -L -t -D < /tmp/msg.txt
>
> Content analysis details: (5.0 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  3.8 BAYES_99               BODY: Bayes spam probability is 99 to 100%
>                             [score: 1.0000]
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  1.2 RDNS_NONE              Delivered to internal network by a host with no rDNS
>
> To restate the question, if network tests are not outright disabled in
> Amavis, why is Amavis returning lower scores than the SA binary does
> when called directly with network tests disabled? Shouldn't the SA score
> with network tests disabled *always* be lower than or equal to the
> Amavis score with network tests enabled (provided that all else is equal)?
>
> Or am I way off-base here?
>
> Thanks again,
>
> -Ben
>

Upon further consideration, this behavior makes perfect sense if the
mailbox user has moved the message from Inbox to Junk between scans;
Dovecot's Antispam filter is in use on this server. This action would
cause the message tokens to be added to the Bayes database, which
explains why the SA score is higher on subsequent scans, even with
network tests disabled.

Sorry... I'm still trying to wrap my head around all of this. Lots of
moving parts.

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Thu, 10 Jan 2013 12:48:07 -0500
Ben Johnson wrote:
> Upon further consideration, this behavior makes perfect sense if the
> mailbox user has moved the message from Inbox to Junk between scans;
> Dovecot's Antispam filter is in use on this server. This action would
> cause the message tokens to be added to the Bayes database, which
> explains why the SA score is higher on subsequent scans, even with
> network tests disabled.

Also by turning-off network tests you switch to a different score set so
the score for RDNS_NONE rose.
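
As a rough illustration of the score-set point: a rule may carry up to four scores, and the one that applies depends on whether Bayes and network tests are in use (set 0: neither, set 1: network only, set 2: Bayes only, set 3: both). The sketch below is not SpamAssassin's own code; `score_set` is a hypothetical helper, and the only RDNS_NONE values used are the two that appear in this thread (the 1.2 is the rounded figure from the report).

```python
def score_set(use_bayes, use_network):
    """Return the score set index (0..3) for the given switches."""
    return (2 if use_bayes else 0) + (1 if use_network else 0)


# RDNS_NONE as seen in this thread, keyed by score set. Sets 0 and 1 were
# never shown here, so they are deliberately left out.
RDNS_NONE_SEEN = {
    3: 0.793,  # Amavis scan: Bayes on, network tests on
    2: 1.2,    # "spamassassin -L": Bayes on, network tests off (rounded)
}

for network in (True, False):
    idx = score_set(use_bayes=True, use_network=network)
    print("network tests %-3s -> score set %d, RDNS_NONE=%s"
          % ("on" if network else "off", idx, RDNS_NONE_SEEN[idx]))
```

So a scan run with -L is not simply the networked scan minus the network rules; the remaining rules can carry different weights.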
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/10/2013 1:06 PM, RW wrote:
> On Thu, 10 Jan 2013 12:48:07 -0500
> Ben Johnson wrote:
>> Upon further consideration, this behavior makes perfect sense if the
>> mailbox user has moved the message from Inbox to Junk between scans;
>> Dovecot's Antispam filter is in use on this server. This action would
>> cause the message tokens to be added to the Bayes database, which
>> explains why the SA score is higher on subsequent scans, even with
>> network tests disabled.
>
> Also by turning-off network tests you switch to a different score set so
> the score for RDNS_NONE rose.
>

Ahh; I didn't realize that disabling network tests changes the score set
entirely. Thanks for the clarification there.

So, at this point, I'm struggling to understand how the following happened.

Over the course of 15 minutes, I received the same exact message four
times. Each time, the message was sent to the same recipient mailbox.
The "From" and "Return-Path" headers changed slightly each time, but the
message bodies appear to be identical.

Here are the X-Spam-Status headers for each message:

1:28 PM

Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
URIBL_WS_SURBL=1.608] autolearn=disabled

1:35 PM

No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled

1:36 PM

Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
URIBL_WS_SURBL=1.608] autolearn=disabled

1:41 PM

Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
URIBL_WS_SURBL=1.608] autolearn=disabled

Questions:

1.) I have a fairly well-trained Bayes DB; why on earth does a message
with the subject "Cash Quick? Get up to 1500 Now", and an equally
nefarious body, trigger BAYES_00?

2.) Why weren't network tests performed on message 2 of 4? This seems to
be evidence of the fact that network tests are not being performed some
percentage of the time, which could very well be at the root of this
whole problem.
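
To quantify question 2, the per-rule scores in the Amavis headers can be summed and split into blacklist hits and everything else. This is a minimal sketch using the 1:28 PM and 1:35 PM headers above; the helper names are made up, and treating only the `RCVD_IN_` and `URIBL_` prefixes as blacklist lookups is a deliberately narrow assumption (SPF checks also need DNS, for instance).

```python
import re

MSG_1328 = """Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
URIBL_WS_SURBL=1.608] autolearn=disabled"""

MSG_1335 = """No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled"""

# Upper-case rule name, "=", then a signed decimal score.
PAIR = re.compile(r"([A-Z][A-Z0-9_]*)=(-?\d+(?:\.\d+)?)")
LIST_PREFIXES = ("RCVD_IN_", "URIBL_")   # assumption: blacklist lookups only


def parse_scores(status):
    """Map each rule name in an Amavis-style header to its score."""
    flat = " ".join(status.split())
    return {name: float(value) for name, value in PAIR.findall(flat)}


def list_hits(scores):
    """Keep only the rules that look like DNS blacklist hits."""
    return {n: s for n, s in scores.items() if n.startswith(LIST_PREFIXES)}


first, second = parse_scores(MSG_1328), parse_scores(MSG_1335)
print("1:28 PM total:", round(sum(first.values()), 3))
print("1:35 PM total:", round(sum(second.values()), 3))
print("blacklist share at 1:28 PM:", round(sum(list_hits(first).values()), 3))
print("blacklist hits at 1:35 PM:", sorted(list_hits(second)))
```

If the arithmetic holds, the whole gap between the two messages is the blacklist rules, which fits the suspicion that those lookups returned nothing for message 2.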

Thanks,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 10-01-13 19:55, Ben Johnson wrote:
>
>
> On 1/10/2013 1:06 PM, RW wrote:
>> On Thu, 10 Jan 2013 12:48:07 -0500
>> Ben Johnson wrote:
>>> Upon further consideration, this behavior makes perfect sense if the
>>> mailbox user has moved the message from Inbox to Junk between scans;
>>> Dovecot's Antispam filter is in use on this server. This action would
>>> cause the message tokens to be added to the Bayes database, which
>>> explains why the SA score is higher on subsequent scans, even with
>>> network tests disabled.
>>
>> Also by turning-off network tests you switch to a different score set so
>> the score for RDNS_NONE rose.
>>
>
> Ahh; I didn't realize that disabling network tests changes the score set
> entirely. Thanks for the clarification there.
>
> So, at this point, I'm struggling to understand how the following happened.
>
> Over the course of 15 minutes, I received the same exact message four
> times. Each time, the message was sent to the same recipient mailbox.
> The "From" and "Return-Path" headers changed slightly each time, but the
> message bodies appear to be identical.
>
> Here are the X-Spam-Status headers for each message:
>
> 1:28 PM
>
> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
> URIBL_WS_SURBL=1.608] autolearn=disabled
>
> 1:35 PM
>
> No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
> SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled
>
> 1:36 PM
>
> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
> URIBL_WS_SURBL=1.608] autolearn=disabled
>
> 1:41 PM
>
> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
> URIBL_WS_SURBL=1.608] autolearn=disabled
>
> Questions:
>
> 1.) I have a fairly well-trained Bayes DB; why on earth does a message
> with the subject "Cash Quick? Get up to 1500 Now", and an equally
> nefarious body, trigger BAYES_00?

This will solely depend on the contents of your bayes db. Is this shared
between users, etc etc. No good answer ready without looking at it.

> 2.) Why weren't network tests performed on message 2 of 4? This seems to
> be evidence of the fact that network tests are not being performed some
> percentage of the time, which could very well be at the root of this
> whole problem.

The fact that not a single network test was triggered, is indeed
suspicious. The DNSBL tests are of course sender dependent, but
if the body is the same the URIBL stuff should fire. Maybe your DNS
queries timed out because your DNS setup is borked? Maybe you should
temporarily enable debug logging for DNS lookups in spamassassin?

--
Tom
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Thu, 10 Jan 2013, Ben Johnson wrote:

> So, at this point, I'm struggling to understand how the following happened.
>
> Over the course of 15 minutes, I received the same exact message four
> times. Each time, the message was sent to the same recipient mailbox.
> The "From" and "Return-Path" headers changed slightly each time, but the
> message bodies appear to be identical.
>
> Here are the X-Spam-Status headers for each message:
>
> 1:28 PM
>
> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
> URIBL_WS_SURBL=1.608] autolearn=disabled
>
> 1:35 PM
>
> No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
> SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled
>
> 1:36 PM
>
> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
> URIBL_WS_SURBL=1.608] autolearn=disabled
>
> 1:41 PM
>
> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
> URIBL_WS_SURBL=1.608] autolearn=disabled
>
> Questions:
>
> 1.) I have a fairly well-trained Bayes DB; why on earth does a message
> with the subject "Cash Quick? Get up to 1500 Now", and an equally
> nefarious body, trigger BAYES_00?
>
> 2.) Why weren't network tests performed on message 2 of 4? This seems to
> be evidence of the fact that network tests are not being performed some
> percentage of the time, which could very well be at the root of this
> whole problem.

How many MTAs do you have? Is it possible the low-scoring one went via a
different MTA?

Have you stopped amavisd, killed all of the amavis processes, and
restarted it?


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Maxim I: Pillage, _then_ burn.
-----------------------------------------------------------------------
7 days until Benjamin Franklin's 307th Birthday
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Thu, 10 Jan 2013 13:55:58 -0500
Ben Johnson wrote:

> So, at this point, I'm struggling to understand how the following
> happened.
>
> Over the course of 15 minutes, I received the same exact message four
> times. Each time, the message was sent to the same recipient mailbox.
> The "From" and "Return-Path" headers changed slightly each time, but
> the message bodies appear to be identical.

> 1.) I have a fairly well-trained Bayes DB; why on earth does a message
> with the subject "Cash Quick? Get up to 1500 Now", and an equally
> nefarious body, trigger BAYES_00?

From what you wrote before, the database is trained by end users, so you
can't really be sure that it is well trained.

> 2.) Why weren't network tests performed on message 2 of 4? This seems
> to be evidence of the fact that network tests are not being performed
> some percentage of the time, which could very well be at the root of
> this whole problem.


It may be that there was some local problem, but there is a simpler
explanation. Are you sure that message 2 has exactly the same IP and
URI as 1, and that it hasn't been delayed with respect to 1? The rest are
in RCVD_IN_CSS, which is a snowshoe spam list, so you would expect that early
spams from a given IP address won't hit any URI or IP blocklist at all.
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/10/2013 4:12 PM, John Hardin wrote:
> On Thu, 10 Jan 2013, Ben Johnson wrote:
>
>> So, at this point, I'm struggling to understand how the following
>> happened.
>>
>> Over the course of 15 minutes, I received the same exact message four
>> times. Each time, the message was sent to the same recipient mailbox.
>> The "From" and "Return-Path" headers changed slightly each time, but the
>> message bodies appear to be identical.
>>
>> Here are the X-Spam-Status headers for each message:
>>
>> 1:28 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> 1:35 PM
>>
>> No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
>> SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled
>>
>> 1:36 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> 1:41 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> Questions:
>>
>> 1.) I have a fairly well-trained Bayes DB; why on earth does a message
>> with the subject "Cash Quick? Get up to 1500 Now", and an equally
>> nefarious body, trigger BAYES_00?
>>
>> 2.) Why weren't network tests performed on message 2 of 4? This seems to
>> be evidence of the fact that network tests are not being performed some
>> percentage of the time, which could very well be at the root of this
>> whole problem.
>
> How many MTAs do you have? Is it possible the low-scoring one went via a
> different MTA?

Just one; there should be no possibility of that.

> Have you stopped amavisd, killed all of the amavis processes, and
> restarted it?
>
>

I have now. And I enabled amavis's $sa_debug option, so we should see a
lot more in the way of useful SA debugging information now.

In fact, I was just able to capture the output that I believe we're after,
and I'll paste a link in my response to RW's message (shortly forthcoming).

Thanks,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/10/2013 3:13 PM, Tom Hendrikx wrote:
> On 10-01-13 19:55, Ben Johnson wrote:
>>
>>
>> On 1/10/2013 1:06 PM, RW wrote:
>>> On Thu, 10 Jan 2013 12:48:07 -0500
>>> Ben Johnson wrote:
>>>> Upon further consideration, this behavior makes perfect sense if the
>>>> mailbox user has moved the message from Inbox to Junk between scans;
>>>> Dovecot's Antispam filter is in use on this server. This action would
>>>> cause the message tokens to be added to the Bayes database, which
>>>> explains why the SA score is higher on subsequent scans, even with
>>>> network tests disabled.
>>>
>>> Also by turning-off network tests you switch to a different score set so
>>> the score for RDNS_NONE rose.
>>>
>>
>> Ahh; I didn't realize that disabling network tests changes the score set
>> entirely. Thanks for the clarification there.
>>
>> So, at this point, I'm struggling to understand how the following happened.
>>
>> Over the course of 15 minutes, I received the same exact message four
>> times. Each time, the message was sent to the same recipient mailbox.
>> The "From" and "Return-Path" headers changed slightly each time, but the
>> message bodies appear to be identical.
>>
>> Here are the X-Spam-Status headers for each message:
>>
>> 1:28 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> 1:35 PM
>>
>> No, score=-0.374 tagged_above=-999 required=2 tests=[BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RDNS_NONE=0.793,
>> SPF_PASS=-0.001, T_LOTS_OF_MONEY=0.01] autolearn=disabled
>>
>> 1:36 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> 1:41 PM
>>
>> Yes, score=7.008 tagged_above=-999 required=2 tests=[.BAYES_00=-1.9,
>> HTML_MESSAGE=0.001, MIME_HTML_ONLY=0.723, RCVD_IN_BRBL_LASTEXT=1.449,
>> RCVD_IN_CSS=1, RCVD_IN_XBL=0.375, RDNS_NONE=0.793, SPF_PASS=-0.001,
>> T_LOTS_OF_MONEY=0.01, URIBL_DBL_SPAM=1.7, URIBL_JP_SURBL=1.25,
>> URIBL_WS_SURBL=1.608] autolearn=disabled
>>
>> Questions:
>>
>> 1.) I have a fairly well-trained Bayes DB; why on earth does a message
>> with the subject "Cash Quick? Get up to 1500 Now", and an equally
>> nefarious body, trigger BAYES_00?
>
> This will solely depend on the contents of your bayes db. Is this shared
> between users, etc etc. No good answer ready without looking at it.

Yes, the Bayes DB is shared between users. But it seems that focusing on
the "low-hanging fruit" (the network test issues) will be more
productive in the short term.

>> 2.) Why weren't network tests performed on message 2 of 4? This seems to
>> be evidence of the fact that network tests are not being performed some
>> percentage of the time, which could very well be at the root of this
>> whole problem.
>
> The fact that not a single network test was triggered is indeed
> suspicious. The DNSBL tests are of course sender dependent, but
> if the body is the same the URIBL stuff should fire. Maybe your DNS
> queries timed out because your DNS setup is borked? Maybe you should
> temporarily enable debug logging for DNS lookups in spamassassin?
>

I enabled Amavis's SA debugging mode on the server in question and was
able to extract the debug output for two messages that seem like they
should definitely be classified as spam.

Message #1: http://pastebin.com/xLMikNJH

Message #2: http://pastebin.com/Ug78tPrt

A couple points of note and a couple of questions:

a.) There seems to be plenty of network activity, but I don't see any
"results" (for lack of a better term) for those queries. The final
X-Spam-Status header that is generated looks like this:

No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled

Does the absence of network tests in the resultant header simply mean
that none of the network tests contributed to the score? If so, why
might that be? Are these messages simply "too new" to appear in any
blacklists?

b.) The scores for both messages are identical, which, I suppose, is not
surprising, given that the same exact tests were performed and produced
the same exact results. Is this normal?

c.) 45 minutes after receiving Message #2 from above, I received a very
similar message. The subjects varied only in dollar amount advertised,
and the bodies vary only in the hyperlink URLs and the footer/signature.

Here's the debug output: http://pastebin.com/sLMgXrf5

The second message was scored at 14.75, which seems much better. Of
course, the second score was so much higher because the
network/blacklist tests contributed significantly.

Is the conclusion to be drawn the same as in a) (these messages are "too
new" to appear in blacklists)?

One final point of concern on this item: the Bayes score for the first
of the two emails was BAYES_50=0.8, and I fed the message through
sa-learn as spam shortly after it arrived. Yet, the Bayes score for the
second message was BAYES_40=-0.001 -- *lower* than the first. How could
this be? Is there some rational explanation?
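
As a sanity check on the header in point a), here is a minimal Python sketch (not part of SpamAssassin or Amavis) that splits an amavis-style X-Spam-Status value into per-rule scores, sums them, and lists the rules that look like DNSBL/URIBL hits. The function names and the prefix list (RCVD_IN_, URIBL_) are assumptions made for illustration only; whether a rule really is a network test is decided by its "tflags net" setting, which cannot be read from the header.

```python
import re

# Assumed prefixes for blocklist-style rules; an illustration only,
# not SpamAssassin's own definition of a network test.
BLOCKLIST_PREFIXES = ("RCVD_IN_", "URIBL_")

def parse_status(header):
    """Return (reported_score, {rule: score}) from an amavis-style header value."""
    reported = float(re.search(r"score=(-?\d+(?:\.\d+)?)", header).group(1))
    body = re.search(r"tests=\[(.*?)\]", header, re.S).group(1)
    tests = {}
    for item in body.split(","):
        name, _, value = item.strip().partition("=")
        tests[name] = float(value)
    return reported, tests

def blocklist_hits(tests):
    """Return the rule names that start with one of the assumed prefixes."""
    return sorted(name for name in tests if name.startswith(BLOCKLIST_PREFIXES))

header = ("No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8,\n"
          "RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled")
reported, tests = parse_status(header)
print(round(sum(tests.values()), 3) == reported)  # True
print(blocklist_hits(tests))                      # []
```

The per-rule scores add up to the reported 1.592, so the header accounts for the whole score. An empty list only shows that no blocklist-style rule hit; it does not say whether the lookups were attempted.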

Thanks for all the help here, guys!

-Ben

> --
> Tom
>
>
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/11/2013 4:27 PM, Ben Johnson wrote:
> I enabled Amavis's SA debugging mode on the server in question and was
> able to extract the debug output for two messages that seem like they
> should definitely be classified as spam.
>
> Message #1: http://pastebin.com/xLMikNJH
>
> Message #2: http://pastebin.com/Ug78tPrt
>
> A couple points of note and a couple of questions:
>
> a.) There seems to be plenty of network activity, but I don't see any
> "results" (for lack of a better term) for those queries. The final
> X-Spam-Status header that is generated looks like this:
>
> No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
> RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled
>
> Does the absence of network tests in the resultant header simply mean
> that none of the network tests contributed to the score? If so, why
> might that be? Are these messages simply "too new" to appear in any
> blacklists?
>
> b.) The scores for both messages are identical, which, I suppose, is not
> surprising, given that the same exact tests were performed and produced
> the same exact results. Is this normal?
>
> c.) 45 minutes after receiving Message #2 from above, I received a very
> similar message. The subjects varied only in dollar amount advertised,
> and the bodies vary only in the hyperlink URLs and the footer/signature.
>
> Here's the debug output: http://pastebin.com/sLMgXrf5
>
> The second message was scored at 14.75, which seems much better. Of
> course, the second score was so much higher because the
> network/blacklist tests contributed significantly.
>
> Is the conclusion to be drawn the same as in a) (these messages are "too
> new" to appear in blacklists)?
>
> One final point of concern on this item: the Bayes score for the first
> of the two emails was BAYES_50=0.8, and I fed the message through
> sa-learn as spam shortly after it arrived. Yet, the Bayes score for the
> second message was BAYES_40=-0.001 -- *lower* than the first. How could
> this be? Is there some rational explanation?
>
> Thanks for all the help here, guys!
>
> -Ben

Nobody?

A clear pattern has emerged: the X-Spam-Status headers for very
obviously spammy messages never contain evidence that network tests
contributed to their SA scores.

Ultimately, I need to know whether:

a.) Network tests are not being run at all for these messages

b.) Network tests are being run, but are failing in some way

c.) Network tests are being run, and are succeeding, but return
responses that do not contribute to the messages' scores

I've had a look at the log entries to which I link in my previous
message and I just need a little help interpreting the "dns" and "async"
messages.

Thanks for any insight,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On Mon, 14 Jan 2013 13:24:55 -0500
Ben Johnson wrote:


> A clear pattern has emerged: the X-Spam-Status headers for very
> obviously spammy messages never contain evidence that network tests
> contributed to their SA scores.
>
> Ultimately, I need to know whether:
>
> a.) Network tests are not being run at all for these messages
>
> b.) Network tests are being run, but are failing in some way
>
> c.) Network tests are being run, and are succeeding, but return
> responses that do not contribute to the messages' scores
>
> I've had a look at the log entries to which I link in my previous
> message and I just need a little help interpreting the "dns" and
> "async" messages.

As I said before, it's not unusual for snowshoe spam to hit no net
tests at all. Also obvious spam isn't any more likely to be in a
blocklist than less obvious spam.

However, try adding this to your SpamAssassin configuration, and
restart the appropriate daemon:

header RCVD_IN_HITALL eval:check_rbl('hitall-lastexternal', 'ipv4.fahq2.com.')
tflags RCVD_IN_HITALL net
score RCVD_IN_HITALL 0.001


It should add a dns test that is hit for all mail delivered from an
IPv4 address.
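
Once the rule is in place, a rough way to use it is to scan collected X-Spam-Status values and flag the ones where RCVD_IN_HITALL is absent. The Python sketch below is only an illustration under assumptions: the helper names are made up, the second sample header is hypothetical, and rule names are picked out with a simple pattern (uppercase tokens after "tests=") rather than a real header parser.

```python
import re

CANARY = "RCVD_IN_HITALL"  # rule name from the snippet above

def rules_hit(status_header):
    """Return the set of rule names that appear after 'tests=' in the header value."""
    _, _, after = status_header.partition("tests=")
    # Rule names are uppercase tokens; scores and 'autolearn=...' are skipped.
    return set(re.findall(r"[A-Z][A-Z0-9_]*", after))

def missing_canary(headers, canary=CANARY):
    """Return the headers in which the canary rule did not fire."""
    return [h for h in headers if canary not in rules_hit(h)]

samples = [
    # Header reported earlier in this thread (before the rule existed).
    "No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8, "
    "RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled",
    # Hypothetical header showing what a hit would look like.
    "No, score=1.593 tagged_above=-999 required=2 tests=[BAYES_50=0.8, "
    "RCVD_IN_HITALL=0.001, RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled",
]
print(len(missing_canary(samples)))  # 1
```

A message from an IPv4 sender that lacks the canary would suggest the DNS lookup did not complete for that scan; a message that has it but no blocklist hits points to the sender simply not being listed yet.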
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/14/2013 2:49 PM, RW wrote:
> On Mon, 14 Jan 2013 13:24:55 -0500
> Ben Johnson wrote:
>
>
>> A clear pattern has emerged: the X-Spam-Status headers for very
>> obviously spammy messages never contain evidence that network tests
>> contributed to their SA scores.
>>
>> Ultimately, I need to know whether:
>>
>> a.) Network tests are not being run at all for these messages
>>
>> b.) Network tests are being run, but are failing in some way
>>
>> c.) Network tests are being run, and are succeeding, but return
>> responses that do not contribute to the messages' scores
>>
>> I've had a look at the log entries to which I link in my previous
>> message and I just need a little help interpreting the "dns" and
>> "async" messages.
>
> As I said before, it's not unusual for snowshoe spam to hit no net
> tests at all. Also obvious spam isn't any more likely to be in a
> blocklist than less obvious spam.
>
> However, try adding this to your SpamAssassin configuration, and
> restart the appropriate daemon:
>
> header RCVD_IN_HITALL eval:check_rbl('hitall-lastexternal', 'ipv4.fahq2.com.')
> tflags RCVD_IN_HITALL net
> score RCVD_IN_HITALL 0.001
>
>
> It should add a dns test that is hit for all mail delivered from an
> IPv4 address.
>

Thanks, RW.

I understand that snowshoe spam may not hit any net tests. I guess my
confusion is around what, exactly, classifies spam as "snowshoe".

Are most/all of the BL services hash-based? In other words, if a known
spam message was added yesterday, will it be considered "snowshoe" spam
if the spammer sends the same message today and changes only one
character within the body?

If so, then I guess the only remedy here is to focus on why Bayes seems
to perform so miserably. It must be a configuration issue, because I've
sa-learn-ed messages that are incredibly similar for two days now and
not only do their Bayes scores not change significantly, but sometimes
they decrease. And I have a hard time believing that one of my users is
sa-train-ing these messages as ham and negating my efforts.

I have ensured that the spam token count increases when I train these
messages. That said, I do notice that the token count does not *always*
change; sometimes, sa-learn reports "Learned tokens from 0 message(s) (1
message(s) examined)". Does this mean that all tokens from these
messages have already been learned, thereby making it pointless to
continue feeding them to sa-learn?

If I receive one more uncaught message about how some mom is angering
doctors by doing something crazy to her face, I'm going to hunt-down the
****er and rip her face OFF.

Finally, I added the test you supplied to my SA configuration, restarted
Amavis, and all messages appear to be tagged with RCVD_IN_HITALL=0.001.

Thanks for all your help,

-Ben
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 2013/01/14 10:24, Ben Johnson wrote:
>
>
> On 1/11/2013 4:27 PM, Ben Johnson wrote:
>> I enabled Amavis's SA debugging mode on the server in question and was
>> able to extract the debug output for two messages that seem like they
>> should definitely be classified as spam.
>>
>> Message #1: http://pastebin.com/xLMikNJH
>>
>> Message #2: http://pastebin.com/Ug78tPrt
>>
>> A couple points of note and a couple of questions:
>>
>> a.) There seems to be plenty of network activity, but I don't see any
>> "results" (for lack of a better term) for those queries. The final
>> X-Spam-Status header that is generated looks like this:
>>
>> No, score=1.592 tagged_above=-999 required=2 tests=[BAYES_50=0.8,
>> RDNS_NONE=0.793, SPF_PASS=-0.001] autolearn=disabled
>>
>> Does the absence of network tests in the resultant header simply mean
>> that none of the network tests contributed to the score? If so, why
>> might that be? Are these messages simply "too new" to appear in any
>> blacklists?
>>
>> b.) The scores for both messages are identical, which, I suppose, is not
>> surprising, given that the same exact tests were performed and produced
>> the same exact results. Is this normal?
>>
>> c.) 45 minutes after receiving Message #2 from above, I received a very
>> similar message. The subjects varied only in dollar amount advertised,
>> and the bodies vary only in the hyperlink URLs and the footer/signature.
>>
>> Here's the debug output: http://pastebin.com/sLMgXrf5
>>
>> The second message was scored at 14.75, which seems much better. Of
>> course, the second score was so much higher because the
>> network/blacklist tests contributed significantly.
>>
>> Is the conclusion to be drawn the same as in a) (these messages are "too
>> new" to appear in blacklists)?
>>
>> One final point of concern on this item: the Bayes score for the first
>> of the two emails was BAYES_50=0.8, and I fed the message through
>> sa-learn as spam shortly after it arrived. Yet, the Bayes score for the
>> second message was BAYES_40=-0.001 -- *lower* than the first. How could
>> this be? Is there some rational explanation?
>>
>> Thanks for all the help here, guys!
>>
>> -Ben
>
> Nobody?
>
> A clear pattern has emerged: the X-Spam-Status headers for very
> obviously spammy messages never contain evidence that network tests
> contributed to their SA scores.
>
> Ultimately, I need to know whether:
>
> a.) Network tests are not being run at all for these messages
>
> b.) Network tests are being run, but are failing in some way
>
> c.) Network tests are being run, and are succeeding, but return
> responses that do not contribute to the messages' scores
>
> I've had a look at the log entries to which I link in my previous
> message and I just need a little help interpreting the "dns" and "async"
> messages.

Ben, do be aware that sometimes you draw the short straw and sit at the
very start of the spam distribution cycle. In those cases the BLs will
generally not have been alerted yet so they may not trigger. For those
situations the rules should be your friends. (I still use my treasured
set of SARE rules and personally hand crafted rules my partner and I
have created that fit OUR needs but may not be good general purpose
rules.)

{^_^}
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 2013/01/14 12:59, Ben Johnson wrote:
>
>
> On 1/14/2013 2:49 PM, RW wrote:
>> On Mon, 14 Jan 2013 13:24:55 -0500
>> Ben Johnson wrote:
>>
>>
>>> A clear pattern has emerged: the X-Spam-Status headers for very
>>> obviously spammy messages never contain evidence that network tests
>>> contributed to their SA scores.
>>>
>>> Ultimately, I need to know whether:
>>>
>>> a.) Network tests are not being run at all for these messages
>>>
>>> b.) Network tests are being run, but are failing in some way
>>>
>>> c.) Network tests are being run, and are succeeding, but return
>>> responses that do not contribute to the messages' scores
>>>
>>> I've had a look at the log entries to which I link in my previous
>>> message and I just need a little help interpreting the "dns" and
>>> "async" messages.
>>
>> As I said before, it's not unusual for snowshoe spam to hit no net
>> tests at all. Also obvious spam isn't any more likely to be in a
>> blocklist than less obvious spam.
>>
>> However, try adding this to your SpamAssassin configuration, and
>> restart the appropriate daemon:
>>
>> header RCVD_IN_HITALL eval:check_rbl('hitall-lastexternal', 'ipv4.fahq2.com.')
>> tflags RCVD_IN_HITALL net
>> score RCVD_IN_HITALL 0.001
>>
>>
>> It should add a dns test that is hit for all mail delivered from an
>> IPv4 address.
>>
>
> Thanks, RW.
>
> I understand that snowshoe spam may not hit any net tests. I guess my
> confusion is around what, exactly, classifies spam as "snowshoe".
>
> Are most/all of the BL services hash-based? In other words, if a known
> spam message was added yesterday, will it be considered "snowshoe" spam
> if the spammer sends the same message today and changes only one
> character within the body?
>
> If so, then I guess the only remedy here is to focus on why Bayes seems
> to perform so miserably. It must be a configuration issue, because I've
> sa-learn-ed messages that are incredibly similar for two days now and
> not only do their Bayes scores not change significantly, but sometimes
> they decrease. And I have a hard time believing that one of my users is
> sa-train-ing these messages as ham and negating my efforts.
>
> I have ensured that the spam token count increases when I train these
> messages. That said, I do notice that the token count does not *always*
> change; sometimes, sa-learn reports "Learned tokens from 0 message(s) (1
> message(s) examined)". Does this mean that all tokens from these
> messages have already been learned, thereby making it pointless to
> continue feeding them to sa-learn?
>
> If I receive one more uncaught message about how some mom is angering
> doctors by doing something crazy to her face, I'm going to hunt-down the
> ****er and rip her face OFF.
>
> Finally, I added the test you supplied to my SA configuration, restarted
> Amavis, and all messages appear to be tagged with RCVD_IN_HITALL=0.001.

As much as I might applaud that sentiment I'd like to note two things.
First, it might involve just a whole lot of nasty paperwork and unpleasant
contact with authorities. Second, the energy wasted doing that might have
been better spent had you learned how to create rules and recognize the
elements of a spam that are likely to be relatively unique so you can
create rules for it.

After a while, creating rules to knock down such "stuff" can become fun.
(Then after a longer while it gets "old", sigh.)

Another thing to learn in the process is that what you consider to be
spam is another person's (jerk's?) ham. So crafting rules needs to be
done with care if you're filtering for more than one person. Erm, of
course this is what allowing per user rules is good for.

{^_^}
Re: Calling spamassassin directly yields very different results than calling spamassassin via amavis-new [ In reply to ]
On 1/14/2013 2:59 PM, Ben Johnson wrote:

> I understand that snowshoe spam may not hit any net tests. I guess my
> confusion is around what, exactly, classifies spam as "snowshoe".

Snowshoe spam - spreading a spam run across a large number of IPs so
no single IP is sending a large volume. Typically also combined
with "natural language" text, RFC compliant mail servers, verified
SPF and DKIM, business-class ISP with FCrDNS, and every other
criterion to look like a legit mail source. This type of spam is
difficult to catch.

http://www.spamhaus.org/faq/section/Glossary#233
and countless other links if you ask google.

> Are most/all of the BL services hash-based? In other words, if a known
> spam message was added yesterday, will it be considered "snowshoe" spam
> if the spammer sends the same message today and changes only one
> character within the body?

No, most all DNS blacklists are based on IP reputation. Check each
list's website for their listing policy to see how an IP gets on
their list; generally honeypot email addresses or trusted user
reports. Most lists require some number of reports before listing
an IP to prevent false positives; snowshoe spammers take advantage
of this.
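
To make the IP-reputation point concrete: a DNSBL check is an ordinary DNS query whose name is the client's IPv4 address with the octets reversed, followed by the list's zone. The message body is not part of the query, so changing a character in the body has no effect on it. A minimal Python sketch follows; the function name is made up and dnsbl.example is a placeholder zone, and nothing here performs a real lookup.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name a DNSBL client would look up for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone.strip('.')}"

print(dnsbl_query_name("192.0.2.1", "dnsbl.example"))  # 1.2.0.192.dnsbl.example
```

If the list has an entry for that address the query returns an A record; otherwise the name does not resolve. This is why a fresh IP in a snowshoe run scores nothing until the list operator has added it.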

> If so, then I guess the only remedy here is to focus on why Bayes seems
> to perform so miserably.

Sounds as if your bayes has been improperly trained in the past.
You might do better to just delete the bayes db and start over with
hand-picked spam and ham.



-- Noel Jones
