Mailing List Archive: Bayes autolearn: how does it resolve whether rules are body or header related?

May 8, 2021, 7:17 PM

Dear fellow SpamAssassin users,

I recently noticed that quite a lot of spam emails with high scores
weren't marked for Bayes autolearning. While some senders and receivers
were a common match, explaining why autolearn was not used, there was no
clear explanation for other cases. I therefore put SpamAssassin in debug
mode to check in more detail, and noticed that fairly often autolearn is
not used because the minimum score for body tests isn't achieved. After
looking at some specific cases, it seems however that several rules are
not considered when calculating either the header rule score or the body
rule score for Bayes autolearning. I've always presumed these scores are
calculated based on whether the underlying rule performs a regex on a
header or on the body, but now I'm not so sure any more. I hope you can
help clear up whether this is intended behaviour (and what that
behaviour is) or whether I should report this as a bug.

One example I noticed is URI_DEOBFU_INSTR=3.595. This is, if I understand
it correctly, a URI test that's performed on the body. Should a test like
this be counted towards the body score count? Then there's the question
of meta rules such as MONEY_NOHTML. If you resolve the different meta
levels within this rule, it's a combination of header and body, however
it's only counted towards the header score. Finally, it seems as if
custom rules I've added within local.cf aren't considered. Is that
indeed the case (and if so, is that by design)? I'm also not completely
sure if UNWANTED_BODY_LANGUAGE and tests like razor, pyzor and DCC are
considered for body scores.

Within the same realm, I'm also wondering whether these expected numbers
for body and header can be tweaked and if so, how. For example the case
below isn't autolearned even though it has a huge score and a vast
amount of tests going off, but seemingly not enough body-related scores.
Is that really the intended behaviour?

May 8 10:40:32 mail amavis[4076058]: (4076058-16)
header_edits_for_quar: <fineart@dasanart.com> ->
<gdpr@notgoingtoshare.tld>, Yes, score=24.619 tag=-9999 tag2=5 kill=7.5
tests=[.ADVANCE_FEE_3_NEW_MONEY=0.001,
AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1,
FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095,
FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25,
FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001,
FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001,
FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001,
FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001,
MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202,
MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001,
PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996,
SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593,
TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no
autolearn_force=no
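
For anyone who wants to tally a line like the one above themselves, here
is a minimal Python sketch that pulls the rule names and scores out of the
tests=[...] list and adds them up. It is only an illustration: the helper
names parse_tests and split_scores are made up for this example, the
leading dot before the first rule name is assumed to be a logging artifact
and is stripped, and body_rules is a hypothetical, user-supplied set,
since which rules SpamAssassin itself counts as body tests is exactly the
open question here.

```python
import re

# The tests=[...] part of the amavis log line quoted above.
LOG_LINE = """tests=[.ADVANCE_FEE_3_NEW_MONEY=0.001,
AXB_XMAILER_MIMEOLE_OL_024C2=0.001, BAYES_50=0.8, BERT_KULSPAM=1,
FORGED_MUA_OUTLOOK=1.927, FREEMAIL_FORGED_REPLYTO=2.095,
FREEMAIL_REPLYTO=1, FREEMAIL_REPLYTO_END_DIGIT=0.25,
FROM_MISSPACED=0.001, FROM_MISSP_EH_MATCH=0.001,
FROM_MISSP_FREEMAIL=0.001, FROM_MISSP_MSFT=0.001,
FROM_MISSP_REPLYTO=2.497, FSL_BULK_SIG=0.001, FSL_CTYPE_WIN1251=0.001,
FSL_NEW_HELO_USER=0.001, KHOP_HELO_FCRDNS=0.398, LOTS_OF_MONEY=0.001,
MISSING_HEADERS=1.021, MISSING_MID=0.497, MONEY_FREEMAIL_REPTO=1.202,
MONEY_FROM_MISSP=0.001, MONEY_NOHTML=2.497, NSL_RCVD_HELO_USER=0.001,
PYZOR_CHECK=1.392, REPLYTO_WITHOUT_TO_CC=1.552, REPTO_419_FRAUD=2.996,
SPF_HELO_NONE=0.001, TO_NO_BRKTS_FROM_MSSP=1.593,
TO_NO_BRKTS_MSFT=1.888, XFER_LOTSA_MONEY=0.001] autolearn=no"""


def parse_tests(log_text):
    """Return {rule_name: score} parsed from a tests=[...] list."""
    match = re.search(r"tests=\[(.*?)\]", log_text, re.DOTALL)
    if match is None:
        return {}
    scores = {}
    for item in match.group(1).split(","):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Assumption: a leading dot is a logging artifact, not part of the name.
        scores[name.lstrip(".")] = float(value)
    return scores


def split_scores(scores, body_rules):
    """Sum scores for rules in body_rules and for all remaining rules.

    body_rules is supplied by the caller; this sketch does not know how
    SpamAssassin classifies any rule.
    """
    body = sum(v for k, v in scores.items() if k in body_rules)
    other = sum(v for k, v in scores.items() if k not in body_rules)
    return body, other


if __name__ == "__main__":
    scores = parse_tests(LOG_LINE)
    print(f"{len(scores)} rules, total {sum(scores.values()):.3f}")
```

The per-rule scores in this line add up to the reported score=24.619, so
the list itself is complete; passing your own guess for body_rules to
split_scores then shows how much of that total would land on the body
side under that guess.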

Thank you in advance for your help. If you need any more examples or
would like us to run some tests, then feel free to let me know.

Kind regards,
Bert Van de Poel
ULYSSIS