
[Bug 8214] __HAS_ANY_URI matches non-URI
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8214

--- Comment #9 from Bill Cole <billcole@apache.org> ---
(In reply to Jared from comment #7)
[...]
> In the case of my KDDI DMARC messages, with: Return-Path: <> added
> ANY_BOUNCE_MESSAGE=2.5 to the SCC_BODY_URI_ONLY=2.796. Dunno, seems to be a
> steadily cascading CF of scores.

If ANY_BOUNCE_MESSAGE is scored for you at 2.5, you must have chosen that score
for yourself locally. In the default channel it is nailed to 0.1. Doesn't seem
like "cascading" in any way to me.
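To make the arithmetic concrete, here is a minimal Python sketch that sums only the two scores quoted above. It assumes SpamAssassin's stock required_score of 5.0 and ignores every other rule the message may hit, so it illustrates the point rather than reproducing a real scan. With the locally chosen 2.5, the pair alone reaches the threshold. With the default-channel 0.1, it does not.

```python
# Illustration only: sums just the two rules discussed in this thread.
# A real message would also hit other rules that change its total.
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score


def total(scores: dict[str, float]) -> float:
    """Sum the scores of the rules a message hit."""
    return sum(scores.values())


# ANY_BOUNCE_MESSAGE raised locally vs. the value in the default channel
local_override = {"SCC_BODY_URI_ONLY": 2.796, "ANY_BOUNCE_MESSAGE": 2.5}
default_channel = {"SCC_BODY_URI_ONLY": 2.796, "ANY_BOUNCE_MESSAGE": 0.1}

for label, hits in (("local", local_override), ("default", default_channel)):
    t = total(hits)
    verdict = "spam" if t >= REQUIRED_SCORE else "ham"
    print(f"{label}: {t:.3f} -> {verdict}")
```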

> I don't really get the unwillingness to correct a problem that SA themselves
> introduced with SCC_BODY_URI_ONLY. Can you make the suggestion that they
> lower the scoring?

There is no 'they' here. Both John and I are committers and PMC members. I
happen to have added this particular rule and it is defined in my 'sandbox' but
any committer could change the rule or score if they saw a need. I hope they'd
discuss it with me.

There's no unwillingness to fix well-defined problems that have good solutions.
We add and change rules almost every day, informed in part by the automated
rule QA system. We do not remove rules that have high hit rates and good
accuracy; we work to improve them. We don't manually assign high scores; we
require rules to earn them in QA. We will often limit the score of a rule with
a lot of random FPs, but ideally the fix to FPs *at the project level* is to
find commonalities in them and attack those.

> Per SPAMASSASSIN/FalsePositives: "Rules that hit
> non-spam get very low scores.
> ...
> low-scoring rules do not make much of a difference; it's the ones with high
> scores that need to be avoided".

That is a substantially automated process. RuleQA runs nightly and adjusts
scores based on the mass-check logs submitted to us for that purpose. You can
see some of the stats for that at https://ruleqa.spamassassin.org/. Currently
it shows SCC_BODY_URI_ONLY hitting 28% of spam and 0.33% of non-spam, i.e.
98.8% accuracy. The rescoring algorithm looks at those stats and at how much
overlap there is between rules, and adjusts scores to suit.
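The 98.8% figure appears to correspond to what RuleQA labels S/O: the rule's spam hit rate divided by the sum of its spam and non-spam hit rates. Here is a minimal Python sketch of that arithmetic, assuming that definition. The function name is illustrative and not part of any SpamAssassin tool. It deliberately leaves out the rule-overlap analysis that the rescoring step also performs.

```python
def s_o_ratio(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that fall on spam, computed from the
    percentage of the spam corpus and of the non-spam (ham) corpus
    the rule matched. Both corpora are weighted equally."""
    return spam_pct / (spam_pct + ham_pct)


# Figures quoted above for SCC_BODY_URI_ONLY
ratio = s_o_ratio(28.0, 0.33)
print(f"S/O = {ratio:.3f}")  # S/O = 0.988
```

A high S/O alone does not fix a score. A rule that mostly duplicates other high-scoring rules may still be given a modest one.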

> Thus far, I've seen no ill effect from setting SCC_BODY_URI_ONLY = 0.

And you may see no problems. I am actually quite surprised by how much spam
that rule hits in the QA submissions, which probably include more gutter-grade
spam than most sites ever feed to SA. The change I committed today will exclude
most DMARC reports and add a negatively-scored "nice" rule for detected DMARC
reports, because they are unlikely to be spam per se. It should work its way
through QA over the weekend.
