Mailing List Archive

[Bug 3228] New: RFE: Performance improving with two score thresholds
http://bugzilla.spamassassin.org/show_bug.cgi?id=3228

Summary: RFE: Performance improving with two score thresholds
Product: Spamassassin
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P5
Component: spamassassin
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: nicola.gatta@assyrus.it


Currently spamassassin performs all checks on headers, body, RBL, etc.
After performing all the checks, it compares the score with "required_hits"
and marks the message as spam if the score is greater than "required_hits".
Suppose that a message has a high score: sa doesn't need to perform all the
checks; it can stop and mark the message as spam as soon as the
score becomes greater than "required_hits".
I think it would be interesting to use two thresholds, such as "required_hits_low"
and "required_hits_high".
With these two values, sa can classify messages into three buckets:
1) Non spam
2) Probable spam
3) Certain spam

The behaviour of sa should be something like the following,
assuming that checks are sorted by speed:

begin
    foreach (check in all_checks) {
        do(check);
        if (score > required_hits_high)
            goto out;
    }

out:
    if (score < required_hits_low)
        message_rated_as_non_spam;
    elsif (score > required_hits_high)
        message_rated_as_certain_spam;
    else
        message_rated_as_probable_spam;
end;
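
The pseudocode above can be sketched as a small runnable Python function.
This is only an illustration of the proposed logic, not SpamAssassin code and
not the patch itself: the function name classify, the verdict strings, and the
representation of each check as a callable returning its score contribution
are all assumptions made for the example.

```python
from typing import Callable, Iterable, Tuple


def classify(
    checks: Iterable[Callable[[], float]],
    required_hits_low: float,
    required_hits_high: float,
) -> Tuple[str, float, int]:
    """Run checks in order and stop once the score exceeds the high threshold.

    Each check is a hypothetical callable returning its score contribution.
    Returns (verdict, final score, number of checks actually run).
    """
    score = 0.0
    checks_run = 0
    for check in checks:
        score += check()
        checks_run += 1
        if score > required_hits_high:
            break  # early exit: the remaining checks are skipped

    if score < required_hits_low:
        verdict = "non_spam"
    elif score > required_hits_high:
        verdict = "certain_spam"
    else:
        verdict = "probable_spam"
    return verdict, score, checks_run


# Three made-up checks, assumed to be ordered from fastest to slowest.
checks = [lambda: 4.0, lambda: 7.5, lambda: 3.0]
print(classify(checks, required_hits_low=5.0, required_hits_high=10.0))
```

In this example the third check never runs, because the score passes
"required_hits_high" after the second one. Note that an early exit also skips
any later checks that would have lowered the score.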

So with "required_hits_low" you specify the score that identifies spam.
With "required_hits_high" you can limit CPU usage
when you encounter a high-scoring message.
Additionally, you can have two different subject_tag values.
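
As a rough illustration of the two-tag idea, the Python sketch below picks a
subject tag per bucket. The tag strings, the verdict names, and the function
tag_subject are hypothetical; the report does not say what the two subject_tag
values would be or how they would be configured.

```python
# Hypothetical tags, one per spam bucket; non-spam mail is left untouched.
SUBJECT_TAGS = {
    "probable_spam": "[SPAM?]",
    "certain_spam": "[SPAM]",
}


def tag_subject(subject: str, verdict: str) -> str:
    """Prefix the subject with the tag for this verdict, if there is one."""
    tag = SUBJECT_TAGS.get(verdict)
    return f"{tag} {subject}" if tag else subject


print(tag_subject("Cheap watches", "certain_spam"))
print(tag_subject("Meeting notes", "non_spam"))
```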

I wrote a patch for the Debian/unstable sa package (2.63-1) and I'm currently
testing it.
The patch modifies the check() method in PerMsgStatus.pm and a few lines in
SpamAssassin.pm and (obviously) in the configuration file.

I can send the patch if you are interested.


