
[Spamassassin Wiki] Update of "SaUpdatePlan" by BobMenschel
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by BobMenschel:
http://wiki.apache.org/spamassassin/SaUpdatePlan

The comment on the change is:
Added comments about scoring

------------------------------------------------------------------------------

TODO: need more detail here

+ == Scoring ==
+
+ The primary reason new rules take so long to release is the need to score them (and rescore old rules) to generate optimal scores, flagging as much spam as reasonably possible while keeping false positives to a conservative minimum.
+
+ BobMenschel: The ideal would be to find some way to incorporate new rules into a GA/Perceptron-like mechanism, perhaps a Perceptron run which a) assumes whatever hit frequencies applied in the last full scoring run, b) freezes all scores in all score sets according to the most recent distribution, and then c) incorporates an sa-update scoring run and calculates appropriate scores for the new rules.
+
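The frozen-score idea in steps b) and c) can be sketched roughly as follows. This is only an illustration, not SpamAssassin's actual perceptron: the function name `fit_new_rule_scores`, the rule names, the learning rate, and the tiny corpus are all hypothetical, and step a) (reusing hit frequencies from the last full run) is not modelled. Existing scores are held constant and only the new rules' scores are adjusted when a message lands on the wrong side of the threshold.

```python
from typing import Dict, Iterable, List, Set, Tuple

THRESHOLD = 5.0  # SpamAssassin's default required score

# One corpus entry: (names of the rules that hit, True if the message is spam).
Message = Tuple[Set[str], bool]


def fit_new_rule_scores(
    corpus: List[Message],
    frozen: Dict[str, float],
    new_rules: Iterable[str],
    epochs: int = 50,
    learning_rate: float = 0.5,
) -> Dict[str, float]:
    """Perceptron-style updates applied to the new rules only.

    Scores in `frozen` are never modified; new rules start at 0.0.
    """
    scores = {rule: 0.0 for rule in new_rules}
    for _ in range(epochs):
        mistakes = 0
        for hits, is_spam in corpus:
            touched = [rule for rule in hits if rule in scores]
            if not touched:
                continue  # no new rule hit, so nothing here can be adjusted
            total = sum(frozen.get(rule, 0.0) for rule in hits)
            total += sum(scores[rule] for rule in touched)
            if (total >= THRESHOLD) == is_spam:
                continue  # already classified correctly
            mistakes += 1
            step = learning_rate if is_spam else -learning_rate
            for rule in touched:
                scores[rule] += step
        if mistakes == 0:
            break  # every message that hits a new rule is classified correctly
    return scores


# Hypothetical data: two established rules and two new ones.
frozen_scores = {"OLD_A": 3.0, "OLD_C": 5.5}
example_corpus: List[Message] = [
    ({"OLD_A", "NEW_X"}, True),   # spam that the old rules alone miss
    ({"OLD_A"}, False),           # ham, no new rule involved
    ({"OLD_C", "NEW_Y"}, False),  # ham that the old rules alone misflag
]
new_scores = fit_new_rule_scores(example_corpus, frozen_scores, ["NEW_X", "NEW_Y"])
print(new_scores)
```

Because the old scores never move, a run like this only has to fit as many parameters as there are new rules, which is what would make it cheaper than a full rescoring.
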
+ If that's not practical, then perhaps we can use some standardized algorithms to determine provisional scores. The algorithms we use for general-purpose rules within SARE seem to work very well, adding significantly to spam scores without causing any significant number of FPs.
+
+ DanielQuinlan favored the first option, saying, "That would not be too hard and would be more accurate than any estimation technique. There is definitely a correlation between hit rates, S/O ratio, RANK, etc. to the ultimate perceptron-generated score, but the correlations are not all that high, unfortunately."
+
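For reference, the S/O ratio mentioned in the quote can be computed from a rule's hit counts. The sketch below assumes S/O is the spam hit rate divided by the sum of the spam and ham hit rates (so corpus sizes cancel out); the function name and the counts are made up for illustration. It deliberately stops short of mapping S/O to a score, since the page does not specify such a mapping.

```python
from typing import Optional


def so_ratio(
    spam_hits: int, spam_total: int, ham_hits: int, ham_total: int
) -> Optional[float]:
    """Return the S/O ratio of a rule, or None if the rule never hit.

    Assumes S/O = spam_rate / (spam_rate + ham_rate), where each rate is
    the fraction of messages of that class that the rule hit.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        return None  # no hits at all, so the ratio is undefined
    return spam_rate / (spam_rate + ham_rate)


# Hypothetical counts: a rule hitting 30 of 1000 spam and 10 of 1000 ham.
print(so_ratio(30, 1000, 10, 1000))
```

A value near 1.0 means the rule hits almost only spam, and a value near 0.5 means it does not discriminate. As the quote notes, such figures correlate only loosely with the final perceptron-generated score.
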