[Spamassassin Wiki] Update of "SaUpdatePlan" by JustinMason
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/SaUpdatePlan

------------------------------------------------------------------------------
- == Rules Project: Streamlining the rules process ==
+ == Rules Project: sa-update and rules release cycle ==

'''(part of RulesProjectPlan)'''

+ Problem description: '2. The "release cycle" problem: Any high quality rules that are incorporated into SpamAssassin are not distributed until the next release. Since rules and code are tied together, the release cycle for rules is too long. Submitted rules are not distributed while they are most effective, and rules lose their effectiveness too quickly.'
- Problem description: '2. The "release cycle" problem: Any high quality rules
- that are incorporated into SpamAssassin are not distributed until the next
- release. Since rules and code are tied together, the release cycle for rules is
- too long. Submitted rules are not distributed while they are most effective, and
- rules lose their effectiveness too quickly.'

Theo has written sa-update -- a new script that will be included with SpamAssassin 3.1.0. In theory, we will be able to distribute rules more frequently, and rules releases won't be tied to code releases.

- TODO: need more detail here
+ TODO: need more detail here. Ping, Theo!

== Scoring ==

@@ -24, +20 @@


DanielQuinlan favored the first option, saying, "That would not be too hard and would be more accurate than any estimation technique. There is definitely a correlation between hit rates, S/O ratio, RANK, etc. to the ultimate perceptron-generated score, but the correlations are not all that high, unfortunately."

+ JustinMason: yes, agreed; the perceptron just does a better job, every time. Having said that, we don't need to institute a policy requiring regular perceptron runs; we can actually measure false positive rates across an entire corpus, using the 'fp-fn-statistics' masses tool, and get an idea of whether the current scoreset as a whole is FP-prone or FN-prone (which would indicate that the perceptron needs to be run soon).
+
+ In other words, let's defer on making this a task right now ;)
+
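
The corpus-wide check described above can be sketched roughly as follows. This is not the 'fp-fn-statistics' tool itself, only a minimal illustration of the idea in Python: given per-message scores and hand-classified labels, count how many ham messages land at or above the spam threshold (false positives) and how many spam messages land below it (false negatives). The names `Result`, `fp_fn_rates` and `needs_rescore`, the threshold of 5.0, and the `max_fp` / `max_fn` limits are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One scored corpus message (hypothetical structure for this sketch)."""
    score: float   # total score the current scoreset gave the message
    is_spam: bool  # hand-classified ground truth


def fp_fn_rates(results, threshold=5.0):
    """Return (fp_rate, fn_rate) for a corpus at the given threshold.

    fp_rate: fraction of ham scoring at or above the threshold.
    fn_rate: fraction of spam scoring below the threshold.
    The threshold value of 5.0 is an assumption for this example.
    """
    ham = [r for r in results if not r.is_spam]
    spam = [r for r in results if r.is_spam]
    fp = sum(1 for r in ham if r.score >= threshold)
    fn = sum(1 for r in spam if r.score < threshold)
    fp_rate = fp / len(ham) if ham else 0.0
    fn_rate = fn / len(spam) if spam else 0.0
    return fp_rate, fn_rate


def needs_rescore(fp_rate, fn_rate, max_fp, max_fn):
    """Flag the scoreset when either rate exceeds its limit.

    max_fp and max_fn are hypothetical limits chosen by whoever
    runs the check; nothing in the discussion above fixes them.
    """
    return fp_rate > max_fp or fn_rate > max_fn


# Tiny made-up corpus: four ham and four spam messages.
corpus = [
    Result(1.0, False), Result(6.0, False), Result(2.0, False), Result(0.5, False),
    Result(10.0, True), Result(3.0, True), Result(7.0, True), Result(8.0, True),
]
fp_rate, fn_rate = fp_fn_rates(corpus)
```

If either rate drifts past whatever limit is agreed on, that would be the signal to schedule a perceptron run, rather than running it on a fixed calendar.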