Mailing List Archive

Automatic Updates
On Mon, 9 Feb 2004, Matt Kettler wrote:
> ... because I've already got a tool that covers it, complete with
> intelligent, cryptographically signed, automatic updates and everything.

Which reminds me. The above comment is about clamav, and I know that there
has been some effort to automate updates of 'bigevil' and custom rulesets,
but I was wondering if there are any plans to automate the upgrades to
spamassassin?

More particularly, when I read the comments on 2.63 I got the impression
that they had updated code, along with adding rules, and so I faced the
potential for the new code perhaps 'breaking' on my system/config.
Is there any way to get updated rules *only*, without program updates?
I realize that some updates involve 'eval' rules and that does get into
program code, but still, I think it would be nice if there was some way to
have the spamassassin rules updates work a bit more like a virus scanner
update, with new rules properly 'reviewed' so that people like me without
a 'corpus' could trust a 'central repository' to have a good generic set
of rules, and have them updated often enough to catch the new spammer
'tricks' within a week or so?

Just a stray idea (and possibly too much wishful thinking). :-)

- Charles
Re: Automatic Updates
[Note to listmom - disregard my last post to the old list - stupid stale
Pine aliases...]

Hi,

On Mon, 9 Feb 2004, Charles Gregory wrote:

> On Mon, 9 Feb 2004, Matt Kettler wrote:
> > ... because I've already got a tool that covers it, complete with
> > intelligent, cryptographically signed, automatic updates and everything.
>
> Which reminds me. The above comment is about clamav, and I know that there
> has been some effort to automate updates of 'bigevil' and custom rulesets,
> but I was wondering if there are any plans to automate the upgrades to
> spamassassin?

No. The long answer is here:

http://wiki.spamassassin.org/w/VirusScannerTypeUpdates

The short answer is that to achieve the high accuracy, the rule scores
need careful testing and balancing (this took four runs using 400,000
sample messages over a period of four weeks.) Additionally, unlike
signature-based virus scanners, SA rules often depend on code. Some of
this might be helped by the new plugin architecture in 2.7x and the work
done on replacing the GA with a perceptron-based score balancer but don't
expect virus-scanner-like signature updates anytime soon.
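
To make the balancing point concrete, here is a toy sketch of a
perceptron-style score update. It is not the actual SA balancer, which is
not described in this thread; the rule names, messages, threshold and step
size are all invented for illustration. It only shows that each score comes
out of the whole corpus and the other rules, so a new rule cannot simply be
dropped in with an arbitrary score.

```python
# Toy corpus: (set of rule names that hit, is_spam). All names are invented.
MESSAGES = [
    ({"DRUG_OBFU", "HTML_ONLY"}, True),
    ({"DRUG_OBFU"}, True),
    ({"HTML_ONLY", "FORGED_DATE"}, True),
    ({"HTML_ONLY"}, False),
    (set(), False),
    ({"HAS_LIST_ID"}, False),
]
THRESHOLD = 5.0  # a message scoring at or above this is called spam


def train_scores(messages, epochs=100, step=0.5):
    """Nudge rule scores until every message lands on the right side."""
    rules = sorted(set().union(*(hits for hits, _ in messages)))
    scores = {rule: 0.0 for rule in rules}
    for _ in range(epochs):
        mistakes = 0
        for hits, is_spam in messages:
            total = sum(scores[rule] for rule in hits)
            if (total >= THRESHOLD) != is_spam:
                mistakes += 1
                # Missed spam: raise the scores of the rules that hit.
                # False positive: lower them.
                delta = step if is_spam else -step
                for rule in hits:
                    scores[rule] += delta
        if mistakes == 0:
            break  # every message is classified correctly
    return scores


scores = train_scores(MESSAGES)
for rule, score in scores.items():
    print(rule, score)
```

Adding one more rule or a few more messages to the toy corpus changes the
scores of the existing rules too, which is the reason rules and scores are
balanced together.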

-- Bob
Re: Automatic Updates
At 06:46 PM 2/9/2004, Charles Gregory wrote:
>On Mon, 9 Feb 2004, Matt Kettler wrote:
> > ... because I've already got a tool that covers it, complete with
> > intelligent, cryptographically signed, automatic updates and everything.
>
>Which reminds me. The above comment is about clamav, and I know that there
>has been some effort to automate updates of 'bigevil' and custom rulesets,
>but I was wondering if there are any plans to automate the upgrades to
>spamassassin?

No, it's fundamentally not possible given what SA is, and how official
rulesets are made...

read the fine FAQ:

http://wiki.spamassassin.org/w/VirusScannerTypeUpdates
Re: Automatic Updates
On Mon, 9 Feb 2004, Matt Kettler wrote:
> read the fine FAQ:
> http://wiki.spamassassin.org/w/VirusScannerTypeUpdates

A very good, compelling argument for the need to thoroughly
research/balance scoring, etc. However, I would still like to advocate the
idea of incremental rule 'adjustment'. Specifically, I think of the
rapidly evolving obfuscation of everyone's favourite V drug. The spammers
are adjusting their 'tricks' on an almost weekly basis, and it seems to me
that in many instances, the obfuscation is only a variant not caught by
spamassassin but does not represent a serious difference to the spam/ham
ratio. If SA checks for 'abc_d' and the following week the spammers start
using 'ab_cd', why should this take a month or so to update? The
functional nature of the rule remains the same, so the scoring remains the
same.
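
As a rough sketch of what I mean (in Python rather than SA's own rule
syntax, and with a made-up helper name, obfuscation_pattern), a single
pattern that allows an optional separator between letters covers both
variants, so the purpose of the rule is unchanged:

```python
import re

# At most one non-alphanumeric character or underscore between letters.
SEPARATOR = r"[\W_]?"


def obfuscation_pattern(word):
    """Compile a regex matching `word` with optional separators inside it."""
    letters = [re.escape(ch) for ch in word]
    return re.compile(r"\b" + SEPARATOR.join(letters) + r"\b", re.IGNORECASE)


pattern = obfuscation_pattern("abcd")

for sample in ("abc_d", "ab_cd", "a.b.c.d", "abcd", "abxcd"):
    print(sample, bool(pattern.search(sample)))
```

Whether such a widened pattern really leaves the spam/ham ratio alone is
the assumption that would still need checking against real mail.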

Anyways, I don't wish to belabor a point, or open up useless arguments.
If someone sees merit in this idea, then I invite them to pursue it.
Otherwise, please accept my gratitude for such an excellent system that is
also highly customizable. Either way, thanks!

- Charles
Re: Automatic Updates
At 11:30 PM 2/9/04 -0500, Charles Gregory wrote:
>A very good, compelling argument for the need to thoroughly
>research/balance scoring, etc. However, I would still like to advocate the
>idea of incremental rule 'adjustment'. Specifically, I think of the
>rapidly evolving obfuscation of everyone's favourite V drug. The spammers
>are adjusting their 'tricks' on an almost weekly basis, and it seems to me
>that in many instances, the obfuscation is only a variant not caught by
>spamassassin but does not represent a serious difference to the spam/ham
>ratio. If SA checks for 'abc_d' and the following week the spammers start
>using 'ab_cd', why should this take a month or so to update?


Agreed, and this is currently the role filled by the add-on rulesets
written by many of the SA advocates and users (myself included).

Ideally, it would be very beneficial if the add-on rulesets became more
organized and formalized via some kind of "micro GA" system with an
organized release and scoring system.

However, for the general ruleset, there are far too many ties to the code
to think of the "rules" as somehow separate from it, or as something that
can be "updated" without it. Even without the extensive GA process, the
HTML tokenizer, QP decoder, and other parts of the "code" have deep-running
implications for many of the rules. Not to mention rules that are actually
implemented directly in the code itself (eval tests are actually IN the
code itself, and called from a rule name).
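
A simplified sketch of that split, in Python rather than SA's real Perl
internals (every rule name, the rule table layout and the function
check_many_exclamations are hypothetical): a pattern rule is pure data, but
an eval rule is only a name that must already exist in the installed code.

```python
import re


# An "eval test" lives in the program code, not in the rules file.
def check_many_exclamations(message):
    return message.count("!") >= 5


EVAL_TESTS = {"check_many_exclamations": check_many_exclamations}

# A "rules file": data only, which could in principle be shipped by itself.
RULES = [
    ("OBFU_WORD", "regex", r"a[\W_]?b[\W_]?c[\W_]?d"),
    ("MANY_BANGS", "eval", "check_many_exclamations"),
    ("NEW_TRICK", "eval", "check_new_trick"),  # no such code installed
]


def run_rules(message, rules=RULES, eval_tests=EVAL_TESTS):
    """Return (rules that hit, eval rules whose code is missing)."""
    hits, unsupported = [], []
    for name, kind, body in rules:
        if kind == "regex":
            if re.search(body, message, re.IGNORECASE):
                hits.append(name)
        elif kind == "eval":
            test = eval_tests.get(body)
            if test is None:
                unsupported.append(name)  # rule arrived, code did not
            elif test(message):
                hits.append(name)
    return hits, unsupported


hits, unsupported = run_rules("Buy ab_cd now!!!!!")
print(hits, unsupported)
```

A rules-only update can deliver NEW_TRICK as a line of text, but it cannot
deliver the function behind it.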

See http://wiki.spamassassin.org/w/CustomRulesets for a decent listing of
add-on rulesets that are being worked on.

Some of these will likely be folded back into the official SA sooner or
later, some may always exist as add-ons.
RE: Automatic Updates
> -----Original Message-----
> From: Matt Kettler [mailto:mkettler_sa@comcast.net]
> Sent: Tuesday, February 10, 2004 12:17 AM
> To: Charles Gregory; spamassassin-users@incubator.apache.org
> Subject: Re: Automatic Updates
>
>
> At 11:30 PM 2/9/04 -0500, Charles Gregory wrote:
> >A very good, compelling argument for the need to thoroughly
> >research/balance scoring, etc. However, I would still like to advocate the
> >idea of incremental rule 'adjustment'. Specifically, I think of the
> >rapidly evolving obfuscation of everyone's favourite V drug. The spammers
> >are adjusting their 'tricks' on an almost weekly basis, and it seems to me
> >that in many instances, the obfuscation is only a variant not caught by
> >spamassassin but does not represent a serious difference to the spam/ham
> >ratio. If SA checks for 'abc_d' and the following week the spammers start
> >using 'ab_cd', why should this take a month or so to update?
>
>
> Agreed, and this is currently the role filled by the add-on rulesets
> written by many of the SA advocates and users (myself included).
>

Me too! ;) As someone else has mentioned before, the previous months have
seen the most rules work done of any SA release. This is the first time a
group has been semi-organised to do just rules. We have 8-12 people looking
at rules in their 'spare' time. The devs have their hands full with coding
new features.

> Ideally, it would be very beneficial if the add-on rulesets became more
> organized and formalized via some kind of "micro GA" system with an
> organized release and scoring system.

I thought a Micro GA was called an "R.M. run" :) We have plans to organise
the rules better, like 419.cf and such. Theoretically SA could have modular
rules like we are doing, and update them anytime. But who will do that?

We even have one person who is now the official "rule submitter" to the SA
release. He looks for the best custom ones, and submits them for the GA for
the next release. It is like watching glaciers race! So sloooow.

>
> However, for the general ruleset, there's by far too many ties to take
> things to the length of thinking that the "rules" are somehow not part of
> the "code" and can somehow be "updated" without it. Even without the
> extensive GA process, The HTML tokenizer, QP decoder, and other parts of
> the "code" have deep running implications on many of the rules. Not to
> mention rules that are actually implemented directly in the code itself
> (eval tests are actually IN the code itself, and called from a rule name).

Yup, as seen with simple meta addition code. If you aren't running 2.60 or
higher, you can't use those.
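
For anyone who hasn't met them, a meta rule fires on a combination of
other rules' results. A rough Python sketch of the "addition" idea follows
(the helper meta_at_least and all rule names are invented, and this is not
how SA implements it): count how many of the named rules hit and compare
the sum with a minimum. Support for that counting has to be in the engine,
which is why an older version can't use such rules.

```python
def meta_at_least(n, *rule_names):
    """Build a meta test that fires when at least n of the named rules hit."""
    def test(hits):
        return sum(1 for name in rule_names if name in hits) >= n
    return test


# Hypothetical meta rule built from three ordinary rules.
META_RULES = {
    "OBFU_COMBO": meta_at_least(2, "OBFU_WORD", "HTML_ONLY", "MANY_BANGS"),
}


def apply_meta(hits, meta_rules=META_RULES):
    """Return the original hits plus any meta rules they trigger."""
    hits = set(hits)
    fired = {name for name, test in meta_rules.items() if test(hits)}
    return hits | fired


print(sorted(apply_meta({"OBFU_WORD", "MANY_BANGS"})))
print(sorted(apply_meta({"OBFU_WORD"})))
```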

>
> See http://wiki.spamassassin.org/w/CustomRulesets for a decent listing of
> add-on rulesets that are being worked on.

Hey that is a cool page!! And we get the black ninja!!! Thankfully not the
pink one. Because as I said before, the pink ninja frightens me!

>
> Some of these will likely be folded back into the official SA sooner or
> later, some may always exist as add-ons.
>

Agreed. Some will have to simply remain out of SA for the sake of swift
changes. For speed in reacting to spammers we have to forgo some of the cool
geeky stuff like a GA run, replacing it with micro GA runs and a human "Fudge
Factor" (my favorite Calc term!). Often we find that a little human intuition
can score a rule in seconds compared to a GA run.

Right now the rule writers are taking a big group deep breath. A lot of
rulesets came out and are working wonderfully. So a little break to work on
our real jobs is happening now. Soon another push will be made. But for now,
I think we are clearly ahead of the spammers.

--Chris
Re: Automatic Updates
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Matt Kettler writes:
>At 11:30 PM 2/9/04 -0500, Charles Gregory wrote:
>>A very good, compelling argument for the need to thoroughly
>>research/balance scoring, etc. However, I would still like to advocate the
>>idea of incremental rule 'adjustment'. Specifically, I think of the
>>rapidly evolving obfuscation of everyone's favourite V drug. The spammers
>>are adjusting their 'tricks' on an almost weekly basis, and it seems to me
>>that in many instances, the obfuscation is only a variant not caught by
>>spamassassin but does not represent a serious difference to the spam/ham
>>ratio. If SA checks for 'abc_d' and the following week the spammers start
>>using 'ab_cd', why should this take a month or so to update?
>
>
>Agreed, and this is currently the role filled by the add-on rulesets
>written by many of the SA advocates and users (myself included).
>
>Ideally, it would be very beneficial if the add-on rulesets became more
>organized and formalized via some kind of "micro GA" system with an
>organized release and scoring system.

Well, we *are* looking for volunteers here ;)

We need someone to help out with rule QA and "folding back" into the main
set; I'd say we could do interim GA runs, but we haven't quite got to that
stage of discussion yet. For now, we're just short-handed on that
side of things.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAKV0RQTcbUG5Y7woRAkScAJ998wmOYiSHyi/B61Gn//X/GPxbnACeNJ5U
Lw3fg+AMeZrJDxmmlmH4NSw=
=/8FN
-----END PGP SIGNATURE-----