Mailing List Archive: [Spamassassin Wiki] Update of "RulesProjBuildBot" by JustinMason

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RulesProjBuildBot

The comment on the change is:
go into detail on the buildbot idea

New page:
= Rules Project: BuildBot =

(part of RulesProjectPlan)

For active rule development, it's important to get rules mass-checked very
quickly.

Loren outlined the system used in SARE:

* rule developer sends mail to mailing list
* various other participants run scripts that automatically extract certain attachments posted to the list
* turn those into rules files
* lint them
* run a mass-check immediately with just the rules in that file
* post results, including hit freqs and false-positive matches
* mass-check requester asks for false-positive verification based on the report

For active rule development, this is obviously quite useful! If you can't run
mass-check locally for whatever reason, it offers a way to do this using other
people's corpora in almost-real-time.

JustinMason: We had been considering a pretty direct copy of this. However, I
think I've found an alternative that improves on it.

== The Preflight Mass-checking BuildBot ==

We set up a BuildBot. It is dedicated to performing mass-checks in the zone,
in a chroot, solely on the rules found in the "rulesrc/sandbox" tree. It
uses a small(ish) corpus so that mass-checks run quickly.

Mail-submitted rules: if a mail arrives from a known good source at a certain
address, and contains a text/plain attachment, that attachment is extracted and
passed through "spamassassin --lint". If it passes, it's checked in as
"rulesrc/sandbox/mailed/latest.cf", triggering the preflight buildbot to start
its mass-checks.
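
The extraction step described above could look roughly like the following
Python sketch. It is only an illustration, not the project's actual script:
the allow-list, the address in it, and the function name are hypothetical,
and checking the From header alone is an assumption about what "known good
source" means (a real deployment would likely want something harder to forge).

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical allow-list; the page only says "a known good source".
KNOWN_GOOD_SENDERS = {"ruledev@example.org"}


def extract_rules(raw_mail, allowed=KNOWN_GOOD_SENDERS):
    """Return the text of the first text/plain attachment, or None.

    None is returned when the sender is not on the allow-list or when
    the mail carries no text/plain attachment.
    """
    msg = message_from_string(raw_mail)
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() not in allowed:
        return None

    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_filename() is None:
            # No filename: treat it as ordinary body text, not an attachment.
            continue
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None
```

The returned text would then be handed to the lint step, and only checked in
as "rulesrc/sandbox/mailed/latest.cf" if lint passes.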

Results are visible immediately through the normal buildbot web UI. They are
presented as 'freqs' output (as usual).

Because we accept rules submitted by mail, we have to be a little more careful
than normal. For example, plugins are ignored in mail-submitted rules. If
--lint fails, nothing happens. In addition, the preflight mass-check script
runs with very paranoid ulimits, to trap CPU-load problems.
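
As a rough sketch of these two precautions, assuming a Unix host: the first
function drops "loadplugin" directives from a mailed-in rules file, and the
second runs a command under a CPU-time limit. Both function names and the
default limits are hypothetical, and the page does not say how plugins are
actually ignored or which ulimits are applied.

```python
import resource
import subprocess


def strip_plugin_lines(rules_text):
    """Drop 'loadplugin' directives so mailed-in rules cannot load code.

    The directive name is compared case-insensitively, to be on the safe side.
    """
    kept = []
    for line in rules_text.splitlines():
        words = line.split()
        if words and words[0].lower() == "loadplugin":
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


def run_limited(argv, cpu_seconds=60, wall_seconds=300):
    """Run argv with a CPU-time rlimit and a wall-clock timeout."""

    def set_limits():
        # Runs in the child process just before exec (POSIX only).
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        limit = cpu_seconds
        if hard != resource.RLIM_INFINITY:
            limit = min(cpu_seconds, hard)  # never try to raise the hard limit
        resource.setrlimit(resource.RLIMIT_CPU, (limit, limit))

    return subprocess.run(
        argv,
        preexec_fn=set_limits,
        timeout=wall_seconds,
        capture_output=True,
        text=True,
    )
```

A process that exceeds the CPU limit is killed by the kernel, so a runaway
regular expression in a submitted rule cannot tie up the machine
indefinitely.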

Why use BuildBot?

* Good web UI for "builds in progress"; you can monitor progress as it happens
* Designed to do "build whenever necessary, but not more than necessary", i.e. it solves the load issues caused by continuous integration
* Every mass-check output and every 'freqs' output will get an HTTP permalink, which allows side discussion to "point at" test results
* Integrated with SVN for version control and history tracking, so we can easily find test results that correspond to a mailed-in rules file
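
One way to picture "build whenever necessary, but not more than necessary"
is change coalescing: several check-ins that arrive close together are
batched into a single build once the tree has been quiet for a while. The toy
model below is only an illustration of that idea, not BuildBot's own
scheduler code; the class name and the quiet period are made up.

```python
class CoalescingScheduler:
    """Toy model: many changes, one build once the tree has been quiet."""

    def __init__(self, quiet_seconds):
        self.quiet_seconds = quiet_seconds
        self.pending = []           # changes not yet built
        self.last_change_at = None  # time of the most recent change

    def add_change(self, change, now):
        """Record a change; each new change restarts the quiet period."""
        self.pending.append(change)
        self.last_change_at = now

    def poll(self, now):
        """Return the batch of changes to build now, or None."""
        if not self.pending:
            return None
        if now - self.last_change_at < self.quiet_seconds:
            return None
        batch, self.pending = self.pending, []
        return batch
```

With a 60-second quiet period, two check-ins 30 seconds apart result in one
mass-check covering both, rather than two back-to-back runs.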

As for the results of mail-in rule checking: there are no automated check-ins
into the "sandboxes" or "core" from this. Instead, rules considered suitable
for use are ''manually'' checked into the "sandbox" area by one of the
committers who has privs to do that. With luck, they'll go into the core
based on the automated testing described in RulesProjStreamlining. This is
a rule-QA system, not an alternative to "svn commit".

== Issues ==

''The mailed-in rules are checked into SVN temporarily, and will therefore be
publicly viewable.''

JustinMason: In my opinion, we should go with this for now anyway. The use of
SVN makes it hard to keep the submitted rules private, but it also offers
some of the biggest advantages: keeping archive copies of the submitted rules
after the mass-check has completed, providing network transparency, and
ensuring that submitted rules get mass-checked (eventually).

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RulesProjBuildBot

The comment on the change is:
add security note

------------------------------------------------------------------------------
* Designed to do "build whenever necessary, but not more than necessary", ie it solves the load issues caused by continuous integration
* Every mass-check output and every 'freqs' output will get a HTTP permalink, which allows side discussion to "point at" test results
* Integrated with SVN for version control and history tracking, so we can easily find test results that correspond to a mailed-in rules file
+ * Allows us to run mass-checks securely, in a buildbot slave running in a segmented chroot jail

In terms of results of mail-in rule checking; there's no automated check-ins
into the "sandboxes" or "core" from this. Instead rules considered suitable