[Spamassassin Wiki] Update of "PreflightBuildBot" by JustinMason
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/PreflightBuildBot

The comment on the change is:
documentation about this new daemon

New page:
= Preflight mass-checks buildbot =

The preflight mass-check buildbot is running at: http://buildbot.spamassassin.org:8011/ .

== The preflight mass-check corpus ==

The corpus is split into several chunks of increasing size, starting with a small set of mail in the "mc-fast" chunk and growing until we reach the largest block in "mc-slower". This division means that the early "fast" results arrive quickly, since there is less to scan; as time goes on, more and more of the "slower" slaves complete their mass-checks and upload their results.

The filesystem layout is like this:

{{{
/home/bbmass/cor/CORPUSNAME/TYPE/WHO
}}}

Each "CORPUSNAME" directory corresponds to one of the 'slaves' listed on http://buildbot.spamassassin.org:8011/ : "mc-fast", "mc-med", "mc-slow", or "mc-slower".

Under that, we have "TYPE", which is either "ham" or "spam".

Next, "WHO". This is the username of the person who owns the corpus.

Under that is another level of directories, laid out however that person feels is appropriate. For example, I use date-stamped dirs here.

The result is e.g.:

{{{
/home/bbmass/cor/mc-fast/ham/jm/20051018a
/home/bbmass/cor/mc-fast/spam/jm/20051018a
/home/bbmass/cor/mc-fast/spam/jm/20051018b
}}}
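
To make the layout concrete, here is a minimal sketch that splits one of these paths back into its CORPUSNAME, TYPE and WHO parts. The helper name `describe_corpus_path` and the `COR_ROOT` variable are made up for illustration; nothing like this is part of the buildbot setup itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of the buildbot setup): split a corpus path
# of the form $COR_ROOT/CORPUSNAME/TYPE/WHO/... into its named parts.
COR_ROOT=/home/bbmass/cor

describe_corpus_path() {
    # Strip the root prefix, leaving CORPUSNAME/TYPE/WHO/...
    rel=${1#"$COR_ROOT"/}
    # Peel off one path component at a time.
    corpus=${rel%%/*}; rel=${rel#*/}
    type=${rel%%/*};   rel=${rel#*/}
    who=${rel%%/*}
    echo "corpus=$corpus type=$type who=$who"
}

describe_corpus_path /home/bbmass/cor/mc-fast/spam/jm/20051018b
```

For the path above this prints `corpus=mc-fast type=spam who=jm`; whatever follows WHO (here the date-stamped dir) is ignored.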

How mass-check discovers this: at the selection level, every "CORPUSNAME" dir
has a 'targets' file. For example, /home/bbmass/cor/mc-fast/targets contains
something like the following:

{{{
ham:dir:/home/bbmass/cor/mc-fast/ham/jm/*
spam:dir:/home/bbmass/cor/mc-fast/spam/jm/*
ham:dir:/home/bbmass/cor/mc-fast/ham/username/*
spam:dir:/home/bbmass/cor/mc-fast/spam/username/*
}}}

i.e. a file listing all the targets to mass-check.
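
Since the 'targets' lines follow directly from the directory layout, they could be generated rather than written by hand. The following is only a sketch: `gen_targets` is a hypothetical helper, not something mass-check provides, and it groups its output by TYPE rather than by user.

```shell
#!/bin/sh
# Sketch: print 'targets' lines for one CORPUSNAME directory.
# gen_targets is a hypothetical helper, not part of mass-check.
gen_targets() {
    root=$1      # e.g. /home/bbmass/cor
    corpus=$2    # e.g. mc-fast
    for type in ham spam; do
        # The trailing slash makes the glob match directories only.
        for whodir in "$root/$corpus/$type"/*/; do
            # Skip the unexpanded pattern when a TYPE dir has no WHO subdirs.
            [ -d "$whodir" ] || continue
            # Emit TYPE:dir:PATH/* so every subdirectory of WHO is listed.
            printf '%s:dir:%s*\n' "$type" "$whodir"
        done
    done
}
```

Assuming the layout described earlier, something like `gen_targets /home/bbmass/cor mc-fast > /home/bbmass/cor/mc-fast/targets` would then rebuild the file after a new user's corpus is added.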

== Uploading corpora ==

Corpora will probably be uploaded via rsync, but this isn't yet implemented
(TODO). Right now I'm just schlepping around tarfiles over SSH.

== Creating a new buildbot slave to perform mass-checks ==

I don't see us needing to do this anytime soon, but it's worth recording.
Here are the commands that do this, run in the zone.

{{{
# Placeholder: substitute a randomly generated password here.
PASSWORD=[randompassword]
NAME=mc-new

# Create the slave's base directory, owned by the bbmass user.
sudo mkdir -p "/home/bbmass/slaves/$NAME"
sudo chown bbmass "/home/bbmass/slaves/$NAME"

# Create the slave's configuration as the bbmass user.
cd "/home/bbmass/slaves/$NAME"
sudo su bbmass -c \
"mktap buildbot slave --basedir /home/bbmass/slaves/$NAME \
--master buildbot.spamassassin.org:9988 --name $NAME \
--passwd $PASSWORD --usepty=0"

# Store the password where only the buildbot user can read it.
echo "$PASSWORD" > "$HOME/pwd"
sudo mv "$HOME/pwd" "/home/buildbot/pwds/$NAME"
sudo chown buildbot "/home/buildbot/pwds/$NAME"
sudo chmod 600 "/home/buildbot/pwds/$NAME"

# In the master config, search for mc-fast and add
# new lines/entries for $NAME.
sudo vi /home/buildbot/bots/bbmass/master.cfg

# In the init script, likewise search for mc-fast and add
# new lines/entries for $NAME.
sudo vi /etc/init.d/buildbot

}}}
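
The commands above leave the choice of random password open. One possible way to produce it, shown purely as a sketch (the `gen_password` name and the choice of 16 random bytes are assumptions, not what was actually used), is to read from /dev/urandom and hex-encode the result:

```shell
#!/bin/sh
# Sketch: generate a random password for a new slave.
# gen_password is a hypothetical helper; 16 bytes is an arbitrary choice.
gen_password() {
    # Read 16 random bytes, dump them as hex, and strip od's whitespace,
    # leaving a 32-character string of the characters 0-9 and a-f.
    dd if=/dev/urandom bs=16 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
```

With this defined, `PASSWORD=$(gen_password)` would take the place of the placeholder assignment. A hex string also avoids shell metacharacters, which matters because the password is interpolated into the quoted `su -c` command line.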