[Spamassassin Wiki] Update of "PreflightBuildBot" by JustinMason
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/PreflightBuildBot

The comment on the change is:
record more info about setting up rsync

------------------------------------------------------------------------------
=== Admin: Creating a new rsync area for someone to upload corpora ===

{{{
+ sudo vi /etc/rsyncd.conf
+ }}}
+
+ add something like this to the end, changing "CORPUSUSER" to the username you want to give out:
+
+ {{{
+ [mailcorpus_CORPUSUSER]
+ path = /home/bbmass/rawcor/CORPUSUSER
+ read only = false
+ auth users = CORPUSUSER
+ secrets file = /home/corpus-rsync/secrets
+ }}}
+
+ {{{
CORPUSUSER="[username you want to give out]"
- sudo vi /etc/rsyncd.conf
cd /home/bbmass/rawcor/
mkdir $CORPUSUSER
chmod 1777 $CORPUSUSER
[Spamassassin Wiki] Update of "PreflightBuildBot" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/PreflightBuildBot

The comment on the change is:
refactor shared stuff into new page

------------------------------------------------------------------------------

'''Configure'''; a final summarisation step. First, a 'FAST FREQS REPORT' is output, giving the HitFrequencies from the mass-check. Next, the logs from the mass-check are copied to a safe location, and the 'corpus-hourly' script is run to generate various reports from them for the RuleQaApp. The URL for viewing the results in the RuleQaApp is printed prominently.

- == Administrivia: how the corpus is laid out ==
+ == Administrivia: how the corpus is generated ==

- The filesystem layout of the corpora rsynced up to the server is like this:
-
- {{{
- /home/bbmass/rawcor/WHO/TYPE/FOLDER
- }}}
-
- "WHO" is the person who submitted it via rsync, e.g. "doc", "jm", "zmi".
-
- Under that, we have "TYPE", which is either "ham" or "spam".
-
- Under that, "FOLDER", which is whatever the person feels is appropriate. For example, I use date-stamped dirs here. It is also possible to use mboxes, as long as they are files and their filename ends in ".mbox".
-
- Then, the script 'populate_cor' is run from cron periodically to rebuild the mass-checkable corpus from this. It attempts to 'smooth out' the multiple corpora into several new corpora, named "mc-fast", "mc-med", "mc-slow", "mc-slower", matching the buildbot slave names at http://buildbot.spamassassin.org/preflight/ .
+ The corpus is created from the UploadedCorpora. The script 'populate_cor' is run from cron periodically to rebuild the mass-checkable corpus from this. It attempts to 'smooth out' the multiple corpora into several new corpora, named "mc-fast", "mc-med", "mc-slow", "mc-slower", matching the buildbot slave names at http://buildbot.spamassassin.org/preflight/ .

It does this by:

@@ -60, +48 @@


== Uploading corpora ==

+ See UploadedCorpora.
- This is done via rsync.
-
- Give somebody on the PMC a shout, since they have privileges to
- create an rsync area for you to upload stuff to. (If you're on
- the PMC, just SSH in and copy over a tarball yourself, or create
- yourself an rsync account using a random password.)
-
- Once they've done this, they'll send you the username and password;
- you can then sync your files like so:
-
- {{{
- export RSYNC_PASSWORD=$YOURPASS
- rsync -vr /path/to/your/files \
- rsync://$YOURUSER@rsync.spamassassin.org/mailcorpus_$YOURUSER
- }}}
-
- (where $YOURPASS and $YOURUSER are whatever the PMC member mailed to
- you.)
-
- It's important that you have 2 dirs in the {{{/path/to/your/files}}} directory,
- {{{ham}}} and {{{spam}}}. Any files ending in {{{.mbox}}} inside those dirs
- will be treated as UNIX mbox-format files; any other files will be treated as
- individual messages (one message per file).
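The layout rule above can be checked locally before uploading. The following is only a sketch: `classify_corpus` is a hypothetical helper name, not part of the SpamAssassin tooling, and it merely mirrors the rule as described here (files ending in `.mbox` are mboxes, everything else is one message per file); the server-side scripts may apply further checks.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): report how each file
# under a corpus directory would be treated, per the rule described above.
classify_corpus() {
    corpus_dir=$1
    for type in ham spam; do
        # Both "ham" and "spam" directories are expected to exist.
        if [ ! -d "$corpus_dir/$type" ]; then
            echo "missing: $type"
            continue
        fi
        find "$corpus_dir/$type" -type f | sort | while IFS= read -r f; do
            case "$f" in
                *.mbox) echo "mbox: $f" ;;     # UNIX mbox-format file
                *)      echo "message: $f" ;;  # one message per file
            esac
        done
    done
}
```

It would be run as `classify_corpus /path/to/your/files`; a `missing:` line means one of the two required directories is absent.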
-
- == Administrivia ==
-
- Some stuff for PMC people hacking on this...
-
- === Admin: Creating a new rsync area for someone to upload corpora ===
-
- {{{
- sudo vi /etc/rsyncd.conf
- }}}
-
- add something like this to the end, changing "CORPUSUSER" to the username you want to give out:
-
- {{{
- [mailcorpus_CORPUSUSER]
- path = /home/bbmass/rawcor/CORPUSUSER
- read only = false
- auth users = CORPUSUSER
- secrets file = /home/corpus-rsync/secrets
- }}}
-
- {{{
- CORPUSUSER="[username you want to give out]"
- cd /home/bbmass/rawcor/
- mkdir $CORPUSUSER
- chmod 1777 $CORPUSUSER
- }}}
-
- Then create a random password string, and add a line to {{{/home/corpus-rsync/secrets}}} with $CORPUSUSER and that password.
-
- Finally, let the submitter know their new username and password.
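The password step could be scripted roughly as follows. This is a sketch under assumptions: `add_rsync_user` is a hypothetical helper name, the 24-character alphanumeric password is an arbitrary choice, and the page does not say how the password was actually generated. It relies on the standard rsync daemon secrets format of one `username:password` pair per line, and on the daemon's default `strict modes` behaviour, which rejects a secrets file that other users can read.

```shell
#!/bin/sh
# Hypothetical helper: generate a random password, append a
# "user:password" line to an rsync secrets file, and print the password.
add_rsync_user() {
    user=$1
    secrets=$2
    # 24 random alphanumeric characters (length is an arbitrary choice).
    pass=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 24)
    # rsyncd secrets files hold one "username:password" pair per line.
    printf '%s:%s\n' "$user" "$pass" >> "$secrets"
    # Keep the file private; rsyncd refuses it otherwise by default.
    chmod 600 "$secrets"
    printf '%s\n' "$pass"
}
```

For example, `add_rsync_user "$CORPUSUSER" /home/corpus-rsync/secrets`, run with sufficient privileges to write that file, prints the password to pass on to the submitter.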

=== Admin: Creating a new buildbot slave to perform mass-checks ===