Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails
The comment on the change is:
draft email
New page:
= Rescore Mass-Check Instructions =
''THIS IS A DRAFT. Please fill in the TODOs if you have the answers...''
Here's the procedure to follow if you wish to submit data for the 3.1.0
rescoring run:
First, send mail to <submit.at.spamassassin.org>, and ask for a
rescore-submission account if you haven't already got one.
It's helpful, but not required, to have some or all of the following helper
applications installed:
* the Mail::SPF::Query module
* the Net::DNS module
* Razor
* DCC
* Pyzor
* (TODO: this list probably needs updating)
If you're running nightly mass-checks, please feel free to disable them while
the rescore mass-check is running. Also, please note that the nightly
submission accounts will work for rescore submissions as well.
Then run these commands:
{{{
wget http://spamassassin.apache.org/devel/Mail-SpamAssassin-3.1.0-pre2.tar.gz
tar xvfz Mail-SpamAssassin-3.1.0-pre2.tar.gz
cd Mail-SpamAssassin-3.1.0
perl Makefile.PL < /dev/null; make
cd masses
mkdir -p spamassassin
rm -f spamassassin/*
echo "bayes_auto_learn 0" > spamassassin/user_prefs
echo "lock_method flock" >> spamassassin/user_prefs
echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
echo "use_auto_whitelist 0" >> spamassassin/user_prefs
./mass-check --bayes --net -j 4 --restart=400 --learn=35 --reuse \
--after=1041397200 <targets>
}}}
{{{<targets>}}} is the list of directories, mboxes, etc., like
{{{spam:dir:~/Mail/spam}}}. See the comments at the top of "mass-check" for
details.
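
As a rough illustration of building the target list, the sketch below prints a target string only when the path actually exists, so a typo shows up before a long run starts. The {{{target}}} helper and the {{{~/Mail/ham}}} path are hypothetical; the {{{class:format:location}}} shape is taken from the {{{spam:dir:~/Mail/spam}}} example above, and the comments at the top of "mass-check" remain the authority on which classes and formats are accepted.

```shell
#!/bin/sh
# target CLASS FORMAT PATH: print "CLASS:FORMAT:PATH" if PATH exists,
# otherwise complain on stderr and return non-zero.
# (Illustrative helper, not part of mass-check.)
target() {
    if [ -e "$3" ]; then
        printf '%s:%s:%s\n' "$1" "$2" "$3"
    else
        echo "skipping missing path: $3" >&2
        return 1
    fi
}

# Example use (paths are assumptions, and must not contain spaces here):
#   ./mass-check --bayes --net -j 4 --restart=400 --learn=35 --reuse \
#       --after=1041397200 \
#       $(target ham dir "$HOME/Mail/ham") $(target spam dir "$HOME/Mail/spam")
```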
This takes *ages* to run. {{{-j 4}}} controls the number of processes to use; 4
should be OK for a single-processor machine, since most of the time they'll be
waiting for network results to arrive. If you have adequate RAM and don't mind
the load, you can use {{{-j 6}}} or {{{-j 8}}}. There's not much benefit in
going higher than {{{-j 8}}}.
The {{{--after=1041397200}}} option tells mass-check to ignore messages older than the given cutoff, which is a Unix timestamp (in this case January 1 2003). This is useful if your corpus has older messages intermingled with your newer messages.
If you have an unusual network layout, you may need to specify
{{{trusted_networks}}} and/or {{{internal_networks}}} in the
{{{spamassassin/user_prefs}}} file, although SA should be able to infer them in
most cases. If you get less than a 10% to 15% spam hit rate for RCVD_IN_XBL,
then you might need to use these configuration parameters.
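
One way to estimate that hit rate from a finished or partial {{{spam.log}}} is sketched below. It assumes the log has one message per line with the comma-separated list of rules that hit in the fourth whitespace-separated field, and that comment lines start with {{{#}}}; check a few lines of your own log before trusting the number. The {{{rule_hit_rate}}} helper is made up for illustration.

```shell
#!/bin/sh
# rule_hit_rate RULE LOGFILE: print how many logged messages hit RULE.
# Assumes the rule list is the 4th whitespace-separated field, comma-separated.
# (Illustrative helper, not part of mass-check.)
rule_hit_rate() {
    awk -v rule="$1" '
        /^#/ || NF < 4 { next }          # skip comments and short lines
        {
            total++
            n = split($4, tests, ",")
            for (i = 1; i <= n; i++) {
                if (tests[i] == rule) { hits++; break }
            }
        }
        END {
            if (total) printf "%d/%d (%.1f%%)\n", hits, total, 100 * hits / total
            else print "no messages"
        }
    ' "$2"
}

if [ -f spam.log ]; then
    rule_hit_rate RCVD_IN_XBL spam.log
fi
```

The rule name is matched exactly, so a rule whose name merely contains {{{RCVD_IN_XBL}}} is not counted.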
Once it finishes:
{{{
USER="[whatever your username is]"
RSYNC_PASSWORD="[whatever your password is]"
export RSYNC_PASSWORD
rsync -CPcvuzb ham.log $USER@rsync.spamassassin.org::submit/ham-bayes-net-$USER.log
rsync -CPcvuzb spam.log $USER@rsync.spamassassin.org::submit/spam-bayes-net-$USER.log
}}}
That's it!
The results for this run will need to be in by Monday July NNth. If you're
still running then, submit what you have so far and beg for more time. ;)