Mailing List Archive

[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
draft email

New page:
= Rescore Mass-Check Instructions =

''THIS IS A DRAFT. Please fill in the TODOs if you have the answers...''

Here's the procedure you'll need to follow, if you wish to submit data for the
rescoring run for 3.1.0:

First, send mail to <submit.at.spamassassin.org>, and ask for a
rescore-submission account if you haven't already got one.

It's helpful, but not required, to have some or all of the helper applications
installed:

* the Mail::SPF::Query module
* the Net::DNS module
* Razor
* DCC
* Pyzor
* (TODO: this list probably needs updating)

If you're running nightly mass-checks, please feel free to disable them when
running the rescore mass-check runs. Also, please note that the nightly
submission accounts will work for rescore submissions as well.

Then run these commands:

{{{
wget http://spamassassin.apache.org/devel/Mail-SpamAssassin-3.1.0-pre2.tar.gz
tar xvfz Mail-SpamAssassin-3.1.0-pre2.tar.gz
cd Mail-SpamAssassin-3.1.0
perl Makefile.PL < /dev/null; make

cd masses
mkdir spamassassin
rm -f spamassassin/*
echo "bayes_auto_learn 0" > spamassassin/user_prefs
echo "lock_method flock" >> spamassassin/user_prefs
echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
echo "use_auto_whitelist 0" >> spamassassin/user_prefs

./mass-check --bayes --net -j 4 --restart=400 --learn=35 --reuse \
--after=1041397200 <targets>
}}}

{{{<targets>}}} is the list of directories, mboxes, etc., like
{{{spam:dir:~/Mail/spam}}}. See the comments at the top of "mass-check" for
details.
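
As a rough illustration of the target syntax, the sketch below assembles a target list and prints the resulting command without running it. The corpus paths and the ham target are assumptions for illustration (only {{{spam:dir:~/Mail/spam}}} appears above); check the comments at the top of "mass-check" for the formats your copy supports.

```shell
# Hypothetical corpus locations; substitute your own.
ham_dir="$HOME/Mail/ham"
spam_dir="$HOME/Mail/spam"

# Each target follows the class:format:location pattern shown above.
# The ham: class here is an assumption modelled on the spam: example.
targets="ham:dir:$ham_dir spam:dir:$spam_dir"

# Build the command as a string and print it, so it can be reviewed
# before anything is actually run.
cmd="./mass-check --bayes --net -j 4 --restart=400 --learn=35 --reuse --after=1041397200 $targets"
echo "$cmd"
```

Once the printed line looks right, paste it into the shell from inside the masses directory.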

This takes *ages* to run. {{{-j 4}}} controls the number of processes to use; 4
should be OK for a single-processor machine, since most of the time they'll be
waiting for network results to arrive. If you have adequate RAM and don't mind
the load, you can use {{{-j 6}}} or {{{-j 8}}}. There's not much benefit in
going higher than {{{-j 8}}}.

The {{{--after=1041397200}}} option tells mass-check to ignore messages older than the given Unix timestamp (in this case January 1 2003). This is useful if your corpus has older messages intermingled with your newer messages.
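
If you want to check or change the cutoff, the timestamp can be converted in both directions on the command line. This is a small sketch that assumes GNU date (the {{{-d}}} option behaves differently on BSD and macOS); the variable names are only for illustration.

```shell
# The cutoff used in the mass-check command above.
cutoff=1041397200

# Convert the Unix timestamp to a readable UTC date (GNU date syntax).
cutoff_date=$(date -u -d "@$cutoff" '+%Y-%m-%d %H:%M:%S')
echo "$cutoff_date UTC"

# Convert a UTC date back to a Unix timestamp, e.g. to pick another cutoff.
roundtrip=$(date -u -d '2003-01-01 05:00:00' +%s)
echo "$roundtrip"
```

The value works out to 05:00 UTC on January 1 2003, which is midnight in US Eastern Standard Time.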

If you have an unusual network layout, you may need to specify
{{{trusted_networks}}} and/or {{{internal_networks}}} in the
{{{spamassassin/user_prefs}}} file. But SA should be able to infer it in most
cases. If you get less than a 10% or 15% spam hit rate for RCVD_IN_XBL, then
you might need to use these configuration parameters.
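
One way to estimate that hit rate is to count the log lines that mention the rule. The sketch below runs against a small synthetic stand-in for spam.log, not real results; the line layout (comment lines starting with {{{#}}}, then one line per message with a comma-separated rule list) is an assumption, so compare it with your own log before relying on the numbers.

```shell
# Synthetic sample standing in for spam.log; the layout is assumed.
log=$(mktemp)
cat > "$log" <<'EOF'
# synthetic sample, not real mass-check output
Y 12 /example/spam/1 RCVD_IN_XBL,BAYES_99
Y 8 /example/spam/2 BAYES_99
Y 15 /example/spam/3 RCVD_IN_XBL
Y 6 /example/spam/4 HTML_MESSAGE
EOF

# Count result lines, skipping comments.
total=$(grep -vc '^#' "$log")

# Count result lines that list the rule as a whole word.
hits=$(grep -v '^#' "$log" | grep -cw 'RCVD_IN_XBL')

# Integer percentage of messages that hit the rule.
rate=$(( 100 * hits / total ))
echo "RCVD_IN_XBL: $hits of $total messages ($rate%)"

rm -f "$log"
```

To try it on real data, point {{{log}}} at your own spam.log and drop the heredoc and the final {{{rm}}}.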

Once it finishes:

{{{
USER="[whatever your username is]"
RSYNC_PASSWORD="[whatever your password is]"
export RSYNC_PASSWORD

rsync -CPcvuzb ham.log $USER@rsync.spamassassin.org::submit/ham-bayes-net-$USER.log
rsync -CPcvuzb spam.log $USER@rsync.spamassassin.org::submit/spam-bayes-net-$USER.log
}}}
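
Since the two uploads differ only in the class name, they can also be generated in a loop. The sketch below is a dry run that only prints the commands; the account name is a placeholder and the loop is an illustration, not part of the official procedure.

```shell
# Placeholder account name; use the one issued with your rsync account.
user="exampleuser"

# Flags copied from the commands above (a later revision of this page drops -C).
flags="-CPcvuzb"

for class in ham spam; do
  # Destination follows the naming scheme used above: <class>-bayes-net-<user>.log
  dest="$user@rsync.spamassassin.org::submit/$class-bayes-net-$user.log"
  # Print instead of executing, so the commands can be checked first.
  echo "rsync $flags $class.log $dest"
done
```

Remove the {{{echo}}} (and export RSYNC_PASSWORD as shown above) once the printed commands look right.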

That's it!

The results for this run will need to be in by Monday July NNth. If you're
still running then, submit what you have so far and beg for more time. ;)
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
remove uncertainty, become certain

------------------------------------------------------------------------------
= Rescore Mass-Check Instructions =
-
- ''THIS IS A DRAFT. Please fill in the TODOs if you have the answers...''

Here's the procedure you'll need to follow, if you wish to submit data for the
rescoring run for 3.1.0:
@@ -15, +13 @@


* the Mail::SPF::Query module
* the Net::DNS module
- * Razor
- * DCC
* Pyzor
- * (TODO: this list probably needs updating)

If you're running nightly mass-checks, please feel free to disable them when
running the rescore mass-check runs. Also, please note that the nightly
@@ -27, +22 @@

Then run these commands:

{{{
- wget http://spamassassin.apache.org/devel/Mail-SpamAssassin-3.1.0-pre2.tar.gz
+ wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.1.0-pre2.tar.gz
tar xvfz Mail-SpamAssassin-3.1.0-pre2.tar.gz
cd Mail-SpamAssassin-3.1.0
- perl Makefile.PL < /dev/null; make
+ perl Makefile.PL < /dev/null
+ make

cd masses
mkdir spamassassin
@@ -40, +36 @@

echo "bayes_store_module Mail::SpamAssassin::BayesStore::SDBM" >> spamassassin/user_prefs
echo "use_auto_whitelist 0" >> spamassassin/user_prefs

- ./mass-check --bayes --net -j 4 --restart=400 --learn=35 --reuse \
+ nohup ./mass-check --bayes --net -j 4 --restart=400 --learn=35 --reuse \
--after=1041397200 <targets>
}}}

@@ -75, +71 @@


That's it!

- The results for this run will need to be in by Monday July NNth. If you're
+ The results for this run will need to be in by Wednesday July 6th. If you're
still running then, submit what you have so far and beg for more time. ;)
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

------------------------------------------------------------------------------
Here's the procedure you'll need to follow, if you wish to submit data for the
rescoring run for 3.1.0:

- First, send mail to <submit.at.spamassassin.org>, and ask for a
- rescore-submission account if you haven't already got one.
+ Clean up the corpus of mail you intend to mass-check (see CorpusCleaning),
+ and get an rsync account (see RsyncAccounts). The account is not needed until the end, so you can request it while mass-check is running; the 'checking for false positives and false negatives' stage of corpus cleaning can also be done afterwards.

It's helpful, but not required, to have some or all of the helper applications
installed:
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

------------------------------------------------------------------------------
= Rescore Mass-Check Instructions =

Here's the procedure you'll need to follow, if you wish to submit data for the
- rescoring run for 3.1.0:
+ rescoring run for 3.1.0 using MassCheck:

- Clean up the corpus of mail you intend to mass-check (see CorpusCleaning),
+ Clean up the corpus of mail you intend to MassCheck (see CorpusCleaning),
and get an rsync account (see RsyncAccounts). The account is not needed until the end, so you can request it while mass-check is running; the 'checking for false positives and false negatives' stage of corpus cleaning can also be done afterwards.

It's helpful, but not required, to have some or all of the helper applications
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
updated for pre3

------------------------------------------------------------------------------
= Rescore Mass-Check Instructions =
+
+ '''(These are the instructions for the re-run of 3.1.0 mass-checks)'''

Here's the procedure you'll need to follow, if you wish to submit data for the
rescoring run for 3.1.0 using MassCheck:
@@ -21, +23 @@

Then run these commands:

{{{
- wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.1.0-pre2.tar.gz
+ wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.1.0-pre3.tar.gz
- tar xvfz Mail-SpamAssassin-3.1.0-pre2.tar.gz
+ tar xvfz Mail-SpamAssassin-3.1.0-pre3.tar.gz
cd Mail-SpamAssassin-3.1.0
perl Makefile.PL < /dev/null
make
@@ -70, +72 @@


That's it!

- The results for this run will need to be in by Wednesday July 6th. If you're
+ The results for this run will need to be in by Monday July 11th. If you're
- still running then, submit what you have so far and beg for more time. ;)
+ still running then, submit what you have so far and beg for more time. We
+ may be pushing it out a little further anyway depending on how things go ;)
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
not problem with --reuse

------------------------------------------------------------------------------
{{{spam:dir:~/Mail/spam}}}. See the comments at the top of "mass-check" for
details.

+ Do not use {{{--reuse}}} if the targets you are using have not been scanned by SpamAssassin with network tests enabled, or if you have disabled common network tests or SPF. This is because it relies on the presence of the {{{X-Spam-Status}}} line to pick up hits on those rules.
+
This takes *ages* to run. {{{-j 4}}} controls the number of processes to use; 4
should be OK for a single-processor machine, since most of the time they'll be
waiting for network results to arrive. If you have adequate RAM and don't mind
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

------------------------------------------------------------------------------

Once it finishes:

- 1. First check that the results are sane. See CorpusCleaning to remove any misclassified messages.
+ 1. First check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages.
1. Submit your results!
{{{
USER="[whatever your username is]"
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
note new cvs usage

------------------------------------------------------------------------------
RSYNC_PASSWORD="[whatever your password is]"
export RSYNC_PASSWORD

- rsync -CPcvuzb ham.log $USER@rsync.spamassassin.org::submit/ham-bayes-net-$USER.log
+ rsync -Pcvuzb ham.log $USER@rsync.spamassassin.org::submit/ham-bayes-net-$USER.log
- rsync -CPcvuzb spam.log $USER@rsync.spamassassin.org::submit/spam-bayes-net-$USER.log
+ rsync -Pcvuzb spam.log $USER@rsync.spamassassin.org::submit/spam-bayes-net-$USER.log
}}}
+
+ ('''Note: previously, we used -C on those rsync commands. It should be removed, as the current host seems to be running a version of rsync that cannot handle it, giving this error: 'filter rules are too modern for remote rsync. rsync error: syntax or usage error (code 1) at exclude.c(1119)'.''')

That's it!
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
add uplink to parent node

------------------------------------------------------------------------------
= Rescore Mass-Check Instructions =

- '''(These are the instructions for the re-run of 3.1.0 mass-checks)'''
+ '''(These are the instructions for the re-run of 3.1.0 mass-checks; see RescoreMassCheck for the overview of the general process in toto)'''

Here's the procedure you'll need to follow, if you wish to submit data for the
rescoring run for 3.1.0 using MassCheck:
[Spamassassin Wiki] Update of "RescoreDetails" by JustinMason [ In reply to ]
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/RescoreDetails

The comment on the change is:
update for pre4

------------------------------------------------------------------------------
Then run these commands:

{{{
- wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.1.0-pre3.tar.gz
+ wget http://people.apache.org/~jm/devel/Mail-SpamAssassin-3.1.0-pre4.tar.gz
- tar xvfz Mail-SpamAssassin-3.1.0-pre3.tar.gz
+ tar xvfz Mail-SpamAssassin-3.1.0-pre4.tar.gz
cd Mail-SpamAssassin-3.1.0
perl Makefile.PL < /dev/null
make
@@ -45, +45 @@

{{{spam:dir:~/Mail/spam}}}. See the comments at the top of "mass-check" for
details.

- Do not use {{{--reuse}}} if the targets you are using have not been scanned by SpamAssassin with network tests enabled, or if you have disabled common network tests or SPF. This is because it relies on the presence of the {{{X-Spam-Status}}} line to pick up hits on those rules.
+ Do not use {{{--reuse}}} if the SA scanner that processed your mail was configured to run with -L (local tests only), or if you have disabled common network tests or SPF. This is because {{{--reuse}}} relies on the presence of the {{{X-Spam-Status}}} line to pick up hits on those rules, and currently cannot detect those conditions.

This takes *ages* to run. {{{-j 4}}} controls the number of processes to use; 4
should be OK for a single-processor machine, since most of the time they'll be
@@ -58, +58 @@

If you have an unusual network layout, you may need to specify
{{{trusted_networks}}} and/or {{{internal_networks}}} in the
{{{spamassassin/user_prefs}}} file. But SA should be able to infer it in most
+ cases. A good sign that you need them is seeing no SPF_PASS results: SPF will not be used if the message passes through one or more trusted relays.
- cases. If you get less than a 10% or 15% spam hit rate for RCVD_IN_XBL, then
- you might have needed to use these configuration parameters. (Since RCVD_IN_XBL
- is reused, this won't help you much now...)

- Once it finishes:
+ Once it finishes, check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages.

- 1. First check that the results are sane. See CorpusCleaning to remove any result lines that deal with misclassified or corrupt messages.
- 1. Submit your results!
+ Then submit your results!
+
{{{
USER="[whatever your username is]"
RSYNC_PASSWORD="[whatever your password is]"
@@ -79, +77 @@


That's it!

+ The results for this run will need to be in by Friday July 22nd (tentatively). If you're still running then, submit what you have so far and beg for more time. We
- The results for this run will need to be in by Monday July 11th. If you're
- still running then, submit what you have so far and beg for more time. We
may be pushing it out a little further anyway depending on how things go ;)