Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.
The following page has been changed by JustinMason:
http://wiki.apache.org/spamassassin/CorpusCleaning
------------------------------------------------------------------------------
Here are a few methods used to deal with common forms of corpus pollution -- messages in a mail corpus that aren't suitable for use in a MassCheck.
- == False Positives and False Negatives ==
+ == Cleaning Out False Positives ==
To clean a spam corpus of FalsePositives, first do a mass-check. You will wind up with 'spam.log' and 'ham.log' files. Run these commands to get a list of the 200 lowest-scoring spams, create an mbox file with just those messages, then open that mbox in the "mutt" mail client:
@@ -28, +28 @@
}}}
You can also remove the offending files, or messages, from the source mailboxes directly. How you do this depends on the format you use to store messages (Maildirs, mboxes, etc.). Maildirs are easiest, since you can just delete the files named in the 'id.fps' file.
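For the Maildir case, the deletion step might look something like the sketch below. It assumes that 'id.fps' holds one message file path per line, which this page does not confirm, and the function name `remove_listed_files` is made up for illustration. It defaults to a dry run, so nothing is deleted until you have checked the list.

```python
import os
from pathlib import Path


def remove_listed_files(list_path, dry_run=True):
    """Delete each file named in list_path and return the paths handled.

    ASSUMPTION: list_path contains one file path per line; blank lines
    and lines starting with '#' are ignored. With dry_run=True (the
    default) nothing is deleted; the paths are only reported.
    """
    handled = []
    for line in Path(list_path).read_text().splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        if not os.path.isfile(name):
            # Already removed, or not a regular file: leave it alone.
            continue
        if not dry_run:
            os.remove(name)
        handled.append(name)
    return handled
```

Calling `remove_listed_files("id.fps")` first and reading the returned list, then repeating with `dry_run=False`, keeps a mistaken list from costing you real messages.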
+
+ == Cleaning Out False Negatives ==
Cleaning the ham corpus of FalseNegatives is similar, but reverses a few things. Here are the commands to do that:
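The page's own commands are not reproduced in this notification. Purely as an illustration of the "reversed" selection, and not as the page's method, here is a sketch that picks the highest-scoring entries from a log. It assumes each log line is whitespace-separated with the score in the second field and the message path in the third; that layout, and the function name `extreme_scoring`, are assumptions made for this example.

```python
def extreme_scoring(log_lines, n=200, highest=True):
    """Return up to n (score, path) pairs from mass-check style log lines.

    ASSUMPTION: each data line is whitespace-separated, with the score in
    field 2 and the message path in field 3. Lines starting with '#' and
    lines that do not fit this shape are skipped.
    highest=True  -> highest scores first (suspect hams in ham.log)
    highest=False -> lowest scores first (suspect spams in spam.log)
    """
    entries = []
    for line in log_lines:
        if line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            score = float(fields[1])
        except ValueError:
            continue
        entries.append((score, fields[2]))
    entries.sort(key=lambda entry: entry[0], reverse=highest)
    return entries[:n]
```

With `open("ham.log")` as the input and `highest=True`, this lists the hams most likely to be misfiled spam; passing `highest=False` on 'spam.log' gives the matching selection for the spam corpus.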