
[Spamassassin Wiki] Update of "iXhash" by dbonengel
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by dbonengel:
http://wiki.apache.org/spamassassin/iXhash

The comment on the change is:
Added comments on how to tune the plugin

------------------------------------------------------------------------------
You need Net::DNS and Digest::MD5 installed.

Please note that, while this plugin realises some sort of 'fuzzy checksum', the fuzziness is achieved by reducing the body text of an email to its (characteristic) minimum. The MD5 hashing algorithm used is not fuzzy. Accordingly, there are no 'confidence levels' or anything of that sort: a message either hits or it does not.
+
+ There are likely to be FPs (false positives), especially with relatively short emails. You can tune the plugin somewhat by editing the REs (regular expressions) in the code; see the comments there.
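
The "reduce, then hash exactly" idea and the length gate can be illustrated with a minimal sketch. It is written in Python rather than the plugin's Perl, and it rests on several assumptions: the function names (`long_enough`, `reduce_body`, `body_hash`, `is_hit`), the `min_gaps` parameter, and the local `known_hashes` set are hypothetical; `reduce_body` is a stand-in normalisation, not the plugin's actual reduction rules; and only the whitespace condition from the plugin code is modelled.

```python
import hashlib
import re


def long_enough(body, min_gaps=16):
    """Gate modelled on the plugin's whitespace condition.

    Requires at least `min_gaps` whitespace characters that are each
    followed by some text. `min_gaps` is an illustrative knob, not a
    plugin option.
    """
    return re.search(r"(?:\s.+?){%d,}" % min_gaps, body) is not None


def reduce_body(body):
    """Stand-in reduction (ASSUMPTION): drop all whitespace, lowercase.

    The real plugin applies its own rules; this only shows that the
    'fuzziness' lives in the reduction step, not in the hash.
    """
    return re.sub(r"\s+", "", body).lower()


def body_hash(body, min_gaps=16):
    """Return the MD5 hex digest of the reduced body, or None if the
    mail is too short to pass the gate."""
    if not long_enough(body, min_gaps):
        return None
    return hashlib.md5(reduce_body(body).encode("utf-8")).hexdigest()


def is_hit(body, known_hashes, min_gaps=16):
    """Exact membership test: hit or no hit, no confidence level.

    `known_hashes` is a local set standing in for the remote hash list.
    """
    digest = body_hash(body, min_gaps)
    return digest is not None and digest in known_hashes


if __name__ == "__main__":
    original = " ".join(["offer"] * 17)    # 16 gaps: passes the gate
    reshaped = "\n".join(["OFFER"] * 17)   # same text, other layout
    known = {body_hash(original)}
    print(is_hit(reshaped, known))             # reduction makes them equal
    print(is_hit(original + " extra", known))  # any other text: no hit
    print(body_hash(original, min_gaps=17))    # stricter gate skips it
```

Raising `min_gaps` is the sketch's counterpart of editing the `{16,}` quantifier in the plugin's regular expression: fewer short mails get hashed, so fewer of them can produce FPs, but they are no longer checked at all.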

For those more deeply interested: This plugin is based on parts of the procmail-based project 'NiX Spam', developed by Bert Ungerer (un@ix.de).
If you can read German, read up at http://www.heise.de/ix/nixspam/. The procmail code that produces only the hashes can be found here:
@@ -157, +159 @@

#-----------------------------------------------------------------------
# Creation of hash # 1 if following conditions are met:
# - mail contains at least 16 spaces or tabs
- # - mail consists of at least 2 lines
+ # - mail consists of at least 2 lines
+ # NB: Edit this if you want to minimize FPs at the cost of not checking shorter mails.
+ # The shorter the mail, the higher the FP ratio.
if (($body =~ /([\s\t].+?){16,}/ ) && ($body =~ /.*$.*/)){
# Copy $body into $body_copy - thanks to J. Stehle for pointing this out
$body_copy = $body;
@@ -192, +196 @@

# Creation of hash # 1 if mail contains at least 3 of the following characters:
# '[<>()|@*'!?,]' or the combination of ':/'
# (To match something like "Already seen? http:/host.domain.tld/")
+ # Edit this if you want to minimize FPs (i.e. make sure that short emails are not checked)
+ #
if ($body =~ /((([<>\(\)\|@\*'!?,])|(:\/)).*?){3,}/m ) {
$body_copy = $body;
# remove redundant stuff