Mailing List Archive

[Bug 2894] New: Spelling Checker
http://bugzilla.spamassassin.org/show_bug.cgi?id=2894

Summary: Spelling Checker
Product: Spamassassin
Version: unspecified
Platform: Other
OS/Version: other
Status: NEW
Severity: enhancement
Priority: P5
Component: Rules
AssignedTo: spamassassin-dev@incubator.apache.org
ReportedBy: elliot@greenbaynet.com


It was brought up in a slashdot.org posting that we should utilize a spelling/grammar checker in
SA. The idea is that rather than trying to catch the many different spellings of common spam words,
we do the opposite: we score higher those messages with high rates of uncommon misspellings. The
score could increase "exponentially" with each additional misspelling.
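
A rough sketch of what such "exponential" scoring might look like. This is not an actual SA rule: the function name, the tiny dictionary, and the base/growth/cap constants are all made up for illustration.

```python
import re

def misspelling_score(text, dictionary, base=0.5, growth=2.0, cap=4.0):
    """Score grows geometrically with the number of unknown words.

    `dictionary` is any set of correctly spelled lowercase words; the
    constants are illustrative assumptions, not tuned values.
    """
    words = re.findall(r"[a-z]+", text.lower())
    misses = sum(1 for w in words if w not in dictionary)
    if misses == 0:
        return 0.0
    # First misspelling scores `base`; each further one multiplies by `growth`.
    return min(cap, base * growth ** (misses - 1))

# Hypothetical four-word dictionary, just to exercise the function.
known = {"buy", "cheap", "pills", "now"}
clean = misspelling_score("Buy cheap pills now", known)
two_misses = misspelling_score("Buy cheep pilz now", known)
three_misses = misspelling_score("Buy cheep pilz nww", known)
```

A real test would need a full dictionary, and probably a rate (misses per word) rather than a raw count so that long messages are not penalized.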

I don't know how workable this is as I have never actually written an SA rule test, but I thought I
would pass along a seemingly good idea. Happy New Year, SA team!



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
Re: [Bug 2894] New: Spelling Checker
I have an alternative solution that works. I implemented it using Exim
filters, but the concept is the same.

First - you start with a list of words that spammers misspell. The list
itself is spelled correctly. Then - in my case I just look at the subject
and the first 100 characters of the message - but SA could look at more.

What I do then is remove all the words that are spelled correctly.

I then delete all spaces, punctuation, and other characters that are used
to create gappy names.

I then convert look-alike characters: @=a, 1=i, 3=e, etc.

The idea here is that I'm correcting the spelling.

Then - after I correct the string - I look again for the words on the
list, and if the correction process produced one of them - then the word
was deliberately misspelled and it's spam.

As to the details - I do preserve spaces between "big words" to prevent
joined words from becoming words in the list.
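
The steps above could be sketched roughly like this in Python. This is not the Exim filter itself. The watchlist, the function name, and the "big word" threshold of 5 letters are assumptions for illustration, as is reading "words that are spelled correctly" as correctly spelled occurrences of the listed words.

```python
import re
import string

WATCHLIST = {"viagra", "mortgage"}  # hypothetical list, spelled correctly
LOOKALIKES = str.maketrans({"@": "a", "1": "i", "3": "e"})
BIG_WORD_MIN = 5  # assumed cutoff for a "big word"

def looks_deliberately_misspelled(subject, body, watchlist=WATCHLIST):
    # Only the subject and the first 100 characters of the body are examined.
    text = f"{subject} {body[:100]}".lower()
    pieces = []
    for token in text.split():
        # Drop listed words that are already spelled correctly.
        if token.strip(string.punctuation) in watchlist:
            continue
        # Delete punctuation used to create gaps, keeping look-alike characters.
        squeezed = re.sub(r"[^a-z0-9@]", "", token)
        # "Correct the spelling" by mapping look-alikes back to letters.
        fixed = squeezed.translate(LOOKALIKES)
        # Big words keep their surrounding spaces so they cannot merge with
        # neighbours; short fragments are joined together.
        pieces.append(f" {fixed} " if len(fixed) >= BIG_WORD_MIN else fixed)
    corrected = "".join(pieces)
    # If correction produced a listed word, the original was obfuscated.
    return any(word in corrected for word in watchlist)

gappy = looks_deliberately_misspelled("V.i.a.g.r.a cheap", "")
spaced = looks_deliberately_misspelled("V 1 @ g r a", "")
honest = looks_deliberately_misspelled("Cheap mortgage rates", "Call us")
joined = looks_deliberately_misspelled("via graphics card", "")
```

Here `gappy` and `spaced` come out true, while `honest` (the listed word is spelled correctly) and `joined` (the space before the big word "graphics" is preserved) come out false.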

The trick is about 95% effective.