Mailing List Archive

[Bug 2894] Spelling Checker
http://bugzilla.spamassassin.org/show_bug.cgi?id=2894





------- Additional Comments From gary@intrepid.com 2004-01-03 10:50 -------
Since many SA users do not always communicate in English (including some
English speakers <g>), it would seem difficult to impossible to use a spell
checker without knowing the language of the sender.

Here's a different idea, relating to the parsing of multipart/alternative
messages: process *only* the HTML part. Since we know that the spammer will
likely use the text part to discuss the spam in the HTML part, then SA should
focus on where the spam is likely to be: the HTML part.
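
As a rough illustration of the "HTML part only" idea, here is a minimal
Python sketch using the standard library email package. It is not
SpamAssassin code; the function name html_part_only and the sample message
are made up for this example.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def html_part_only(msg: Message) -> Optional[str]:
    """Return the decoded text/html body of a message, or None if absent.

    Hypothetical helper: every text/plain alternative is ignored.
    """
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)  # bytes
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


# Made-up sample: the text part does not match the HTML part.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XYZ"

--XYZ
Content-Type: text/plain; charset="us-ascii"

Innocent-looking text about the weather.
--XYZ
Content-Type: text/html; charset="us-ascii"

<html><body><b>Buy now</b></body></html>
--XYZ--
"""

html = html_part_only(message_from_string(RAW))
print(html)
```

Only the HTML alternative would then be handed to the rules and to Bayes; a
message with no HTML part would need a fallback to the text part.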

In addition, if sufficient work is put into ferreting out only the visible part
of the HTML (i.e., ignoring text that is "invisible" to the reader because it
is hidden in a font colored the same as the background, as well as Bayes
poison sprinkled in HTML comments, bogus tags, and such), and only the
visible part is passed to Bayes and/or scanned by SA, then perhaps the
ability to spoof with a text part that doesn't match the HTML part goes
away, and so does a major source of Bayes poison.
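
A minimal sketch of the "visible part only" filtering, assuming Python and
its standard html.parser module. The class and function names are
hypothetical, and the visibility test is deliberately crude: it only
compares a <font color> attribute against the <body bgcolor> attribute
(assumed white if absent) and knows nothing about CSS, tables, or tiny fonts.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text a reader would plausibly see (simplified heuristic)."""

    def __init__(self):
        super().__init__()
        self.bgcolor = "#ffffff"  # assumed default page background
        self.font_hidden = []     # one flag per currently open <font> tag
        self.skip_depth = 0       # nesting depth inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body" and attrs.get("bgcolor"):
            self.bgcolor = attrs["bgcolor"].lower()
        elif tag == "font":
            color = (attrs.get("color") or "").lower()
            self.font_hidden.append(color == self.bgcolor)
        elif tag in ("script", "style"):
            self.skip_depth += 1
        # Unknown or bogus tags fall through: their names and attributes
        # are dropped, and only the text between them is kept.

    def handle_endtag(self, tag):
        if tag == "font" and self.font_hidden:
            self.font_hidden.pop()
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth or any(self.font_hidden):
            return  # text is inside script/style or a background-colored font
        self.chunks.append(data)

    # handle_comment is not overridden, so HTML comments are discarded.


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


SAMPLE = (
    '<html><body bgcolor="#FFFFFF">'
    "<!-- meeting tuesday golf -->"
    "Buy <xyzzy>pills</xyzzy> now"
    '<font color="#ffffff"> quarterly report attached</font>'
    "</body></html>"
)
print(visible_text(SAMPLE))
```

The comment text and the white-on-white text in SAMPLE never reach the
output, so neither would be tokenized by Bayes.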



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
[Bug 2894] Spelling Checker
http://bugzilla.spamassassin.org/show_bug.cgi?id=2894

spamassassin-contrib@msquadrat.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE



------- Additional Comments From spamassassin-contrib@msquadrat.de 2004-01-03 11:08 -------


*** This bug has been marked as a duplicate of 2868 ***



[Bug 2894] Spelling Checker
http://bugzilla.spamassassin.org/show_bug.cgi?id=2894





------- Additional Comments From marc@perkel.com 2004-01-03 11:16 -------
Subject: Re: New: Spelling Checker

I have an alternative solution that works. I implemented it using Exim
filters, but the concept is the same.

First - you start with a list of words that spammers misspell. The words in
the list are spelled correctly. Then - in my case I just look at the subject and
the first 100 characters of the message - but SA could look at more.

What I do then is remove all the words that are spelled correctly.

I then delete all the spaces, punctuation, and other characters that are
used to create gappy names.

I then convert characters: @=a, 1=i, 3=e, etc.

The idea here is that I'm correcting the spelling.

Then - after I correct the string - I look again for the words in the
list, and if the correction process produced one of them - then it's
deliberate spam.

As to the details - I do preserve spaces between "big words" to prevent
joined words from becoming words in the list.

The trick is about 95% effective.
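
The steps above can be sketched roughly as follows. This is a Python
illustration, not the Exim filter itself, and several details are guesses:
the word list, the BIG_WORD_LEN threshold for what counts as a "big word",
and the reading of "words that are spelled correctly" as plain occurrences
of listed words are all assumptions. Only the @, 1, and 3 substitutions come
from the description.

```python
import re

WORDS = {"viagra", "mortgage"}  # hypothetical list, spelled correctly
SUBST = str.maketrans({"@": "a", "1": "i", "3": "e"})  # extend as needed
BIG_WORD_LEN = 4                # assumed threshold for a "big word"
EDGE_PUNCT = ".,!?:;"           # punctuation allowed at the ends of a word


def find_obfuscated_words(subject, body, words=WORDS):
    """Return the listed words that appear only after 'correcting' the text."""
    # Look at the subject plus the first 100 characters of the message.
    tokens = (subject + " " + body[:100]).lower().split()

    # Drop tokens that are already a correctly spelled listed word.
    tokens = [t for t in tokens if t.strip(EDGE_PUNCT) not in words]

    # Close the gaps: short or non-alphabetic tokens are merged into one
    # run, while "big words" stay separate so joins do not create matches.
    pieces, run = [], ""
    for t in tokens:
        core = t.strip(EDGE_PUNCT)
        if core.isalpha() and len(core) >= BIG_WORD_LEN:
            if run:
                pieces.append(run)
                run = ""
            pieces.append(core)
        else:
            # Keep letters, digits, and '@' (it is substituted below).
            run += re.sub(r"[^a-z0-9@]", "", t)
    if run:
        pieces.append(run)

    # "Correct the spelling" with the character substitutions.
    pieces = [p.translate(SUBST) for p in pieces]

    # Any listed word that shows up now was created by the correction.
    return sorted(w for w in words if any(w in p for p in pieces))


print(find_obfuscated_words("V.1.@.g.r.a for you", ""))
print(find_obfuscated_words("Question", "My doctor mentioned viagra yesterday"))
```

The first call reports the listed word because it only appears after the
gaps are closed and the characters converted; the second reports nothing
because the plain, correctly spelled word is removed before the correction
step.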





