
Dictionary lookup?
I notice that creative misspelling seems to be the trick of the day, not to
mention the random "words" like ghoidfs and fdsfreaol that tend to show up a
lot.

How hard (or expensive) would it be to do a dictionary lookup and count the
percentage of misspelled words in the body? We could assign a score equal to
that fraction times 10, so a body with 100% misspelled words would score 10.
We might have to bias it down a bit for messages with fewer than 20 words or so.
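
A minimal sketch of the proposed score, assuming the dictionary is already
loaded into a set of lowercase words. The linear down-weighting for short
messages is one hypothetical reading of "bias it down a bit"; the function
name and the `min_words` parameter are illustrative and not part of
SpamAssassin.

```python
import re


def misspelling_score(body: str, dictionary: set[str], min_words: int = 20) -> float:
    """Return 10 * (fraction of words not found in `dictionary`).

    Assumption: messages shorter than `min_words` are scaled down
    linearly, so a very short message cannot reach the full score.
    """
    # Crude tokenizer: runs of ASCII letters, lowercased.
    words = re.findall(r"[a-z]+", body.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    score = 10.0 * unknown / len(words)
    if len(words) < min_words:
        # Hypothetical bias for short messages.
        score *= len(words) / min_words
    return score
```

With `min_words=20`, the four-word body "buy cheap ghoidfs now" (one unknown
word) scores 2.5 before scaling and 0.5 after.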

Loren
RE: Dictionary lookup?
Dallas has done research on this. The short version is that it doesn't turn
out well. You have to look at word length, the percentage of misspelled words,
etc. And if a word isn't in the dictionary, does that really mean it is
misspelled? Think of links, acronyms, technical jargon, and so on.
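
Chris's point that "not in the dictionary" is not the same as "misspelled"
can be illustrated by filtering tokens before the lookup. This is a sketch
under stated assumptions: the rules below (drop URLs and e-mail addresses,
skip all-capital acronyms, ignore words shorter than four letters) and the
names `candidate_words` and `unknown_fraction` are hypothetical, not the
method used in Dallas's research.

```python
import re

# Hypothetical patterns for tokens a dictionary cannot be expected to know.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\S+@\S+")


def candidate_words(body: str, min_len: int = 4) -> list[str]:
    """Return lowercased words worth checking against a dictionary."""
    # Remove links and e-mail addresses before tokenizing.
    text = URL_RE.sub(" ", body)
    text = EMAIL_RE.sub(" ", text)
    words = []
    for tok in re.findall(r"[A-Za-z][A-Za-z']*", text):
        if len(tok) > 1 and tok.isupper():
            continue  # likely an acronym such as SMTP or HTML
        if len(tok) < min_len:
            continue  # short words are too noisy to judge
        words.append(tok.lower())
    return words


def unknown_fraction(body: str, dictionary: set[str], min_len: int = 4) -> float:
    """Fraction of candidate words that are missing from `dictionary`."""
    words = candidate_words(body, min_len)
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in dictionary)
    return unknown / len(words)
```

For "Check http://example.com/ghoidfs for SMTP and HTML fdsfreaol offers",
only "check", "fdsfreaol" and "offers" survive the filter. Every extra rule
like these adds per-message work, which is part of the CPU cost Chris
mentions.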

So, overall, the finding was that it doesn't work as a simple filter, even
though it seems like it should. Getting it to work would cost more CPU time
than other, easier methods.

The research wasn't a complete failure, however. It gave great insight into
other language and spam patterns. We learn more from our failures than from
our successes.

--Chris

> -----Original Message-----
> From: Loren Wilton [mailto:lwilton@earthlink.net]
> Sent: Monday, February 09, 2004 12:38 AM
> To: SpamAssassin Mailing List
> Subject: Dictionary lookup?
>
>
> I notice that creative misspelling seems to be the trick of the day, not to
> mention the random "words" like ghoidfs and fdsfreaol that tend to show up a
> lot.
>
> How hard (or expensive) would it be to do a dictionary lookup and count the
> percentage of misspelled words in the body? Could assign a score that is
> the fraction times 10, so 100% misspelled words would be a score of 10.
> Might have to bias it down a bit for messages with less than 20 words or so.
>
> Loren
>