Mailing List Archive

Ways around Bayes filters?
With the moving of the list and a few sick days, I'm a little behind. So I'm
not sure if this has been brought up or not. My boss sent me this BBC
article this morning and suggested I send it along.

http://news.bbc.co.uk/2/hi/technology/3458457.stm

To (over)summarize, the article states something that we here already know,
or should already know: given time and training, there will be certain
words that the filter learns as hammy, no matter what the situation. Names
of businesses, street addresses, and staff members seem to be likely
targets, as they are used daily in ham messages and learned as such. I'm sure
many of us here can find things like 'spamassassin', 'bayes' and
'procmail' scoring low somewhere in our own Bayes databases.
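
As a rough illustration of the mechanism described above, here is a toy
sketch in Python. It is not SpamAssassin's actual Bayes implementation: the
training messages, the function names, and the simple smoothed scoring are
all assumptions made up for this example. It only shows how tokens learned
from ham can pull down the combined score of a message that also contains
spammy tokens.

```python
import math
from collections import Counter

def train(messages):
    """Count, per token, how many messages it appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham):
    """Smoothed estimate of P(spam | token); never exactly 0 or 1."""
    p_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_spam / (p_spam + p_ham)

def message_spam_prob(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token probabilities by summing their log-odds."""
    log_odds = 0.0
    for token in set(text.lower().split()):
        p = token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical training data: ham resembles traffic on this list.
ham = [
    "spamassassin bayes rules update",
    "procmail recipe for spamassassin",
    "bayes database expiry question",
]
spam = [
    "cheap pills online",
    "buy cheap pills now",
    "cheap watches online now",
]
spam_counts, ham_counts = train(spam), train(ham)

plain = "buy cheap pills"
padded = plain + " spamassassin bayes procmail"  # same spam, hammy padding

plain_score = message_spam_prob(plain, spam_counts, ham_counts, len(spam), len(ham))
padded_score = message_spam_prob(padded, spam_counts, ham_counts, len(spam), len(ham))
print(f"plain:  {plain_score:.3f}")
print(f"padded: {padded_score:.3f}")
```

In this toy setup the padded message scores lower than the plain one. A real
filter uses far more training data and a different combining method, so this
says nothing about how well such padding works in practice.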

To some extent, we are already seeing this. At least here, there have been
reports of the random gibberish words being replaced with 'technical' terms
or excerpts from novels. So I've been asked whether there are any suggestions
to combat, or at least keep up with, this current spam mutation. Or are we
reaching a point where the effectiveness of the current systems falls behind
and we are forced to the next step, whatever that may be?

-Jeff
Re: Ways around Bayes filters?
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Jeff Heinen writes:
>
>With the moving of the list and a few sick days, I'm a little behind. So I'm
>not sure if this has been brought up or not. My boss sent me this BBC
>article this morning and suggested I send it along.
>
>http://news.bbc.co.uk/2/hi/technology/3458457.stm
>
>To (over)summarize, the article states something that we here already know,
>or should already know: given time and training, there will be certain
>words that the filter learns as hammy, no matter what the situation. Names
>of businesses, street addresses, and staff members seem to be likely
>targets, as they are used daily in ham messages and learned as such. I'm sure
>many of us here can find things like 'spamassassin', 'bayes' and
>'procmail' scoring low somewhere in our own Bayes databases.
>
>To some extent, we are already seeing this. At least here, there have been
>reports of the random gibberish words being replaced with 'technical' terms
>or excerpts from novels. So I've been asked whether there are any suggestions
>to combat, or at least keep up with, this current spam mutation. Or are we
>reaching a point where the effectiveness of the current systems falls behind
>and we are forced to the next step, whatever that may be?

Read the archives -- we covered this yesterday ;)
Also, watch the presentation in question. It says exactly the opposite.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAIr7WQTcbUG5Y7woRAreBAJ9y1qvqn3fn+Dt8cWSsRbkG3nygxQCgo7SN
BimIgGCq8/xYrjTPV8hSok0=
=FpUk
-----END PGP SIGNATURE-----