Mailing List Archive: Some real anti-bayes stuffing

Re[2]: Some real anti-bayes stuffing

Feb 13, 2004, 10:45 PM

Post #26 of 28

Hello Chris,

Friday, February 13, 2004, 11:49:39 AM, you wrote:

CS> I'm really not worried. I'm surprised it took them this long to
CS> figure this part out.

Actually, this is just a variation on the KWS_SPAMSENTENCE fodder posted
here four or five months ago by CAB. That's a series of 79 sentences
taken from someone's web page, with one sentence picked at random and
appended to the end of each spam generated by that spammer tool.

What I see in this new batch is an Adlibs exercise in sentence creation
(take one verb, two nouns, add an adjective and/or adverb or two, maybe a
preposition, and stir).

I agree with Justin's recent post -- randomly constructed sentences
cannot poison Bayes -- they will just feed Bayes with tokens that do not
exist in normal ham for my domains, and so will get themselves identified
as spam. A few may slip through (a very few, since the basic rules do
catch almost all spam content), but that window is already shrinking.
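
As a rough illustration of the point above (not SpamAssassin's actual
tokenizer or database), here is a minimal sketch that counts tokens per
message class. The sample messages and the helper name train() are made up
for the example. The filler words never show up in the ham counts, so once
the spam is learned they can only count against the sender.

```python
from collections import Counter

def train(messages):
    """Count, per token, how many messages in a class contain it."""
    counts = Counter()
    for text in messages:
        # Use a set so a token is counted once per message.
        counts.update(set(text.lower().split()))
    return counts

# Hypothetical toy corpora, purely for illustration.
ham = ["meeting agenda for the budget review",
       "budget numbers attached for review"]
spam = ["cheap pills zephyr quixotic marmalade",
        "cheap pills obelisk quixotic lantern"]

ham_counts = train(ham)
spam_counts = train(spam)

# Tokens learned from spam that this user's ham has never contained.
filler_seen_only_in_spam = sorted(t for t in spam_counts if t not in ham_counts)
print(filler_seen_only_in_spam)
```

A token such as "quixotic" ends up with a spam count of 2 and a ham count
of 0 here, which is the opposite of what the spammer wanted.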

Bob Menschel

Re: Some real anti-bayes stuffing

jon at radscan

Feb 19, 2004, 1:59 PM

Post #27 of 28

On Fri, 13 Feb 2004, Dan Melomedman wrote:

> From: Dan Melomedman <dan@devonit.com>
> Date: Fri, 13 Feb 2004 13:24:19 -0500
> Subject: Re: Some real anti-bayes stuffing
> To: spamassassin-users@incubator.apache.org
> X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
> version=2.60
>
> Pat Noordsij wrote:
> > I have one email that included 2 pages of text from Tom Sawyer.
> >
> > It didn't get caught.
>
> There are also sentence-writing AI programs conveniently available for
> spammers. Finally they found a way to foil Bayesian filters.
> Congratulations.
>
> Welp, time to find a new anti-spam mechanism. What is it this time?
>

Hmm. I don't think so... Our SA installation has worked extremely
well for the 6 months or so it has been in operation. The spammers try
new tricks, and bayes/new rules fix them. Considering we are detecting
over 99% of the spam (approx 2000 a day), in spite of spammer attempts to
'poison' bayes, change speelings, etc, I'd say SA is doing just great.

--
Jon Trulson mailto:jon@radscan.com
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include <std/disclaimer.h>
"I am Nomad." -Nomad

Re: Some real anti-bayes stuffing

jon at radscan

Feb 19, 2004, 2:04 PM

Post #28 of 28

On Fri, 13 Feb 2004, Justin Mason wrote:

> From: Justin Mason <jm@jmason.org>
> Date: Fri, 13 Feb 2004 16:05:02 -0800
> Subject: Re: Some real anti-bayes stuffing
> To: Mark A. DeMichele <demi@intellipro.com>
> Cc: spamassassin-users@incubator.apache.org
> X-Spam-Status: No, hits=-12.9 required=5.0 tests=AWL,BAYES_00,HABEAS_SWE
> autolearn=ham version=2.60
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Mark A. DeMichele writes:
> > This is exactly what I was talking about in a previous post.
> >
> > If spammers start doing this and they make sure this bogus section is
> > larger than the actual spam section, the bayes filter will probably mark
> > it as ham. What's worse is if you then force the bayes filter to learn
> > this as spam, now you just increased the spam score for each of these
> > good words. If this happens over and over again, I would imagine that
> > the bayes filter would malfunction. At least that's my opinion, but
> > feel free to disagree.
>
> Good, will do.
>
> The idea of bayes is that you train it on
>
> 1. *YOUR* ham
> 2. *YOUR* spam
>
> Unless spammers figure out what *YOU* call ham, they can add random words,
> bits of Russian literature, snippets of Tom Sawyer until the cows come
> home.
>
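
To make the argument above concrete, here is a minimal sketch of a
naive-Bayes style combination. It is not SpamAssassin's real combining
code; the function name, the per-token probabilities, the treatment of
unknown tokens as 0.5, and the 0.1 cut-off are all assumptions chosen for
the example. Padding made of words the filter has no opinion on leaves the
score unchanged; only a word that is genuinely ham-like in *this* corpus
moves it.

```python
import math

def combined_spam_probability(tokens, token_probs, unknown=0.5, min_distance=0.1):
    """Combine per-token spam probabilities by summing log-odds.

    Tokens missing from token_probs get `unknown` (assumed neutral), and
    tokens closer than `min_distance` to 0.5 are skipped as uninformative.
    """
    log_odds = 0.0
    for tok in set(tokens):
        p = token_probs.get(tok, unknown)
        if abs(p - 0.5) < min_distance:
            continue
        p = min(max(p, 0.01), 0.99)  # clamp to avoid log(0)
        log_odds += math.log(p) - math.log(1.0 - p)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical learned probabilities for one user's corpus.
token_probs = {"pills": 0.99, "cheap": 0.95, "budget": 0.02}

spam_only = ["cheap", "pills"]
padded = spam_only + ["tom", "sawyer", "whitewash", "fence"]  # unknown filler
with_my_ham_word = spam_only + ["budget"]                     # a real ham token

score_plain = combined_spam_probability(spam_only, token_probs)
score_padded = combined_spam_probability(padded, token_probs)
score_with_ham = combined_spam_probability(with_my_ham_word, token_probs)
print(score_plain, score_padded, score_with_ham)
```

In this toy setup the padded message scores the same as the unpadded one,
and the score only drops when the spammer happens to guess a token that
this particular user's ham really contains.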

Exactly - this has been my experience as well. These attempts to
circumvent/poison bayes have been totally ineffective - at least in our
case.

> In the worst case, they'll find one or two strong ham-sign words -- like
> 'Kits', or 'entries' (for my corpus). Worst case? I retrain on their
> mail, and those tokens become about even ham and spam counts, 0.5
> probability, and are *ignored* by the Bayes calculation in future.
>
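
The retraining effect described above can be sketched with a Graham-style
per-token estimate. This is an illustration only: the function names, the
corpus sizes, the counts, and the 0.1 "informative" cut-off are assumptions
made for the example, not values taken from SpamAssassin.

```python
def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Relative spam frequency of a token over its total relative frequency."""
    spam_freq = spam_count / n_spam
    ham_freq = ham_count / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: treat as neutral (assumption)
    return spam_freq / (spam_freq + ham_freq)

def is_informative(p, min_distance=0.1):
    """Assumed rule: tokens too close to 0.5 are left out of the calculation."""
    return abs(p - 0.5) >= min_distance

# Before: a strong ham-sign token, seen in 40 of 1000 hams and in no spam.
before = token_spam_probability(spam_count=0, ham_count=40, n_spam=1000, n_ham=1000)

# After learning 40 stuffed spams that all contain the token.
after = token_spam_probability(spam_count=40, ham_count=40, n_spam=1040, n_ham=1000)

print(before, is_informative(before))
print(after, is_informative(after))
```

With these made-up numbers the token starts as a strong ham indicator and
ends up close to 0.5, at which point the assumed cut-off drops it from the
calculation rather than letting it vouch for the spam.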

:)

> *PLEASE* read up on how Bayes works. READ John Graham-Cumming's
> presentation from the last Spam Conf, and NOTICE how it took him thousands
> of iterations of bayes-poisoning, sending a mail each time with a direct
> feedback loop, to get a single spam through.
>

Well said. It does help to understand exactly what bayes does.

--
Jon Trulson mailto:jon@radscan.com
ID: 1A9A2B09, FP: C23F328A721264E7 B6188192EC733962
PGP keys at http://radscan.com/~jon/PGPKeys.txt
#include <std/disclaimer.h>
"I am Nomad." -Nomad