Mailing List Archive

Re: AWL vs. mailing lists
* "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:23:55:15-0500] scribed:
<snip />

> Along with those five examples of correctly identified spams,
> you did post two examples where the AWL adjustment may have
> caused a false negative. But in both cases the message
> triggered BAYES_00, which probably had a larger contribution to
> the miscategorization. Judging by those I'd say the real
> problem is spam being incorrectly autolearned as ham. Spam
> should not be getting BAYES_00 or BAYES_01, as it is in the
> majority of your examples. Maybe you should be blaming
> autolearning rather than autowhitelisting.

I am intrigued by your last sentence. Care to expound?

--
Best Regards,

mds
mds resource
877.596.8237
-
Dare to fix things before they break . . .
-
Our capacity for understanding is inversely proportional to how much
we think we know. The more I know, the more I know I don't know . . .
--
Re: AWL vs. mailing lists
Michael D Schleif <mds@helices.org> wrote:
> * "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:23:55:15-0500] scribed:
> <snip />
>
> > Along with those five examples of correctly identified spams,
> > you did post two examples where the AWL adjustment may have
> > caused a false negative. But in both cases the message
> > triggered BAYES_00, which probably had a larger contribution to
> > the miscategorization. Judging by those I'd say the real
> > problem is spam being incorrectly autolearned as ham. Spam
> > should not be getting BAYES_00 or BAYES_01, as it is in the
> > majority of your examples. Maybe you should be blaming
> > autolearning rather than autowhitelisting.
>
> I am intrigued by your last sentence. Care to expound?

It seems an obvious conclusion from the preceding sentences in
the paragraph. When spam is getting BAYES_00 or BAYES_01, it's
a strong indication that some spam has been mislearned as ham.
Usually such mislearning happens because of autolearning,
though it could also happen by mistakes in manual learning.

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC
Re: AWL vs. mailing lists
Michael D Schleif <mds@helices.org> wrote:

> I posted those examples, such as they are, as evidence that I really get
> AWL scores that high -- since some skeptics argued that that is not
> possible. In fact, so far, everybody has accepted my examples as
> examples of high AWL scores. To now argue that they are inadequate
> examples of something else is arguably perverse ;>

Yes, I believe that you have gotten AWL adjustments of -25, but
they're irrelevant to the problem. They're not causing spam to
be miscategorized, so those high AWL adjustments don't matter.
They're a natural side effect of high scores.

You haven't posted the details of cases where AWL has caused
miscategorization, but I don't believe that the AWL adjustment
in those cases is anything like -25.

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC
RE: AWL vs. mailing lists
-----Original Message-----
> From: Michael D. Schleif [mailto:mds@bragi.private.network]
>
> P.S., Whatever I do know about AWL, how can one message have _both_
> BAYES_99 *AND* AWL of -5.4 ???
>
>

That is the most basic misunderstanding of what AWL is. This is exactly
what's being asked in http://wiki.spamassassin.org/w/AwlWrongWay. If I
send you 10000 emails that all score 10, and I send you one that scores
20.8, it will give it an AWL of -5.4, because (20.8 + 10.0)/2 = 15.4 and
15.4 - 20.8 = -5.4. Bayes, like any other test, is completely irrelevant
and has nothing to do with AWL. It's simply a score-averaging system.
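
To make the arithmetic above concrete, here is a minimal Python sketch of
the averaging step. It assumes the adjustment is factor * (mean - score)
with factor = 0.5, which reproduces the (score + mean)/2 description;
SpamAssassin exposes this weighting as the auto_whitelist_factor setting,
but treat the exact formula and default as assumptions to check against
your version. `awl_delta` is a hypothetical helper name.

```python
def awl_delta(history_mean: float, score: float, factor: float = 0.5) -> float:
    """AWL adjustment: pull the score part of the way toward the sender's mean.

    With factor = 0.5 the adjusted score is the midpoint (score + mean) / 2.
    Nothing here looks at BAYES_* or any other rule: only the message's
    total score and the sender's history matter (assumed model).
    """
    return factor * (history_mean - score)


if __name__ == "__main__":
    # The two examples from this thread: mean 10 vs. 20.8, mean 25 vs. 75.
    for mean, score in [(10.0, 20.8), (25.0, 75.0)]:
        delta = awl_delta(mean, score)
        print(f"mean {mean}, score {score}: AWL {delta:+.1f}, final {score + delta:.1f}")
```

Running it prints an AWL of -5.4 (final 15.4) for the first case and -25.0
(final 50.0) for the second.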

If a spammer sends me 10000 emails that score 25, then ONE email that
scores 75, there will be a -25 AWL, which I believe is what you saw
when you suddenly introduced a 50-point test into your configuration.
Of *course* it's possible. Same as it's possible for me to make a
900-point test on your signature, and all of a sudden the first time you
send me an email your AWL will be +900, even if it's a direct reply to
something I sent you and not spam. Then I remove the test, and your AWL
will still be high, but decrease with each subsequent low-scoring,
BAYES_00, bonded-sender, habeas-stamped email.
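
The "AWL will still be high, but decrease with each subsequent email"
behaviour can be simulated with a running total and count per sender. This
is a hypothetical sketch, not SpamAssassin's code: it assumes a sender with
no history gets no adjustment, that the raw (pre-AWL) score is what gets
stored, and a 0.5 weighting factor. The 900-point spike, the 0.5-point
follow-up messages and the 5.0 threshold are illustrative numbers only.

```python
class AwlEntry:
    """Running total and count of scores seen from one sender (sketch)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def check(self, score: float, factor: float = 0.5) -> float:
        """Return the adjusted score, then record the raw score in history."""
        if self.count == 0:
            adjusted = score  # assumed: no history means no adjustment
        else:
            mean = self.total / self.count
            adjusted = score + factor * (mean - score)
        self.total += score
        self.count += 1
        return adjusted


def messages_until_below(threshold: float = 5.0, spike: float = 900.0,
                         normal: float = 0.5) -> int:
    """Count low-scoring messages needed after one spike before mail passes."""
    entry = AwlEntry()
    entry.check(spike)  # first message hit the hypothetical 900-point test
    n = 0
    while True:
        n += 1
        if entry.check(normal) < threshold:
            return n
```

With these numbers, `messages_until_below()` returns 100: the sender's
100th low-scoring message is the first to land under 5.0, even though the
900-point test was removed after the first one.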

Everybody misunderstands this. (Even me - I picked insanely high
numbers of emails as an example so I didn't have to do the math right
now). Maybe we should just pretend that AWL stands for "Average Weight
L-of-all-the-emails-this-person-has-sent-to-us"

-tom
Re: AWL vs. mailing lists
Gary Funck wrote:

>
> Although I agree with your overall description of AWL, it doesn't in fact
> let my friend's messages through because his messages often score higher.
> If they often score high, his AWL score will be high (more spammy), and as
> it creeps up will actually increase the chances that his message goes above
> the spam threshold.
>
Got it... hmmm, I suppose that's because he sends more "spammy looking"
mail than non-spammy. If he normally sent very clearly hammy messages,
then sends a very blue joke email once in a while, AWL would probably
work as intended.

>
>>The idea does break a bit with mailing lists and with AWL poisoning
>>attempts, of course. For that, your procmail example looks perfect.
>>
>
>
> The idea partially breaks with mailing lists, because often the spammers
> post their very first message to the list and it is spam. Same would
> go for mary@blandagency.gov, if the first message she sent you was spam.
> Maybe AWL should start off at 1/5 of the spam threshold, thus 1.0 on a 5.0
> scale, slightly biasing in favor of spam upon first arrivals?
>
Or at least do a slow start, where AWL doesn't really kick in until it
sees a few messages.
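
The two proposals above (a small spam-ward bias on a sender's first
message, and a "slow start" before averaging kicks in) can be sketched as a
modified adjustment function. This is hypothetical: nothing in the thread
says SpamAssassin's AWL behaves this way. It reads Gary's "1.0 on a 5.0
scale" as a +1.0 adjustment for first arrivals, and the parameter names
`first_penalty` and `min_count` are made up for illustration.

```python
from typing import Sequence


def proposed_awl_delta(history: Sequence[float], score: float,
                       first_penalty: float = 1.0, min_count: int = 3,
                       factor: float = 0.5) -> float:
    """Hypothetical AWL variant combining the two proposals in this thread.

    - First-time sender: add first_penalty (e.g. 1/5 of a 5.0 threshold).
    - Slow start: no averaging until min_count messages have been seen.
    - Afterwards: the usual pull toward the sender's historical mean.
    """
    if not history:
        return first_penalty
    if len(history) < min_count:
        return 0.0
    mean = sum(history) / len(history)
    return factor * (mean - score)
```

For example, a brand-new list poster's message gets +1.0, a poster with
two earlier messages gets no adjustment, and only from the fourth message
on does the normal averaging apply.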

> I don't think it happens that often that someone who has posted to a mailing
> list over a long period of time suddenly turns around and sends a spam
> (though it does happen, and some spammers are even using this trick). What
> makes mailing lists different is that there are many more senders, and on
> average you don't see very many messages from a given sender. Again, maybe
> AWL should begin with a bias in favor of judging initial messages as spam?
>
>
>>The other nice thing about the AWL approach vs relying on Bayes...
>>suppose your friend sends you enough risqué messages that traditional
>>porn spam indicators are no longer reliable spam tokens?
>
>
> My approach to this is to outright whitelist joe@sailor.org. Bayes ignores
> whitelisting as I recall, in deciding whether to autolearn or not.
>
> Same would go for broker@mortgage.com - I just add him to my explicit
> white list.
>
Good call. Although that's a bit of a pain site-wide. Wondering how a
program like SA could automatically determine that joe@sailor.org is
legit. A semi-automatic way would be if the Bayes learn process also
updated the AWL... ham senders get an AWL ham bias, spam senders get an
AWL spam bias. Don't know how practical that is versus just manually
maintaining whitelist entries.

--Rich
RE: AWL vs. mailing lists
> From: Rich Puhek
> Sent: Thursday, February 26, 2004 7:57 AM
[...]
> > My approach to this is to outright whitelist joe@sailor.org. Bayes ignores
> > whitelisting as I recall, in deciding whether to autolearn or not.
> >
> > Same would go for broker@mortgage.com - I just add him to my explicit
> > white list.
> >
> Good call. Although that's a bit of a pain site wide. Wondering how a
> program like SA could automatically determine that joe@sailor.org is
> legit. A semi-automatic way would be if the Bayes learn process also
> updated the AWL... ham senders get an AWL ham bias, spam senders get an
> AWL spam bias. Don't know how practical that is versus just manually
> maintaining whitelist entries.
>

I agree. Manual whitelisting is a pain, and so is user-level re-categorization
of misfiled ham/spam. What is needed is a nice GUI for managing those tasks.

Regarding whitelisting, it's been suggested before (and I agree that it
sounds promising) that we look at the outgoing mail stream for whitelisting
clues. If we assume that users generally send mail only to addresses they
wish to engage in a conversation with, then incoming mail should be biased
toward ham if it comes from an address that was previously mentioned in an
outgoing mail. This feedback loop could be implemented either as an MTA
filter or as a program that monitors the mail logs.
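
A minimal sketch of the outgoing-mail feedback loop, using Python's
standard `email` package on outgoing messages that have already been
captured somehow (the MTA hook or log monitor that captures them is not
shown). The function names are hypothetical, and the result would still
have to be fed into a whitelist or a score bias; note that a forged From
address would match just as well as a real one.

```python
import email
import email.utils
from email.message import Message
from typing import Iterable, Set


def collect_correspondents(outgoing: Iterable[Message]) -> Set[str]:
    """Gather every address our users have written to (To, Cc, Bcc)."""
    known: Set[str] = set()
    for msg in outgoing:
        fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
        for _name, addr in email.utils.getaddresses(fields):
            if addr:
                known.add(addr.lower())
    return known


def is_known_correspondent(incoming: Message, known: Set[str]) -> bool:
    """True if the incoming From address was previously written to."""
    _name, addr = email.utils.parseaddr(incoming.get("From", ""))
    return addr.lower() in known
```

Addresses are lower-cased on both sides so that `Joe@Sailor.org` and
`joe@sailor.org` count as the same correspondent.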

A simple Bayes system that learns spam only from spam traps, and learns ham
only from the outgoing mail stream (some adjustments would have to be made
for things like From/To addresses and Received lines), might be an
interesting cut at a totally automated setup. That works until one of the
users decides to go into the spam mailing business, or his PC is co-opted
by a spammer's worm.
Re: AWL vs. mailing lists
Rich Puhek <rpuhek@etnsystems.com> wrote:
> Good call. Although that's a bit of a pain site wide.
> Wondering how a program like SA could automatically determine
> that joe@sailor.org is legit. A semi-automatic way would be if
> the Bayes learn process also updated the AWL... ham senders
> get an AWL ham bias, spam senders get an AWL spam bias.

Hmm. I have Bayes autolearning turned on, and I don't think it does that.
However, I also capture all flagged spam for Bayes training (other Bayes
tools), so extracting the offending sender and doing --add-to-blacklist=
works, though it's probably futile given that they rotate addresses. Maybe
just REMOVE those addresses to make sure they don't clutter up the AWL.

In the case of "good" posters, one could easily set up an "add to whitelist" in
the same way a bayes training folder is done. The folder gets scanned
periodically, and the sender extracted and fed to spamassassin
with --add-to-whitelist=.

I can envision "add-to-whitelist", "remove-from-whitelist", "add-to-blacklist"
and "remove-from-blacklist" folders. Cumbersome, but any of the techniques for
Bayes updating could be adapted.
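
The folder-scanning idea might look like the following Python sketch: read
an mbox folder with the standard `mailbox` module, extract unique sender
addresses, and build one spamassassin command per sender. The default flag
`--add-addr-to-whitelist` is an assumption (the option name has differed
between releases); confirm it with `spamassassin --help` before running the
commands, e.g. via `subprocess.run`. Pass a different `flag` to drive the
blacklist or removal folders the same way. All function names are
hypothetical.

```python
import email.utils
import mailbox
from email.message import Message
from typing import Iterable, List


def senders(messages: Iterable[Message]) -> List[str]:
    """Return unique, lower-cased From addresses in first-seen order."""
    seen: List[str] = []
    for msg in messages:
        _name, addr = email.utils.parseaddr(msg.get("From", ""))
        addr = addr.lower()
        if addr and addr not in seen:
            seen.append(addr)
    return seen


def whitelist_commands(mbox_path: str,
                       flag: str = "--add-addr-to-whitelist") -> List[List[str]]:
    """Build (but do not run) one spamassassin command per sender in a folder.

    The flag name is an assumption; check your SpamAssassin version.
    """
    return [["spamassassin", f"{flag}={addr}"]
            for addr in senders(mailbox.mbox(mbox_path))]
```

Running this periodically from cron, then emptying the folder, mirrors how
a Bayes training folder is usually handled.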

> Don't
> know how practical that is versus just manually maintaining
> whitelist entries.

Scalability seems to be the problem. Adapting a proven bayes-training approach
is probably easiest.

- Bob
RE: AWL vs. mailing lists
Gary Funck <gary@intrepid.com> wrote:

> Although I agree with your overall description of AWL, it doesn't in fact
> let my friend's messages through because his messages often score higher.
> If they often score high, his AWL score will be high (more spammy), and as
> it creeps up will actually increase the chances that his message goes above
> the spam threshold.

I think you're misunderstanding how AWL works. There's no way
it will put your friend's message over the spam threshold
unless the average score for your friend's previous messages is
above the spam threshold (and probably significantly above,
unless the new message is already close to the threshold). If
you have a friend that consistently sends you messages that
score above the spam threshold, then you need to manually
whitelist that friend.
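
The reasoning above can be checked numerically. Assuming the AWL pulls the
score toward the sender's mean with a factor between 0 and 1 (0.5 is used
here, as elsewhere in this thread), the adjusted score is a weighted average
of the message score and the mean, so it can only reach the threshold if
the mean is at or above it. The helper names are hypothetical.

```python
def adjusted(score: float, mean: float, factor: float = 0.5) -> float:
    """Score after an assumed AWL pull toward the sender's mean.

    For 0 <= factor <= 1 this is a weighted average of score and mean,
    so it always lies between the two.
    """
    return (1 - factor) * score + factor * mean


def can_cross(score: float, mean: float, threshold: float = 5.0) -> bool:
    """Could AWL push a below-threshold message to or above the threshold?"""
    return score < threshold <= adjusted(score, mean)
```

With factor 0.5, a message scoring 2.0 is pushed to 5.0 only if the
sender's average is at least 8.0, which is the "significantly above"
case described above.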

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC
Re: AWL vs. mailing lists
Michael,

I think I know what happened with spams getting through with high AWL scores to
some lists after reading some recent messages on related topics.

Check one of those spams, particularly the received headers. See if any in the
path that message took are listed in /usr/share/spamassassin/60_whitelist.cf.
If so, messages taking that path are whitelisted, meaning they start with
a -100 score. When the message hits AWL, THAT SENDER (the spammer) starts
with -100, regardless of the score, so their average IS highly anti-spam.
Their next message (if there is one, which is unlikely) will be scored more
spammy, but still not as spam. It's only after several posts that the AWL
adjustment will be corrected.
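
How many posts "several" might be can be estimated with a small
simulation. The numbers are illustrative assumptions, not measurements:
the spammer's first message is taken to score -95 (a 5-point spam plus the
-100 whitelist bonus described above), every later spam scores 8.0 on its
own, the threshold is 5.0, raw scores are what AWL stores, and the
weighting factor is 0.5. `first_flagged_message` is a hypothetical name.

```python
def first_flagged_message(first_score: float = -95.0, spam_score: float = 8.0,
                          threshold: float = 5.0, factor: float = 0.5,
                          limit: int = 1000) -> int:
    """Return the number of the first follow-up spam scored at/above threshold.

    Assumed model: the first message scored first_score because a whitelist
    rule fired; each later spam scores spam_score before AWL; AWL pulls the
    score toward the stored mean. Returns -1 if nothing is flagged by limit.
    """
    total, count = first_score, 1
    for n in range(1, limit + 1):
        mean = total / count
        final = spam_score + factor * (mean - spam_score)
        if final >= threshold:
            return n
        total += spam_score  # store the raw score
        count += 1
    return -1
```

Under these assumptions the function returns 18: the first 17 follow-up
spams slip under the threshold. Without the whitelist hit
(`first_score=5.0`) the very first follow-up is caught.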

If this is correct, that same spammer could probably get a message through
based on DEFAULT WHITELIST settings, regardless of AWL. Have you seen repeats
of this problem?

I'm increasingly convinced that the default whitelist is risky for this reason.

- Bob
