Mailing List Archive: AWL vs. mailing lists

AWL vs. mailing lists

Feb 25, 2004, 10:16 AM

Post #1 of 34

How are you all handling AWL and mailing lists?

# spamassassin -V
SpamAssassin version 2.63

I subscribe to over one hundred (100) mailing lists. procmail processes
every incoming message <256kB through SA, unless procmail dumps it
first.

High volume mailing lists (e.g., debian-user) have an extremely high
incidence of ham; therefore, AWL is extremely high for such lists. When
the virus/worm-of-the-day spams the list, SA identifies these as spam,
then transforms them into ham -- solely due to AWL skewing ;<

I know that I can turn OFF AWL.

Is that my only alternative?

What do you think?

--
Best Regards,

mds
mds resource
877.596.8237
-
Dare to fix things before they break . . .
-
Our capacity for understanding is inversely proportional to how much
we think we know. The more I know, the more I know I don't know . . .
--

RE: AWL vs. mailing lists

gary at intrepid

Feb 25, 2004, 11:15 AM

Post #2 of 34

> From: Michael D. Schleif [mailto:mds@bragi.private.network]On Behalf Of
> Michael D Schleif
> Sent: Wednesday, February 25, 2004 9:17 AM
[...]
>
> High volume mailing lists (e.g., debian-user) have an extremely high
> incidence of ham; therefore, AWL is extremely high for such lists. When
> the virus/worm-of-the-day spams the list, SA identifies these as spam,
> then transforms them into ham -- solely due to AWL skewing ;<
>
> I know that I can turn OFF AWL.
>
> Is that my only alternative?
>
> What do you think?

1. For viruses, a virus scanner is the better tool.
2. Be warned though, on techie mail lists it can be quite okay to send
a .bat file, a .bas file, and even the occasional .exe file. Therefore,
quarantining is a better policy than outright deleting 'dangerous'
attachments that can't be directly determined to contain a known virus.
3. You could drop auto-whitelisting, and use 'more_spam_to
mailinglist@frob.com'.

Personally, apart from mailing lists like spamassassin, spam-l, and procmail
where spam is often a common topic, I've found little loss from just filing
suspected spam away in the "possible spam" file. I'm also not a big fan
of auto-whitelisting these days.

Re: AWL vs. mailing lists

mailings02 at ttlexceeded

Feb 25, 2004, 12:01 PM

Post #3 of 34

"Michael D Schleif" <mds@helices.org> wrote:

> I subscribe to over one hundred (100) mailing lists. procmail processes
> every incoming message <256kB through SA, unless procmail dumps it
> first.

In the same boat here, list-wise.

> High volume mailing lists (e.g., debian-user) have an extremely high
> incidence of ham; therefore, AWL is extremely high for such lists. When
> the virus/worm-of-the-day spams the list, SA identifies these as spam,
> then transforms them into ham -- solely due to AWL skewing ;<

Keep in mind that relying on SA or bayes as your SOLE means of filtering
viruses and worms can be risky. They can be USEFUL certainly, but nasty
surprises are going to come in a variety of "non-spammish" packaging. And SA IS
only checking messages smaller than 256KB, so it's not a complete solution in
any case.

I would say that the problem is that a worm/virus got into your mail queue
rather than spamassassin scoring it wrong. Based on my recent experiences using
SA, various bayes-tools, procmail and anti-virus (clamav, bitdefender), I'm
increasingly convinced that MULTIPLE layers of defense are the best bet:

1. Coarse, fast filtering with procmail (enforce policy - see below)

2. Scanning of attachments/messages for virus/worms using a tool with modern
capabilities for the job (heuristics)

3. Detection of spam in otherwise "clean" mail with SA to keep mail tidy

4. Selective defanging/quarantine (e.g. Anomy Sanitizer) (enforce policy - see
below)

I'm leaning towards a "policy-based" approach. If attachments (or html content)
don't belong on a list/to a user, filter/defang it. This actually works well
for my mailing lists, as I can simply put a defang/quarantine filter in front
of them. While it's nice that SA may detect something, an executable doesn't
belong on (most of) those lists, so I don't want to let it through unless I
create a specific exception. Depending on how testing goes, it might be
simplified to "no attachments on messages (not directly addressed to
individuals with local accounts|with headers indicating a mailing list)" or
some variation.

Policy may allow some attachments (e.g. pictures) but not others (e.g. MP3s)
for some users/to some lists.
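The policy-based idea can be sketched in Python roughly as follows. This is a hypothetical illustration, not anyone's actual filter: the `ALLOWED_EXTENSIONS` table, the use of `List-Id` as the only list marker, and the `policy_action` name are all assumptions. It reports "quarantine" rather than deleting anything, in line with the advice about techie lists earlier in the thread.

```python
import os
from email.message import EmailMessage

# Hypothetical policy table: attachment extensions allowed per destination.
# The categories and extensions are illustrative assumptions only.
ALLOWED_EXTENSIONS = {
    "list": {".txt", ".diff", ".patch"},
    "user": {".txt", ".jpg", ".png", ".pdf"},
}


def destination(msg):
    """Classify a message as list traffic if it carries a List-Id header."""
    return "list" if msg.get("List-Id") is not None else "user"


def policy_action(msg):
    """Return 'quarantine' if any attachment violates the policy, else 'deliver'."""
    allowed = ALLOWED_EXTENSIONS[destination(msg)]
    for part in msg.walk():
        filename = part.get_filename()
        if filename is None:
            continue  # no filename: body text or a multipart container
        extension = os.path.splitext(filename.lower())[1]
        if extension not in allowed:
            return "quarantine"  # caller should set it aside, not delete it
    return "deliver"


if __name__ == "__main__":
    msg = EmailMessage()
    msg["List-Id"] = "<debian-user.lists.debian.org>"
    msg.set_content("see attached")
    msg.add_attachment(b"MZ", maintype="application",
                       subtype="octet-stream", filename="update.exe")
    print(policy_action(msg))
```

Keeping the policy in a table means a per-list exception is a one-line change rather than a new rule.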

Each tool has its place, and all are complementary:

* Procmail is great for doing fast filtering of messages

* SpamAssassin is great at detecting existing and emerging spam patterns, based
on a user's specific preferences.

* Anti-Virus tools are great at detecting existing and emerging virus/worms,
based on heuristics and other techniques.

* Defang/quarantine is useful for discriminating among types of content.

Just my thoughts.

- Bob

Re: AWL vs. mailing lists

mds at helices

Feb 25, 2004, 12:58 PM

Post #4 of 34

* Bob George <mailings02@ttlexceeded.com> [2004:02:25:14:01:30-0500] scribed:
> "Michael D Schleif" <mds@helices.org> wrote:
>
> > I subscribe to over one hundred (100) mailing lists. procmail processes
> > every incoming message <256kB through SA, unless procmail dumps it
> > first.
>
> In the same boat here, list-wise.
>
> > High volume mailing lists (e.g., debian-user) have an extremely high
> > incidence of ham; therefore, AWL is extremely high for such lists. When
> > the virus/worm-of-the-day spams the list, SA identifies these as spam,
> > then transforms them into ham -- solely due to AWL skewing ;<
>
> Keep in mind that relying on SA or bayes as your SOLE means of filtering
> viruses and worms can be risky. They can be USEFUL certainly, but nasty
> surprises are going to come in a variety of "non-spammish" packaging. And SA IS
> only checking messages smaller than 256KB, so it's not a complete solution in
> any case.
<snip />

Thank you, for your insightful comments.

Nevertheless, I have an elaborate spam/virus plan, just as you and Gary
do.

I think that you missed the intent of my post. Like Gary, I am becoming
disillusioned with AWL results. I am wondering if there is something I
can do to _enhance_ AWL, or should I just turn it OFF?

In the non-mailing list context, I currently have no objections to
AWL. The objections to AWL that I have appear only in mailing list
context. If I understand how spamassassin currently implements AWL,
this all makes sense: mailing lists, as senders of email, tend to have
extremely high ham incidence, and therefore, when they do emit spam, it
is extremely skewed by AWL.

In other words, when I turn OFF AWL, the spam in question scores plenty
high enough and procmail drops it into my is.spam folder. Otherwise,
AWL scores of -25 and lower skew *ALL* mailing list messages.

What am I missing?

Re: AWL vs. mailing lists

mailings02 at ttlexceeded

Feb 25, 2004, 2:25 PM

Post #5 of 34

"Michael D Schleif" <mds@helices.org> wrote:

> Nevertheless, I have an elaborate spam/virus plan, just as you and Gary
> do.

Ah, not trying to preach. Just tossing out ideas for discussion.

> I think that you missed the intent of my post. Like Gary, I am becoming
> disillusioned with AWL results. I am wondering if there is something I
> can do to _enhance_ AWL, or should I just turn it OFF?

Ah, OK. I got thrown by the "virus got by" aspect of the discussion...

> In the non-mailing list context, I currently have no objections with
> AWL. The objections to AWL that I have appear only in mailing list
> context.

So you'd ideally like SELECTIVE disabling of AWL?

I'm playing with --remove-addr-from-whitelist=addr but I'm not sure if the
effect is permanent or not. I ran a list message through spamassassin -R, and
re-ran checks. It claims the address is removed, yet an AWL adjustment still
shows when scoring a message.

> If I understand how spamassassin currently implements AWL,
> this all makes sense: mailing lists, as senders of email, tend to have
> extremely high ham incidence, and therefore, when they do emit spam, it
> is extremely skewed by AWL.

Hmm... I was thinking a meta-rule for list-headers and AWL might work, but a
quick test makes me think not.

A procmail rule that checks for list headers (e.g. List-Id:) then calls
spamassassin without -a perhaps? (Back to the use of procmail for policy-based
decisions).
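The same decision can be prototyped outside procmail. The Python sketch below only builds the command line and does not run anything; it assumes, as this thread does, that `-a` is the 2.6x flag that turns AWL on, and the set of list headers it checks is an illustrative assumption rather than an exhaustive list.

```python
from email import message_from_string

# Headers that list managers commonly add; an illustrative, not exhaustive, set.
LIST_HEADERS = ("List-Id", "List-Post")


def is_list_message(msg):
    """True if any of the list headers is present (lookup is case-insensitive)."""
    return any(msg.get(name) is not None for name in LIST_HEADERS)


def spamassassin_argv(raw_message):
    """Build (but do not run) the spamassassin command line.

    AWL is requested with -a only for mail that does not look like list traffic.
    """
    msg = message_from_string(raw_message)
    argv = ["spamassassin"]
    if not is_list_message(msg):
        argv.append("-a")
    return argv


if __name__ == "__main__":
    list_mail = "List-Id: <debian-user.lists.debian.org>\nSubject: hi\n\nbody\n"
    direct_mail = "From: friend@example.com\nSubject: hi\n\nbody\n"
    print(spamassassin_argv(list_mail))
    print(spamassassin_argv(direct_mail))
```

Returning an argument list instead of a single string avoids the empty-argument problem that a quoted, empty variable would cause in a shell.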

> In other words, when I turn OFF AWL, the spam in question scores plenty
> high enough and procmail drops it into my is.spam folder. Otherwise,
> AWL scores of -25 and lower skew *ALL* mailing list messages.

After some quick checking, it seems most of my list AWL adjustments are small
(.1 or so). To be a factor, the adjustment would have to be significant,
implying the normal list is HIGHLY non-spammy, right? Is that list address
perhaps whitelisted? -25 seems very high to me. So the message is scoring +25
or more without AWL, right?

- Bob

RE: AWL vs. mailing lists

gary at intrepid

Feb 25, 2004, 4:27 PM

Post #6 of 34

> From: Michael D. Schleif [mailto:mds@bragi.private.network]On Behalf Of
> Michael D Schleif
> Sent: Wednesday, February 25, 2004 11:58 AM
>
> In other words, when I turn OFF AWL, the spam in question scores plenty
> high enough and procmail drops it into my is.spam folder. Otherwise,
> AWL scores of -25 and lower skew *ALL* mailing list messages.
>
> What am I missing?

Think about AWL a moment: in the limit, it lets people you trust send
you an occasional spam. Is that what you want? I've got a friend who swears
like a sailor and tosses me a risqué message every now and then; those
messages used to bounce off the walls back in the days when I used a
home-grown spam-phrase checking script to fend off spam. Because of him, I
had to introduce a white list. And in the new SA scheme of things, I think
there's still a place for explicit whitelisting. But, especially when Bayes
is trained and tuned, I don't see much of a case for AWL at all. Bayes will
notice which From addresses and other attributes typically indicate ham, so
why add another level of checking that sometimes works against the other
mechanisms in place?

BTW, you could tune up your procmail script to not call spamassassin with
AWL enabled when receiving e-mail from mailing lists.

(refer to: http://www.professional.org/procmail/listname_id.rc)

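# Enable AWL (-a) by default
AWL="-a"

# Presumably sets LISTNAME for recognized mailing-list messages
INCLUDERC="/etc/procmailrcs/listname_id.rc"

# LISTNAME is non-empty: this is list mail, so drop the -a flag
:0
* ! LISTNAME ?? ^^^^
{ AWL="" }

# $AWL is left unquoted so an empty value adds no (empty) argument
:0 fw:spam.lock
| spamassassin $AWL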

Re: AWL vs. mailing lists

mds at helices

Feb 25, 2004, 4:53 PM

Post #7 of 34

* Gary Funck <gary@intrepid.com> [2004:02:25:15:27:11-0800] scribed:
>
> > From: Michael D. Schleif [mailto:mds@bragi.private.network]On Behalf Of
> > Michael D Schleif
> > Sent: Wednesday, February 25, 2004 11:58 AM
> >
> > In other words, when I turn OFF AWL, the spam in question scores plenty
> > high enough and procmail drops it into my is.spam folder. Otherwise,
> > AWL scores of -25 and lower skew *ALL* mailing list messages.
> >
> > What am I missing?
>
> Think about AWL a moment: in the limit, it let's people you trust send
> you an occasional spam. Is that what you want? I've got a friend, who
> swears like a sailor, and tosses me a risque' message every now and
> then, that used to bounce off the walls back in the days that I used a
> home-grown spam-phrase checking script to fend off spam. Because of
> him, I had to introduce a white list. And in the new, SA, scheme of
> things, I think there's still a place for explicit whitelisting. And,
> especially when Bayes is trained and tuned, I don't see much of a case
> for AWL, at all. Bayes will notice which from addresses and other
> attributes typically indicate ham, so why add another level of
> checking that sometimes works against the other mechanisms in place?

OK, I am probably going to turn OFF AWL for my environment, because it
is becoming clear to me that the value added is not enough to warrant
it. When I do this, I will no longer have _any_ AWL scores in
subsequent messages? I do not need to do `spamassassin -R' on a corpus
of mail?

> BTW, you could tune up your procmail script to not call spamassassin
> with AWL enabled, when receiving e-mail from mailing lists.
>
> (refer to: http://www.professional.org/procmail/listname_id.rc)
>
> AWL="-a"
>
> INCLUDERC="/etc/procmailrcs/listname_id.rc"
>
> :0
> * ! LISTNAME ?? ^^^^
> { AWL = "" }
>
> :0 fw:spam.lock
> | spamassassin "$AWL"

This looks interesting, and I had started to piece this together.
Nevertheless, it looks like `automatic' white listing is not a good fit
with my overall strategy.

Thank you.

Re: AWL vs. mailing lists

rpuhek at etnsystems

Feb 25, 2004, 5:25 PM

Post #8 of 34

Gary Funck wrote:

> Think about AWL a moment: in the limit, it let's people you trust send
> you an occasional spam. Is that what you want? I've got a friend, who swears
> like a sailor, and tosses me a risque' message every now and then, that
> used to bounce off the walls back in the days that I used a home-grown
> spam-phrase checking script to fend off spam. Because of him, I had to
> introduce a white list. And in the new, SA, scheme of things, I think
> there's still a place for explicit whitelisting. And, especially when
> Bayes is trained and tuned, I don't see much of a case for AWL, at all.
> Bayes will notice which from addresses and other attributes typically
> indicate ham, so why add another level of checking that sometimes works
> against the other mechanisms in place?

But Bayes doesn't know that "bob@sailor.org" tends to send messages that
score 10 points while "mary@blandagency.gov" tends to send messages that
score 0 points. It may be able to tell that "bob@sailor.org" is a ham
token, assuming that you've trained Bob's messages as ham.

Seems like AWL is indeed intended to let people I trust occasionally
send me email that _SA thinks is spam_ (slight difference from your
question). So the idea is that the AWL system figures out that your
sailor buddy's messages tend to score high, and AWL compensates for it
automatically (no training needed).

The idea does break a bit with mailing lists and with AWL poisoning
attempts, of course. For that, your procmail example looks perfect.

The other nice thing about the AWL approach vs relying on Bayes...
suppose your friend sends you enough risqué messages that traditional
porn spam indicators are no longer reliable spam tokens? With AWL, at
least there's an association with a specific sender, so your friend's
emails get through, but porn spam gets stopped. The same arguments can
be applied to stocks, prescription medications, mortgages, and even just
HTML email from friends on hotmail.

If the Bayes system was able to associate per sender (so that "new low
rates" was ok from my mortgage broker, but not from just anyone) Bayes
might supersede AWL, but that doesn't seem practical to me.

--Rich

RE: AWL vs. mailing lists

Matthew.van.Eerde at hbinc

Feb 25, 2004, 5:32 PM

Post #9 of 34

> From: Rich Puhek [mailto:rpuhek@etnsystems.com]
> If the Bayes system was able to associate per sender (so that "new low
> rates" was ok from my mortgage broker, but not from just anyone) Bayes
> might supersede AWL, but that doesn't seem practical to me.

!!!!!!

You might have hit on something there. What about giving the From address
double Bayes weight? Or triple weight? Or one-hundred-times weight? This
could serve the dual purpose of auto-learning that hahaha@sexyfun.net is
always bad, and myfriend@example.com is always good.
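The weighting idea can be illustrated with a deliberately simplified combiner. The Python sketch below sums per-token log-odds and counts the From token `from_weight` times. It is a hypothetical illustration only: it is not SpamAssassin's actual Bayes combiner, and the token probabilities are made up.

```python
import math


def log_odds(p):
    """Convert a token's spam probability into log-odds."""
    return math.log(p / (1.0 - p))


def combined_spam_probability(body_token_probs, from_token_prob, from_weight=1):
    """Naive log-odds combination in which the From token counts from_weight times.

    A simplified stand-in for a real Bayes combiner, for illustration only.
    """
    total = sum(log_odds(p) for p in body_token_probs)
    total += from_weight * log_odds(from_token_prob)
    return 1.0 / (1.0 + math.exp(-total))


if __name__ == "__main__":
    spammy_body = [0.9, 0.9, 0.9]   # e.g. "new low rates" style tokens
    trusted_sender = 0.1            # From token mostly seen in ham
    print(combined_spam_probability(spammy_body, trusted_sender, from_weight=1))
    print(combined_spam_probability(spammy_body, trusted_sender, from_weight=100))
```

With a weight of 1, three spammy body tokens outvote the ham-leaning From token; at a weight of 100 the From token decides the message on its own, which also means a forged From address would decide it just as easily.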

RE: AWL vs. mailing lists

gary at intrepid

Feb 25, 2004, 5:44 PM

Post #10 of 34

> From: Rich Puhek [mailto:rpuhek@etnsystems.com]
> Sent: Wednesday, February 25, 2004 4:26 PM
[...]
>
> But Bayes doesn't know that "bob@sailor.org" tends to send messages that
> score 10 points while "mary@blandagency.gov" tends to send messages that
> score 0 points. It may be able to tell that "bob@sailor.org" is a ham
> token, assuming that you've trained Bob's messages as ham.
>
> Seems like AWL is indeed intended to let people I trust occasionally
> send me email that _SA thinks is spam_ (slight difference from your
> question). So the idea is that the AWL system figures out that your
> sailor buddy's messages tend to score high, and AWL compensates for it
> automatically (no training needed).
>

Although I agree with your overall description of AWL, it doesn't in fact
let my friend's messages through, because his messages often score higher.
If they often score high, his AWL score will be high (more spammy), and as
it creeps up it will actually increase the chances that his message goes
above the spam threshold.
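This follows from the adjustment pulling each new score part of the way toward the sender's historical mean, in either direction. A toy Python model, assuming the 2.6x-era formula `adjusted = score + (mean - score) * factor` with `auto_whitelist_factor` at its assumed default of 0.5, and assuming the history records unadjusted scores (neither detail is stated in this post):

```python
class SenderHistory:
    """Toy per-sender auto-whitelist entry.

    Assumptions: the adjustment is (mean - score) * factor with factor 0.5,
    and the stored history holds unadjusted scores.
    """

    def __init__(self, factor=0.5):
        self.total = 0.0
        self.count = 0
        self.factor = factor

    def score(self, raw_score):
        """Return the AWL-adjusted score, then record the raw score."""
        if self.count == 0:
            adjusted = raw_score  # first message from this sender: no history
        else:
            mean = self.total / self.count
            adjusted = raw_score + (mean - raw_score) * self.factor
        self.total += raw_score
        self.count += 1
        return adjusted


if __name__ == "__main__":
    friend = SenderHistory()
    for _ in range(3):
        friend.score(6.0)          # a salty friend who averages 6.0
    print(friend.score(4.0))       # 5.0: pulled up toward his mean
    quiet = SenderHistory()
    quiet.score(0.0)               # a sender whose mail usually scores 0.0
    print(quiet.score(8.0))        # 4.0: pulled down toward the mean
```

In this model the friend's 4.0-point message is lifted to a 5.0 threshold, while a normally clean sender's 8.0-point spam is halved to 4.0, which is the mailing-list complaint that started the thread.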

> The idea does break a bit with mailing lists and with AWL poisoning
> attempts, of course. For that, your procmail example looks perfect.
>

The idea partially breaks with mailing lists, because often the spammers
post their very first message to the list and it is spam. Same would
go for mary@blandagency.gov, if the first message she sent you was spam.
Maybe AWL should start off at 1/5 of the spam threshold, thus 1.0 on a 5.0
scale, slightly biasing in favor of spam upon first arrivals?

I don't think it happens that often that someone who has posted to a mailing
list over a long period of time suddenly turns around and sends a spam
(though it does happen, and some spammers are even using this trick). What
makes mailing lists different is that there are many more senders, and on
average you don't see very many messages from a given sender. Again, maybe
AWL should begin with a bias in favor of judging initial messages as spam?

> The other nice thing about the AWL approach vs relying on Bayes...
> suppose your friend sends you enough risqué messages that traditional
> porn spam indicators are no longer reliable spam tokens?

My approach to this is to outright whitelist joe@sailor.org. Bayes ignores
whitelisting, as I recall, when deciding whether to autolearn.

Same would go for broker@mortgage.com - I just add him to my explicit
white list.

Re: AWL vs. mailing lists

kcivey at cpcug

Feb 25, 2004, 6:26 PM

Post #11 of 34

Michael D Schleif <mds@helices.org> wrote:

> In other words, when I turn OFF AWL, the spam in question scores plenty
> high enough and procmail drops it into my is.spam folder. Otherwise,
> AWL scores of -25 and lower skew *ALL* mailing list messages.

It seems like something's wrong somewhere, possibly a bug in
your version of SA. How are you getting an AWL adjustment of
anything close to -25? Even if the unadjusted spam score for
the message is 30, which is uncommonly high, the historical
average for the sender would have to be -20 in order for the
AWL adjustment to be so large. I thought that the average was
calculated without whitelist scores, so how is your average
getting to be so low?
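The arithmetic above can be checked with a small sketch, assuming the 2.6x-era adjustment `delta = (mean - raw_score) * factor` with `auto_whitelist_factor` at an assumed default of 0.5 (that default is what makes a 30-point message need a -20 mean to earn -25). The function names are hypothetical.

```python
# Assumed 2.6x-era AWL arithmetic; factor 0.5 is taken to be the
# default auto_whitelist_factor.
def awl_delta(mean, raw_score, factor=0.5):
    """Adjustment added to a message's score: (mean - raw_score) * factor."""
    return (mean - raw_score) * factor


def implied_mean(raw_score, delta, factor=0.5):
    """Invert awl_delta: the sender mean that would yield this adjustment."""
    return raw_score + delta / factor


if __name__ == "__main__":
    # A 30-point message with a -25 adjustment
    print(implied_mean(30, -25))   # -20.0
    print(awl_delta(-20, 30))      # -25.0
```

Under the same assumptions, the first table in Post #13 (rules totalling 49.1 points, AWL of -24) implies a sender mean of only about 1.1, so the size of that adjustment would come from the 50-point local rule rather than from an unusually negative history.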

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

Re: AWL vs. mailing lists

mds at helices

Feb 25, 2004, 6:50 PM

Post #12 of 34

* Bob George <mailings02@ttlexceeded.com> [2004:02:25:20:27:00-0700] scribed:
> "Michael D Schleif" <mds@helices.org> wrote:
> > OK, I am probably going to turn OFF AWL for my environment, because
> > it is becoming clear to me that the value added is not enough to
> > warrant it.
>
> Is this a personal, or multi-user system you're running? I'm
> curious how a mailing list address wound up with SUCH a negative
> score, unless you specifically whitelisted it.

Yesterday, I received 414 messages from debian mailing lists. By far,
the most voluminous is debian-user. Plus, debian lists are targets for
spammers -- I do not understand the rationale; but, I have quite the
corpus from those lists. Worse, I have other mailing list examples.

> > When I do this, I will no longer have _any_ AWL scores in
> > subsequent messages? I do not need to do `spamassassin -R' on a
> > corpus of mail?
>
> I did some quick testing after your previous message, and AWL
> scores STILL showed after doing
> both --remove-addr-from-whitelist=list@ddress and -R on list
> messages, calling spamassassin without -a. I'm not sure how to GET
> RID of existing AWL scores! I'd be interested in hearing from
> anyone who has myself.

Yes, I know. All those (apparently) do is zero out the current AWL entry,
which then builds up again.

I have turned OFF AWL completely, and have already noticed that AWL's
negative scores have positive counterparts. Now, the lack of those positive
AWL scores is increasing the number of messages in my grey area (scores 5-6).

What do you think?

Re: AWL vs. mailing lists

mds at helices

Feb 25, 2004, 7:20 PM

Post #13 of 34

* "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:20:26:48-0500] scribed:
> Michael D Schleif <mds@helices.org> wrote:
>
> > In other words, when I turn OFF AWL, the spam in question scores plenty
> > high enough and procmail drops it into my is.spam folder. Otherwise,
> > AWL scores of -25 and lower skew *ALL* mailing list messages.
>
> It seems like something's wrong somewhere, possibly a bug in
> your version of SA. How are you getting an AWL adjustment of
> anything close to -25? Even if the unadjusted spam score for
> the message is 30, which is uncommonly high, the historical
> average for the sender would have to be -20 in order for the
> AWL adjustment to be so large. I thought that the average was
> calculated without whitelist scores, so how is your average
> getting to be so low?

Yes, wrong ;<

My best examples are run through `spamassassin -R', then deposited in my
`not.spam' folder for nightly sa-learn'ing into ham. So, they are *not*
available; but, the following may illustrate my particular problem:

pts rule name description
---- ---------------------- --------------------------------------------------
50 MDS_Remove_Subject MDS - un-subscribe idiocy
-0.9 BAYES_10 BODY: Bayesian spam probability is 10 to 20%
[score: 0.1441]
-24 AWL AWL: Auto-whitelist adjustment

pts rule name description
---- ---------------------- --------------------------------------------------
50 MDS_Subject_Out_of_Office MDS - Out of the Office
-1.5 BAYES_01 BODY: Bayesian spam probability is 1 to 10%
[score: 0.0481]
0.1 HTML_MESSAGE BODY: HTML included in message
-22 AWL AWL: Auto-whitelist adjustment

X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on
bragi.private.network
X-Spam-Level: **
X-Spam-Status: No, hits=2.7 required=5.0 tests=AWL,BAYES_00,
MDS_Subject_Subscribe,RCVD_IN_SORBS autolearn=no version=2.63

pts rule name description
---- ---------------------- --------------------------------------------------
4.1 FROM_OFFERS From address is "at something-offers"
1.9 REMOVE_REMOVAL_1WORD BODY: List removal information
2.2 YOUR_INCOME BODY: Doing something with my income
3.5 MARKETING_PARTNERS BODY: Claims you registered with a partner
0.2 OFFERS_ETC BODY: Stop the offers, coupons, discounts etc!
0.1 EXCUSE_14 BODY: Tells you how to stop further spam
5.4 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
[score: 0.9992]
0.5 REMOVE_PAGE URI: URL of page called "remove"
1.0 URI_OFFERS URI: Message has link to company offers
0.1 BIZ_TLD URI: Contains a URL in the BIZ top-level domain
0.5 FVGT_u_BIZ_SITE URI: FVGT - contains a URL in the BIZ top-level domain
0.1 CLICK_BELOW Asks you to click below
-5.4 AWL AWL: Auto-whitelist adjustment

pts rule name description
---- ---------------------- --------------------------------------------------
50 MDS_Subject_Out_of_Office MDS - Out of the Office
2.1 BAYES_90 BODY: Bayesian spam probability is 90 to 99%
[score: 0.9625]
-17 AWL AWL: Auto-whitelist adjustment

pts rule name description
---- ---------------------- --------------------------------------------------
50 MDS_Subject_Out_of_Office MDS - Out of the Office
-4.9 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
[score: 0.0000]
-20 AWL AWL: Auto-whitelist adjustment

X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on
bragi.private.network
X-Spam-Level: **
X-Spam-Status: No, hits=2.6 required=5.0 tests=AWL,BAYES_00,
MDS_Subject_Subscribe autolearn=no version=2.61

Re: AWL vs. mailing lists

mds at helices

Feb 25, 2004, 7:47 PM

Post #14 of 34

* Bob George <mailings02@ttlexceeded.com> [2004:02:25:21:20:55-0700] scribed:
> "Michael D Schleif" <mds@helices.org> wrote:
<snip />

> > I have turned OFF AWL completely, and have already noticed that AWL
> > negative scores have positive counterparts. Now, lack of those
> > positive AWL scores are increasing my grey-area scores (5 - 6).
>
> You mean that most messages are now somewhat positive (spam)? What
> do you mean by positive counterparts? Is something being ADDED to
> the scores, or just no AWL negative adjustment? (Sorry, just
> confused here.)

I mean that AWL generates *both* positive _and_ negative scores.

The negative scores were perplexing me. I have turned OFF AWL, and now
I realize the effects of the positive AWL scores. I have a grey zone
between scores of 5 and 6, where I am uncertain about their spam-ness.
With AWL, I had very few (2-3/day) grey zone messages. Without AWL, it
looks like I may see 3-5x that number ;<

> > What do you think?
>
> I think that -25 AWL adjustment is at the heart of the problem!

See my other post.

Re: AWL vs. mailing lists

kcivey at cpcug

Feb 25, 2004, 8:03 PM

Post #15 of 34

Michael D Schleif <mds@helices.org> wrote:

> pts rule name description
> ---- ---------------------- --------------------------------------------------
> 50 MDS_Remove_Subject MDS - un-subscribe idiocy
> -0.9 BAYES_10 BODY: Bayesian spam probability is 10 to 20%
> [score: 0.1441]
> -24 AWL AWL: Auto-whitelist adjustment

Okay, so you have an insanely high AWL adjustment because
you've set up a rule with an insanely high score. Even so, the
adjustment only brings the score down to 25, which presumably
is still far above the spam threshold, so what's the problem?

Re: AWL vs. mailing lists

mailings02 at ttlexceeded

Feb 25, 2004, 8:27 PM

Post #16 of 34

"Michael D Schleif" <mds@helices.org> wrote:
> OK, I am probably going to turn OFF AWL for my environment, because it
> is becoming clear to me that the value added is not enough to warrant
> it.

Is this a personal or multi-user system you're running? I'm
curious how a mailing list address wound up with SUCH a negative
score, unless you specifically whitelisted it.

> When I do this, I will no longer have _any_ AWL scores in
> subsequent messages? I do not need to do `spamassassin -R' on a corpus
> of mail?

I did some quick testing after your previous message, and AWL scores
STILL showed after doing both --remove-addr-from-whitelist=list@ddress
and -R on list messages, calling spamassassin without -a. I'm not sure
how to GET RID of existing AWL scores! I'd be interested myself in
hearing from anyone who has.

- Bob

Re: AWL vs. mailing lists

mds at helices

Feb 25, 2004, 9:13 PM

Post #17 of 34

* "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:22:03:46-0500] scribed:
> Michael D Schleif <mds@helices.org> wrote:
>
> > pts rule name description
> > ---- ---------------------- --------------------------------------------------
> > 50 MDS_Remove_Subject MDS - un-subscribe idiocy
> > -0.9 BAYES_10 BODY: Bayesian spam probability is 10 to 20%
> > [score: 0.1441]
> > -24 AWL AWL: Auto-whitelist adjustment
>
> Okay, so you have an insanely high AWL adjustment because
> you've set up a rule with an insanely high score. Even so, the
> adjustment only brings the score down to 25, which presumably
> is still far above the spam threshold, so what's the problem?

The problem is in the scores that I cannot present -- those that I have
-R'd and redisposed -- as I said. Those did *NOT* use
MDS_Remove_Subject; but, were from same mailing lists. They *ARE* spam;
but, scored as ham.

NOTE: I raised scores of MDS_Remove_Subject, and its ilk, in response to
AWL scores that continually dragged these messages from the ranks of
spam into the ranks of ham. Of course, I did this before I better
understood the AWL process. Perhaps, I should never have used AWL --
perhaps, procmail is a better spam fighting tool than spamassassin ;>

Nevertheless, skewing my point by taking only one (1) of my examples
does not bolster the value of AWL. In point of fact, I have witnessed
AWL -- as you call it, `insanely high AWL adjustment' -- skew mailing
list messages, and I began this thread in hopes of understanding how to
avoid this and to use AWL to my benefit.

Unfortunately, this thread has only driven me from AWL, because its
implementation appears skewed, especially in mailing list context.

And, nobody has offered evidence to the contrary. Nor has anybody
offered a way to reconfigure my AWL to provide value to my environment.
I can live without AWL, and I am doing so now.

Have a great day!

Re: AWL vs. mailing lists

mailings02 at ttlexceeded

Feb 25, 2004, 9:14 PM

Post #18 of 34

Gary Funck wrote:
>> From: Rich Puhek [mailto:rpuhek@etnsystems.com]
> [...]
>> Seems like AWL is indeed intended to let people I trust occasionally
>> send me email that _SA thinks is spam_ (slight difference from your
>> question).

And vice-versa I suppose.

>> So the idea is that the AWL system figures out that your
>> sailor buddy's messages tend to score high, and AWL compensates for
>> it automatically (no training needed).
>
> Although I agree with your overall description of AWL, it doesn't in
> fact let my friend's messages through because his messages often
> score higher. If the often score high, his AWL score will be high
> (more spammy), and as it creeps up will actually increase the chances
> that his message goes above the spam threshold.

I think a manual whitelist entry is probably more appropriate for
those spammy friends... and this certainly raises questions about
the wisdom of auto-bayes training off ALL ham.

But yes, my understanding is that it brings messages more towards
the overall norm for the sender in terms of scoring.

>> The idea does break a bit with mailing lists and with AWL poisoning
>> attempts, of course. For that, your procmail example looks perfect.
>
> The idea partially breaks with mailing lists, because often the
> spammers post their very first message to the list and it is spam.

Hmm. In the case of a one-off poster, does AWL come into play?
Hopefully, any frequent spammer will be moderated or banned from a
list. It varies by list, but most I'm on have the From: field set
to the original sender, and THIS is the address considered in AWL.
So messages to the list FROM FREQUENT POSTERS are affected by AWL,
not all.
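A minimal Python sketch of the keying Bob describes: the history lookup uses the original poster's From: address, so the list's own posting address never enters the key. `awl_key` is a hypothetical helper, not SpamAssassin code. The optional `origin_ip` suffix (first two octets of the originating IP) is an assumption about how SpamAssassin narrows the key; check the AutoWhitelist module for your version before relying on it.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Optional


def awl_key(raw_message: str, origin_ip: Optional[str] = None) -> str:
    """Hypothetical AWL lookup key for one message.

    Uses the From: header (the original poster on most lists), not the
    list's Sender: or List-Id: headers. The address is lower-cased here
    for illustration. If origin_ip is given, only its first two octets
    are appended (an assumption about SpamAssassin's keying).
    """
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    key = address.lower()
    if origin_ip:
        key += "|ip=" + ".".join(origin_ip.split(".")[:2])
    return key
```

Two posts by the same person through the same list therefore share one history entry, which is why only frequent posters accumulate meaningful history.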

I've yet to see an AWL adjustment of more than -2, and certainly
nothing like the -25 described by Michael.

> Same would go for mary@blandagency.gov, if the first message she
> sent you was spam. Maybe AWL should start off at 1/5 of the spam
> threshold, thus 1.0 on a 5.0 scale, slightly biasing in favor of spam
> upon first arrivals?

Yes (based on what I know -- admittedly only as a lowly end-user).

> I don't think it happens that often that someone who has posted to a
> mailing list over a long period of time suddenly turns around and
> sends a spam (though it does happen, and some spammers are even using
> this trick).

I suppose forged messages would be a problem. Still, other SA
methods would, I think, generally override a +/-2 offset.

> What makes mailing lists different is that there are
> many more senders, and on average you don't see very many messages
> from a given sender.

So AWL really won't come into play, will it? On the busiest lists
I'm on, AWL adjustments are generally in the -1 to 0 range: minor
"nudges". I'd only really expect AWL to matter if a VERY trusted,
frequent sender sends (or forwards) the odd spam.

> Again, maybe AWL should begin with a bias in
> favor of judging initial messages as spam?

It starts with none at all, which seems prudent to me.

- Bob

>> The other nice thing about the AWL approach vs relying on Bayes...
>> suppose your friend sends you enough risqué messages that traditional
>> porn spam indicators are no longer reliable spam tokens?
>
> My approach to this is to outright whitelist joe@sailor.org. Bayes
> ignores whitelisting as I recall, in deciding whether to autolearn or
> not.
>
> Same would go for broker@mortgage.com - I just add him to my explicit
> white list.

Re: AWL vs. mailing lists [ In reply to ]

mailings02 at ttlexceeded

Feb 25, 2004, 9:20 PM

Post #19 of 34 (3691 views)

"Michael D Schleif" <mds@helices.org> wrote:
> Yesterday, I received 414 messages from debian mailing lists. By far,
> the most voluminous is debian-user. Plus, debian lists are targets for
> spammers -- I do not understand the rationale; but, I have quite the
> corpus from those lists. Worse, I have other mailing list examples.

I was once subscribed to that list, but found it a bit overwhelming. :)

Aren't messages still posted with From: set to the original
sender? Or are they From: a moderator address (making me wonder
how spam got out)?

> Yes, I know. All those (apparently) do are zero out the current AWL,
> then build again.

I'm leaning more and more towards different 'policy' for lists.
AWL works for me, but thinking about it, I can see where I might
not want to 'trust' it for more public forums. I'm considering
either calling spamassassin (rather than spamc) without the -a
option, or running another spamd without it (which I need to
test).

Running messages through both in parallel might be interesting.

> I have turned OFF AWL completely, and have already noticed that AWL
> negative scores have positive counterparts. Now, lack of those positive
> AWL scores are increasing my grey-area scores (5 - 6).

You mean that most messages are now somewhat positive (spam)? What
do you mean by positive counterparts? Is something being ADDED to
the scores, or just no AWL negative adjustment? (Sorry, just
confused here.)

> What do you think?

I think that -25 AWL adjustment is at the heart of the problem!

BUT, if AWL isn't gaining you anything, there's no need to run it
either.

- Bob

Re: AWL vs. mailing lists [ In reply to ]

mailings02 at ttlexceeded

Feb 25, 2004, 9:37 PM

Post #20 of 34 (3694 views)

Michael D Schleif wrote:

> [...]
>
>The problem is in the scores that I cannot present -- those that I have
>-R'd and redisposed -- as I said. Those did *NOT* use
>MDS_Remove_Subject; but, were from same mailing lists. They *ARE* spam;
>but, scored as ham.
>
>
So... you're saying AWL adjustments are still being made, even after
disabling AWL (-a)?

>NOTE: I raised scores of MDS_Remove_Subject, and its ilk, in response to
>AWL scores that continually dragged these messages from the ranks of
>spam into the ranks of ham.
>
I am wondering how this even happened. AWL is based on the SENDER
address, which will (typically) be the actual sender -- not list --
address. So those individual accounts must've had incredibly high
non-spam scores. Are some perhaps whitelisted manually?

>Of course, I did this before I better
>understood the AWL process. Perhaps, I should never have used AWL
>
If you don't WANT to use AWL, that's fine. But I don't think AWL is
broken to the point of causing anything like the scoring you're
seeing... at least not without some help. I'm not trying to say YOU
should use AWL, but I don't think it's correct to say AWL normally
causes the behavior you're seeing either. And I'm not convinced AWL on a
mailing list is a problem.

>perhaps, procmail is a better spam fighting tool than spamassassin ;>
>
>
So's the delete key for that matter! But SA sure makes things a lot EASIER.

>Nevertheless, skewing my point by taking only one (1) of my examples
>does not bolster the value of AWL. In point of fact, I have witnessed
>AWL -- as you call it, `insanely high AWL adjustment' -- skew mailing
>list messages, and I began this thread in hopes of understanding how to
>avoid this and to use AWL to my benefit.
>
>
ARE YOU WHITELISTING ADDRESSES? Doing any other manual
tweaks/improvements to the rules?

>Unfortunately, this thread has only driven me from AWL, because its
>implementation appears skewed, especially in mailing list context.
>
>
Whitelist surely doesn't fit in all scenarios. But again, I'm puzzled as
to how you got to -25 AWL scores.

>And, nobody has offered evidence to the contrary.
>
Not "evidence", perhaps, but SOMETHING caused AWL to give that -25
adjustment... Again, I see +/- 2 max. adjustment most of the time for
normal addresses.

>Nor has anybody
>offered a way to reconfigure my AWL to provide value to my environment.
>I can live without AWL, and I am doing so now.
>
>
Wait, I'm confused. I understood you were having problems even WITHOUT
AWL. Is something still not working, or just "spammy" messages are
getting through? What rules are you using?

- Bob

Re: AWL vs. mailing lists [ In reply to ]

kcivey at cpcug

Feb 25, 2004, 9:55 PM

Post #21 of 34 (3695 views)

Michael D Schleif <mds@helices.org> wrote:

> The problem is in the scores that I cannot present -- those that I have
> -R'd and redisposed -- as I said. Those did *NOT* use
> MDS_Remove_Subject; but, were from same mailing lists. They *ARE* spam;
> but, scored as ham.

They also wouldn't have had an adjustment of -25, which is what
people have been surprised by in your description of your
situation. The -25 results from the very high spam score
caused by the 50-point rule. Without it, the adjustment would
be much less -- though of course it will certainly sometimes
cause messages' scores to be moved from spam range to the
nonspam range.

> Nevertheless, skewing my point by taking only one (1) of my examples
> does not bolster the value of AWL. In point of fact, I have witnessed
> AWL -- as you call it, `insanely high AWL adjustment' -- skew mailing
> list messages, and I began this thread in hopes of understanding how to
> avoid this and to use AWL to my benefit.

And people have wanted more information about your situation,
because they haven't experienced anything like the problems you
have. Thank you for posting the examples. I apologize for
selecting one of them, but I didn't understand why you were
posting it or the other four high-scoring examples you posted.
They were classified as spam, so the system seems to be working
as intended in those cases (I'm assuming you've already read
http://wiki.spamassassin.org/w/AwlWrongWay ).

Along with those five examples of correctly identified spams,
you did post two examples where the AWL adjustment may have
caused a false negative. But in both cases the message
triggered BAYES_00, which probably had a larger contribution to
the miscategorization. Judging by those I'd say the real
problem is spam being incorrectly autolearned as ham. Spam
should not be getting BAYES_00 or BAYES_01, as it is in the
majority of your examples. Maybe you should be blaming
autolearning rather than autowhitelisting.

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

Re: AWL vs. mailing lists [ In reply to ]

kcivey at cpcug

Feb 25, 2004, 10:00 PM

Post #22 of 34 (3692 views)

Bob George <mailings02@ttlexceeded.com> wrote:

> Not "evidence", perhaps, but SOMETHING caused AWL to give that -25
> adjustment... Again, I see +/- 2 max. adjustment most of the time for
> normal addresses.

As do most people, but if you set up rules in such a way that
you had messages with scores of 50 or so, as Michael does, then
you would see AWL adjustments near -25. The higher the score,
the higher the AWL adjustment will be, because it's half the
difference between the sender's average score and the score for
the current message. It still shouldn't be a problem, because
taking 25 points off a score of 50 still leaves it firmly
within the spam range.
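Keith's rule can be written as a one-line formula. The sketch below assumes an adjustment factor of 0.5 (SpamAssassin exposes this as the `auto_whitelist_factor` setting; confirm the default for your version) and that `sender_mean` is the average of the sender's earlier scores. The function names are hypothetical.

```python
def awl_adjustment(current_score: float, sender_mean: float, factor: float = 0.5) -> float:
    """AWL delta: move the current score part of the way toward the sender's mean."""
    return (sender_mean - current_score) * factor


def final_score(current_score: float, sender_mean: float, factor: float = 0.5) -> float:
    """Score after the AWL delta is applied."""
    return current_score + awl_adjustment(current_score, sender_mean, factor)


# A regular poster whose past mail averaged 0.0, hit by a 50-point custom rule:
print(awl_adjustment(50.0, 0.0))  # -25.0
print(final_score(50.0, 0.0))     # 25.0, still far above a 5.0 threshold

# A sender whose history averages 8.0 sends a 4.5-point message:
print(awl_adjustment(4.5, 8.0))   # 1.75
print(final_score(4.5, 8.0))      # 6.25
```

The second case shows that the same formula also produces positive adjustments: a sender with a spammy history pushes a borderline message upward.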

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

Re: AWL vs. mailing lists [ In reply to ]

mds at helices

Feb 25, 2004, 10:03 PM

Post #23 of 34 (3693 views)

* Bob George <mailings02@ttlexceeded.com> [2004:02:25:23:37:49-0500] scribed:
> Michael D Schleif wrote:
> >The problem is in the scores that I cannot present -- those that I have
> >-R'd and redisposed -- as I said. Those did *NOT* use
> >MDS_Remove_Subject; but, were from same mailing lists. They *ARE* spam;
> >but, scored as ham.
>
> So... you're saying AWL adjustments are still being made, even after
> disabling AWL (-a)?

I am not sure yet; but, my point in that quoted statement is that AWL
scores can be both negative *AND* positive. Without AWL, some positive
scores will be *LESS* positive than with AWL. I will learn to deal with
this; but, it remains remark-able ;>

> >NOTE: I raised scores of MDS_Remove_Subject, and its ilk, in response to
> >AWL scores that continually dragged these messages from the ranks of
> >spam into the ranks of ham.
> >
> I am wondering how this even happened. AWL is based on the SENDER
> address, which will (typically) be the actual sender -- not list --
> address. So those individual accounts must've had incredibly high
> non-spam scores. Are some perhaps whitelisted manually?

I, too, am puzzled by this. I have not read the AWL code, nor am I
currently of a mind to do so. Clearly, the `From:' of the subject
mailing list messages is the original sender, not the list. And,
empirically, the AWL scores are highly positive.

> >Of course, I did this before I better
> >understood the AWL process. Perhaps, I should never have used AWL
> >
> If you don't WANT to use AWL, that's fine. But I don't think AWL is
> broken to the point of causing anything like the scoring your're
> seeing... at least not without some help. I'm not trying to say YOU
> should use AWL, but I don't think it's correct to say AWL normally
> causes the behavior you're seeing either. And I'm not convinced AWL on a
> mailing list is a problem.
>
> >perhaps, procmail is a better spam fighting tool than spamassassin ;>
> >
> So's the delete key for that matter! But SA sure makes things a lot EASIER.
>
> >Nevertheless, skewing my point by taking only one (1) of my examples
> >does not bolster the value of AWL. In point of fact, I have witnessed
> >AWL -- as you call it, `insanely high AWL adjustment' -- skew mailing
> >list messages, and I began this thread in hopes of understanding how to
> >avoid this and to use AWL to my benefit.
> >
> ARE YOU WHITELISTING ADDRESSES? Doing any other manual
> tweaks/improvements to the rules?

NO I AM NOT WHITELISTING *A*N*Y* ADDRESSES !?!?

Yes, as you know, I have some custom rules -- some of my own device,
some from the wiki:

~/.spam.MDS.rules.cf
~/.spam.custom.rules.cf

> >Unfortunately, this thread has only driven me from AWL, because its
> >implementation appears skewed, especially in mailing list context.
> >
> Whitelist surely doesn't fit in all scenarios. But again, I'm puzzled as
> to how you got to -25 AWL scores.

Yes, I am also puzzled.

> >And, nobody has offered evidence to the contrary.
> >
> Not "evidence", perhaps, but SOMETHING caused AWL to give that -25
> adjustment... Again, I see +/- 2 max. adjustment most of the time for
> normal addresses.

I receive 3,000 - 4,000 email messages per day. Perhaps, the problem
lies in the volume? Or in the mix?

> >Nor has anybody
> >offered a way to reconfigure my AWL to provide value to my environment.
> >I can live without AWL, and I am doing so now.
> >
> Wait, I'm confused. I understood you were having problems even WITHOUT
> AWL. Is something still not working, or just "spammy" messages are
> getting through? What rules are you using?

It is not so much a problem _without_ AWL, as it is an adjustment to
living without AWL. AWL impacts scores both + and -. With AWL, I get
far too many false _negatives_, or spam labeled as ham. Without AWL, I
am getting some false _negatives_ of a different kind: spam labeled
as grey. At this point, I can more easily deal with the latter.

Thank you.

--
Best Regards,

mds

Re: AWL vs. mailing lists [ In reply to ]

mailings02 at ttlexceeded

Feb 25, 2004, 10:10 PM

Post #24 of 34 (3696 views)

Michael D Schleif wrote:

>* Bob George <mailings02@ttlexceeded.com> [2004:02:25:21:20:55-0700] scribed:
>
>
> [...]
>
>>You mean that most messages are now somewhat positive (spam)? What
>>do you mean by positive counterparts? Is something being ADDED to
>>the scores, or just no AWL negative adjustment? (Sorry, just
>>confused here.)
>>
>>
>
>I mean that AWL generates *both* positive _and_ negative scores.
>
>The negative scores were perplexing me. I have turned OFF AWL, and now
>I realize the effects of the positive AWL scores. I have a grey zone
>between scores of 5 and 6, where I am un-certain about their spam-ness.
>With AWL, I had very few (2-3/day) grey zone messages. Without AWL, it
>looks like I may see 3-5x that number ;<
>
>
Ah, so formerly "spam" messages are now scored borderline and thus under
the threshold?

Are you using any of the add-on rule sets? Bayes? Until bayes kicks in,
there are many messages that I see that might squeak past the defaults.

- Bob

Re: AWL vs. mailing lists [ In reply to ]

mds at helices

Feb 25, 2004, 10:18 PM

Post #25 of 34 (3699 views)

* "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:23:55:15-0500] scribed:
> Michael D Schleif <mds@helices.org> wrote:
>
> > The problem is in the scores that I cannot present -- those that I have
> > -R'd and redisposed -- as I said. Those did *NOT* use
> > MDS_Remove_Subject; but, were from same mailing lists. They *ARE* spam;
> > but, scored as ham.
>
> They also wouldn't have had an adjustment of -25, which is what
> people have been surprised by in your description of your
> situation. The -25 results from the very high spam score
> caused by the 50-point rule. Without it, the adjustment would
> be much less -- though of course it will certainly sometimes
> cause messages' scores to be moved from spam range to the
> nonspam range.

I wish that I had saved the ideal examples. However, once I do
`spamassassin -R', and reprocess those messages, those examples are
gone. After the fact, I decided to post to this list. You have seen
the best examples that I have archived. MDS_Remove_Subject originally
scored +5, then 10, 20 and now 50, because these _obvious_, cut-and-dried
culprits should never be mistaken for ham -- never.

> > Nevertheless, skewing my point by taking only one (1) of my examples
> > does not bolster the value of AWL. In point of fact, I have witnessed
> > AWL -- as you call it, `insanely high AWL adjustment' -- skew mailing
> > list messages, and I began this thread in hopes of understanding how to
> > avoid this and to use AWL to my benefit.
>
> And people have wanted more information about your situation,
> because they haven't experienced anything like the problems you
> have. Thank you for posting the examples. I apologize for
> selecting one of them, but I didn't understand why you were
> posting it or the other four high-scoring examples you posted.
> They were classified as spam, so the system seems to be working
> as intended in those cases (I'm assuming you've already read
> http://wiki.spamassassin.org/w/AwlWrongWay ).

I posted those examples, such as they are, as evidence that I really get
AWL scores that high -- since some skeptics argued that that is not
possible. In fact, so far, everybody has accepted my examples as
examples of high AWL scores. To now argue that they are inadequate
examples of something else, is arguably perverse ;>

> Along with those five examples of correctly identified spams,
> you did post two examples where the AWL adjustment may have
> caused a false negative. But in both cases the message
> triggered BAYES_00, which probably had a larger contribution to
> the miscategorization. Judging by those I'd say the real
> problem is spam being incorrectly autolearned as ham. Spam
> should not be getting BAYES_00 or BAYES_01, as it is in the
> majority of your examples. Maybe you should be blaming
> autolearning rather than autowhitelisting.

Hopefully, you notice that _those_ examples also include:

score MDS_Subject_Subscribe +15.0

I'll leave the math to those so inclined ;>

Actually, on further review, I thought that I showed an example with
that rule and its score; but, you get the idea. I have several custom
rules for which I have repeatedly increased the score, because AWL was
taking it away from me. Again, perhaps these types of personal spam are
handled better by procmail.

P.S., Given what I do know about AWL, how can one message have _both_
BAYES_99 *AND* AWL of -5.4 ???

--
Best Regards,

mds

Re: AWL vs. mailing lists [ In reply to ]

mds at helices

Feb 25, 2004, 10:21 PM

Post #26 of 34 (770 views)

* "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:23:55:15-0500] scribed:
<snip />

> Along with those five examples of correctly identified spams,
> you did post two examples where the AWL adjustment may have
> caused a false negative. But in both cases the message
> triggered BAYES_00, which probably had a larger contribution to
> the miscategorization. Judging by those I'd say the real
> problem is spam being incorrectly autolearned as ham. Spam
> should not be getting BAYES_00 or BAYES_01, as it is in the
> majority of your examples. Maybe you should be blaming
> autolearning rather than autowhitelisting.

I am intrigued by your last sentence. Care to expound?

--
Best Regards,

mds

Re: AWL vs. mailing lists [ In reply to ]

kcivey at cpcug

Feb 25, 2004, 10:41 PM

Post #27 of 34 (772 views)

Michael D Schleif <mds@helices.org> wrote:
> * "Keith C. Ivey" <kcivey@cpcug.org> [2004:02:25:23:55:15-0500] scribed:
> <snip />
>
> > Along with those five examples of correctly identified spams,
> > you did post two examples where the AWL adjustment may have
> > caused a false negative. But in both cases the message
> > triggered BAYES_00, which probably had a larger contribution to
> > the miscategorization. Judging by those I'd say the real
> > problem is spam being incorrectly autolearned as ham. Spam
> > should not be getting BAYES_00 or BAYES_01, as it is in the
> > majority of your examples. Maybe you should be blaming
> > autolearning rather than autowhitelisting.
>
> I am intrigued by your last sentence. Care to expound?

It seems an obvious conclusion from the preceding sentences in
the paragraph. When spam is getting BAYES_00 or BAYES_01, it's
a strong indication that some spam has been mislearned as ham.
Usually such mislearning happens because of autolearning,
though it could also happen by mistakes in manual learning.

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

Re: AWL vs. mailing lists [ In reply to ]

kcivey at cpcug

Feb 25, 2004, 10:47 PM

Post #28 of 34 (769 views)

Michael D Schleif <mds@helices.org> wrote:

> I posted those examples, such as they are, as evidence that I really get
> AWL scores that high -- since some skeptics argued that that is not
> possible. In fact, so far, everybody has accepted my examples as
> examples of high AWL scores. To now argue that they are inadequate
> examples of something else, is arguably perverse ;>

Yes, I believe that you have gotten AWL adjustments of -25, but
they're irrelevant to the problem. They're not causing spam to
be miscategorized, so those high AWL adjustments don't matter.
They're a natural side effect of high scores.

You haven't posted the details of cases where AWL has caused
miscategorization, but I don't believe that the AWL adjustment
in those cases is anything like -25.

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

RE: AWL vs. mailing lists [ In reply to ]

Tom.Meunier at courts

Feb 25, 2004, 11:20 PM

Post #29 of 34 (771 views)

-----Original Message-----
> From: Michael D. Schleif [mailto:mds@bragi.private.network]
>
> P.S., Given what I do know about AWL, how can one message have _both_
> BAYES_99 *AND* AWL of -5.4 ???
>
>

That is the most basic misunderstanding of what AWL is. This is exactly
what's being asked in http://wiki.spamassassin.org/w/AwlWrongWay. If I
send you 10000 emails that all score 10, and I send you one that scores
20.8, it will give it an AWL of -5.4: (10.0 - 20.8)/2 = -5.4, which pulls
the final score to (20.8 + 10.0)/2 = 15.4. Which tests fired, Bayes or
any other, is irrelevant; AWL only sees the total score. It's simply a
score averaging system.

If a spammer sends me 10000 emails that score 25, then ONE email that
scores 75, there will be a -25 AWL. Which I believe is what you saw
when you suddenly introduced a 50-point test into your configuration.
Of *course* it's possible. Same as it's possible for me to make a
900-point test on your signature, and all of a sudden the first time you
send me an email your AWL average will be +900, even if it's a direct reply to
something I sent you and not spam. Then I remove the test, and your AWL
will still be high, but decrease with each subsequent low-scoring,
BAYES_00, bonded-sender, habeas-stamped email.
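Tom's examples can be reproduced with a small running-average store. This is a sketch, not SpamAssassin's implementation: it assumes each sender's entry holds a total and a count of pre-AWL scores, that a sender with no history gets no adjustment, and a factor of 0.5. `SenderHistory` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    """Running total/count of pre-AWL scores for one sender (a sketch)."""
    total: float = 0.0
    count: int = 0
    factor: float = 0.5

    def score_message(self, raw_score: float) -> float:
        """Return the AWL-adjusted score, then record the raw score."""
        if self.count == 0:
            adjusted = raw_score  # no history yet, so no adjustment
        else:
            mean = self.total / self.count
            adjusted = raw_score + (mean - raw_score) * self.factor
        self.total += raw_score
        self.count += 1
        return adjusted


# 10000 messages at 10, then one at 20.8: adjusted to about 15.4 (AWL of -5.4).
tom = SenderHistory()
for _ in range(10000):
    tom.score_message(10.0)
print(round(tom.score_message(20.8), 1))  # 15.4

# A 900-point first message, then the rule is removed and mail scores 0.0.
friend = SenderHistory()
friend.score_message(900.0)
print([friend.score_message(0.0) for _ in range(4)])  # [450.0, 225.0, 150.0, 112.5]
```

The 900-point entry decays by averaging rather than being reset, which is why a bad rule keeps affecting a sender's later mail after it is removed.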

Everybody misunderstands this. (Even me - I picked insanely high
numbers of emails as an example so I didn't have to do the math right
now). Maybe we should just pretend that AWL stands for "Average Weight
L-of-all-the-emails-this-person-has-sent-to-us"

-tom

Re: AWL vs. mailing lists [ In reply to ]

rpuhek at etnsystems

Feb 26, 2004, 8:57 AM

Post #30 of 34 (768 views)

Gary Funck wrote:

>
> Although I agree with your overall description of AWL, it doesn't in fact
> let my friend's messages through because his messages often score higher.
> If they often score high, his AWL score will be high (more spammy), and as
> it creeps up will actually increase the chances that his message goes above
> the spam threshold.
>
Got it... hmmm, I suppose that's because he sends more "spammy looking"
mail than non-spammy. If he normally sent very clearly hammy messages,
then sends a very blue joke email once in a while, AWL would probably
work as intended.

>
>>The idea does break a bit with mailing lists and with AWL poisoning
>>attempts, of course. For that, your procmail example looks perfect.
>>
>
>
> The idea partially breaks with mailing lists, because often the spammers
> post their very first message to the list and it is spam. Same would
> go for mary@blandagency.gov, if the first message she sent you was spam.
> Maybe AWL should start off at 1/5 of the spam threshold, thus 1.0 on a 5.0
> scale, slightly biasing in favor of spam upon first arrivals?
>
Or at least do a slow start, where AWL doesn't really kick in until it
sees a few messages.
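Neither Gary's starting bias nor Rich's slow start is how AWL behaves in this thread (Bob notes it starts with no bias at all). The hypothetical class below sketches both proposals side by side, assuming a running mean of pre-AWL scores and a factor of 0.5: `new_sender_bias` adds a fixed offset to a sender's first message(s), and `min_count` withholds averaging until that many messages are on record.

```python
from dataclasses import dataclass


@dataclass
class StartupAwareHistory:
    """Hypothetical AWL variant; not SpamAssassin behaviour.

    new_sender_bias: fixed offset for a sender's first message(s)
        (Gary's idea: 1.0, i.e. 1/5 of a 5.0 threshold).
    min_count: messages that must be on record before averaging starts
        (Rich's "slow start"); 0 behaves like plain averaging.
    """
    new_sender_bias: float = 0.0
    min_count: int = 0
    factor: float = 0.5
    total: float = 0.0
    count: int = 0

    def score_message(self, raw_score: float) -> float:
        if self.count < max(self.min_count, 1):
            # Not enough history: apply only the fixed startup bias.
            adjusted = raw_score + self.new_sender_bias
        else:
            mean = self.total / self.count
            adjusted = raw_score + (mean - raw_score) * self.factor
        # Record the raw (pre-AWL) score; an assumption of this sketch.
        self.total += raw_score
        self.count += 1
        return adjusted
```

With the defaults, a sender whose first message scored -2.0 gets a 7.0-point spam pulled down to 2.5; with `min_count=3` it stays at 7.0, and with `new_sender_bias=1.0` a 4.5-point first message lands at 5.5.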

> I don't think it happens that often that someone who has posted to a mailing
> list over a long period of time suddenly turns around and sends a spam
> (though it does happen, and some spammers are even using this trick). What
> makes mailing lists different is that there are many more senders, and on
> average you don't see very many messages from a given sender. Again, maybe
> AWL should begin with a bias in favor of judging initial messages as spam?
>
>
>>The other nice thing about the AWL approach vs relying on Bayes...
>>suppose your friend sends you enough risqué messages that traditional
>>porn spam indicators are no longer reliable spam tokens?
>
>
> My approach to this is to outright whitelist joe@sailor.org. Bayes ignores
> whitelisting as I recall, in deciding whether to autolearn or not.
>
> Same would go for broker@mortgage.com - I just add him to my explicit
> white list.
>
Good call. Although that's a bit of a pain site-wide. Wondering how a
program like SA could automatically determine that joe@sailor.org is
legit. A semi-automatic way would be if the Bayes learn process also
updated the AWL... ham senders get an AWL ham bias, spam senders get an
AWL spam bias. Don't know how practical that is versus just manually
maintaining whitelist entries.

--Rich

RE: AWL vs. mailing lists [ In reply to ]

gary at intrepid

Feb 26, 2004, 9:10 AM

Post #31 of 34 (770 views)

> From: Rich Puhek
> Sent: Thursday, February 26, 2004 7:57 AM
[...]
> > My approach to this is to outright whitelist joe@sailor.org. Bayes ignores
> > whitelisting as I recall, in deciding whether to autolearn or not.
> >
> > Same would go for broker@mortgage.com - I just add him to my explicit
> > white list.
> >
> Good call. Although that's a bit of a pain site-wide. Wondering how a
> program like SA could automatically determine that joe@sailor.org is
> legit. A semi-automatic way would be if the Bayes learn process also
> updated the AWL... ham senders get an AWL ham bias, spam senders get an
> AWL spam bias. Don't know how practical that is versus just manually
> maintaining whitelist entries.
>

I agree. Manual whitelisting is a pain, and so is user-level re-categorization
of misfiled ham/spam. What is needed is a nice GUI for managing those tasks.

Regarding whitelisting, it's been mentioned before (and I agree that it
sounds promising) to look at the outgoing mail stream for whitelisting
clues. If we assume that users generally only send mail to addresses
they wish to engage in a conversation with, then incoming mail from an
address that originally appeared in an outgoing mail should be biased
toward ham. This feedback loop could be accomplished either as an MTA
filter, or possibly just a program that monitors the mail logs.
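A minimal sketch of Gary's outgoing-mail feedback loop, assuming the outgoing messages are available as raw RFC 822 text (for example from a copy kept by the MTA). `correspondents` and `ham_bias` are hypothetical helpers, and the -1.0 bias is an arbitrary illustrative value, not a SpamAssassin score.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def correspondents(outgoing_raw_messages) -> set:
    """Collect every address local users have written to (To/Cc/Bcc)."""
    known = set()
    for raw in outgoing_raw_messages:
        msg = message_from_string(raw)
        fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
        for _, addr in getaddresses(fields):
            if addr:
                known.add(addr.lower())
    return known


def ham_bias(incoming_raw: str, known: set, bias: float = -1.0) -> float:
    """Hypothetical score nudge for mail from someone we have written to."""
    _, sender = parseaddr(message_from_string(incoming_raw).get("From", ""))
    return bias if sender.lower() in known else 0.0
```

As Gary notes, the loop trusts whatever the local users send, so a compromised client would quietly teach it the wrong addresses.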

A simple Bayes system that learns spam only from spam traps, and learns
ham only from the outgoing mail stream (some adjustments would have to be
made for things like From/To addresses and Received lines), might be an
interesting cut at a totally automated setup. That works until one of the
users decides to go into the spam mailing business, or his PC is co-opted
by a spammer's worm.

Re: AWL vs. mailing lists [ In reply to ]

mailings02 at ttlexceeded

Feb 26, 2004, 9:59 AM

Post #32 of 34 (769 views)

Rich Puhek <rpuhek@etnsystems.com> wrote:
> Good call. Although that's a bit of a pain site wide.
> Wondering how a program like SA could automatically determine
> that joe@sailor.org is legit. A semi-automatic way would be if
> the Bayes learn process also updated the AWL... ham senders
> get a AWL ham bias spam senders get an AWL spam bias.

Hmm. I have bayes auto turned on, and I don't think it does that. However, I
also capture all flagged spam for bayes training (other bayes tools), so
extracting the offending sender and doing --add-to-blacklist= works, though
it's probably futile given that they rotate addresses. Maybe just REMOVE those
addresses to make sure they don't clutter up the AWL.

In the case of "good" posters, one could easily set up an "add to whitelist"
folder in the same way a bayes training folder is done. The folder gets scanned
periodically, and the sender extracted and fed to spamassassin
with --add-to-whitelist=.

I see "add-to-whitelist", "remove-from-whitelist", "add-to-blacklist" and
"remove-from-blacklist" folders. Cumbersome, but any of the techniques for
bayes updating could be adapted.
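Bob's folder idea might look like the following sketch. It only builds the command lines; running them (for example with `subprocess.run(cmd, check=True)`) is left to the caller. The folder-to-option mapping is hypothetical, and the option names are an assumption based on the `spamassassin` man page for later releases, where `--add-addr-to-whitelist=addr` and its siblings take a single address while `--add-to-whitelist` reads a whole message on stdin; check `spamassassin --help` on your installation. Bob's remove-from-blacklist folder is omitted because no matching single-address option is assumed here.

```python
import mailbox
from email.utils import parseaddr

# Hypothetical mapping from training-folder name to spamassassin option.
# Option names assumed from later spamassassin man pages; verify locally.
FOLDER_OPTIONS = {
    "add-to-whitelist": "--add-addr-to-whitelist",
    "remove-from-whitelist": "--remove-addr-from-whitelist",
    "add-to-blacklist": "--add-addr-to-blacklist",
}


def commands_for(messages, action: str) -> list:
    """Build one spamassassin argv per distinct From: address."""
    option = FOLDER_OPTIONS[action]
    senders = []
    for msg in messages:
        _, sender = parseaddr(msg.get("From", ""))
        sender = sender.lower()
        if sender and sender not in senders:
            senders.append(sender)
    return [["spamassassin", f"{option}={addr}"] for addr in senders]


def folder_commands(mbox_path: str, action: str) -> list:
    """Read an existing mbox folder that users drop messages into."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return commands_for(box, action)
    finally:
        box.close()
```

Deduplicating by address keeps a busy folder from issuing the same command repeatedly between scans.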

> Don't
> know how practical that is versus just manually maintaining
> whitelist entries.

Scalability seems to be the problem. Adapting a proven bayes-training approach
is probably easiest.

- Bob

RE: AWL vs. mailing lists [ In reply to ]

kcivey at cpcug

Feb 26, 2004, 6:14 PM

Post #33 of 34 (770 views)

Gary Funck <gary@intrepid.com> wrote:

> Although I agree with your overall description of AWL, it doesn't in fact
> let my friend's messages through because his messages often score higher.
> If the often score high, his AWL score will be high (more spammy), and as
> it creeps up will actually increase the chances that his message goes above
> the spam threshold.

I think you're misunderstanding how AWL works. There's no way
it will put your friend's message over the spam threshold
unless the average score for your friend's previous messages is
above the spam threshold (and probably significantly above,
unless the new message is already close to the threshold). If
you have a friend that consistently sends you messages that
score above the spam threshold, then you need to manually
whitelist that friend.
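Keith's claim can be checked algebraically. With a factor f, the adjusted score is s + (m - s)f, where s is the new message's score and m the sender's mean; setting that equal to the threshold T gives m = s + (T - s)/f, which for f = 0.5 is 2T - s. Whenever s is below T, that required mean is above T. The helper names are hypothetical and the 0.5 factor is an assumption.

```python
def awl_final(score: float, mean: float, factor: float = 0.5) -> float:
    """Score after AWL moves it toward the sender's historical mean."""
    return score + (mean - score) * factor


def mean_needed_to_reach(score: float, threshold: float, factor: float = 0.5) -> float:
    """Solve score + (mean - score) * factor == threshold for mean."""
    return score + (threshold - score) / factor


# A friend's 4.0-point message against a 5.0 threshold:
print(mean_needed_to_reach(4.0, 5.0))  # 6.0: the history must already average spam
print(awl_final(4.0, 5.5))             # 4.75: stays below 5.0
print(awl_final(4.0, 6.5))             # 5.25: crosses 5.0
```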

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

Re: AWL vs. mailing lists [ In reply to ]

mailings02 at ttlexceeded

Mar 1, 2004, 12:36 PM

Post #34 of 34 (770 views)

Michael,

After reading some recent messages on related topics, I think I know what
happened with spams getting through to some lists with high AWL scores.

Check one of those spams, particularly the Received headers. See if any host in
the path that message took is listed in /usr/share/spamassassin/60_whitelist.cf.
If so, messages taking that path are whitelisted, meaning they start with
a -100 score. When the message hits AWL, THAT SENDER (the spammer) starts
with -100, regardless of the message's other scores, so their average IS strongly
anti-spam. Their next message (if there ever is one) will be scored more spammy,
but still not as spam. It's only after several posts that the AWL adjustment
will be corrected.
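Taking Bob's hypothesis at face value, this sketch counts how many follow-up spams the averaging hides. It assumes the whitelisted first message is stored with a raw score of -100 (Bob's figure), every later message scores 10 before AWL, the threshold is 5.0 with a message counted as spam at or above it, and the factor is 0.5. `messages_until_flagged` is a hypothetical helper.

```python
from typing import Optional


def messages_until_flagged(seed_score: float, spam_score: float,
                           threshold: float = 5.0, factor: float = 0.5,
                           limit: int = 1000) -> Optional[int]:
    """Return the 1-based index of the first follow-up scored >= threshold.

    seed_score: raw score recorded for the sender's first message.
    spam_score: raw (pre-AWL) score of every later message.
    Returns None if no follow-up reaches the threshold within `limit`.
    """
    total, count = seed_score, 1
    for n in range(1, limit + 1):
        mean = total / count
        final = spam_score + (mean - spam_score) * factor
        if final >= threshold:
            return n
        total += spam_score  # assume the raw score is what gets stored
        count += 1
    return None


print(messages_until_flagged(-100.0, 10.0))  # 11
print(messages_until_flagged(0.0, 10.0))     # 1
```

Under these assumptions the first follow-up lands at -45, and it takes 11 follow-ups before one reaches 5.0, which fits Bob's "only after several posts".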

If this is correct, that same spammer could probably get a message through
based on DEFAULT WHITELIST settings, regardless of AWL. Have you seen repeats
of this problem?

I'm increasingly convinced that the default whitelist is risky for this reason.

- Bob
