Mailing List Archive: Re[2]: Some real anti-bayes stuffing followup

Re[2]: Some real anti-bayes stuffing followup

Feb 13, 2004, 9:59 PM

Post #1 of 8 (1043 views)

Hello Bart, Devs,

Friday, February 13, 2004, 12:33:27 PM, you wrote, concerning Bayes:

BS> (I hope the use of message-id for this goes by the wayside soon,
BS> before spammers get the bright idea to steal old message-id headers
BS> from nonspam usenet or list archives and insert them into newly
BS> generated spam.)

Actually, a new spam-detecting mechanism could be to look for duplicate
message ids. I've received multiple spams all using the same message id.

a) If a ham is sent to my domain with four recipients here, then because
of the way I run SA, I could process that email four times, once for each
mailbox. That's expected. And it's expected that each of those emails
will have identical bodies, and identical subjects.

b) On a given day I can receive similar spams with identical message ids
but different subject headers (usually random words or letters added to
the subject) and/or different bodies (sometimes with minor random
differences, sometimes entirely different messages).

c) I can receive spam with a given message ID on Jan 2, and then more
spam (similar or not) with the same message ID on Jan 14, Jan 30,
Feb 12, etc.

I suggest that if we could store a record with three or four fields,
message-id, checksum(subject), checksum(body), and maybe time(firstseen),
we could use this as a database, and apply a rule (maybe named
DUPLICATE_MESSAGEID) where either (1) checksums don't match, or (2)
time(now) is significantly different from time(firstseen).
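A minimal sketch of the proposed record and rule, to make the two firing conditions concrete. Everything here is hypothetical: the class and method names, the in-memory dict standing in for a real database, SHA-256 as the checksum, and the one-day window used for "significantly different". Case (a) (the same ham delivered to several local mailboxes within seconds) does not fire, case (b) fires on condition (1), and case (c) fires on condition (2).

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Dict, Optional


def checksum(text: str) -> str:
    # SHA-256 is an illustrative choice; any stable digest would work here.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class SeenRecord:
    subject_sum: str
    body_sum: str
    first_seen: float


class MessageIdDB:
    """Hypothetical in-memory store; a real filter would persist this on disk."""

    def __init__(self, max_age_seconds: float = 24 * 3600) -> None:
        # The one-day window for "significantly different" is an assumption.
        self.max_age = max_age_seconds
        self.records: Dict[str, SeenRecord] = {}

    def duplicate_messageid(self, msgid: str, subject: str, body: str,
                            now: Optional[float] = None) -> bool:
        """Return True if the hypothetical DUPLICATE_MESSAGEID rule should fire."""
        if now is None:
            now = time.time()
        subj_sum, body_sum = checksum(subject), checksum(body)
        rec = self.records.get(msgid)
        if rec is None:
            # First sighting: record it; there is nothing to compare against yet.
            self.records[msgid] = SeenRecord(subj_sum, body_sum, now)
            return False
        # Condition (1): same Message-ID, different subject or body.
        if rec.subject_sum != subj_sum or rec.body_sum != body_sum:
            return True
        # Condition (2): identical content, but long after the first sighting.
        return now - rec.first_seen > self.max_age
```

The record is never overwritten after the first sighting, so time(firstseen) stays anchored to the earliest copy, as the proposal requires.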

Does this seem like a worthwhile approach?

Bob Menschel

Re: Re[2]: Some real anti-bayes stuffing followup

jon at tgpsolutions

Feb 13, 2004, 10:11 PM

Post #2 of 8 (1039 views)

On Fri, 2004-02-13 at 20:59, Robert Menschel wrote:
> I suggest that if we could store a record with three or four fields,
> message-id, checksum(subject), checksum(body), and maybe time(firstseen),
> we could use this as a database, and apply a rule (maybe named
> DUPLICATE_MESSAGEID) where either (1) checksums don't match, or (2)
> time(now) is significantly different from time(firstseen).
>
> Does this seem like a worthwhile approach?
>

IANAD (I am not a developer), but I don't think this is a worthwhile
approach, for two related reasons:

* it costs us (the mail admins) too much
* it costs spammers too little

We would need to go through the effort of implementing this in code,
then setting aside resources (disk and CPU) to checksum and record these
attributes of incoming messages.

In response, spammers would only need to insert a %RND_MSG_ID to render
all our efforts useless.

- Jon

--
jon@tgpsolutions.com

Administrator, tgpsolutions
http://www.tgpsolutions.com

Re[4]: Some real anti-bayes stuffing followup

Robert at Menschel

Feb 13, 2004, 10:48 PM

Post #3 of 8 (1021 views)

Hello Jon,

Friday, February 13, 2004, 9:11:41 PM, you wrote:

J> On Fri, 2004-02-13 at 20:59, Robert Menschel wrote:
>> I suggest that if we could store a record with three or four fields,
>> message-id, checksum(subject), checksum(body), and maybe time(firstseen),
>> we could use this as a database, and apply a rule (maybe named
>> DUPLICATE_MESSAGEID) where either (1) checksums don't match, or (2)
>> time(now) is significantly different from time(firstseen).
>>
>> Does this seem like a worthwhile approach?

J> IANAD (I am not a developer), but I don't think this is a worthwhile
J> approach, for two related reasons:

J> * it costs us (the mail admins) too much
J> * it costs spammers too little

J> We would need to go through the effort of implementing this in code,
J> then setting aside resources (disk and CPU) to checksum and record these
J> attributes of incoming messages.

I see this resource requirement as being minimal -- a small fraction of
what we do currently with Bayes.

J> In response, spammers would only need to insert a %RND_MSG_ID to
J> render all our efforts useless.

It'd be easier to simply have their spam-mail programs generate normal,
unique message ids...
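To illustrate how little it costs a sender to produce a normal-looking, unique Message-ID, here is a sketch using Python's standard `email.utils.make_msgid`, which builds the ID from a timestamp, the process id and random bits. The domain `example.com` is a placeholder.

```python
from email.utils import make_msgid

# Each call yields a fresh, syntactically normal Message-ID of the form
# <timestamp.pid.random@domain>; "example.com" is a placeholder domain.
ids = [make_msgid(domain="example.com") for _ in range(3)]
for msgid in ids:
    print(msgid)
```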

Bob Menschel

Re[2]: Some real anti-bayes stuffing followup

dbfunk at engineering

Feb 14, 2004, 12:34 AM

Post #4 of 8 (1021 views)

On Fri, 13 Feb 2004, Robert Menschel wrote:

> Hello Bart, Devs,
>
> Friday, February 13, 2004, 12:33:27 PM, you wrote, concerning Bayes:
>
> BS> (I hope the use of message-id for this goes by the wayside soon,
> BS> before spammers get the bright idea to steal old message-id headers
> BS> from nonspam usenet or list archives and insert them into newly
> BS> generated spam.)
>
> Actually, a new spam-detecting mechanism could be to look for duplicate
> message ids. I've received multiple spams all using the same message id.

Silly question: how does Bayes deal with a message that has -no-
Message-ID? Unlike NNTP, SMTP does not require a Message-ID; it only
recommends one.

I see many messages a day that come into our mail server that totally lack
a Message-ID (I use that as a spam-sign and assign a value of 1.5 to it ;).
My sendmail daemon synthesizes a Message-ID before delivery but it isn't
there during the filtering process.
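A sketch of the local "no Message-ID" spam sign described above, using Python's standard `email` parser. The function name is hypothetical; the 1.5 value is the one mentioned in the post. The check only makes sense if it runs before the MTA synthesizes a Message-ID, which is exactly the ordering described for sendmail here.

```python
from email import message_from_string

NO_MSGID_SCORE = 1.5  # the local score mentioned in the post


def no_msgid_score(raw_message: str) -> float:
    """Hypothetical check: score a message that arrives without a Message-ID."""
    msg = message_from_string(raw_message)
    # Header lookup in email.message.Message is case-insensitive,
    # so "Message-Id" and "Message-ID" both match.
    msgid = msg.get("Message-ID")
    if msgid is None or not msgid.strip():
        return NO_MSGID_SCORE
    return 0.0
```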

--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{

Re: Some real anti-bayes stuffing followup

schaefer at zanshin

Feb 14, 2004, 3:17 AM

Post #5 of 8 (1020 views)

On Fri, 13 Feb 2004, Robert Menschel wrote:

> I suggest that if we could store a record with three or four fields,
> message-id, checksum(subject), checksum(body), and maybe
> time(firstseen), we could use this as a database, and apply a rule
> (maybe named DUPLICATE_MESSAGEID) where either (1) checksums don't
> match, or (2) time(now) is significantly different from time(firstseen).

On Fri, 13 Feb 2004, Jon wrote:

> IANAD (I am not a developer), but I don't think this is a worthwhile
> approach for two related reasons:
>
> * it costs us (the mail admins) too much
> * it costs spammers too little

Just two points before I go to bed:

(1) Isn't this effectively what DCC, Razor, Pyzor, etc. already do?

(2) Isn't most of this data already in the Bayes database, just being
used differently?

Re: Re[2]: Some real anti-bayes stuffing followup

kcivey at cpcug

Feb 14, 2004, 5:46 AM

Post #6 of 8 (1019 views)

David B Funk <dbfunk@engineering.uiowa.edu> wrote:

> Silly question: how does Bayes deal with a message that has -no-
> Message-ID? Unlike NNTP, SMTP does not require a Message-ID; it only
> recommends one.

If there is no message ID, SA uses a hash of the message text
followed by '@sa_generated'. Unfortunately that means if the
message is modified at a later stage before delivery it won't
be possible to correct mislearning (of course, relearning a
modified message doesn't work completely right even if there is
a message ID).
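A sketch of the fallback described above, and of why it defeats later relearning. The post does not say which hash SA uses or exactly which part of the message it hashes, so SHA-1 over the full text is an assumption here, and `bayes_msgid` is a hypothetical name, not SA's API.

```python
import hashlib
from typing import Optional


def bayes_msgid(message_id: Optional[str], message_text: str) -> str:
    """Hypothetical: the ID under which a message would be tracked as learned."""
    if message_id and message_id.strip():
        return message_id.strip()
    # Assumption: SHA-1 over the whole message text, then the '@sa_generated' suffix.
    digest = hashlib.sha1(message_text.encode("utf-8")).hexdigest()
    return digest + "@sa_generated"
```

Because the generated ID depends on the text, a header added downstream (for example, one line inserted by a later delivery stage) yields a different ID, so the earlier learning record can no longer be found and corrected.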

--
Keith C. Ivey <kcivey@cpcug.org>
Washington, DC

Re: Re[2]: Some real anti-bayes stuffing followup

b-nordquist at bethel

Feb 17, 2004, 6:59 AM

Post #7 of 8 (1025 views)

On Sat, 14 Feb 2004, Keith C. Ivey <kcivey@cpcug.org> wrote:

> David B Funk <dbfunk@engineering.uiowa.edu> wrote:
>
> > Silly question: how does Bayes deal with a message that has -no-
> > Message-ID? Unlike NNTP, SMTP does not require a Message-ID; it only
> > recommends one.
>
> If there is no message ID, SA uses a hash of the message text
> followed by '@sa_generated'.

Note that many MTAs will add the Message-ID on the way through, if it
didn't have one already. SA, in turn, uses that as useful intelligence;
search for MSGID_FROM_MTA_ in the *.cf rules distributed with SA.
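A sketch of the MTA-side behaviour described above: add a Message-ID only when one is missing, using the MTA's own host name as the domain. The function name and `mx.example.org` are hypothetical. Because the inserted ID carries the MTA's format and domain rather than the original sender's, a filter running after this step can often tell that the ID was not supplied by the sender; this sketch does not reproduce how the MSGID_FROM_MTA_ rules themselves test for that.

```python
from email import message_from_string
from email.message import Message
from email.utils import make_msgid


def ensure_message_id(raw_message: str, mta_domain: str) -> Message:
    """Hypothetical MTA step: synthesize a Message-ID only if none is present."""
    msg = message_from_string(raw_message)
    if msg.get("Message-ID") is None:
        # The generated ID embeds the MTA's domain, not the sender's.
        msg["Message-ID"] = make_msgid(domain=mta_domain)
    return msg
```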

--
Brent J. Nordquist <b-nordquist@bethel.edu> N0BJN
Other contact information: http://kepler.acns.bethel.edu/~bjn/contact.html
* Fast pipe * Always on * Get out of the way - Tim Bray http://tinyurl.com/7sti

Re: Re[4]: Some real anti-bayes stuffing followup

jm at jmason

Feb 17, 2004, 5:13 PM

Post #8 of 8 (1024 views)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Robert Menschel writes:
> Hello Jon,
>
> Friday, February 13, 2004, 9:11:41 PM, you wrote:
>
> J> On Fri, 2004-02-13 at 20:59, Robert Menschel wrote:
> >> I suggest that if we could store a record with three or four fields,
> >> message-id, checksum(subject), checksum(body), and maybe time(firstseen),
> >> we could use this as a database, and apply a rule (maybe named
> >> DUPLICATE_MESSAGEID) where either (1) checksums don't match, or (2)
> >> time(now) is significantly different from time(firstseen).
> >>
> >> Does this seem like a worthwhile approach?
>
> J> IANAD (I am not a developer), but I don't think this is a worthwhile
> J> approach for two related reasons:
>
> J> * it costs us (the mail admins) too much
> J> * it costs spammers too little
>
> J> We would need to go through the effort of implementing this in code,
> J> then setting aside resources (disk and CPU) to checksum and record these
> J> attributes of incoming messages.
>
> I see this resource requirement as being minimal -- a small fraction of
> what we do currently with Bayes.
>
> J> In response, spammers would only need to insert a %RND_MSG_ID to
> J> render all our efforts useless.
>
> It'd be easier to simply have their spam-mail programs generate normal,
> unique message ids...

That's what a real message-ID *is* anyway. The reason they don't do
it is because we can use those patterns as spam signs.

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFAMq4TQTcbUG5Y7woRAk1bAKC9JhMQ3C6TOHWGdjpnhErar3ne5gCg0EPu
XmwUNygJFZxn9QqasC5lAIM=
=+Bl0
-----END PGP SIGNATURE-----
