Hello Bart, Devs,
Friday, February 13, 2004, 12:33:27 PM, you wrote, concerning Bayes:
BS> (I hope the use of message-id for this goes by the wayside soon,
BS> before spammers get the bright idea to steal old message-id headers
BS> from nonspam usenet or list archives and insert them into newly
BS> generated spam.)
Actually, a new spam-detecting mechanism could be to look for duplicate
message ids. I've received multiple spams all using the same message id.
a) If a ham is sent to my domain with four recipients here, then because
of the way I run SA, I could process that email four times, once for each
mailbox. That's expected, and each of those copies will have an identical
body and an identical subject.
b) Within a single day I can receive similar spams with identical message
ids but different subject headers (usually random words or letters added
to the subject) and/or different bodies (sometimes with minor random
differences, sometimes entirely different messages).
c) I can receive spam with a given message ID on Jan 2, and then more spam
(similar or not) with the same message ID on Jan 14, Jan 30, Feb 12, etc.
I suggest that we store a record with three or four fields: message-id,
checksum(subject), checksum(body), and maybe time(firstseen). We could use
this as a database and apply a rule (maybe named DUPLICATE_MESSAGEID) that
fires when a message ID is already in the database and either (1) the
checksums don't match, or (2) time(now) is significantly different from
time(firstseen).
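A minimal sketch of that rule, assuming an in-memory dict in place of a
real database, SHA-1 as the checksum, and a one-day window standing in for
"significantly different". The names MessageIdTracker, duplicate_messageid
and max_age are hypothetical and not part of SpamAssassin.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


def checksum(text: str) -> str:
    """Hex digest of a subject or body (SHA-1 is an arbitrary choice)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class SeenRecord:
    subject_sum: str
    body_sum: str
    first_seen: datetime


class MessageIdTracker:
    """Hypothetical in-memory stand-in for the proposed message-id database."""

    def __init__(self, max_age: timedelta = timedelta(days=1)):
        # Assumed threshold for "significantly different" from first-seen time.
        self.max_age = max_age
        self.records: dict[str, SeenRecord] = {}

    def duplicate_messageid(self, msgid: str, subject: str, body: str,
                            now: datetime) -> bool:
        """Return True if the DUPLICATE_MESSAGEID rule should fire."""
        subj, bod = checksum(subject), checksum(body)
        rec = self.records.get(msgid)
        if rec is None:
            # First sighting: remember it (first_seen is never overwritten).
            self.records[msgid] = SeenRecord(subj, bod, now)
            return False
        # (1) same message ID, but the subject or body differs
        if rec.subject_sum != subj or rec.body_sum != bod:
            return True
        # (2) same content, but reappearing long after the first sighting
        return now - rec.first_seen > self.max_age


if __name__ == "__main__":
    tracker = MessageIdTracker()
    t0 = datetime(2004, 2, 13, 12, 0)

    # a) one ham processed once per local mailbox: never fires
    print([tracker.duplicate_messageid("<ham1@example>", "Meeting",
                                       "See you", t0) for _ in range(4)])
    # [False, False, False, False]

    # b) same ID, randomized subject, same day: fires on the second copy
    tracker.duplicate_messageid("<spam1@example>", "Cheap meds", "Buy now", t0)
    print(tracker.duplicate_messageid("<spam1@example>", "Cheap meds qxz",
                                      "Buy now", t0 + timedelta(hours=3)))
    # True

    # c) same ID and content, weeks apart: fires on the later copy
    tracker.duplicate_messageid("<spam2@example>", "Hi", "Offer",
                                datetime(2004, 1, 2))
    print(tracker.duplicate_messageid("<spam2@example>", "Hi", "Offer",
                                      datetime(2004, 1, 14)))
    # True
```

Case (a) stays quiet because the copies match and arrive together; cases
(b) and (c) trip conditions (1) and (2) respectively. The one-day window
is only a guess and would need tuning against real mail.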
Does this seem like a worthwhile approach?
Bob Menschel