Mailing List Archive

How can I write rules to catch this baby?
Hi. My first post, so I apologise in advance if I miss some of the "rules
of the road" around here.

First, let me say that I have a really hard time with regular expressions,
so just telling me 'write a regexp to do it' won't help much. I'll need a
little more detail, sorry.

I have a spam with the following interesting characteristics, none of which
were caught by the current rules:

First, the To: list:

From: "Cynthia Kincaid" <tfbrtyxriipu@earthlink.net>
To: shutts1@earthlink.net, janivan@earthlink.net, auxpolice@earthlink.net,
ceewilson@earthlink.net, auxtv@earthlink.net, gary101@earthlink.net,
amcmilla@earthlink.net, brigidmary@earthlink.net, lwilton@earthlink.net,
brigido@earthlink.net

This avoids the "lots of similar names check" that currently exists.
However, it has "lots of @earthlink.net" names. For me, this is a highly
improbable case, although it could be valid for others.

So, how can I write one or more rules along the lines of "lots of
recipients at earthlink.net"? Ideally one rule would check both the To and
Cc fields, but I'm willing to write two rules.
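Here is a minimal sketch of one way to express "lots of recipients at earthlink.net" as a single regular expression. It is checked with Python's `re` module. SpamAssassin rules use Perl regexes, and this simple pattern behaves the same in both.

The pattern has two parts:

- One group matches a single `@earthlink.net` plus whatever text follows it, up to the next address.
- `{5}` requires that group to repeat five times, so the match only succeeds when the header holds at least five such addresses.

The threshold of 5, the function name `many_at_domain`, and the rule name `LW_MANY_EARTHLINK` in the comment are hypothetical. SpamAssassin's `header` rules document a `ToCc` pseudo-header that covers To and Cc together. Check that your version supports it; otherwise write one rule for `To` and one for `Cc`.

```python
import re

# Hypothetical SpamAssassin rule using the same pattern (name and score are
# made up; "\@" keeps Perl from treating "@earthlink" as an array):
#   header   LW_MANY_EARTHLINK  ToCc =~ /(?:\@earthlink\.net\b[\s\S]*?){5}/i
#   describe LW_MANY_EARTHLINK  5 or more To/Cc recipients at earthlink.net
#   score    LW_MANY_EARTHLINK  1.0

def many_at_domain(header_value: str, domain: str, threshold: int = 5) -> bool:
    """True if header_value contains at least `threshold` addresses at `domain`."""
    # One address: "@", the domain with its dots escaped, then a word boundary
    # so that e.g. "earthlink.netter" does not count.
    one_address = "@" + re.escape(domain) + r"\b"
    # "[\s\S]*?" lazily skips ahead to the next address, even across folded
    # (multi-line) headers; "{n}" requires the group n times, i.e. n addresses.
    pattern = "(?:" + one_address + r"[\s\S]*?){" + str(threshold) + "}"
    return re.search(pattern, header_value, re.IGNORECASE) is not None


# The To: header from the spam above (10 earthlink.net recipients).
TO_HEADER = (
    "shutts1@earthlink.net, janivan@earthlink.net, auxpolice@earthlink.net,\n"
    " ceewilson@earthlink.net, auxtv@earthlink.net, gary101@earthlink.net,\n"
    " amcmilla@earthlink.net, brigidmary@earthlink.net, lwilton@earthlink.net,\n"
    " brigido@earthlink.net"
)

print(many_at_domain(TO_HEADER, "earthlink.net"))                           # True
print(many_at_domain("a@earthlink.net, b@earthlink.net", "earthlink.net"))  # False
```

Raising or lowering the `{5}` count is the only knob. A higher number means fewer false alarms on legitimate mail sent to several people at the same ISP.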

Second:

The mail was MIME multipart, with a small text part and, more interestingly,
a malformed HTML part. The HTML part looks like:

----------------------
--06774864503199475989
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 8bit

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>

<TITLE>Message</TITLE>

<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY>
lots of spam
</BODY></HTML>
president boylston corpsmen navigable owly bordello convenient fur lascar
holmium icosahedron complete bucket alp node copperas adler gyrocompass
exemption alpha ophiuchus rap <br>
checklist declassify philharmonic incidental bibb kim alberto hughes
moiseyev eke augustan omniscient chinamen jet mimetic crosstalk antenna
foreign <br>
accusation newtonian playground bridgewater comport hibernate library goal
awe hosiery dogtrot hypocycloid crabmeat gondola questionnaire sierra lac
beyond emplace attica cabal airspeed lawyer column cheerful coverage antioch
hieratic goody aaron deformation bounty sanderling cdc <br>

-- and hundreds more words, AFTER THE END OF THE BODY.

--06774864503199475989--
----------------------

So the rule might be something along the lines of "lots of body text after
the /BODY tag", or maybe after the /HTML tag.
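Here is a minimal sketch of the "lots of body text after the closing tag" check. It assumes it is given the decoded HTML part with its tags intact, which is roughly what a SpamAssassin `rawbody` rule sees.

It works in three steps:

1. Find the last `</body>` or `</html>` tag.
2. Drop any tags that follow it, such as the `<br>`s in this spam.
3. Count the remaining words.

The function names and the threshold of 10 words are hypothetical and would need tuning against real mail.

```python
import re

# Either closing tag, in any case, with optional whitespace before ">".
CLOSING_TAG = re.compile(r"</(?:body|html)\s*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]*>")

def words_after_closing_tag(html: str) -> int:
    """Count words that follow the LAST </body> or </html> tag."""
    matches = list(CLOSING_TAG.finditer(html))
    if not matches:
        return 0  # no closing tag at all, so nothing to measure
    trailing = html[matches[-1].end():]
    # Replace leftover tags like <br> with spaces so they are not counted.
    trailing = ANY_TAG.sub(" ", trailing)
    return len(trailing.split())

def lots_of_text_after_body(html: str, threshold: int = 10) -> bool:
    """Hypothetical threshold: 10 or more trailing words counts as 'lots'."""
    return words_after_closing_tag(html) >= threshold


SPAM_PART = """<HTML><HEAD><TITLE>Message</TITLE></HEAD>
<BODY>
lots of spam
</BODY></HTML>
president boylston corpsmen navigable owly bordello convenient fur lascar
holmium icosahedron complete bucket alp node copperas adler gyrocompass
exemption alpha ophiuchus rap <br>
"""
NORMAL_PART = "<HTML><BODY>\nHello there\n</BODY></HTML>\n"

print(words_after_closing_tag(SPAM_PART))    # 22
print(lots_of_text_after_body(SPAM_PART))    # True
print(lots_of_text_after_body(NORMAL_PART))  # False
```

Measuring from the last closing tag means a stray `</BODY>` early in a message does not matter. Only what follows the final tag is counted.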


Any friendly suggestions on how to deal with this puppy?

(Yes, I suppose Bayes will eventually catch it once it trains up enough.
But I like rules for things that are really obvious cases, at least in my
specific instance.)

Thanks,

Loren
Re: How can I write rules to catch this baby?
At 04:13 AM 2/4/2004, Loren Wilton wrote:
>So the rule might be something along the lines of "lots of body text after
>the /BODY tag". Or maybe after the /HTML tag.

Someone already has. From
http://www.merchantsoverseas.com/wwwroot/gorilla/rawbody.txt:


rawbody MK_BAD_HTML_1 /\<\/html\>\s{0,50}\S+\d{0,10}/i
describe MK_BAD_HTML_1 Bad HTML form. Content after closing HTML tag
score MK_BAD_HTML_1 .55


Note that MK here is not me (Matt Kettler), so I can't take credit for the
rule. I think Mike Kuentz wrote it, but I'm not sure where Chris got it from.
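To see what MK_BAD_HTML_1 actually matches, here is its pattern run through Python's `re` module as a quick test harness. This is an assumption of convenience: Python, like Perl, treats `\<`, `\/` and `\>` as plain escaped characters, so the pattern carries over unchanged. The function name `mk_bad_html_1` is just for illustration.

Two things the test cases show:

- `\d{0,10}` can match zero digits, so it adds nothing. The rule fires as soon as a single non-whitespace character follows `</html>` within 50 whitespace characters.
- It does not fire if the trailing text is padded with more than 50 whitespace characters.

```python
import re

# Pattern copied verbatim from the MK_BAD_HTML_1 rawbody rule; the Perl "/i"
# modifier becomes re.IGNORECASE.
MK_BAD_HTML_1 = re.compile(r"\<\/html\>\s{0,50}\S+\d{0,10}", re.IGNORECASE)

def mk_bad_html_1(rawbody: str) -> bool:
    """True if the rule's regex matches anywhere in the given text."""
    return MK_BAD_HTML_1.search(rawbody) is not None


print(mk_bad_html_1("</BODY></HTML>\npresident boylston corpsmen"))  # True
print(mk_bad_html_1("</body></html>\n"))                             # False
print(mk_bad_html_1("</html> x"))                 # True: one character is enough
print(mk_bad_html_1("</html>" + " " * 60 + "spam"))  # False: too much whitespace
```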
Re: How can I write rules to catch this baby?
At Wed Feb 4 09:13:14 2004, Loren Wilton wrote:
>
> From: "Cynthia Kincaid" <tfbrtyxriipu@earthlink.net>
> To: shutts1@earthlink.net, janivan@earthlink.net, auxpolice@earthlink.net,
> ceewilson@earthlink.net, auxtv@earthlink.net, gary101@earthlink.net,
> amcmilla@earthlink.net, brigidmary@earthlink.net, lwilton@earthlink.net,
> brigido@earthlink.net
>
> This avoids the "lots of similar names check" that currently exists.
> However, it has "lots of @earthlink.net" names. For me, this is a highly
> improbable case, although it could be valid for others.

It's certainly unlikely where the messages are addressed to ISP
accounts. However, it wouldn't be unusual in a corporate or academic
environment, where many recipients of a legitimate message share the
same domain.

Martin
--
Martin Radford | "Only wimps use tape backup: _real_
martin@zamenhof.demon.co.uk | men just upload their important stuff -o)
Registered Linux user #9257 | on ftp and let the rest of the world /\\
- see http://counter.li.org | mirror it ;)" - Linus Torvalds _\_V