Hi. My first post, so I apologise in advance if I miss some of the "rules
of the road" around here.
First, let me say that I have a real hard time with regular expressions, so
just telling me 'write a regexp to do it' won't help much. A little more
detail will be required, sorry.
I have a spam with the following interesting characteristics, none of which
were caught by the current rules:
First, the To: list:
From: "Cynthia Kincaid" <tfbrtyxriipu@earthlink.net>
To: shutts1@earthlink.net, janivan@earthlink.net, auxpolice@earthlink.net,
ceewilson@earthlink.net, auxtv@earthlink.net, gary101@earthlink.net,
amcmilla@earthlink.net, brigidmary@earthlink.net, lwilton@earthlink.net,
brigido@earthlink.net
This avoids the "lots of similar names" check that currently exists.
However, it has "lots of @earthlink.net" names. For me, this is a highly
improbable case, although it could be valid for others.
So, how can I write one or more rules along the lines of "message has lots of
recipients at earthlink.net"? Ideally it would look at both the To and CC
fields, but I'm willing to write two rules.
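To make clear what I'm after, here is the check sketched in plain Python rather than as a rule. The function names and the threshold of 8 are placeholders I made up, not anything from the existing rule set, and I'm only assuming that something equivalent can be expressed as a rule.

```python
from email.utils import getaddresses

# The To: header from the spam above, joined onto one line.
TO_HEADER = (
    "shutts1@earthlink.net, janivan@earthlink.net, auxpolice@earthlink.net, "
    "ceewilson@earthlink.net, auxtv@earthlink.net, gary101@earthlink.net, "
    "amcmilla@earthlink.net, brigidmary@earthlink.net, lwilton@earthlink.net, "
    "brigido@earthlink.net"
)


def count_recipients_at(domain, header_values):
    """Count addresses in the given header values that end in @domain."""
    suffix = "@" + domain.lower()
    count = 0
    # getaddresses() parses a list of header strings into (name, address) pairs.
    for _name, addr in getaddresses(header_values):
        if addr.lower().endswith(suffix):
            count += 1
    return count


def has_lots_of_recipients_at(domain, to_value="", cc_value="", threshold=8):
    """True if To and Cc together hold at least `threshold` addresses at domain.

    The threshold is a guess on my part, not a tuned value.
    """
    return count_recipients_at(domain, [to_value, cc_value]) >= threshold
```

Passing To and Cc together is what I mean by "either field"; two separate rules would just call the counter once per header.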
Second:
The mail was MIME multipart, with a small text part and, more interestingly, a
malformed HTML part. The HTML part looks like:
----------------------
--06774864503199475989
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 8bit
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<TITLE>Message</TITLE>
<META content="MSHTML 6.00.2800.1276" name=GENERATOR></HEAD>
<BODY>
lots of spam
</BODY></HTML>
president boylston corpsmen navigable owly bordello convenient fur lascar
holmium icosahedron complete bucket alp node copperas adler gyrocompass
exemption alpha ophiuchus rap <br>
checklist declassify philharmonic incidental bibb kim alberto hughes
moiseyev eke augustan omniscient chinamen jet mimetic crosstalk antenna
foreign <br>
accusation newtonian playground bridgewater comport hibernate library goal
awe hosiery dogtrot hypocycloid crabmeat gondola questionnaire sierra lac
beyond emplace attica cabal airspeed lawyer column cheerful coverage antioch
hieratic goody aaron deformation bounty sanderling cdc <br>
-- and more hundreds of words, AFTER THE END OF THE BODY.
--06774864503199475989--
----------------------
So the rule might be something along the lines of "lots of body text after
the /BODY tag". Or maybe after the /HTML tag.
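Again, just to pin down the logic I have in mind, here is a rough Python sketch that avoids regular expressions entirely. The helper names and the idea of counting words are my own assumptions; I don't know whether a rule can see the raw HTML this way.

```python
def strip_tags(text):
    """Crudely drop anything between '<' and '>' (no real HTML parsing)."""
    out = []
    in_tag = False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">":
            in_tag = False
            out.append(" ")  # keep words on either side of a tag apart
        elif not in_tag:
            out.append(ch)
    return "".join(out)


def words_after_closing_tag(html, tag="</body>"):
    """Count the words that follow the last closing `tag`, ignoring markup.

    Matching is case-insensitive; this assumes mostly-ASCII HTML, where
    lowercasing does not change string length.
    """
    pos = html.lower().rfind(tag.lower())
    if pos == -1:
        return 0  # no closing tag at all, nothing to measure
    trailing = html[pos + len(tag):]
    return len(strip_tags(trailing).split())


sample = (
    "<HTML><BODY>\nlots of spam\n</BODY></HTML>\n"
    "president boylston corpsmen navigable owly <br>\n"
)
trailing_words = words_after_closing_tag(sample)
```

A well-formed part should give zero here, so any threshold above a handful of words would flag the message shown above. Passing tag="</html>" covers the "after the /HTML tag" variant.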
Any friendly suggestions on how to deal with this puppy?
(Yes, I suppose Bayes will eventually catch it once it trains up enough.
But I like rules for things that are really obvious cases, at least in my
specific instance.)
Thanks,
Loren