Mailing List Archive: How would one write a rule to match this?

Feb 10, 2004, 9:36 AM

Post #1 of 3 (1018 views)

Got a new spam sign today that I don't know how to write a matching rule for.
It's just another token/word-breaking method, but it uses valid HTML; in this
case it repeats the same font tag over and over. The message did get tagged
as spam (barely), but only because of RBLs and the fact that there was no
text/plain part.

---spam sample---
<html>
<body bgcolor=3D"#FFFFFF">
 
From
Diet
To The Blue Pill 
A</fon=
t>ll
The Popula=
r Prescr</fo=
nt>iption
Meds 
 <a href=3D"http://allquickmeds4u.com">ClickHere</a><=
br>
No Fees 

 
  
<a href=3D"http:=
//allquickmeds4u.com/evm.htm">ListExclusionHere 
</a>Easylink
Suite1483 9 Tanbark Circuit Werrington Downs NSW 2747 AU
</body>
</html>
---spam sample---

Re: How would one write a rule to match this?

mkettler at evi-inc

Feb 10, 2004, 9:51 AM

Post #2 of 3 (999 views)

At 11:36 AM 2/10/2004, Brian Godette wrote:
>Got a new spam sign today that I don't know how to write a matching rule for.
>It's just another token/word-breaking method, but it uses valid HTML; in this
>case it repeats the same font tag over and over.

I know it's always good to have rules for every word-break tactic, but let's
face it: this particular obfuscation shouldn't be effective against
SpamAssassin in the first place.

Remember, SA strips out HTML tags before it runs body rules.

Rules like this:
body LOCAL_MEDS /\bmeds\b/i
score LOCAL_MEDS 0.1

Should hit on that mail just fine, despite the font tags stuck in between the
letters.
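
As a rough illustration of why tag stripping defeats this trick, here is a
minimal Python sketch. It is not SpamAssassin's actual rendering code, only a
stand-in: it decodes quoted-printable, drops anything that looks like a tag,
and applies the same regex as LOCAL_MEDS. The fragment and its font
attributes are hypothetical, since the archived sample does not preserve the
original font tags.

```python
import quopri
import re

# Hypothetical fragment in the style of the sample above. The font attributes
# are made up; the archived copy of the spam does not preserve them.
RAW = (
    b'<font face=3D"Arial">Prescr</fo=\n'
    b'nt><font face=3D"Arial">iption</font>\n'
    b'<font face=3D"Arial">Me</font><font face=3D"Arial">ds</font>\n'
)

# Same pattern as the LOCAL_MEDS body rule.
LOCAL_MEDS = re.compile(r"\bmeds\b", re.IGNORECASE)


def visible_text(raw: bytes) -> str:
    """Decode quoted-printable, then drop tags (a crude approximation of
    the text a body rule would see)."""
    html = quopri.decodestring(raw).decode("ascii", errors="replace")
    return re.sub(r"<[^>]*>", "", html)


text = visible_text(RAW)
print(text)                           # the words come out rejoined
print(bool(LOCAL_MEDS.search(text)))  # whether the rule's regex matches
```

Once the tags are gone the split words are whole again, so an ordinary word
rule sees them as if no obfuscation had been attempted.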

Really, it strikes me as more of a gap in your Bayes training and in the
default ruleset.

Re: How would one write a rule to match this?

bgodette at idcomm

Feb 10, 2004, 10:13 AM

Post #3 of 3 (1012 views)

However, the word-breaking is a far stronger and more reliable (fewer FPs)
spam sign than any of the fairly generic, context-dependent words used in the
body of the spam. This spam hit NO body rules other than the HTML-related
ones. It didn't trigger any Bayes rule at all, even with Bayes turned on,
which I've noticed happening every now and then.

To match this sort of word break, one would have to backreference the prior
font tag and compare its face, size, and color attributes (where used).
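
The backreference idea can be sketched in Python as below. This is only an
illustration of the matching logic, not a tested SpamAssassin rule: the
pattern name and helper function are hypothetical, the input is assumed to be
already decoded from quoted-printable, and the attribute strings are compared
verbatim, so the same attributes written in a different order would not be
caught.

```python
import re

# Two back-to-back <font> elements whose attribute strings are identical
# (\1 is the backreference), with a word character on each side of the seam.
# Wrapping the pattern in a lookahead lets consecutive seams in one long
# word each be counted, since no text is consumed by a match.
SAME_FONT_SPLIT = re.compile(
    r"(?=<font\b([^>]*)>[^<]*\w</font\s*><font\b\1>\w)",
    re.IGNORECASE,
)


def count_font_word_breaks(html: str) -> int:
    """Count places where a word is split by closing and reopening the
    same font tag (hypothetical helper, for illustration only)."""
    return len(SAME_FONT_SPLIT.findall(html))


split_word = (
    '<font size="2">Pre</font><font size="2">scr</font>'
    '<font size="2">iption</font>'
)
normal_markup = '<font color="red">Sale</font> <font color="blue">today</font>'

print(count_font_word_breaks(split_word))
print(count_font_word_breaks(normal_markup))
```

A count rather than a yes/no answer leaves room for a threshold, since a
single seam could plausibly come from a sloppy HTML editor while several in
one message are much less likely to be innocent.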
