Mailing List Archive

Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
Hi,

Could anybody please run this rule against his SPAM/HAM corpus?

I just whipped up this

rawbody LOCAL_URL_SYNTAX_1 /www\.[a-z]\.[a-z]\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+/
describe LOCAL_URL_SYNTAX_1 Spammer-like URL syntax - TEST RULE 04-02-07
score LOCAL_URL_SYNTAX_1 1.0



to catch all those mails that contain URLs like

<A
HREF="http://www.xbaq.whatuthinkwillhappen.com/c/?AFF_ID=c1224&qgdwcmaewo=uwdi">Clwck
Here for Gensric Cinlis</a><br>
<A
HREF="http://www.iprorvpe.whatuthinkwillhappen.com/v/?AFF_ID=v1224&vtdo=aajtyv">Clqck
Here for Genbric Vibgra</a><br>
<A
HREF="http://www.phaabofzs.suppatimeitnow.com/m/?AFF_ID=m1&bigkssmn=hnewt">
<A
HREF="http://www.jkmwbrwh.takeituptothetop.com/v2/?AFF_ID=d1230&ctxs=zorlmhqb">FIkND
IT HERfE</A><BR>
<A
HREF="http://www.xizrsjfrlz.takeituptothetop.com/x/?AFF_ID=o1230&vollwgu=bwocx">FINsD
IT HxERE</A><BR>
<A
HREF="http://www.vbraaud.takeituptothetop.com/l/?AFF_ID=a1230&wgeazwzomc=jeuz">FsIND
IT HEuRE</A><BR>
<A
HREF="http://www.ebeefbnw.unbelievablepricez.com/c/?AFF_ID=c1224&xijshovcp=rkyvjha">Clfck
Here for Gentric Cialis</a><br>
<A
HREF="http://www.ewxsyeree.unbelievablepricez.com/v/?AFF_ID=v1224&oqydznwv=krixtkg">Click
Here for Gengric Viegra</a><br>
<A
HREF="http://www.lexg.takeituptothetop.com/cv/?AFF_ID=cv0119&yyvvps=nvsvvx">Enter
Here</a><br>
<A
HREF="http://www.hbsw.foreveryourhost.com/c2/?AFF_ID=c20206&fifzban=ebwxhfm">Entdr
Here</a><br>

also things like

<A
HREF="http://www.xgsumpub.stlg.com=www.xomhe.ozgzcqbrh.entertheoneandlive.com/c/?AFF_ID=a3&uhkz=krdhfrspg">Eyntzer
Hegre</a><br>
<A
HREF="http://www.pyuaw.colx.com=www.vzlxxjyuk.ypfeavly.entertheoneandlive.com/v/?AFF_ID=a3&hqaecbxhhr=hjfjj">Etntzer
Heare</a><br>

are in my SPAM folder lots and lots of times.


Thanks!

(the above created via grepping for "AFF_ID" in my spam folder.
"AFF_ID" hits on about half my spam!!)

Maybe this is better: (hits only 1192 times though)
"(www\.[a-z]\.com=)?[a-z]+\.[a-z]+\.com\/[a-z0-9]{1,2}\/\?AFF_ID=[a-z0-9]+\&[a-z0-9]+=[a-z0-9]+"



--
Jens Benecke (jens at spamfreemail.de)
http://www.hitchhikers.de - Europe-wide free ride-sharing service
http://www.spamfreemail.de - 100% clean mailboxes - guaranteed!
http://www.rb-hosting.de - PHP from 9€ - SSH from 19€ - low-cost traffic
Re: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"

Jens Benecke writes:
>Hi,
>
>Could anybody please run this rule against his SPAM/HAM corpus?
>
>I just whipped up this
>
>rawbody LOCAL_URL_SYNTAX_1 /www\.[a-z]\.[a-z]\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+/
>describe LOCAL_URL_SYNTAX_1 Spammer-like URL syntax - TEST RULE 04-02-07
>score LOCAL_URL_SYNTAX_1 1.0
>
>
>
>to catch all those mails that contain URLs like
>
><A
>HREF="http://www.xbaq.whatuthinkwillhappen.com/c/?AFF_ID=c1224&qgdwcmaewo=uwdi">Clwck

Actually -- has anyone got *any* legit mail containing "aff_id",
"AFF_ID", "affiliateid", "aff_sub_id" etc.? I would bet not.

This may make a good rule:

uri LOCAL_URI_AFFILIATE /aff\w+id=/i

0 FPs on my corpus, plenty of spam hits.

--j.
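
A quick way to see which parameter names Justin's pattern covers is to try
it on a few query strings. The sketch below uses Python's re module as a
stand-in for SpamAssassin's Perl regex engine (an assumption; for a pattern
this simple the two should behave the same). The example.com URLs are made
up for illustration.

```python
import re

# Justin's pattern from the uri rule, with /i expressed as re.IGNORECASE.
AFFILIATE = re.compile(r"aff\w+id=", re.IGNORECASE)

# Hypothetical URLs, one per parameter name mentioned in the message,
# each paired with whether the pattern is expected to hit.
samples = {
    "http://example.com/?aff_id=1": True,
    "http://example.com/?AFF_ID=c1224": True,
    "http://example.com/?affiliateid=7": True,
    "http://example.com/?aff_sub_id=9": True,
    # \w+ needs at least one character between "aff" and "id",
    # so a bare "affid=" is not covered.
    "http://example.com/?affid=3": False,
}

results = {url: AFFILIATE.search(url) is not None for url in samples}
for url, hit in results.items():
    print("hit " if hit else "miss", url)
```

Note that the underscore counts as a word character, which is why "aff_id"
and "aff_sub_id" both match.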
Re[2]: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
Hello Justin,

Saturday, February 7, 2004, 2:14:44 PM, you responded to Jens:

>>Could anybody please run this rule against his SPAM/HAM corpus?
>>rawbody LOCAL_URL_SYNTAX_1 /www\.[a-z]\.[a-z]\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+/
>>describe LOCAL_URL_SYNTAX_1 Spammer-like URL syntax - TEST RULE 04-02-07
>>score LOCAL_URL_SYNTAX_1 1.0
>>to catch all those mails that contain URLs like
>><A HREF="http://www.xbaq.whatuthinkwillhappen.com/c/?AFF_ID=c1224&qgdwcmaewo=uwdi">Clwck

JM> Actually -- has anyone got *any* legit mail containing "aff_id",
JM> "AFF_ID", "affiliateid", "aff_sub_id" etc.? I would bet not.

JM> This may make a good rule:

JM> uri LOCAL_URI_AFFILIATE /aff\w+id=/i

I tested these two rules:
rawbody LOCAL_URL_SYNTAX_1 /www\.[a-z]\.[a-z]\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+/
describe LOCAL_URL_SYNTAX_1 Spammer-like URL syntax - TEST RULE 04-02-07
score LOCAL_URL_SYNTAX_1 1.0
uri LOCAL_URI_AFFILIATE /aff\w+id=/i
describe LOCAL_URI_AFFILIATE spam from an affiliate
score LOCAL_URI_AFFILIATE 1

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
   91185    73148    18037  0.802  0.00   0.00  (all messages)
 100.000  80.2193  19.7807  0.802  0.00   0.00  (all messages as %)
   2.071   2.5811   0.0000  1.000  1.00   1.00  LOCAL_URI_AFFILIATE
   0.000   0.0000   0.0000  0.500  0.00   1.00  LOCAL_URL_SYNTAX_1

No matches at all for Jens' rule, great results for Justin's.

Bob Menschel
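
One likely reason for the zero hits: each [a-z] in the host part matches
exactly one letter, while the sample hostnames have multi-letter labels.
The sketch below checks this with Python's re module (assumed to behave
like Perl for this pattern). WITH_PLUS is a hypothetical variant written
only for this comparison, not a rule anyone in the thread posted.

```python
import re

# Jens's rule in the single-line form that Bob tested.
ORIGINAL = re.compile(
    r"www\.[a-z]\.[a-z]\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+"
)
# Hypothetical variant: the same rule with "+" added to the two host labels.
WITH_PLUS = re.compile(
    r"www\.[a-z]+\.[a-z]+\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+"
)

# One of the sample URLs from Jens's first message.
url = "http://www.xbaq.whatuthinkwillhappen.com/c/?AFF_ID=c1224&qgdwcmaewo=uwdi"

original_hit = ORIGINAL.search(url) is not None
with_plus_hit = WITH_PLUS.search(url) is not None
print("original rule matches:", original_hit)
print("variant with + matches:", with_plus_hit)
```

As written, the original only matches hosts of the form www.a.b.com, which
would explain a 0.000 row on a corpus of this size.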
Re: Re[2]: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
Out of curiosity, if you have the spare machine time, could you test the
simple rule

uri AFF_ID /\/\?AFF_ID\=/
describe AFF_ID URL contains AFF_ID=
score AFF_ID 3

This has been working well for me for a couple days, but I really don't get
a huge quantity of messages. I'm interested if it hits any ham at all.

Also, up to today this one has been catching lots of spam for me, although
interestingly today I haven't had a single spam matching this pattern.
Normally it catches 30-40 a day.

uri URL_EQUALS /www\.[0-9a-z\.\_]+\=[0-9a-z\.\_]+/i
describe URL_EQUALS URL has equal sign in hostname
score URL_EQUALS 4.4

Thanks,

Loren
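
For comparison with the /aff\w+id=/i pattern quoted earlier in the thread,
here is a small sketch of what the literal AFF_ID pattern does and does not
cover. It uses Python's re module in place of SpamAssassin's regex engine
(an assumption), and the example.com URLs are hypothetical.

```python
import re

# Loren's pattern: a literal "/?AFF_ID=", case-sensitive (no /i flag).
AFF_ID = re.compile(r"\/\?AFF_ID\=")

# Each URL is paired with whether the pattern is expected to hit.
checks = [
    # Sample URL from the start of the thread.
    ("http://www.xbaq.whatuthinkwillhappen.com/c/?AFF_ID=c1224&qgdwcmaewo=uwdi", True),
    # Hypothetical variants the literal pattern does not cover:
    # a script name before the "?", and a lowercase parameter name.
    ("http://example.com/index.php?AFF_ID=c1224", False),
    ("http://example.com/c/?aff_id=c1224", False),
]

outcomes = [(url, AFF_ID.search(url) is not None) for url, _ in checks]
for url, hit in outcomes:
    print("hit " if hit else "miss", url)
```

The narrower pattern trades coverage for a lower chance of touching
unrelated URLs; whether that trade is worth it depends on the corpus.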
Re: Re[2]: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
Robert Menschel wrote:

> I tested these two rules:
> rawbody LOCAL_URL_SYNTAX_1 /www\.[a-z]\.[a-z]\.com\/[a-z0-9]{1,4}\/\?AFF_ID=[a-z0-9]+\&[a-z]+[a-z]+/
> describe LOCAL_URL_SYNTAX_1 Spammer-like URL syntax - TEST RULE 04-02-07
> score LOCAL_URL_SYNTAX_1 1.0
> uri LOCAL_URI_AFFILIATE /aff\w+id=/i
> describe LOCAL_URI_AFFILIATE spam from an affiliate
> score LOCAL_URI_AFFILIATE 1
>
> OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
>    91185    73148    18037  0.802  0.00   0.00  (all messages)
>  100.000  80.2193  19.7807  0.802  0.00   0.00  (all messages as %)
>    2.071   2.5811   0.0000  1.000  1.00   1.00  LOCAL_URI_AFFILIATE
>    0.000   0.0000   0.0000  0.500  0.00   1.00  LOCAL_URL_SYNTAX_1
>
> No matches at all for Jens' rule, great results for Justin's.

That's because my rule was buggy. And the one above contains a typing
mistake, I think. :)

This should catch them:

rawbody LOCAL_URL_SYNTAX_1 /(www\.[a-z]\.com=)?[a-z]+\.[a-z]+\.com\/[a-z0-9]{1,4}\/(index\.php)?\?AFF_ID=[a-z0-9]+(\&[a-z0-9]+=[a-z0-9]+)?/
describe LOCAL_URL_SYNTAX_1 Spammer-like URL syntax - TEST RULE 04-02-07
score LOCAL_URL_SYNTAX_1 1.0

or use "uri" instead of "rawbody" (I honestly don't know exactly what "uri"
assumes so I just search the raw message body).

At least my SPAM folder likes them:
(total mails, mails containing AFF_ID, mails containing my rule)

# grep -c "^From " .Mailbox.S{PAM,URESPAM}
.Mailbox.SPAM:3368
.Mailbox.SURESPAM:8014

# grep -c AFF_ID .Mailbox.S{PAM,URESPAM}
.Mailbox.SPAM:1473
.Mailbox.SURESPAM:1058

# egrep -c '(www\.[a-z]\.com=)?[a-z]+\.[a-z]+\.com\/[a-z0-9]{1,2}\/(index\.php)?\?AFF_ID=[a-z0-9]+(\&[a-z0-9]+=[a-z0-9]+)?' .Mailbox.{SPAM,SURESPAM}
.Mailbox.SPAM:1469
.Mailbox.SURESPAM:1027


--
Jens Benecke (jens at spamfreemail.de)
http://www.hitchhikers.de - Europe-wide free ride-sharing service
http://www.spamfreemail.de - 100% clean mailboxes - guaranteed!
http://www.rb-hosting.de - PHP from 9€ - SSH from 19€ - low-cost traffic
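
The revised rule can be tried against both URL families from the first
message. This is a minimal sketch using Python's re module (assumed close
enough to Perl for this pattern), with the rule written on one line and the
"]" after [a-z0-9 restored on the assumption that it was lost at the line
wrap.

```python
import re

# Jens's revised rule, joined onto one logical pattern.
REVISED = re.compile(
    r"(www\.[a-z]\.com=)?[a-z]+\.[a-z]+\.com\/[a-z0-9]{1,4}\/(index\.php)?"
    r"\?AFF_ID=[a-z0-9]+(\&[a-z0-9]+=[a-z0-9]+)?"
)

# One sample URL from each family quoted in the first message.
urls = [
    "http://www.xbaq.whatuthinkwillhappen.com/c/?AFF_ID=c1224&qgdwcmaewo=uwdi",
    "http://www.xgsumpub.stlg.com=www.xomhe.ozgzcqbrh.entertheoneandlive.com/c/?AFF_ID=a3&uhkz=krdhfrspg",
]

matches = [REVISED.search(u) for u in urls]
for u, m in zip(urls, matches):
    # group(1) is the optional "www.X.com=" prefix, None when it is skipped.
    print("matched" if m else "no match", "| prefix group:", m.group(1) if m else None)
```

Both URLs match, but the optional (www\.[a-z]\.com=)? prefix still uses a
single-letter label and does not take part in either match. The rule hits
because the rest of the pattern is unanchored and finds the last two host
labels before ".com/".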
Re[4]: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
Hello Loren,

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

 OVERALL     SPAM      HAM    S/O  RANK  SCORE  NAME
   91185    73148    18037  0.802  0.00   0.00  (all messages)
     468      468        0  1.000  0.97   4.40  URL_EQUALS
     417      417        0  1.000  0.97   3.00  AFF_ID

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
   91185    73148    18037  0.802  0.00   0.00  (all messages)
 100.000  80.2193  19.7807  0.802  0.00   0.00  (all messages as %)
   0.513   0.6398   0.0000  1.000  0.97   4.40  URL_EQUALS
   0.457   0.5701   0.0000  1.000  0.97   3.00  AFF_ID

Rules hit a decent amount of spam, and no ham. I like them.

FYI, Justin's affiliate rule posted recently hits better (I've renamed it
for my system):
uri JM_uwd_AFFILIATE /aff\w+id=/i
describe JM_uwd_AFFILIATE spam from an affiliate
score JM_uwd_AFFILIATE 4.000 # 1888s/0h of 91185 corpus (73148s/18037h) 02/09/04

Bob Menschel


Monday, February 9, 2004, 10:54:58 PM, you wrote:

LW> Out of curiosity, if you have the spare machine time, could you test the
LW> simple rule

LW> uri AFF_ID /\/\?AFF_ID\=/
LW> describe AFF_ID URL contains AFF_ID=
LW> score AFF_ID 3

LW> This has been working well for me for a couple days, but I really don't get
LW> a huge quantity of messages. I'm interested if it hits any ham at all.

LW> Also, up to today this one has been catching lots of spam for me, although
LW> interestingly today I haven't had a single spam matching this pattern.
LW> Normally it catches 30-40 a day.

LW> uri URL_EQUALS /www\.[0-9a-z\.\_]+\=[0-9a-z\.\_]+/i
LW> describe URL_EQUALS URL has equal sign in hostname
LW> score URL_EQUALS 4.4

LW> Thanks,

LW> Loren
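
The percentage rows in Bob's frequency tables can be reproduced from the
raw hit counts. The sketch below does the arithmetic in Python. The S/O
formula is an assumption (spam rate divided by spam rate plus ham rate,
0.5 when a rule hits nothing); with zero ham hits any reasonable definition
gives 1.000, so the figures here cannot confirm it.

```python
# Corpus sizes and per-rule (spam hits, ham hits) taken from Bob's messages.
TOTAL_SPAM, TOTAL_HAM = 73148, 18037
hits = {
    "URL_EQUALS": (468, 0),
    "AFF_ID": (417, 0),
    "JM_uwd_AFFILIATE": (1888, 0),
}


def frequencies(spam_hits, ham_hits):
    """Return (overall %, spam %, ham %, S/O) for one rule."""
    total = TOTAL_SPAM + TOTAL_HAM
    overall = 100.0 * (spam_hits + ham_hits) / total
    spam_pct = 100.0 * spam_hits / TOTAL_SPAM
    ham_pct = 100.0 * ham_hits / TOTAL_HAM
    # Assumed S/O definition; falls back to 0.5 when there are no hits.
    denom = spam_pct + ham_pct
    s_o = spam_pct / denom if denom else 0.5
    return overall, spam_pct, ham_pct, s_o


for name, (s, h) in hits.items():
    overall, spam_pct, ham_pct, s_o = frequencies(s, h)
    print(f"{overall:8.3f} {spam_pct:8.4f} {ham_pct:8.4f} {s_o:6.3f}  {name}")
```

The 1888 spam hits for the affiliate rule work out to the same 2.071 and
2.5811 figures reported for LOCAL_URI_AFFILIATE in the earlier test.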
RE: Re[4]: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
If I'm reading that URL_EQUALS rule correctly, it seems that any URL
with an attribute in the query string will be caught. If that's the
case, I know I create lots of e-mails with links to ASP pages with stuff
in the query string, so wouldn't that rule be bad?

> LW> uri URL_EQUALS /www\.[0-9a-z\.\_]+\=[0-9a-z\.\_]+/i
> LW> describe URL_EQUALS URL has equal sign in hostname
> LW> score URL_EQUALS 4.4
>
> LW> Thanks,
>
> LW> Loren
Re: Re[4]: Rule for V-word spam with "?AFF_ID=[a-z]+&$RANDOM=$RANDOM"
Well, I'm not that good with regexps, so I could well have screwed it up so
that it does that.
However, the intent is to catch

http://www.ghjlfadsafd=hjkafdsjlffda.com/blah/blah

but not catch

http://www.blah.blah.com/foo?=hithere

So hopefully that second scan will stop at the first slash.

Possibly I could have anchored it to the left side instead of using www. as
an anchor, and maybe caught some more spam. But I can't find anything that
tells me what a left anchor means in a uri clause, or if it will even work.
Originally this was a rawbody scan anchored to "http://".

However, at this point I wouldn't bother with that rule. For a couple of
weeks it was catching over half my spam, since everything from Taiwan had
that signature. On the day I posted that rule all of the urls from those
sites changed, and it has not since then caught a single spam.

Loren
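
Whether the pattern stops at the first slash can be checked directly. The
sketch below runs URL_EQUALS against the two URLs Loren gives, plus two
hypothetical example.com links added for illustration. Python's re module
stands in for SpamAssassin's regex engine here, which is an assumption.

```python
import re

# Loren's pattern; /i expressed as re.IGNORECASE.
URL_EQUALS = re.compile(r"www\.[0-9a-z\.\_]+\=[0-9a-z\.\_]+", re.IGNORECASE)

# The intended hit and intended miss from Loren's message.
should_hit = "http://www.ghjlfadsafd=hjkafdsjlffda.com/blah/blah"
should_miss = "http://www.blah.blah.com/foo?=hithere"
# Hypothetical ASP link with a query string, as raised in the objection.
asp_link = "http://www.example.com/page.asp?id=42"
# Hypothetical edge case: "www." appearing inside the query string itself.
edge = "http://example.com/go?www.example.org=1"

hit = URL_EQUALS.search(should_hit)
miss = URL_EQUALS.search(should_miss)
asp = URL_EQUALS.search(asp_link)
edge_hit = URL_EQUALS.search(edge)

print("intended hit :", hit.group(0) if hit else None)
print("intended miss:", miss)
print("asp link     :", asp)
print("edge case    :", edge_hit.group(0) if edge_hit else None)
```

Because "/" and "?" are not in the character class, the run after "www."
has to reach an "=" before the first slash, so ordinary query strings are
left alone. The edge case shows the remaining gap: the pattern is not tied
to the hostname position, so a "www.name=" sequence later in the URL would
still match.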

>If I'm reading that URL_EQUALS rule correctly, it seems that any URL
>with an attribute in the query string will be caught. If that's the
>case, I know I create lots of e-mails with links to ASP pages with stuff
>in the query string, so wouldn't that rule be bad?

> LW> uri URL_EQUALS /www\.[0-9a-z\.\_]+\=[0-9a-z\.\_]+/i
> LW> describe URL_EQUALS URL has equal sign in hostname
> LW> score URL_EQUALS 4.4
>
> LW> Thanks,
>
> LW> Loren