Mailing List Archive

Need a rule for random words & weird punctuation.
Can somebody point me to a rule to catch long strings of random words (some
aren't even words - more like meta-words or quasi-words lol) that contain no
common words like 'the', 'to', 'as', 'then' or 'a'? Something like this:

Vbreadroot chicken firecracker blandnesses brent
blunderbuss pi regretful cretan wightman synonymous chip dogberry
assertion bayous !! Zhairpin threesome decant taxicab rank backer .
Xsepia ducat amplified curve americiums incompletion recriminate eve

One of the things I noticed about this was the punctuation. It's there, but
NOBODY I know puts a space BEFORE their punctuation.

I'm sure this has probably been covered by now, but I couldn't find anything
specific to this in the archives. Everything I saw was for short four-letter
words, or spammer screwups (e.g. %RANDOM_WORD%).

Thanks,
J.C. Blouin
jc@wittmann-ct.com
Wittmann Robot & Automation Systems, Inc.
... your partner in automation success!
RE: Need a rule for random words & weird punctuation.
JC,

Chris has many good rules at http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm that might help you.

Gary

RE: Need a rule for random words & weird punctuation.
Just an odd note. I just got a bounce from another list subscriber whose mailbox is full. If you sign up for mailing lists, please try to use an account that won't generate NDRs for other users.

Just my $0.02.

Gary



Re: Need a rule for random words & weird punctuation.
On Mon, 1 Mar 2004, JC wrote:

> One of the things I noticed about this was the punctuation. It's there, but
> NOBODY I know puts a space BEFORE their punctuation.

Um, watch out for FPs. When I speak a language called 'C' I often
put spaces before punctuation. (or for that matter, emoticons. ;)

--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
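To make the false-positive risk concrete, here is a minimal Python sketch of the kind of naive space-before-punctuation check being discussed. The pattern and the names `SPACE_BEFORE_PUNCT` and `looks_suspicious` are assumptions made up for illustration, not an existing SpamAssassin rule, and the C and plain-prose strings are invented test inputs.

```python
import re

# Hypothetical naive check: any whitespace immediately followed by punctuation.
SPACE_BEFORE_PUNCT = re.compile(r"\s[.,;:!?]")


def looks_suspicious(text):
    """Return True if the text has whitespace directly before punctuation."""
    return SPACE_BEFORE_PUNCT.search(text) is not None


samples = {
    # Taken from the spam excerpt quoted at the start of the thread.
    "spam excerpt": "assertion bayous !! Zhairpin threesome decant taxicab rank backer .",
    # Invented C fragment written with spaces before the semicolons.
    "C source": "for (i = 0 ; i < n ; i++) total += a[i] ;",
    # The closing words of the warning above, emoticon included.
    "emoticon": "or for that matter, emoticons. ;)",
    # Invented ordinary sentence.
    "plain prose": "Thanks, see you on Monday.",
}

for label, text in samples.items():
    print(label, looks_suspicious(text))
```

The spam excerpt, the C fragment and the emoticon are all flagged, while the ordinary sentence is not, so a check like this cannot tell the spam apart from code or smileys.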
RE: Need a rule for random words & weird punctuation.
On Mon, 1 Mar 2004, Gary Smith wrote:

> Just an odd note. I just got a bounce back from another person who subscribes to the list with a full mailbox. If you need to sign up to mailing lists you should try to use an account that won't generate ndr's for the users.
>
> Just my $0.02.
>
> Gary

More to the point, rhnet.org/mail.rh.monroe.edu needs to get a less
brain-damaged mail system.

According to RFC 2821, bounces/NDRs should be sent to the envelope From
address, NOT the header From. The SA list sets the envelope From address
to point back to the list server, so a proper mail system will send the
NDRs to the list server. The list maintainer will then see them and can
drop dead addresses, and list members will not be bothered by them.
(Gee, there's a reason for all that RFC stuff ;).

--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
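The envelope-versus-header distinction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `build_ndr` is a hypothetical helper, every address is a made-up example.org placeholder, and a real mail system does far more than this.

```python
from email.message import EmailMessage


def build_ndr(envelope_from, original, reason):
    """Build a bounce addressed to the envelope sender (the reverse-path).

    The header From of the original message is deliberately ignored.
    A null envelope sender means no bounce is generated at all.
    """
    if not envelope_from:
        return None
    ndr = EmailMessage()
    ndr["From"] = "MAILER-DAEMON@mx.example.org"   # hypothetical receiving host
    ndr["To"] = envelope_from                      # envelope From, not header From
    ndr["Subject"] = "Undelivered Mail Returned to Sender"
    ndr.set_content(
        "Delivery failed: %s\nOriginal subject: %s" % (reason, original["Subject"])
    )
    return ndr


# A list post: the header From is the poster, but the list server has set
# the envelope sender to its own bounce-handling address (assumed name).
post = EmailMessage()
post["From"] = "poster@example.com"
post["To"] = "users@lists.example.org"
post["Subject"] = "Need a rule for random words"
post.set_content("...")

ndr = build_ndr("users-bounces@lists.example.org", post, "mailbox full")
print(ndr["To"])
```

Here the bounce goes to the list's bounce address rather than to the poster, which is the behaviour described above.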
RE: Need a rule for random words & weird punctuation.
Darn. I hadn't thought of that. C wouldn't have been a huge issue, but the
emoticons I could see as possibly an issue.

>
> > One of the things I noticed about this was the punctuation. It's there, but
> > NOBODY I know puts a space BEFORE their punctuation.
>
> Um, watch out for FPs. When I speak a language called 'C' I often
> put spaces before punctuation. (or for that matter, emoticons. ;)
Re: Need a rule for random words & weird punctuation.
JC <JC@wittmann-ct.com> wrote:
> Darn. I hadn't thought of that. C wouldn't have been a huge
> issue, but the emoticons I could see as possibly an issue.

Well, messages to the procmail list often wind up with content translated TO
emoticons by "friendly" client software.

Add procmail to the list of things broken by checking for
space-before-punctuation.

- Bob
RE: Need a rule for random words & weird punctuation.
Lol, yeah. That idea seems like it's getting worse and worse with every
post.
Anybody know of a rule for long strings of random words that don't
contain words like 'the, to, a, an, then, and' and that sort of word? I'd
try to write one, but my regex skills are nonexistent.
Thanks for your help.
>
> JC <JC@wittmann-ct.com> wrote:
> > Darn. I hadn't thought of that. C wouldn't have been a huge
> > issue, but the emoticons I could see as possibly an issue.
>
> Well, messages to the procmail list often wind up with content translated TO
> emoticons by "friendly" client software.
>
> Add procmail to the list of things broken by checking for
> space-before-punctuation.
>
> - Bob
>
Re: Need a rule for random words & weird punctuation.
> Anybody know of a rule for the long strings of random words that don't
> contain words like 'the, to, a, an, then, and' and those sort of words? I'd

Here are two sets. The first one checks only for long strings without
punctuation, and works pretty well. The second set is a modified version
that allows some punctuation, specifically to catch some recent spams that
made it through the first set by including random punctuation. This second
set hasn't been through a mass check, and for all I know may be hitting all
kinds of legit stuff as well as the spam.

# match Bayes-poison lists of lowercase words without articles or common
# prepositions

body PT_WORDLIST_10 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z]{4,12}\s+){10}/
describe PT_WORDLIST_10 string of 10+ random words
score PT_WORDLIST_10 1.0

body PT_WORDLIST_13 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z]{4,12}\s+){13}/
describe PT_WORDLIST_13 string of 13+ random words
score PT_WORDLIST_13 3.0

body PT_WORDLIST_30 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z]{4,12}\s+){30}/
describe PT_WORDLIST_30 string of 30+ random words
score PT_WORDLIST_30 10.0
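
For anyone who wants to poke at the first set outside SpamAssassin, here is a rough Python sketch. It assumes Python's `re` treats these particular constructs the same way Perl does; `wordlist_rule`, `hits` and the test strings are names and inputs invented for this illustration, and real body rules run on SpamAssassin's rendered text, which this does not reproduce.

```python
import re

# One "word" as written in the PT_WORDLIST rules: 4-12 lowercase letters,
# not one of six common words, followed by whitespace.
WORD = r"\b(?!(?:from|that|have|this|were|with)\b)[a-z]{4,12}\s+"


def wordlist_rule(count):
    """Compile the PT_WORDLIST pattern for `count` consecutive words."""
    return re.compile("(?:" + WORD + "){" + str(count) + "}")


def hits(count, text):
    """Return True if `text` contains a run of `count` qualifying words."""
    return wordlist_rule(count).search(text) is not None


WORDS = [
    "chicken", "firecracker", "blandnesses", "brent", "blunderbuss",
    "regretful", "cretan", "wightman", "synonymous", "chip",
    "dogberry", "assertion", "bayous",
]

# Thirteen qualifying words in a row (invented test input).
word_salad = " ".join(WORDS) + " end."
# The same words with one excluded common word dropped into the middle.
with_stopword = " ".join(WORDS[:6] + ["with"] + WORDS[6:]) + " end."

# The excerpt quoted at the start of the thread, line breaks included.
THREAD_EXCERPT = (
    "Vbreadroot chicken firecracker blandnesses brent\n"
    "blunderbuss pi regretful cretan wightman synonymous chip dogberry\n"
    "assertion bayous !! Zhairpin threesome decant taxicab rank backer .\n"
    "Xsepia ducat amplified curve americiums incompletion recriminate eve"
)

print(hits(10, word_salad), hits(13, word_salad), hits(30, word_salad))
print(hits(10, with_stopword))
print(hits(8, THREAD_EXCERPT), hits(9, THREAD_EXCERPT))
```

One thing this shows: on the four-line excerpt exactly as quoted, the longest qualifying run is eight words, because the capitalized tokens, the two-letter `pi` and the stray punctuation each break a run. PT_WORDLIST_10 would therefore need a longer or cleaner stretch of the original spam to fire.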

# same idea, but each word may also contain periods, commas, hyphens or
# semicolons (not yet mass-checked)

body XX_WORDLIST_10 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z\.\,\-\;]{4,18}\s+){10}/
describe XX_WORDLIST_10 string of 10+ random words
score XX_WORDLIST_10 1.0

body XX_WORDLIST_13 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z\.\,\-\;]{4,18}\s+){13}/
describe XX_WORDLIST_13 string of 13+ random words
score XX_WORDLIST_13 3.0

body XX_WORDLIST_30 /(?:\b(?!(?:from|that|have|this|were|with)\b)[a-z\.\,\-\;]{4,18}\s+){30}/
describe XX_WORDLIST_30 string of 30+ random words
score XX_WORDLIST_30 10.0
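
To see what the extra punctuation in the character class changes, here is a small Python comparison of the two 10-word patterns. As before this is a sketch, assuming Python's `re` matches these constructs the way Perl does; the variable names and the sample strings are invented for illustration.

```python
import re

STOP = r"(?!(?:from|that|have|this|were|with)\b)"

# First set: letters only, 4-12 characters per word.
PT_10 = re.compile(r"(?:\b" + STOP + r"[a-z]{4,12}\s+){10}")
# Second set: letters plus . , - ; and up to 18 characters per word.
XX_10 = re.compile(r"(?:\b" + STOP + r"[a-z\.\,\-\;]{4,18}\s+){10}")

# Invented word salad with punctuation glued onto some of the words.
punctuated = (
    "chicken, firecracker. blandnesses; brent blunder-buss regretful, "
    "cretan. wightman synonymous chip dogberry end."
)
# Invented ordinary sentence.
ordinary = "Thanks for the update, I will call you on Monday."

print(PT_10.search(punctuated) is not None)
print(XX_10.search(punctuated) is not None)
print(XX_10.search(ordinary) is not None)
```

The letters-only pattern loses its run at every comma, period, semicolon or hyphen, while the second pattern counts those tokens and matches. Note that punctuation standing on its own with spaces around it still breaks a run in both sets, since each word must start at a word boundary.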
Re[4]: Need a rule for random words & weird punctuation.
Hello JC,

Friday, March 5, 2004, 6:38:52 AM, you wrote:

J> That'd be great. Thanks. :)

J> > J> Anybody know of a rule for the long strings of random words that don't
J> > J> contain words like 'the, to, a, an, then, and' and those sort of words? I'd
J> > J> try to write one, but my REGEX skills are inexistent.

RM> I've got several that have been posted to the list, seem to do well.
RM> I'm on the road today, but I'll post them again when I get home.

# longwords -- possible sign of random words placed into spam to confuse anti-spam software

body RM_bpt_longwords68a /\b(?:[a-z]{6,}\s+){8}/
describe RM_bpt_longwords68a Long string of long words
score RM_bpt_longwords68a 1.500 # type=FP - 7429s/2h of 91714 corpus (74113s/17601h) 01/23/04
# ham: userid list,
# "improving compatibility between computer platforms demands certain levels "
body RM_bpt_longwords69a /\b(?:[a-z]{6,}\s+){9}/
describe RM_bpt_longwords69a Long string of long words
score RM_bpt_longwords69a 1.000 # type=max:1 (add to 59a,68a) - 6595s/1h of 91714 corpus (74113s/17601h) 01/23/04
# ham: userid list
body RM_bpt_longwords78a /\b(?:[a-z]{7,}\s+){8}/
describe RM_bpt_longwords78a Long string of long words
score RM_bpt_longwords78a 2.000 # type=max:2 (add to 68a) - 4163s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords59a /\b(?:[a-z]{5,}\s+){9}/
describe RM_bpt_longwords59a Long string of long words
score RM_bpt_longwords59a 1.500 # type=FP - 8753s/8h of 91714 corpus (74113s/17601h) 01/23/04
# ham: userid list
body RM_bpt_longwords79a /\b(?:[a-z]{7,}\s+){9}/
describe RM_bpt_longwords79a Long string of long words
score RM_bpt_longwords79a 1.000 # type=max:1 (add to 78a) - 2950s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords96a /\b(?:[a-z]{9,}\s+){6}/
describe RM_bpt_longwords96a Long string of long words
score RM_bpt_longwords96a 4.000 # 1162s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords88a /\b(?:[a-z]{8,}\s+){8}/
describe RM_bpt_longwords88a Long string of long words
score RM_bpt_longwords88a 4.000 # 1025s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords89a /\b(?:[a-z]{8,}\s+){9}/
describe RM_bpt_longwords89a Long string of long words
score RM_bpt_longwords89a 1.000 # type=max:1 (add to 88a) - 590s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords97 /\b(?:\w{9,}\s+){7}/
describe RM_bpt_longwords97 Long string of long words
score RM_bpt_longwords97 3.000 # 545s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords98 /\b(?:\w{9,}\s+){8}/
describe RM_bpt_longwords98 Long string of long words
score RM_bpt_longwords98 1.000 # type=max:1 (add to 97) - 442s/0h of 91714 corpus (74113s/17601h) 01/23/04
body RM_bpt_longwords99 /\b(?:\w{9,}\s+){9}/
describe RM_bpt_longwords99 Long string of long words
score RM_bpt_longwords99 1.000 # type=max:1 (add to 98) - 330s/0h of 91714 corpus (74113s/17601h) 01/23/04
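
The rule names above encode two numbers: minimum word length and number of consecutive words (68a is 6+ letters, 8 words). A quick Python sketch, assuming Python's `re` behaves like Perl for these patterns, shows why the ham sentence quoted in the comments trips 68a but not its stricter neighbours. `longwords_rule` and `hits` are hypothetical helpers written for this illustration.

```python
import re


def longwords_rule(min_len, count):
    """Compile /\\b(?:[a-z]{min_len,}\\s+){count}/ as used in the 'a' rules."""
    return re.compile(r"\b(?:[a-z]{%d,}\s+){%d}" % (min_len, count))


def hits(min_len, count, text):
    """Return True if `text` has `count` lowercase words of `min_len`+ letters in a row."""
    return longwords_rule(min_len, count).search(text) is not None


# The ham example quoted in the comment under RM_bpt_longwords68a.
HAM = "improving compatibility between computer platforms demands certain levels "

print(hits(6, 8, HAM))   # pattern of RM_bpt_longwords68a
print(hits(6, 9, HAM))   # pattern of RM_bpt_longwords69a
print(hits(7, 8, HAM))   # pattern of RM_bpt_longwords78a
```

The sentence has exactly eight words of six or more letters, so the 68a pattern matches. The 69a pattern needs a ninth word, and the 78a pattern fails because "levels" has only six letters.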

# Second pattern -- really long words, 20+ characters in length, possibly
# separated by commas or periods. Avoid common section separators and
# common/valid encoding lengths
body RM_bpt_longwords20 /(?! _+ )(?! A+ )(?! \w{24} ) \w{20,29}[,.]? /
describe RM_bpt_longwords20 One long string of letters/digits, possible comma or period at end, space between "words"
score RM_bpt_longwords20 1.939 # 10992s/116h of 100689 corpus (81249s/19440h) 02/29/04
body RM_bpt_longwords30 /(?! _+ )(?! A+ )(?! \w{32} ) \w{30,39}[,.]? /
describe RM_bpt_longwords30 One long string of letters/digits, possible comma or period at end, space between "words"
score RM_bpt_longwords30 1.630 # 567s/8h of 100689 corpus (81249s/19440h) 02/29/04
body RM_bpt_longwords40 /(?! _+ )(?! A+ ) \w{40,49}[,.]? /
describe RM_bpt_longwords40 One long string of letters/digits, possible comma or period at end, space between "words"
score RM_bpt_longwords40 3.000 # 209s/0h of 100689 corpus (81249s/19440h) 02/29/04
body RM_bpt_longwords50 /(?! _+ )(?! A+ ) \w{50,59}[,.]? /
describe RM_bpt_longwords50 One long string of letters/digits, possible comma or period at end, space between "words"
score RM_bpt_longwords50 1.860 # 86s/0h of 100689 corpus (81249s/19440h) 02/29/04
body RM_bpt_longwords60 /(?! _+ )(?! A+ )(?! \w{62,64} ) \w{61,69}[,.]? /
describe RM_bpt_longwords60 One long string of letters/digits, possible comma or period at end, space between "words"
score RM_bpt_longwords60 2.410 # 141s/0h of 100689 corpus (81249s/19440h) 02/29/04
body RM_bpt_longwords70 /(?! _+ )(?! A+ )(?! \w{72} )(?! \w{76} ) \w{70,}[,.]? /
describe RM_bpt_longwords70 One long string of letters/digits, possible comma or period at end, space between "words"
score RM_bpt_longwords70 2.400 # 280s/1h of 100689 corpus (81249s/19440h) 02/29/04
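
The negative lookaheads in the second pattern are easier to follow with a few test strings. Below is a Python sketch of the RM_bpt_longwords20 pattern, assuming Python's `re` handles it the same way Perl does (the spaces in the pattern are literal, since no extended-syntax flag is used). `hits` and the test strings are invented for this illustration.

```python
import re

# Pattern of RM_bpt_longwords20, copied as written; spaces are literal.
LONGWORDS20 = re.compile(r"(?! _+ )(?! A+ )(?! \w{24} ) \w{20,29}[,.]? ")


def hits(text):
    """Return True if the text contains one space-delimited 20-29 character token
    that is not all underscores, not all 'A', and not exactly 24 characters."""
    return LONGWORDS20.search(text) is not None


print(hits("id " + "x" * 22 + " end"))   # ordinary 22-character token
print(hits("id " + "_" * 22 + " end"))   # separator line made of underscores
print(hits("id " + "A" * 22 + " end"))   # run of 'A' characters
print(hits("id " + "x" * 24 + " end"))   # exactly 24 characters
print(hits("id " + "x" * 25 + " end"))   # 25 characters
print(hits("id " + "x" * 30 + " end"))   # too long for this rule
```

Only the 22-character and 25-character tokens match. The underscore run, the run of 'A' characters and the 24-character token are each rejected by one of the lookaheads, and the 30-character token falls outside the 20-29 range, which the RM_bpt_longwords30 pattern covers instead.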