Mailing List Archive

Could someone please TEST this rule?
Hello!

I'm seeing some spam with bogus-looking 'yahoo' message IDs.
Could someone please test this rule against a nice large corpus?

header LOC_BADYAHOOMSGID Message-ID =~ /\@yahoo.com/i

If that one matches ham, please try the more-specific version below:

header LOC_BADYAHOOMSGID Message-ID =~ /[A-Z]{8,}\@yahoo.com/

I have deliberately left off the 'i' (case insensitive) switch in
this rule.
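This minimal sketch shows what the two rules do and why the missing 'i' switch matters for the second one. It translates them into Python regular expressions and runs them against two made-up Message-IDs. The Perl syntax SpamAssassin uses carries over for patterns this simple. The sample IDs and the RULE* names are hypothetical. With /i, [A-Z] would also match lowercase letters, so the second rule would fire on any ordinary lowercase local part of eight or more letters.

```python
import re

# Python approximations of the two proposed SpamAssassin header rules.
RULE1 = re.compile(r"\@yahoo.com", re.IGNORECASE)          # first rule, with /i
RULE2 = re.compile(r"[A-Z]{8,}\@yahoo.com")                # second rule, case-sensitive as posted
RULE2_I = re.compile(r"[A-Z]{8,}\@yahoo.com", re.IGNORECASE)  # hypothetical /i variant, for contrast

# Hypothetical Message-IDs, invented for illustration only.
samples = {
    "bogus": "<QWERTYUIOPASDF@yahoo.com>",   # long run of capitals before the @
    "lowercase": "<abcdefghijk@yahoo.com>",  # ordinary-looking lowercase local part
}

for label, msgid in samples.items():
    # Columns: rule 1, rule 2 as posted, rule 2 with /i added
    print(label,
          bool(RULE1.search(msgid)),
          bool(RULE2.search(msgid)),
          bool(RULE2_I.search(msgid)))
```

Only the case-sensitive second rule separates the two samples. Adding /i would make it as broad as the first rule for any long lowercase login.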

Thanks!

- Charles
Re: Could someone please TEST this rule?
On Thu, 19 Feb 2004, Matt Kettler wrote:
> >I'm seeing some spam with bogus-looking 'yahoo' message IDs.
> >Could someone please test this rule against a nice large corpus?
> I definitely have at least one legitimate yahoo email that matches the
> first rule. It's a mailing list post.
> Message-ID: <FHEIILEMBPEHMJHDEFAGMEGOCDAA.EEEEEE@yahoo.com>

Dang, and that ID is close enough in style that I think this idea dies
stillborn. Thanks anyway!

- Charles
Re: Could someone please TEST this rule?
At 02:48 PM 2/19/2004, Charles Gregory wrote:
>Dang, and that ID is close enough in style that I think this idea dies
>stillborn. Thanks anyway!

Not so: the message ID contains a period right before the user's
login ID.

Your second rule, the one using [A-Z]{8,}, should be fine with it.
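Matt's point can be checked directly. This minimal sketch again approximates the SpamAssassin rules with Python regexes and runs them against the legitimate Message-ID quoted earlier in the thread. The first rule matches any @yahoo.com address. The second needs eight or more capitals immediately before the @, and the period breaks the run. This only shows the rule is safe for this particular ID. A login ID that was itself eight or more capital letters would still match.

```python
import re

RULE1 = re.compile(r"\@yahoo.com", re.IGNORECASE)
RULE2 = re.compile(r"[A-Z]{8,}\@yahoo.com")

# The ham Message-ID from the mailing-list post quoted above.
ham_id = "<FHEIILEMBPEHMJHDEFAGMEGOCDAA.EEEEEE@yahoo.com>"

print(bool(RULE1.search(ham_id)))  # True: any @yahoo.com address matches
print(bool(RULE2.search(ham_id)))  # False: only 6 capitals sit between '.' and '@'
```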
Re: Could someone please TEST this rule?
Hello Charles,

Thursday, February 19, 2004, 11:32:03 AM, you wrote:

CG> Hello!

CG> I'm seeing some spam with bogus-looking 'yahoo' message IDs.
CG> Could someone please test this rule against a nice large corpus?

I took your two suggestions,

CG> header LOC_BADYAHOOMSGID Message-ID =~ /\@yahoo.com/i
CG> header LOC_BADYAHOOMSGID Message-ID =~ /[A-Z]{8,}\@yahoo.com/

And tested the following variations:

header LOC_BADYAHOOMSGID1 Message-ID =~ /\@yahoo.com/i
describe LOC_BADYAHOOMSGID1 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID1 0.5
header LOC_BADYAHOOMSGID2 Message-ID =~ /[A-Z]{8,}\@yahoo.com/
describe LOC_BADYAHOOMSGID2 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID2 0.5
header LOC_BADYAHOOMSGID3 Message-ID =~ /[A-Z]{8}\@yahoo.com/
describe LOC_BADYAHOOMSGID3 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID3 0.5
header LOC_BADYAHOOMSGID4 Message-ID =~ /[A-Z]{8}\@yahoo\.com/
describe LOC_BADYAHOOMSGID4 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID4 0.5

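Rules 2 and 3 should be equivalent. Neither pattern is anchored on the
left, so any Message-ID with eight or more capitals before the @ also
has exactly eight capitals immediately before it. The "or more" comma
has no real effect here (except maybe on performance).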
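The equivalence follows from the pattern being unanchored. A brute-force check over short strings makes the argument concrete, although it is a sanity check, not a proof. This sketch compares the two Python-translated rules exhaustively on every local part of up to ten characters. The characters come from a three-symbol alphabet (a capital, a lowercase letter and a period), which is an assumption chosen to keep the search small. The function name agree_up_to is hypothetical.

```python
import itertools
import re

# Rules 2 and 3 from the test, translated to Python regexes.
RULE2 = re.compile(r"[A-Z]{8,}\@yahoo.com")
RULE3 = re.compile(r"[A-Z]{8}\@yahoo.com")

def agree_up_to(max_len: int) -> bool:
    """Return True if rules 2 and 3 give the same verdict on every local part
    of length 0..max_len built from the characters 'A', 'a' and '.'."""
    for n in range(max_len + 1):
        for chars in itertools.product("Aa.", repeat=n):
            msgid = "<" + "".join(chars) + "@yahoo.com>"
            if bool(RULE2.search(msgid)) != bool(RULE3.search(msgid)):
                return False  # found a Message-ID where the rules disagree
    return True

print(agree_up_to(10))  # True
```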

In moving from 3 to 4, I escaped the period in .com (unescaped, it
matches any character).
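The escaped period only changes the outcome when some character other than a period sits between "yahoo" and "com". This minimal sketch uses a made-up lookalike host. The yahoo-com.example name is hypothetical and not from the corpus. Rules 3 and 4 hit identical counts in the results that follow, which suggests no such lookalikes occurred there. Escaping still makes the rule say exactly what it means.

```python
import re

RULE3 = re.compile(r"[A-Z]{8}\@yahoo.com")    # '.' matches any single character
RULE4 = re.compile(r"[A-Z]{8}\@yahoo\.com")   # '\.' matches only a literal period

# Hypothetical Message-IDs, invented for illustration only.
lookalike = "<ABCDEFGH@yahoo-com.example>"
real = "<ABCDEFGH@yahoo.com>"

print(bool(RULE3.search(lookalike)), bool(RULE4.search(lookalike)))  # True False
print(bool(RULE3.search(real)), bool(RULE4.search(real)))            # True True
```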

Results:

Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)

OVERALL    SPAM     HAM     S/O   RANK   SCORE  NAME
 100793   82099   18694   0.815   0.00    0.00  (all messages)
   1218    1218       0   1.000   1.00    0.50  LOC_BADYAHOOMSGID3
   1218    1218       0   1.000   1.00    0.50  LOC_BADYAHOOMSGID4
   1218    1218       0   1.000   1.00    0.50  LOC_BADYAHOOMSGID2
   1647    1639       8   0.979   0.00    0.50  LOC_BADYAHOOMSGID1

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
  100793    82099    18694  0.815  0.00   0.00  (all messages)
 100.000  81.4531  18.5469  0.815  0.00   0.00  (all messages as %)
   1.208   1.4836   0.0000  1.000  1.00   0.50  LOC_BADYAHOOMSGID3
   1.208   1.4836   0.0000  1.000  1.00   0.50  LOC_BADYAHOOMSGID4
   1.208   1.4836   0.0000  1.000  1.00   0.50  LOC_BADYAHOOMSGID2
   1.634   1.9964   0.0428  0.979  0.00   0.50  LOC_BADYAHOOMSGID1
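The percentage and S/O columns can be recomputed from the raw counts. The S/O formula used here, spam% / (spam% + ham%), is inferred from the figures in the table rather than taken from SpamAssassin's source. It reproduces the 0.979 for LOC_BADYAHOOMSGID1, whereas the raw ratio 1639/1647 would give about 0.995. The "(all messages)" row appears to show the plain spam fraction, 82099/100793, instead. The RANK column is not modeled, and the function name frequencies is hypothetical.

```python
# Corpus totals copied from the frequency report above.
TOTAL_SPAM = 82099
TOTAL_HAM = 18694

def frequencies(spam_hits: int, ham_hits: int):
    """Return (overall%, spam%, ham%, S/O) for one rule's hit counts."""
    overall_pct = 100.0 * (spam_hits + ham_hits) / (TOTAL_SPAM + TOTAL_HAM)
    spam_pct = 100.0 * spam_hits / TOTAL_SPAM
    ham_pct = 100.0 * ham_hits / TOTAL_HAM
    # S/O compares hit *rates*, so the 82k-spam vs 19k-ham size imbalance
    # does not inflate it; a rule with no hits at all gets 0.0.
    rate_sum = spam_pct + ham_pct
    s_o = spam_pct / rate_sum if rate_sum else 0.0
    return overall_pct, spam_pct, ham_pct, s_o

for name, spam, ham in [("LOC_BADYAHOOMSGID2", 1218, 0),
                        ("LOC_BADYAHOOMSGID1", 1639, 8)]:
    o, s, h, so = frequencies(spam, ham)
    print(f"{o:8.3f} {s:8.4f} {h:8.4f} {so:6.3f}  {name}")
```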

My ham corpus includes lots of emails from yahoo.com webmail users, and
lots of YahooGroups email mailing lists.

Bob Menschel