Hello Charles,
Thursday, February 19, 2004, 11:32:03 AM, you wrote:
CG> Hello!
CG> I'm seeing some spam with bogus-looking 'yahoo' message-ID's.
CG> Could someone please test this rule against a nice large corpus?
I took your two suggestions,
CG> header LOC_BADYAHOOMSGID Message-ID =~ /\@yahoo.com/i
CG> header LOC_BADYAHOOMSGID Message-ID =~ /[A-Z]{8,}\@yahoo.com/
And tested the following variations:
header LOC_BADYAHOOMSGID1 Message-ID =~ /\@yahoo.com/i
describe LOC_BADYAHOOMSGID1 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID1 0.5
header LOC_BADYAHOOMSGID2 Message-ID =~ /[A-Z]{8,}\@yahoo.com/
describe LOC_BADYAHOOMSGID2 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID2 0.5
header LOC_BADYAHOOMSGID3 Message-ID =~ /[A-Z]{8}\@yahoo.com/
describe LOC_BADYAHOOMSGID3 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID3 0.5
header LOC_BADYAHOOMSGID4 Message-ID =~ /[A-Z]{8}\@yahoo\.com/
describe LOC_BADYAHOOMSGID4 From Charles Gregory <cgregory@hwcn.org>
score LOC_BADYAHOOMSGID4 0.5
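Rules 2 and 3 should be equivalent: the pattern is not anchored on the
left, so a Message-ID with more than eight capitals before \@yahoo.com
still matches {8} on the last eight of them. The "or more" comma has no
real effect here (except maybe on performance).
Rule 4 is rule 3 with the period in .com escaped, so it matches only a
literal dot.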
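A quick way to see both points outside SpamAssassin is a minimal Python sketch like the one below. The three patterns are copied from rules 2, 3 and 4 above; the sample Message-IDs and the `hits` helper are made up for illustration, and Python's `re` stands in for the Perl regex engine that SpamAssassin actually uses.

```python
import re

# Patterns copied from rules 2, 3 and 4 (case-sensitive, unanchored).
RULE2 = re.compile(r"[A-Z]{8,}\@yahoo.com")
RULE3 = re.compile(r"[A-Z]{8}\@yahoo.com")
RULE4 = re.compile(r"[A-Z]{8}\@yahoo\.com")


def hits(pattern, message_id):
    """Return True if the pattern matches anywhere in the Message-ID."""
    return pattern.search(message_id) is not None


# Hypothetical Message-IDs, not taken from any real corpus.
samples = [
    "<ABCDEFGHIJKL@yahoo.com>",  # 12 capitals: {8} matches the last 8
    "<ABCDEFGH@yahoo.com>",      # exactly 8 capitals
    "<ABCDEFG@yahoo.com>",       # only 7 capitals: no rule fires
    "<ABCDEFGH@yahooXcom>",      # no literal dot: unescaped '.' still matches
]

for mid in samples:
    print(mid, hits(RULE2, mid), hits(RULE3, mid), hits(RULE4, mid))
```

Rules 2 and 3 agree on every sample, and only the last sample separates rule 3 from rule 4.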
Results:
Section 3 -- Frequencies Log
(First numeric frequencies, followed by percentage frequencies)
 OVERALL     SPAM      HAM    S/O   RANK  SCORE  NAME
  100793    82099    18694  0.815   0.00   0.00  (all messages)
    1218     1218        0  1.000   1.00   0.50  LOC_BADYAHOOMSGID3
    1218     1218        0  1.000   1.00   0.50  LOC_BADYAHOOMSGID4
    1218     1218        0  1.000   1.00   0.50  LOC_BADYAHOOMSGID2
    1647     1639        8  0.979   0.00   0.50  LOC_BADYAHOOMSGID1

OVERALL%    SPAM%     HAM%    S/O   RANK  SCORE  NAME
  100793    82099    18694  0.815   0.00   0.00  (all messages)
 100.000  81.4531  18.5469  0.815   0.00   0.00  (all messages as %)
   1.208   1.4836   0.0000  1.000   1.00   0.50  LOC_BADYAHOOMSGID3
   1.208   1.4836   0.0000  1.000   1.00   0.50  LOC_BADYAHOOMSGID4
   1.208   1.4836   0.0000  1.000   1.00   0.50  LOC_BADYAHOOMSGID2
   1.634   1.9964   0.0428  0.979   0.00   0.50  LOC_BADYAHOOMSGID1
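The percentage rows can be rechecked from the raw counts with a small Python sketch. The corpus totals and hit counts are copied from the numeric table above. The S/O formula, spam% divided by (spam% + ham%), is an assumption: it reproduces the S/O values reported for the four rules, but it is not taken from the tool's source. The function name `frequencies` is hypothetical.

```python
# Corpus totals copied from the "(all messages)" row.
TOTAL_SPAM = 82099
TOTAL_HAM = 18694


def frequencies(spam_hits, ham_hits):
    """Return (overall%, spam%, ham%, S/O) for one rule's hit counts."""
    overall_pct = 100.0 * (spam_hits + ham_hits) / (TOTAL_SPAM + TOTAL_HAM)
    spam_pct = 100.0 * spam_hits / TOTAL_SPAM
    ham_pct = 100.0 * ham_hits / TOTAL_HAM
    # Assumption: S/O is computed from the percentages, not the raw counts.
    s_o = spam_pct / (spam_pct + ham_pct)
    return overall_pct, spam_pct, ham_pct, s_o


# (rule name, spam hits, ham hits) from the numeric table.
rules = [
    ("LOC_BADYAHOOMSGID2", 1218, 0),
    ("LOC_BADYAHOOMSGID1", 1639, 8),
]

for name, spam_hits, ham_hits in rules:
    overall_pct, spam_pct, ham_pct, s_o = frequencies(spam_hits, ham_hits)
    print(f"{overall_pct:7.3f} {spam_pct:8.4f} {ham_pct:8.4f} {s_o:6.3f}  {name}")
```

Because the ham corpus is much smaller than the spam corpus, the 8 ham hits on rule 1 weigh more in S/O than the raw ratio 1639/1647 would suggest.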
My ham corpus includes lots of email from yahoo.com webmail users, and
lots of YahooGroups mailing list traffic.
Bob Menschel