Mailing List Archive

Spam Reporting Tool
Hi,
I manage a Postfix gateway with traffic of 100,000 mails/day and am interested in integrating SA (SpamAssassin). Before doing that, I would like to know which spam report generation tool is best to use with SA.

Many thanks in advance,
Sagar Shah



Re: Spam Reporting Tool
> I am managing a postfix gateway with a traffic of 100000 mails/day and
> interested to integrate SA but before that i would like to know which is
> best spam report generation tool to be used with SA

If you mean a tool that provides statistics on SpamAssassin's effectiveness,
I'm not aware of one, especially not with the level of detail that webserver
log analyzers have, and I've looked around once or twice. I've basically
rolled my own (my needs were not that sophisticated) by using an excellent
Perl script called "hgrep" that allows you to grep mail headers by
intelligently dealing with the line breaks for you. You can grab a copy
from here:

http://www.cpan.org/authors/id/E/EL/ELIJAH/

I just run a script that uses this across the sorted piles of ham and
spam I accumulate each week and produces a summary of what I want from
them. It's trivial to get a report showing the number of times each rule
has hit on spam and on ham, so that the rule can be rescored or disabled to
improve effectiveness, or a list of particularly spammy domains that never
show up in ham, and so on.
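
For anyone without hgrep to hand, here is a minimal sketch of the same
per-rule tally in Python. It is not the script described above. It assumes
each message carries an X-Spam-Status header of the form
"Yes, hits=... required=... tests=RULE_A,RULE_B ..." and that the header is
folded only after commas; the function names rules_hit and tally are made up
for illustration.

```python
import re
from collections import Counter
from email import message_from_string


def rules_hit(raw_message):
    """Return the rule names listed in a message's X-Spam-Status header."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # Assumption: the header is folded only after commas, so dropping the
    # whitespace that follows a comma rejoins the folded tests= list.
    status = re.sub(r",\s+", ",", status)
    match = re.search(r"tests=(\S+)", status)
    if match is None or match.group(1) == "none":
        return []
    return [name for name in match.group(1).split(",") if name]


def tally(raw_messages):
    """Count, per rule, how many messages in a pile triggered it."""
    counts = Counter()
    for raw in raw_messages:
        # set() so a rule is counted at most once per message.
        counts.update(set(rules_hit(raw)))
    return counts
```

Running tally once over the spam pile and once over the ham pile gives the
two sets of counts to compare rule by rule.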

Still, I wouldn't say no to a "spamalog" or "spamalizer" if anyone knows
of such a beast...

Andy
Re: Spam Reporting Tool
Hi,

On Fri, 5 Mar 2004 14:24:24 -0000 (GMT) "Andy Blanchard" <andyb@zocalo.uk.com> wrote:

> ...I've basically
> rolled my own (my needs were not that sophisticated) by using an excellent
> Perl script called "hgrep" that allows you to grep mail headers by
> intelligently dealing with the line breaks for you. You can grab a copy
> from here:
>
> http://www.cpan.org/authors/id/E/EL/ELIJAH/

... which led me to analyze my spam corpus (the 3219 messages flagged by SA
2.6x) to get my Top 25 rule list (percent of corpus, hit count, rule name):

86.30% 2778 HTML_MESSAGE
82.82% 2666 BAYES_99
71.51% 2302 MIME_HTML_ONLY
54.61% 1758 MIME_HTML_NO_CHARSET
50.85% 1637 RCVD_IN_SORBS
37.65% 1212 BIZ_TLD
34.61% 1114 DCC_CHECK
33.55% 1080 HTML_FONT_BIG
26.53% 854 MISSING_MIMEOLE
25.82% 831 RAZOR2_CHECK
25.41% 818 FORGED_OUTLOOK_TAGS
24.79% 798 MIME_HTML_ONLY_MULTI
24.42% 786 RAZOR2_CF_RANGE_51_100
23.33% 751 RCVD_IN_NJABL
22.83% 735 HTML_FONTCOLOR_RED
22.24% 716 CLICK_BELOW
21.78% 701 HTML_IMAGE_ONLY_02
21.00% 676 RCVD_IN_DSBL
19.01% 612 HTML_FONT_INVISIBLE
18.27% 588 HTML_70_80
18.11% 583 USERPASS
17.05% 549 FORGED_OUTLOOK_HTML
16.56% 533 HTML_FONTCOLOR_UNKNOWN
15.91% 512 HTML_60_70
15.56% 501 RCVD_IN_RFCI
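
The percentages in the list above are just hit count divided by corpus size
(2778 / 3219 = 86.30%). A rough sketch of producing lines in that layout from
a dictionary of per-rule counts follows; the function name top_rules is
hypothetical, and this is not necessarily how the list was generated.

```python
def top_rules(counts, total_messages, limit=25):
    """Format the most frequent rules as 'percent hits rule' lines."""
    # Most hits first; ties are broken alphabetically by rule name.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    lines = []
    for rule, hits in ranked[:limit]:
        percent = 100.0 * hits / total_messages
        lines.append(f"{percent:.2f}% {hits} {rule}")
    return lines


# Two counts taken from the list above, out of 3219 messages.
for line in top_rules({"BAYES_99": 2666, "HTML_MESSAGE": 2778}, 3219):
    print(line)
```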

And with all the network and Bayes tests removed:

86.30% 2778 HTML_MESSAGE
71.51% 2302 MIME_HTML_ONLY
54.61% 1758 MIME_HTML_NO_CHARSET
37.65% 1212 BIZ_TLD
33.55% 1080 HTML_FONT_BIG
26.53% 854 MISSING_MIMEOLE
25.41% 818 FORGED_OUTLOOK_TAGS
24.79% 798 MIME_HTML_ONLY_MULTI
22.83% 735 HTML_FONTCOLOR_RED
22.24% 716 CLICK_BELOW
21.78% 701 HTML_IMAGE_ONLY_02
19.01% 612 HTML_FONT_INVISIBLE
18.27% 588 HTML_70_80
18.11% 583 USERPASS
17.05% 549 FORGED_OUTLOOK_HTML
16.56% 533 HTML_FONTCOLOR_UNKNOWN
15.91% 512 HTML_60_70
13.20% 425 HTML_FONTCOLOR_UNSAFE
12.30% 396 MISSING_OUTLOOK_NAME
12.24% 394 HTML_FONTCOLOR_BLUE
11.96% 385 PENIS_ENLARGE2
11.18% 360 HTML_50_60
10.84% 349 HTML_LINK_CLICK_HERE
9.94% 320 HTTP_EXCESSIVE_ESCAPES
9.85% 317 DATE_IN_FUTURE_12_24
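
Dropping the network and Bayes tests can be approximated by filtering on
rule-name prefixes. The prefixes below are an assumption, inferred only from
the rules that disappear between the two lists (BAYES_99, RCVD_IN_*,
DCC_CHECK, RAZOR2_*); they are not an official classification, and the
function name local_rules_only is made up.

```python
# Assumption: prefixes inferred from the rules that dropped out of the
# second list; not an official SpamAssassin classification.
NETWORK_OR_BAYES_PREFIXES = ("BAYES_", "RCVD_IN_", "DCC_", "RAZOR2_")


def local_rules_only(counts):
    """Return the counts with network and Bayes rules filtered out."""
    return {
        rule: hits
        for rule, hits in counts.items()
        if not rule.startswith(NETWORK_OR_BAYES_PREFIXES)
    }


counts = {"HTML_MESSAGE": 2778, "BAYES_99": 2666, "RCVD_IN_SORBS": 1637}
print(local_rules_only(counts))
```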

Note that during this period the Tripwire rules were renamed from
FVGT_TRIPWIRE_xx to TW_xx. Rule sets like Chickenpox, Backhair, Weeds,
and Tripwire should be condensed into a group.
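
One way to do that condensing is to map rule names onto a group name before
counting, so a message that trips several rules from one set counts once for
the group. The sketch below covers only the two Tripwire prefixes named
above; the group label and the function name condense are hypothetical, and
patterns for the other rule sets would have to be checked against the real
rule names.

```python
import re

# FVGT_TRIPWIRE_ and TW_ come from the note above. Patterns for other rule
# sets are deliberately left out rather than guessed.
GROUP_PATTERNS = {
    "TRIPWIRE (group)": re.compile(r"^(FVGT_TRIPWIRE_|TW_)"),
}


def condense(rules, patterns=GROUP_PATTERNS):
    """Replace rule names matching a group pattern with the group's name."""
    condensed = set()
    for rule in rules:
        for group, pattern in patterns.items():
            if pattern.match(rule):
                condensed.add(group)
                break
        else:
            # No group matched, so keep the rule under its own name.
            condensed.add(rule)
    return condensed
```

Applying condense to each message's rule list before tallying keeps both the
old and new Tripwire names under a single entry.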

One should analyze ham as well to see which tests it triggers; you
might as well run a mass-check if you want good, detailed statistics.
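
Short of a full mass-check, a crude comparison of spam and ham hit rates per
rule might look like the sketch below. The ratio spam% / (spam% + ham%) is
just one simple measure chosen for illustration, and the function name
compare is made up; this does not reproduce mass-check's own output.

```python
def compare(spam_counts, ham_counts, spam_total, ham_total):
    """Return (rule, spam %, ham %, ratio) rows, worst ratio first."""
    rows = []
    for rule in set(spam_counts) | set(ham_counts):
        spam_pct = 100.0 * spam_counts.get(rule, 0) / spam_total
        ham_pct = 100.0 * ham_counts.get(rule, 0) / ham_total
        combined = spam_pct + ham_pct
        # A ratio near 1.0 means the rule hits almost only spam.
        ratio = spam_pct / combined if combined else 0.0
        rows.append((rule, spam_pct, ham_pct, ratio))
    # Rules that hit ham the most, relatively, come first.
    rows.sort(key=lambda row: (row[3], row[0]))
    return rows
```

Rules near the top of the result are the ones to consider rescoring or
disabling, since they fire on ham relatively often.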

fyi,

-- Bob