Matt Kettler wrote to Mike Samba and spamassassin-users@incubator.apache.org:
> FWIW I use a combination of two sources for HAM training:
>
> 1) some selected chunks of my own email (i.e., mailing lists not
> involving SA, personal email, etc.)
>
> 2) I set up a "nonspamtrap" account, and I've subscribed this to a few
> of the newsletters my users commonly subscribe to.

Good sources. We provide "spam" and "nonspam" accounts for our more
proactive clients to forward spam and ham, particularly messages that
were incorrectly classified. As long as they're instructed to forward
such messages as attachments, the messages (attachments) come through
unmolested.
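
For illustration, here is a minimal sketch of the receiving end. It assumes the reports arrive as MIME messages with each original attached as a message/rfc822 part, and it uses only Python's standard `email` package. The function name `extract_forwarded` is hypothetical and is not part of SpamAssassin.

```python
import email
from email.message import Message


def extract_forwarded(raw: bytes) -> list[Message]:
    """Return the messages attached as message/rfc822 parts of a report.

    Only top-level attachments are returned, so an original that itself
    carried an attached message is not unpacked a second time.
    """
    outer = email.message_from_bytes(raw)
    if not outer.is_multipart():
        # Forwarded inline: the original headers are gone, nothing to extract.
        return []
    attached = []
    for part in outer.get_payload():
        if part.get_content_type() == "message/rfc822":
            # With the default (compat32) policy, the payload of a
            # message/rfc822 part is a one-element list holding the Message.
            attached.append(part.get_payload(0))
    return attached
```

Each extracted message's `as_bytes()` could then be written to a file and fed to `sa-learn --spam` or `sa-learn --ham`, depending on which account received the report.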

I'm fortunate enough to personally own a domain that is now very close
in spelling (same name, different TLD) to a domain used by a large ISP
in our region. After seeing the postmaster logs on our email server, I
set up an account to catch all of the incoming email on my domain. There
are enough mistyped addresses that I get several hundred messages per
day for different recipients, including ham, spam, and viruses. It's the
closest thing to broadly varied user email that we can get without
violating our own privacy policy.

I have a staff member (otherwise known as our Resident SpamQueen) go
through that, as well as our shared mailboxes (sales, support, etc.),
and train the filter. She has no problem finding 1000+ SPAM and HAM
messages weekly. It's done wonders for our filtering.

If we didn't have such a good source of email, I guess I'd ask a small
percentage of our customers to *voluntarily* allow us to use their
accounts to train the filter. At that point we could just have the
server FCC (file a carbon copy of) all of their messages to another
shared mailbox on our system for our bodacious SpamQueen to traverse.
That's trivial to implement on most systems.

Yes, filtering can be configured on a per-user basis, but we chose to
keep things as simple as possible for our clients (and for us) and go
site-wide. The filtering may not be quite as precise, but at least *we*
control the QoS, and we err on the side of caution.

It's worked remarkably well. We've been sustaining about 95% correct
filtering, with no false positives. Server-wide, our HAM:SPAM ratio is
about 1.5:1. With many personal accounts, though, it's more like 1:15
(90-95% SPAM), after viruses are taken out of the equation (but that's
another tangent). We'd be sunk without SpamAssassin.
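
As a quick sanity check on those figures: a HAM:SPAM ratio of h:s means spam makes up s/(h+s) of the (virus-free) mail. The small Python helper below, `spam_fraction`, is hypothetical and only does that arithmetic. It shows that 1:15 lands inside the quoted 90-95% band and that the server-wide 1.5:1 works out to about 40% spam.

```python
def spam_fraction(ham: float, spam: float) -> float:
    """Share of (virus-free) mail that is spam, given a HAM:SPAM ratio."""
    if ham < 0 or spam < 0 or ham + spam == 0:
        raise ValueError("ratio parts must be non-negative and not both zero")
    return spam / (ham + spam)


if __name__ == "__main__":
    # Server-wide ratio, HAM:SPAM of about 1.5:1
    print(f"server-wide: {spam_fraction(1.5, 1):.0%} spam")  # 40% spam
    # Typical personal account, HAM:SPAM of about 1:15
    print(f"personal:    {spam_fraction(1, 15):.2%} spam")  # 93.75% spam
```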
- Ryan
--
Ryan Thompson <ryan@sasknow.com>
SaskNow Technologies - http://www.sasknow.com
901-1st Avenue North - Saskatoon, SK - S7K 1Y4
Tel: 306-664-3600 Fax: 306-244-7037 (Saskatoon)
Toll-Free: 877-727-5669 (877-SASKNOW) (North America)