How to optimize SA?

I've been running SA on a Cobalt Qube since 2.42 was current, and it's
been a wonderful solution. Recently I moved to running spamd -d
because of the increased load the increase in spam was putting on the
server. Now that's not cutting it: they're hitting me so hard I'm
seeing processor loads in the 30s, and my poor little Qube is buckling
under the load. I can't ssh in, email exchanges time out, etc.

What else can I do to make SA run as quickly as possible, besides the
obvious "buy a bigger server"?

Thanks,

Steve.
Re: How to optimize SA?
> I've been running SA on a Cobalt Qube since 2.42 was current, and it's
> been a wonderful solution. Recently I moved to running spamd -d
> because of the increased load the increase in spam was putting on the
> server. Now that's not cutting it: they're hitting me so hard I'm
> seeing processor loads in the 30s, and my poor little Qube is buckling
> under the load. I can't ssh in, email exchanges time out, etc.
>
> What else can I do to make SA run as quickly as possible, besides the
> obvious "buy a bigger server"?

I don't think tuning is going to get your loads down much from the 30s.
Assuming a best-case optimization gives you a 10% to 15% improvement, your
box will still be overloaded. How many emails per hour/day are you receiving
and over what link speed?
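To put rough numbers on that, here is a small Python sketch. It assumes the load average scales roughly linearly with SA's CPU work, which is only an approximation:

```python
# Back-of-the-envelope check of the "10% to 15% improvement" estimate.
# Assumption: the load average scales roughly linearly with SA's CPU work.

def load_after_tuning(load: float, improvement: float) -> float:
    """Expected load after cutting the work by `improvement` (a fraction, 0..1)."""
    return load * (1.0 - improvement)

for load in (30.0, 39.0):  # "loads in the 30s"
    for improvement in (0.10, 0.15):
        print(f"load {load:.0f}, {improvement:.0%} saving -> "
              f"{load_after_tuning(load, improvement):.1f}")
```

Even the best case leaves the load in the mid-20s, still far beyond what the box can keep up with.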
Re: How to optimize SA?
At 04:41 PM 3/9/2004, Steve Yuroff wrote:
>I've been running SA on a Cobalt Qube since 2.42 was current, and it's
>been a wonderful solution. Recently I moved to running spamd -d
>because of the increased load the increase in spam was putting on the
>server. Now that's not cutting it: they're hitting me so hard I'm seeing
>processor loads in the 30s, and my poor little Qube is buckling under the
>load. I can't ssh in, email exchanges time out, etc.
>
>What else can I do to make SA run as quickly as possible, besides the
>obvious "buy a bigger server"?


First, you said you're running spamd -d. Did you replace calls to
spamassassin with spamc? Otherwise, using spamd doesn't help at all.

You can limit the load by using the -m parameter to spamd, forcing it to
limit the number of children, but then you need to use -f on spamc (fall
back to passing the message through if spamd can't be reached), which
results in some unfiltered mail.
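A simplified model of that trade-off, in Python. It assumes that a message arriving while all -m children are busy goes straight through unfiltered. Real spamd queues connections for a while before spamc gives up, so this overstates the fallback rate and only shows the shape of the trade-off. The arrival pattern and the 8 s scan time are made up:

```python
# Toy model of "spamd -m N" plus "spamc -f": at most N children scan at once;
# a message arriving while all N are busy is delivered unfiltered (assumption).
import heapq

def simulate_pool(arrivals, scan_seconds, max_children):
    """Return (filtered, unfiltered) counts for sorted arrival times in seconds."""
    busy_until = []  # min-heap of times at which busy children become free
    filtered = unfiltered = 0
    for t in arrivals:
        # Free every child whose scan has finished by time t.
        while busy_until and busy_until[0] <= t:
            heapq.heappop(busy_until)
        if len(busy_until) < max_children:
            heapq.heappush(busy_until, t + scan_seconds)
            filtered += 1
        else:
            unfiltered += 1  # spamc -f passes the message through unchanged
    return filtered, unfiltered

# A burst of 20 messages one second apart, each taking 8 s to scan.
burst = [float(i) for i in range(20)]
for m in (2, 5, 10):
    print(f"-m {m:2d}: (filtered, unfiltered) = {simulate_pool(burst, 8.0, m)}")
```

A small -m keeps the load bounded, but during a burst a large share of the mail skips filtering entirely.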

You can also gain speed, at a sacrifice of accuracy, by disabling network
checks (Razor, DNSBLs, etc.), Bayes, and the AWL. Removing some add-on
rulesets also helps. A good first check to find any culprits is to run a
message through spamassassin -D and see if it gets "hung up" anywhere
along the way. If you're not on a current version (2.6x), you could be
suffering from a dead DNSBL.
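Spotting the "hang-up" by eye can be tedious. A minimal Python sketch that timestamps each debug line as it arrives and lists the longest pauses between consecutive lines (the script name used below is hypothetical):

```python
# Find where "spamassassin -D" stalls: timestamp each debug line as it arrives
# on stdin and report the longest pauses between consecutive lines.
import sys
import time

def largest_gaps(timed_lines, top=5):
    """timed_lines: iterable of (seconds, text). Returns [(gap, before, after)]."""
    gaps = []
    prev_t = prev_text = None
    for t, text in timed_lines:
        if prev_t is not None:
            gaps.append((t - prev_t, prev_text, text))
        prev_t, prev_text = t, text
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]

def timestamped(stream):
    """Pair each incoming line with the time it was read."""
    for line in stream:
        yield time.monotonic(), line.rstrip("\n")

if __name__ == "__main__":
    for gap, before, after in largest_gaps(timestamped(sys.stdin)):
        print(f"{gap:7.2f}s between:\n  {before}\n  {after}")
```

The debug output goes to stderr, so it has to be redirected into the pipe, e.g. `spamassassin -D < message.txt 2>&1 >/dev/null | python3 sa_gaps.py`. The two lines around each long gap bracket the step that stalled.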

Another way of dealing with it all is to limit the number of emails fed
to SA in the first place. This usually requires an integration tool that
does queuing, like MailScanner. Of course, with this kind of system you
wind up trading CPU load for increased queue depth and delayed mail
delivery. However, if your mail load is "bursty" rather than continuous
over the whole day, this will handle the spikes pretty nicely.
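A toy Python simulation of that trade-off. Messages wait in a FIFO queue and at most a fixed number are scanned per minute, so CPU use stays capped while queue depth and delivery delay absorb the burst. The traffic numbers and the 10 messages/minute rate are invented for illustration and are not MailScanner's actual behaviour:

```python
# FIFO scan queue: at most `capacity` messages are scanned per minute.
from collections import deque

def run_queue(arrivals_per_minute, capacity):
    """Return (max_queue_depth, max_delay_minutes) for a FIFO scan queue."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1 message per minute")
    queue = deque()  # holds the arrival minute of each waiting message
    max_depth = max_delay = 0
    minute = 0
    while minute < len(arrivals_per_minute) or queue:
        if minute < len(arrivals_per_minute):
            queue.extend([minute] * arrivals_per_minute[minute])
        max_depth = max(max_depth, len(queue))
        for _ in range(min(capacity, len(queue))):
            arrived = queue.popleft()
            max_delay = max(max_delay, minute - arrived)
        minute += 1
    return max_depth, max_delay

# One hour of light traffic with a 10-minute burst of 30 messages/minute,
# scanned at a steady 10 messages/minute.
traffic = [1] * 25 + [30] * 10 + [1] * 25
print(run_queue(traffic, capacity=10))
```

The load never exceeds the chosen scan rate. The price is a few hundred queued messages and roughly 20 minutes of extra delay for the tail of the burst.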
Re: How to optimize SA?
Steve Yuroff wrote:

> I've been running SA on a Cobalt Qube since 2.42 was current, and it's
> been a wonderful solution. Recently I moved to running spamd -d
> because of the increased load the increase in spam was putting on the
> server. Now that's not cutting it: they're hitting me so hard I'm
> seeing processor loads in the 30s, and my poor little Qube is buckling
> under the load. I can't ssh in, email exchanges time out, etc.
>
> What else can I do to make SA run as quickly as possible, besides the
> obvious "buy a bigger server"?
>
> Thanks,
>

Hi Steve,
What model Qube, how many users, what version of SA?
I'm running a Q3 w/512M and 56 users, all with individual Bayes and
user_prefs, and my load is 0.1. Sometimes it gets to 1.5 if I'm under a
spamstorm, but usually it isn't even breathing hard. We're averaging 3200
messages a day, with spam accounting for 66% of the load.
Scott

--
Scott V. Blomquist,A-SA-CN-NRK TINLC(tm) #2598
ITI/Bear&Co Rochester, VT
802-767-3174(v) 802-767-3726(f)
"Any technology sufficiently advanced is indistinguishable from Magic."
A. C. Clarke
Re: How to optimize SA?
On Mar 9, 2004, at 3:46 PM, Jason Borkowsky wrote:

>
>> I've been running SA on a Cobalt Qube since 2.42 was current, and it's
>> been a wonderful solution. Recently I moved to running spamd -d
>> because of the increased load the increase in spam was putting on the
>> server. Now that's not cutting it: they're hitting me so hard I'm
>> seeing processor loads in the 30s, and my poor little Qube is buckling
>> under the load. I can't ssh in, email exchanges time out, etc.
>>
>> What else can I do to make SA run as quickly as possible, besides the
>> obvious "buy a bigger server"?
>
> I don't think tuning is going to get your loads down much from the 30s.
> Assuming a best-case optimization gives you a 10% to 15% improvement, your
> box will still be overloaded. How many emails per hour/day are you receiving
> and over what link speed?
>
>
Last 24 hr period (4A-4A) was 1350 messages, over a 768K DSL line.
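For scale, here are the daily volumes from this thread converted to average rates. This is a rough Python sketch that assumes mail arrives evenly over 24 hours, so real peaks will be higher:

```python
# Average arrival rates for the daily volumes quoted in this thread.
# Assumption: mail is spread evenly over 24 hours (real traffic is burstier).
def per_hour(messages_per_day: float) -> float:
    return messages_per_day / 24.0

def per_minute(messages_per_day: float) -> float:
    return messages_per_day / (24.0 * 60.0)

volumes = {"Steve, Qube (1350/day)": 1350, "Scott, Q3 (3200/day)": 3200}
for site, per_day in volumes.items():
    print(f"{site}: {per_hour(per_day):.1f}/hour, {per_minute(per_day):.2f}/minute")
```

About one message a minute is a light average load. That hints the problem lies in per-message cost or bursts rather than raw volume.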
Re: How to optimize SA?
On Tue, Mar 09, 2004 at 03:41:48PM -0600, Steve Yuroff wrote:
> I've been running SA on a Cobalt Qube since 2.42 was current, and it's
> been a wonderful solution. Recently I moved to running spamd -d
> because of the increased load the increase in spam was putting on the
> server. Now that's not cutting it: they're hitting me so hard I'm
> seeing processor loads in the 30s, and my poor little Qube is buckling
> under the load. I can't ssh in, email exchanges time out, etc.

First off, disable the network checks and move any RBL checks you relied on
SA for out onto your Internet-frontend SMTP server. That way you lose no
functionality, but remove a bunch of "hangs" from SA - which lead to
loading issues.

The cost of the DNS checks shouldn't be underestimated. We run Qmail-Scanner
on all our e-mails, and it's a Perl script. It typically processes an
e-mail in <0.3 secs, but calling spamc adds 2-8 secs to it due to all the
RBL lookups going on.
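One rough way to see what those per-message seconds mean for concurrency is Little's law: the average number of messages in progress equals the arrival rate times the time per message. The sketch below plugs in the 0.3 s and 2-8 s figures above and, as an example volume, the 1350 messages/day Steve reports elsewhere in this thread, assumed to be spread evenly over the day:

```python
# Little's law: average messages in progress L = arrival rate * time per message.
def in_progress(messages_per_day: float, seconds_per_message: float) -> float:
    rate_per_second = messages_per_day / 86400.0
    return rate_per_second * seconds_per_message

for seconds in (0.3, 2.0, 8.0):
    print(f"{seconds:>4}s per message -> "
          f"{in_progress(1350, seconds):.3f} scans in progress on average")
```

At that average volume, even 8 s per message means well under one scan in flight. Load spikes would therefore have to come from bursts or from individual messages that take far longer. Slow lookups still tie up a spamd child for the whole wait, though, which matters once -m caps the pool.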

[I prefer running them via SA and will do so until the load gets too much;
then I will take my own advice and move the RBL checks back out to where
they were designed to run ;-)]

--
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1
Re: How to optimize SA?
At 01:56 PM 3/9/2004, Matt Kettler wrote:
>You can also gain speed, at a sacrifice of accuracy, by disabling network
>checks (razor, dnsbls, etc), bayes, and the AWL.

Another thing you can do, if you want to keep some network tests active, is
set low timeout values for the tests you do use.
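The idea behind a low timeout is sketched below in Python. Each test gets its own deadline, and a timeout counts as "no result" instead of holding up the scan. The checks here are simulated with sleeps and the latencies are invented. Inside SpamAssassin this is set with configuration options rather than code (for example `rbl_timeout` and `razor_timeout`); check the Mail::SpamAssassin::Conf documentation for the names and defaults in your version.

```python
# Give each network test a short deadline; a timeout yields None ("no result").
import asyncio

async def fake_check(latency: float, hit: bool) -> bool:
    """Stand-in for one network test; sleeps `latency` seconds, then answers."""
    await asyncio.sleep(latency)
    return hit

async def check_with_timeout(coro, timeout: float):
    """Return the check's answer, or None if it doesn't finish within `timeout`."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return None

async def main():
    fast = await check_with_timeout(fake_check(0.05, True), timeout=0.5)
    slow = await check_with_timeout(fake_check(2.0, True), timeout=0.5)
    return fast, slow

print(asyncio.run(main()))
```

The slow check costs at most 0.5 s instead of 2 s, at the price of losing whatever it would have contributed.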

Kelson Vibber
SpeedGate Communications <www.speed.net>
Re: How to optimize SA?
Jason Haar writes:
>On Tue, Mar 09, 2004 at 03:41:48PM -0600, Steve Yuroff wrote:
>> I've been running SA on a Cobalt Qube since 2.42 was current, and it's
>> been a wonderful solution. Recently I moved to running spamd -d
>> because of the increased load the increase in spam was putting on the
>> server. Now that's not cutting it: they're hitting me so hard I'm
>> seeing processor loads in the 30s, and my poor little Qube is buckling
>> under the load. I can't ssh in, email exchanges time out, etc.
>
>First off, disable the network checks and move any RBL checks you relied on
>SA for out onto your Internet-frontend SMTP server. That way you lose no
>functionality, but remove a bunch of "hangs" from SA - which lead to
>loading issues.
>
>The cost of the DNS checks shouldn't be underestimated. We run Qmail-Scanner
>on all our e-mails, and it's a Perl script. It typically processes an
>e-mail in <0.3 secs, but calling spamc adds 2-8 secs to it due to all the
>RBL lookups going on.
>
>[I prefer running them via SA and will do so until the load gets too much;
>then I will take my own advice and move the RBL checks back out to where
>they were designed to run ;-)]

FWIW, though, SpamAssassin 2.6x has a very sophisticated DNSBL lookup
algorithm which (a) runs all queries in parallel and (b) aborts ones
that are taking significantly longer than the others, reducing problems
if one or two hosts go down. So nowadays this should be *more* efficient
run from SpamAssassin than from the MTA.
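For readers unfamiliar with the approach, here is an illustrative Python sketch of "query in parallel, abort the stragglers". It is not SpamAssassin's actual code or tuning. All lookups start at once; when most have answered, the rest get a short grace period and are then abandoned. The zone names, latencies, 75% quorum, and 0.2 s grace are all made up:

```python
# Start every lookup at once; once a quorum has answered, give stragglers a
# short grace period and then abandon them. Lookups are simulated with sleeps.
import asyncio

async def fake_lookup(zone: str, latency: float) -> str:
    """Stand-in for one DNSBL query; just sleeps for `latency` seconds."""
    await asyncio.sleep(latency)
    return zone

async def parallel_lookups(latencies, quorum=0.75, grace=0.2):
    """Return (answered_zones, abandoned_zones), both sorted."""
    tasks = {asyncio.create_task(fake_lookup(z, s)): z for z, s in latencies.items()}
    pending = set(tasks)
    answered = []
    needed = max(1, int(len(tasks) * quorum))
    # Phase 1: every query is already in flight; wait for the first `needed`.
    while pending and len(answered) < needed:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        answered.extend(t.result() for t in done)
    # Phase 2: stragglers get a short grace period, then are abandoned.
    if pending:
        done, pending = await asyncio.wait(pending, timeout=grace)
        answered.extend(t.result() for t in done)
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return sorted(answered), sorted(tasks[t] for t in pending)

zones = {"bl-a.example": 0.05, "bl-b.example": 0.1,
         "bl-c.example": 0.1, "bl-d.example": 3.0}
print(asyncio.run(parallel_lookups(zones)))
```

The total wait is about 0.3 s instead of the 3 s the dead list would have cost if each query ran one after another with no cutoff.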

--j.
Re: How to optimize SA?
On Tue, Mar 09, 2004 at 02:51:54PM -0800, Justin Mason wrote:
> FWIW, though, SpamAssassin 2.6x has a very sophisticated DNSBL lookup
> algorithm which (a) runs all queries in parallel and (b) aborts ones
> that are taking significantly longer than the others, reducing problems
> if one or two hosts go down. So nowadays this should be *more* efficient
> run from SpamAssassin than from the MTA.

I don't doubt it. But that means spamc hangs around for up to 10 seconds
awaiting a response on a system that runs "spamd -m XX" - i.e. XX max
processes. If you ran RBL checks out on the SMTP server, then you may be
inefficient, but at least that 25K SMTP process is dealing with it rather
than a 28M spamd process...
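The point about spamd -m can be made concrete. A pool of N children, each tied up for W seconds per message (whether computing or waiting on DNS), can finish at most N/W messages per second. A quick Python sketch with a few hypothetical -m values, using the 10 s wait above and a shorter 3 s case for comparison:

```python
# Upper bound on throughput for a fixed pool of spamd children (spamd -m N)
# when each child is occupied for `seconds_per_message`, busy or just waiting.
def max_messages_per_hour(children: int, seconds_per_message: float) -> float:
    return children * 3600.0 / seconds_per_message

for children in (2, 5, 10):
    for seconds in (3.0, 10.0):
        print(f"-m {children:2d}, {seconds:4.1f}s each -> "
              f"{max_messages_per_hour(children, seconds):6.0f} messages/hour max")
```

Time spent waiting on a slow list reduces pool capacity just as much as CPU time does.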

--
Cheers

Jason Haar
Re: How to optimize SA?
On Mar 9, 2004, at 3:57 PM, Scott Blomquist wrote:

>
>
> Steve Yuroff wrote:
>
>> I've been running SA on a Cobalt Qube since 2.42 was current, and
>> it's been a wonderful solution. Recently I moved to running spamd -d
>> because of the increased load the increase in spam was putting on
>> the server. Now that's not cutting it: they're hitting me so hard
>> I'm seeing processor loads in the 30s, and my poor little Qube is
>> buckling under the load. I can't ssh in, email exchanges time out,
>> etc.
>> What else can I do to make SA run as quickly as possible, besides the
>> obvious "buy a bigger server"?
>> Thanks,
>
> Hi Steve,
> What model Qube, how many users, what version of SA?
> I'm running a Q3 w/512M and 56 users, all with individual Bayes and
> user_prefs, and my load is 0.1. Sometimes it gets to 1.5 if I'm under a
> spamstorm, but usually it isn't even breathing hard. We're averaging
> 3200 messages a day, with spam accounting for 66% of the load.
> Scott
>

OK, I must be doing something wrong! This is a Qube 3 Pro, 384M, 40
users, shared Bayes. SA 2.63. BigEvil, BlackHair, evilnumbers,
popcorn all included. I see under 2K messages/day, about 70-80% spam.

I need to run the test message through spamassassin -D and see where it
hangs, but I can't remember where the default message lives. Anyone
want to jog my memory?
Re: How to optimize SA?
Justin Mason wrote:

>> FWIW, though, SpamAssassin 2.6x has a very sophisticated DNSBL lookup
>> algorithm which (a) runs all queries in parallel and (b) aborts ones
>> that are taking significantly longer than the others, reducing problems
>> if one or two hosts go down. So nowadays this should be *more* efficient
>> run from SpamAssassin than from the MTA.

Jason Haar <Jason.Haar@trimble.co.nz> writes:

> I don't doubt it. But that means spamc hangs around for up to 10 seconds
> awaiting a response on a system that runs "spamd -m XX" - i.e. XX max
> processes.

It's more like 3 seconds when some DNSBL are timing out, not 10.

Sometimes the MX checking tests can be slow, but you can turn those off
independently of the DNSBL (RBL) tests.

> If you ran RBL checks out on the SMTP server, then you may be
> inefficient, but at least that 25K SMTP process is dealing with it
> rather than a 28M spamd process...

You realize that fork() in Linux (and BSD too, I'm sure) doesn't copy
every page? 28M virtual size != the actual additional memory used by a
spamd process.

Never mind the fact that most DNSBLs aren't accurate enough to use for
outright rejects/blocks. SpamAssassin is much more accurate than MTA blacklists.

Daniel

--
Daniel Quinlan anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/ and open source consulting
Re: How to optimize SA?
Steve Yuroff <syuroff@hiebing.com> writes:

> OK, I must be doing something wrong! This is a Qube 3 Pro, 384M, 40
> users, shared Bayes.

You could be having some lock contention issues with a read-write shared
Bayes, but I doubt that too.

> SA 2.63. BigEvil, BlackHair, evilnumbers, popcorn all included. I
> see under 2K messages/day, about 70-80% spam.

It might be one of the rule sets you've added that is making it
slow(er).

> I need to run the test message through spamassassin -D and see where
> it hangs, but I can't remember where the default message lives.
> Anyone want to jog my memory?

Depends on how it was installed. It's better to use a few random emails
received from the outside world and sent to your site, save them in Unix
mbox format, and then test.
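That can be scripted with Python's standard `mailbox` module. A minimal sketch, assuming the sample messages have already been saved one per file (the script name and file paths are placeholders):

```python
# Collect a few real messages (saved one per file) into a single Unix mbox for
# repeatable testing, and read them back one at a time.
import mailbox
import pathlib
import sys

def build_test_mbox(raw_messages, mbox_path):
    """Append each raw message (bytes) to an mbox file, creating it if needed."""
    box = mailbox.mbox(mbox_path, create=True)
    box.lock()
    try:
        for raw in raw_messages:
            box.add(raw)
        box.flush()
    finally:
        box.unlock()
        box.close()

def message_bytes(mbox_path):
    """Yield each stored message as bytes, without the mbox "From " separator line."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            yield box.get_bytes(key)
    finally:
        box.close()

if __name__ == "__main__":
    # Usage: python3 make_mbox.py out.mbox saved1.eml saved2.eml ...
    out, *inputs = sys.argv[1:]
    build_test_mbox((pathlib.Path(p).read_bytes() for p in inputs), out)
    print(sum(1 for _ in message_bytes(out)), "messages in", out)
```

Each message yielded by `message_bytes()` can then be piped on stdin into `spamassassin -D`, one at a time, so every test run uses the same corpus.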

Daniel

Re: How to optimize SA?
Okay, hiebing.com rejected my mail. Given how much *crap* has been
added to their SpamAssassin and MTA configuration, it's no wonder it
ain't so fast.

------- start of cut text --------------
From: Mail Delivery System <Mailer-Daemon@pathname.com>
To: quinlan@pathname.com
Subject: Mail delivery failed: returning message to sender
Date: Tue, 09 Mar 2004 17:46:49 -0800

This message was created automatically by mail delivery software (Exim).

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

syuroff@hiebing.com
SMTP error from remote mailer after MAIL FROM:<quinlan@pathname.com>:
host mail.hiebing.com [64.73.14.137]: 550 5.0.0 UCE Refused by www.spamlist.org
------- end ----------------------------

Daniel

Re: How to optimize SA?
On Mar 9, 2004, at 7:55 PM, Daniel Quinlan wrote:

> Okay, hiebing.com rejected my mail. Given how much *crap* has been
> added to their SpamAssassin and MTA configuration, it's no wonder it
> ain't so fast.
>
> ------- start of cut text --------------
> From: Mail Delivery System <Mailer-Daemon@pathname.com>
> To: quinlan@pathname.com
> Subject: Mail delivery failed: returning message to sender
> Date: Tue, 09 Mar 2004 17:46:49 -0800
>
> This message was created automatically by mail delivery software (Exim).
>
> A message that you sent could not be delivered to one or more of its
> recipients. This is a permanent error. The following address(es) failed:
>
> syuroff@hiebing.com
> SMTP error from remote mailer after MAIL FROM:<quinlan@pathname.com>:
> host mail.hiebing.com [64.73.14.137]: 550 5.0.0 UCE Refused by www.spamlist.org
> ------- end ----------------------------
>
> Daniel
>
That's the first report I've had of the spamlist.org database bouncing
mail from a legit source.

Are you saying that spamlist.org and the add-in rulesets I'm using are
crap? They seemed to be useful, legit tools to me, but I'm here to
learn. If it's to be avoided, I'm all for it.

My understanding was that the spamlist.org list would cut my processing
load at the MTA level... isn't it easier to not accept the mail, than
to accept it and then process it for ham/spam?

Steve.
RE: How to optimize SA?
> -----Original Message-----
> From: Daniel Quinlan [mailto:quinlan@pathname.com]
> Sent: Tuesday, March 09, 2004 8:47 PM
> To: Steve Yuroff
> Cc: Scott Blomquist; spamassassin-users@incubator.apache.org
> Subject: Re: How to optimize SA?
>
>
> Steve Yuroff <syuroff@hiebing.com> writes:
>
> > OK, I must be doing something wrong! This is a Qube 3 Pro, 384M, 40
> > users, shared Bayes.
>
> You could be having some lock contention issues with a read-write shared
> Bayes, but I doubt that too.
>
> > SA 2.63. BigEvil, BlackHair, evilnumbers, popcorn all included. I
> > see under 2K messages/day, about 70-80% spam.
>
> It might be one of the rule sets you've added that is making it
> slow(er).
>

Hey now! :-) Popcorn isn't used anymore. Blackhair now includes what popcorn
used to do. So you can remove popcorn. Bigevil and Evilnumbers are run on
servers with MUCH more traffic than you get, and they run fine.

*snip*
Re: How to optimize SA?
On Mar 9, 2004, at 7:46 PM, Daniel Quinlan wrote:

> Steve Yuroff <syuroff@hiebing.com> writes:
>
>> OK, I must be doing something wrong! This is a Qube 3 Pro, 384M, 40
>> users, shared Bayes.
>
> You could be having some lock contention issues with a read-write shared
> Bayes, but I doubt that too.

Would lock contention issues look like this?

debug: bayes: 8340 tie-ing to DB file R/O /home/users/sharedspam/.spamassassin/bayes_toks
debug: bayes: 8340 tie-ing to DB file R/O /home/users/sharedspam/.spamassassin/bayes_seen

I've been doing some debug runs on the included sample-spam.txt, and I
see the biggest pause happens right before that line. Maybe a shared
Bayes database for my whole company isn't going to work anymore?

Thanks all for your time helping me out. I greatly appreciate it.

Steve.
RE: How to optimize SA?
Chris Santerre <csanterre@MerchantsOverseas.com> writes:

> Hey now! :-) Popcorn isn't used anymore. Blackhair now includes what popcorn
> used to do. So you can remove popcorn. Bigevil and Evilnumbers are run on
> servers with MUCH more traffic than you get, and they run fine.

When someone is having performance problems, asks how to optimize SA,
and it later turns out that they have installed a lot of extra rule sets
*and* their mail server bounces my email response due to another thing
being installed that wasn't mentioned, I reserve the right to be a tad
bit suspicious that other things have been installed or that something
has been changed affecting performance badly.

Debugging performance issues starting with a 100% standard baseline is a
fairly common-sense way to approach problems like these.

Daniel

RE: How to optimize SA?
> -----Original Message-----
> From: Daniel Quinlan [mailto:quinlan@pathname.com]
> Sent: Wednesday, March 10, 2004 2:41 PM
> To: spamassassin-users@incubator.apache.org
> Cc: Chris Santerre
> Subject: RE: How to optimize SA?
>
>
> Chris Santerre <csanterre@MerchantsOverseas.com> writes:
>
> > Hey now! :-) Popcorn isn't used anymore. Blackhair now includes what popcorn
> > used to do. So you can remove popcorn. Bigevil and Evilnumbers are run on
> > servers with MUCH more traffic than you get, and they run fine.
>
> When someone is having performance problems, asks how to optimize SA,
> and it later turns out that they have installed a lot of extra rule sets
> *and* their mail server bounces my email response due to another thing
> being installed that wasn't mentioned, I reserve the right to be a tad
> bit suspicious that other things have been installed or that something
> has been changed affecting performance badly.
>
> Debugging performance issues starting with a 100% standard baseline is a
> fairly common-sense way to approach problems like these.
>
> Daniel

I agree.

-C
Re: How to optimize SA?
Steve Yuroff <syuroff@hiebing.com> writes:

> Would lock contention issues look like this?
>
> debug: bayes: 8340 tie-ing to DB file R/O /home/users/sharedspam/.spamassassin/bayes_toks
> debug: bayes: 8340 tie-ing to DB file R/O /home/users/sharedspam/.spamassassin/bayes_seen

No, that's just R/O locking.

> I've been doing some debug runs on the included sample-spam.txt, and I
> see the biggest pause happens right before that line. Maybe a shared
> Bayes database for my whole company isn't going to work anymore?

I believe individual bayes databases are probably faster, but other
people seem to do quite okay with shared bayes databases.

Daniel

Re: How to optimize SA?
Steve Yuroff <syuroff@hiebing.com> writes:

> That's the first report I've had of the spamlist.org database bouncing
> mail from a legit source.

Well, they blocked my entire /24 network at PacBell. These addresses
are quite static and I'm not aware of any spammers being on it.

> Are you saying that spamlist.org and the add-in rulesets I'm using are
> crap? They seemed to be useful, legit tools to me, but I'm here to
> learn. If it's to be avoided, I'm all for it.

I'm saying if you have a performance problem, start with the default
installation and see if it has a performance problem before you start
trying to optimize things.

> My understanding was that the spamlist.org list would cut my processing
> load at the MTA level... isn't it easier to not accept the mail, than
> to accept it and then process it for ham/spam?

Yes and no.

First, it's very easy to reject messages. You can get no spam (and not
necessarily many complaints since nobody will be able to mail you) if
you bounce most everything.

Second, most MTAs probably don't do as good a job handling DNS
blacklists that are down, so it could actually be slower. It's
definitely not as accurate, as I discovered.

Let me be blunt: when I reply to a question on a mailing list and I get
a bounce, I'm not going to be very happy about it and I'm going to call
your configuration some well-deserved bad names. I don't know much
about spamlist.org, but I'm not impressed so far. My IP isn't listed on
a single spam blacklist listed at <http://www.moensted.dk/spam/> or
<http://www.declude.com/Junkmail/support/ip4r.htm>. (I have seen it
listed on one PacBell blacklist that lists every PacBell IP.)

Fortunately, the SpamAssassin developers have been testing DNS
blacklists for years and include the very best ones in SA, scoring them
with a score optimizer to reduce false positives.
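The difference between the two approaches can be sketched in a few lines of Python. An MTA-style block bounces on any single listing. An SA-style score treats the listing as one input and only the total decides. Rule names and scores below are invented; the threshold of 5 mirrors the default `required_hits` in SA 2.6x.

```python
# Hard reject on any blacklist hit vs. a score where the hit is one input.
THRESHOLD = 5.0

def mta_rejects(blacklist_hits):
    """Hard block: any single listing is enough to bounce the message."""
    return len(blacklist_hits) > 0

def is_spam(rule_scores, threshold=THRESHOLD):
    """Score-based: only the total of all matching rules decides."""
    return sum(rule_scores.values()) >= threshold

# A legitimate reply from a listed /24: one blacklist hit and nothing else.
ham_from_listed_net = {"HYPOTHETICAL_DNSBL_HIT": 1.5}
# Typical spam: the same listing plus several content rules.
spam = {"HYPOTHETICAL_DNSBL_HIT": 1.5, "HYPOTHETICAL_CONTENT_A": 2.2,
        "HYPOTHETICAL_CONTENT_B": 2.0}

print(mta_rejects(["example-bl"]), is_spam(ham_from_listed_net), is_spam(spam))
```

The listed-but-legitimate message is bounced by the hard block yet stays under the score threshold. The spam is still caught because other rules also fire.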

Daniel

Re: How to optimize SA?
On Wed, 2004-03-10 at 08:12, Steve Yuroff wrote:

> That's the first report I've had of the spamlist.org database bouncing
> mail from a legit source.
>
> Are you saying that spamlist.org and the add-in rulesets I'm using are
> crap? They seemed to be useful, legit tools to me, but I'm here to
> learn. If it's to be avoided, I'm all for it.
>
> My understanding was that the spamlist.org list would cut my processing
> load at the MTA level... isn't it easier to not accept the mail, than
> to accept it and then process it for ham/spam?

Unfortunately it seems that a lot of ISPs are buying that line and just
using blacklists to reject e-mail outright. I hate spam, which is why
I've installed SA locally, and I'm sure that I'm doing a much better job
of filtering spam than my isp is.

Most of the blacklists claim that they have a very low false-positive
rate, but then most if not all warn against discarding mail based on the
BL alone. I first became aware of the isp/rbl problem when I tried to
track down why Yahoo Groups was periodically suspending mail to me
because I was bouncing it. The bounce history showed that my ISP was
giving hard bounces because the particular Yahoo server was blacklisted
temporarily on spamcop.net. It was no use trying to convince my ISP that
they should not use an RBL to block mail; they LOVED the fact that the
load on their servers was reduced. The problem is that they had no way
of knowing whether or not the mail they were rejecting was in fact
legitimate.

More recently I've had problems just trying to sign up for some
support lists on sourceforge, because the registration confirmation
email never got to me. I finally resorted to opening up my own MTA and
got the email immediately via this back channel.

Using RBLs as one input to a filtering tool like SA might be a good
idea, but from what I'm seeing the growing trend towards using them to
block e-mails is a very bad one. I'm beginning to think that the USPS is
doing a better job of actually delivering my mail than a lot of ISPs do.

Sorry for the rant!
Re: How to optimize SA?
On Mar 10, 2004, at 5:27 PM, Rick DeNatale wrote:

> On Wed, 2004-03-10 at 08:12, Steve Yuroff wrote:
>
>> That's the first report I've had of the spamlist.org database bouncing
>> mail from a legit source.
>>
>> Are you saying that spamlist.org and the add-in rulesets I'm using are
>> crap? They seemed to be useful, legit tools to me, but I'm here to
>> learn. If it's to be avoided, I'm all for it.
>>
>> My understanding was that the spamlist.org list would cut my processing
>> load at the MTA level... isn't it easier to not accept the mail, than
>> to accept it and then process it for ham/spam?
>
> Unfortunately it seems that a lot of ISPs are buying that line and just
> using blacklists to reject e-mail outright. I hate spam, which is why
> I've installed SA locally, and I'm sure that I'm doing a much better job
> of filtering spam than my isp is.
>
> Most of the blacklists claim that they have a very low false-positive
> rate, but then most if not all warn against discarding mail based on the
> BL alone. I first became aware of the isp/rbl problem when I tried to
> track down why Yahoo Groups was periodically suspending mail to me
> because I was bouncing it. The bounce history showed that my ISP was
> giving hard bounces because the particular Yahoo server was blacklisted
> temporarily on spamcop.net. It was no use trying to convince my ISP that
> they should not use an RBL to block mail; they LOVED the fact that the
> load on their servers was reduced. The problem is that they had no way
> of knowing whether or not the mail they were rejecting was in fact
> legitimate.
>
> More recently I've had problems just trying to sign up for some
> support lists on sourceforge, because the registration confirmation
> email never got to me. I finally resorted to opening up my own MTA and
> got the email immediately via this back channel.
>
> Using RBLs as one input to a filtering tool like SA might be a good
> idea, but from what I'm seeing the growing trend towards using them to
> block e-mails is a very bad one. I'm beginning to think that the USPS is
> doing a better job of actually delivering my mail than a lot of ISPs do.
>
> Sorry for the rant!
>
I've taken the spamlist.org blacklist off of my server (I've been
persuaded that it's just too harsh), and I've also removed the
stearns.org blacklist and Pyzor checks. My scan times are down to
2-5 seconds (usually), while my load averages are sitting under 1.
What's odd is that once in a while I see a complete outlier: an 8K
email took 71 seconds to scan, and another took 47 seconds. The
neighboring entries took under 8 seconds apiece.

How can I determine why some messages take 10x longer to process than
others? These messages seem to be only 1% of the traffic, but I just
want to know what's so odd about them.
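One way to catch those outliers systematically is to time each saved message through the scanner and flag any that take far longer than the median. The slow ones can then be run individually through `spamassassin -D` to see which step stalls. A Python sketch follows; the script name, the choice of `spamc` as the command, and the 5x cutoff are all assumptions to adjust:

```python
# Time each saved message through a scanner command and flag the ones that
# take far longer than the median.
import statistics
import subprocess
import sys
import time

def time_command(cmd, message: bytes) -> float:
    """Run `cmd` with `message` on stdin and return the wall-clock seconds taken."""
    start = time.monotonic()
    subprocess.run(cmd, input=message, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.monotonic() - start

def find_outliers(timings, factor=5.0):
    """Return (name, seconds) pairs slower than `factor` x the median, slowest first."""
    if not timings:
        return []
    median = statistics.median(timings.values())
    slow = [(name, secs) for name, secs in timings.items() if secs > factor * median]
    return sorted(slow, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Usage (script name hypothetical): python3 find_slow.py msg1.eml msg2.eml ...
    cmd = ["spamc"]  # assumption: spamd is running; use ["spamassassin"] to bypass it
    timings = {}
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            timings[path] = time_command(cmd, f.read())
    for name, secs in find_outliers(timings):
        print(f"{secs:6.1f}s  {name}")
```

A second run of the same message may come out faster if DNS answers were cached in the meantime. A slow first run followed by a fast rerun therefore points at network lookups rather than the rules themselves.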

Thanks,

Steve.
RE: How to optimize SA?
> > I've been doing some debug runs on the included sample-spam.txt, and I
> > see the biggest pause happens right before that line. Maybe a shared
> > Bayes database for my whole company isn't going to work anymore?
>
> I believe individual bayes databases are probably faster, but
> other people seem to do quite okay with shared bayes databases.

We're running more messages through than that with a shared Bayes
database on an AMD 700 with 768MB RAM on Windows. Not a fair comparison,
I know, but I doubt that Bayes database contention is much of an
issue.

Bret