Mailing List Archive

AWL bloat-reducer
I've been having user quota problems due to AWL bloat in a growing
number of accounts. Most customers' AWL files include a *long* list of
one-off spam addresses which *SIGNIFICANTLY* increase disk usage.

I finally got disgusted with this, and hacked check_whitelist into
trim_whitelist. It makes a backup copy of the "old" AWL db, creates a
fresh db and copies only those addresses that have a count greater than
1 from old to new. It then moves the new db over the old one and makes
sure ownership of the new db is correct if running as root. I didn't
want to autodelete the old db in case something broke.
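
For readers who want to see the steps above in code, here is a minimal Python sketch. It is not the trim_whitelist script itself. It assumes each address has a count stored under its own key and a total score stored under the same key plus "|totscore"; that layout, the ".bak"/".new" file names, and the `open_db` callable (anything returning a dict-like object with a `close()` method) are all assumptions made for illustration.

```python
import os
import shutil

TOTSCORE_SUFFIX = "|totscore"  # assumed key layout, not confirmed by the post


def trim_entries(old, new, min_count=2):
    """Copy entries seen at least min_count times from old to new.

    Returns a (kept, dropped) tuple counting addresses, not raw keys.
    """
    kept = dropped = 0
    for key in list(old.keys()):
        if key.endswith(TOTSCORE_SUFFIX):
            continue  # score keys are copied together with their count key
        if int(old[key]) >= min_count:
            new[key] = old[key]
            score_key = key + TOTSCORE_SUFFIX
            if score_key in old:
                new[score_key] = old[score_key]
            kept += 1
        else:
            dropped += 1
    return kept, dropped


def trim_file(path, open_db, min_count=2):
    """Back up path, build a trimmed copy, then move it over the original."""
    owner = os.stat(path)  # remember ownership before touching anything
    backup, fresh = path + ".bak", path + ".new"  # hypothetical names
    shutil.copy2(path, backup)  # the backup is kept, never auto-deleted
    old, new = open_db(path, "r"), open_db(fresh, "n")
    try:
        result = trim_entries(old, new, min_count)
    finally:
        old.close()
        new.close()
    os.replace(fresh, path)
    if os.geteuid() == 0:
        # When run as root, hand the new file back to the original owner.
        os.chown(path, owner.st_uid, owner.st_gid)
    return result
```

With the default `min_count=2`, this keeps only addresses with a count greater than 1, matching the behaviour described above.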

At the moment, it only understands AWL files in "Berkeley DB (Hash,
version 5, native byte-order)" format (or any other file-based hash with
files that end with .db), but it could probably be expanded to
understand others without too much trouble. It could also accept
other options to control which addresses it discards (e.g., anything with
a *really* high AWL entry likely doesn't need to be kept; choose the
count cutoff, etc). It could also be adapted to upgrade AWL dbs as
necessary.
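
The kind of options floated above could look something like the following Python sketch. The function name `should_keep` and the parameters `min_count` and `max_mean` are hypothetical, and treating "a really high AWL entry" as the mean score (total score divided by count) is an assumption, not something the script is known to do.

```python
def should_keep(count, totscore, min_count=2, max_mean=None):
    """Decide whether one AWL entry survives trimming.

    count     -- number of times the address was seen
    totscore  -- accumulated score for the address
    min_count -- hypothetical cutoff: drop entries seen fewer times than this
    max_mean  -- hypothetical cutoff: drop entries whose mean score exceeds it
    """
    if count <= 0 or count < min_count:
        return False
    if max_mean is not None and totscore / count > max_mean:
        return False
    return True
```

Leaving `max_mean` unset keeps every high-scoring entry, so the default only applies the count cutoff.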

Size reduction varied a LOT; I checked it on a number of users whose
AWL db had grown to over 8M. Typical reduction was ~8:1, with a few
dropping to ~300K (~27:1). Smaller dbs showed even more drastic
reductions; one went from 4500K to 86K (!!!). Given that I have this
server set up for per-user AWLs, and a 20M per-user quota on the home
directory, this is pretty significant. (I've had to move quite a few
users' SA directories into another partition, and symlink them back in
order to allow them 20M of "non-inbox" email folder space.)
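
As a quick check of the ratios quoted above, a few lines of Python reproduce them. Treating "8M" as 8192K is an assumption; the other figures are taken directly from the numbers in this message.

```python
def reduction_ratio(before_kb, after_kb):
    """Return how many times smaller the file became."""
    return before_kb / after_kb


# 8M (assumed to be 8192K) trimmed down to about 300K
print(round(reduction_ratio(8 * 1024, 300)))  # 27

# The smaller db that went from 4500K to 86K
print(round(reduction_ratio(4500, 86)))  # 52
```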

If you or your users are running short on disk space due to ballooning
AWL files (in total, or within the system quota), you may want to play
with this.

Download at http://www.deepnet.cx/~kdeugau/spamtools/trim_whitelist

-kgd
--
"Sendmail administration is not black magic. There are legitimate
technical reasons why it requires the sacrificing of a live chicken."
- Unknown
Re: AWL bloat-reducer
Kris Deugau <kdeugau@webhart.net> writes:

> I've been having user quota problems due to AWL bloat in a growing
> number of accounts. Most customers' AWL files include a *long* list of
> one-off spam addresses which *SIGNIFICANTLY* increase disk usage.

Definitely!

> I finally got disgusted with this, and hacked check_whitelist into
> trim_whitelist. It makes a backup copy of the "old" AWL db, creates a
> fresh db and copies only those addresses that have a count greater than
> 1 from old to new. It then moves the new db over the old one and makes
> sure ownership of the new db is correct if running as root. I didn't
> want to autodelete the old db in case something broke.

Makes sense to me.

> At the moment, it only understands AWL files in "Berkeley DB (Hash,
> version 5, native byte-order)" format (or any other file-based hash with
> files that end with .db), but it could probably be expanded to
> understand others without too much trouble; and could probably accept
> other options to control which addresses it discards (e.g., anything with
> a *really* high AWL entry likely doesn't need to be kept; choose the
> count cutoff, etc). It could also be adapted to upgrade AWL dbs as
> necessary.

Why not keep really high AWL entries? It can't hurt.

> Size reduction varied a LOT; I checked it on a number of users whose
> AWL db has grown to over 8M. Typical reduction was ~8:1, with a few
> dropping to ~300K (~27:1). Smaller dbs showed even more drastic
> reductions; one went from 4500K to 86K (!!!). Given that I have this
> server set up for per-user AWLs, and a 20M per-user quota on the home
> directory, this is pretty significant. (I've had to move quite a few
> users' SA directories into another partition, and symlink them back in
> order to allow them 20M of "non-inbox" email folder space.)
>
> If you or your users are running short on disk space due to ballooning
> AWL files (in total, or within the system quota), you may want to play
> with this.
>
> Download at http://www.deepnet.cx/~kdeugau/spamtools/trim_whitelist

Sounds like an initial version of what has been proposed in this bug:

http://bugzilla.spamassassin.org/show_bug.cgi?id=3082

A separate program seems like the way to go, but I am very hesitant about
adding new commands/options to handle expiry rather than just doing it
all automatically behind the scenes.

Daniel

--
Daniel Quinlan                     anti-spam (SpamAssassin), Linux,
http://www.pathname.com/~quinlan/    and open source consulting
Re: AWL bloat-reducer
On Fri, 27 Feb 2004, Kris Deugau wrote:

> I've been having user quota problems due to AWL bloat in a growing
> number of accounts. Most customers' AWL files include a *long* list of
> one-off spam addresses which *SIGNIFICANTLY* increase disk usage.
> ...
> I finally got disgusted with this, and hacked check_whitelist into
> trim_whitelist. It makes a backup copy of the "old" AWL db, creates a
> fresh db and copies only those addresses that have a count greater than
> 1 from old to new. It then moves the new db over the old one and makes
> sure ownership of the new db is correct if running as root. I didn't
> want to autodelete the old db in case something broke.
>
> Download at http://www.deepnet.cx/~kdeugau/spamtools/trim_whitelist

Just wanted to say thanks for this script! It works great, can be run as
root globally on all users, and we even scripted it to run on multiple
servers from one command. Disk space saved was quite large, and it helps
our users' quotas as well.

Rob M.
Re: AWL bloat-reducer
Rob Mangiafico wrote:
> Just wanted to say thanks for this script! It works great,

So far. I haven't yet set up any automated calls to it; I'm a little
too paranoid about losing customer AWL files. <g> (Although I haven't
heard any complaints yet.) I've just used it in extreme cases where the
customer is getting close to the automated "You are almost out of disk
space for your spam folder" warning...

> can be run
> as root globally on all users,

Or by individual users; although I've just realized a regular user
could attempt to run it on any .db file they have write access to. :/

Not a *major* security problem, but it could be a nuisance.
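
One way to limit that nuisance would be a check before touching any file. The Python sketch below is only an illustration of the idea, not part of the script: the name `safe_to_trim` is hypothetical, and refusing symlinks and files owned by someone else is an assumed policy.

```python
import os
import stat


def safe_to_trim(path):
    """Return True only for regular files the caller owns (or when root)."""
    st = os.lstat(path)  # lstat so a symlink is judged as a symlink
    if not stat.S_ISREG(st.st_mode):
        return False  # refuse symlinks, directories, devices, etc.
    euid = os.geteuid()
    return euid == 0 or st.st_uid == euid
```

Write access alone would no longer be enough; a regular user could only trim .db files they actually own.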

> Disk space saved was quite large, and it helps
> our users' quotas as well.

Which is why I wrote it. <g>

-kgd