Mailing List Archive

Metric for sending IP "pinkness"?
Hi,

I'm working on a project to combine mail log analysis and SpamAssassin
(spamd) scoring to rank the spamminess of a connecting IP address. I
haven't found any standard metrics so I'm guessing at what might be
useful, such as %spam per unit time {15-minutes, hour, day, week} per
unit network {/32, /28, /24}.
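
A minimal sketch of the metric described above, assuming each log entry has already been reduced to a (timestamp, IP, is_spam) tuple. The record format, the function names, and the idea of aligning windows to multiples of the window length are illustrative assumptions, not something the thread specifies.

```python
import ipaddress
from collections import defaultdict


def network_key(ip, prefix_len):
    """Return the enclosing IPv4 network (e.g. the /24) of an address as a string."""
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))


def spam_percent(events, window_seconds, prefix_len):
    """Compute %spam per time window per network.

    events: iterable of (unix_timestamp, ip, is_spam) tuples
            (hypothetical record format produced by a log parser).
    Returns {(window_start, network): percent_spam}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [spam, total]
    for ts, ip, is_spam in events:
        # Align each event to the start of its window.
        window_start = int(ts) - int(ts) % window_seconds
        key = (window_start, network_key(ip, prefix_len))
        if is_spam:
            counts[key][0] += 1
        counts[key][1] += 1
    return {k: 100.0 * spam / total for k, (spam, total) in counts.items()}


# Example: one-hour windows, grouped by /24.
events = [
    (1000, "192.0.2.10", True),
    (1100, "192.0.2.20", True),
    (1200, "192.0.2.30", False),
    (1300, "198.51.100.5", False),
]
result = spam_percent(events, window_seconds=3600, prefix_len=24)
```

Running the same events with prefix_len=32 or a 900-second window gives the other granularities mentioned; whether spam/ham is decided by SpamAssassin or another filter is left to whatever produces the is_spam flag.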

The intent is to generate this metric quickly and place all the
hosts/networks that exceed some threshold on a local access list or
DNSBL. I'd rather not use SpamAssassin's scores directly when generating
the metric because I'd rather the metric be package-neutral.

I've done a little research into this, probably not nearly enough, and
I'd rather not redo work (badly) that someone else has already done.
spamhammerd (http://n0rp.chemlab.org/spamhammer/spamhammerd) comes
closest to what I'm looking for, though it's meant more to defend
against dictionary attacks.

Anyone have a weighting scheme & threshold they're willing to share?

Thanks,

-- Bob
Re: Metric for sending IP "pinkness"?
On Sat, 7 Feb 2004 07:01:57 -0600, Bob Apthorpe wrote:

> I'm working on a project to combine mail log analysis and
> SpamAssassin (spamd) scoring to rank the spamminess of a
> connecting IP address. I haven't found any standard metrics so I'm
> guessing at what might be useful, such as %spam per unit time {15-
> minutes, hour, day, week} per unit network {/32, /28, /24}.

Two comments:

1: I'm using relaydb for something similar (but not identical) to this.

This technique simply stores the number of spams and hams per IP in a small database. I'm then checking the ratio of spam to ham for connecting IPs. If the ratio is above a certain threshold, I reject the connection.

I'm also expiring records after a certain time.
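
The counting scheme described here can be sketched roughly as follows. This is not relaydb's actual implementation or interface; the class name, the in-memory dict, the default threshold, the minimum message count, and the expiry time are all hypothetical values chosen for illustration.

```python
import time


class IPReputation:
    """Per-IP spam/ham counters with a ratio threshold and record expiry."""

    def __init__(self, max_ratio=3.0, min_messages=5, ttl_seconds=7 * 24 * 3600):
        self.max_ratio = max_ratio        # assumed threshold, tune locally
        self.min_messages = min_messages  # avoid judging an IP on one message
        self.ttl_seconds = ttl_seconds    # assumed expiry time
        self.records = {}                 # ip -> [spam_count, ham_count, last_seen]

    def record(self, ip, is_spam, now=None):
        """Count one message from ip and refresh its last-seen time."""
        now = time.time() if now is None else now
        rec = self.records.setdefault(ip, [0, 0, now])
        rec[0 if is_spam else 1] += 1
        rec[2] = now

    def expire(self, now=None):
        """Drop records not updated within ttl_seconds."""
        now = time.time() if now is None else now
        stale = [ip for ip, rec in self.records.items()
                 if now - rec[2] > self.ttl_seconds]
        for ip in stale:
            del self.records[ip]

    def should_reject(self, ip):
        """True if the spam-to-ham ratio for ip exceeds the threshold."""
        rec = self.records.get(ip)
        if rec is None:
            return False
        spam, ham, _ = rec
        if spam + ham < self.min_messages:
            return False
        # Add 1 to ham so an IP that has sent no ham does not divide by zero.
        return spam / (ham + 1) > self.max_ratio
```

A persistent store would replace the dict in practice; the sketch only shows the ratio test and the expiry step.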

2: This method might seem effective in theory, but in reality it doesn't do as much as I'd hoped for.

Nowadays spam more often comes from a multitude of addresses rather than a few dedicated spam sending hosts. This means that few sender IPs actually ever reach the threshold I've set up (a more aggressive threshold could change this though).

I haven't checked what difference it'd make if subnets were used instead of IP addresses.

Regards
/Jonas
--
Jonas Eckerman, jonas_lists@frukt.org
http://www.fsdb.org/
Re: Metric for sending IP "pinkness"?
Hi,

On Sat, 7 Feb 2004 15:40:20 +0100 Jonas Eckerman <jonas_lists@frukt.org> wrote:

> On Sat, 7 Feb 2004 07:01:57 -0600, Bob Apthorpe wrote:
>
> > I'm working on a project to combine mail log analysis and
> > SpamAssassin (spamd) scoring to rank the spamminess of a
> > connecting IP address. I haven't found any standard metrics so I'm
> > guessing at what might be useful, such as %spam per unit time {15-
> > minutes, hour, day, week} per unit network {/32, /28, /24}.
>
> Two comments:
>
> 1: I'm using relaydb for something similar (but not identical) to this.
>
> This technique simply stores the number of spams and hams per IP in a
> small database. I'm then checking the ratio of spam to ham for
> connecting IPs. If the ratio is above a certain threshold, I reject the
> connection.
>
> I'm also expiring records after a certain time.

Sounds like what I'm looking for; a threshold and an expiry time.

> 2: This method might seem effective in theory, but in reality it doesn't do
> as much as I'd hoped for.
>
> Nowadays spam more often comes from a multitude of addresses rather than
> a few dedicated spam sending hosts. This means that few sender IPs
> actually ever reach the threshold I've set up (a more aggressive
> threshold could change this though).

Right, and since I plan to tempfail (reject with 450), a more aggressive
threshold should result only in delayed mail, not rejected ham (except
for broken old GroupWise servers that don't comply with RFCs and take
450s as permanent failures.) This is different from traditional
graylisting in that some amount of spam leaks into your system but you
don't need to retain sender-envelope/sender-IP/recipient triplets. The
nice thing about this method is that you can process your logs in real
time to generate a tempfail access list. With multiple MTAs writing to a
central log host, you can generate one access list for all inbound MTAs
and the load of log processing can be pushed off the MTAs. Then with
something like CFEngine you could periodically push out the updated
access lists. I don't run a system nearly that large but I'd rather
build something that has a chance of scaling beyond a single host.
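
As a rough illustration of the last step, the sketch below turns per-IP spam percentages into tempfail access-list entries. It assumes a Postfix-style access table ("pattern action" per line, with a 4xx reply as the action); the thread does not name an MTA, and the function name, the 80% threshold, and the reply text are hypothetical.

```python
def build_tempfail_list(spam_percent_by_ip, threshold=80.0,
                        reply="450 4.7.1 Service temporarily unavailable"):
    """Return access-list text with one tempfail entry per offending IP.

    spam_percent_by_ip: {ip_or_network_string: percent_spam}, e.g. the output
    of a log-processing job on the central log host (assumed input format).
    """
    lines = []
    for ip in sorted(spam_percent_by_ip):
        if spam_percent_by_ip[ip] >= threshold:
            # Assumed Postfix-style access table entry: "<address>\t<4xx reply>".
            lines.append(f"{ip}\t{reply}")
    return "\n".join(lines) + ("\n" if lines else "")


access_text = build_tempfail_list(
    {"192.0.2.9": 95.0, "192.0.2.1": 10.0, "198.51.100.7": 80.0},
    reply="450 Try again later",
)
```

The resulting text would be written to a file and distributed to the inbound MTAs by whatever configuration tool is in use.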

> I haven't checked what difference it'd make if subnets were used instead
> of IP-addresses.

My system probably doesn't receive enough mail to generate useful
statistics. :/

-- Bob
RE: Metric for sending IP "pinkness"?
> -----Original Message-----
> From: Bob Apthorpe [mailto:apthorpe+sa@cynistar.net]
> Sent: Saturday, February 07, 2004 5:02 AM
> To: SATalk
> Subject: Metric for sending IP "pinkness"?
>
>
> Hi,
>
> I'm working on a project to combine mail log analysis and SpamAssassin
> (spamd) scoring to rank the spamminess of a connecting IP address. I
> haven't found any standard metrics so I'm guessing at what might be
> useful, such as %spam per unit time {15-minutes, hour, day, week} per
> unit network {/32, /28, /24}.
>
[...]

A bit off-topic, but in the vein of using mail logs ... I was thinking it
might be good to monitor outgoing mail addresses as well, on the assumption
that your site isn't hosting spammers or spam tool developers (<g>) and that
the people listed in the outgoing mail might at a minimum be whitelisted,
but certainly those addresses should never be automatically blacklisted.

As you mentioned, a spam-filter-neutral approach would be to look for
dictionary attacks, or for attempted transmissions to users without logins
(like adm, games, bin, and accounts which have never been listed on the
'net). This might catch a lot of them. In our case, before we started
blocking them we received a lot of mail to these bogus users (which were
probably discovered by prior dictionary attacks), so they make a good spam
signature.
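
One way to sketch the bogus-recipient signature: count, per connecting IP, the delivery attempts addressed to accounts that should never receive mail. The (ip, recipient) tuple format and the function name are assumptions for illustration; the trap list uses only the account names mentioned above.

```python
from collections import Counter

# System accounts named in the message; a real list would be site-specific.
TRAP_USERS = {"adm", "games", "bin"}


def trap_hits(attempts, trap_users=TRAP_USERS):
    """Count delivery attempts to trap users per connecting IP.

    attempts: iterable of (ip, recipient_address) tuples
              (hypothetical format extracted from the mail log).
    """
    hits = Counter()
    for ip, rcpt in attempts:
        # Compare only the local part, case-insensitively.
        local_part = rcpt.split("@", 1)[0].lower()
        if local_part in trap_users:
            hits[ip] += 1
    return hits


attempts = [
    ("192.0.2.1", "adm@example.org"),
    ("192.0.2.1", "Games@example.org"),
    ("192.0.2.2", "alice@example.org"),
]
hits = trap_hits(attempts)
```

IPs with a nonzero count could then feed the same threshold logic as any other spam indicator, independent of the filter package in use.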

SenderBase (http://www.senderbase.org/), which I believe Justin mentioned
before, looks interesting; it can be accessed via DNS to get some
interesting statistics about a host. I just couldn't quite figure out what
to do with the data.

This is a fun little page as well: http://hatcheck.org/blockparade.html.
Find your favorite ISP or country and see what percentage of their IP
addresses are blocked <g>. One might be able to use this info to give a
weighting in making a blocking decision. It will also give you an idea of
which blacklists are more/less aggressive.