I recently went through my setup, read a bunch of web pages, and
decided to try TXREP. My summary comments after a few weeks:
It seems to be working quite well.
Outbound processing is really useful: people I send mail to get
negative points on their messages to me.
I worry less about FPs from giving rules like RDNS_NONE and
FREEMAIL_FROM a point or so.
sa-learn(1) doesn't document that it feeds TXREP.
sa-learn(1) doesn't document that, if TXREP is configured, messages are
scanned again at learn time, and that this causes DNSBL lookups.
Therefore, use the -L flag!
There are probably some reasons why this isn't the default, but it
seems highly useful.
And the complicated bits:
I'm a little concerned about performance with senders like gmail,
where, although a high percentage of the mail I get from them is ham,
that's different from the a priori odds that a message from an unknown
gmail sender is ham. So I'd rather not give much weight to "this came
from gmail". However, for many other domains the sending domain is
correlated with ham or spam. So I wonder about keeping second-order
stats to figure out the variance, and basically not paying attention to
reputation by IP address (logically, by sending domain) for senders
that send both ham and spam, leaving the weight to lookups by full
email address only.
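The second-order idea could look something like this. It is a minimal, hypothetical Python sketch, not TXREP's code: it assumes each reputation key keeps a running mean and variance of message scores (Welford's online algorithm), averages the email-address and domain reputations, and drops the domain term when that domain's scores are widely spread. The names ScoreStats and reputation, the simple averaging, and the max_stddev and min_count thresholds are all illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreStats:
    """Running mean and variance of message scores for one reputation key.

    Hypothetical: beyond a count and mean, this keeps M2 (the sum of
    squared deviations from the mean), the proposed second-order statistic.
    """
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, score: float) -> None:
        # Welford's online update: one pass, numerically stable.
        self.count += 1
        delta = score - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (score - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; 0.0 until there are two samples.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.count - 1))


def reputation(email: ScoreStats, domain: ScoreStats,
               max_stddev: float = 5.0, min_count: int = 5) -> Optional[float]:
    """Average the email-address and domain reputations, skipping the
    domain term when its scores are widely spread (a sender of both ham
    and spam). Thresholds are illustrative, untuned guesses."""
    parts = []
    if email.count:
        parts.append(email.mean)
    if domain.count >= min_count and domain.stddev() <= max_stddev:
        parts.append(domain.mean)
    return sum(parts) / len(parts) if parts else None
```

With this rule, a domain whose scores swing between clear ham and clear spam contributes nothing, and the decision rests on the full email address, while a domain that consistently scores one way keeps its weight.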
I don't see why the message is rescored when learning. The user says
whether it's ham or spam, which counts as -20 or +20, so SA's opinion
is moot.
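To make the point concrete, here is a hypothetical Python sketch of a learn-time update in which only the user's verdict matters. It assumes, as described above, that a learned message counts as a fixed -20 (ham) or +20 (spam); the RepEntry layout, the learn function, and its scanned_score parameter are illustrative, not TXREP's storage format or API.

```python
from dataclasses import dataclass
from typing import Optional

# Fixed learn-time scores as described in the post; illustrative constants.
LEARN_HAM_SCORE = -20.0
LEARN_SPAM_SCORE = 20.0


@dataclass
class RepEntry:
    """Hypothetical reputation entry: message count and score total."""
    count: int = 0
    total: float = 0.0

    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def learn(entry: RepEntry, is_spam: bool,
          scanned_score: Optional[float] = None) -> None:
    # The rescan result is accepted but never read: the user's verdict
    # alone decides the contribution, so rescoring adds nothing here.
    entry.count += 1
    entry.total += LEARN_SPAM_SCORE if is_spam else LEARN_HAM_SCORE
```

Whatever score a rescan produces, the entry ends up the same, which is why the extra scan (and its DNSBL lookups) looks unnecessary under this model.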