On Saturday, April 3, 2004, 12:34:24 AM, Daniel Quinlan wrote:
> Jeff Chan <jeffc@surbl.org> writes:
>> Can you cite some examples of FP-prevention strategies?
> 1. Automated testing. We're testing URLs (web sites). That allows a
> large number of strategies which could be used from each aspect of
> the URL.
>     A record
>       check other blacklists
>       check IP owner against SBL
>     domain name
>       check name servers in other blacklists
>       check registrar
>       check age of domain (SenderBase information)
>       check ISP / IP block owner (SenderBase, SBL, etc.)
>     web content
>       check web site for common spam web site content (porn, drugs,
>       credit card forms, empty top-level page, etc.)
> Any of those can also be used in concert with threshold tuning. For
> example, lower thresholds if a good blacklist hits and somewhat
> higher thresholds for older domains.
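For concreteness, the A record check above amounts to a standard DNSBL lookup: resolve the URL's host to an IP, reverse the octets, and query the blacklist zone. A minimal sketch of building that query name (the zone and listing convention are ordinary DNSBL practice; this is not SURBL's or Spamhaus's actual code):

```python
# Sketch: construct the DNS name used to test an IPv4 address against
# a DNSBL zone such as sbl.spamhaus.org. A resolver answer
# (conventionally 127.0.0.x) means "listed"; NXDOMAIN means "not listed".

def dnsbl_query_name(ip: str, zone: str = "sbl.spamhaus.org") -> str:
    """Reverse the IPv4 octets and append the blacklist zone name."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone

print(dnsbl_query_name("192.0.2.99"))  # 99.2.0.192.sbl.spamhaus.org
```

The actual lookup would then be an ordinary DNS A-record query on that name.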
I agree with the content check, but will step on many toes here
by proclaiming that other blacklists (other than SBL), name
servers, registrars, ISP address blocks, and similar approaches
are overly broad and have too much potential for collateral
damage *for my sensibilities*. I really, really hate
blacklisting innocent victims. I consider that a false
accusation or even false punishment. Policies which allow
blacklisting an entire ISP or even an entire web server IP
address have the potential to harm too many innocent bystanders,
IMO. Your mileage may and probably does vary. ;)
Our approach is to start with some likely good data in the
SpamCop URIs. See comments below.
> 2. Building up a long and accurate whitelist of good URLs over time
> would also help. Maybe work with places that vouch for domain's
> anti-spam policies (Habeas, BondedSender, IADB) to develop longer
> whitelists.
I agree in principle; however, I feel that the SpamCop-reported
URIs tend to have relatively few FPs. They are domains that
people took the time to report; in essence they are *voting with
their time that these are spam domains*.
That's one of the reasons our whitelist is quite small now (about
35 entries), yet it catches the few legitimate domains and
subdomains that survive the reporting and thresholding, i.e. are
mistakenly reported often enough to get onto the list, before I
can notice and whitelist them. That need has been small so far.
http://spamcheck.freeapp.net/whitelist-domains
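The whitelist pass can be sketched roughly as follows. The entries and the parent-domain matching rule here are my illustration, not the list's actual contents or implementation; the idea is simply that a whitelisted domain also covers its subdomains:

```python
# Hypothetical sketch: check a reported domain against a small manual
# whitelist before it is published. Matching walks up the labels so
# that a whitelisted "example.com" also covers "mail.example.com".

WHITELIST = {"example.com", "w3.org"}  # illustrative entries only

def is_whitelisted(domain: str, whitelist=WHITELIST) -> bool:
    labels = domain.lower().rstrip(".").split(".")
    # Test the domain itself, then each parent: a.b.c -> a.b.c, b.c, c
    for i in range(len(labels)):
        if ".".join(labels[i:]) in whitelist:
            return True
    return False

print(is_whitelisted("mail.example.com"))  # True
print(is_whitelisted("spam-site.test"))    # False
```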
> 3. Using a corpus to tune results and thresholds (also whitelist
> seeding).
Agreed. Currently we lack spam and ham corpora of our own and
have not had a chance to set any up yet. That may come later
though.
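Once corpora exist, the tuning in item 3 could look something like this: given per-domain report counts and ham/spam labels from a corpus, pick the lowest inclusion threshold that lists no ham domains. All names and numbers below are made up for illustration:

```python
# Illustrative sketch of corpus-driven threshold tuning: find the
# lowest report-count threshold whose false-positive count on the
# ham corpus stays within an acceptable bound (zero here).

def pick_threshold(counts, ham, spam, max_fp=0):
    """counts: domain -> report count; ham/spam: sets of corpus domains."""
    for threshold in sorted(set(counts.values())):
        listed = {d for d, n in counts.items() if n >= threshold}
        if len(listed & ham) <= max_fp:
            return threshold
    return None  # no threshold meets the FP bound

counts = {"good.example": 2, "shop.example": 1,
          "pills.example": 40, "casino.example": 25}
ham = {"good.example", "shop.example"}
spam = {"pills.example", "casino.example"}
print(pick_threshold(counts, ham, spam))  # 25: lists both spam domains, no ham
```

The same loop could just as easily minimize missed spam subject to an FP bound, which is the trade-off threshold tuning is really about.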
I hope I'm not taking too confrontational a tone here. I'm
just trying to defend our approach, which I think can be valid.
I also realize people have a lot of work invested in other
approaches, but I hope they will eventually give ours a try.
I feel it has value, even if I can't prove it conclusively
myself yet. LOL! :-)
Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://www.surbl.org/