Mailing List Archive: New RBL for use with URIDNSBL plugin

Re[2]: New RBL for use with URIDNSBL plugin [ In reply to ]

Mar 29, 2004, 1:49 PM

Post #26 of 36 (139 views)

On Monday, March 29, 2004, 9:35:58 AM, Marc Perkel wrote:
> For what its worth - I've been blacklisting against my own URI list for
> over a year now and quite frankly - it's the best thing I have for
> trapping spam of anything I do. It's 100% accurate and if I see new spam
> getting through all I have to do is add to the list and no more of them.

> So - YES !!!!

Thanks for the feedback Marc. I agree and believe in this
general approach.

> Glad to see SA implementing this.

Well it's not fully implemented yet. Trying SURBL with URIDNSBL
was only an experiment and it didn't work nearly as well as I
would have liked. We need some code written to use it in a
slightly different, simpler way which I suggested earlier....

> Just want to say though - the ability to add my own URIs to the
> blacklist is important. Should also support a flat text file with regex
> expressions.

> My 2 centz .....

Should be doable, though I expect the SpamCop reported URI data
in SURBL to be very useful, at a minimum as a firm foundation.

Cheers,

Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://sc.surbl.org/

Re[2]: New RBL for use with URIDNSBL plugin [ In reply to ]

jeffc at surbl

Mar 29, 2004, 1:49 PM

Post #27 of 36 (139 views)

Permalink

Re[2]: New RBL for use with URIDNSBL plugin [ In reply to ]

jeffc at surbl

Mar 29, 2004, 2:05 PM

Post #28 of 36 (139 views)

Permalink

On Monday, March 29, 2004, 10:01:59 AM, Marc Perkel wrote:
> Here's some of my initial thoughts.

> In the domain is what I would call the "real" part of the domain.

> farmsex.com
> farmsex.co.uk

> The part before the "farmsex" should be ignored. Anyone who controls the
> domains also probably controls the subdomains and that is likely the
> rotating part.

Yes, that's done automatically by the averaging and summarization
on the data service side (i.e. in SURBL). This effect is
hopefully adequately explained in my docs:

http://spamcheck.freeapp.net/

http://sc.surbl.org/

On the date service side, this desirable effect happens
essentially by side-effect of the data handling. It's part
of the design, but the "real" domains pop out of the data
and into SURBL pretty much on their own.

Extraction of the "real" domain also needs to be done on the
SA/SURBL client side so that only the "real" part of the domain
from the message is compared against SURBL. Heuristically all
that's needed most of the time is to compare the second and third
levels of any given domain that occurs in a message body URI.
That will match the data in SURBL very well:

http://spamcheck.freeapp.net/top-sites-domains

for named URIs. Numeric addresses should match on all four
octets.

> Additionally - a reverse lookup should be done on the IPs of the links
> for the purpouses of statistical tracking. We might find the the
> resolved IP is always spam - or always not spam - or sometimes spam and
> sometimes not spam. We may be able to return a score on the resolved IP
> addresses. I believe that we are going to see a lot of spam linking to
> the same IP or groups of IPs and that if a new URI resolves to the same
> IP address as farmsex.com then it is likely also spam.

That can be done, but it's not really part of my intended purpose
for the SURBL data itself. I envision it as literal, unresolved
domain name (and IP address 1.2.3.4 in the original URI like
http://1.2.3.4/foo) comparison. I expect no DNS resolution to
be used at all anywhere around SURBL, and I expect this to work
well!

> The thought is that spammers might start linking to cnn.com or something
> to try to raise the score - even if it's in hidden text. And - that's an
> issue - but live links to other sites might defeat the purpose of the
> spam and mixing blacklisted sites with nonblacklisted might even become
> a stronger indicator of spam.

I'd lump this into the general category of Joe Job. It can
pretty easily be defeated by whitelisting. In fact I already
have cnn.com in my small whitelist along with a couple other
news sites:

http://spamcheck.freeapp.net/whitelist-domains.sort

Due to the averaging effect and careful reporting, and a somewhat
high inclusion threshold, full domains in the whitelist are
seldom actually hit however. Frankly this all works somewhat
better than I expected on the data side, mostly due to the
quality of the SpamCop URI data probably.

Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://sc.surbl.org/

Re[2]: New RBL for use with URIDNSBL plugin [ In reply to ]

jeffc at surbl

Mar 29, 2004, 2:05 PM

Post #29 of 36 (139 views)

Permalink