Mailing List Archive

Re[2]: New RBL for use with URIDNSBL plugin
On Monday, March 29, 2004, 9:35:58 AM, Marc Perkel wrote:
> For what it's worth - I've been blacklisting against my own URI list for
> over a year now, and quite frankly it's the best thing I have for
> trapping spam out of everything I do. It's 100% accurate, and if I see
> new spam getting through, all I have to do is add it to the list and no
> more of them get through.

> So - YES !!!!

Thanks for the feedback, Marc. I agree, and believe in this
general approach.

> Glad to see SA implementing this.

Well, it's not fully implemented yet. Trying SURBL with URIDNSBL
was only an experiment, and it didn't work nearly as well as I
would have liked. We need some code written to use it in a
slightly different, simpler way, which I suggested earlier....

> Just want to say though - the ability to add my own URIs to the
> blacklist is important. It should also support a flat text file with
> regular expressions.

> My 2 centz .....

Should be doable, though I expect the SpamCop-reported URI data
in SURBL to be very useful, at a minimum as a firm foundation.

Cheers,

Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://sc.surbl.org/
Re[2]: New RBL for use with URIDNSBL plugin
On Monday, March 29, 2004, 10:01:59 AM, Marc Perkel wrote:
> Here are some of my initial thoughts.

> In the domain there is what I would call the "real" part of the domain:

> farmsex.com
> farmsex.co.uk

> The part before the "farmsex" should be ignored. Anyone who controls the
> domain also probably controls the subdomains, and that is likely the
> rotating part.

Yes, that's done automatically by the averaging and summarization
on the data service side (i.e. in SURBL). This effect is
hopefully adequately explained in my docs:

http://spamcheck.freeapp.net/

http://sc.surbl.org/

On the data service side, this desirable effect happens
essentially as a side effect of the data handling. It's part
of the design, but the "real" domains pop out of the data
and into SURBL pretty much on their own.

Extraction of the "real" domain also needs to be done on the
SA/SURBL client side so that only the "real" part of the domain
from the message is compared against SURBL. Heuristically, all
that's needed most of the time is to compare the second and third
levels of any given domain that occurs in a message body URI.
For named URIs, that will match the data in SURBL very well:

http://spamcheck.freeapp.net/top-sites-domains

Numeric addresses should match on all four octets.
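
For illustration, here's a rough, untested sketch of that client-side
reduction (this is not SA code, and the TLD test is only a crude
stand-in for a proper country-code list):

  use URI;

  sub surbl_key_for {
      my ($uri) = @_;
      my $host = eval { URI->new($uri)->host } or return;

      # Numeric address in the URI: compare all four octets literally.
      return $host if $host =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;

      # Named host: keep the last two labels, or three under registries
      # such as .co.uk.
      my @labels = split /\./, lc $host;
      my $keep   = (@labels >= 3
                    && length($labels[-1]) == 2
                    && $labels[-2] =~ /^(?:co|ac|org|com|net|gov)$/) ? 3 : 2;
      $keep = @labels if $keep > @labels;
      return join '.', @labels[-$keep .. -1];
  }

  # surbl_key_for('http://xyz123.farmsex.co.uk/buy') => 'farmsex.co.uk'
  # surbl_key_for('http://1.2.3.4/foo')              => '1.2.3.4'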

> Additionally - a reverse lookup should be done on the IPs of the links
> for the purposes of statistical tracking. We might find that the
> resolved IP is always spam - or always not spam - or sometimes spam and
> sometimes not spam. We may be able to return a score on the resolved IP
> addresses. I believe that we are going to see a lot of spam linking to
> the same IP or groups of IPs, and that if a new URI resolves to the same
> IP address as farmsex.com then it is likely also spam.

That can be done, but it's not really part of my intended purpose
for the SURBL data itself. I envision it as a literal, unresolved
domain name (or literal IP address, e.g. 1.2.3.4 in an original URI
like http://1.2.3.4/foo) comparison. I expect no DNS resolution to
be used at all anywhere around SURBL, and I expect this to work
well!
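
For what that's worth, a literal check is just one DNS query against the
list zone itself -- a minimal, untested Net::DNS sketch (sc.surbl.org as
above; the return-value handling is illustrative only):

  use Net::DNS;

  sub surbl_listed {
      my ($domain) = @_;     # already reduced, e.g. "farmsex.com"
      my $res = Net::DNS::Resolver->new;
      # The only DNS traffic is the list lookup itself; the spamvertised
      # host is never resolved.
      my $reply = $res->query("$domain.sc.surbl.org", 'A') or return 0;
      return scalar grep { $_->type eq 'A' } $reply->answer;
  }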

> The thought is that spammers might start linking to cnn.com or something
> to try to raise the score - even if it's in hidden text. And - that's an
> issue - but live links to other sites might defeat the purpose of the
> spam, and mixing blacklisted sites with non-blacklisted ones might even
> become a stronger indicator of spam.

I'd lump this into the general category of Joe Jobs. It can
pretty easily be defeated by whitelisting. In fact, I already
have cnn.com in my small whitelist, along with a couple of other
news sites:

http://spamcheck.freeapp.net/whitelist-domains.sort
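
On the client side that amounts to nothing more than a hash lookup before
the DNS query -- a sketch, assuming the whitelist is a flat file of domains
like the one above (@domains_from_message is a placeholder):

  # Load the whitelist once at startup.
  open my $fh, '<', 'whitelist-domains.sort' or die "whitelist: $!";
  chomp(my @wl = <$fh>);
  close $fh;
  my %whitelisted = map { lc($_) => 1 } @wl;

  # Then, for each reduced domain pulled out of a message's URIs:
  for my $domain (@domains_from_message) {
      next if $whitelisted{lc $domain};   # never even query SURBL for these
      # ... do the SURBL lookup here ...
  }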

Due to the averaging effect, careful reporting, and a somewhat
high inclusion threshold, full domains in the whitelist are
seldom actually hit, however. Frankly, this all works somewhat
better than I expected on the data side, probably due mostly to
the quality of the SpamCop URI data.

Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://sc.surbl.org/
Re: New RBL for use with URIDNSBL plugin
On 3/29/2004 at 1:31 PM, "Justin Mason" <jm@jmason.org> wrote:

> Tony Finch writes:
>> On Mon, 29 Mar 2004, Jeff Chan wrote:
>> >
>> > So a technique to defeat the randomizers greater count is to look
>> > at the higher levels of the domain, under which SURBL will always
>> > count the randomized children of the "bad" parent. In this case
>> > the URI diversity created through randomization hurts the spammer
>> > by increasing the number of unique reports and increasing the
>> > report count of their parent domain, making them more likely to
>> > be added to SURBL. (Dooh, this paragraph is redundant...)
>>
>> Another approach is to blacklist nameservers that host spamvertized
>> domains. If an email address or a URI uses a domain name whose nameservers
>> are blacklisted (e.g. the SBL has appropriate listing criteria), or if the
>> reverse DNS is hosted on blacklisted nameservers, these may be grounds for
>> increasing the score.
>>
>> I don't know if SA does this check yet.

> Yep, it does -- that's what the URIBL plugin does currently.

Sorry, I still haven't been able to test the -current version, but it
was my understanding, based on the discussion under [Bug 1375], that the
URIBL checks only test the RECURSIVELY RESOLVED 'A' records of the host
part of a URI against DNSBLs.

Justin: could you write a 5-liner saying what the final version of the
module does, and in what order?


I had previously thought this up as a defense against Ralsky's new randomized
URLs like http://[a-z]{14}.hfgr33.us:

1 - do non-recursive resolution of NS records for the second-level domain ONLY
(I have a list of third-level delegations for various country TLDs, too,
if required, but I have yet to see any spamvertized URLs in third-level
delegation domains),
to avoid triggering their 'web bug' by way of their logging and tracking of DNS lookups.
This is a very fast query, always and only going to the nameservers of TLDs,
which will (hopefully) never be under the control of spammers playing DoS games.
2 - If we don't do this, they can easily tell which recipient MX hosts use
this functionality, and single those specific ones out with messages
crafted to DoS this scheme in the future. This is especially important
as long as these URIBL methods are not widely adopted - and I am not
exclusively talking about SA here.
3 - do DNSBL lookups on all such nameserver IPs - if there are more than, say, 4,
pick N random ones from the list, where N = 4 + ((number_of_NS_hosts - 4) / 2)
(i.e. only every second one beyond the magic number of 4).
- if you get DNSBL hits for 2 or more of the nameservers, abort
all further lookups and return a match on the rule.
4 - when all nameserver IPs are "clear" of DNSBL hits, proceed to query
them for A records for the hostname part of the URIs (or the IP numbers
in numeric URLs, in which case you never went through steps 1-3), and do
the DNSBL queries against those (see the sketch below).
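
Roughly, untested, with Net::DNS (steps 1 and 3 only; the SBL zone is just
an example, and error handling/timeouts are omitted):

  use Net::DNS;

  # Step 1: non-recursive NS lookup, asking only the TLD's nameservers.
  sub ns_ips_nonrecursive {
      my ($domain) = @_;                        # e.g. "hfgr33.us"
      my ($tld) = $domain =~ /\.([^.]+)$/ or return;

      # Find the TLD's own nameservers (via the local resolver, which is
      # never the spammer's).
      my $res     = Net::DNS::Resolver->new;
      my $tld_pkt = $res->query("$tld.", 'NS') or return;
      my @tld_ns  = map { $_->nsdname } grep { $_->type eq 'NS' } $tld_pkt->answer;

      # Ask them, with recursion disabled, for the delegation; glue A records
      # for the domain's nameservers normally arrive in the additional section.
      my $direct = Net::DNS::Resolver->new(nameservers => \@tld_ns, recurse => 0);
      my $reply  = $direct->send($domain, 'NS') or return;
      return map { $_->address } grep { $_->type eq 'A' } $reply->additional;
  }

  # Step 3: DNSBL lookups on the nameserver IPs (SBL used as the example).
  sub ns_dnsbl_hits {
      my (@ips) = @_;
      my $res  = Net::DNS::Resolver->new;
      my $hits = 0;
      for my $ip (@ips) {
          my $rev = join '.', reverse split /\./, $ip;
          $hits++ if $res->query("$rev.sbl.spamhaus.org", 'A');
          last if $hits >= 2;                   # 2+ hits: match the rule, stop
      }
      return $hits;
  }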

There is plenty of evidence that large numbers of spamvertized
domains/websites are using (*) SBL-listed nameservers and networks, so
stop worrying about FPs for innocent domains hosted on these nameservers:
it will become impossible for providers to run nameservers that are SBL-listed
and remain ignorant of it. If they had been quick to terminate spam sites
pointing to their NSes, they would never have landed in the SBL in the first place.

(*) listing policies of other DNSBLs may yield substantially different results
when those lists are used as URIBL query targets.

The only thing I am unsure about with the above is how to communicate
whether a rule match happened based on NS DNSBL hits or based on DNSBL
hits for the A records of the host(s).

bye,Kai
Re: New RBL for use with URIDNSBL plugin [ In reply to ]
Kai writes:
> On 3/29/2004 at 1:31 PM, "Justin Mason" <jm@jmason.org> wrote:
>
> > Tony Finch writes:
> >> On Mon, 29 Mar 2004, Jeff Chan wrote:
> >> >
> >> > So a technique to defeat the randomizers greater count is to look
> >> > at the higher levels of the domain, under which SURBL will always
> >> > count the randomized children of the "bad" parent. In this case
> >> > the URI diversity created through randomization hurts the spammer
> >> > by increasing the number of unique reports and increasing the
> >> > report count of their parent domain, making them more likely to
> >> > be added to SURBL. (Dooh, this paragraph is redundant...)
> >>
> >> Another approach is to blacklist nameservers that host spamvertized
> >> domains. If an email address or a URI uses a domain name whose nameservers
> >> are blacklisted (e.g. the SBL has appropriate listing criteria), or if the
> >> reverse DNS is hosted on blacklisted nameservers, these may be grounds for
> >> increasing the score.
> >>
> >> I don't know if SA does this check yet.
>
> > Yep, it does -- that's what the URIBL plugin does currently.
>
> sorry, I still haven't been able to test the -current version, but it
> was my understanding based on the discussion under [Bug 1375] that URIBL
> checks only check the RECURSIVELY RESOLVED 'A' records of the host part
> of a URI against DNSBLs.
>
> Justin: could you write a 5-liner saying what the final version of the
> module does, and in what order?

Sure, maybe when I get some free time ;)

In the meantime, the POD docs in Mail::SpamAssassin::Plugin::URIBL
should help.

> I had previously thought this up as a defense against Ralsky's new randomized
> URLs like: http://[a-z]{14}.hfgr33.us :
>
> 1 - do non-recursive resolution of NS records for the second-level domain ONLY
> (I have a list of third-level delegations for various country TLD's, too,
> if required, but I have yet to see any spamvertized URLs in 3rd-level
> delegation domains),
> to avoid triggering their 'web bug' by way of their logging and tracking of DNS lookups.
> This is a very fast query, always and only going to nameservers of TLDs,
> which will (hopefully) never be under the control of spammers playing DoS games.

Interesting. Can you suggest how to do that with Net::DNS? I'm not
sure we have control to that degree.

> 2 - If we don't do this, they can easily tell what recipient MX hosts use
> this functionality, and single those specific ones out with messages
> crafted to DoS this scheme in the future. This is especially important
> as long as these URIBL method(s) are not widely adopted - and I am not
> exclusively talking about SA here.

My take is to cut off at the registrar-registered portion: e.g.
"foo.co.uk", "foo.biz" etc., and use stringent timeouts. The
scanning will always kill any pending lookups 2 seconds after
the normal DNSBL lookups complete.

> 3 - do DNSBL lookups on all such nameserver IPs - if there's more than say: 4,
> pick N random ones from the list, where N = 4 + ((number_of_NS_hosts - 4) / 2 )
> (e.g.: only every second one beyond the magic number of 4).

It performs lookups on all of them in parallel.

> - if you get DNSBL hits for 2 or more of the nameservers, abort
> all further lookups and return a match on the rule.

Well, the timeout provides this as well; if I recall correctly it'll
complete as many as it can, no matter how many other hits there are.

> 4 - when all nameserver IPs are "clear" of DNSBL hits, proceed to query
> them for A records for the hostname part of the URIs (or the IP numbers
> in numeric URLs, in which case you never went through steps 1-3), and do
> the DNSBL queries against them.

Currently it just does NS queries, on the basis that A lookups of the
hostname parts are easily evadable by using a randomized hostname portion
beyond the registrar domain portion, and that if we A-lookup'd that
hostname, we'd provide a back channel to the spammers for address
confirmation etc. Hence I've deliberately avoided A lookups.

Can you think of an algorithm that'll do this reliably? If
we can limit recursive lookups to the roots, it may help. But
I wasn't sure if this was possible.

I have a vague feeling that there is a wide range of evasions for this --
e.g. a spammer could set extremely low TTLs so the roots will time out
non-auth data very quickly; our non-recursive lookups would return no
match, but a recursive lookup from an MUA would cause the root to query the
NS, and return a match.

> There is plenty of evidence that major numbers of spamvertized
> domains/websites are using (*) SBL-listed nameservers and networks:
> stop worrying about FP's for innocent domains hosted on these nameservers:
> it will become impossible for providers to run nameservers that are SBL-listed,
> and be ignorant about it. If they were quick to terminate spamsites pointing to
> their NS's, they'd have never landed in the SBL in the first place.
>
> (*) listing policies of other DNSBLs may have substantially different results
> when used as URIBL query targets.

SBL is certainly working very well in these lookups; they're doing a
good job avoiding FPs.

In terms of worrying about FPs -- let mass-check worry about it ;)
There's no URI BL providing dangerous FPs at the moment.

> The only thing I am unsure how to do with the above is: how to communicate
> that a rule match happened based on NS DNSBL hits, or based on DNSBL hits for
> A records for the host(s)?

Well, given the NS-only nature, that's not a problem right now. ;)

--j.
Re: New RBL for use with URIDNSBL plugin
On Monday, March 29, 2004, 5:36:02 PM, Justin Mason wrote:
> Kai writes:
>> Justin: could you write a 5-liner saying what the final version of the
>> module does, and in what order?

> Sure, maybe when I get some free time ;)

> In the meantime, the POD docs in Mail::SpamAssassin::Plugin::URIBL
> should help.

Pardon my noobness, but can you provide a URL?

> My take is to cut off at the registrar-registered portion: e.g.
> "foo.co.uk", "foo.biz" etc., and use stringent timeouts. The
> scanning will always kill any pending lookups 2 seconds after
> the normal DNSBL lookups complete.

This extraction of domains sounds important and interesting.
Do the docs or source code describe how this is done? It's
not immediately obvious to me how one can programmatically
determine the registered part of a domain.

TIA,

Jeff C.
--
Jeff Chan
mailto:jeffc@surbl.org-nospam
http://sc.surbl.org/
Re: New RBL for use with URIDNSBL plugin
Jeff Chan writes:
>On Monday, March 29, 2004, 5:36:02 PM, Justin Mason wrote:
>> Kai writes:
>>> Justin: could you write a 5-liner saying what the final version of the
>>> module does, and in what order?
>
>> Sure, maybe when I get some free time ;)
>
>> In the meantime, the POD docs in Mail::SpamAssassin::Plugin::URIBL
>> should help.
>
>Pardon my noobness, but can you provide a URL?

http://spamassassin.org/full/3.0.x/dist/lib/Mail/SpamAssassin/Plugin/URIDNSBL.pm

>> My take is to cut off at the registrar-registered portion: e.g.
>> "foo.co.uk", "foo.biz" etc., and use stringent timeouts. The
>> scanning will always kill any pending lookups 2 seconds after
>> the normal DNSBL lookups complete.
>
>This extraction of domains sounds important and interesting.
>Do the docs or source code include how this was done? It's
>not immediately obvious to me how one can programmatically
>determine the registered part of domains.

It's pretty simple -- we have a list of TLDs that use 3-level domains
(like .co.uk), and all others are assumed to be 2-level (like .com). Then
we cut off at the level below the TLD. That's done by the function
Mail::SpamAssassin::Util::uri_to_domain().
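
In outline, something like this (illustrative only -- the real
uri_to_domain() carries a much fuller table and more edge-case handling):

  my %three_level_tld = map { $_ => 1 }
      qw(co.uk org.uk ac.uk co.nz com.au co.jp);   # abbreviated list

  sub registrar_domain {
      my ($host) = @_;
      my @labels = split /\./, lc $host;
      return $host if @labels < 3;
      my $keep = $three_level_tld{ join '.', @labels[-2, -1] } ? 3 : 2;
      return join '.', @labels[-$keep .. -1];
  }

  # registrar_domain('www.foo.co.uk') => 'foo.co.uk'
  # registrar_domain('mail.foo.biz')  => 'foo.biz'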

There are a couple of exceptions we don't deal with:

- dyndns.org-type sites. I think the operators of those will be quite
stringently anti-spam if it arises, given how quickly they've rolled
out SPF!

- www.geocities.com-type sites. Impossible to do URIBL lookups there
anyway, since all hosts share the same As and NSes, so we're reliant
on them exercising some abuse control. The SBL will list abusers
anyway, I think -- I wonder if they've listed terra.es yet?

But I don't think those exceptions will cause trouble.

PS: I was wrong -- I said it didn't limit how many lookups it started. In
fact it does -- it'll use a random selection of 'uridnsbl_max_domains'
(default: 20) domains from all the URIs in the message.
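
For anyone following along, that's the sort of thing you can tune in your
configuration -- option name as given above, the value here is purely
illustrative:

  # e.g. in local.cf: raise the per-message domain cap from its default of 20
  uridnsbl_max_domains 30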

--j.
Re: New RBL for use with URIDNSBL plugin
On Mon, 29 Mar 2004, Justin Mason wrote:
>
> In the meantime, the POD docs in Mail::SpamAssassin::Plugin::URIBL
> should help.

Cool. Have you tried applying this to the SMTP return path, or
information in the Received: headers?

--
Tony Finch <dot@dotat.at> http://dotat.at/
Re: New RBL for use with URIDNSBL plugin
Tony Finch writes:
> On Mon, 29 Mar 2004, Justin Mason wrote:
> >
> > In the meantime, the POD docs in Mail::SpamAssassin::Plugin::URIBL
> > should help.
>
> Cool. Have you tried applying this to the SMTP return path, or
> information in the Received: headers?

Actually, no ;) (at this stage that's trivially evadable through
use of proxies, so I'm not sure it'd be worthwhile -- but I'd
be happy to see any results if anyone wants to try it out.)

--j.
