Mailing List Archive

Decoding Google URL redirections and check VS URI Blacklists
Hi SA Community

In the last couple of weeks, I have seen a massive increase in spam mails
that use the Google site redirector and dodge all our attempts at
filtering.

In fact, the Google redirector is about the only thing those emails have
in common. Source IP, text content, etc. are quite random.

An example URI looks like this (two spaces added to prevent it from
triggering other filters):

https://www.goo gle.com/url?q=https%3A%2F%2Fkissch icksrr.com%2F%3Futm_source%3DbDukb6xHEYDF2%26amp%3Butm_campaign%3DKirka2&sa=D&sntz=1&usg=AFQjCNGkpnVKLl8I1IP9aQXtTha-jCnt3A

google.com, of course, is whitelisted.

Creating a rule to match the string "google.com/url?q=" is also a no-go,
as it would create far too many false positives.

So if I could somehow extract the domain "kissch icksrr.com"
and check it against URI blacklists, we would probably solve that issue.
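The extraction step described here can be sketched with nothing but the Python standard library (the function name and the use of urllib.parse are my own illustration, not part of SpamAssassin or the spamtrap code):

```python
from urllib.parse import urlparse, parse_qs

def redirect_target_domain(uri):
    """For a google.com/url?q=... redirector URI, return the hostname
    of the redirect target; otherwise return None."""
    parsed = urlparse(uri)
    host = parsed.hostname or ""
    # Only treat google.com itself or its subdomains as the redirector.
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    if parsed.path != "/url":
        return None
    # parse_qs percent-decodes the q= value for us.
    targets = parse_qs(parsed.query).get("q")
    return urlparse(targets[0]).hostname if targets else None
```

The hostname this returns could then be fed to a URIBL lookup just like any other domain extracted from the body.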

Has anyone already come up with a way to do that?

Kind regards

-Benoît Panizzon-
--
I m p r o W a r e A G - Leiter Commerce Kunden
______________________________________________________

Zurlindenstrasse 29 Tel +41 61 826 93 00
CH-4133 Pratteln Fax +41 61 826 93 01
Schweiz Web http://www.imp.ch
______________________________________________________
Re: Decoding Google URL redirections and check VS URI Blacklists
Hello Benoit,

> In the last couple of weeks, I have seen a massive increase in spam mails
> that use the Google site redirector and dodge all our attempts at
> filtering.
>
> In fact, the Google redirector is about the only thing those emails have
> in common. Source IP, text content, etc. are quite random.
>
> An example URI looks like this (two spaces added to prevent it from
> triggering other filters)
>
> google.com, of course, is whitelisted.
>
> Creating a rule to match the string "google.com/url?q=" is also a no-go,
> as it would create far too many false positives.
>
> So if I could somehow extract the domain "kissch icksrr.com"
> and check it against URI blacklists, we would probably solve that issue.
>
> Has anyone already come up with a way to do that?

If you could check that, it would help a lot....

Some rules to translate commonly used services would help, and your
example is a good one: if you had checked the specific domain, it would
have hit SURBL.

Bye, Raymond
Re: Decoding Google URL redirections and check VS URI Blacklists
Hi Raymond

> If you could check that it would help a lot....
>
> Some rules to translate common used services and your example is a good
> one. If you would check the specific domain it would havbe hit SURBL.

Yes, and future hits to the SWINOG spamtrap (uribl.swinog.ch) will also
extract such target URIs:

Part of the Spamtrap code that looks at the decoded email content:

use URI::Find;

my @uris;
my $debug = 1;
my $finder = URI::Find->new(sub {
    my ($uri) = shift;
    print "FOUND $uri\n" if $debug;
    if ($uri =~ m|^https://www\.google\.com/url\?q=https\%3A\%2F\%2F([\w\.-]*\w)|) {
        print "GOOGLE REDIR to $1\n" if $debug;
        push @uris, $1;
    }
});
$finder->find(\$content);    # $content holds the decoded email body


Kind regards

-Benoît Panizzon-
Re: Decoding Google URL redirections and check VS URI Blacklists
You're looking to use a redirector_pattern rule - weird that this hasn't
yet been added to SA's default ruleset.
Please submit a bug with a sample message.

On 11/2/21 9:52 AM, Benoit Panizzon wrote:
> [...]
Re: Decoding Google URL redirections and check VS URI Blacklists
On Tue, 2021-11-02 at 09:52 +0100, Benoit Panizzon wrote:
> Hi SA Community
>
You can find out quite a lot about a spamming site with a few common
commandline tools:

- 'ping' tells you if the hostname part of the URL is valid
- 'host hostname' should get the sender's IP
- 'host ip', i.e. a reverse host lookup, tells you if the first
  sender address was an alias
- 'lynx hostname' lets you see if there's a website there, which is
  often useful (when prompted to accept cookies, hit
  'V' to never accept them; this is IMO safer than
  using Firefox etc. because lynx shows all pages as
  plaintext)

Generally using those in the sequence I've listed them tells me enough
to decide whether to treat the site as a spam source.

In this case, either feed that URL to your favourite blacklist or write
a local rule that fires if the URL you spotted is in the body text.

I've recently started to see regular Google gmail spam. This looks like
boring sex spam, but that's probably a disguise since it contains
attachments with suspicious (i.e. executable) file types. Fortunately, a
more complex rule, built from a set of subrules, that I wrote years ago
to trap mail with this sort of attachment is catching them now.

Martin
Re: Decoding Google URL redirections and check VS URI Blacklists
Hi Martin

> You can find out quite a lot about a spamming site with a few common
> commandline tools:
>
> - 'ping' tells you if the hostname part of the URL is valid
> - 'host hostname' should get the sender's IP
> - 'host ip', i.e. a reverse host lookup, tells you if the first
>   sender address was an alias
> - 'lynx hostname' lets you see if there's a website there, which is
>   often useful (when prompted to accept cookies, hit
>   'V' to never accept them; this is IMO safer than
>   using Firefox etc. because lynx shows all pages as
>   plaintext)

Yes, of course. The SWINOG spamtrap does this in a slightly more sophisticated way:

We check whether there is an SOA record for the URI's hostname. If not,
we strip the leftmost label (the part before the first dot) and repeat,
as long as the remainder still contains at least one dot. If no SOA is
found at all, we discard the URI.

So we end up with a list of valid 'base' domains rather than TLDs.

I also do this for the extracted redirection target in the case of
Google redirectors.
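The label-stripping loop described above can be sketched like this; the `has_soa` callback stands in for the actual DNS lookup (e.g. via a resolver library) and is my own way of factoring it, not the original spamtrap code:

```python
def base_domain(hostname, has_soa):
    """Walk from the full hostname toward the registrable domain,
    returning the first name that has an SOA record, or None.

    has_soa: callback mapping a hostname to True/False (a real
    implementation would perform a DNS SOA query here).
    """
    labels = hostname.split(".")
    # Try the full name first, then progressively shorter suffixes,
    # stopping while at least two labels (one dot) remain.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if has_soa(candidate):
            return candidate
    return None
```

For example, with a stub resolver that only knows example.com, `base_domain` on "a.b.example.com" tries a.b.example.com, then b.example.com, and stops at example.com.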

BUT, my question was: I would need SpamAssassin to ALSO extract the
target URI when encountering such a Google redirector URL, and check
it against URI blacklists. Is there already a module or an easy way to
do so?

Kind regards

-Benoît Panizzon-
Re: Decoding Google URL redirections and check VS URI Blacklists
Hi Alex

> You're looking to use a redirector_pattern rule - weird that this hasn't
> yet been added to SA's default ruleset.
> Please submit a bug with a sample message.

Thank you, that sounds promising. I'm digging into how to use it.

Kind regards

-Benoît Panizzon-
Re: Decoding Google URL redirections and check VS URI Blacklists
That works perfectly!

Nov 2 12:31:28.935 [8391] dbg: uri: parsed uri pattern: (?^:(?i)^https?:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/url\?.*?(?<=[?&])q=(.*?)(?:$|[&#]))
Nov 2 12:31:28.935 [8391] dbg: uri: parsed uri found: https://kisschicksrr.com/?utm_source=bDukb6xHEYDF2 in redirector: https://www.google.com/url?q=https://kisschicksrr.com/?utm_source=bDukb6xHEYDF2&amp;utm_campaign=Kirka2&sa=D&sntz=1&usg=AFQjCNGkpnVKLl8I1IP9aQXtTha-jCnt3A

[...]

Nov 2 12:31:28.935 [8391] dbg: uri: added host: www.google.com domain: google.com
Nov 2 12:31:28.935 [8391] dbg: uri: cleaned uri: https://www.google.com/url?q=https://kisschicksrr.com/?utm_source=bDukb6xHEYDF2&amp;utm_campaign=Kirka2&sa=D&sntz=1&usg=AFQjCNGkpnVKLl8I1IP9aQXtTha-jCnt3A
Nov 2 12:31:28.936 [8391] dbg: uri: added host: www.google.com domain: google.com
Nov 2 12:31:28.936 [8391] dbg: uri: cleaned uri: https://kisschicksrr.com/?utm_source=bDukb6xHEYDF2
Nov 2 12:31:28.936 [8391] dbg: uri: added host: kisschicksrr.com domain: kisschicksrr.com

Nov 2 12:31:28.948 [8391] dbg: async: launching A/kisschicksrr.com.multi.surbl.org for DNSBL:kisschicksrr.com:multi.surbl.org

1.2 URIBL_ABUSE_SURBL Contains URL in ABUSE list (www.surbl.org) -
changed from JP to ABUSE bug 7279
[URIs: kisschicksrr.com]

Let's tune the scoring a bit, or create a meta rule that yields a higher
score when Google is involved.
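A hypothetical local meta rule for that could look like the following sketch (the rule names and the score are illustrative, not shipped rules; URIBL_ABUSE_SURBL is the stock rule that fired above):

```
# Hypothetical local rules: score extra when a URIBL hit co-occurs
# with a Google redirector in the message body.
uri      __LOCAL_GOOGLE_REDIR   m{^https?://(?:\w+\.)?google\.\w{2,3}/url\?}i
meta     LOCAL_GOOGLE_REDIR_BL  (__LOCAL_GOOGLE_REDIR && URIBL_ABUSE_SURBL)
describe LOCAL_GOOGLE_REDIR_BL  Blacklisted URI hidden behind a Google redirector
score    LOCAL_GOOGLE_REDIR_BL  2.0
```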

Kind regards

-Benoît Panizzon-
Re: Decoding Google URL redirections and check VS URI Blacklists
So what redirector_pattern rule did you use?

It could save us some time.



On 11/2/21 12:36 PM, Benoit Panizzon wrote:
> [...]
Re: Decoding Google URL redirections and check VS URI Blacklists
Hi Alex

> So what redirector_pattern rule did you use?

It turned out that the shipped one matched:

redirector_pattern m'^https?:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}/url\?.*?(?<=[?&])q=(.*?)(?:$|[&\#])'i

But when I first tested, the URI was not yet blacklisted, so this
escaped my attention.
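For reference, the shipped pattern translates almost directly into other regex dialects; here is a Python transcription (my own, not SA code) that pulls out and decodes the q= target:

```python
import re
from urllib.parse import unquote

# Transcription of SA's shipped google redirector_pattern.
GOOGLE_REDIR = re.compile(
    r'^https?:/*(?:\w+\.)?google(?:\.\w{2,3}){1,2}'
    r'/url\?.*?(?<=[?&])q=(.*?)(?:$|[&#])',
    re.IGNORECASE,
)

def extract_redirect_target(uri):
    """Return the percent-decoded q= target of a Google redirector
    URI, or None if the URI is not a Google redirector."""
    m = GOOGLE_REDIR.search(uri)
    return unquote(m.group(1)) if m else None
```

Note that the capture stops at the first literal `&` or `#`, which is why the percent-encoded target survives intact until it is decoded.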

Kind regards

-Benoît Panizzon-
Re: Decoding Google URL redirections and check VS URI Blacklists
On 2021-11-02 at 04:52:17 UTC-0400 (Tue, 2 Nov 2021 09:52:17 +0100)
Benoit Panizzon <benoit.panizzon@imp.ch>
is rumored to have said:

> Hi SA Community
>
> In the last couple of weeks, I have seen a massive increase in spam mails
> that use the Google site redirector and dodge all our attempts at
> filtering.
>
> In fact, the Google redirector is about the only thing those emails have
> in common. Source IP, text content, etc. are quite random.
>
> An example URI looks like this (two spaces added to prevent it from
> triggering other filters):
>
> https://www.goo gle.com/url?q=https%3A%2F%2Fkissch
> icksrr.com%2F%3Futm_source%3DbDukb6xHEYDF2%26amp%3Butm_campaign%3DKirka2&amp;sa=D&amp;sntz=1&amp;usg=AFQjCNGkpnVKLl8I1IP9aQXtTha-jCnt3A
>
> google.com, of course, is whitelisted.

Why "of course"?

Have you tested what happens if you add "clear_uridnsbl_skip_domain
google.com" to your config?


> Creating a rule to match the string "google.com/url?q=" is also a no-go,
> as it would create far too many false positives.

Do not be scared by SA rules matching non-spam. That is a design
feature, not an inadvertent bug. All of the most useful rules match some
ham.

It's only really a "false positive" if the total score for a non-spam
message goes over your local threshold. The fact that the automated
re-scorer assigns scores well below the default threshold is a clue.

>
> So if I could somehow extract the domain "kissch icksrr.com"
> and check it against URI blacklists, we would probably solve that
> issue.
>
> Has anyone already come up with a way how to do that?

I do not believe there's a means of doing that currently. It may be
possible to work something up using the existing internal blocklisting
tools (HashBL, enlist*, etc) but I think it will require new code.

It would be an interesting addition to have a way to define arbitrary
extractor patterns to pull elements out of a string to check against
hostname blocklists or other specific classes of patterns.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Decoding Google URL redirections and check VS URI Blacklists
Benoit had already confirmed that the redirector_pattern worked as expected.

On 11/2/21 6:07 PM, Bill Cole wrote:
> [...]