Mailing List Archive

Negative lookbehind in URIs?
I'm looking to detect a mismatch between the domain in the href
property of a URI and a domain in the anchor text itself. It seems
like this is the right place for a negative lookbehind, and I don't
mind writing my own rule, but I can't help thinking that this has been
solved already. Searching the list for lookbehind comes up with a
couple of instances of people getting errors (about a variable length
lookbehind), but I'm not finding anything like what I'm looking for.
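
For readers unfamiliar with the "variable length lookbehind" error, here is a minimal sketch of the same kind of restriction, written in Python rather than Perl (an assumption made for illustration; SpamAssassin rules use Perl regexes, and Perl's exact behaviour may differ by version). The sample host example.com is made up.

```python
import re

# Fixed-width negative lookbehind: the engine accepts this.
fixed = re.compile(r"(?<!https://)example\.com")

# Variable-width lookbehind ("https?" is 4 or 5 characters wide):
# Python's re module rejects this when the pattern is compiled.
try:
    re.compile(r"(?<!https?://)example\.com")
    variable_ok = True
except re.error:
    variable_ok = False

print(bool(fixed.search("http://example.com")))   # True: not preceded by "https://"
print(bool(fixed.search("https://example.com")))  # False: lookbehind blocks the match
print(variable_ok)                                # False: pattern failed to compile
```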

Does anyone have a sample rule for this, or other suggestions on how
to detect this in SA (maybe a plugin)?
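
To make the question concrete, the check being asked for can be sketched outside SpamAssassin as follows. This is only an illustration in Python, not an SA rule or plugin: the class and function names, the host-matching regex, and the "last two labels" notion of a domain are all assumptions, and the last of these ignores public suffixes such as co.uk.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
import re

# Hypothetical pattern for "something that looks like a host name" in anchor text.
HOST_RE = re.compile(r"(?:https?://)?((?:[a-z0-9-]+\.)+[a-z]{2,})", re.I)


def naive_domain(host):
    """Last two labels of a host name (assumption: ignores public suffixes)."""
    return ".".join(host.lower().rstrip(".").split(".")[-2:])


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.pairs.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return (href, text) pairs whose text names a different domain than the href."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.pairs:
        href_host = urlsplit(href).hostname
        shown = HOST_RE.search(text)
        if href_host and shown and naive_domain(shown.group(1)) != naive_domain(href_host):
            hits.append((href, text))
    return hits
```

Anchors whose text contains nothing that looks like a host name (for example "Click here") are skipped, since there is no displayed domain to compare against.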

--
Public key #7BBC68D9 at | Shane Williams
http://pgp.mit.edu/ | System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines | shanew@shanew.net
Therefore this is not a syllogism | www.ischool.utexas.edu/~shanew
Re: Negative lookbehind in URIs?
On 14 Jul 2020, at 18:02, Shane Williams wrote:

> I'm looking to detect a mismatch between the domain in the href
> property of a URI and a domain in the anchor text itself.

That will match a lot of ham. I'm not saying that it is a bad rule but
it would probably need to be a component in meta-rules to be useful.

> It seems
> like this is the right place for a negative lookbehind, and I don't
> mind writing my own rule, but I can't help thinking that this has been
> solved already. Searching the list for lookbehind comes up with a
> couple of instances of people getting errors (about a variable length
> lookbehind), but I'm not finding anything like what I'm looking for.
>
> Does anyone have a sample rule for this, or other suggestions on how
> to detect this in SA (maybe a plugin)?

I'm also somewhat surprised to find that it's not there already, but
indeed it is not.

If you come up with a good rule for it, PLEASE share it here. If you
want it tested in RuleQA, I'd be happy to drop a candidate rule into my
testing sandbox.

--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not For Hire (currently)
Re: Negative lookbehind in URIs?
> I'm looking to detect a mismatch between the domain in the href
> property of a URI and a domain in the anchor text itself.

Not using lookbehind, but I long ago wrote these two rules to look for similar situations. Either could be modified fairly easily to do what you want.

Note: these are probably around 10 years old, written before there were URI rules (if I remember correctly), so there may be more efficient ways to do this these days.

Loren

#check for attempting to phish
rawbody __LW_PHISH_2 m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
full __LW_PHISH_2a m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
meta LW_PHISH_2 __LW_PHISH_2 || __LW_PHISH_2a
score LW_PHISH_2 50
describe LW_PHISH_2 numeric href with https description
#score __LW_PHISH_2 1
#score __LW_PHISH_2a 1

rawbody __LW_PHISH_3 /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
full __LW_PHISH_3a /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
meta LW_PHISH_3 __LW_PHISH_3 || __LW_PHISH_3a
score LW_PHISH_3 50
describe LW_PHISH_3 secure description with insecure link
#score __LW_PHISH_3 10
#score __LW_PHISH_3a 1
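
To see what the two patterns above do and do not catch, here is a small sketch that exercises them outside SpamAssassin. It is a Python translation made for illustration (the rules themselves are Perl regexes), and the sample markup, variable names, and addresses are invented, using documentation-reserved values.

```python
import re

# Python translations of the two patterns; re.I | re.S mirrors the /is flags.
FLAGS = re.I | re.S
NUMERIC_HREF = re.compile(
    r'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]', FLAGS)
HTTP_AS_HTTPS = re.compile(
    r'<a\s+[\s\w=\.]*href=\"http:[^>]+>https:', FLAGS)

# Made-up sample markup.
numeric = '<a href="http://192.0.2.10/login">https://bank.example/login</a>'
downgrade = '<a class=btn href="http://bank.example/">https://bank.example/</a>'
honest = '<a href="https://bank.example/">https://bank.example/</a>'

for name, html in [("numeric", numeric), ("downgrade", downgrade), ("honest", honest)]:
    print(name,
          bool(NUMERIC_HREF.search(html)),
          bool(HTTP_AS_HTTPS.search(html)))
```

Note that both patterns only look at the scheme and whether the host starts with a digit; neither compares the domain in the href with the domain in the anchor text.
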
Re: Negative lookbehind in URIs?
On 14 Jul 2020, at 20:20, Loren Wilton wrote:

>> I'm looking to detect a mismatch between the domain in the href
>> property of a URI and a domain in the anchor text itself.
>
> Not using lookbehind, but I long ago wrote these two rules to look for
> similar situations. Either could be modified fairly easily to do what
> you want.
>
> Note: these are probably around 10 years old, written before there
> were URI rules (if I remember correctly) so there may be more
> efficient ways to do these these days.
>
> Loren
>
> #check for attempting to phish
> rawbody __LW_PHISH_2 m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
> full __LW_PHISH_2a m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
> meta LW_PHISH_2 __LW_PHISH_2 || __LW_PHISH_2a
> score LW_PHISH_2 50
> describe LW_PHISH_2 numeric href with https description
> #score __LW_PHISH_2 1
> #score __LW_PHISH_2a 1
>
> rawbody __LW_PHISH_3 /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
> full __LW_PHISH_3a /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
> meta LW_PHISH_3 __LW_PHISH_3 || __LW_PHISH_3a
> score LW_PHISH_3 50
> describe LW_PHISH_3 secure description with insecure link
> #score __LW_PHISH_3 10
> #score __LW_PHISH_3a 1

There are rough equivalents to these in the current default rules:
HTTPS_IP_MISMATCH and HTTPS_HTTP_MISMATCH.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not For Hire (currently)
Re: Negative lookbehind in URIs?
> There are rough equivalents to these in the current default rules:
> HTTPS_IP_MISMATCH and HTTPS_HTTP_MISMATCH.

I'm not surprised. Those were my original rules, which became SARE rules,
and a number of those still exist under different names.

Loren
Re: Negative lookbehind in URIs?
Dear Shane,

Have you had a look at the uri_detail plugin? You should find
interesting info there:

perldoc Mail::SpamAssassin::Plugin::URIDetail

I guess you should be able to do what you want with this plugin. But I
rarely use it, so I can't help you further.

To catch the mismatches you mention, I prefer to use the phishing
signatures from ClamAV, which are very convenient to use.

https://www.clamav.net/documents/phishsigs

Lastly, as Bill Cole mentioned, you will get a lot of false positives.
You should curate a list of commonly abused URIs and only try to catch
those. Too many ESPs rewrite links (for tracking purposes), and even
banks use those ESPs.
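
The "curated list plus exceptions" idea can be sketched roughly as follows. This is an illustration in Python, not a SpamAssassin or ClamAV configuration; PROTECTED_DOMAINS, KNOWN_REWRITERS, and both function names are hypothetical, and a real deployment would maintain its own lists.

```python
from urllib.parse import urlsplit

# Hypothetical lists for illustration only.
PROTECTED_DOMAINS = {"bank.example", "pay.example"}   # domains worth protecting
KNOWN_REWRITERS = {"click.esp.example"}               # ESP link-tracking hosts


def host_in(host, domains):
    """True if host equals, or is a subdomain of, any entry in domains."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in domains)


def suspicious(href, anchor_text):
    """Flag a link only when the anchor text names a protected domain that the
    href does not point to, and the href is not a known link rewriter."""
    host = urlsplit(href).hostname
    if not host or host_in(host, KNOWN_REWRITERS):
        return False
    text = anchor_text.lower()
    # Plain substring test: deliberately loose about what the text may contain.
    named = [d for d in PROTECTED_DOMAINS if d in text]
    return bool(named) and not any(host_in(host, {d}) for d in named)


print(suspicious("http://evil.test/x", "https://www.bank.example/login"))   # True
print(suspicious("https://click.esp.example/t/abc", "www.bank.example"))    # False
```

Restricting the check to a short list of protected domains trades coverage for fewer false positives, which is the point made above about ESP-rewritten links.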

Best,

Laurent

On 15.07.20 00:02, Shane Williams wrote:
>
> I'm looking to detect a mismatch between the domain in the href
> property of a URI and a domain in the anchor text itself. It seems
> like this is the right place for a negative lookbehind, and I don't
> mind writing my own rule, but I can't help thinking that this has been
> solved already. Searching the list for lookbehind comes up with a
> couple of instances of people getting errors (about a variable length
> lookbehind), but I'm not finding anything like what I'm looking for.
>
> Does anyone have a sample rule for this, or other suggestions on how
> to detect this in SA (maybe a plugin)?
Re: Negative lookbehind in URIs?
Nice Loren....
Nowadays with uri_detail this is easily solved with something like:

uri_detail HTTPS_HTTP_MISMATCH text =~ /^https:\/\//i cleaned =~ /^http:\/\//i
score HTTPS_HTTP_MISMATCH 0.5
describe HTTPS_HTTP_MISMATCH URL claims to use SSL but it does not


---------Pedro

>On Wednesday, July 15, 2020, 02:20:34 AM GMT+2, Loren Wilton <lwilton@earthlink.net> wrote:
>> I'm looking to detect a mismatch between the domain in the href
>> property of a URI and a domain in the anchor text itself.
>
>Not using lookbehind, but I long ago wrote these two rules to look for similar situations. Either could be modified fairly easily to do what you want.
>
>Note: these are probably around 10 years old, written before there were URI rules (if I remember correctly) so there may be more efficient ways to do these these days.
>
>Loren

>#check for attempting to phish
>rawbody __LW_PHISH_2   m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
>full    __LW_PHISH_2a  m'<a\s+[\s\w=\.]*href=\"https?://\d+[^>]+>https://[^\d]'is
>meta    LW_PHISH_2     __LW_PHISH_2 || __LW_PHISH_2a
>score   LW_PHISH_2      50
>describe LW_PHISH_2    numeric href with https description
>#score   __LW_PHISH_2  1
>#score   __LW_PHISH_2a 1
>rawbody  __LW_PHISH_3  /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
>full     __LW_PHISH_3a /<a\s+[\s\w=\.]*href=\"http:[^>]+>https:/is
>meta     LW_PHISH_3    __LW_PHISH_3 || __LW_PHISH_3a
>score    LW_PHISH_3    50
>describe LW_PHISH_3    secure description with insecure link
>#score   __LW_PHISH_3  10
>#score   __LW_PHISH_3a 1
Re: Negative lookbehind in URIs?
Bill, Shane...

We do that with a plugin because exceptions must be considered, for example to avoid false positives with rewritten URLs (used by some companies).

-----Pedro.