Mailing List Archive

comparing sender domain against recipient domain
I was wondering whether spamassassin applies some sort of algorithm comparing the sender domain against the recipient domain to detect a phishing attempt?
Re: comparing sender domain against recipient domain
What useful information would you be looking for from this kind of comparison?
I receive mail from people with non-local domains all the time, and I regularly
receive e-mail from co-workers using the same domain as me.

The kinds of things that might be useful are:
1) detecting local-domain forgeries (i.e. if you have DKIM/SPF etc. and the
message appears to be from your domain but fails those checks)
2) examining the "comment" part of the From: address to see if it contains
misleading 'domain-like' text.
e.g. From: "bill@my.domain.org" <bill-bob@gmail.com>
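Idea 2 could be prototyped roughly as follows. This is a minimal Perl sketch, not something SpamAssassin ships: the function name `display_name_mismatch` is hypothetical, it expects only the From: header value (without the `From:` prefix), and its address-matching regexes are deliberately simplistic.

```perl
use strict;
use warnings;

# Hypothetical helper: given a From: header *value* such as
#   "bill@my.domain.org" <bill-bob@gmail.com>
# return 1 if the display name contains an address-like string whose
# domain differs from the domain of the real address in <...>, else 0.
sub display_name_mismatch {
    my ($from) = @_;

    # Display name (optionally quoted) followed by <address>.
    my ($name, $addr) = $from =~ /^\s*"?([^"<]*)"?\s*<([^>]+)>/
        or return 0;

    # Domain of the real address.
    my ($real_dom) = $addr =~ /\@([^\s\@>]+)$/
        or return 0;

    # Something that looks like user@domain.tld inside the display name.
    my ($shown_dom) = $name =~ /\@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/
        or return 0;

    return lc($shown_dom) ne lc($real_dom) ? 1 : 0;
}
```

A display name without any "@" (e.g. `Lara <sender@a1exander.com>`) is simply not flagged; this check only targets the "address in the comment" trick.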


On Thu, 11 May 2023, Marc wrote:

> I was wondering whether spamassassin applies some sort of algorithm comparing the sender domain against the recipient domain to detect a phishing attempt?
>
>

--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
Re: comparing sender domain against recipient domain
On 2023-05-11 at 16:22:12 UTC-0400 (Thu, 11 May 2023 20:22:12 +0000)
Marc <Marc@f1-outsourcing.eu>
is rumored to have said:

> I was wondering whether spamassassin applies some sort of algorithm
> comparing the sender domain against the recipient domain to detect a
> phishing attempt?

There is a suite of meta rules and subrules with names containing
TO_EQ_FROM in the default rule channel. Consult the rules files for
implementation details.


--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
RE: comparing sender domain against recipient domain
>
>
> What useful information would you be looking for from this kind of
> comparison?

sender@a1exander.com
recipient@alexander.com

* 3.9 PHISHING 1=l attempt

I assume there are some character substitution algorithms available, maybe an adapted version of an algorithm that tries to detect typos.
RE: comparing sender domain against recipient domain
>
> > I was wondering whether spamassassin applies some sort of algorithm
> > comparing the sender domain against the recipient domain to detect a
> > phishing attempt?
>
> There is a suite of meta rules and subrules with names containing
> TO_EQ_FROM in the default rule channel. Consult the rules files for
> implementation details.
>
>

hmmm, I guess not

some test message with these headers
test2:~# spamassassin -D < spam-test.txt > out2

Date: Mon, 24 Oct 2016 22:10:07 +0200
To: recipient@alexander.com
From: Lara <sender@a1exander.com>
Subject: asdf asd fas df asdf asdf asd fas dfa sdf
Message-ID: <c62c9a07b9eae3e640f24740c86278ba@a1exander.com>
Return-Path: sender@a1exander.com

gives this result:

X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on test2.local
X-Spam-Flag: YES
X-Spam-Level: ****
X-Spam-Status: Yes, score=4.4 required=3.0 tests=DKIM_ADSP_NXDOMAIN,
EMPTY_MESSAGE,RDNS_NONE,T_TVD_MIME_EPI autolearn=disabled version=3.4.6
X-Spam-Report:
* 0.8 DKIM_ADSP_NXDOMAIN No valid author signature and domain not in
* DNS
* 0.0 T_TVD_MIME_EPI BODY: No description available.
* 1.3 RDNS_NONE Delivered to internal network by a host with no rDNS
* 2.3 EMPTY_MESSAGE Message appears to have no textual parts
Re: comparing sender domain against recipient domain
On Thu, May 11, 2023 at 09:41:34PM +0000, Marc wrote:
> > > I was wondering whether spamassassin applies some sort of algorithm
> > > comparing the sender domain against the recipient domain to detect a
> > > phishing attempt?
> >
> > There is a suite of meta rules and subrules with names containing
> > TO_EQ_FROM in the default rule channel. Consult the rules files for
> > implementation details.
>
> hmmm, I guess not
>
> some test message with these headers
> test2:~# spamassassin -D < spam-test.txt > out2
>
> Date: Mon, 24 Oct 2016 22:10:07 +0200
> To: recipient@alexander.com
> From: Lara <sender@a1exander.com>

That is because those domains are not EQUAL. Or did you want a
rule that checks only for SIMILAR domain names (e.g. with lowercase
letter "L" replaced with number "1" as in your example)?

Also, most of those subrules (like __TO_EQ_FROM_DOM) will not show up in
the report on standard output; they appear only in the debug output, which
goes to standard error, so you should call it like this:

spamassassin -D < spam-test.txt > out2 2>&1

to be able to see it in:
grep TO_EQ_FROM out2

--
Opinions above are GNU-copylefted.
Re: comparing sender domain against recipient domain
On Fri, 12 May 2023, Matija Nalis wrote:

> On Thu, May 11, 2023 at 09:41:34PM +0000, Marc wrote:
>>>> I was wondering whether spamassassin applies some sort of algorithm
>>>> comparing the sender domain against the recipient domain to detect a
>>>> phishing attempt?
>>>
[snip..]
> That is because those domains are not EQUAL. Or did you want a
> rule that checks only for SIMILAR domain names (e.g. with lowercase
> letter "L" replaced with number "1" as in your example)?
>

Now I get it, the OP is looking for some kind of comparison function that does
an "apparent linguistic distance" evaluation of two strings and returns a score
that indicates a "visual similarity" value.
(EG replacing 'l' with '1' or 'O' with '0', etc).

Several years ago there was a flood of phish messages that had a 'From' address
that used 'PayPaI' to try to fool people.
I've also seen attempts using European character sets with letters that look
like O or e to fake common domain names.

I've hand coded rules to check for this stuff when frequently abused but I don't
know of a programmatic algorithm to do it automagically.

Dave

Re: comparing sender domain against recipient domain
On Fri, May 12, 2023 at 09:49:40AM -0500, Dave Funk wrote:
> On Fri, 12 May 2023, Matija Nalis wrote:
> > That is because those domains are not EQUAL. Or did you want a
> > rule that checks only for SIMILAR domain names (e.g. with lowercase
> > letter "L" replaced with number "1" as in your example)?
>
> Now I get it, the OP is looking for some kind of comparison function that
> does an "apparent linguistic distance" evaluation of two strings and returns
> a score that indicates a "visual similarity" value.
> (EG replacing 'l' with '1' or 'O' with '0', etc).

It should be relatively easy to write an SA plugin for that:

- Replace those digits and uppercase letters in one of the strings,
convert both to lowercase, and compare them.

- It should also remove spacer characters (like "paypal" vs "pay-pal").

- It should not only hit on exact matches, but also return the similarity
as a percentage (so that trying to fake "spamassassin" with "spamasassin"
can be detected).
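The first two steps (substitute lookalikes, fold case, drop spacers) might be sketched in Perl like this. The names `skeleton` and `looks_alike` are hypothetical, and the substitution table, including treating "rn" as "m", is an illustrative assumption rather than a complete list of lookalikes.

```perl
use strict;
use warnings;

# Illustrative "skeleton" normalizer following the steps listed above.
# The substitution table is an assumption and far from complete.
sub skeleton {
    my ($domain) = @_;
    $domain =~ tr/I/l/;     # uppercase "I" -> lowercase "L" (must run before lc)
    $domain = lc $domain;   # fold case
    $domain =~ tr/10/lo/;   # digit "1" -> "l", digit "0" -> "o"
    $domain =~ s/rn/m/g;    # "rn" looks like "m" in many fonts
    $domain =~ tr/-_//d;    # drop spacer characters ("pay-pal" -> "paypal")
    return $domain;
}

# True when two domains are really different but share the same skeleton.
sub looks_alike {
    my ($d1, $d2) = @_;
    return lc($d1) ne lc($d2) && skeleton($d1) eq skeleton($d2);
}
```

With this, `looks_alike('a1exander.com', 'alexander.com')` and `looks_alike('pay-pal.com', 'paypal.com')` are both true, while `looks_alike('PayPal.com', 'paypal.com')` is false because those are really the same domain. The percentage part is a separate step.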

Of course, non-ASCII would complicate those replacement tables
significantly (there are MANY more similar-looking glyphs than in
pure ASCII), but as I treat any IDN domain as suspicious, and such domains
are easy to detect, it would probably not be such a big deal.
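Detecting IDN domains is indeed simple. A minimal sketch, assuming the domain arrives either in raw Unicode form or in ACE/Punycode form, where IDN labels carry the `xn--` prefix; `is_idn` is a hypothetical name, and the check does not decode Punycode.

```perl
use strict;
use warnings;

# Hypothetical check: does this domain name look like an IDN?
#  - raw Unicode form: any non-ASCII character at all
#  - ACE/Punycode form: any label starting with "xn--"
sub is_idn {
    my ($domain) = @_;
    return 1 if $domain =~ /[^\x00-\x7F]/;
    for my $label (split /\./, $domain) {
        return 1 if $label =~ /^xn--/i;
    }
    return 0;
}
```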

> I've hand coded rules to check for this stuff when frequently abused but I
> don't know of a programmatic algorithm to do it automagically.

I wonder whether someone has already done it, or written something sufficiently
similar that could be used for that purpose?

Re: comparing sender domain against recipient domain
On Fri, May 12, 2023 at 05:32:30PM +0200, Reindl Harald wrote:
> > On Fri, May 12, 2023 at 09:49:40AM -0500, Dave Funk wrote:
> > > On Fri, 12 May 2023, Matija Nalis wrote:
> > > > That is because those domains are not EQUAL. Or did you want a
> > > > rule that checks only for SIMILAR domain names (e.g. with lowercase
> > > > letter "L" replaced with number "1" as in your example)?
> >
> > It should be relatively easy to write an SA plugin for that:
>
> and with *what* do you replace the "1"?

With one of the similar-looking characters. It doesn't really matter
which one, but it needs to be done consistently. Personally I'd
probably choose lowercase "L", but it can be anything.

e.g. for a simple first variant (i.e. for direct matching, not the more
advanced statistical similarity-based approach suggested in a later
step):

sub normalize_domain($)
{
    my ($domain) = @_;

    # (yes I know we have tr///)
    $domain =~ s/1/l/g;    # number 1 to lowercase "L"
    $domain =~ s/I/l/g;    # uppercase "I" to lowercase "L"

    return lc($domain);
}

[...]

if (lc($domain1) ne lc($domain2)) {    # domains are NOT the same...
    if (normalize_domain($domain1) eq normalize_domain($domain2)) {    # ...but they LOOK the same
        add_spam_score("domain_is_not_same_but_looks_the_same");    # placeholder for the real scoring hook
    }
}

so normalize_domain() would return the same string for "paypal.com",
"PayPal.com", "PayPaI.com" or "PayPa1.com": i.e. "paypal.com"

It doesn't matter if the result isn't the real domain (as it
will be used only for comparison with a similarly mangled other domain).
E.g. if one had the real domain "TheReallyBest1.com", it would be
normalized to "thereallybestl.com" -- so while that is NOT how the domain
is really named, it doesn't matter, as it would still work for
detecting fakes like "TheReallyBestI.com" (even though neither
lowercase "L" nor uppercase "I" is used in the real domain name).


> be careful with "relatively easy" when it comes to reality

Sure, I thought I was. Do you spot problems with the code above?
Think of any real-life examples where it would backfire or fail to work?

Code like the above looks trivial to me ("relatively easy" was
more geared toward statistical analysis of the words, returning a
similarity score as a percentage instead of a simple fake/not_fake
boolean like above, as it should take into account the ordering of the
letters, missed letters, duplicated letters, dyslexia-like reversal
of two neighboring letters and similar psychological ways in which
the human mind can easily be fooled). It might still take a few weeks to
get it into reasonably publishable shape...
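For the statistical variant, one standard measure that counts a missed letter, a duplicated letter, a substituted letter, and a swap of two neighboring letters each as a single edit is the optimal string alignment distance (a restricted form of Damerau-Levenshtein). A hedged Perl sketch with hypothetical function names that turns it into a percentage; it does not model which glyphs look alike, so it would be combined with a lookalike normalization step.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Optimal string alignment distance: insertions, deletions, substitutions
# and transpositions of two adjacent characters each cost one edit.
sub osa_distance {
    my ($s, $t) = @_;
    my @a = split //, $s;
    my @b = split //, $t;
    my ($m, $n) = (scalar @a, scalar @b);
    my @d;
    $d[$_][0] = $_ for 0 .. $m;
    $d[0][$_] = $_ for 0 .. $n;
    for my $i (1 .. $m) {
        for my $j (1 .. $n) {
            my $cost = $a[$i-1] eq $b[$j-1] ? 0 : 1;
            $d[$i][$j] = min(
                $d[$i-1][$j] + 1,          # deletion (missed letter)
                $d[$i][$j-1] + 1,          # insertion (duplicated letter)
                $d[$i-1][$j-1] + $cost,    # substitution
            );
            # Swap of two neighboring letters ("dyslexia-like" reversal).
            if ($i > 1 && $j > 1
                && $a[$i-1] eq $b[$j-2] && $a[$i-2] eq $b[$j-1]) {
                $d[$i][$j] = min($d[$i][$j], $d[$i-2][$j-2] + 1);
            }
        }
    }
    return $d[$m][$n];
}

# Similarity as a percentage: 100 means identical strings.
sub similarity_pct {
    my ($s, $t) = @_;
    my $len = max(length $s, length $t);
    return 100 if $len == 0;
    return 100 * (1 - osa_distance($s, $t) / $len);
}
```

Here `similarity_pct('spamassassin', 'spamasassin')` comes out at about 91.7 (one edit over 12 characters), and `osa_distance('paypal', 'papyal')` is 1, where plain Levenshtein distance would count that swap as 2.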

But I was more interested if SA already has something like that?
I haven't dabbled in 4.0 yet, and there might be code already
written to accomplish similar things, so it would be a waste to
reinvent the wheel.

Re: comparing sender domain against recipient domain
On 2023-05-12 at 15:16:59 UTC-0400 (Fri, 12 May 2023 21:16:59 +0200)
Matija Nalis <mnalis-sa-list@voyager.hr>
is rumored to have said:

> But I was more interested if SA already has something like that?

It does not.

--
Bill Cole
Re: comparing sender domain against recipient domain
>> But I was more interested if SA already has something like that?
>
> It does not.

Weren't there a whole set of "FUZZY" rules once? I'm pretty sure that they
looked for words in the subject and maybe body of the email that had
exactly this sort of obfuscation. I don't think they were applied to domain
names, and certainly not to matching two fields in different headers. But if
the code for the fuzzy rules is still around, it could possibly be adapted
for this use.
Re: comparing sender domain against recipient domain
On Fri, 12 May 2023, Matija Nalis wrote:

> I wonder if someone has already done it, and something sufficiently
> similar to be used to that purpose?

There are a lot of ReplaceTags rules in the base ruleset.

I don't know offhand if that works with header rules.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Maxim XXXV: That which does not kill you has made a tactical error.
-----------------------------------------------------------------------
2 days until the 75th anniversary of Israel's independence
Re: comparing sender domain against recipient domain
On Fri, 12 May 2023, Loren Wilton wrote:

>>> But I was more interested if SA already has something like that?
>>
>> It does not.
>
> Weren't there a whole set of "FUZZY" rules once?

There still are.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
RE: comparing sender domain against recipient domain
>
> On Fri, May 12, 2023 at 05:32:30PM +0200, Reindl Harald wrote:
> > > On Fri, May 12, 2023 at 09:49:40AM -0500, Dave Funk wrote:
> > > > On Fri, 12 May 2023, Matija Nalis wrote:
> > > > > That is because those domains are not EQUAL. Or did you want a
> > > > > rule that checks only for SIMILAR domain names (e.g. with lowercase
> > > > > letter "L" replaced with number "1" as in your example)?
> > >
> > > It should be relatively easy to write an SA plugin for that:
> >
> > and with *what* do you replace the "1"?
>
> [snip..]
>
> But I was more interested if SA already has something like that?
> I haven't dabbled in 4.0 yet, and there might be code already
> written to accomplish similar things, so it would be a waste to
> reinvent the wheel.
>

Hi Matija,

It is nice to see such interest in this topic. The goal is indeed to catch purposefully chosen domains that could mislead the recipient, like john@her0.com > admin@hero.com, not to be confused with info@paypa1.com > admin@hero.com, although the algorithms for the two are probably very similar.

Catching things like paypa1.com could be a start, especially if you combine it with the knowledge that the received email is not from the same company but external. One could then apply the checks to the sender/recipient combination.

For lists of similar-looking characters, one could also look at password generators, which do exactly the opposite: they skip such characters in passwords.

If I am not mistaken, some registries are already utilizing technology to try and catch phishing domains.
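Taking up both suggestions, a sender/recipient check might build its lookalike map from groups of ambiguous characters (the kind some password generators leave out) and apply it only when the sender's domain is external. The character groups, the local-domain list, and the function names below are all assumptions for illustration.

```perl
use strict;
use warnings;

# Groups of easily confused characters; every character in a group is
# mapped to the group's first character. Illustrative, not exhaustive.
my @CONFUSABLE_GROUPS = ('l1I', 'o0O', 's5S', 'b8B');

my %CANON;
for my $group (@CONFUSABLE_GROUPS) {
    my ($first, @rest) = split //, $group;
    $CANON{$_} = $first for @rest;
}

# Map lookalike characters to one representative, then fold case.
sub skeleton {
    my ($domain) = @_;
    my $out = '';
    for my $ch (split //, $domain) {
        $out .= exists $CANON{$ch} ? $CANON{$ch} : $ch;
    }
    return lc $out;
}

# Hypothetical check: flag mail whose From domain is external (not in
# @local_domains) and differs from the To domain but looks like it.
sub spoofs_recipient_domain {
    my ($from_addr, $to_addr, @local_domains) = @_;
    my ($from_dom) = $from_addr =~ /\@([^\@\s>]+)>?\s*$/ or return 0;
    my ($to_dom)   = $to_addr   =~ /\@([^\@\s>]+)>?\s*$/ or return 0;
    my %is_local = map { (lc($_) => 1) } @local_domains;
    return 0 if $is_local{ lc $from_dom };       # internal sender: skip
    return 0 if lc($from_dom) eq lc($to_dom);    # same domain, not a lookalike
    return skeleton($from_dom) eq skeleton($to_dom) ? 1 : 0;
}
```

Here `spoofs_recipient_domain('john@her0.com', 'admin@hero.com', 'hero.com')` returns 1, while the paypa1.com case returns 0 because it does not resemble the recipient's own domain; catching that would need a separate list of third-party domains to protect.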
Re: comparing sender domain against recipient domain
>>>> But I was more interested if SA already has something like that?

>>> It does not.

>On Fri, 12 May 2023, Loren Wilton wrote:
>>Weren't there a whole set of "FUZZY" rules once?

On 12.05.23 20:01, John Hardin wrote:
>There still are.

However, these rules only search for words like viagra, unsubscribe etc.

They don't compare domains to each other.

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Chernobyl was a Windows 95 beta test site.
Re: comparing sender domain against recipient domain
A while back I created a plugin for checking the Levenshtein distance between
From and To domains; this might answer the problem?

An example configuration might look like this -

This would look just for From domains with a distance equal to 1 from
alexander.com

---8<---
ifplugin Mail::SpamAssassin::Plugin::Levenshtein
header LEVENSHTEIN_ALEXANDER_VCLOSE eval:check_levenshtein_from('alexander.com', 1)
describe LEVENSHTEIN_ALEXANDER_VCLOSE From domain has distance of 1 from alexander.com
score LEVENSHTEIN_ALEXANDER_VCLOSE 0.1
endif
---8<---

A bit more generic use, protecting To domains -

---8<---
ifplugin Mail::SpamAssassin::Plugin::WLBLEval
ifplugin Mail::SpamAssassin::Plugin::Levenshtein
enlist_addrlist (LEVENSHTEINPROTECT) *@alexander.com
header __LEVENSHTEIN_PROTECT eval:check_to_in_list('LEVENSHTEINPROTECT')

header __LEVENSHTEIN_FROM eval:check_levenshtein()

meta LEVENSHTEIN_PROTECT __LEVENSHTEIN_PROTECT && __LEVENSHTEIN_FROM
describe LEVENSHTEIN_PROTECT From address has a close distance to To address
score LEVENSHTEIN_PROTECT 0.1
endif
endif
---8<---

Looking at something like paypal -

---8<---
ifplugin Mail::SpamAssassin::Plugin::Levenshtein
header LEVENSHTEIN_PAYPAL_VCLOSE eval:check_levenshtein_from('paypal', 1)
describe LEVENSHTEIN_PAYPAL_VCLOSE From domain has distance of 1 from paypal
score LEVENSHTEIN_PAYPAL_VCLOSE 0.1
endif
---8<---

There are a few more examples and details here

https://github.com/fmbla/spamassassin-levenshtein/

Note that this is a third party plugin.

Paul
Re: comparing sender domain against recipient domain
On Sat, 13 May 2023, Matus UHLAR - fantomas wrote:

>>>>> But I was more interested if SA already has something like that?
>
>>>> It does not.
>
>> On Fri, 12 May 2023, Loren Wilton wrote:
>>> Weren't there a whole set of "FUZZY" rules once?
>
> On 12.05.23 20:01, John Hardin wrote:
>> There still are.
>
> However, these rules only search for words like viagra, unsubscribe etc.
>
> They don't compare domains to each other.

The techniques should apply to header rules, assuming ReplaceTags works
on header rules. I don't know of any reason it wouldn't; I've just never
tried it.

It would be difficult to provide site-specific phishing rules in the base
ruleset, of course, but perhaps some examples could be added for domains
like (as noted) paypal.com, and those could be used as examples for
someone wanting to make a site-custom phishing rule.

I'll try to play with that this weekend and see if it bears fruit.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
When designing software, any time you think to yourself "a user
would never be stupid enough to do *that*", you're wrong.
-----------------------------------------------------------------------
Tomorrow: the 75th anniversary of Israel's independence
RE: comparing sender domain against recipient domain
On Thu, 11 May 2023, Marc wrote:

>>> I was wondering whether spamassassin applies some sort of algorithm
>>> comparing the sender domain against the recipient domain to detect a
>>> phishing attempt?
>>
>> There is a suite of meta rules and subrules with names containing
>> TO_EQ_FROM in the default rule channel. Consult the rules files for
>> implementation details.
>>
>>
>
> hmmm, I guess not
>
> some test message with these headers
> test2:~# spamassassin -D < spam-test.txt > out2
>
> Date: Mon, 24 Oct 2016 22:10:07 +0200
> To: recipient@alexander.com
> From: Lara <sender@a1exander.com>


Try this:


header __TO_OUR_DOMAIN To:addr =~ /alexander\.com/i
header __FROM_OUR_DOMAIN_FUZZY From =~ /(?!alexander)<A><L><E><X><A><N><D><E><R>\.com/i
replace_rules __FROM_OUR_DOMAIN_FUZZY
meta OUR_DOMAIN_SPOOFED_FROM __TO_OUR_DOMAIN && __FROM_OUR_DOMAIN_FUZZY

Note that the Levenshtein distance plugin would be a more general
solution, but this might be quite useful.
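Outside SpamAssassin, the same idea can be prototyped in Perl by expanding each letter of the protected domain into a class of lookalike characters, which is roughly what the `<A>`-style tags do once `replace_rules` is applied. The class table below is made up for illustration and is not the content of the base ruleset's `replace_tag` definitions; `fuzzy_domain_regex` is a hypothetical helper.

```perl
use strict;
use warnings;

# Illustrative lookalike classes; NOT the base ruleset's replace_tag values.
my %LOOKALIKE = (
    a => '[a4@]',
    e => '[e3]',
    i => '[il1|!]',
    l => '[l1i|]',
    o => '[o0]',
    s => '[s5$]',
);

# Build a case-insensitive regex matching lookalike spellings of $domain
# right after an "@", but (via a negative lookahead) not the exact domain.
sub fuzzy_domain_regex {
    my ($domain) = @_;
    my $body = join '', map {
        exists $LOOKALIKE{ lc $_ } ? $LOOKALIKE{ lc $_ } : quotemeta($_)
    } split //, $domain;
    my $exact = quotemeta $domain;
    return qr/\@(?!$exact\b)$body\b/i;
}
```

For example, `fuzzy_domain_regex('alexander.com')` matches `Lara <sender@a1exander.com>` but not `recipient@alexander.com`; because of `/i`, a differently cased `ALEXANDER.COM` is treated as the genuine domain.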


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
An operating system design that requires a system reboot in order
to install a document viewing utility does not earn my respect.
-----------------------------------------------------------------------
Tomorrow: the 75th anniversary of Israel's independence