Mailing List Archive

FROM header obfuscation
Hi All,

Recently we're seeing more spam pass our spam filters by obfuscating
text in the FROM header. The problem mainly affects users of mail
clients such as iPhone Mail, which show only the display name of the
FROM header and not the actual email address that was used, which
sidesteps DKIM measures. For example:

From: =?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?= <are@qbocel.com>

This is base64 encoded "?ostnl.nl ?akket" and pretends to come from
Postnl, a dutch snailmail company. However the hexadecimal
representation of this base64 decoded text differs from that of normal
ASCII:

Obfuscated:

$ printf "?ostnl.nl ?akket" | od -A n -t x1
 d0 a0 6f 73 74 6e 6c 2e 6e 6c 20 d0 a0 61 6b 6b
 65 74

Plain ASCII:

$ printf "Postnl.nl Pakket" | od -A n -t x1
 50 6f 73 74 6e 6c 2e 6e 6c 20 50 61 6b 6b 65 74

There is no way to tell the difference with the naked eye. You can
obfuscate text using this online tool: https://obfuscator.uo1.net/
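
To see what a client would actually render, the display name can be decoded and its non-ASCII characters listed by Unicode name. The following is a minimal Python sketch using only the standard library; the helper names `decode_display_name` and `non_ascii_chars` are made up for illustration and are not part of SpamAssassin.

```python
import unicodedata
from email.header import decode_header
from email.utils import parseaddr


def decode_display_name(from_header):
    """Return the decoded display name of a From: header value."""
    name, _addr = parseaddr(from_header)
    parts = []
    for chunk, charset in decode_header(name):
        # decode_header yields bytes for encoded words, str otherwise
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def non_ascii_chars(text):
    """List (code point, Unicode name) for every non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) > 127
    ]


if __name__ == "__main__":
    header = "=?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?= <are@qbocel.com>"
    name = decode_display_name(header)
    for code_point, uni_name in non_ascii_chars(name):
        print(code_point, uni_name)
```

For the example header, both flagged characters are U+0420, a Cyrillic capital letter that renders like a Latin "P" in many fonts.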

Is there any way to detect this type of obfuscation with a spamassassin
rule?

Best regards,
Frido Otten
Re: FROM header obfuscation
Frido Otten wrote:
> Hi All,
>
> Recently we're seeing more spam passing our spamfilters using text
> obfuscating in the FROM header. The problem mainly targets users which
> are using mail clients like iPhone Mail which are only displaying the
> display name of the FROM header and not the actual email address which
> was used, bypassing DKIM measures. For example:
>
> From: =?UTF-8?B?0KBvc3RubC5ubCDQoGFra2V0?= <are@qbocel.com>
>
> This is base64 encoded "?ostnl.nl ?akket" and pretends to come from
> Postnl, a dutch snailmail company. However the hexadecimal
> representation of this base64 decoded text differs from that of normal
> ASCII:
>
> Obfuscated:
>
> $ printf "?ostnl.nl ?akket" | od -A n -t x1
>  d0 a0 6f 73 74 6e 6c 2e 6e 6c 20 d0 a0 61 6b 6b
>  65 74
>
> Plain ASCII:
>
> $ printf "Postnl.nl Pakket" | od -A n -t x1
>  50 6f 73 74 6e 6c 2e 6e 6c 20 50 61 6b 6b 65 74
>
> There is no way to tell the difference with the naked eye.

That depends on the font. Many variations do in fact look different,
and from some of the FP-approaching "ham" I've seen that abuses this I
can only conclude that some marketing.... person has decided that this
is Necessary and Required and the tech folks can Go Suck It.

As far as I'm concerned, formatting outside of language accents on
characters absolutely does NOT belong in either the From: name or
Subject. An "a" in the From: name or Subject absolutely MUST be
presented as a US-ASCII "a", and not some extended UTF8 lookalike
that's... oooooo! in *italics*!

Naturally the spammers go to various amounts of effort to avoid the ones
that are clearly different.

> Is there any way to detect this type of obfuscation with a spamassassin
> rule?

I have a longish list of rule groups similar to below for different
extended UTF8 ASCII-lookalike characters and words. Some are derived
from rules discussed on this list within the past year or so.

# UTF-8 bytes for U+0420..U+043F (Cyrillic letters)
header   __SUSP_NAME_CHAR_01  From:name =~ /(?:\xd0[\xa0-\xbf])/
tflags   __SUSP_NAME_CHAR_01  multiple maxhits=10
# UTF-8 bytes for U+FF00..U+FF60 (fullwidth forms)
header   __SUSP_NAME_CHAR_02  From:name =~ /(?:\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0])/
tflags   __SUSP_NAME_CHAR_02  multiple maxhits=10
# Sum the per-group hit counts, then score on thresholds
meta     __SUSP_NAME_CHAR     __SUSP_NAME_CHAR_01 + __SUSP_NAME_CHAR_02
meta     SUSP_NAME_CHAR_5     __SUSP_NAME_CHAR >= 5
describe SUSP_NAME_CHAR_5     5 or more lookalike characters in the From: name
score    SUSP_NAME_CHAR_5     1.5
meta     SUSP_NAME_CHAR_10    __SUSP_NAME_CHAR >= 10
describe SUSP_NAME_CHAR_10    10 or more lookalike characters in the From: name
score    SUSP_NAME_CHAR_10    1.75
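
The counting logic of these rules can be tried outside SpamAssassin with a short Python sketch. It assumes the rules see the decoded display name as UTF-8 bytes, and it approximates the per-rule hit limit with a plain cap of 10; the function names are hypothetical.

```python
import re

# Same byte patterns as the two header rules above
RULE_01 = re.compile(rb"\xd0[\xa0-\xbf]")
RULE_02 = re.compile(rb"\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0]")
MAXHITS = 10  # assumed per-rule cap on counted hits

THRESHOLDS = (("SUSP_NAME_CHAR_5", 5), ("SUSP_NAME_CHAR_10", 10))


def lookalike_count(display_name):
    """Sum the capped hit counts of both patterns."""
    raw = display_name.encode("utf-8")
    hits_01 = min(len(RULE_01.findall(raw)), MAXHITS)
    hits_02 = min(len(RULE_02.findall(raw)), MAXHITS)
    return hits_01 + hits_02


def fired_rules(display_name):
    """Return the names of the scored meta rules that would fire."""
    total = lookalike_count(display_name)
    return [name for name, threshold in THRESHOLDS if total >= threshold]


if __name__ == "__main__":
    print(lookalike_count("\u0420ostnl.nl \u0420akket"))
    print(fired_rules("\u0420ostnl.nl \u0420akket"))
```

Under these assumptions the Postnl example contains only two lookalike characters, so neither threshold rule fires for it. That is the single-character case discussed further down.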

I've used this tool:

https://www.utf8-chartable.de/

with a bit of effort to take an example character and locate the full
a-z list of entries for these rules. (Convert individual characters to
hex, then flip pages until you've found the fakes. There are many groups.)
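
The "convert to hex" step can also be scripted. This is a small Python sketch, not the workflow described above: it prints the `\x..` byte escapes for a character and for the two ends of a code point range, which is what goes into the regex character classes. The helper names are made up for illustration.

```python
import unicodedata


def utf8_escape(char):
    """Return the UTF-8 bytes of one character as \\x.. escapes."""
    return "".join(f"\\x{b:02x}" for b in char.encode("utf-8"))


def range_ends(first, last):
    """Return the escapes for both ends of a code point range."""
    return utf8_escape(chr(first)), utf8_escape(chr(last))


def list_range(first, last):
    """Yield (escape, character, Unicode name) for each code point."""
    for cp in range(first, last + 1):
        ch = chr(cp)
        yield utf8_escape(ch), ch, unicodedata.name(ch, "UNKNOWN")


if __name__ == "__main__":
    print(utf8_escape("\u0420"))
    print(range_ends(0x0420, 0x043F))
    for escape, ch, name in list_range(0x0420, 0x0425):
        print(escape, ch, name)
```

A range only collapses into a single `\xNN[\xNN-\xNN]` class when both ends share the same leading byte or bytes, which is why the fullwidth group in the rules above needs two alternatives.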

Single characters are trickier; depending on context I've added rules
for individual lookalike characters, or whole words with mixed variants
(and an exclusion for pure ASCII) as I see new runs of FNs.

-kgd
Re: FROM header obfuscation
(Please keep mail on-list)

Laurent S. wrote:
> On Tuesday, February 8th, 2022 at 16:41, Kris Deugau <kdeugau@vianet.ca> wrote:
>
>> I have a longish list of rule groups similar to below for different
>> extended UTF8 ASCII-lookalike characters and words. Some are derived
>> from rules discussed on this list within the past year or so.
>> header __SUSP_NAME_CHAR_01 From:name =~ /(?:\xd0[\xa0-\xbf])/
>> tflags __SUSP_NAME_CHAR_01 multiple maxhits=10
>> header __SUSP_NAME_CHAR_02 From:name =~ /(?:\xef\xbc[\x80-\xbf]|\xef\xbd[\x80-\xa0])/
>> tflags __SUSP_NAME_CHAR_02 multiple maxhits=10
>> meta __SUSP_NAME_CHAR __SUSP_NAME_CHAR_01 + __SUSP_NAME_CHAR_02
>> meta SUSP_NAME_CHAR_5 __SUSP_NAME_CHAR >= 5
>> describe SUSP_NAME_CHAR_5 5 or more lookalike characters in the From: name
>> score SUSP_NAME_CHAR_5 1.5
>> meta SUSP_NAME_CHAR_10 __SUSP_NAME_CHAR >= 10
>> describe SUSP_NAME_CHAR_10 10 or more lookalike characters in the From: name
>> score SUSP_NAME_CHAR_10 1.75
>> I've used this tool:
>> https://www.utf8-chartable.de/
>> with a bit of effort to take an example character and locate the full
>> a-z list of entries for these rules. (Convert individual characters to
>> hex, then flip pages until you've found the fakes. There are many groups.)
>> Single characters are trickier; depending on context I've added rules
>> for individual lookalike characters, or whole words with mixed variants
>> (and an exclusion for pure ASCII) as I see new runs of FNs.
>>
>
>> -kgd
>
> Out of curiosity, I've tested it with a replace_tag rule (/<P><O><S><T>/) without luck. Shouldn't those UTF8 ranges be added to the ReplaceTags plugin?

Probably. However, the rules as above and the other similar ones I've
set up locally are detecting the abstracted use of certain subsets of
these variant characters seen in local FNs (often different variant sets
for different cases, FN corpus depending), not variations of a
particular character as used for ReplaceTags.

To put it another way, I explicitly do not care about *what* these
characters are spelling out, just the fact that they're present at all
in certain places where I consider them to be inherently invalid. I
also *don't* want to match the ASCII version - ReplaceTags substitutions
usually include the base ASCII character, so your final rule has to have
some exclusion component on its own, eg:

/(?!Post)<P><O><S><T>/

or

/(?!P)<P>(?!o)<O>(?!s)<S>(?!t)<T>/

etc.
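
The effect of the exclusion can be illustrated outside SpamAssassin. The Python sketch below mimics the first pattern with ordinary character classes; the `LOOKALIKES` sets are hypothetical stand-ins chosen for this example, not the real ReplaceTags definitions, and the match is made case-insensitive here for simplicity.

```python
import re

# Hypothetical lookalike sets, NOT the real ReplaceTags definitions
LOOKALIKES = {
    "P": "Pp\u0420\u0440",   # Latin P/p, Cyrillic ER
    "O": "Oo0\u041e\u043e",  # Latin O/o, digit zero, Cyrillic O
    "S": "Ss5\u0405\u0455",  # Latin S/s, digit five, Cyrillic DZE
    "T": "Tt\u0422",         # Latin T/t, Cyrillic TE
}


def spoof_pattern(word):
    """Match lookalike spellings of word, but not the plain ASCII word."""
    classes = "".join("[" + LOOKALIKES[ch] + "]" for ch in word.upper())
    # The negative lookahead rejects the position where the pure ASCII
    # spelling starts, so only variant spellings can match
    return re.compile("(?!" + re.escape(word) + ")" + classes, re.IGNORECASE)


POST = spoof_pattern("Post")

if __name__ == "__main__":
    for name in ("\u0420ostnl.nl \u0420akket", "Postnl.nl Pakket", "P0st"):
        print(repr(name), bool(POST.search(name)))
```

Without the lookahead, the classes would also match the legitimate ASCII "Post", because each class contains the base letter.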

TBH for specific phishing cases like yours, I would tend to just
copy-paste the spoofed From: name into a rule directly - text editor
depending, this should work fine. Perl will happily match the literal
pasted character or the hex sequence equally well unless your editor
mangles the character.

-kgd
Re: FROM header obfuscation
On Thursday, February 10th, 2022 at 16:33, Kris Deugau <kdeugau@vianet.ca> wrote:

> (Please keep mail on-list)

Oops, replied too quick without checking this. Sorry.

> > Out of curiosity, I've tested it with a replace_tag rule (/<P><O><S><T>/) without luck. Shouldn't those UTF8 ranges be added to the ReplaceTags plugin?
>

> Probably. However, the rules as above and the other similar ones I've
> set up locally are detecting the abstracted use of certain subsets of
> these variant characters seen in local FNs (often different variant sets
> for different cases, FN corpus depending), not variations of a
> particular character as used for ReplaceTags.
> To put it another way, I explicitly do not care about what these
> characters are spelling out, just the fact that they're present at all
> in certain places where I consider them to be inherently invalid. I
> also don't want to match the ASCII version - ReplaceTags substitutions
> usually include the base ASCII character, so your final rule has to have
> some exclusion component on its own, eg:
> /(?!Post)<P><O><S><T>/
> or
> /(?!P)<P>(?!o)<O>(?!s)<S>(?!t)<T>/
> etc.
> TBH for specific phishing cases like yours, I would tend to just
> copy-paste the spoofed From: name into a rule directly - text editor
> depending, this should work fine. Perl will happily match the literal
> pasted character or the hex sequence equally well unless your editor
> mangles the character.
> -kgd

I think both are valid. Your way of counting the number of those special characters is great. But I also want to be able to block some specific strings like the usual suspects (paypal, dhl, volksbank, post, ...) where a single unexpected char is enough. I've been using a meta for this, with the same idea that you just gave.
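
One way to picture the "single unexpected char" check is to fold known lookalikes back to ASCII and compare against a brand list. This Python sketch illustrates that idea only, not the meta rule mentioned above; the `CONFUSABLES` table is a tiny hand-picked sample and `spoofed_brands` is a hypothetical helper.

```python
# Tiny hand-picked sample of Cyrillic lookalikes; a real table
# would need to be much larger
CONFUSABLES = str.maketrans({
    "\u0420": "P", "\u0440": "p",  # Cyrillic ER
    "\u0410": "A", "\u0430": "a",  # Cyrillic A
    "\u041e": "O", "\u043e": "o",  # Cyrillic O
    "\u0415": "E", "\u0435": "e",  # Cyrillic IE
    "\u0421": "C", "\u0441": "c",  # Cyrillic ES
})

BRANDS = ("paypal", "dhl", "volksbank", "post")


def spoofed_brands(display_name):
    """Return brands that appear only after folding lookalikes to ASCII."""
    raw = display_name.lower()
    folded = display_name.translate(CONFUSABLES).lower()
    # A brand present in the raw text is plain ASCII and is not flagged
    return [b for b in BRANDS if b in folded and b not in raw]


if __name__ == "__main__":
    print(spoofed_brands("\u0420ostnl.nl \u0420akket"))
    print(spoofed_brands("Postnl.nl Pakket"))
```

A name that contains both a genuine and a spoofed spelling of the same brand would slip through this simple comparison.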


I guess a few people have created their own ReplaceTags with, for instance, their own company name. Including the letters in \xd0[\xa0-\xbf] in ReplaceTags would be good, I think.