Mailing List Archive

Homoglyph spam/phishing targeting popular brands
What are the community's thoughts on handling spam/phishing that utilize
homoglyphs to obfuscate the brands they're targeting? Are there any
plugins that are in development that might assist with catching these?

For example, here are some phrases that I've been monitoring from reported
messages:

* that Âmåzon has received
* Äpple Watch
* Ã??le iPad
* A??le iPad
* PäyPäl Credit
* P?yP?l Credit
* Spãce Gray
* to Over Støck Inc on
* subscribed for Nõrtõn Yearly
* subscribed for Nõrtøn Yearly
* the Nõrtõn Freedom Protection

Existing rules (mainline SpamAssassin channel, KAM, etc.) don't seem to
flag much, if anything substantial, on the messages I've seen with this
behavior. I've trained bayes on each, and created a custom set of rules to
try to catch various patterns used in the messages.
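
One way to triage phrases like these outside SpamAssassin is to fold accented Latin letters back to their base letters and compare the result against a brand list. The following is a minimal Python sketch of that idea, not a SpamAssassin feature; the `BRANDS` set and both function names are hypothetical. It only undoes characters that have a Unicode decomposition, so a letter such as "ø" passes through unchanged.

```python
import unicodedata

# Hypothetical watch list; not taken from any published rule set.
BRANDS = {"amazon", "apple", "paypal", "norton"}

def fold_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def obfuscated_brands(phrase: str) -> set:
    """Return brands that match only after folding, i.e. were disguised."""
    hits = set()
    for word in phrase.split():
        folded = fold_diacritics(word).lower()
        if folded in BRANDS and word.lower() not in BRANDS:
            hits.add(folded)
    return hits

print(obfuscated_brands("that Âmåzon has received"))
print(obfuscated_brands("PäyPäl Credit"))
print(fold_diacritics("Nõrtøn"))  # the "ø" has no decomposition and is kept
```

Because "ø" survives folding, normalization alone would miss the "Nõrtøn" and "Støck" variants; those still need an explicit lookalike mapping.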
Re: Homoglyph spam/phishing targeting popular brands
On Sun, 14 Feb 2021, Ricky Boone wrote:

> What are the community's thoughts on handling spam/phishing that utilize
> homoglyphs to obfuscate the brands they're targeting? Are there any
> plugins that are in development that might assist with catching these?

Take a look at the definition of the FUZZY rules.

There's no general plugin for this currently. That would be a bit
difficult to do on-the-fly without getting (potentially lots of) FPs on
non-English words.

At the moment it's:

1) notice that some word is being obfuscated
2) add a FUZZY rule for that word
3) tune it for FPs (may hit legitimate words in non-English, exclude them)

The problem is such obfuscations may not be common enough in the masscheck
corpora for the rules to be promoted, scored and published.
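
The three steps above can be sketched in Python. This only illustrates the idea and is not the definition of the actual FUZZY rules: the `LOOKALIKES` table, the `fuzzy_pattern` helper and the `allowed` exclusion list are all hypothetical.

```python
import re

# Hypothetical lookalike table: base letter -> characters seen standing in for it.
LOOKALIKES = {
    "a": "aàáâãäå",
    "o": "oòóôõöø",
    "e": "eèéêë",
}

def fuzzy_pattern(word: str) -> re.Pattern:
    """Step 2: build a regex that accepts lookalikes for each letter."""
    parts = []
    for ch in word.lower():
        variants = LOOKALIKES.get(ch, ch)
        if len(variants) > 1:
            parts.append("[" + re.escape(variants) + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def is_obfuscated(text: str, word: str, allowed=()) -> bool:
    """Hit only when a variant other than the plain word (or an allowed one) appears."""
    for match in fuzzy_pattern(word).finditer(text):
        found = match.group(0).lower()
        # Step 3: skip the unobfuscated spelling and known legitimate words.
        if found != word.lower() and found not in allowed:
            return True
    return False

print(is_obfuscated("subscribed for Nõrtøn Yearly", "norton"))  # obfuscated spelling
print(is_obfuscated("subscribed for Norton Yearly", "norton"))  # plain spelling
```

The `allowed` argument stands in for the FP tuning in step 3: a legitimate non-English word that happens to match can be listed there so it no longer counts as a hit.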


> For example, here are some phrases that I've been monitoring from reported
> messages:
>
> * that Âmåzon has received
> * Äpple Watch
> * Ã??le iPad
> * A??le iPad
> * PäyPäl Credit
> * P?yP?l Credit
> * Spãce Gray
> * to Over Støck Inc on
> * subscribed for Nõrtõn Yearly
> * subscribed for Nõrtøn Yearly
> * the Nõrtõn Freedom Protection
>
> Existing rules (mainline SpamAssassin channel, KAM, etc.) don't seem to
> flag much, if anything substantial, on the messages I've seen with this
> behavior. I've trained bayes on each, and created a custom set of rules to
> try to catch various patterns used in the messages.

I've added FUZZY rules for amazon, apple, microsoft, facebook, paypal and
norton to my sandbox; they are likely going to be fairly common.

How often do you see (over)stock and space obfuscated?

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
At $8 billion per year, the TSA is the most expensive
theatrical production in history. -- David Burge @iowahawkblog
-----------------------------------------------------------------------
8 days until George Washington's 289th Birthday
Re: Homoglyph spam/phishing targeting popular brands
On Sun, Feb 14, 2021 at 4:45 PM John Hardin <jhardin@impsec.org> wrote:
>
> On Sun, 14 Feb 2021, Ricky Boone wrote:
>
> > What are the community's thoughts on handling spam/phishing that utilize
> > homoglyphs to obfuscate the brands they're targeting? Are there any
> > plugins that are in development that might assist with catching these?
>
> Take a look at the definition of the FUZZY rules.
>
> There's no general plugin for this currently. That would be a bit
> difficult to do on-the-fly without getting (potentially lots of) FPs on
> non-English words.
>
> At the moment it's:
>
> 1) notice that some word is being obfuscated
> 2) add a FUZZY rule for that word
> 3) tune it for FPs (may hit legitimate words in non-English, exclude them)

Good to know. I'll check out the FUZZY rules for possible rules in the future.

> The problem is such obfuscations may not be common enough in the masscheck
> corpora for the rules to be promoted, scored and published.

Understood. There may be better rules that could be built with
additional context other than just the individual words/phrases. If
there is interest in the original messages, I can make sanitized
versions available.

> > For example, here are some phrases that I've been monitoring from reported
> > messages:
> >
> > * that Âmåzon has received
> > * Äpple Watch
> > * Ã??le iPad
> > * A??le iPad
> > * PäyPäl Credit
> > * P?yP?l Credit
> > * Spãce Gray
> > * to Over Støck Inc on
> > * subscribed for Nõrtõn Yearly
> > * subscribed for Nõrtøn Yearly
> > * the Nõrtõn Freedom Protection
> >
> > Existing rules (mainline SpamAssassin channel, KAM, etc.) don't seem to
> > flag much, if anything substantial, on the messages I've seen with this
> > behavior. I've trained bayes on each, and created a custom set of rules to
> > try to catch various patterns used in the messages.
>
> I've added FUZZY rules for amazon, apple, microsoft, facebook, paypal and
> norton to my sandbox; they are likely going to be fairly common.
>
> How often do you see (over)stock and space obfuscated?

So far, 4 times and once, respectively. The latter, in context, was
describing a version of an Apple iPad, so full product names must have
been used as the input to whatever homoglyph-generating process the
spammers were using.
Re: Homoglyph spam/phishing targeting popular brands
On Sun, 14 Feb 2021, Ricky Boone wrote:

> On Sun, Feb 14, 2021 at 4:45 PM John Hardin <jhardin@impsec.org> wrote:
>>
>> How often do you see (over)stock and space obfuscated?
>
> So far, 4 times and once, respectively

OK, I added FUZZY_OVERSTOCK as well, we'll see what happens.

If they don't perform well in masscheck you can always grab them out of my
sandbox for your local rules.

Masscheck results:

https://ruleqa.spamassassin.org/?rule=%2FFUZZY_



--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Precision mis-clicks since 1994!
-----------------------------------------------------------------------
8 days until George Washington's 289th Birthday
Re: Homoglyph spam/phishing targeting popular brands
On 2/14/2021 9:58 PM, Ricky Boone wrote:
> On Sun, Feb 14, 2021 at 4:45 PM John Hardin <jhardin@impsec.org> wrote:
>> On Sun, 14 Feb 2021, Ricky Boone wrote:
>>
>>> What are the community's thoughts on handling spam/phishing that utilize
>>> homoglyphs to obfuscate the brands they're targeting? Are there any
>>> plugins that are in development that might assist with catching these?
>> Take a look at the definition of the FUZZY rules.
>>
>> There's no general plugin for this currently. That would be a bit
>> difficult to do on-the-fly without getting (potentially lots of) FPs on
>> non-English words.
>>
>> At the moment it's:
>>
>> 1) notice that some word is being obfuscated
>> 2) add a FUZZY rule for that word
>> 3) tune it for FPs (may hit legitimate words in non-English, exclude them)
> Good to know. I'll check out the FUZZY rules for possible rules in the future.
>
>> The problem is such obfuscations may not be common enough in the masscheck
>> corpora for the rules to be promoted, scored and published.
> Understood. There may be better rules that could be built with
> additional context other than just the individual words/phrases. If
> there is interest in the original messages, I can make sanitized
> versions available.
>
>>> For example, here are some phrases that I've been monitoring from reported
>>> messages:
>>>
>>> * that Âmåzon has received
>>> * Äpple Watch
>>> * Ã??le iPad
>>> * A??le iPad
>>> * PäyPäl Credit
>>> * P?yP?l Credit
>>> * Spãce Gray
>>> * to Over Støck Inc on
>>> * subscribed for Nõrtõn Yearly
>>> * subscribed for Nõrtøn Yearly
>>> * the Nõrtõn Freedom Protection
>>>
>>> Existing rules (mainline SpamAssassin channel, KAM, etc.) don't seem to
>>> flag much, if anything substantial, on the messages I've seen with this
>>> behavior. I've trained bayes on each, and created a custom set of rules to
>>> try to catch various patterns used in the messages.
>> I've added FUZZY rules for amazon, apple, microsoft, facebook, paypal and
>> norton to my sandbox; they are likely going to be fairly common.
>>
>> How often do you see (over)stock and space obfuscated?
> So far, 4 times and once, respectively. The latter, in context, was
> describing a version of an Apple iPad, so full product names must have
> been used as the input to whatever homoglyph-generating process the
> spammers were using.

Ricky,

The CHAOS module *may* do what you want.  It has a lot of Unicode
support, especially for Unicode "Look alike" characters.  It also has
detection for multiple Unicode Character Sets.

Frankly, in my experience, legitimate Email Subject and From names stay
within their character sets.  Chinese use Chinese characters; Greeks,
Greek; Arabs, Arabic, and so on.  As for body rules, that's a little
more complex; a lot of moving parts there, from the SA and PERL versions
to the RE2C compiler.   On occasion even I have to crank out a
mathematical symbol or two or plunk in an emoji.
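
The observation that legitimate text tends to stay within one character set suggests a simple check: flag any single word whose letters come from more than one script. The Python sketch below illustrates that idea only and is not how CHAOS implements it; it approximates a character's script by the first word of its Unicode name, which is a rough assumption, and the function names are made up.

```python
import unicodedata

def scripts_in(word: str) -> set:
    """Approximate the scripts used by the letters of a word."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER A" -> "LATIN"; unnamed characters are skipped.
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def mixed_script_words(text: str) -> list:
    """Return words whose letters span more than one script."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]

# "\u0430" is CYRILLIC SMALL LETTER A, visually close to Latin "a".
print(mixed_script_words("P\u0430yP\u0430l Credit"))
print(mixed_script_words("PayPal Credit"))
```

A check like this would not fire on the accented examples earlier in the thread, since "Ä" and "õ" are still Latin letters; it targets cross-script substitutions such as Cyrillic or Greek lookalikes.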

telecom2k3/CHAOS: PERL plugin module for SpamAssassin (github.com)
<https://github.com/telecom2k3/CHAOS>

-- Jared
Re: Homoglyph spam/phishing targeting popular brands
On Mon, 15 Feb 2021 23:58:17 -0500
Jared Hall wrote:


>
> The CHAOS module *may* do what you want.  ...  It also has
> detection for multiple Unicode Character Sets.

That's not a bad idea, but if anyone is interested I'd suggest copying
the character matching regexes into ordinary rules. Or better still into
template tags, so that they can be reused in multiple rules.

I don't think there's much, if anything, in that module that benefits
from being in perl.

Also the "adaptive scoring" seems like a bad idea to me. The scores are
hard-coded fractions of one of the three thresholds. The choice of
which threshold is used is also hard-coded per rule. The only sense in
which it's adaptive is that it opposes an admin adjusting how
aggressive the filtering should be.
Re: Homoglyph spam/phishing targeting popular brands
On Mon, Feb 15, 2021 at 12:16 AM John Hardin <jhardin@impsec.org> wrote:
>
> OK, I added FUZZY_OVERSTOCK as well, we'll see what happens.
>
> If they don't perform well in masscheck you can always grab them out of my
> sandbox for your local rules.
>
> Masscheck results:
>
> https://ruleqa.spamassassin.org/?rule=%2FFUZZY_

Nice, thanks!

I see the test rules got picked up with sa-update, and they all work
against the samples I have. It does appear that T_FUZZY_APPLE is
catching some FP's. Word boundaries might need to be added, as words
like "happiest" get caught by it.
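
The "happiest" false positive is easy to reproduce outside SpamAssassin. The Python sketch below uses a made-up fuzzy pattern for "apple" in which "i" and "1" are accepted as stand-ins for "l" (the real T_FUZZY_APPLE definition is not shown in this thread), and then shows the effect of adding word boundaries.

```python
import re

# Hypothetical pattern: accented "a" variants, and "i"/"1" as lookalikes for "l".
UNANCHORED = re.compile(r"[aàáâãäå]pp[li1]e", re.IGNORECASE)
ANCHORED = re.compile(r"\b[aàáâãäå]pp[li1]e\b", re.IGNORECASE)

for text in ("the happiest day", "Äpple Watch"):
    # Columns: text, unanchored hit, anchored hit
    print(text, bool(UNANCHORED.search(text)), bool(ANCHORED.search(text)))
```

The unanchored pattern finds "appie" inside "happiest", while the anchored one requires the match to start and end at a word edge. A real rule would also need to avoid firing on the plain spelling "apple", which this sketch does not attempt.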
Re: Homoglyph spam/phishing targeting popular brands
On Tue, 16 Feb 2021, Ricky Boone wrote:

> On Mon, Feb 15, 2021 at 12:16 AM John Hardin <jhardin@impsec.org> wrote:
>>
>> OK, I added FUZZY_OVERSTOCK as well, we'll see what happens.
>>
>> If they don't perform well in masscheck you can always grab them out of my
>> sandbox for your local rules.
>>
>> Masscheck results:
>>
>> https://ruleqa.spamassassin.org/?rule=%2FFUZZY_
>
> Nice, thanks!
>
> I see the test rules got picked up with sa-update, and they all work
> against the samples I have. It does appear that T_FUZZY_APPLE is
> catching some FP's. Word boundaries might need to be added, as words
> like "happiest" get caught by it.

Yep, I've addressed that, take a look at the latest masscheck results.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Are you a mildly tech-literate politico horrified by the level of
ignorance demonstrated by lawmakers gearing up to regulate online
technology they don't even begin to grasp? Cool. Now you have a
tiny glimpse into a day in the life of a gun owner. -- Sean Davis
-----------------------------------------------------------------------
6 days until George Washington's 289th Birthday
Re: Homoglyph spam/phishing targeting popular brands
Yep, so far so good. Thank you again for the pointers and creating
the rules so quickly.

On Tue, Feb 16, 2021 at 9:06 PM John Hardin <jhardin@impsec.org> wrote:
>
> On Tue, 16 Feb 2021, Ricky Boone wrote:
>
> > On Mon, Feb 15, 2021 at 12:16 AM John Hardin <jhardin@impsec.org> wrote:
> >>
> >> OK, I added FUZZY_OVERSTOCK as well, we'll see what happens.
> >>
> >> If they don't perform well in masscheck you can always grab them out of my
> >> sandbox for your local rules.
> >>
> >> Masscheck results:
> >>
> >> https://ruleqa.spamassassin.org/?rule=%2FFUZZY_
> >
> > Nice, thanks!
> >
> > I see the test rules got picked up with sa-update, and they all work
> > against the samples I have. It does appear that T_FUZZY_APPLE is
> > catching some FP's. Word boundaries might need to be added, as words
> > like "happiest" get caught by it.
>
> Yep, I've addressed that, take a look at the latest masscheck results.
Re: Homoglyph spam/phishing targeting popular brands
On 2/16/2021 2:06 PM, RW wrote:
> That's not a bad idea, but if anyone is interested I'd suggest copying
> the character matching regexes into ordinary rules. Or better still into
> template tags, so that they can be reused in multiple rules.

Agreed, RW.  Most of the stuff in there originated from rules to start with.
I do discuss it in the project's Wiki.  If one person's "kruft" is someone
else's gain, that's fine with me.

It was the intent of this module to be useful enough for the Noob, yet
interesting enough for the Pros.  In your case, the latter.

> I don't think there's much, if anything, in that module that benefits
> from being in perl.

Counts and amounts; even variable arithmetic amounts based on counts.
Everything else is just a regex.

> Also the "adaptive scoring" seems like a bad idea to me. The scores are
> hard-coded fractions of one of the three thresholds. The choice of
> which threshold is used is also hard-coded per rule. The only sense in
> which it's adaptive is that it opposes an admin adjusting how
> aggressive the filtering should be.

Yes, perhaps.  But armed with a rule, a score, and a baseline reference like
{chaos_tag}, I can deliver scoring with the EXACT weighting that the author
intended.  Downloading ANYBODY's rules is a risk, since one does not
know the context in which the rules were developed.  That's a Day 1 bitch
of mine to the SA Adminisphere.  'nuff said.

Also of interest to me are Time-Of-Day/Day-Of-Week scoring.
They only come out at night.

This is an ACTIVE project.  In the project's Discussion forums, I do outline
a development roadmap (To Do) and also a peek at what's coming up.

This includes incorporating TAG-ONLY modes.
Re: Homoglyph spam/phishing targeting popular brands
On Wed, 17 Feb 2021 10:23:13 -0500
Jared Hall wrote:

> On 2/16/2021 2:06 PM, RW wrote:

> > I don't think there's much, if anything, in that module that
> > benefits from being in perl.
> Counts and amounts; even variable arithmetic amounts based on counts.
> Everything else is just a regex.


You can do that with meta rules, which support arithmetic and comparison
operators. You can count regex hits with the "multiple" flag and an
optional "maxhits=...".
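
As a rough picture of what counting hits and then comparing the count looks like, here is a Python analogue. It is not SpamAssassin rule syntax; the function names, the `ACCENTED` pattern and the threshold of 3 are invented for the example, and `max_hits` merely mimics the capping behaviour described above.

```python
import re

# Hypothetical pattern: a handful of accented vowels used as lookalikes.
ACCENTED = re.compile(r"[àáâãäåòóôõöø]", re.IGNORECASE)

def count_hits(pattern: re.Pattern, text: str, max_hits=None) -> int:
    """Count every match, optionally stopping once max_hits is reached."""
    count = 0
    for _ in pattern.finditer(text):
        count += 1
        if max_hits is not None and count >= max_hits:
            break
    return count

def many_accents(text: str, threshold: int = 3) -> bool:
    """Meta-style comparison: fire when the capped count reaches the threshold."""
    return count_hits(ACCENTED, text, max_hits=threshold) >= threshold

print(count_hits(ACCENTED, "subscribed for Nõrtõn Yearly"))
print(many_accents("that Âmåzon has received"))
print(many_accents("PäyPäl Credit and Nõrtøn"))
```

Capping the count keeps the work bounded on long messages: once the threshold is reached there is nothing to gain from scanning for further hits.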


> > Also the "adaptive scoring" seems like a bad idea to me. The scores
> > are hard-coded fractions of one of the three thresholds. The choice
> > of which threshold is used is also hard-coded per rule. The only
> > sense in which it's adaptive is that it opposes an admin adjusting
> > how aggressive the filtering should be.
> Yes, perhaps.  But armed with a rule, a score, and a baseline
> reference like {chaos_tag},  


I'm not sure what you mean by "armed with...a score", most of the rules
are scored like this:

$score = 0.33 * $pms->{conf}->{chaos_tag};

governed by a single global tunable and various hard-coded multipliers.


> Downloading ANYBODY's rules is a risk, since one
> does not know the context in which the rules were developed. 


Which is why the scores need to be overridden on a per rule basis. Some
rules translate to another system much better than others. For example
rules about emojis developed in a corporate environment may not work as
well on student mail.

The chief problem with your scoring is that it overrides scores in
the local configuration where score overrides would normally go.
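
To make the objection concrete, here is a small Python model of the two behaviours. Nothing in it is taken from CHAOS or SpamAssassin beyond the 0.33 multiplier quoted above; the rule name, the threshold value of 5.0 and the function names are hypothetical.

```python
def score_with_local_override(rule, default_scores, local_scores):
    """Conventional behaviour: a per-rule local score wins over the shipped default."""
    return local_scores.get(rule, default_scores[rule])

def score_from_threshold(rule, multipliers, threshold, local_scores):
    """Behaviour being criticised: the score is computed in code from a global
    threshold, so the per-rule local override is never consulted
    (local_scores is deliberately unused)."""
    return multipliers[rule] * threshold

local = {"EXAMPLE_RULE": 0.5}  # the admin's per-rule override
print(score_with_local_override("EXAMPLE_RULE", {"EXAMPLE_RULE": 2.0}, local))
print(score_from_threshold("EXAMPLE_RULE", {"EXAMPLE_RULE": 0.33}, 5.0, local))
```

In the second function the only knob left to the admin is the global threshold, which moves every rule at once instead of the one rule that misbehaves on their mail.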
Re: Homoglyph spam/phishing targeting popular brands
On Sun, Feb 14, 2021 at 4:45 PM John Hardin <jhardin@impsec.org> wrote:
>
> I've added FUZZY rules for amazon, apple, microsoft, facebook, paypal and
> norton to my sandbox; they are likely going to be fairly common.

Looks like the FUZZY_PAYPAL rule may need word boundaries added to the
regex. I'm seeing it catch phrases like "pay pai", but with full
context the phrase may be "...back pay paid out in...".

Other than that, the rules are looking good. I've taken some of the
examples and started new rules for other phishing words/phrases I'm
seeing getting through (obfuscated versions of Validation,
Verification, etc.). Thank you again for the suggestions, and for
your help with this.