Mailing List Archive: (Re-)emergence of UTF based obfuscation in phishing/spam

Something I noticed on a set of emails that were reported to me.

I have custom rules that look for certain names in From:name. The
messages should have been caught by them, but on inspection the name
was UTF-8 encoded and included a character that doesn't render yet
breaks the regex I used. Specifically, the bad actor inserted a
RIGHT-TO-LEFT MARK (U+200F, or \xe2\x80\x8f), effectively as a
zero-width filler character. The body of the message was also flooded
with LEFT-TO-RIGHT MARK (U+200E, or \xe2\x80\x8e) and ZERO WIDTH
NO-BREAK SPACE (U+FEFF, or \xef\xbb\xbf) characters, placed randomly
between and within words to interfere with other rules. When debugging
the message, the characters don't appear to be normalized, so from
SA's perspective it seems every rule has to account for all of these
characters.
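
To illustrate the failure mode outside of SA, here is a minimal Python
sketch (not SpamAssassin rule syntax). The display name "John Smith",
the pattern, and the helper names are hypothetical stand-ins for my
actual rule; only the three code points come from the messages I saw.

```python
import re

# The three invisible code points seen in the reported messages.
RLM = "\u200f"     # RIGHT-TO-LEFT MARK, UTF-8 bytes e2 80 8f
LRM = "\u200e"     # LEFT-TO-RIGHT MARK, UTF-8 bytes e2 80 8e
ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, UTF-8 bytes ef bb bf

# Hypothetical stand-in for a From:name rule.
name_rule = re.compile(r"\bJohn Smith\b", re.IGNORECASE)

# Matches any of the invisible characters listed above.
INVISIBLE = re.compile("[" + LRM + RLM + ZWNBSP + "]")


def strip_invisible(text: str) -> str:
    """Remove the invisible characters before rules are applied."""
    return INVISIBLE.sub("", text)


def rule_hits(text: str) -> bool:
    """Return True when the hypothetical name rule matches the text."""
    return name_rule.search(text) is not None


# A display name with an RLM dropped into the middle of a word.
obfuscated = "Jo" + RLM + "hn Smith"

print(rule_hits(obfuscated))                   # rule misses the raw name
print(rule_hits(strip_invisible(obfuscated)))  # rule hits once stripped
```

The raw name renders identically to the clean one, but the rule only
matches after the invisible characters are stripped, which is the
normalization step that doesn't seem to happen for me.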

To add, I'm currently on SA 3.6.x. It looks like 4.0 improves UTF-8
handling, but I'm not sure if it would address the behavior I see
(though happy to be wrong... albeit not able to update immediately).

I'm trying to see whether ReplaceTags might be useful, and found an
older discussion on this list about the trouble with UTF-8. I checked
whether any existing tags account for null-space/zero-width
space-like characters, but didn't see any. I have no issue with
creating a tag myself, but wanted to gauge the community's thoughts
as I start down that path.
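
For discussion, this is roughly the expansion I have in mind for such a
tag, sketched in Python rather than in ReplaceTags syntax: allow any
run of the invisible characters between every pair of characters in the
literal being matched. The function name and the sample name are
hypothetical, and the character class only covers the three code points
I have actually seen, so it is an assumption that this list is
sufficient.

```python
import re

# Only the code points observed so far: LRM, RLM, ZWNBSP.
INVISIBLE_CLASS = "[\u200e\u200f\ufeff]"


def tolerant_pattern(literal: str) -> re.Pattern:
    """Build a regex matching `literal` even when invisible characters
    are interleaved between its characters."""
    gap = "(?:" + INVISIBLE_CLASS + ")*"
    # Escape each character, then join with the optional invisible gap.
    body = gap.join(re.escape(ch) for ch in literal)
    return re.compile(body, re.IGNORECASE)


rule = tolerant_pattern("John Smith")

# Still matches the clean name and the obfuscated variants.
print(bool(rule.search("John Smith")))
print(bool(rule.search("Jo\u200fhn\ufeff Sm\u200eith")))
# Unrelated names are not matched.
print(bool(rule.search("Jane Smith")))
```

The trade-off versus stripping the characters up front is that every
rule has to be rewritten this way, which is why a reusable tag seems
more attractive than hand-editing each regex.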

Correction: I meant to say I'm on SA 3.4.6, not 3.6.x.

On Wed, Aug 30, 2023, 3:22 PM Ricky Boone <ricky.boone@gmail.com> wrote:

> To add, I'm currently on SA 3.6.x. It looks like 4.0 improves UTF-8
> handling, but I'm not sure if it would address the behavior I see
> (though happy to be wrong... albeit not able to update immediately).