Mailing List Archive

Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
Hi Gang

At the moment we see a lot of phishing emails with UTF-8 encoded
subject containing emojis like:

=?UTF-8?Q?=E2=9C=85_Dein_Paket_wartet_auf_dich!_-14.12.2021-?=

I noticed a Rule:

header PP001 Subject =~ /=\?UTF-8\?Q\?=E2=9C=85_Dein_Paket/

is not matching.

Neither does: /Dein_Paket/

But /Dein\ Paket/

Does match. So it looks like SpamAssassin is passing the 'decoded'
header.
Unfortunately the 'readable' part is a bit too generic. I would like to
also match the emoji.

How do I do this? There is no rawheader or rawbody matcher as far as I
could determine.
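
The behaviour described above (the pattern with a space matches, the one with an underscore does not) is what RFC 2047 decoding of the Subject would produce. The following is a minimal sketch in Python, using the standard library rather than SpamAssassin itself, that decodes the sample subject and shows what a rule would see after decoding. The variable names are made up for illustration.

```python
from email.header import decode_header

# The encoded subject exactly as quoted in this message.
raw_subject = "=?UTF-8?Q?=E2=9C=85_Dein_Paket_wartet_auf_dich!_-14.12.2021-?="

# For an encoded word, decode_header returns (bytes, charset) pairs.
decoded_bytes, charset = decode_header(raw_subject)[0]

# Q-encoding maps "_" to a space and "=XX" to the byte 0xXX.
print(charset)
print(decoded_bytes[:3].hex())          # the three bytes of the emoji
print(b"Dein Paket" in decoded_bytes)   # pattern with a space
print(b"Dein_Paket" in decoded_bytes)   # pattern with an underscore
```

In the decoded form the underscores are gone and the emoji is a plain three-byte UTF-8 sequence, which is why only the pattern with a space can match.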

--
Kind regards

-Benoît Panizzon- @ home office and reachable as usual
--
I m p r o W a r e A G - Head of Commerce Customers
______________________________________________________

Zurlindenstrasse 29 Tel +41 61 826 93 00
CH-4133 Pratteln Fax +41 61 826 93 01
Switzerland Web http://www.imp.ch
______________________________________________________
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
> How do I do this? There is no rawheader or rawbody matcher as far as I
> could determine.

There is 'rawbody', but it may or may not help you. I seem to recall the
Subject is prepended to the body text, but I don't recall if it is prepended
to rawbody. You could try it.

Short of that, you may have to fall back on 'full' and match for something
like

full MY_SUB /\nSubject: <something>\n/
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
On Tue, Dec 14, 2021 at 11:36:00AM +0100, Benoît Panizzon wrote:
> Hi Gang
>
> At the moment we see a lot of phishing emails with UTF-8 encoded
> subject containing emojis like:
>
> =?UTF-8?Q?=E2=9C=85_Dein_Paket_wartet_auf_dich!_-14.12.2021-?=
>
> I noticed a Rule:
>
> header PP001 Subject =~ /=\?UTF-8\?Q\?=E2=9C=85_Dein_Paket/
>
> is not matching.
>
> Neither does: /Dein_Paket/
>
> But /Dein\ Paket/
>
> Does match. So it looks like SpamAssassin is passing the 'decoded'
> header.
> Unfortunately the 'readable' part is a bit too generic. I would like to
> also match the emoji.
>
> How do I do this? There is no rawheader or rawbody matcher as far as I
> could determine.

header PP001 Subject:raw =~ /=E2=9C=85/

Or better yet, just match the UTF-8 bytes as is, which will work regardless
of encoding method:

header PP001 Subject =~ /\xE2\x9C\x85/
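
To illustrate why matching the decoded bytes is more robust than matching the raw header, here is a small Python sketch (standard library only, not SpamAssassin). It encodes the same text once as Q and once as B (base64) and tries both kinds of pattern. The shortened subject and all names are assumptions made for the example.

```python
import base64
import re
from email.header import decode_header

# Same text, two RFC 2047 encodings. The text is a shortened, made-up subject.
payload = "\u2705 Dein Paket".encode("utf-8")
q_form = "=?UTF-8?Q?=E2=9C=85_Dein_Paket?="
b_form = "=?UTF-8?B?" + base64.b64encode(payload).decode("ascii") + "?="

decoded_pattern = re.compile(rb"\xE2\x9C\x85")  # like a rule on Subject
raw_pattern = re.compile(r"=E2=9C=85")          # like a rule on Subject:raw


def decoded(subject: str) -> bytes:
    """Return the decoded bytes of a single encoded word."""
    return decode_header(subject)[0][0]


for subject in (q_form, b_form):
    print(
        subject,
        bool(decoded_pattern.search(decoded(subject))),
        bool(raw_pattern.search(subject)),
    )
```

The byte pattern finds the emoji in both variants, while the raw pattern only finds it in the Q-encoded one, so a sender switching to base64 would slip past a rule written against the raw header.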
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
On 12/14/2021 5:36 AM, Benoît Panizzon wrote:
> How do I do this? There is no rawheader or rawbody matcher as far as I
> could determine.

The :raw modifier is what you're looking for:

header     PP001        Subject:raw =~ /=\?UTF-8\?Q\?=E2=9C=85_Dein_Paket/

You can also match against the decoded header by writing the emoji as its
three UTF-8 bytes in hexadecimal:

header    PP001         Subject =~ /\xE2\x9C\x85 Dein Paket/
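
If it helps, the escape sequence for any character can be derived mechanically. The sketch below is Python, and utf8_regex_escape is a hypothetical helper name, not part of SpamAssassin. It prints the \xNN form of a character's UTF-8 bytes so it can be pasted into a rule.

```python
def utf8_regex_escape(text: str) -> str:
    """Return text as a run of \\xNN escapes, one per UTF-8 byte."""
    return "".join(f"\\x{byte:02X}" for byte in text.encode("utf-8"))


# U+2705 is the check mark from the subject in this thread.
print(utf8_regex_escape("\u2705"))
# U+1F4E6 (package) lies outside the Basic Multilingual Plane.
print(utf8_regex_escape("\U0001F4E6"))
```

Note that not every emoji is three bytes long: the check mark is, but characters above U+FFFF, such as the package emoji, take four bytes in UTF-8.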


Regards,

-- Jared Hall
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
Look into ‘normalize_charset 1’. For background maybe this:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656
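
As a rough illustration of what charset normalization buys you (this is a Python sketch of the general idea, not how SpamAssassin implements normalize_charset): once text from different charsets is recoded to UTF-8, a single byte pattern covers all of them. The sample subjects and function names are invented for the example.

```python
import re
from email.header import decode_header

# The same made-up word, "Grüße", in two different charsets.
latin1_subject = "=?ISO-8859-1?Q?Gr=FC=DFe?="
utf8_subject = "=?UTF-8?Q?Gr=C3=BC=C3=9Fe?="

# A pattern written against the UTF-8 bytes of the word.
pattern = re.compile(rb"Gr\xC3\xBC\xC3\x9Fe")


def decoded_bytes(subject: str) -> bytes:
    """Decode the encoded word but keep the bytes in their original charset."""
    return decode_header(subject)[0][0]


def normalized_bytes(subject: str) -> bytes:
    """Decode the encoded word and recode the result to UTF-8."""
    data, charset = decode_header(subject)[0]
    return data.decode(charset).encode("utf-8")


for subject in (latin1_subject, utf8_subject):
    print(
        subject,
        bool(pattern.search(decoded_bytes(subject))),
        bool(pattern.search(normalized_bytes(subject))),
    )
```

Without recoding, the ISO-8859-1 variant does not match the UTF-8 pattern; after recoding, both variants do.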
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
On 14.12.21 17:46, David Bürgin wrote:
>Look into ‘normalize_charset 1’. For background maybe this:
>https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656

from what I remember, normalize_charset should not be used until SA 4.*

--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Support bacteria - they're the only culture some people have.
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
On 2021-12-14 at 13:18:09 UTC-0500 (Tue, 14 Dec 2021 19:18:09 +0100)
Matus UHLAR - fantomas <uhlar@fantomas.sk>
is rumored to have said:

> On 14.12.21 17:46, David Bürgin wrote:
>> Look into ‘normalize_charset 1’. For background maybe this:
>> https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656
>
> from what I remember, normalize_charset should not be used until SA
> 4.*

I know of no reason to NOT use normalize_charset=1 in any 3.4.x version.

I believe it will either be the default in 4.x or that it won't even be
switchable. The most clear major goal for 4.x is to handle Unicode
charsets better, and we may end up with a situation akin to that of the
Python 2->3 switch, made hellish by the Unicode support. Hopefully not,
since we are just following behind and making use of a lot of work done
in recent versions of Perl without major headaches.



--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: Rawheader or Rawsubject? Or how to match UTF-8 Emoji in Header.
>>On 14.12.21 17:46, David Bürgin wrote:
>>>Look into ‘normalize_charset 1’. For background maybe this:
>>>https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656

>On 2021-12-14 at 13:18:09 UTC-0500 (Tue, 14 Dec 2021 19:18:09 +0100) Matus
>UHLAR - fantomas <uhlar@fantomas.sk> is rumored to have said:
>>from what I remember, normalize_charset should not be used until SA 4.*

On 14.12.21 18:31, Bill Cole wrote:
>I know of no reason to NOT use normalize_charset=1 in any 3.4.x version.

https://mail-archives.apache.org/mod_mbox/spamassassin-users/201812.mbox/<20181224081658.GA11272%40hege.li>

>I believe it will either be the default in 4.x or that it won't even
>be switchable. The most clear major goal for 4.x is to handle Unicode
>charsets better, and we may end up with a situation akin to that of
>the Python 2->3 switch, made hellish by the Unicode support. Hopefully
>not, since we are just following behind and making use of a lot of
>work done in recent versions of Perl without major headaches.

it's gonna be the default:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7656


--
Matus UHLAR - fantomas, uhlar@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
"To Boot or not to Boot, that's the question." [WD1270 Caviar]