Mailing List Archive: Unicode considered harmful again

Unicode considered harmful again

ruga at protonmail

Nov 3, 2021, 10:23 PM

Post #1 of 12 (1248 views)

Please convert all source code to ASCII. If it fails to compile, then it may have a trojan hiding in Unicode clothing.

Re: Unicode considered harmful again

spamassassin at arcsin

Nov 3, 2021, 11:45 PM

Post #2 of 12 (1248 views)

> Please convert all source code to ASCII. If it fails to compile, then it may have a trojan hiding in Unicode clothing.

Instructions unclear.

Re: Unicode considered harmful again

ruga at protonmail

Nov 4, 2021, 12:37 AM

Post #3 of 12 (1248 views)

-------- Original Message --------
On Nov 4, 2021, 07:45, Damian < spamassassin@arcsin.de> wrote:

>> Please convert all source code to ASCII. If it fails to compile, then it may have a trojan hiding in Unicode clothing.

>Instructions unclear.

CVE 2021-42574

Re: Unicode considered harmful again

spamassassin at arcsin

Nov 4, 2021, 1:34 AM

Post #4 of 12 (1248 views)

> >> Please convert all source code to ASCII. If it fails to compile,
> then it may have a trojan hiding in Unicode clothing.
>
> >Instructions unclear.
>
> CVE 2021-42574

It remains unclear (to me). What source code should spamassassin-users
convert? Attached source code in emails? How should they convert it? Is
there a SpamAssassin plugin? Should they install compilers on their mail
system?

Re: Unicode considered harmful again

ruga at protonmail

Nov 4, 2021, 1:59 AM

Post #5 of 12 (1248 views)

-------- Original Message --------
On Nov 4, 2021, 09:34, Damian < spamassassin@arcsin.de> wrote:
> >> Please convert all source code to ASCII. If it fails to compile,
> then it may have a trojan hiding in Unicode clothing.
>
> >Instructions unclear.
>
> CVE 2021-42574

> It remains unclear (to me). What source code should spamassassin-users convert? Attached source code in emails? How should they convert, is there a SpamAssassin-Plugin? Should they install compilers on their mail system?

The CVE is a call to action for the developers. As for users, if SA can safely detect an attack, it should report it.

Re: Unicode considered harmful again

jared at jaredsec

Nov 4, 2021, 5:45 AM

Post #6 of 12 (1248 views)

On 11/4/2021 3:37 AM, Rupert Gallagher wrote:
>
> -------- Original Message --------
> On Nov 4, 2021, 07:45, Damian < spamassassin@arcsin.de> wrote:
>
> >> Please convert all source code to ASCII. If it fails to compile,
> then it may have a trojan hiding in Unicode clothing.
>
> >Instructions unclear.
>
> CVE 2021-42574

Love it!!!

Yes, and its companion: CVE-2021-42694

Here are the key takeaways:

1) Some people write things from right to left. Beware the evil BIDI.
2) Beware of using somebody else's source code :)
3) Homoglyphs/Punycode, like Doppelgängers, DO exist! (sorry about the
Unicode: ????? ?? ????????)
4) Code containing BIDIs and Homoglyphs can be found on GitHub. Oh, my!
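
A minimal sketch of how the homoglyph point (3) could be checked mechanically, assuming that the first word of a character's Unicode name is a good enough hint of its script. The helper names scripts_in and looks_mixed and the sample identifiers are made up for illustration; none of this comes from the paper or from SA.

```python
import unicodedata


def scripts_in(identifier: str) -> set:
    """Return rough script hints for the letters in an identifier."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            # unicodedata.name() gives e.g. "LATIN SMALL LETTER A" or
            # "CYRILLIC SMALL LETTER A"; the first word is used as a hint.
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(identifier: str) -> bool:
    """True when letters from more than one script appear together."""
    return len(scripts_in(identifier)) > 1


latin = "paypal"
spoof = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A

for name in (latin, spoof):
    print(ascii(name), sorted(scripts_in(name)), looks_mixed(name))
```

This only flags mixed scripts inside one identifier. A name written entirely in look-alike letters from a single script would pass, so it is a heuristic at best.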

Hard to believe that Cambridge even accepted their paper:
https://trojansource.codes/trojan-source.pdf
From their paper: "We present proofs of concept for C, C++, C#,
JavaScript, Java, Rust, Go, and Python".
That's where they went wrong. Most PERLers here would be 4xPHDs by
Cambridge's standards.

Oops, I used their reference so I must acknowledge those rocket
scientists as per their instruction:

@article{boucher_trojansource_2021,
    title = {Trojan {Source}: {Invisible} {Vulnerabilities}},
    url = {https://trojansource.codes/trojan-source.pdf},
    journal = {Preprint.},
    author = {Nicholas Boucher and Ross Anderson},
    year = {2021}
}

On a funny side note, the most popular question at every Unicode
conference is: "Why are all the character descriptions written in ASCII?"

I say that if we all just wrote in Ordinals, the world would be a
happier place!

-- Jared Hall

Re: Unicode considered harmful again

sausers-20150205 at billmail

Nov 4, 2021, 7:44 AM

Post #7 of 12 (1248 views)

On 2021-11-04 at 08:45:02 UTC-0400 (Thu, 4 Nov 2021 08:45:02 -0400)
Jared Hall <jared@jaredsec.com>
is rumored to have said:

[...]
> 2) Beware of using somebody else's source code :)

That's the really significant warning...

The relevance to SA is that it uses a config system with "rules" that
can be auto-updated and which are de facto source code: somebody else's
source code. :)

We do not currently publish non-ASCII rules in the default ruleset
channel. I don't believe that KAM ever does so. At least one 3rd-party
ruleset has done so in the past, generating errors and warnings from
some versions of Perl. Through 3.x, SA does not have conscious support
for non-ASCII rules and while it is possible that SA could be vulnerable
to something akin to CVE-2021-42574 and CVE-2021-42694 via malicious
rules, it would be a noisy and rather difficult attack.

In v4.x, Unicode support will be better. That also means it may be
easier to make this sort of attack quieter in the future, as non-ASCII
rules won't be definitively wrong as they are now.

--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire

Re: Unicode considered harmful again

jared at jaredsec

Nov 4, 2021, 9:09 AM

Post #8 of 12 (1248 views)

On 11/4/2021 10:44 AM, Bill Cole wrote:
> On 2021-11-04 at 08:45:02 UTC-0400 (Thu, 4 Nov 2021 08:45:02 -0400)
> Jared Hall <jared@jaredsec.com>
> is rumored to have said:
>
> [...]
>> 2) Beware of using somebody else's source code :)
>
> That's the really significant warning...

Agreed. Does one need to write a paper and publish a couple of CVEs for
that? I thought Mitre or whoever runs CVE nowadays would triage these
types of reports through a "Captain Obvious" department to sort Wants
from Needs.

>
> We do not currently publish non-ASCII rules in the default ruleset
> channel. I don't believe that KAM ever does so.

KAM certainly has. I do recall seeing at least an infinity symbol as
well as the Euro symbol in his rulesets last I looked. NBD, works
anyway. I crank out hex when dealing with Unicode, and I have tons of
that. I have a nice Unicode converter that works on strings. One of
these days I'll change it to parse entire files; Heinlein's stuff for
instance.
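
For readers who have not done the "crank out hex" step, here is a rough sketch of what such a string converter might look like. The converter mentioned above is not shown in this thread, so the function name to_hex_escapes and its behaviour are assumptions: ASCII is left alone and every other character becomes \x escapes of its UTF-8 bytes.

```python
def to_hex_escapes(text: str) -> str:
    """Replace each non-ASCII character with \\xNN escapes of its UTF-8 bytes."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


# The Euro sign (U+20AC) and the infinity symbol (U+221E), written as
# escapes so that this example itself stays pure ASCII.
print(to_hex_escapes("price in \u20ac"))
print(to_hex_escapes("\u221e"))
```

The sketch does no regex quoting, so anything pasted into a rule would still need its metacharacters checked by hand.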

> In v4.x, Unicode support will be better. That also means it may be
> easier to make this sort of attack quieter in the future, as non-ASCII
> rules won't be definitively wrong as they are now.

I have my own thoughts/reservations about distributing Unicode
rulesets. Challenging days ahead, to be sure. It'd sure be nice to get
sa-compile to run entirely clean though.

Thanks,

-- Jared Hall

Re: Unicode considered harmful again

me at junc

Nov 4, 2021, 5:27 PM

Post #9 of 12 (1241 views)

On 2021-11-04 09:34, Damian wrote:
>> >> Please convert all source code to ASCII. If it fails to compile, then it may have a trojan hiding in Unicode clothing.
>>
>> >Instructions unclear.
>>
>> CVE 2021-42574
>
> It remains unclear (to me). What source code should spamassassin-users
> convert? Attached source code in emails? How should they convert, is
> there a SpamAssassin-Plugin? Should they install compilers on their
> mail system?

https://bugs.gentoo.org/807781

not all 3rd parties have clean rules, which leads to that problem

==============================================================================
$ perl -ne 'print "$. $_" if m/[\x80-\xFF]/' \
    /var/lib/spamassassin/3.004006/updates_spamassassin_org/50_scores.cf
526 # Validity (née ReturnPath) Certified
==============================================================================
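
A rough Python equivalent of the one-liner above, as a sketch only. It reads lines as bytes and reports those that contain any byte in the 0x80-0xFF range; the function name non_ascii_lines and the in-memory sample are made up for illustration.

```python
import re

# Same byte range as the Perl character class [\x80-\xFF].
NON_ASCII = re.compile(rb"[\x80-\xff]")


def non_ascii_lines(lines):
    """Yield (line_number, line) for each bytes line holding a byte >= 0x80."""
    for number, line in enumerate(lines, start=1):
        if NON_ASCII.search(line):
            yield number, line


# Stand-in for a rules file; the second line mimics the comment found above.
sample = [
    b"score EXAMPLE_RULE 1.0\n",
    "# Validity (n\u00e9e ReturnPath) Certified\n".encode("utf-8"),
]

for number, line in non_ascii_lines(sample):
    print(number, line.decode("utf-8"), end="")
```

To scan a real file, open it in binary mode and pass the file object in place of sample, since iterating over it yields bytes lines.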

I haven't tested whether it's solved in the default rules now, but the KAM and ita
channels still have it.

We are all waiting for SpamAssassin 4.x.

Re: Unicode considered harmful again

lwilton at earthlink

Nov 4, 2021, 7:05 PM

Post #10 of 12 (1241 views)

> In v4.x, Unicode support will be better. That also means it may be easier
> to make this sort of attack quieter in the future, as non-ASCII rules
> won't be definitively wrong as they are now.

The question is whether malicious non-ASCII rules could do anything more
damaging than simply failing to match on the obvious strings "visible" in
the rule, or alternatively deliberately matching on some string that should
not be matched, in some form of DoS attempt.

It's hard to see how someone could inject Perl (or any other) code with
screwy rules. There was a time when Perl code was allowed in rules, but that
was disallowed many years ago:

uri LW_PRINTIT /(^.*$)(?{ print "URI:\n$^N\nEnd URI\n\n" })/is

That was a real handy debugging rule once, but you can't get away with that
anymore.
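
As an illustration of what "Perl code in rules" means here: Perl regexes can embed code with the (?{ ... }) and (??{ ... }) constructs, as the old debugging rule above does. The sketch below only spots those constructs in a rule line with a plain text search. It is an assumption-laden example, not how SA itself rejects them, and EXAMPLE_RULE is a made-up rule name.

```python
import re

# Matches the opening of Perl's embedded-code constructs: "(?{" and "(??{".
CODE_CONSTRUCT = re.compile(r"\(\?\??\{")


def has_embedded_code(rule_line: str) -> bool:
    """True when a rule line appears to contain a Perl code block in a regex."""
    return CODE_CONSTRUCT.search(rule_line) is not None


old_rule = r'uri LW_PRINTIT /(^.*$)(?{ print "URI:\n$^N\nEnd URI\n\n" })/is'
plain_rule = r"body EXAMPLE_RULE /free (?:money|cash)/i"  # hypothetical rule

for rule in (old_rule, plain_rule):
    print(has_embedded_code(rule), rule)
```

Being a text match, it would also flag an escaped parenthesis followed by "?{", so it errs on the noisy side.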

Loren

Re: Unicode considered harmful again

jhardin at impsec

Nov 5, 2021, 7:50 AM

Post #11 of 12 (1241 views)

On Fri, 5 Nov 2021, Benny Pedersen wrote:

> On 2021-11-04 09:34, Damian wrote:
>>> >> Please convert all source code to ASCII. If it fails to compile, then
>>> it may have a trojan hiding in Unicode clothing.
>>>
>>> >Instructions unclear.
>>>
>>> CVE 2021-42574
>>
>> It remains unclear (to me). What source code should spamassassin-users
>> convert? Attached source code in emails? How should they convert, is
>> there a SpamAssassin-Plugin? Should they install compilers on their
>> mail system?
>
> https://bugs.gentoo.org/807781
>
> not all 3rd parties have clean rules, which leads to that problem
>
> ==============================================================================
> $ perl -ne 'print "$. $_" if m/[\x80-\xFF]/' \
>     /var/lib/spamassassin/3.004006/updates_spamassassin_org/50_scores.cf
> 526 # Validity (née ReturnPath) Certified
> ==============================================================================

And what of the BIDI sequence that actually causes the problem?

All Of Unicode is not the problem.
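
A minimal sketch of checking for the BIDI controls specifically, rather than for any non-ASCII byte. The nine code points below (embeddings, overrides, isolates and their terminators) are the ones the Trojan Source paper works with for CVE-2021-42574; the function name find_bidi_controls and the sample text are made up for illustration.

```python
# Bidirectional control characters, keyed by code point, with their
# usual abbreviations.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}


def find_bidi_controls(text: str):
    """Return (line, column, abbreviation) for every BIDI control in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits


# Line 1 has an ordinary accented letter and is not flagged;
# line 2 hides a right-to-left override inside a string.
sample = "n\u00e9e is fine\nlevel = 'user\u202e'\n"

for hit in find_bidi_controls(sample):
    print(hit)
```

A check like this leaves "née" alone and reports only the characters that reorder displayed text.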

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
does quite what I want. I wish Christopher Robin was here."
-- Peter da Silva in a.s.r
-----------------------------------------------------------------------
2 days until Daylight Saving Time ends in U.S. - Fall Back
Getting an extra hour of 2021 is like
getting a free track on a Yoko Ono album.

Re: Unicode considered harmful again

jared at jaredsec

Nov 5, 2021, 11:20 AM

Post #12 of 12 (1241 views)

On 11/5/2021 10:50 AM, John Hardin wrote:
>
> And what of the BIDI sequence that actually causes the problem?

1) The authors cite, as Reference 18, a 2011 Krebs article:
'Right-to-Left Override' Aids Email Attacks
https://krebsonsecurity.com/2011/09/right-to-left-override-aids-email-attacks/

That's relevant to SA/Email in a general fashion.

The authors were concerned about the use of BIDI sequences within compilers
(other than in text strings). They found some bad apples (unnamed) on
GitHub. They also found valid use cases on GitHub. Go figure.
>
> All Of Unicode is not the problem.
>
NONE of Unicode is the problem. The CVEs should've been issued against
the 19 companies/organizations they talked to, not Unicode. Unless you
want to "Adopt-a-Character" or something, Unicode is not going to do
anything about it.

-----

Speaking of the Unicode Consortium's "Adopt-a-Character" program, I
mentioned that to my psychiatrist a while back. "It's only a hundred
bucks", I told her.

She probed, "If you could be a character, which would you be?"

"That's easy", I said, "I'd be a F09F."

"That certainly sounds very specific, Jared. Why that one?" she queried.

I chuckled, "Because then I could hook up with any other character and
make a great Emoji"

Happy Friday,

-- Jared Hall