Mailing List Archive

False Positives on Hebrew Emails (because Hebrew is considered "raw illegal characters")
Hi Everyone,

In /etc/mail/spamassassin/local.cf I have the following line:

ok_languages en he

This supposedly allows English and Hebrew.
However, I get lots of false positives on Hebrew letters because it wrongly
identifies Hebrew letters as "raw illegal characters" and gives them high
scores.
You can see that 7 points are contributed to the score just by these 2 lines
copied from the reports:

4.3 FROM_ILLEGAL_CHARS From contains too many raw illegal characters
2.7 SUBJ_ILLEGAL_CHARS Subject contains too many raw illegal characters

Can anyone advise on this?

--
Ilan Aisic
Pointer Software Systems, Ltd.
Tel: +972-4-993-7330; Fax: +972-4-993-5332, Cell: +972-64-815-533
Re: False Positives on Hebrew Emails (because Hebrew is considered "raw illegal characters")
"Ilan Aisic" <ilan@pointer.co.il> writes:

> However, I get lots of false positives on Hebrew letters because it
> wrongly identifies Hebrew letters as "raw illegal characters" and
> gives them high scores. You can see that 7 points are contributed to the
> score just by these 2 lines copied from the reports:
>
> 4.3 FROM_ILLEGAL_CHARS From contains too many raw illegal characters
> 2.7 SUBJ_ILLEGAL_CHARS Subject contains too many raw illegal characters
> Can anyone advise on this?

Yes, they are illegal characters in the From and Subject
lines. Non-ASCII characters in header lines should be encoded using
RFC 2047.
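As a minimal sketch of what RFC 2047 encoding looks like, Python's standard `email.header` module can turn a Hebrew subject into an encoded-word that contains only 7-bit ASCII. The subject text and the helper name `encode_header_value` are hypothetical, chosen for illustration; a well-behaved mail client performs this step itself when it builds the message.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(text):
    """Hypothetical helper: return text as RFC 2047 encoded-word(s) in UTF-8."""
    return Header(text, "utf-8").encode()


subject = "שלום"  # placeholder Hebrew subject ("shalom")
encoded = encode_header_value(subject)
print("Subject:", encoded)  # only 7-bit ASCII, of the form =?utf-8?...?=

# Decoding the encoded-word recovers the original Hebrew text.
decoded = str(make_header(decode_header(encoded)))
print(decoded == subject)  # True
```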
Re: False Positives on Hebrew Emails (because Hebrew is considered "raw illegal characters")
On Wed, 25 Feb 2004, Ilan Aisic wrote:

> Hi Everyone,
>
> In /etc/mail/spamassassin/local.cf I have the following line:
>
> ok_languages en he
>
> This supposedly allows English and Hebrew.
> However, I get lots of false positives on Hebrew letters because it wrongly
> identifies Hebrew letters as "raw illegal characters" and gives them high
> scores.
> You can see that 7 points are contributed to the score just by these 2 lines
> copied from the reports:
>
> 4.3 FROM_ILLEGAL_CHARS From contains too many raw illegal characters
> 2.7 SUBJ_ILLEGAL_CHARS Subject contains too many raw illegal characters
>
> Can anyone advise on this?

No, that isn't a false positive; that's an appropriate hit against a
bad e-mail client that is violating Internet standards.

Internet standard RFC 2822 (section 2.2) unequivocally states that you
MUST use only 7-bit characters in e-mail HEADERS. If you want to represent
non-7-bit characters in a header (such as 'From' or 'Subject'), you must
use an encoding (RFC 2047 defines 'Q', a quoted-printable variant, and
'B', Base64), not the "raw" data.
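To make the raw-versus-encoded distinction concrete, the sketch below counts bytes above 0x7F in a header line, comparing an unencoded Hebrew subject with the same subject as an RFC 2047 encoded-word. The helper `raw_8bit_count` is hypothetical and is not SpamAssassin's implementation of FROM_ILLEGAL_CHARS or SUBJ_ILLEGAL_CHARS; whatever threshold those rules use for "too many" is not modeled here.

```python
from email.header import Header


def raw_8bit_count(raw_header):
    """Count bytes outside 7-bit US-ASCII in a raw header line.

    Hypothetical helper for illustration only; NOT SpamAssassin's
    FROM_ILLEGAL_CHARS / SUBJ_ILLEGAL_CHARS code.
    """
    return sum(1 for byte in raw_header if byte > 0x7F)


hebrew = "שלום"  # placeholder subject

# What a non-compliant client puts on the wire: raw UTF-8 bytes.
raw = ("Subject: " + hebrew).encode("utf-8")

# What a compliant client sends: an RFC 2047 encoded-word (7-bit only).
encoded = ("Subject: " + Header(hebrew, "utf-8").encode()).encode("ascii")

print(raw_8bit_count(raw))      # 8 (four Hebrew letters, two UTF-8 bytes each)
print(raw_8bit_count(encoded))  # 0
```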

Good e-mail client programs follow RFC standards and would not generate
such messages. Spammers usually are not concerned with following
standards and are more likely to generate such garbage.
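For comparison, here is a minimal sketch of standards-following behavior using Python's `email` package: with `policy.default`, non-ASCII display names and subjects are written out as RFC 2047 encoded-words when the message is serialized. The names, addresses, and text are placeholders, and this shows one library's behavior only, not that of any particular mail client.

```python
from email import message_from_bytes, policy
from email.headerregistry import Address
from email.message import EmailMessage

# Hypothetical message; names and addresses are placeholders.
msg = EmailMessage(policy=policy.default)
msg["From"] = Address(display_name="אילן", addr_spec="ilan@example.com")
msg["To"] = "someone@example.com"
msg["Subject"] = "שלום"
msg.set_content("שלום עולם")

wire = msg.as_bytes()  # the bytes that would be handed to SMTP

# policy.default uses "\n" line endings, so a blank line ends the header block.
headers, _, _body = wire.partition(b"\n\n")

# Every header byte is 7-bit: the Hebrew became RFC 2047 encoded-words.
print(headers.isascii())

# A receiving client decodes the headers back to the original Hebrew.
received = message_from_bytes(wire, policy=policy.default)
print(received["Subject"], received["From"].addresses[0].display_name)
```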

The "ok_languages" option sets which languages you will accept after the
text has been decoded (i.e., which languages may appear inside properly
encoded text).

Thus, if somebody were to send you a message with an encoded Korean
subject, your SA would score against that.
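A rough illustration of the "after they're decoded" point: the encoded Korean subject below is pure ASCII on the wire, so there is nothing raw for an illegal-characters check to find, but once decoded the Hangul text is visible to any language test. The `scripts_used` helper is a hypothetical, crude stand-in based on Unicode character names; it is not SpamAssassin's language detection.

```python
import unicodedata
from email.header import Header, decode_header, make_header


def decode_rfc2047(value):
    """Decode any RFC 2047 encoded-words in a header value to Unicode text."""
    return str(make_header(decode_header(value)))


def scripts_used(text):
    """Hypothetical, crude script guess: the first word of each letter's
    Unicode name, e.g. 'HEBREW', 'HANGUL', 'LATIN'."""
    return {unicodedata.name(ch).split()[0] for ch in text if ch.isalpha()}


# A Korean subject as a compliant client would send it: 7-bit on the wire.
wire_subject = Header("안녕하세요", "utf-8").encode()
print(wire_subject.isascii())  # True: no raw 8-bit characters

decoded = decode_rfc2047(wire_subject)
print(decoded, scripts_used(decoded))  # 안녕하세요 {'HANGUL'}
```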


--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
RE: False Positives on Hebrew Emails (because Hebrew is considered "raw illegal characters")
OK, thanks for enlightening me.
Indeed, there are plenty of other Hebrew email messages that don't trigger
these rules.
I guess I'll have to educate some of the violators around here.

--ilan


> -----Original Message-----
> From: David B Funk [mailto:dbfunk@engineering.uiowa.edu]
> Sent: Wednesday, February 25, 2004 1:24 PM
> To: Ilan Aisic
> Cc: spamassassin-users@incubator.apache.org
> Subject: Re: False Positives on Hebrew Emails (because Hebrew
> is considered "raw illegal characters")