Mailing List Archive

Rule HK_SCAM is triggered by standard business email
I opened a bug (7832) about this but was told to report on the SA users mailing list instead.

The attached email is an example which triggers the HK_SCAM rule. Looks like __HK_SCAM_S7
is the culprit here since it matches the words "business" and "enterprise" when they are
found one after the other (even on different lines).

In the real world this was triggered by a business email that had the following in the
signature:

FirstName LastName
Altice Business
Enterprise Account Executive

- Aner
Re: Rule HK_SCAM is triggered by standard business email
On Wed, 1 Jul 2020, Aner Perez wrote:

> I opened a bug (7832) about this but was told to report on the SA users
> mailing list instead.
>
> The attached email is an example which triggers the HK_SCAM rule. Looks like
> __HK_SCAM_S7 is the culprit here since it matches the words "business" and
> "enterprise" when they are found one after the other (even on different
> lines).
>
> In the real world this was triggered by a business email that had the
> following in the signature:
>
> FirstName LastName
> Altice Business
> Enterprise Account Executive

What was the *overall* score of that message? Was this rule enough to push
the message over the spam threshold (5 points)? Or was the message still
scored as ham?

It looks to me like the logic in __HK_SCAM_S7 is a little off...

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i

seems like it should be:

/(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i

...but I'll let Henrik comment.
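
To make the difference concrete, here is a minimal sketch that runs both patterns against the reported signature. It uses Python's `re` rather than SA itself, and the `rendered()` helper is only an assumed stand-in for how a body rule sees text with line breaks collapsed into spaces, so treat it as an illustration and not as SA's actual behaviour.

```python
import re

# The two patterns quoted above, transcribed as Python regexes.
CURRENT = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture) (?:enterprise|propos(?:al|ition)))",
    re.IGNORECASE,
)
PROPOSED = re.compile(
    r"(?:(?:investment|proposed|lucrative) (?:business|venture)"
    r"|(?:business|venture|enterprise) propos(?:al|ition))",
    re.IGNORECASE,
)


def rendered(text: str) -> str:
    """Collapse whitespace runs to one space (assumption: roughly what a body rule sees)."""
    return " ".join(text.split())


# Signature from the original report.
signature = "FirstName LastName\nAltice Business\nEnterprise Account Executive"

print("current pattern hits signature: ", bool(CURRENT.search(rendered(signature))))
print("proposed pattern hits signature:", bool(PROPOSED.search(rendered(signature))))
print("proposed pattern hits 'lucrative business proposal':",
      bool(PROPOSED.search("a lucrative business proposal")))
```

Under those assumptions the current pattern hits on "Business Enterprise" while the proposed one does not, though ordinary phrases such as "business proposal" would still match the proposed version.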


Potentially, making it a rawbody rule might avoid this FP without
affecting its performance against the targeted spams...
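
A rough sketch of why rawbody might help, again in Python and not in SA: the assumption here is that a body rule sees the text with line breaks folded into spaces, while a rawbody rule still sees the line break, which a literal space in the pattern cannot match. The variable names are made up for the example.

```python
import re

# Only the alternative of __HK_SCAM_S7 that causes the FP.
S7_PART = re.compile(
    r"(?:business|venture) (?:enterprise|propos(?:al|ition))",
    re.IGNORECASE,
)

signature = "Altice Business\nEnterprise Account Executive"

# Assumed body-rule view: whitespace (including newlines) collapsed to single spaces.
body_view = " ".join(signature.split())
# Assumed rawbody-rule view: line breaks still present.
rawbody_view = signature

body_hit = bool(S7_PART.search(body_view))
rawbody_hit = bool(S7_PART.search(rawbody_view))

print("body view hit:   ", body_hit)
print("rawbody view hit:", rawbody_hit)
```

Whether the targeted spams keep both words on one line is a separate question that only a corpus check could answer.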


For future reference: sending a sample email to the list as a bare
attachment is problematic, as it may be altered en-route and thus
invalidate any meaningful analysis. It's better to attach it as a
zip/gzip, or to upload it to someplace like Pastebin and just post the URL
to it here. (In this case, your description should probably be enough to
figure it out without the sample so you shouldn't need to do that unless
someone explicitly asks you to do so.)



--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
The philosophy of gun control: Teenagers are roaring through
town at 90MPH, where the speed limit is 25. Your solution is to
lower the speed limit to 20. -- Sam Cohen
-----------------------------------------------------------------------
3 days until the 244th anniversary of the Declaration of Independence
Re: Rule HK_SCAM is triggered by standard business email
On 7/1/20 3:52 PM, John Hardin wrote:
> On Wed, 1 Jul 2020, Aner Perez wrote:
>
>> I opened a bug (7832) about this but was told to report on the SA users mailing list
>> instead.
>>
>> The attached email is an example which triggers the HK_SCAM rule.  Looks like
>> __HK_SCAM_S7 is the culprit here since it matches the words "business" and "enterprise"
>> when they are found one after the other (even on different lines).
>>
>> In the real world this was triggered by a business email that had the following in the
>> signature:
>>
>> FirstName LastName
>> Altice Business
>> Enterprise Account Executive
>
> What was the *overall* score of that message? Was this rule enough to push the message
> over the spam threshold (5 points)? Or was the message still scored as ham?

In our case it was marked as spam but only because we have the spam threshold set very low
(2.4). The message scored a 3.357 when the BAYES_50 was added in.

>
> It looks to me like the logic in __HK_SCAM_S7 is a little off...
>
> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
>
> seems like it should be:
>
> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i
>

That makes more sense but the rule still seems like it would be easily triggered by
standard business talk (e.g. business proposal). I guess that's the nature of business
emails... they're naturally spammy.

> ...but I'll let Henrik comment.
>
>
> Potentially, making it a rawbody rule might avoid this FP without affecting its
> performance against the targeted spams...
>
>
> For future reference: sending a sample email to the list as a bare attachment is
> problematic, as it may be altered en-route and thus invalidate any meaningful analysis.
> It's better to attach it as a zip/gzip, or to upload it to someplace like Pastebin and
> just post the URL to it here. (In this case, your description should probably be enough to
> figure it out without the sample so you shouldn't need to do that unless someone
> explicitly asks you to do so.)
>

Thanks, I'll keep that in mind.

- Aner
Re: Rule HK_SCAM is triggered by standard business email
On Wed, 1 Jul 2020, Aner Perez wrote:

> On 7/1/20 3:52 PM, John Hardin wrote:
>> On Wed, 1 Jul 2020, Aner Perez wrote:
>>
>>> I opened a bug (7832) about this but was told to report on the SA users
>>> mailing list instead.
>>>
>>> The attached email is an example which triggers the HK_SCAM rule. Looks
>>> like __HK_SCAM_S7 is the culprit here since it matches the words
>>> "business" and "enterprise" when they are found one after the other (even
>>> on different lines).
>>>
>>> In the real world this was triggered by a business email that had the
>>> following in the signature:
>>>
>>> FirstName LastName
>>> Altice Business
>>> Enterprise Account Executive
>>
>> What was the *overall* score of that message? Was this rule enough to push
>> the message over the spam threshold (5 points)? Or was the message still
>> scored as ham?
>
> In our case it was marked as spam but only because we have the spam
> threshold set very low (2.4). The message scored a 3.357 when the
> BAYES_50 was added in.

Yeah, that's why doing that blindly is a bad idea. Masscheck sets the base
rule scores so that spams score 5 points. If you reduce the spam
threshold, you increase FPs. You need to compensate for that if you do it.
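
As a small worked example of that effect: the sketch below compares the two thresholds from this thread. Only the 3.357 score comes from the report; the other ham scores are invented purely for illustration, and the helper names are hypothetical.

```python
def is_spam(score: float, required_score: float = 5.0) -> bool:
    """Tag a message as spam when its total score reaches the threshold."""
    return score >= required_score


def false_positives(ham_scores, required_score: float) -> int:
    """Count ham messages that would be tagged as spam at this threshold."""
    return sum(is_spam(s, required_score) for s in ham_scores)


reported = 3.357  # score of the reported message, BAYES_50 included

# Hypothetical ham scores, made up for this example (plus the reported one).
ham_scores = [0.1, 1.9, 2.6, reported, 4.2]

for threshold in (2.4, 5.0):
    print(f"threshold {threshold}: reported message spam? "
          f"{is_spam(reported, threshold)}, "
          f"ham tagged as spam: {false_positives(ham_scores, threshold)}")
```

With the default threshold none of these ham messages is tagged; at 2.4 every ham that scores between 2.4 and 5 becomes an FP.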

>> It looks to me like the logic in __HK_SCAM_S7 is a little off...
>>
>> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
>>
>> seems like it should be:
>>
>> /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i
>>
>
> That makes more sense but the rule still seems like it would be easily
> triggered by standard business talk (e.g. business proposal). I guess that's
> the nature of business emails... they're naturally spammy.

Agreed, that's why I want Henrik to comment. I don't have the corpus he
used to develop that rule.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org FALaholic #11174 pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Of the twenty-two civilizations that have appeared in history,
nineteen of them collapsed when they reached the moral state the
United States is in now. -- Arnold Toynbee
-----------------------------------------------------------------------
3 days until the 244th anniversary of the Declaration of Independence
Re: Rule HK_SCAM is triggered by standard business email
On Wed, 2020-07-01 at 16:20 -0400, Aner Perez wrote:
> > It looks to me like the logic in __HK_SCAM_S7 is a little off...
> >
> > /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture) (?:enterprise|propos(?:al|ition)))/i
> >
> > seems like it should be:
> >
> > /(?:(?:investment|proposed|lucrative) (?:business|venture)|(?:business|venture|enterprise) propos(?:al|ition))/i
> >
>
IME using a meta-rule that ANDs two rules of that type works well.

The key is to put words or phrases that often occur in spam in each of
the sub-rules, for instance having selling jargon ("lowest prices",
"unbeatable value") in one rule and product names ("flip flops",
"vodka", "power packs") in the other. As a benefit, if the lists are
well-chosen from words and phrases from spam you've received, it will
also hit on sales spam using combinations you've not previously seen
while being surprisingly good at not giving FPs on business or personal
letters.

The only disadvantage is that the subrules get a bit unwieldy and hard
to edit once their definitions get much longer than 80 characters. That
aside, they're easy to understand and maintain.
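
A minimal sketch of the idea, written in Python rather than as SA rules: two word lists, and a hit only when both match. The lists are just the examples given above, and the names `SELLING`, `PRODUCT` and `meta_hit` are made up for the illustration.

```python
import re

# Sub-rule 1: selling jargon (examples from the text above).
SELLING = re.compile(r"\b(?:lowest prices|unbeatable value)\b", re.IGNORECASE)
# Sub-rule 2: product names (examples from the text above).
PRODUCT = re.compile(r"\b(?:flip flops|vodka|power packs)\b", re.IGNORECASE)


def meta_hit(text: str) -> bool:
    """Meta-rule: fires only when BOTH sub-rules match the same message."""
    return bool(SELLING.search(text)) and bool(PRODUCT.search(text))


samples = [
    "Lowest prices on flip flops this week!",                  # jargon + product
    "Our enterprise account team offers unbeatable value.",    # jargon only
    "Could you bring the vodka on Friday?",                    # product only
]

for text in samples:
    print(meta_hit(text), "-", text)
```

Either list on its own would hit ordinary mail; requiring both is what keeps the FP rate down.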

Martin
Re: Rule HK_SCAM is triggered by standard business email
On Wed, Jul 01, 2020 at 01:29:51PM -0700, John Hardin wrote:
>
> Agreed, that's why I want Henrik to comment. I don't have the corpus he used
> to develop that rule.

Those are really old rules; I don't have it either. ;-)

__HK_SCAM_S7 seems to have regressed FP-wise, so I'm just gonna drop it.
Re: Rule HK_SCAM is triggered by standard business email
On 01 Jul 2020, at 14:20, Aner Perez <aner@ncstech.com> wrote:
> we have the spam threshold set very low (2.4)

This is a terrible idea and exposes a fundamental misunderstanding of how SA works.

If SA scores an email as 3.3 then the message is not considered spam by SA. If you ignore this and mark it as spam anyway, you have no one to blame but yourself. Reducing the threshold increases the number of non-spam messages that are marked as spam. It will also have very little effect on actual spam messages. The only exception to this is if you have a badly trained Bayes, as that can swing the scoring quite a lot.

Set your threshold back to 5.0 and train your Bayes with actual spam you receive and actual ham you receive. The best spam to train on is spam that SA would not tag as spam once you ignore the Bayes portion of the score. So, a message marked at 5.5 with BAYES_50 is a prime candidate for training, as it would be marked 4.7 without the BAYES_50.
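
The arithmetic behind that example, as a small Python sketch. The 0.8 for BAYES_50 is inferred from the 5.5 and 4.7 figures above, so check the score in your own configuration; the function name is hypothetical.

```python
THRESHOLD = 5.0
BAYES_50_SCORE = 0.8  # assumption: inferred from the 5.5 -> 4.7 example above


def is_training_candidate(total: float, bayes_points: float,
                          threshold: float = THRESHOLD) -> bool:
    """True when a known-spam message would fall below the threshold without its Bayes points."""
    return (total - bayes_points) < threshold


# The example from the text: 5.5 total, BAYES_50 included.
print(is_training_candidate(5.5, BAYES_50_SCORE))
# A hypothetical spam that the other rules already catch comfortably.
print(is_training_candidate(7.2, BAYES_50_SCORE))
```

This only makes sense for messages you have already confirmed as spam by hand; it is not a classifier.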

It would have been better, I think, had SA designed the system to score anything over 0 as spam and anything under 0 as ham as I suspect very few people would make this mistake, but it's a bit late for that now.

Just think of it this way: when you set the threshold below 5, you are saying to SA "please mark legitimate mail that I want to receive as spam."



--
'Oh, them as makes the endings don't get them,' said Granny.
--Maskerade