Mailing List Archive

A few noob questions
Please forgive me if these are easy/common questions. I have done some
searching and haven't found any clear answers.

I'm running SpamAssassin 3.4.4 in a cPanel environment.

1. What is the smallest increment for a rule score? I see some
indications that it's 0.1, others seem to say it is 0.01. Can I go to
0.001? Lower?

The reason for asking is that I want to use SpamAssassin to flag some
things that are suspicious but only when other conditions are met for
specific users. I'd like to have SA insert the rule text, e.g.
LOCAL_SOME_RULE, so that I can have an exim filter check for a specific
form of to address plus this rule match before removing the message. But
at the same time I don't want messages that match this rule to generate
false positives for other users.

2. I would like to match against some suspicious URLs that contain long
sequences of random characters, but only have the rule match if I find
multiple URLs that follow the same pattern. Normally I would use
/(some-regex){5}/ but it seems that the rawbody command only looks at
smaller chunks of the message (in this case the spammer is sending
messages that are in the 11KB range and I have adjusted exim to pass
enough in $message_body to capture enough URLs to fire a rule).

Is it possible to configure SA to look at bigger chunks? 8 KB or even 16
KB would work. If not, is there a way to write a rule that counts the
total number of matches of a regex against the raw body?
Re: A few noob questions
On 19 Dec 2020, at 23:39, Alan wrote:

> Please forgive me if these are easy/common questions. I have done some
> searching and haven't found any clear answers.
>
> I'm running SpamAssassin 3.4.4 in a cPanel environment.
>
> 1. What is the smallest increment for a rule score? I see some
> indications that it's 0.1, others seem to say it is 0.01. Can I go to
> 0.001? Lower?

Any number that Perl understands will work but very small scores are
pointless. So if you really want to score a rule at 12.34e-56 you can.

> The reason for asking is that I want to use SpamAssassin to flag some
> things that are suspicious but only when other conditions are met for
> specific users. I'd like to have SA insert the rule text, e.g.
> LOCAL_SOME_RULE, so that I can have an exim filter check for a specific
> form of to address plus this rule match before removing the message.
> But at the same time I don't want messages that match this rule to
> generate false positives for other users.

Generally 0.01 or -0.01 is adequately small for such purposes.
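
A minimal sketch of that, using the rule name from the question; the
pattern and the description are hypothetical placeholders:

```conf
# Placeholder pattern: substitute the real test.
body     LOCAL_SOME_RULE  /placeholder-pattern/i
describe LOCAL_SOME_RULE  Suspicious content, acted on only by per-user filters
# Nonzero, so the rule still runs and is listed among the tests that hit,
# but far too small to change the verdict on its own.
score    LOCAL_SOME_RULE  0.01
```

In a default setup the rule name should then show up in the tests list of
the X-Spam-Status header, which is what a downstream filter would look for.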

> 2. I would like to match against some suspicious URLs that contain
> long sequences of random characters, but only have the rule match if I
> find multiple URLs that follow the same pattern. Normally I would use
> /(some-regex){5}/ but it seems that the rawbody command only looks at
> smaller chunks of the message (in this case the spammer is sending
> messages that are in the 11KB range and I have adjusted exim to pass
> enough in $message_body to capture enough URLs to fire a rule).
>
> Is it possible to configure SA to look at bigger chunks? 8 KB or even
> 16 KB would work. If not, is there a way to write a rule that counts
> the total number of matches of a regex against the raw body?

A rule can be allowed to match multiple times, as described in the
documentation (perldoc Mail::SpamAssassin::Conf). Here's the example
provided there:

uri __KAM_COUNT_URIS /^./
tflags __KAM_COUNT_URIS multiple maxhits=16
describe __KAM_COUNT_URIS A multiple match used to count URIs in a message

meta __KAM_HAS_0_URIS (__KAM_COUNT_URIS == 0)
meta __KAM_HAS_1_URIS (__KAM_COUNT_URIS >= 1)
meta __KAM_HAS_2_URIS (__KAM_COUNT_URIS >= 2)
meta __KAM_HAS_3_URIS (__KAM_COUNT_URIS >= 3)
meta __KAM_HAS_4_URIS (__KAM_COUNT_URIS >= 4)
meta __KAM_HAS_5_URIS (__KAM_COUNT_URIS >= 5)
meta __KAM_HAS_10_URIS (__KAM_COUNT_URIS >= 10)
meta __KAM_HAS_15_URIS (__KAM_COUNT_URIS >= 15)
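
The same mechanism should cover the raw-body case from the question, since
tflags multiple also applies to rawbody rules. A rough sketch, with
hypothetical rule names and a placeholder pattern:

```conf
# Hypothetical names; replace the pattern with the real one.
rawbody  __LCL_RAW_PATTERN   /placeholder-pattern/i
# Count every match, but stop at 5 since nothing above 5 is tested.
tflags   __LCL_RAW_PATTERN   multiple maxhits=5
meta     LCL_RAW_PATTERN_X5  (__LCL_RAW_PATTERN >= 5)
describe LCL_RAW_PATTERN_X5  Pattern found at least 5 times in the raw body
score    LCL_RAW_PATTERN_X5  0.01
```

Because each hit is counted separately, the matches should not all need to
fall inside the same chunk of the body, which is the limitation that
/(some-regex){5}/ runs into.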




--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire
Re: A few noob questions
Thanks Bill. I know very little about Perl, so while I saw the reference
to Mail::SpamAssassin::Conf without the "perldoc" in front of it, I had
no clue what to do with that information.

Re: A few noob questions
On Sat, 19 Dec 2020, Alan wrote:

> 1. What is the smallest increment for a rule score? I see some indications
> that it's 0.1, others seem to say it is 0.01. Can I go to 0.001? Lower?

As Bill said, anything works. Zero does disable the rule; a score of 0.001
is generally termed "informative" - you want to include it in the hits
output so that you know that the rule hits, but you don't want it (by
itself) to affect the score. See, for example, LOTSA_MONEY.

> The reason for asking is that I want to use SpamAssassin to flag some things
> that are suspicious but only when other conditions are met for specific
> users. I'd like to have SA insert the rule text, e.g. LOCAL_SOME_RULE, so that
> I can have an exim filter check for a specific form of to address plus this
> rule match before removing the message.

You should be able to do that purely in SA; it's a tad more difficult if
you want to match the envelope to address rather than the To: header. If
you want to reliably match the envelope to address you'd need to have it
recorded in a Received header (either the one that your MTA generates or
the one that some trusted MTA prior to your MTA generates).

You'd make LOCAL_SOME_RULE an unscored subrule by prepending two
underscores: __LCL_SOME_RULE, and then you'd develop some subrule(s) to
hit on the specific form of to address(es) you're interested in. Then
these can be combined in a scored meta rule:

meta LCL_POISON_01 __LCL_SOME_RULE && (__LCL_SUSP_TO_01 || __LCL_SUSP_TO_02)
score LCL_POISON_01 10.000

> But at the same time I don't want messages that match this rule to generate
> false positives for other users.

If you've done the __LCL_SUSP_TO_* rule(s) properly that shouldn't happen.
You can set the score to informative while testing it.
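
A rough sketch of what those subrules might look like, assuming a
hypothetical plus-addressed recipient at example.com. The address forms are
placeholders, and the Received pattern assumes the MTA records the
recipient in a "for" clause:

```conf
# Placeholder address patterns. Note the escaped @ in each regex.
header  __LCL_SUSP_TO_01  To =~ /\bsomeuser\+[^\@\s]+\@example\.com\b/i
# Envelope recipient as recorded in a Received header "for" clause;
# the angle bracket is optional because MTAs differ on this.
header  __LCL_SUSP_TO_02  Received =~ /\bfor <?someuser\+[^\@\s>]+\@example\.com\b/i

# Informative score while testing; raise it once the hits look right.
score   LCL_POISON_01     0.001
```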

> 2. I would like to match against some suspicious URLs that contain long
> sequences of random characters, but only have the rule match if I find
> multiple URLs that follow the same pattern.

Bill answered that adequately.

One comment on his answer:

describe __KAM_COUNT_URIS

Subrules never appear in the hits output so a description on them is only
for internal documentation purposes; a regular #comment would work just as
well for that.
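
For instance, the subrule could be documented like this (a sketch; only the
comment differs from Bill's example):

```conf
# A multiple match used to count URIs in a message.
uri     __KAM_COUNT_URIS  /^./
tflags  __KAM_COUNT_URIS  multiple maxhits=16
```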

As for long sequences of random characters - that's FP-prone. It's
difficult to detect *random* in a simple RE. A long string of characters
from a given set, easy. Characteristics about that string? complicated. A
rule like that might potentially hit on legitimate (for values of
"legitimate") tracking analysis URIs or caching URIs, unless there is some
kind of uncommon pattern to it that you can discern and look for in the
RE.
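
To illustrate the "easy" half of that, a sketch with a hypothetical rule
name that matches a long run of letters and digits following a slash. It
says nothing about whether the string is actually random, so it is kept as
an unscored subrule meant to be combined with other indicators:

```conf
# Hypothetical: a URI path segment of 30 or more letters and digits.
# This checks character set and length only, not randomness, so it
# will also hit many legitimate tracking and caching URIs.
uri  __LCL_LONG_TOKEN  /\/[a-z0-9]{30,}(?:[\/?]|$)/i
```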

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
does quite what I want. I wish Christopher Robin was here."
-- Peter da Silva in a.s.r
-----------------------------------------------------------------------
5 days until Christmas
Re: A few noob questions
Many thanks for your help.

On 2020-12-20 15:26, John Hardin wrote:
> On Sat, 19 Dec 2020, Alan wrote:
>
>> The reason for asking is that I want to use SpamAssassin to flag some
>> things that are suspicious but only when other conditions are met for
>> specific users. I'd like to have SA insert the rule text, e.g.
>> LOCAL_SOME_RULE so that I can have an exim filter check for a
>> specific form of to address plus this rule match before removing the
>> message.
>
> You should be able to do that purely in SA; it's a tad more difficult
> if you want to match the envelope to address rather than the To:
> header. If you want to reliably match the envelope to address you'd
> need to have it recorded in a Received header (either the one that
> your MTA generates or the one that some trusted MTA prior to your MTA
> generates).

Agreed, ideally this is something I can stick into a KB article and have
afflicted users implement on their own. I'd like to keep system-wide
modifications to a minimum. A user's exim filters also move when we
transfer an account to another server, so as long as there's a common
rule set, not having to adjust SA configuration is a benefit.

Basically what I have now is this:

uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5
score LCL_MANY_SUSPECT_LINKS 0.001
describe LCL_MANY_SUSPECT_LINKS More than 5 links match a suspected spam pattern

> As for long sequences of random characters - that's FP-prone. It's
> difficult to detect *random* in a simple RE. A long string of
> characters from a given set, easy. Characteristics about that string?
> complicated. A rule like that might potentially hit on legitimate (for
> values of "legitimate") tracking analysis URIs or caching URIs, unless
> there is some kind of uncommon pattern to it that you can discern and
> look for in the RE.

No kidding. I've seen this specific pattern in many a spam message over
the years so I suspect it's particularly FP vulnerable. If there was a
regex rule for "matches English word" I could nail them with ease. OTOH
my regex skills are pretty decent. Finding the two common patterns and
checking that at least one of each is there will hopefully eliminate
messages that consistently only use one form, eliminating a range of FPs.

If I can use the "many suspect links" match along with a few other
indicators, including that this particular [expletive] makes the message
look like it comes from a mailing list, I think I can kill their spew.
I'm seeing upwards of 20 messages per day per user from this source, but
they're rotating through junk data center IP addresses and disposable
mail server identities daily. This is war.

One more noob question. Can I test a rule without messing with the
production environment by using

spamassassin -t --cf='include myrule.cf' path

or should I build a test environment?
Re: A few noob questions
On Sun, 20 Dec 2020, Alan wrote:

n.b.: you're not subscribed to the list from
netbeans.5zcb0c@ambitonline.com but I pushed it through moderation. If
you're going to post regularly from that address you should register it as
an alternate.

From the mailing list help:

You can start a subscription for an alternate address,
for example "john@host.domain", just add a hyphen and your
address (with '=' instead of '@') after the command word:
<users-subscribe-john=host.domain@spamassassin.apache.org>


> Many thanks for your help.
>
> On 2020-12-20 15:26, John Hardin wrote:
>> On Sat, 19 Dec 2020, Alan wrote:
>>
>>> The reason for asking is that I want to use SpamAssassin to flag some
>>> things that are suspicious but only when other conditions are met for
>>> specific users. I'd like to have SA insert the rule text, e.g.
>>> LOCAL_SOME_RULE so that I can have an exim filter check for a specific
>>> form of to address plus this rule match before removing the message.
>>
>> You should be able to do that purely in SA; it's a tad more difficult if
>> you want to match the envelope to address rather than the To: header. If
>> you want to reliably match the envelope to address you'd need to have it
>> recorded in a Received header (either the one that your MTA generates or
>> the one that some trusted MTA prior to your MTA generates).
>
> Agreed, ideally this is something I can stick into a KB article and have
> afflicted users implement on their own. I'd like to keep system-wide
> modifications to a minimum. A user's exim filters also move when we transfer
> an account to another server, so as long as there's a common rule set, not
> having to adjust SA configuration is a benefit.

Ah, ok. That makes sense.

> Basically what I have now is this:
>
> uri __LCL_SUSPECT_LINK1 /target_pattern_1/i
> tflags __LCL_SUSPECT_LINK1 multiple maxhits=5
> uri __LCL_SUSPECT_LINK2 /target_pattern_2/i
> tflags __LCL_SUSPECT_LINK2 multiple maxhits=5
> meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && rules_matching(__LCL_SUSPECT_LINK?) > 5

No, it doesn't need to be that complex. This is all you need:

meta LCL_MANY_SUSPECT_LINKS __LCL_SUSPECT_LINK1 > 4 && __LCL_SUSPECT_LINK2 > 4

Treat the rule names as variables whose value is the number of hits. Mostly
you're doing logical comparisons (R1 && R2 && !R3) but math is totally
acceptable as well, e.g. (R1 + R2 + R3 > 1) for an "any two out of three"
meta rule.

...so, if you want to count multiple hits across several rules, perhaps:

meta LCL_MANY_SUSPECT_LINKS (__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2) > 4

Also note that with "maxhits=5" the number of times the rule will hit will
be at most 5, so "> 5" will never match.
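
Putting those points together, one possible revision of the rule set, as a
sketch. The patterns are still the placeholders from the question, and the
threshold of 5 total hits is an assumption about the intent:

```conf
uri      __LCL_SUSPECT_LINK1  /target_pattern_1/i
tflags   __LCL_SUSPECT_LINK1  multiple maxhits=5
uri      __LCL_SUSPECT_LINK2  /target_pattern_2/i
tflags   __LCL_SUSPECT_LINK2  multiple maxhits=5

# At least one link of each form, and 5 or more hits in total.
# Each subrule counts up to 5, so the sum can reach 10.
meta     LCL_MANY_SUSPECT_LINKS  __LCL_SUSPECT_LINK1 && __LCL_SUSPECT_LINK2 && ((__LCL_SUSPECT_LINK1 + __LCL_SUSPECT_LINK2) >= 5)
score    LCL_MANY_SUSPECT_LINKS  0.001
describe LCL_MANY_SUSPECT_LINKS  At least 5 links match a suspected spam pattern
```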

> One more noob question. Can I test a rule without messing with the production
> environment by using
>
> spamassassin -t --cf='include myrule.cf' path
>
> or should I build a test environment?

I do a lot of rule dev so I have a dedicated test environment. I can't say
whether --cf would work, I've never tried it. Seems plausible.

You'll also want "--debug area=all,rules,rules-all,message,uri" to see
the hits in the log output.


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
"Bother," said Pooh as he struggled with /etc/sendmail.cf, "it never
does quite what I want. I wish Christopher Robin was here."
-- Peter da Silva in a.s.r
-----------------------------------------------------------------------
5 days until Christmas
Re: A few noob questions
On 2020-12-20 21:11, John Hardin wrote:
> On Sun, 20 Dec 2020, Alan wrote:
>
> n.b.: you're not subscribed to the list from
> netbeans.5zcb0c@ambitonline.com but I pushed it through moderation. If
> you're going to post regularly from that address you should register
> it as an alternate.
>
Oh nuts. I always set up a forwarder per list with random suffix, just
so that if it ever leaks out I can change the suffix and beat the
harvesters. I picked the wrong identity to send from. Guess my Netbeans
address now needs an update. Self-inflicted wounds. :(
>
> I do a lot of rule dev so I have a dedicated test environment. I can't
> say whether --cf would work, I've never tried it. Seems plausible.
>
> You'll also want "--debug area=all,rules,rules-all,message,uri" to see
> the hits in the log output.
>
Perfect. Thanks!
Re: A few noob questions
On 20 Dec 2020, at 0:38, Alan wrote:

> Thanks Bill. I know very little about Perl, so while I saw the
> reference to Mail::SpamAssassin::Conf without the "perldoc" in front
> of it, I had no clue what to do with that information.

Sorry about that. The same info is at
https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html




--
Bill Cole
bill@scconsult.com or billcole@apache.org
(AKA @grumpybozo and many *@billmail.scconsult.com addresses)
Not Currently Available For Hire