Mailing List Archive

Script or command for testing new rules to ensure new rules don't generate false positives/negatives?
I'm experimenting with writing a library of my own SA rules and scores.
I'd like to be sure that the rules I write don't turn ham into spam and
vice versa. I figured the best way to do this would be to run SA against
an existing collection of ham and spam to make sure emails are still
scored accurately with the new rules.

I imagine a utility like this must exist, so I figured I'd ask here before
re-inventing the wheel and writing my own (probably buggy) script.

The script would need to check against all email files in .INBOX.* and
.Spam directory in a user's IMAP directory.

Thanks again, everyone.
Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?
On Fri, 2021-04-23 at 16:28 -0400, Steve Dondley wrote:
> I'm experimenting with writing a library of my own SA rules and
> scores.
>
I do this on a separate computer, which has SpamAssassin installed but
not linked into anything else. It also has a copy of all the live SA
configuration files. Alongside this I have a directory filled with
examples of spam to function as testing input.

Along with this I have a bash script or two, which are used to do things like:

1) start SA in debug mode to check the testing config for errors.
No messages are processed - it's just looking for configuration
errors.

2) run SA against a spam sample and only display the list of spam hits

3) run SA against a spam sample and display the entire output message
using less so it can be scrolled through

4) run SA against the complete spam collection and only display
references to messages which are not scored as spam

5) replace the live SA configuration with the current testing
configuration, i.e. make the most recent set of changes live.

In practice (1) through (3) are easy to combine into a single script
with an option to select the required action, while (4) and (5) are best
kept separate.

It helps a lot to name the items in the spam collection so that each
set of similar spam relates to the local rule that's intended to trap
this spam type.
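
For illustration only, steps (1) and (4) above could look something like
the following. This is a sketch in Python rather than the bash scripts
described, and it rests on several assumptions: the stock `spamassassin`
command is on the PATH, `--lint` exits non-zero on configuration errors,
and the `-t` output carries an X-Spam-Status header in the usual
"Yes/No, score=N" form. The function names and the one-file-per-message
corpus layout are hypothetical.

```python
import re
import subprocess
from pathlib import Path

# Matches the first line of the X-Spam-Status header, e.g.
# "X-Spam-Status: Yes, score=7.3 required=5.0 tests=..."
STATUS_RE = re.compile(
    r"^X-Spam-Status:\s*(Yes|No),\s*score=(-?\d+(?:\.\d+)?)", re.MULTILINE
)


def config_is_clean():
    """Step (1): lint the configuration without processing any mail.

    Assumes `spamassassin --lint` returns a non-zero exit code on errors.
    """
    return subprocess.run(["spamassassin", "--lint"]).returncode == 0


def parse_status(marked_up):
    """Return (is_spam, score) from SA output, or None if no status header."""
    # Only look at the header section, i.e. everything before the first blank line.
    headers = marked_up.replace("\r\n", "\n").split("\n\n", 1)[0]
    match = STATUS_RE.search(headers)
    if match is None:
        return None
    return match.group(1) == "Yes", float(match.group(2))


def scan(path):
    """Pipe one message file through `spamassassin -t` and parse the verdict."""
    result = subprocess.run(
        ["spamassassin", "-t"],
        input=Path(path).read_bytes(),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    return parse_status(result.stdout.decode("utf-8", errors="replace"))


def missed_spam(corpus_dir):
    """Step (4): names of corpus messages that were NOT scored as spam."""
    missed = []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        status = scan(path)
        if status is None or not status[0]:
            missed.append(path.name)
    return missed
```

Calling `missed_spam("spam-samples")` (the directory name is made up)
after `config_is_clean()` returns true would list only the samples the
testing configuration failed to catch.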

> I'd like to be sure that the rules I write don't turn ham into spam
> and vice versa.
>
It won't if you test the rules against related spam and give some
thought to the score you apply to each rule.

> I imagine a utility like this must exist, so I figured I'd ask here
> before re-inventing the wheel and writing my own (probably buggy)
> script.
>
The sort of scripts I use are fairly short and simple.
>
> The script would need to check against all email files in .INBOX.* and
> .Spam directory in a user's IMAP directory.
>
No. Treat this like any other code development project: use a rule
development SA installation like I describe so you never develop rules
using the live mail stream. This way your rules will be better written
and tested and you'll cause fewer false positives in your live mail
stream.

Martin
Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?
On 2021-04-23 05:41 PM, Martin Gregorie wrote:
>> The script would need to check against all email files in .INBOX.* and
>> .Spam directory in a user's IMAP directory.
>>
> No. Treat this like any other code development project: use a rule
> development SA installation like I describe so you never develop rules
> using the live mail stream. This way your rules will be better written
> and tested and you'll cause fewer false positives in your live mail
> stream.
>
> Martin

Sounds like the best plan. Thanks for the advice.
Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?
On Sat, 24 Apr 2021, Steve Dondley wrote:

> On 2021-04-23 05:41 PM, Martin Gregorie wrote:
>> On Fri, 2021-04-23 at 16:28 -0400, Steve Dondley wrote:
>>> I'm experimenting with writing a library of my own SA rules and
>>> scores.
>>>
>> Treat this like any other code development project: use a rule
>> development SA installation like I describe so you never develop rules
>> using the live mail stream. This way your rules will be better written
>> and tested and you'll cause fewer false positives in your live mail
>> stream.
>
> Sounds like the best plan. Thanks for the advice.


And if you want to test your rules against a corpus rather than testing
against a few one-off spamples, then look into setting up a local
masscheck instance. You don't need to upload the results to SA, but it
will give you a good overview of how a rule behaves against multiple
messages.
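
As a much smaller stand-in for that kind of overview (this is not
masscheck itself, only a sketch of the idea), per-rule hit rates over a
ham set and a spam set can be tallied as below. It assumes the rule
names that hit each message have already been collected, for example
from the tests= part of X-Spam-Status; the function name and the input
shape are hypothetical.

```python
from collections import Counter


def rule_hit_table(ham_hits, spam_hits):
    """Summarise how often each rule fires on ham and on spam.

    ham_hits / spam_hits: one collection of rule names per message.
    Returns sorted rows of (rule, spam_fraction, ham_fraction).
    """
    # set() ensures a rule is counted at most once per message.
    ham_counts = Counter(rule for hits in ham_hits for rule in set(hits))
    spam_counts = Counter(rule for hits in spam_hits for rule in set(hits))
    rows = []
    for rule in sorted(set(ham_counts) | set(spam_counts)):
        spam_frac = spam_counts[rule] / len(spam_hits) if spam_hits else 0.0
        ham_frac = ham_counts[rule] / len(ham_hits) if ham_hits else 0.0
        rows.append((rule, spam_frac, ham_frac))
    return rows
```

A rule with a high spam fraction and a ham fraction near zero is
behaving as intended; one that hits a noticeable share of ham probably
needs a lower score or a tighter pattern.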


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Human beings are born with different capacities.
If they are free, they are not equal. And if they are equal,
they are not free. -- Aleksandr Solzhenitsyn
-----------------------------------------------------------------------
329 days since the first private commercial manned orbital mission (SpaceX)
Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?
>
> And if you want to test your rules against a corpus rather than
> testing against a few one-off spamples, then look into setting up a
> local masscheck instance. You don't need to upload the results to SA,
> but it will give you a good overview of how a rule behaves against
> multiple messages.

I'm not sure what you mean by "Local masscheck instance". But I plan to
do the following:

1) set up SA in a docker container which has a volume containing my
spam/ham folders
2) run a script that syncs ham/spam with the live server
3) set up a script that will compare scores from before a rule is
implemented with scores from after it is implemented
4) script will output a report that tells me the results and reports
whether a spam/ham email is "flipped"
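
The before/after comparison in the plan above could be sketched roughly
like this. It assumes the scores have already been collected into two
dicts keyed by message file name, and that a message counts as spam when
its score reaches the threshold (5.0 is SA's default required_score).
The function names and the report format are made up for illustration.

```python
def classify(score, threshold=5.0):
    """Label a score the way SA would: spam at or above the threshold."""
    return "spam" if score >= threshold else "ham"


def find_flips(before, after, threshold=5.0):
    """Return messages whose ham/spam label changed between two runs.

    before / after: dicts mapping message name -> score.
    Only messages present in both runs are compared.
    """
    flips = []
    for name in sorted(before.keys() & after.keys()):
        old_label = classify(before[name], threshold)
        new_label = classify(after[name], threshold)
        if old_label != new_label:
            flips.append(
                (name, before[name], after[name], f"{old_label} -> {new_label}")
            )
    return flips


def format_report(flips):
    """Render the flips as plain text, one line per flipped message."""
    if not flips:
        return "no messages flipped"
    return "\n".join(
        f"{name}: {old:.1f} -> {new:.1f} ({change})"
        for name, old, new, change in flips
    )
```

Running `find_flips` separately over the ham and the spam folders keeps
the two kinds of mistake apart: a "ham -> spam" flip in the ham folder is
a new false positive, and a "spam -> ham" flip in the spam folder is a
new false negative.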
Re: Script or command for testing new rules to ensure new rules don't generate false positives/negatives?
On Sat, 24 Apr 2021, Steve Dondley wrote:

>
>>
>> And if you want to test your rules against a corpus rather than
>> testing against a few one-off spamples, then look into setting up a
>> local masscheck instance. You don't need to upload the results to SA,
>> but it will give you a good overview of how a rule behaves against
>> multiple messages.
>
> I'm not sure what you mean by "Local masscheck instance".

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/MassCheck

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org