Mailing List Archive

Train Spamassassin
Hello everybody,

I have a serious spam problem and I want to feed those spam mails to
sa-learn to get rid of it.

Has anyone found a way to train SpamAssassin with data from DBMail?

Cheers

Claas

_______________________________________________
DBmail mailing list
DBmail@dbmail.org
http://lists.nfg.nl/mailman/listinfo/dbmail
Re: Train Spamassassin
On 04.01.2017 at 20:50, Claas Kähler wrote:
> Hello everybody,
>
> I have a serious spam problem and I want to feed those spam mails to
> sa-learn to get rid of it.
>
> Has anyone found a way to train SpamAssassin with data from DBMail?

Just put your samples in folders and run "sa-learn" as the correct user
(the one that owns the Bayes database).


[root@mail-gw:~]$ bayes-stats.sh
0      81476  SPAM
0      25659  HAM
0    3227434  TOKEN

total 376M
 24K -rw-r----- 1 sa-milt sa-milt  24K 2017-01-04 16:29 bayes_seen
 65M -rw-r----- 1 sa-milt sa-milt  81M 2017-01-04 16:29 bayes_toks
312M -rw-r----- 1 sa-milt sa-milt 312M 2017-01-04 16:29 wordlist.db

BAYES_00        1682   58.99 %
BAYES_05          84    2.94 %
BAYES_20          81    2.84 %
BAYES_40         104    3.64 %
BAYES_50         394   13.81 %
BAYES_60          56    1.96 %    9.75 % (OF TOTAL BLOCKED)
BAYES_80          60    2.10 %   10.45 % (OF TOTAL BLOCKED)
BAYES_95          38    1.33 %    6.62 % (OF TOTAL BLOCKED)
BAYES_99         352   12.34 %   61.32 % (OF TOTAL BLOCKED)
BAYES_999        285    9.99 %   49.65 % (OF TOTAL BLOCKED)

DELIVERED       5104   91.91 %
DNSWL           4984   89.75 %
SPF             4088   73.61 %
SPF/DKIM WL     2250   40.51 %
SHORTCIRCUIT    2690   48.44 %

BLOCKED          574   10.33 %
SPAMMY           506    9.11 %   88.15 % (OF TOTAL BLOCKED)
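
A minimal sketch of the "samples in folders, sa-learn as the correct user"
advice, assuming one directory per class and a dedicated account that owns
the Bayes database (sa-milt in the listing above). The directory paths and
the helper names are hypothetical; --spam, --ham and --mbox are regular
sa-learn options, and sudo -u is only one possible way to switch users.

```python
import shlex
import subprocess


def build_sa_learn_command(kind, path, run_as=None, mbox=False):
    """Return the argv list for one sa-learn training run."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    cmd = ["sa-learn", "--" + kind]
    if mbox:
        # Tell sa-learn that `path` is an mbox file, not a directory of
        # single-message files.
        cmd.append("--mbox")
    cmd.append(path)
    if run_as:
        # Run as the account that owns the Bayes database so the tokens
        # end up in the database the scanner actually reads.
        cmd = ["sudo", "-u", run_as] + cmd
    return cmd


def train(kind, path, run_as=None, mbox=False, dry_run=True):
    """Print the command (dry run) or execute it."""
    cmd = build_sa_learn_command(kind, path, run_as, mbox)
    if dry_run:
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical sample directories; dry run only prints the commands.
    train("spam", "/var/spool/samples/spam", run_as="sa-milt")
    train("ham", "/var/spool/samples/ham", run_as="sa-milt")
```

Keeping dry_run=True until the printed commands look right avoids learning
into the wrong user's database by accident.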

Re: Train Spamassassin
Okay, that part is easy, but how can I extract samples from DBMail to
put them into a folder?


On 04.01.17 at 22:26, Reindl Harald wrote:
> just put your samples in folders and use "sa-learn" with the correct user

Re: Train Spamassassin
On 04.01.2017 at 22:32, Claas Kähler wrote:
> Okay, that part is easy, but how can I extract samples from DBMail to
> put them into a folder?

By IMAP. You are not supposed to access email directly via MySQL: you
underestimate the complexity of the structure, and you must not rely on
internal (non-public) APIs.
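
A minimal sketch of pulling one folder over IMAP into single-message files,
using only Python's standard imaplib. The host name, credentials, the
"Junk" folder name, the output directory and all helper names are
placeholders, not anything taken from this thread. BODY.PEEK[] is used so
that fetching does not set the \Seen flag.

```python
import imaplib
import os


def fetch_raw_messages(conn, mailbox="Junk"):
    """Yield (uid, raw_bytes) for every message in `mailbox`."""
    # readonly=True: do not change flags on the server.
    status, _ = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError("cannot select mailbox %r" % mailbox)
    status, data = conn.uid("search", None, "ALL")
    if status != "OK":
        raise RuntimeError("UID SEARCH failed")
    for uid in (data[0] or b"").split():
        uid = uid.decode("ascii")
        status, parts = conn.uid("fetch", uid, "(BODY.PEEK[])")
        if status != "OK":
            continue
        for part in parts:
            # The message body arrives as a (envelope, payload) tuple.
            if isinstance(part, tuple):
                yield uid, part[1]


def save_as_eml(messages, dest_dir):
    """Write each raw message to <dest_dir>/<uid>.eml and return the paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for uid, raw in messages:
        path = os.path.join(dest_dir, "%s.eml" % uid)
        with open(path, "wb") as fh:
            fh.write(raw)
        paths.append(path)
    return paths


def pull_folder(host, user, password, mailbox, dest_dir):
    """Connect, download one folder, and always log out again."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        return save_as_eml(fetch_raw_messages(conn, mailbox), dest_dir)
    finally:
        conn.logout()
```

A call such as pull_folder("imap.example.org", "claas", "secret", "Junk",
"samples/spam") would leave one raw message per file, which can then be
reviewed by hand before anything is learned. Folder names containing spaces
would need to be quoted by the caller.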

Re: Train Spamassassin
There's also a dbmail-export command, isn't there, to turn it into mbox files?
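
If an mbox export is available, sa-learn can read it directly via its
--mbox option. When the messages should be looked at one by one first, a
sketch like the following splits the mbox into single-message files with
Python's standard mailbox module. The file names and the helper name are
hypothetical, and how the mbox was produced is left open here.

```python
import mailbox
import os


def split_mbox(mbox_path, dest_dir):
    """Write every message of an mbox file to its own .eml file."""
    os.makedirs(dest_dir, exist_ok=True)
    # create=False: fail loudly instead of creating an empty mbox
    # when the path is wrong.
    box = mailbox.mbox(mbox_path, create=False)
    paths = []
    try:
        for key in sorted(box.keys()):
            path = os.path.join(dest_dir, "%05d.eml" % key)
            with open(path, "wb") as fh:
                # get_bytes() returns the message without the mbox
                # "From " separator line.
                fh.write(box.get_bytes(key))
            paths.append(path)
    finally:
        box.close()
    return paths
```

For example, split_mbox("junk.mbox", "samples/spam") leaves one file per
message, so wrongly filed mail can be removed before training.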


On Wed, Jan 4, 2017 at 5:21 PM, Reindl Harald <h.reindl@thelounge.net>
wrote:

> by IMAP - you are not supposed to directly access email via mysql since
> you underestimate the complexity of the structure and must not rely on
> internal aka non-public API's

Re: Train Spamassassin
On 05.01.2017 at 03:10, Ryan Butler wrote:
> There's also a dbmail-export command, isn't there, to turn it into mbox files?

Yes, but you need to fix your picture of how to train SA *properly*.

Throwing each and every email at it without 100% certain classification is
the wrong way: every single misclassified message does so much harm that
you need 10-20 correct ones to repair the damage.

Re: Train Spamassassin
Okay, I exported my Junk folder with a Thunderbird plugin; it worked
quite well.

But that is not a very nice way! Does anyone know a good script that
pulls a folder from IMAP in an sa-learn compatible format?

These two do not work; sa-learn could not encode them:

https://github.com/rtucker/imap2maildir

https://github.com/rcarmo/imapbackup



On 05.01.17 at 04:50, Reindl Harald wrote:
> yes, but you need to fix your picture how to train SA *properly*
>
> blowing each and every email there without 100% classification is the
> wrong way - every single misclassified message does that much harm
> that you need 10-20 correct ones fixing the damage

Re: Train Spamassassin
On 05.01.2017 at 12:56, Claas Kähler wrote:
> Okay, I exported my Junk folder with a Thunderbird plugin; it worked
> quite well.
>
> But that is not a very nice way! Does anyone know a good script that
> pulls a folder from IMAP in an sa-learn compatible format?

A plugin?
Just use "Save as" - the .eml files *are* sa-learn compatible by definition
because each one is nothing more than a single raw message, which sa-learn
reads as-is.

> These two do not work; sa-learn could not encode them:
> https://github.com/rtucker/imap2maildir
> https://github.com/rcarmo/imapbackup

See attachment.

The training scripts themselves are not included; it just pulls samples and
talks to spamd to ignore junk that is already classified as BAYES_99.
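
The attachment is not part of this archive. As a rough stand-in for the
"ignore what is already BAYES_99" idea, the sketch below does not talk to
spamd at all; it assumes instead that the pulled messages still carry an
X-Spam-Status header written by your own filter and skips those that list
the BAYES_99 rule. The function names and the sample data are made up for
illustration.

```python
import email
import re

# Match the rule name as a whole word, so BAYES_999 alone does not count.
_BAYES_99 = re.compile(r"\bBAYES_99\b")


def already_bayes_99(raw_bytes):
    """True if an X-Spam-Status header already lists BAYES_99."""
    msg = email.message_from_bytes(raw_bytes)
    return any(
        _BAYES_99.search(str(value))
        for value in msg.get_all("X-Spam-Status", [])
    )


def worth_training(samples):
    """Keep only (name, raw_bytes) pairs that are not yet BAYES_99."""
    return [(name, raw) for name, raw in samples if not already_bayes_99(raw)]


if __name__ == "__main__":
    known = (b"X-Spam-Status: Yes, score=9.1 required=5.0 "
             b"tests=BAYES_99,HTML_MESSAGE\r\nSubject: a\r\n\r\nx")
    fresh = (b"X-Spam-Status: No, score=1.0 required=5.0 "
             b"tests=BAYES_50\r\nSubject: b\r\n\r\ny")
    kept = worth_training([("known.eml", known), ("fresh.eml", fresh)])
    print([name for name, _ in kept])
```

Skipping mail the filter already scores at the top Bayes level keeps the
manual review effort on the messages where training can still change the
outcome.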

--

Reindl Harald
the lounge interactive design GmbH
A-1060 Vienna, Hofmühlgasse 17
CTO / CISO / Software-Development
m: +43 676 40 221 40
p: +43 1 595 3999 33
http://www.thelounge.net/