Mailing List Archive

Starting Clean with Bayes
I am starting over with a clean install of SA on an AWS Linux2 EC2.  I'm
struggling with getting Bayes set up correctly.  I have a very old
bayes_toks file from a Jam Windows install from about 4 years ago.  I
created a userId for spamd, and I put the bayes_toks file in
/home/spamd/bayes.  I set the bayes_path in local.cf to
/home/spamd/bayes/bayes.  I changed the file owner to spamd:spamd.  I
get the error message:

     cannot open bayes databases /home/spamd/bayes/bayes_* R/O: tie failed

I tried running SpamAssassin from the command line with sudo, and get
the same error.  So I don't think it's a permissions issue.

So I moved the file out of the folder and now get:

     no dbs present, cannot tie DB R/O: /home/spamd/bayes/bayes_toks

So in the first case it finds the file but can't open it.  I found some
posts on forums that suggested there's a possibility the file is so old
the format is obsolete.  Fine with me.  At this point, I just want to
start clean.  But I can't find a way to start using bayes from scratch
with no toks file starting off.  I even did another clean install on a
separate EC2 instance to see if SA would create an initial toks file. But I
couldn't find one.

My old toks file is probably of marginal value now anyway.  I just need
to know where to find a brand new toks file to put into my bayes_path
folder so it can start building up the ham/spam file and start
contributing to my SA scores.

Where do I find a starter toks file?

Thx
Re: Starting Clean with Bayes
On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>
> Where do I find a starter toks file?

You don't need a "starter" file. As soon as it needs them, SA
automagically creates the necessary files if it can write into the
defined path.
Just feed it some spams and hams as per docs and you'll see the files.
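
As a rough sketch of that first training run (not taken from the thread):
the helper names check_bayes_dir and train_bayes and the example paths are
hypothetical, while --spam, --ham, and --dump magic are standard sa-learn
options. Note that bayes_path is a filename prefix, so with
/home/spamd/bayes/bayes the files appear as /home/spamd/bayes/bayes_toks
and /home/spamd/bayes/bayes_seen.

```shell
#!/bin/sh
# Sketch only: helper names and paths are illustrative assumptions.
# SA_LEARN may be overridden to point at a different sa-learn binary.

# Succeeds only if the directory part of bayes_path exists and is
# writable by the current user, which SA needs to create the files.
check_bayes_dir() {
    dir=$1
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "bayes directory $dir is missing or not writable" >&2
        return 1
    fi
}

# Learns one directory of spam and one of ham, then prints the
# database counters (nspam, nham, ntokens) so progress is visible.
train_bayes() {
    spam_dir=$1
    ham_dir=$2
    "${SA_LEARN:-sa-learn}" --spam "$spam_dir" &&
    "${SA_LEARN:-sa-learn}" --ham "$ham_dir" &&
    "${SA_LEARN:-sa-learn}" --dump magic
}
```

Run as the user that owns the database (spamd here), this could be used as
`check_bayes_dir /home/spamd/bayes && train_bayes ~/corpus/spam ~/corpus/ham`.
With default settings, Bayes does not contribute to scores until at least
200 spam and 200 ham messages have been learned.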
Re: Starting Clean with Bayes
On Wed, 20 Oct 2021, Axb wrote:

> On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>>
>> Where do I find a starter toks file?
>
> You don't need a "starter" file.

Your Bayes starter is your training corpora, which you should retain in
case you ever need to start over from scratch as you're doing now.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
Re: Starting Clean with Bayes
On 2021-10-20 16:58, John Hardin wrote:
> On Wed, 20 Oct 2021, Axb wrote:
>
>> On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>>>
>>> Where do I find a starter toks file?
>>
>> You don't need a "starter" file.
>
> Your Bayes starter is your training corpora, which you should retain
> in case you ever need to start over from scratch as you're doing now.

no one asked how to make a backup/restore, which imho would have answered
all this, just like one would use corpus retraining data

hmm :)

i just wish that it's not only bayes that can be backed up and restored but
also TxRep and awl data

this would make it possible to change from postgresql to redis if needed,
who will use mysql or berkdb?
Re: Starting Clean with Bayes
On Sat, 23 Oct 2021, Benny Pedersen wrote:

> On 2021-10-20 16:58, John Hardin wrote:
>> On Wed, 20 Oct 2021, Axb wrote:
>>
>>> On 10/19/21 8:06 PM, Jerry Malcolm wrote:
>>>>
>>>> Where do I find a starter toks file?
>>>
>>> You don't need a "starter" file.
>>
>> Your Bayes starter is your training corpora, which you should retain
>> in case you ever need to start over from scratch as you're doing now.
>
> no one asked how to make a backup/restore, which imho would have answered all
> this, just like one would use corpus retraining data

A backup is fine for migration.

A backup of a database that has gone off the rails is useless.

It's fairly well accepted that there's no such thing as a "generic starter
Bayes database" due to the variability of people's ham.
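
For the migration case, a minimal sketch using sa-learn's --backup and
--restore options; the helper names and the backup file path are
hypothetical. Restoring replaces the current database contents, so it
carries over whatever the old database had learned, good or bad.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.
# SA_LEARN may be overridden to point at a different sa-learn binary.

# Dumps the Bayes database in sa-learn's text backup format to a file.
backup_bayes() {
    backup_file=$1
    "${SA_LEARN:-sa-learn}" --backup > "$backup_file"
}

# Loads a backup file into the currently configured Bayes store.
# This wipes the existing database contents first.
restore_bayes() {
    backup_file=$1
    "${SA_LEARN:-sa-learn}" --restore "$backup_file"
}
```

A typical move between storage backends would be `backup_bayes bayes-backup.txt`
on the old configuration, then `restore_bayes bayes-backup.txt` after local.cf
points at the new store.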


--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhardin@impsec.org pgpk -a jhardin@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79