Mailing List Archive: What the...?!?!?!?! My bayes ate itself!

Feb 27, 2004, 4:14 PM

Post #1 of 5 (555 views)

ACK!!!!!!!!

So I'm trying to do my regular Friday ritual of training Bayes, and
everything was going fine:

# sa-learn --showdots --spam --mbox 'SPAM_False_Negatives.mbox'
.... {*snip*}
Learned from 262 message(s) (290 message(s) examined).

# sa-learn --showdots --ham --mbox 'SPAM_False_Positives.mbox'
............
Learned from 12 message(s) (12 message(s) examined).

# sa-learn --showdots --ham --mbox '! Ham.mbox'
.... {*snip*}
Learned from 435 message(s) (446 message(s) examined).

Until it barfed oddly:

# sa-learn --showdots --spam --mbox Caught_Spam.mbox
.... {*snip*}
Learned from 1981 message(s) (3500 message(s) examined).
unlock: 9771 unlink failed: /root/.spamassassin/bayes.lock

There's no /root/.spamassassin/bayes.lock file, so I'm not sure why it
couldn't unlock it, unless sa-learn didn't make it in the first place... so
what's up with that?

Then I went to check on the total number of learned spam/ham:

# sa-learn --dump magic
{*snip*}
0.000 0 18 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 360 0 non-token data: ntokens
{*snip*}

What the?!?!??! Where'd all my learned stuff go???

# spamassassin -D --lint
{*snip*}
debug: bayes: 10314 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 10314 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: bayes: Not available for scanning, only 19 spam(s) in Bayes DB < 200
{*snip*}

So my Bayes is completely gone (more importantly: essentially turned OFF!)?

But there's stuff there:

# ls -l /root/.spamassassin/
total 113708
-rw------- 1 root mail 84168704 Feb 2 11:52 auto-whitelist
-rw------- 1 root mail 34885632 Feb 27 15:00 auto-whitelist.db
-rw------- 1 root mail 10481664 Feb 27 15:00 bayes_seen
-rw------- 1 root mail 5611520 Feb 27 15:00 bayes_toks
-rw-r--r-- 1 root mail 1218 Jan 2 14:47 user_prefs

So now I'm re-training on my two IMAP archives of what I had previously
trained it with. I just fed it 30374 spam messages. But this kept
scrolling across my screen over and over and over:

Argument "" isn't numeric in numeric lt (<) at
/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/BayesStore.pm line 1267.
Argument "" isn't numeric in numeric lt (<) at
/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/BayesStore.pm line 1267.
Argument "" isn't numeric in numeric lt (<) at
/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/BayesStore.pm line 1267.
Argument "" isn't numeric in numeric lt (<) at
/usr/lib/perl5/site_perl/5.6.1/Mail/SpamAssassin/BayesStore.pm line 1267.

And when it finished:

Learned from 327 message(s) (30386 message(s) examined).
unlock: 10665 unlink failed: /root/.spamassassin/bayes.lock

What's going on here? (And no, we haven't done anything to perl. This
machine pretty much sits there and minds its own business. The only
"changes" I make to it are to update custom .cf files and to train bayes.)

I get the same BayesStore.pm error when trying to feed it the 2824 ham
messages I have archived. But it didn't puke with an unlink message:

Learned from 18 message(s) (2827 message(s) examined).

Now I'm up to a whoppingly unuseful message count that won't enable bayes:

0.000 0 26 0 non-token data: nspam
0.000 0 18 0 non-token data: nham
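
A minimal sketch for checking these counters from a script, assuming
Python 3 is available. The column position of the count is read off the
sample lines above, and the 200 threshold comes from the debug line
quoted earlier ("only 19 spam(s) in Bayes DB < 200"); applying the same
threshold to ham is an assumption. The function names are made up for
illustration.

```python
import re

# Threshold from the quoted debug output; using it for ham too is an assumption.
MIN_MESSAGES = 200

def parse_magic(dump_text):
    """Collect the 'non-token data' counters from `sa-learn --dump magic` text."""
    counts = {}
    for line in dump_text.splitlines():
        # Lines of interest look like: 0.000 0 18 0 non-token data: nspam
        m = re.match(
            r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s+(.+?)\s*$", line
        )
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts

def bayes_ready(counts, minimum=MIN_MESSAGES):
    """True only if both the spam and ham counters reach the minimum."""
    return counts.get("nspam", 0) >= minimum and counts.get("nham", 0) >= minimum

sample = """\
0.000 0 26 0 non-token data: nspam
0.000 0 18 0 non-token data: nham
"""
counts = parse_magic(sample)
print(counts, bayes_ready(counts))
```

Lines such as {*snip*} do not match the pattern and are skipped.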

This is JUST what I need at 3pm on a Friday.

ANY clues as to what the hell happened and how to (quickly) fix it?

This sucks.

--JR

RE: What the...?!?!?!?! My bayes ate itself! [ In reply to ]

gary at primeexalia

Feb 27, 2004, 6:11 PM

Post #2 of 5 (548 views)

Why don't you put the database in a better place like
/etc/mail/spamassassin? It could be a permission issue as well.
Looking at the ls output that you provided, only root can read the files.
What user is spamd running under?

In my /etc/mail/spamassassin/local.cf I have:
bayes_path /etc/mail/spamassassin/bayes

-rw------- 1 filter root 43113 Feb 27 17:02 bayes_journal
-rw------- 1 filter root 167936 Feb 27 16:44 bayes_seen
-rw------- 1 filter root 2605056 Feb 27 16:44 bayes_toks

Hope this helps

Gary Smith

-----Original Message-----
From: JR [mailto:alerts@nu-designs.com]
Sent: Friday, February 27, 2004 3:14 PM
To: spamassassin-users@incubator.apache.org
Subject: What the...?!?!?!?! My bayes ate itself!

RE: What the...?!?!?!?! My bayes ate itself! [ In reply to ]

dvanderv at rocheste

Feb 27, 2004, 6:31 PM

Post #3 of 5 (550 views)

I've had this problem several times when I fed spamassassin several hundred messages at once. The only solution I found (admittedly quite an annoying one) was to blow away the old database files and retrain from scratch - in smaller increments than previously so as not to break it again.

I have no clue why it happens. Sometimes I can get away with large numbers of messages, sometimes I can't. Generally, I try to stick to no more than one or two hundred at a time so as to avoid the issue entirely.
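
For anyone who wants to script the batching, here is a minimal sketch,
assuming Python 3 and its standard mailbox module. It is not part of
SpamAssassin; the function name and the batch-NNN.mbox output names are
made up for illustration, and the default of 200 is just the upper end of
the range mentioned above.

```python
import mailbox
import os

def split_mbox(src_path, out_dir, batch_size=200):
    """Copy messages from src_path into numbered mbox files holding at most
    batch_size messages each. Existing output files are appended to."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    src = mailbox.mbox(src_path, create=False)
    out = None
    try:
        for i, msg in enumerate(src):
            if i % batch_size == 0:
                # Start a new batch file every batch_size messages.
                if out is not None:
                    out.close()
                name = "batch-%03d.mbox" % (i // batch_size + 1)
                path = os.path.join(out_dir, name)
                out = mailbox.mbox(path)
                written.append(path)
            out.add(msg)
    finally:
        if out is not None:
            out.close()  # flushes the last batch to disk
        src.close()
    return written
```

Each resulting file can then be fed to sa-learn one at a time with the
same --mbox option used in the original post.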

RE: What the...?!?!?!?! My bayes ate itself! [ In reply to ]

alerts at nu-designs

Mar 1, 2004, 10:44 AM

Post #4 of 5 (553 views)

/root/.spamassassin is actually just a symlink to /etc/mail/spamassassin
because I train as root and that's where sa-learn automatically looks.....

--JR

At 05:11 PM (-0800) 2/27/2004 (Friday), Gary Smith wrote:
>Why don't you put the database in a better place like
>/etc/mail/spamassassin. It could be a permission issue as well.
>Looking at the ls output that you provided only root can read the files.
>What user is spamd running under?
>
>In my /etc/mail/spamassassin/local.cf I have:
>bayes_path /etc/mail/spamassassin/bayes
>
>-rw------- 1 filter root 43113 Feb 27 17:02 bayes_journal
>-rw------- 1 filter root 167936 Feb 27 16:44 bayes_seen
>-rw------- 1 filter root 2605056 Feb 27 16:44 bayes_toks
>
>Hope this helps
>
>Gary Smith

RE: What the...?!?!?!?! My bayes ate itself! [ In reply to ]

alerts at nu-designs

Mar 1, 2004, 11:52 AM

Post #5 of 5 (553 views)

At 08:31 PM (-0500) 2/27/2004 (Friday), Vandervort, David wrote:
>I've had this problem several times when I fed spamassassin several
>hundred messages at once. The only solution I found (admittedly quite an
>annoying one) was to blow away the old database files and retrain from
>scratch - in smaller increments than previously so as not to break it again.
>
>I have no clue why it happens. Sometimes I can get away with large numbers
>of messages, sometimes I can't. Generally, I try to stick to no more than
>one or two hundred at a time so as to avoid the issue entirely.

I've been training it with 2000 to 3000 spam and 100 to 300 ham every week
for the last ~7 weeks with no problems.

I just broke my ham down into smaller batches, and it didn't learn from
anything:

# sa-learn --ham --mbox Ham-001.mbox
Learned from 0 message(s) (450 message(s) examined).

# sa-learn --ham --mbox Ham-002.mbox
Learned from 0 message(s) (541 message(s) examined).

# sa-learn --ham --mbox Ham-003.mbox
Learned from 0 message(s) (540 message(s) examined).

# sa-learn --ham --mbox Ham-004.mbox
Learned from 0 message(s) (542 message(s) examined).

# sa-learn --ham --mbox Ham-005.mbox
Learned from 0 message(s) (540 message(s) examined).

# sa-learn --ham --mbox Ham-006.mbox
Learned from 0 message(s) (214 message(s) examined).

So if I just delete bayes_seen and bayes_toks and then re-train, will that
solve my problems? Should I set use_bayes to 0 while I'm deleting /
retraining so that nothing barfs?
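
A minimal sketch of the "delete" step done as a move instead, so the old
files can be put back if retraining goes badly. It assumes Python 3 and
that nothing else is writing to the database while it runs. The function
name and the backup directory name are made up for illustration, and it
does not settle whether use_bayes needs to be 0 in the meantime.

```python
import os
import shutil
import time

def set_aside_bayes(db_dir, names=("bayes_toks", "bayes_seen")):
    """Move the named Bayes files into a timestamped subdirectory of db_dir
    rather than deleting them. Returns (backup_dir, list_of_moved_names)."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_dir = os.path.join(db_dir, "bayes-backup-" + stamp)  # hypothetical name
    os.makedirs(backup_dir)
    moved = []
    for name in names:
        path = os.path.join(db_dir, name)
        if os.path.exists(path):
            shutil.move(path, os.path.join(backup_dir, name))
            moved.append(name)
    return backup_dir, moved
```

Other files in the directory, such as user_prefs and auto-whitelist, are
left where they are.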

Thx.

-JR