Mailing List Archive

using sa-learn with a remote spamd, was Re: how to pipe an mbox to sa-learn (fwd)
Good evening, Tom,
Here's a post I did a while back on remote sa-learn learning.
Cheers,
- Bill

---------------------------------------------------------------------------
"You know you're drinking too much coffee when the only time
you're standing still is in an earthquake."
(Courtesy of Rich Pinkall Pollei <whraven@worldnet.att.net>)
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------

---------- Forwarded message ----------
Date: Thu, 5 Feb 2004 11:33:18 -0500 (EST)
From: William Stearns <wstearns@pobox.com>
To: Andy Spiegl <spamassassin.andy@spiegl.de>
Cc: ML-spamassassin-talk <spamassassin-users@incubator.apache.org>,
William Stearns <wstearns@pobox.com>
Subject: Re: how to pipe an mbox to sa-learn

Good day, Andy,

On Thu, 5 Feb 2004, Andy Spiegl wrote:

> Usually I call
> sa-learn --spam --mbox spammybox
> but to do the same on remote machines I'd like to do
> cat spammybox | ssh server "sa-learn --spam --mbox"
>
> But sa-learn only spits perl-errors in that case.
>
> This doesn't:
> cat spammybox | ssh server "sa-learn --spam"
> but sa-learn thinks the whole mbox is just ONE message.
>
> Is that a bug or a feature or am I missing something? :-)

I wasn't able to get it to work either. To enable remote
reporting, I put together the following script. It's also available at
http://www.stearns.org/sa-blacklist/learn-spam.current . You'll need to
customize $HOME, SpamFolders, HamFolders, and ReportServers. The contents
of SpamFolders and HamFolders are assumed to be emails you've personally
verified to be spams/hams. Once reported, these folders will be renamed
and compressed to save space, but still provide access if you need them.
I happen to use ssh-agent to provide instant access to the remote
machines without requiring a password (see
http://www.stearns.org/doc/ssh-techniques.current.html and
http://www.stearns.org/doc/ssh-techniques-two.current.html ; once
ssh-agent is running, I type "set | grep '^SSH' >~/agent"). If run from
the command line, the ssh access will work with a normal password or
whatever you use.
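
For anyone setting up the agent file from scratch, here is a minimal
sketch of the save-and-reload step. The helper names save_agent_env and
load_agent_env are made up for illustration; the script itself just
sources $HOME/agent directly. Note that the grep also picks up any other
variable whose name starts with SSH (SSH_CLIENT and friends), which is
harmless here.

```bash
#!/bin/bash
# Sketch only: save_agent_env and load_agent_env are hypothetical helper
# names, not part of the script below or of ssh itself.

# Write the SSH* variables of the current shell to a file (default ~/agent).
save_agent_env() {
    local file="${1:-$HOME/agent}"
    # Tighten the umask first; the file names the agent socket.
    ( umask 077; set | grep '^SSH' >"$file" )
}

# Read the file back and export what ssh needs to find the agent.
load_agent_env() {
    local file="${1:-$HOME/agent}"
    [ -f "$file" ] || return 1
    . "$file"
    export SSH_AUTH_SOCK SSH_AGENT_PID SSH_ASKPASS
}
```

Run save_agent_env once in the shell where ssh-agent was started; a cron
job or another login can then call load_agent_env before using ssh.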

#!/bin/bash
#Copyright 2003 William Stearns <wstearns@pobox.com>
#GPL'd.
#Is razor-report enough, or do we need to do some equivalent of spamassassin -r -d -a?

if [ -z "$HOME" ]; then
    HOME="/home/wstearns/"
fi

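LOCKFILE="$HOME/learnspam.lock"
#noclobber makes the redirect fail if the lock file already exists, so
#checking for the lock and creating it happen in one atomic step.
( set -o noclobber; : >"$LOCKFILE" ) 2>/dev/null || exit 0
trap 'rm -f "$LOCKFILE"' EXIT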
renice +15 -p $$ >/dev/null 2>&1


#User settings:
#Wildcards OK, relative dirs OK, absolute dirs aren't.
SpamFolders="verified-spam"

#Wildcards OK, relative dirs OK, absolute dirs aren't.
HamFolders="verified-ham"

#The following are the machines (and optional usernames) to which we'll
#ssh to learn these spams into their respective bayesian databases.
#The user we ssh under needs to have ssh set up, and needs write
#privileges to the (we assume shared) bayesian and whitelist databases.
ReportServers="localhost spamtrap@somemachine spam@somebox.domain.org"

MailDir="$HOME/mail/"
ArchiveDir="$HOME/mail/archives/"
#End of user settings


if [ -f "$HOME/agent" ]; then
    . "$HOME/agent"
    export SSH_AUTH_SOCK SSH_AGENT_PID SSH_ASKPASS
else
    echo "SSH agent info not in $HOME/agent, please place it there."
fi
export LC_ALL=C


for OneFolder in $SpamFolders ; do
    if [ -f "$MailDir/$OneFolder" ]; then
        echo "Reporting $MailDir/$OneFolder to the razor database."
        nice razor-report "$MailDir/$OneFolder"
    fi
done

for Server in $ReportServers ; do
    for OneFolder in $SpamFolders ; do
        if [ -f "$MailDir/$OneFolder" ]; then
            echo "========== $Server: SSSS $OneFolder"
            #sa-learn --no-rebuild --showdots --mbox --spam "$MailDir/$OneFolder"
            cat "$MailDir/$OneFolder" | ssh -o BatchMode=yes -o Compression=yes $Server \
                'export TF=`mktemp -q /tmp/spam.XXXXXX </dev/null` && cat >>$TF && nice sa-learn --no-rebuild --showdots --mbox --spam $TF 2>&1 && [ -f $TF ] && rm -f $TF && echo Successful.' 2>/dev/null
        fi
    done

    for OneFolder in $HamFolders ; do
        if [ -f "$MailDir/$OneFolder" ]; then
            echo "========== $Server: HHHH $OneFolder"
            #sa-learn --no-rebuild --showdots --mbox --ham "$MailDir/$OneFolder"
            cat "$MailDir/$OneFolder" | ssh -o BatchMode=yes -o Compression=yes $Server \
                'export TF=`mktemp -q /tmp/ham.XXXXXX </dev/null` && cat >>$TF && nice sa-learn --no-rebuild --showdots --mbox --ham $TF 2>&1 && [ -f $TF ] && rm -f $TF && echo Successful.' 2>/dev/null
        fi
    done

    echo "========== $Server: rebuild"
    #sa-learn --rebuild
    ssh -o BatchMode=yes $Server 'sa-learn --rebuild 2>&1' 2>/dev/null
done

DateStamp=`date +%Y%m%d%H%M`
#Make sure the archive directory exists, otherwise the mv below fails.
mkdir -p "$ArchiveDir"
for OneFolder in $SpamFolders $HamFolders ; do
    if [ -f "$MailDir/$OneFolder" ]; then
        echo "Saving to $OneFolder.$DateStamp"
        mv "$MailDir/$OneFolder" "$ArchiveDir/$OneFolder.$DateStamp"
        nice bzip2 -9 "$ArchiveDir/$OneFolder.$DateStamp"
    fi
done
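
The ssh one-liner in the script is the heart of the workaround: the
mailbox is streamed over ssh into a temporary file on the far side, and
sa-learn --mbox is pointed at that file instead of stdin. Here is a
minimal sketch of that one step pulled out into a function, in a variant
that also removes the temporary file when sa-learn fails. The name
learn_remote and its argument order are made up for illustration; the
sa-learn and ssh options are the ones the script already uses, and the
remote login shell is assumed to be sh-compatible.

```bash
#!/bin/bash
# Sketch only: learn_remote is a hypothetical helper, not part of sa-learn.
# Usage: learn_remote SERVER spam|ham MBOXFILE
learn_remote() {
    local server="$1" kind="$2" mbox="$3"
    case "$kind" in
        spam|ham) ;;
        *) echo "learn_remote: kind must be spam or ham" >&2; return 2 ;;
    esac
    [ -f "$mbox" ] || return 1

    # The remote command is single-quoted so that $TF expands on the
    # server; only $kind is filled in on the local side.
    ssh -o BatchMode=yes -o Compression=yes "$server" '
        TF=`mktemp -q /tmp/'"$kind"'.XXXXXX </dev/null` || exit 1
        trap "rm -f \"$TF\"" EXIT
        cat >"$TF" &&
        nice sa-learn --no-rebuild --showdots --mbox --'"$kind"' "$TF" 2>&1 &&
        echo Successful.
    ' <"$mbox"
}
```

A call would look like: learn_remote spamtrap@somemachine spam "$HOME/mail/verified-spam"
The function returns the remote exit status, so the caller can decide
whether it is safe to archive the folder.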