how to pipe an mbox to sa-learn
Usually I call
sa-learn --spam --mbox spammybox
but to do the same on remote machines I'd like to do
cat spammybox | ssh server "sa-learn --spam --mbox"

But sa-learn only spits out Perl errors in that case.

This doesn't:
cat spammybox | ssh server "sa-learn --spam"
but sa-learn thinks the whole mbox is just ONE message.

Is that a bug or a feature or am I missing something? :-)
Thanks,
Andy.

--
o _ _ _
------- __o __o /\_ _ \\o (_)\__/o (_) -o)
----- _`\<,_ _`\<,_ _>(_) (_)/<_ \_| \ _|/' \/ /\\
---- (_)/ (_) (_)/ (_) (_) (_) (_) (_)' _\o_ _\_v
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Indeed, thanks to the Microsoft PC, the Internet has a level of
inconvenience that would be unacceptable in any other mass-market medium.
(http://www.economist.com/editorial/)
Re: how to pipe an mbox to sa-learn
Good day, Andy,

On Thu, 5 Feb 2004, Andy Spiegl wrote:

> Usually I call
> sa-learn --spam --mbox spammybox
> but to do the same on remote machines I'd like to do
> cat spammybox | ssh server "sa-learn --spam --mbox"
>
> But sa-learn only spits perl-errors in that case.
>
> This doesn't:
> cat spammybox | ssh server "sa-learn --spam"
> but sa-learn thinks the whole mbox is just ONE message.
>
> Is that a bug or a feature or am I missing something? :-)

I wasn't able to get it to work either. To enable remote
reporting, I put together the following script. It's also available at
http://www.stearns.org/sa-blacklist/learn-spam.current . You'll need to
customize $HOME, SpamFolders, HamFolders, and ReportServers. The contents
of SpamFolders and HamFolders are assumed to be emails you've personally
verified to be spams/hams. Once reported, these folders will be renamed
and compressed to save space, but still provide access if you need them.
I happen to use ssh-agent to provide instant access to the remote
machines without requiring a password (see
http://www.stearns.org/doc/ssh-techniques.current.html and
http://www.stearns.org/doc/ssh-techniques-two.current.html ; once
ssh-agent is running, I type "set | grep '^SSH' >~/agent"). If run from
the command line, the ssh access will work with a normal password or
whatever you use.




#!/bin/bash
#Copyright 2003 William Stearns <wstearns@pobox.com>
#GPL'd.
#Is razor-report enough, or do we need to do some equivalent of spamassassin -r -d -a?

if [ -z "$HOME" ]; then
HOME="/home/wstearns/"
fi

LOCKFILE=$HOME/learnspam.lock
[ -f "$LOCKFILE" ] && exit 0
trap "rm -f $LOCKFILE" EXIT
touch $LOCKFILE
renice +15 -p $$ >/dev/null 2>&1


#User settings:
#Wildcards OK, relative dirs OK, absolute dirs aren't.
SpamFolders="verified-spam"

#Wildcards OK, relative dirs OK, absolute dirs aren't.
HamFolders="verified-ham"

#The following are the machines (and optional usernames) to which we'll
#ssh to learn these spams into their respective bayesian databases.
#The user we ssh under needs to have ssh set up, and needs write
#privileges to the (we assume shared) bayesian and whitelist databases.
ReportServers="localhost spamtrap@somemachine spam@somebox.domain.org"

MailDir="$HOME/mail/"
ArchiveDir="$HOME/mail/archives/"
#End of user settings


if [ -f "$HOME/agent" ]; then
. "$HOME/agent"
export SSH_AUTH_SOCK SSH_AGENT_PID SSH_ASKPASS
else
echo "SSH agent info not found in $HOME/agent; please save it there."
fi
export LC_ALL=C


for OneFolder in $SpamFolders ; do
if [ -f "$MailDir/$OneFolder" ]; then
echo "Reporting $MailDir/$OneFolder to the razor database."
nice razor-report "$MailDir/$OneFolder"
fi
done

for Server in $ReportServers ; do
for OneFolder in $SpamFolders ; do
if [ -f "$MailDir/$OneFolder" ]; then
echo "========== $Server: SSSS $OneFolder"
#sa-learn --no-rebuild --showdots --mbox --spam "$MailDir/$OneFolder"
#Spool the mbox into a remote temp file (sa-learn --mbox needs a real file), learn from it, then always remove it.
cat "$MailDir/$OneFolder" | ssh -o BatchMode=yes -o Compression=yes $Server \
'TF=`mktemp -q /tmp/spam.XXXXXX </dev/null` || exit 1; cat >>"$TF" && nice sa-learn --no-rebuild --showdots --mbox --spam "$TF" 2>&1 && echo Successful.; rm -f "$TF"' 2>/dev/null
fi
done

for OneFolder in $HamFolders ; do
if [ -f "$MailDir/$OneFolder" ]; then
echo "========== $Server: HHHH $OneFolder"
#sa-learn --no-rebuild --showdots --mbox --ham "$MailDir/$OneFolder"
#Spool the mbox into a remote temp file (sa-learn --mbox needs a real file), learn from it, then always remove it.
cat "$MailDir/$OneFolder" | ssh -o BatchMode=yes -o Compression=yes $Server \
'TF=`mktemp -q /tmp/ham.XXXXXX </dev/null` || exit 1; cat >>"$TF" && nice sa-learn --no-rebuild --showdots --mbox --ham "$TF" 2>&1 && echo Successful.; rm -f "$TF"' 2>/dev/null
fi
done

echo "========== $Server: rebuild"
#sa-learn --rebuild
ssh -o BatchMode=yes $Server 'sa-learn --rebuild 2>&1' 2>/dev/null
done

DateStamp=`date +%Y%m%d%H%M`
for OneFolder in $SpamFolders $HamFolders ; do
if [ -f "$MailDir/$OneFolder" ]; then
echo "Saving to $OneFolder.$DateStamp"
mv "$MailDir/$OneFolder" "$ArchiveDir/$OneFolder.$DateStamp"
nice bzip2 -9 "$ArchiveDir/$OneFolder.$DateStamp"
fi
done
Re: how to pipe an mbox to sa-learn
> I happen to use ssh-agent to provide instant access to the remote
> machines without requiring a password (see
> http://www.stearns.org/doc/ssh-techniques.current.html and
> http://www.stearns.org/doc/ssh-techniques-two.current.html ; once
> ssh-agent is running, I type "set | grep '^SSH' >~/agent"). If run from
> the command line, the ssh access will work with a normal password or
> whatever you use.

Great primer on ssh! Thanks for putting that together. Gotta forward
that to my only-slightly-savvy colleagues.
Re: how to pipe an mbox to sa-learn
> Usually I call
> sa-learn --spam --mbox spammybox
> but to do the same on remote machines I'd like to do
> cat spammybox | ssh server "sa-learn --spam --mbox"
>
> But sa-learn only spits perl-errors in that case.
>
> This doesn't:
> cat spammybox | ssh server "sa-learn --spam"
> but sa-learn thinks the whole mbox is just ONE message.
>
> Is that a bug or a feature or am I missing something? :-)

Have you tried

cat spammybox | ssh server "sa-learn --spam --mbox -"

I tried it locally (sa-learn --spam --mbox - < spammybox) and it worked fine.
- is a pretty standard shorthand for stdin.

At least something to try.
--
Adam Lopresto
http://cec.wustl.edu/~adam/

Where now the horse and the rider? Where is the horn that was blowing?
Where is the helm and the hauberk, and the bright hair flowing?
Where is the hand on the harpstring, and the red fire glowing?
Where is the spring and the harvest and the tall corn growing?
They have passed like rain on the mountain, like a wind in the meadow;
The days have gone down in the West behind the hills into shadow.
Who shall gather the smoke of the dead wood burning,
Or behold the flowing years from the Sea returning?
Re: how to pipe an mbox to sa-learn
On Thu, Feb 05, 2004 at 11:33:18AM -0500, William Stearns wrote:
> > Usually I call
> > sa-learn --spam --mbox spammybox
> > but to do the same on remote machines I'd like to do
> > cat spammybox | ssh server "sa-learn --spam --mbox"

Yeah, 'sa-learn --mbox' _NEEDS_ a file (it wants to seek around in the
file) in the current iteration. I have come up with a way to solve this
(namely if --mbox has no files specified, send STDIN to a temporary file
and let sa-learn deal with that), but haven't actually modified the code
yet. :(

bugzilla ticket 2869 is tracking this issue.
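Until that change lands, the same idea can be approximated outside sa-learn. The following is a minimal bash sketch, not SpamAssassin code: `sa_learn_stdin` is a hypothetical wrapper name, and `SA_LEARN` is a hypothetical override variable (defaulting to the real `sa-learn`) so the command can be swapped out for testing. It spools standard input into a temporary file, hands that file to `sa-learn --mbox` along with whatever options were given, and deletes the file whether or not learning succeeded.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of SpamAssassin): give sa-learn's --mbox
# reader a real, seekable file even when the mbox arrives on STDIN.

SA_LEARN=${SA_LEARN:-sa-learn}   # hypothetical override hook; defaults to the real tool

sa_learn_stdin() {
    local tf rc
    tf=$(mktemp "${TMPDIR:-/tmp}/sa-learn.XXXXXX") || return 1
    # Copy everything from STDIN (pipe, socket or file) into the temp mbox.
    if ! cat > "$tf"; then
        rm -f "$tf"
        return 1
    fi
    # Pass through the caller's options (e.g. --spam or --ham), then --mbox FILE.
    "$SA_LEARN" "$@" --mbox "$tf"
    rc=$?
    rm -f "$tf"      # clean up even if sa-learn failed
    return "$rc"
}
```

Locally this would be called as `sa_learn_stdin --spam < spammybox`. To use it as in the original question, the function would have to live on the server, e.g. in a small script ending with `sa_learn_stdin "$@"` (a hypothetical setup), so that `cat spammybox | ssh server 'that-script --spam'` works. The only design point is that `--mbox` always receives a regular file it can seek around in.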

--
Randomly Generated Tagline:
The way I see it, if you declare something portable, you'll always be
wrong, and if you declare it non-portable, you'll always be right. :-)
-- Larry Wall in <199806232215.PAA02356@wall.org>
Re: how to pipe an mbox to sa-learn
Good afternoon, Chris,

On Thu, 5 Feb 2004, Chris Thielen wrote:

> > I happen to use ssh-agent to provide instant access to the remote
> > machines without requiring a password (see
> > http://www.stearns.org/doc/ssh-techniques.current.html and
> > http://www.stearns.org/doc/ssh-techniques-two.current.html ; once
> > ssh-agent is running, I type "set | grep '^SSH' >~/agent"). If run from
> > the command line, the ssh access will work with a normal password or
> > whatever you use.
>
> Great primer on ssh! Thanks for putting that together. Gotta forward
> that to my only-slightly-savvy colleagues.

There are more ssh articles in the http://www.stearns.org/doc/
directory too...
Cheers,
- Bill

---------------------------------------------------------------------------
"I can picture in my mind a world without war, a world without hate.
And I can picture us attacking that world, because they'd never expect
it."
--Bad Mojo
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------
Re: how to pipe an mbox to sa-learn
Good afternoon, Adam,

On Thu, 5 Feb 2004, Adam D. Lopresto wrote:

> > Usually I call
> > sa-learn --spam --mbox spammybox
> > but to do the same on remote machines I'd like to do
> > cat spammybox | ssh server "sa-learn --spam --mbox"
> >
> > But sa-learn only spits perl-errors in that case.
> >
> > This doesn't:
> > cat spammybox | ssh server "sa-learn --spam"
> > but sa-learn thinks the whole mbox is just ONE message.
> >
> > Is that a bug or a feature or am I missing something? :-)
>
> Have you tried
>
> cat spammybox | ssh server "sa-learn --spam --mbox -"
>
> I tried it locally (sa-learn --spam --mbox - < spammybox) and it worked fine.

Have you tried it with the exact "cat spammybox | ... --mbox -"
you mentioned above?

> - is a pretty standard shorthand for stdin.

If a Unix tool _chooses_ to implement that functionality, I agree
that's the most common syntax for it.
Cheers,
- Bill

---------------------------------------------------------------------------
"Tatu Ylonen was given public awards & wined & dined by the
Finnish government for making ssh available to the world. Phil
Zimmerman got a prison-term-commuted-to-community-service for doing the
same with PGP. There is **NO** doubt that US government laws, PAST and
PRESENT, HAVE hindered the development, commercialization and sales of
cryptographic software."
-- Duncan Napier <napier@napiersys.bc.ca>
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------
Re: how to pipe an mbox to sa-learn
Hi Adam,

> Have you tried
>
> cat spammybox | ssh server "sa-learn --spam --mbox -"

Yes, of course. :-)
But sa-learn only spits out the same Perl errors as without the "-".
I didn't mention it because the manpage says it's without the "-".

cat spammybox | sa-learn --spam --mbox -
Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 324.
Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 324.
Use of uninitialized value in string at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 331.
Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 334.
unable to open : No such file or directory
Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 324.
...
..
.

BUT you are right that it works like this: (BTW, also without the "-")
sa-learn --spam --mbox - < spammybox
Learned from 0 message(s) (5 message(s) examined).

I don't understand where the difference is!?
In both cases sa-learn sees it coming from stdin, right?

Thanks,
Andy.

--
o _ _ _
------- __o __o /\_ _ \\o (_)\__/o (_) -o)
----- _`\<,_ _`\<,_ _>(_) (_)/<_ \_| \ _|/' \/ /\\
---- (_)/ (_) (_)/ (_) (_) (_) (_) (_)' _\o_ _\_v
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Sometimes it is better to apologize afterwards than to ask for
permission beforehand.
Re: how to pipe an mbox to sa-learn
Good afternoon, Andy,

On Thu, 5 Feb 2004, Andy Spiegl wrote:

> > Have you tried
> >
> > cat spammybox | ssh server "sa-learn --spam --mbox -"
>
> Yes, of course. :-)
> But sa-learn only spits out the same perl-errors as without the "-".
> I didn't mention it because the manpage says it's without the "-".
>
> cat spammybox | sa-learn --spam --mbox -
> Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 324.
> Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 324.
> Use of uninitialized value in string at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 331.
> Use of uninitialized value in concatenation (.) or string at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 334.
> unable to open : No such file or directory
> Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Mail/SpamAssassin/ArchiveIterator.pm line 324.
> ...
> ..
> .
>
> BUT you are right that it works like this: (BTW, also without the "-")
> sa-learn --spam --mbox - < spammybox
> Learned from 0 message(s) (5 message(s) examined).
>
> I don't understand where the difference is!?
> In both cases sa-learn sees it coming from stdin, right?

Nope. The "< spammybox" is what's providing the _file_
SpamAssassin needs. The "-" is ignored, to the best of my understanding.
Many thanks, Theo, for explaining that a file is needed.
Cheers,
- Bill

---------------------------------------------------------------------------
"This virus only works on Linux. It works on the honor system.
Please forward this email to everyone you know, then delete a bunch of
files."
-- Ross Carlson
--------------------------------------------------------------------------
William Stearns (wstearns@pobox.com). Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------
Re: how to pipe an mbox to sa-learn
On Thu, Feb 05, 2004 at 02:47:39PM -0500, William Stearns wrote:
> Nope. The "< spammybox" is what's providing the _file_
> SpamAssassin needs. The "-" is ignored, to the best of my understanding.
> Many thanks, Theo, for explaining that a file is needed.

The more verbose description of this, btw, is:

sa-learn's mbox reader wants to generate a list of messages and their
offsets (it uses the same code as mass-check), i.e. mbox.0, mbox.1523,
mbox.25432, etc. Then after scanning the mbox file to generate those
offsets, it'll go back through and seek() to the offset, deal with the
message, seek() to the next offset, deal with the message, etc, until
all of the messages in the file are done.
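The two-pass scheme can be imitated from the shell, which also shows why it depends on a real file. A rough sketch, assuming bash 4 (for `mapfile`), a `grep` that supports `-a` and `-b` (GNU and BSD grep both do), and messages that each start with a `From ` line; `mbox_offsets` and `mbox_message` are hypothetical helper names, not sa-learn's internals.

```shell
#!/bin/bash
# Pass 1: print the byte offset of every "From " separator line.
# -a treats the mbox as text even if it contains odd bytes; -b prefixes offsets.
mbox_offsets() {
    grep -ab '^From ' "$1" | cut -d: -f1
}

# Pass 2: print message N (0-based) by going back into the file at the
# offsets found in pass 1. This needs a file that can be re-read from an
# arbitrary position, not a pipe whose contents were already consumed.
mbox_message() {
    local file=$1 n=$2 start end
    local -a offs
    mapfile -t offs < <(mbox_offsets "$file")
    start=${offs[$n]}
    [ -n "$start" ] || return 1          # no such message
    end=${offs[$((n + 1))]}
    if [ -n "$end" ]; then
        # tail -c +K starts at byte K (1-based); head -c keeps one message.
        tail -c +"$((start + 1))" "$file" | head -c "$((end - start))"
    else
        tail -c +"$((start + 1))" "$file"  # last message runs to end of file
    fi
}
```

In the same spirit as the mbox.0, mbox.1523 list above, `mbox_offsets spammybox` prints one offset per message and `mbox_message spammybox 1` prints the second message.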

It's really a byproduct of the fact that we're sharing code with
mass-check on this one since sa-learn could just deal with the messages as
it reads through the file. :( It's unfortunately a lot of very similar
code which is why they're tied together like that.
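By contrast, dealing with each message while reading needs no seeking at all, so it works on a pipe. A minimal sketch, assuming messages are separated by lines beginning with `From ` (the usual mbox convention, where body lines starting with "From " are normally escaped as ">From "); `mbox_split_stream` is a hypothetical helper that writes each message to its own numbered file and prints how many it wrote.

```shell
#!/bin/bash
# Read an mbox from STDIN in a single forward pass and write each message
# to DIR/1, DIR/2, ... No seek() is needed, so a pipe works as well as a file.
mbox_split_stream() {
    local dir=$1
    mkdir -p "$dir" || return 1
    awk -v dir="$dir" '
        /^From / { n++; if (out) close(out); out = dir "/" n }
        out      { print > out }   # lines before the first "From " are skipped
        END      { print n + 0 }   # number of messages written
    '
}
```

Here `cat spammybox | mbox_split_stream /tmp/spam-split` behaves the same as `mbox_split_stream /tmp/spam-split < spammybox`, since awk only ever reads forward.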

--
Randomly Generated Tagline:
We don't have enough parallel universes to allow all uses of all
junction types--in the absence of quantum computing the combinatorics
are not in our favor...
-- Larry Wall in <20031213210102.GE18685@wall.org>
Re: how to pipe an mbox to sa-learn
On Thu, Feb 05, 2004 at 07:57:58PM +0100, Andy Spiegl wrote:
> BUT you are right that it works like this: (BTW, also without the "-")
> sa-learn --spam --mbox - < spammybox
> Learned from 0 message(s) (5 message(s) examined).
>
> I don't understand where the difference is!?
> In both cases sa-learn sees it coming from stdin, right?

I was noticing that myself. My random guess is that it's a shell thing.
'cat foo | sa-learn' means start a process and aim STDOUT from 'cat'
to STDIN of 'sa-learn'. 'sa-learn ... < foo' tells the shell to make
'foo' available as the STDIN of 'sa-learn'.

So therefore you can seek() on the latter, but not the former.
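That guess can be checked directly. A small sketch, assuming bash, whose `-f` and `-p` file tests treat `/dev/stdin` as file descriptor 0; `stdin_kind` is a hypothetical helper name.

```shell
#!/bin/bash
# Report what kind of object is connected to STDIN.
# "< file" gives a regular file, which can be seek()ed.
# "cat file |" gives a pipe, which cannot. Over ssh the remote command's
# STDIN is typically a pipe or a socket; neither is seekable.
stdin_kind() {
    if [ -f /dev/stdin ]; then
        echo "regular file (seekable)"
    elif [ -p /dev/stdin ]; then
        echo "pipe (not seekable)"
    else
        echo "other (terminal, socket, ...)"
    fi
}
```

`stdin_kind < spammybox` reports a regular file, while `cat spammybox | stdin_kind` reports a pipe, which matches the difference Andy saw.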

Anyway, I'm checking in a kluge for 3.0.0 to fix this. It could be made
to work in 2.6x as well, but would require some mods. :(

--
Randomly Generated Tagline:
"A word to the wise: a credentials dicksize war is usually a bad idea on the
net."
(David Parsons in c.o.l.development.system, about coding in C.)