Mailing List Archive

SPAMD with sa-learn
Hi All

I am using spamd/spamc with Qmail. My spamd/spamc are owned by the qmailq
user. Emails that are wrongly caught as spam, or that slip through
uncaught, need to be learned by a user. I am not sure how to pick up
those missed spams and false positives and pipe them through sa-learn so
that a single user learns all of them.

I could create a user called spam, forward all the missed spams to it,
and give it a dot-qmail file that runs sa-learn. But that is not the same
user who owns spamd. Likewise, I could create a user called ham, forward
all the emails that are falsely labeled as spam to it, and give it a
dot-qmail file that runs sa-learn to learn them as ham.

But the problem is that I will then have one user, 'spam', learning
spam, a different user, 'ham', learning ham, and a third user, 'qmailq',
detecting spam with spamd. I can make spamd run as user 'spam', but then
I still have another user, 'ham', learning ham.

Do you see my predicament?

I am trying to find a site-specific solution that will not only detect
spam through spamd but also learn missed spams as spam and false
positives as ham.

Any help or suggestions would be greatly appreciated.

Currently I am using

/usr/local/bin/spamc | /var/qmail/bin/qmail-queue.orig

as my qmail-queue, and I have spamd running as qmailq.

Thanks

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
There's no place like 127.0.0.1
Re: SPAMD with sa-learn
I am reposting this in case someone missed my post. I am looking for
some help or suggestions on this issue.

--
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
There's no place like 127.0.0.1
Re: SPAMD with sa-learn
On Mon, 2004-02-16 at 00:02, Asif Iqbal wrote:
> I can create a user called spam and forward all the spams that are
> missed and have a dot-qmail file with sa-learn. But it is not the same
> user who owns spamd. Also I can create a user called ham and forward all
> emails that are falsely labeled as spam and have a dot-qmail file
> sa-learn to learn it as ham.
>

You shouldn't forward mail to be learned by Bayes, because Bayes will
pick up the headers added by the forward and start associating those
with spam.

I would suggest setting up a cron job that runs as the same user spamc
runs as, which goes through a Spam folder and learns it as spam, then
goes through Inbox/Sent/whatever and learns it as ham.
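A minimal sketch of that cron job, in Python. Everything specific here is an assumption rather than something from the thread: the SPAMD_USER name (qmailq, as in Asif's setup), the mbox paths in FOLDERS, and the idea that users' Spam/Inbox/Sent folders have already been gathered into mbox files. The only real interface it relies on is sa-learn with its --spam, --ham and --mbox options. It refuses to train under any other account, because the whole point is that one user (the one spamd runs as) owns the Bayes database, and it only prints the commands unless given --run.

```python
#!/usr/bin/env python3
# Hypothetical nightly training job, meant to run from the crontab of the
# account spamd runs as, so sa-learn updates the Bayes database spamd reads.
import os
import pwd
import subprocess
import sys

SPAMD_USER = "qmailq"  # assumption: the account spamd runs as

# Hypothetical mbox locations; adjust to wherever the folders really live.
FOLDERS = {
    "spam": ["/var/spool/sa-train/Spam"],
    "ham": ["/var/spool/sa-train/Inbox", "/var/spool/sa-train/Sent"],
}


def build_commands(folders):
    """Return one sa-learn command line per mbox file, spam before ham."""
    commands = []
    for kind in ("spam", "ham"):
        for path in folders.get(kind, []):
            commands.append(["sa-learn", "--" + kind, "--mbox", path])
    return commands


def current_user():
    """Name of the effective user, or the bare uid if it has no passwd entry."""
    try:
        return pwd.getpwuid(os.geteuid()).pw_name
    except KeyError:
        return str(os.geteuid())


def main(dry_run=True):
    """Print (or run) the training commands; return an exit status."""
    me = current_user()
    if me != SPAMD_USER:
        print(f"refusing to train as {me}; run as {SPAMD_USER}", file=sys.stderr)
        return 1
    for cmd in build_commands(FOLDERS):
        if not os.path.exists(cmd[-1]):
            continue  # nothing collected in this folder yet
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return 0


if __name__ == "__main__":
    main(dry_run="--run" not in sys.argv)
```

Installed somewhere like /usr/local/sbin/sa-train (a hypothetical path), a crontab entry for qmailq such as "30 3 * * * /usr/local/sbin/sa-train --run" would run it nightly.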

- Jon

--
jon@tgpsolutions.com

Administrator, tgpsolutions
http://www.tgpsolutions.com
Re: SPAMD with sa-learn
----- Original Message -----
From: "Jonathan Tai" <jon@tgpsolutions.com>
> You shouldn't forward mail to be learned by bayes because bayes will
> pick up your FWD headers and tag those as spam.
>
> I would suggest setting up a cron job that executes as the user spamc is
> running as to search through a Spam folder, learn it as Spam, then
> search through Inbox/Sent/whatever and learn it as Ham.

I understand the downside of forwarding spam back to myself at a spam
account well enough. Is there a procmailrc recipe that can strip off the
forwarding headers and then hand the message to sa-learn as the user in
the "From:" address? That might be a handy way to make training easier
from non-Linux email software like Outlook Express.

{^_^}
Re: SPAMD with sa-learn
On Thu, 2004-02-19 at 14:36, jdow wrote:
> I understand the downside part about forwarding spam back to myself at
> a spam account well enough. Is there a procmailrc entry that can be used
> to strip off the forwarding headers and then forward that to the sa-learn
> tool as the "From:" addressee user? That might be a handy way to make
> training easier from non-Linux email software like Outlook Express.
>

IANAOU (I'm not an Outlook user), but someone suggested a few days ago
that for Outlook, if you create a new message and *drag* the spam into
the new message, it will forward it as an attachment. Then the Spam
Admin can extract the un-mangled spam into an IMAP folder for learning
via cron job.
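A sketch of that extraction step, assuming the user really did forward the spam as an attachment, so the original arrives intact as a message/rfc822 part. It uses only Python's standard email and mailbox modules; the function names and the choice to append everything to one training mbox (which the cron job can later feed to sa-learn) are illustrative assumptions. Only top-level attachments are taken, so a spam that itself contains an attached message is not split further.

```python
import email
import mailbox
from email import policy


def attached_messages(raw_bytes):
    """Yield each message/rfc822 part attached at the top level of a forward."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.compat32)
    if not outer.is_multipart():
        return
    for part in outer.get_payload():
        if part.get_content_type() == "message/rfc822":
            # For message/rfc822 parts the payload is a one-element list
            # holding the attached (original, unmangled) message.
            yield part.get_payload(0)


def extract_to_mbox(raw_bytes, mbox_path):
    """Append the attached originals to an mbox; return how many were added."""
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        box.lock()
        for inner in attached_messages(raw_bytes):
            box.add(inner)
            count += 1
        box.flush()
    finally:
        box.unlock()
        box.close()
    return count
```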

- Jon
Re: SPAMD with sa-learn
From: "Jonathan Tai" <jon@tgpsolutions.com>
> IANAOU (I'm not an Outlook user), but someone suggested a few days ago
> that for Outlook, if you create a new message and *drag* the spam into
> the new message, it will forward it as an attachment. Then the Spam
> Admin can extract the un-mangled spam into an IMAP folder for learning
> via cron job.

I saw that one. I'm trying to minimize my admin time by automating it
if possible. It seems like this should be possible. (If push comes to
shove, I'll toss together a C program to strip the excess headers.)

"If it comes from jdow to jdowspam on the internal net and has a
forwarded email in it then strip off the forwarding cruft and feed
it through sa-learn rather than spamc. Then drop it on the floor."

I'm just hoping that someone has been crazy enough to do something like
this already. In principle it should not be particularly hard to strip
off a level of MIME before farming it out.

{^_^}
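jdow's rule can be sketched as a small routing function that decides between "learn" and "scan". This is illustrative only: the two addresses are hypothetical stand-ins for jdow and jdowspam, and the function assumes the MTA can hand it the connecting client's IP address (under tcpserver, for example, that is exported as the TCPREMOTEIP environment variable), so "on the internal net" is checked with a private-address test rather than by trusting forgeable From:/To: headers alone.

```python
import email
import ipaddress
from email.utils import getaddresses, parseaddr

# Hypothetical addresses standing in for "jdow" and "jdowspam".
TRAINER = "jdow@example.net"
SPAM_DROPBOX = "jdowspam@example.net"


def has_attached_message(msg):
    """True if a top-level part is a message/rfc822 attachment."""
    return msg.is_multipart() and any(
        part.get_content_type() == "message/rfc822"
        for part in msg.get_payload()
    )


def route(raw_bytes, client_ip):
    """Return "learn" for training submissions, "scan" for everything else."""
    msg = email.message_from_bytes(raw_bytes)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipients = {addr.lower() for _, addr in getaddresses(msg.get_all("To", []))}
    internal = ipaddress.ip_address(client_ip).is_private
    if (internal and sender == TRAINER and SPAM_DROPBOX in recipients
            and has_attached_message(msg)):
        return "learn"  # unwrap the attachment and feed it to sa-learn
    return "scan"       # normal path: pipe through spamc as usual
```

A "learn" result would then pair with an unwrapping step and sa-learn, after which the submission itself can be dropped on the floor, as jdow describes.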
Re: SPAMD with sa-learn
On Thu, Feb 19, 2004 at 03:44:05PM -0800, jdow wrote:
> "If it comes from jdow to jdowspam on the internal net and has a
> forwarded email in it then strip off the forwarding cruft and feed
> it through sa-learn rather than spamc. Then drop it on the floor."

The problem is that Outlook forwards only some of the headers, and it
changes most of the ones it does forward. Date: becomes Sent:, and the
contents of From: are mangled. So far the only header I've seen survive
forwarding unchanged is Subject:. As I understand it, you really need
all of the original, unmodified headers to train via sa-learn, since it
tokenizes the headers as well as the body.

I'm using spamassasssin via mailscanner. I haven't come up with a
totally automated solution, yet. (and I'm not sure I can, really.)

But MailScanner has an option to archive all mail in its original,
pristine state. I have users forwarding false negatives to a mailbox.
From that I'm grepping out the forwarded Subject: lines and piping them
via xargs to grepmail on the archive. grepmail spits all matching
messages into an mbox, which I then weed through with mutt to delete any
that aren't spam (so far I haven't seen any, but it is definitely
possible). Finally, I feed that mbox to sa-learn.

I actually end up feeding more than the reported number of false
negatives, since often many people received spam with the same subject
but either aren't having their mail spam-scanned yet or didn't bother to
report the false negative.

This actually kind of sucks, but it's the best I've come up with so far.
I've run into some tricky issues with subjects that contain special
regex characters like (, ), !, ?, ., etc. I'm not really sure where I'm
going to go from here with this, but maybe it gives you some ideas...
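One way around the regex trouble is to skip pattern matching entirely and compare subjects as literal strings. The sketch below does Eric's grep/grepmail step in Python's mailbox module instead: it collects the Subject: lines quoted in forwarded bodies, then copies archived messages whose decoded Subject exactly equals one of them into a new mbox for review in mutt. It is an alternative illustration, not Eric's actual pipeline; the function names and file paths are hypothetical, and exact matching will miss subjects that Outlook altered when forwarding.

```python
import mailbox
from email.header import decode_header, make_header


def normalize(subject):
    """Decode RFC 2047 encoded-words and collapse whitespace and folding."""
    try:
        text = str(make_header(decode_header(subject)))
    except Exception:  # malformed encoded-words: fall back to the raw text
        text = str(subject)
    return " ".join(text.split())


def forwarded_subjects(body_text):
    """Collect the 'Subject:' lines quoted inside forwarded message bodies."""
    subjects = set()
    for line in body_text.splitlines():
        line = line.lstrip("> \t")
        if line.startswith("Subject:"):
            subjects.add(line[len("Subject:"):].strip())
    return subjects


def copy_matching(archive_path, subjects, out_path):
    """Copy archived messages whose Subject equals a reported one.

    Plain string comparison, so characters such as ( ) ! ? . need no
    escaping.  Returns the number of messages copied.
    """
    wanted = {normalize(s) for s in subjects}
    archive = mailbox.mbox(archive_path, create=False)
    out = mailbox.mbox(out_path)
    copied = 0
    try:
        for msg in archive:
            if normalize(msg.get("Subject", "")) in wanted:
                out.add(msg)
                copied += 1
        out.flush()
    finally:
        out.close()
        archive.close()
    return copied
```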

-Eric Rz.
Re: SPAMD with sa-learn
On Thu, 2004-02-19 at 15:44, jdow wrote:
> I saw that one. I'm trying to minimize my admin time by automating it
> if possible. It seems like this should be possible. (Push comes to
> shove I toss together a C program to strip the excess headers.)

Here's what I'm doing for Outlook users:

Have the users create SpamAssassin-SPAM and SpamAssassin-HAM folders.

Copy the missed spams to the SPAM folder, and the false positives to the
HAM folder.

Periodically export those folders to a new .PST file on the spamd box.

A nightly process runs that extracts the messages from the .PST files
and runs sa-learn over them.

So far (we're still in beta) it appears to be working well. It's more
reliable than using Save As on individual messages and getting the wrong
format (Outlook seems to have stopped believing in RFC-822 format about
two weeks ago), and it is simple enough for the users to understand.

I can provide more details if needed.
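If individually saved messages do end up in the training pile instead of a .PST export, a cheap guard against the wrong-format problem John mentions is to screen each file before sa-learn sees it. This is a heuristic sketch, not part of John's process: it rejects files that begin with the OLE2 compound-file signature used by Outlook's .msg format, and anything that does not parse with at least one common mail header. The function names are hypothetical.

```python
import email

# Outlook .msg files are OLE2 compound documents and start with this signature.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_rfc822(raw):
    """Heuristic: True if the bytes look like an RFC 822 message, not a .msg blob."""
    if raw.startswith(OLE2_MAGIC):
        return False
    msg = email.message_from_bytes(raw)
    # Require at least one header field that real mail essentially always has.
    return any(name in msg for name in ("Received", "From", "Date", "Message-ID"))


def split_by_format(paths):
    """Partition file paths into (rfc822, rejected) before training."""
    good, bad = [], []
    for path in paths:
        with open(path, "rb") as fh:
            raw = fh.read()
        (good if looks_like_rfc822(raw) else bad).append(path)
    return good, bad
```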

--
John Hardin KA7OHZ
Internal Systems Administrator/Guru voice: (425) 672-1304
Apropos Retail Management Systems, Inc. fax: (425) 672-0192
-----------------------------------------------------------------------
Failure to plan ahead on someone else's part does not constitute an
emergency on my part.
- David W. Barts in a.s.r
-----------------------------------------------------------------------
11 days until ICQ Corp goes away - have you installed Jabber yet?
Re: SPAMD with sa-learn
----- Original Message -----
From: "jdow" <jdow@earthlink.net>
To: <spamassassin-users@incubator.apache.org>
Sent: Thursday, February 19, 2004 5:36 PM
Subject: Re: SPAMD with sa-learn

> I understand the downside part about forwarding spam back to myself at
> a spam account well enough. Is there a procmailrc entry that can be used
> to strip off the forwarding headers and then forward that to the sa-learn
> tool as the "From:" addressee user? That might be a handy way to make
> training easier from non-Linux email software like Outlook Express.

At home, I get email with Outlook Express from my remote Linux machine
using POP3. OE moves "[SPAM]"-tagged email to a spam folder, and I drag
these (or missed spam, or ham) to the corresponding spam or ham IMAP
folder, which puts them back on the Linux machine to be learned later,
whenever I decide. Works great.
Re: SPAMD with sa-learn
On Thu, 2004-02-19 at 18:41, John Hardin wrote:
> I can provide more details if needed.

I'd sure be interested in some details ;)

--
Homer Parker /"\ ASCII Ribbon Campaign
\ / No HTML/RTF in email
http://www.homershut.net x No Word docs in email
telnet://bbs.homershut.net / \ Respect for open standards

"Bill Gates reports on security progress made and the challenges ahead."
-- Microsoft's Homepage, on the day an SQL Server bug crippled large
sections of the Internet.
RE: SPAMD with sa-learn
I would be very interested in details.

Thanks,
Dan