Mailing List Archive

sa-learning hidden IMAP message?
I'm currently running spamassassin on my mailserver where my users primarily use IMAP to connect.

I want to set up two folders for each user, "spam_to_learn" and "ham_to_learn", so that when they get uncaught spam or caught ham, they can just move the message to the appropriate folder, and once per day my script will run to push the mbox file through sa-learn.
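A minimal sketch of that daily job in Python. The folder names come from the post. The mail directory layout (each folder is one mbox file directly under a per-user directory), the script name and the helper names are assumptions for illustration. It only builds and runs `sa-learn --spam --mbox <file>` and `sa-learn --ham --mbox <file>`. Since sa-learn uses the invoking user's Bayes database by default, it would typically run as each user rather than as root.

```python
import subprocess
import sys
from pathlib import Path

# Learning folders from the post, mapped to the sa-learn mode for each.
# Assumption: each folder is a single mbox file directly under mail_dir.
FOLDERS = {"spam_to_learn": "--spam", "ham_to_learn": "--ham"}


def build_commands(mail_dir):
    """Return one sa-learn argv per non-empty learning folder in mail_dir."""
    commands = []
    for folder, mode in FOLDERS.items():
        mbox = Path(mail_dir) / folder
        # Skip folders that are missing or empty: nothing to learn.
        if mbox.is_file() and mbox.stat().st_size > 0:
            commands.append(["sa-learn", mode, "--mbox", str(mbox)])
    return commands


def run_daily(mail_dir, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on failure."""
    for argv in build_commands(mail_dir):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_daily(sys.argv[1], dry_run="--run" not in sys.argv)
```

From cron this might be `python learn_daily.py /home/paul/mail --run` (path hypothetical). Without `--run` it only prints what it would do. sa-learn skips messages it has already learned, but emptying the folders after a successful run saves re-reading them every day.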

There are two things I'm concerned about:

1. The "hidden" IMAP message that always sits at the start of the mbox file. Based on my initial tests, it looks like that hidden message gets processed by sa-learn, which tells me token info about my mailserver etc. is getting inadvertently marked as spammy tokens. Is this the case, and should I be worried about it? If so, I can write a routine to flush out the hidden message, but why bother if I don't need to?

2. Ham that got marked as spam. Since spam messages get altered so that the original message becomes an attachment to the spam report, I suspect that when I get a ham that got caught as spam, I need to move the "original" message to my ham folder, not the "altered" message. I.e., if I move the altered message to the ham folder, it will contain all of the spam info and the actual message will still be an attachment. The alternative is that users need to open the attachment to see the original ham message and then move THAT message into the ham folder. Which is right?
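If the hidden message does need flushing, one approach is to copy the folder into a scratch mbox without it and feed the copy to sa-learn, leaving the live folder (and the IMAP server's bookkeeping) alone. This sketch assumes a UW-IMAP style mbox, where the hidden message is the folder-internal-data pseudo-message carrying an `X-IMAP` header and a subject containing "FOLDER INTERNAL DATA". Check a raw copy of your own folder before relying on those markers. The function names are hypothetical.

```python
import mailbox


def is_imap_pseudo_message(msg):
    """True for the assumed folder-internal-data message (X-IMAP marker)."""
    subject = str(msg.get("Subject", ""))
    return msg.get("X-IMAP") is not None or "FOLDER INTERNAL DATA" in subject


def copy_without_pseudo(src_path, dst_path):
    """Copy every real message from src_path into a new mbox at dst_path.

    The source folder is only read, never modified. Use a fresh dst_path:
    an existing mbox there would be appended to. Returns (kept, skipped).
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    kept = skipped = 0
    try:
        dst.lock()
        for msg in src:
            if is_imap_pseudo_message(msg):
                skipped += 1
                continue
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        dst.unlock()
        dst.close()
        src.close()
    return kept, skipped
```

The scratch copy can then go to `sa-learn --spam --mbox` or `sa-learn --ham --mbox` instead of the original folder.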

thoughts?

regards,

Paul


-------------------------
paulweb@fielding.skoda.ca

get rid of the car to email me.
Re: sa-learning hidden IMAP message?
Hi Paul,

Paul Fielding wrote:

> I want to set up two folders for each user, "spam_to_learn" and
> "ham_to_learn", so that when they get uncaught spam or caught ham, they
> can just move the message to the appropriate folder, and once per day my
> script will run to push the mbox file through sa-learn.
>
> There are two things I'm concerned about:
>
> 1. The "hidden" IMAP message that always sits at the start of the mbox
> file. Based on my initial tests, it looks like that hidden message gets
> processed by sa-learn, which tells me token info about my mailserver
> etc. is getting inadvertently marked as spammy tokens. Is this the case,
> and should I be worried about it? If so, I can write a routine to flush
> out the hidden message, but why bother if I don't need to?

It depends a little on the header in question and how it gets tokenized.
If it is learned as a single token, and if that header never occurs in
your inbound stream, there isn't a problem: It will learn the header but
never see it on incoming mail, so it will never influence the Bayesian
statistics. I'm using Cyrus IMAP, and although Bayes has tokenized the
'X-Sieve: CMU Sieve 2.2' header in various ways, it doesn't seem to
penalize mentioning the word Sieve.

> 2. Ham that got marked as spam. Since spam messages get altered so that
> the original message becomes an attachment to the spam report, I suspect
> that when I get a ham that got caught as spam, I need to move the
> "original" message to my ham folder, not the "altered" message. I.e., if
> I move the altered message to the ham folder, it will contain all of the
> spam info and the actual message will still be an attachment. The
> alternative is that users need to open the attachment to see the
> original ham message and then move THAT message into the ham folder.
> Which is right?

According to a post further down the list, in the thread 'sa-learn
removes SpamAssassin markup?', Martin Radford writes:
"On my system, '-d' does remove all the markup, including converting the
attachment back to the main body of the mail."

Regards, Paul Boven.
Re: sa-learning hidden IMAP message?
At Sat Feb 7 10:11:40 2004, Paul Boven wrote:

> > 2. Ham that got marked as spam. Since spam messages get altered so that
> > the original message becomes an attachment to the spam report, I suspect
> > that when I get a ham that got caught as spam, I need to move the
> > "original" message to my ham folder, not the "altered" message. I.e., if
> > I move the altered message to the ham folder, it will contain all of the
> > spam info and the actual message will still be an attachment. The
> > alternative is that users need to open the attachment to see the
> > original ham message and then move THAT message into the ham folder.
> > Which is right?
>
> According to a post further down the list, in the thread 'sa-learn
> removes SpamAssassin markup?', Martin Radford writes:
> "On my system, '-d' does remove all the markup, including converting the
> attachment back to the main body of the mail."

... except that if you read that mail, you see I was referring to the
command "spamassassin -d", not sa-learn.

However, sa-learn knows what SA's markup looks like, and will remove
it when learning, so this is not a problem.

See http://wiki.spamassassin.org/w/BayesInSpamAssassin :

It's OK to feed emails with SpamAssassin markup into the sa-learn
command -- sa-learn will ignore any standard SpamAssassin headers, and
if the original email has been encapsulated into an attachment it will
decapsulate the email. In other words sa-learn will undo any changes
which SpamAssassin has done before learning the spam/ham character of
the email.
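To make "decapsulate" concrete, the sketch below shows the idea with Python's standard email package. It is not sa-learn's code. It assumes the report_safe 1 layout, where the original is attached as a message/rfc822 part, and treats any header starting with X-Spam- as markup. SpamAssassin's own handling covers more cases (report_safe 2, for instance, attaches the original as text/plain). The function names are hypothetical.

```python
import email


def unwrap_report(raw):
    """Return the original message if raw is a report-style wrapper.

    Assumes the original was attached as a message/rfc822 part; any other
    message is returned unchanged (already parsed).
    """
    msg = email.message_from_string(raw)
    for part in msg.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding one Message.
            return part.get_payload(0)
    return msg


def strip_spam_headers(msg):
    """Delete every header whose name starts with X-Spam- (assumed markup)."""
    for name in set(msg.keys()):
        if name.lower().startswith("x-spam-"):
            del msg[name]  # removes all occurrences; missing names are ignored
    return msg
```

Used together, `strip_spam_headers(unwrap_report(raw))` yields roughly the message as it looked before tagging, which is the part worth learning from.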

Martin
--
Martin Radford | "Only wimps use tape backup: _real_
martin@zamenhof.demon.co.uk | men just upload their important stuff -o)
Registered Linux user #9257 | on ftp and let the rest of the world /\\
- see http://counter.li.org | mirror it ;)" - Linus Torvalds _\_V
Re: sa-learning hidden IMAP message?
Paul

I use a Perl script to dig out the IMAP-based emails one at a time and
then push them through sa-learn. It saves having to keep a local mbox for
the learning.
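The script itself isn't posted, so the following is only a rough Python equivalent of the approach described: fetch each message from an IMAP folder and hand it to sa-learn as a single-message file. The imaplib calls (select, search, fetch) are standard library. The function names and the temp-file step are assumptions. The folder is selected read-only so that, on a compliant server, fetching does not mark messages \Seen.

```python
import subprocess
import tempfile


def fetch_raw_messages(imap, folder):
    """Yield (msg_num, raw_bytes) for every message in an IMAP folder.

    `imap` is an already logged-in imaplib.IMAP4 or IMAP4_SSL connection.
    """
    status, _ = imap.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select {folder!r}")
    _, data = imap.search(None, "ALL")
    for num in data[0].split():
        _, parts = imap.fetch(num, "(RFC822)")
        # imaplib returns (envelope, body) tuples mixed with b")" strings.
        for part in parts:
            if isinstance(part, tuple):
                yield num, part[1]


def learn(imap, folder, mode, run=subprocess.run):
    """Write each message to a temp file and pass it to `sa-learn <mode>`."""
    count = 0
    for _, raw in fetch_raw_messages(imap, folder):
        with tempfile.NamedTemporaryFile(suffix=".eml") as tmp:
            tmp.write(raw)
            tmp.flush()
            run(["sa-learn", mode, tmp.name], check=True)
        count += 1
    return count
```

With a real connection this would be roughly `imap = imaplib.IMAP4_SSL(host)`, `imap.login(user, password)`, then `learn(imap, "spam_to_learn", "--spam")`. The `run` parameter only exists so the sa-learn call can be swapped out when testing.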

If you'd like a copy, email me off list and I'll forward it along with
how to run the thing.

There's also a setting (usually placed in local.cf) to make Bayes ignore
specific headers, such as the SA headers, while learning:

bayes_ignore_header X-Spam
bayes_ignore_header X-Spam-SpamCheck
bayes_ignore_header X-Spam-SpamScore
bayes_ignore_header X-Spam-Information

You'll need to alter these to match the header names used on your system.
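One way to find the right names is to look at a message your own setup has already tagged. This hypothetical helper lists every header in a raw message whose name starts with a given prefix (X-Spam by default, an assumption that should match whatever your scanner actually adds) and turns each distinct name into a bayes_ignore_header line for local.cf.

```python
import email


def ignore_header_lines(raw, prefix="X-Spam"):
    """Return one bayes_ignore_header line per distinct matching header name."""
    msg = email.message_from_string(raw)
    names = []
    for name in msg.keys():
        # Case-insensitive prefix match; keep first-seen order, no duplicates.
        if name.lower().startswith(prefix.lower()) and name not in names:
            names.append(name)
    return [f"bayes_ignore_header {name}" for name in names]
```

Printing the result for a saved tagged message gives lines that can be pasted into local.cf after a quick review.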





--
Martin Hepworth
Snr Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300


RE: sa-learning hidden IMAP message?
Paul:

I'm not one of the experts, actually a newbie; however, I also use IMAP
boxes to collect ham and spam for learning. I have not noticed a problem
with those dummy first messages. But, two thoughts: IF it were learned as
spam, it really wouldn't matter, since you don't want them delivered anyway.
And, if you are concerned about it, you can use sa-learn's --forget option
to un-learn it:

--forget
    Forget a given message previously learnt. [learned]
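A sketch of how the hidden messages could be pulled out of an mbox and handed to `sa-learn --forget`, one file each. It assumes the hidden message is recognisable by an X-IMAP header, as in UW-IMAP style mbox folders. The function name is hypothetical, and it only returns the command lines rather than running them. Whether --forget matches depends on sa-learn recognising the copy as the same message it learned, and the server rewrites this message over time, so check the result.

```python
import mailbox
import os


def extract_pseudo_messages(mbox_path, out_dir):
    """Write each X-IMAP pseudo-message to its own file in out_dir.

    Returns one ["sa-learn", "--forget", path] command per file written.
    """
    commands = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key, msg in box.iteritems():
            if msg.get("X-IMAP") is None:
                continue  # an ordinary message: leave it alone
            path = os.path.join(out_dir, f"pseudo-{key}.eml")
            with open(path, "wb") as fh:
                # as_bytes() omits the mbox "From " separator line.
                fh.write(msg.as_bytes())
            commands.append(["sa-learn", "--forget", path])
    finally:
        box.close()
    return commands
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` once the output looks right.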



On your second question, as I understand it, sa-learn, when used in
conjunction with the report_safe { 0 | 1 | 2 } (default: 1) option, is
smart enough to learn the original message and not the report e-mail to
which it was attached.

I have also found that a lot of what I learned about sa-learn is actually
contained in other sections of the documentation, so you may find that wider
reading is necessary to find all the information. While I would call SA the
best thing since sliced bread, its documentation is IMHO its weakest link.
Perhaps because it's the most boring to write?

If I'm wrong on either of your points, I'm sure someone will let us know and
we'll both learn.

/s/ John





-----Original Message-----
From: Paul Fielding [mailto:paulweb@fielding.ca]
Sent: Saturday, February 07, 2004 2:22 AM
To: spamassassin-users@incubator.apache.org
Subject: sa-learning hidden IMAP message?


