Mailing List Archive: adding option to support anti-spam filters

feargal at chrysalink

Nov 20, 2003, 7:34 PM

Post #1 of 34

Hi,
I'm looking for thoughts on ways that dbmail could interact with anti-spam software?

I'm scanning users' mail with SpamAssassin which adds an 'X-Spam-Flag: Yes' header to spam prior to delivering to dbmail. I want our pop3 users not to have to download the mail which has been marked as spam, but to retain it so they can review it in the web interface I'm writing. (Anybody else out there use neowebscript? Didn't think so...)

My plan is to adjust dbmail-smtp to scan the header for the X-Spam-Flag, and if found, store the message in a 'Spam' mailbox. Then I'll alter dbmail-pop3d not to include mail from the Spam mailbox.

I'll also adjust dbmail-maintenance so that it sends a summary of the spam mailbox.

Since I'm not likely to actually write any of this until next week, I may as well do it properly, so I'd like to know how other people would like it to work.

I was thinking the best generic method is to add a section to dbmail.conf as follows:

[SPAM]
SPAM_HEADER=X-Spam-Flag # the header to scan for
SPAM_YES=Yes # header value which indicates spam
SPAM_SEND_SUMMARY=yes # whether to generate a daily summary

This makes a couple of assumptions:
1) The scanner adds a header with a fixed value to indicate spam. I don't know anything about other anti-spam filters, maybe they do not all do this? I'm thinking some might just add a score or something. If this is the case we could probably allow for operators like '>5.0'.

2) It strikes me that if an account has 2 aliases, dbmail has no way of knowing which address is the default address for that account. It's not a major problem since you can just select the first one, and I can't think of a situation where you would *really* need it, but it would be nice to know to address the daily report to 'mr.j.smith@...' instead of 'porn.be.here@...'.

3) It's okay to call a mailbox 'Spam'. I have never used IMAP, and know very little about its operation. Is it acceptable to use a mailbox name which a user may have set up themselves? If not, is there any other way of separating spam from valid mail, other than adding an additional flag to the database schema?
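
A minimal sketch of the header test described in assumption 1, written in Python rather than dbmail's C purely for brevity. The function name is_spam and the way the [SPAM] settings are passed in as plain strings are illustrative assumptions, not existing dbmail code; the '>5.0' operator form is the one floated above.

```python
from email import message_from_string


def is_spam(raw_message: str, header: str, rule: str) -> bool:
    """Return True if `header` in the message satisfies `rule`.

    `rule` is either a literal value such as "Yes" (compared
    case-insensitively) or a numeric threshold written as ">5.0"
    (hypothetical syntax, not something dbmail.conf supports today).
    """
    msg = message_from_string(raw_message)
    value = msg.get(header)
    if value is None:
        return False  # header absent: treat as not spam
    value = value.strip()
    if rule.startswith(">"):
        try:
            # Compare the first token of the header value as a number.
            return float(value.split()[0]) > float(rule[1:])
        except (ValueError, IndexError):
            return False  # unparsable score: do not flag
    return value.lower() == rule.lower()
```

With SPAM_HEADER=X-Spam-Flag and SPAM_YES=Yes this reduces to a plain string comparison; a score-style header would only need a different rule string.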

The other thought I had, is that use could be made of the config table to personalise this per user, but I'm not sure that's what it's intended for.

As I said, I've no experience with IMAP, so can someone tell me the best way to separate spam out from other mail with IMAP?

And any/all other thoughts and advice are welcomed.

Oh, and has anybody already done this so I don't have to..? :)

-fr.

--
Feargal Reilly,
Codeshifter,
Chrysalink Systems.

Re: adding option to support anti-spam filters

xndbmail at xerus

Nov 20, 2003, 10:05 PM

Post #2 of 34

On Fri, Nov 21, 2003 at 02:34:39AM +0000, Feargal Reilly wrote:
> Hi, I'm looking for thoughts on ways that dbmail could interact with
> anti-spam software?
>
> I'm scanning users' mail with SpamAssassin which adds an 'X-Spam-Flag:
> Yes' header to spam prior to delivering to dbmail. I want our pop3
> users not to have to download the mail which has been marked as spam,
> but to retain it so they can review it in the web interface I'm
> writing. (Anybody else out there use neowebscript? Didn't think so...)
>
> My plan is to adjust dbmail-smtp to scan the header for the
> X-Spam-Flag, and if found, store the message in a 'Spam' mailbox. Then
> I'll alter dbmail-pop3d not to include mail from the Spam mailbox.
>
> I'll also adjust dbmail-maintenance so that it sends a summary of the
> spam mailbox.

I think you're going about this the wrong way. If you want to
quarantine spam (as opposed to rejecting it at SMTP, which I'd
recommend), have your MTA deliver it to a different dbmail mail
box, e.g.
    if spam
        deliver to dbmail-smtp -m spam -u $user
    else
        deliver to dbmail-smtp -u $user

No modification to dbmail needed, just a minor configuration change to
your MTA.
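
As a rough illustration of the rule above, here is a small Python sketch of a delivery wrapper an MTA could pipe mail through. Only the -m and -u flags come from this post; the function name delivery_command, the X-Spam-Flag test, and the idea of running it as a wrapper script are assumptions for the sake of the example.

```python
from email import message_from_string
from typing import List


def delivery_command(raw_message: str, user: str) -> List[str]:
    """Build the dbmail-smtp argument list for one recipient.

    Flagged mail is routed to the 'spam' mailbox with -m; everything
    else goes to the user's default mailbox.
    """
    msg = message_from_string(raw_message)
    flagged = (msg.get("X-Spam-Flag") or "").strip().lower() == "yes"
    if flagged:
        return ["dbmail-smtp", "-m", "spam", "-u", user]
    return ["dbmail-smtp", "-u", user]
```

The returned list could then be handed to subprocess.run with the message supplied on standard input, assuming dbmail-smtp is invoked the way an MTA pipe transport would invoke it.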

xn

Re: adding option to support anti-spam filters

wbh at conducive

Nov 20, 2003, 10:11 PM

Post #3 of 34

Feargal Reilly wrote:

> Hi,
> I'm looking for thoughts on ways that dbmail could interact with anti-spam software?

*SNIP*

I am being called by turns a heretic or an idiot for this concept BUT:

=== begin heretical thoughts:

- as mail is stored in a DB, all I want to do is add a few new fields
(BOOLEAN) to the DB and let a trigger(s) invoke (an) asynchronous
process(es) to spam check, virus check, reformime the message *or whatever*.

- inbound messages would go directly into the DB, but be flagged as
'pending processing'. Thereafter the message would not move - only its
flags would change.

- a user-specific set of flags would determine which filters - if any -
were to be applied.

- a stored procedure / trigger would detect the presence of one or more
'pending' messages, call up a dispatcher that would:

- read the user's preference flags, resetting the 'pending' flag to
'processed' state (pending_NOT) if no filters were wanted,

- ELSE notify the filters of the location / ID of the pending message.

- each filter would run - probably against a batch of messages,
SELECTING what it needed (header, body, attachment, or all) from the
message in the DB, do its job, notify the dispatcher to reset their
flags to 'done' *for that filter* on the messages in question (or reset
them themselves). Optionally leave the original in place and write a
modified message...

- when all filters that were required had finished, the message would be
either marked 'available for read' or deleted / moved elsewhere /
delivered to a separate 'box' / given a new X-header - whatever....

IOW - I would want any and all filters to *stay out of* any queues or
pipelines between the message store and the incoming / outgoing smtp,
pop, or imap processes.
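
The dispatcher step in the list above might look roughly like the following Python sketch against an in-memory SQLite database. The table layout, the wants_filtering column and the state names ('pending', 'queued_for_filters', 'available') are all hypothetical; nothing here reflects the real dbmail schema, and a production version would presumably be a stored procedure or trigger as proposed.

```python
import sqlite3

# Hypothetical schema: one preference flag per user, one state per message.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    wants_filtering INTEGER NOT NULL
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    state TEXT NOT NULL DEFAULT 'pending'
);
"""


def dispatch_pending(db: sqlite3.Connection) -> None:
    """Resolve every 'pending' message according to its owner's flags.

    The message row never moves; only its state column changes.
    """
    rows = db.execute(
        "SELECT m.id, u.wants_filtering FROM messages m "
        "JOIN users u ON u.id = m.user_id WHERE m.state = 'pending'"
    ).fetchall()
    for msg_id, wants_filtering in rows:
        new_state = "queued_for_filters" if wants_filtering else "available"
        db.execute(
            "UPDATE messages SET state = ? WHERE id = ?", (new_state, msg_id)
        )
    db.commit()
```

Filters would then pick up rows in the 'queued_for_filters' state on their own schedule and set the final state when done.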

This *should* let I/O remain simple, fast, and unbalked, and allow a
high degree of independence w/r the mix of filter modules to be applied.

Any delays would affect how long a message already in the DB awaited
processing / final 'availability' to the POP or IMAP - but would NOT
slow down the incoming or outgoing communications with correspondent servers.

OK - smacks of legacy 'batch processing' - but filters can be run in
parallel, and that can be quite efficient on a DB-resident body of data.
So batching here removes obstacles on the I/O and 'raw' mail is going
to get *into* the DB faster....

A goal would be that the 'pending' state did not last any longer than it
might under the traditional serial-process approach..

*BUT* - even if it DID, it would not choke the traffic on/off the box.

*While we are at it*

I want a message for a client flagged as an IMAP USER to have a 'isIMAP'
flag set. This sez that the message is 'persistent' in the DB, i.e. not
deleted *just* because it has been read by a POP client. Another field
ID's which IMAP folder number (default null = INBOX) the user has moved
the IMAP message into.

Display to the IMAP client is then a SELECT <whatever> WHERE isIMAP=true
order by folder-number date ... or something like that.

Again, the message per se doesn't move, only the flags change.
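
A small Python/SQLite sketch of the kind of SELECT described above. The column names is_imap, folder_no and received are stand-ins invented for the example, and treating a NULL folder number as folder 0 (INBOX) is an assumption about how the default would be encoded.

```python
import sqlite3

# Hypothetical message table carrying the two proposed fields.
SCHEMA = """
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    is_imap INTEGER NOT NULL DEFAULT 0,
    folder_no INTEGER,            -- NULL means INBOX
    received TEXT NOT NULL        -- ISO date, sorts lexicographically
);
"""


def imap_listing(db: sqlite3.Connection, user_id: int):
    """Return (folder_no, message_id) pairs for a user's persistent mail,
    ordered by folder and then by date."""
    return db.execute(
        "SELECT COALESCE(folder_no, 0), id FROM messages "
        "WHERE user_id = ? AND is_imap = 1 "
        "ORDER BY COALESCE(folder_no, 0), received",
        (user_id,),
    ).fetchall()
```

Moving a message to another folder is then a single UPDATE of folder_no, with no copying of the message body.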

If we have a RDBMS and don't *use* it as an RDBMS, we won't gain the
advantages of an RDBMS....

======
End heretical thoughts..

Bill Hacker

Re: adding option to support anti-spam filters

wbh at conducive

Nov 20, 2003, 10:19 PM

Post #4 of 34

Christian G. Warden wrote:

> On Fri, Nov 21, 2003 at 02:34:39AM +0000, Feargal Reilly wrote:
>
>>Hi, I'm looking for thoughts on ways that dbmail could interact with
>>anti-spam software?
>>
>>I'm scanning users' mail with SpamAssassin which adds an 'X-Spam-Flag:
>>Yes' header to spam prior to delivering to dbmail. I want our pop3
>>users not to have to download the mail which has been marked as spam,
>>but to retain it so they can review it in the web interface I'm
>>writing. (Anybody else out there use neowebscript? Didn't think so...)
>>
>>My plan is to adjust dbmail-smtp to scan the header for the
>>X-Spam-Flag, and if found, store the message in a 'Spam' mailbox. Then
>>I'll alter dbmail-pop3d not to include mail from the Spam mailbox.
>>
>>I'll also adjust dbmail-maintenance so that it sends a summary of the
>>spam mailbox.
>
>
> I think you're going about this the wrong way. If you want to
> quarantine spam (as opposed to rejecting it at SMTP, which I'd
> recommend), have your MTA deliver it to a different dbmail mail
> box, e.g.
> if spam
> deliver to dbmail-smtp -m spam -u $user
> else
> deliver to dbmail-smtp -u $user
>
> No modification to dbmail needed, just a minor configuration change to
> your MTA.
>
> xn

That is a whole lot easier to implement than the approach I just posted.
But may have to be re-implemented as the external MTA or toolsets change..

- It also tends to leave us with a system where all we have really
done is replace Maildirs or Mboxes with an RDBMS for message and
configuration storage.

i.e. raw or structured file system storage vs RDBMS for storage doesn't
really maximize the available RDBMS tools...

..and we are still utilizing conventional queues, pipelines, *whatever*
of the legacy MTA world, with attendant delays in the I/O.

Why not simply get the messages straight into the DB, then apply the
filtering as a separate process before flagging the message as
'available' to the POP or IMAP client?

Bill Hacker..

Re: adding option to support anti-spam filters

xndbmail at xerus

Nov 20, 2003, 10:36 PM

Post #5 of 34

On Fri, Nov 21, 2003 at 01:19:56PM +0800, Bill Hacker wrote:
> That is a whole lot easier to implement than the approach I just posted.
> But may have to be re-implemented as the external MTA or toolsets change..
>
> - It also tends to leave us with a system where all we have really
> done is replace Maildirs or Mboxes with an RDBMS for message and
> configuration storage.
>
> i.e. raw or structured file system storage vs RDBMS for storage doesn't
> really maximize the available RDBMS tools...
>
> ..and we are still utilizing conventional queues, pipelines, *whatever*
> of the legacy MTA world, with attendant delays in the I/O.
>
> Why not simply get the messages straight into the DB, then apply the
> filtering as a separate process before flagging the message as
> 'available' to the POP or IMAP client?

There is a lot of filtering that you may want to do during the SMTP
dialog such as spam filtering, virus scanning, and various other policy
checks on messages. There may also be filtering that redirects messages
to other hosts in which case it makes sense to keep messages within the
control of an MTA, and only hand messages off to dbmail for final
delivery (final, neglecting subsequent pickup by a POP/IMAP client).

I think dbmail should focus on being a high-performance, reliable,
scalable message store.

xn

Re: adding option to support anti-spam filters

wbh at conducive

Nov 21, 2003, 2:28 AM

Post #6 of 34

Christian G. Warden wrote:

> On Fri, Nov 21, 2003 at 01:19:56PM +0800, Bill Hacker wrote:
>
>>That is a whole lot easier to implement than the approach I just posted.
>>But may have to be re-implemented as the external MTA or toolsets change..
>>
>> - It also tends to leave us with a system where all we have really
>>done is replace Maildirs or Mboxes with an RDBMS for message and
>>configuration storage.
>>
>> i.e. raw or structured file system storage vs RDBMS for storage doesn't
>>really maximize the available RDBMS tools...
>>
>>..and we are still utilizing conventional queues, pipelines, *whatever*
>>of the legacy MTA world, with attendant delays in the I/O.
>>
>>Why not simply get the messages straight into the DB, then apply the
>>filtering as a separate process before flagging the message as
>>'available' to the POP or IMAP client?
>
>
> There is a lot of filtering that you may want to do during the SMTP
> dialog such as spam filtering, virus scanning, and various other policy
> checks on messages.

ACK. Most efficient place to do that empirically. But I am willing to
try another way in the interest of non-blocking I/O and the certainty of
100% logging.

> There may also be filtering that redirects messages
> to other hosts in which case it makes sense to keep messages within the
> control of an MTA, and only hand messages off to dbmail for final
> delivery (final, neglecting subsequent pickup by a POP/IMAP client).
>

It makes sense. It seems optimal. It is not the only way, and in the
big picture may not even be the best way for all circumstances.

Suppose you are an enterprise site and have a requirement for a
permanent record of all traffic. Logging is simpler if the mailstore
*is* the log.

> I think dbmail should focus on being a high-performance, reliable,
> scalable message store.
>
> xn

Fortunately, these goals are by no means mutually exclusive...

But the "high performance, reliable, scalable" (whatever) "store" is
called a file system. <G> In any of its flavors a fs *is* a database in
its own right, and the DB engines we use have to impose a different
style of DB on top of it. In order to recover that translation
'overhead' we need to get advantages out of the DB engine that we cannot
easily get directly from the fs.

Better security and easier configuration can probably be taken as
"stipulated" in favor of the DB engine. High performance is another matter.

A 'Database Management System' can only compete on performance with the
raw fs when there are *complex and/or difficult to predict* tasks or
manipulations to be performed on whatever is being stored. Ordinary
indexed storage, i.e. store/retrieve w/o alteration, w/o sub-selects,
w/o ordering, w/o complex WHERE clauses, etc. is not really its best
suit of clothes, 'coz the right fs does those things pretty well as is.
The better ones are even transactionally aware.

For my part, IF I am to use a DB at all, I want it to earn its place at
the table, not just do raw storage + a bit.

As the two approaches don't get in the way of each other, (extra flags
are ignored) I hope both methods get some attention.

Bill Hacker

> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>
>

Re: adding option to support anti-spam filters

michael at akatose

Nov 21, 2003, 6:06 AM

Post #7 of 34

Hi everybody,

> I think you're going about this the wrong way.
> [...]
> No modification to dbmail needed, just a minor configuration change to
> your MTA.

This isn't easy when combined with virtual users.
I personally think that support for database-configured virtual users (no
shell accounts necessary) is one of the main strengths of DBMail.

If you want such a virtual user to be able to configure (activate /
deactivate) spam-filtering for his or her own mailbox, it would be
appropriate to store this configuration value in the database.

Perhaps, the patch for user-defined-filters could be useful:
http://sourceforge.net/tracker/index.php?func=detail&aid=777878&group_id=85894&atid=577644

Best regards,
Michael

Re: adding option to support anti-spam filters

Magnus.Sundberg at dican

Nov 21, 2003, 6:15 AM

Post #8 of 34

Isn't sieve a solution to this?

/Magnus

Christian G. Warden wrote:
> On Fri, Nov 21, 2003 at 01:19:56PM +0800, Bill Hacker wrote:
>
>>That is a whole lot easier to implement than the approach I just posted.
>>But may have to be re-implemented as the external MTA or toolsets change..
>>
>> - It also tends to leave us with a system where all we have really
>>done is replace Maildirs or Mboxes with an RDBMS for message and
>>configuration storage.
>>
>> i.e. raw or structured file system storage vs RDBMS for storage doesn't
>>really maximize the available RDBMS tools...
>>
>>..and we are still utilizing conventional queues, pipelines, *whatever*
>>of the legacy MTA world, with attendant delays in the I/O.
>>
>>Why not simply get the messages straight into the DB, then apply the
>>filtering as a separate process before flagging the message as
>>'available' to the POP or IMAP client?
>
>
> There is a lot of filtering that you may want to do during the SMTP
> dialog such as spam filtering, virus scanning, and various other policy
> checks on messages. There may also be filtering that redirects messages
> to other hosts in which case it makes sense to keep messages within the
> control of an MTA, and only hand messages off to dbmail for final
> delivery (final, neglecting subsequent pickup by a POP/IMAP client).
>
> I think dbmail should focus on being a high-performance, reliable,
> scalable message store.
>
> xn

Re: adding option to support anti-spam filters

xndbmail at xerus

Nov 21, 2003, 9:30 AM

Post #9 of 34

On Fri, Nov 21, 2003 at 02:06:01PM +0100, Michael Häusler wrote:
> Hi everybody,
>
> > I think you're going about this the wrong way.
> > [...]
> > No modification to dbmail needed, just a minor configuration change to
> > your MTA.
>
> This isn't easy when combined with virtual users.

I suppose it depends on your MTA.

> I personally think, that support for database-configured virtual users (no
> shell accounts necessary) is one of the main strengths of DBMail.

Agreed.

> If you want such a virtual user to be able to configure (activate /
> deactivate) spam-filtering for his or her own mailbox, it would be
> appropriate to store this configuration value in the database.

Agreed. The MTA can retrieve configuration info from the database.

xn

Re: adding option to support anti-spam filters

xndbmail at xerus

Nov 21, 2003, 9:49 AM

Post #10 of 34

On Fri, Nov 21, 2003 at 05:28:43PM +0800, Bill Hacker wrote:
> Christian G. Warden wrote:
[...]
> >I think dbmail should focus on being a high-performance, reliable,
> >scalable message store.
> >
> >xn
>
> Fortunately, these goals are by no means mutually exclusive...
>
> But the "high performance, reliable, scalable" (whatever) "store" is
> called a file system. <G> In any of its flavors a fs *is* a database in
> its own right, and the DB engines we use have to impose a different
> style of DB on top of it. In order to recover that translation
> 'overhead' we need to get advantages out of the DB engine that we cannot
> easily get directly from the fs.

A database solves the main performance issues associated with common
fs-based message stores, large files containing many messages and
directories containing a large number of messages in separate files.

> Better security and easier configuration can probably be taken as
> "stipulated" in favor of the DB engine. High performance is another matter.
>
> A 'Database Management System' can only compete on performance with the
> raw fs when there are *complex and/or difficult to predict* tasks or
> manipulations to be performed on whatever is being stored. Ordinary
> indexed storage, i.e. store/retrieve w/o alteration, w/o sub-selects,
> w/o ordering, w/o complex WHERE clauses, etc. is not really its best
> suit of clothes, 'coz the right fs does those things pretty well as is.
> The better ones are even transactionally aware.

A database can perform well at easier tasks too. In my experience,
dbmail is already faster at updating a large folder than uw-imapd with
mbox. Ilja's recent changes make copying messages very fast. If we
index some of the headers, we can also get improved performance on
server-side searching.

xn

Re: adding option to support anti-spam filters

Magnus.Sundberg at dican

Nov 21, 2003, 10:08 AM

Post #11 of 34

Christian G. Warden wrote:

>
> A database can perform well at easier tasks too. In my experience,
> dbmail is already faster at updating a large folder than uw-imapd with
> mbox. Ilja's recent changes make copying messages very fast. If we
> index some of the headers, we can also get improved performance on
> server-side searching.
>

Today, we deliver all the mail data to the messageblock table.
What about delivering the headers to a "header" table and the
rest of the message to the messageblock.

Suppose we have something like
CREATE TABLE headerblks (
    headerblk_idnr bigint(21) NOT NULL auto_increment,
    physmessage_id bigint(21) NOT NULL default '0',
    headerblk longtext NOT NULL,
    blocksize bigint(21) NOT NULL default '0',
    PRIMARY KEY (headerblk_idnr),
    KEY physmsg_index (physmessage_id),
    FOREIGN KEY (`physmessage_id`) REFERENCES
        `physmessage`(`id`) ON DELETE CASCADE,
    FULLTEXT (headerblk)  -- the new index
) TYPE=InnoDB;
The above is a copy of the messageblks structure, with the string
"messageblk" changed to "headerblk" and a FULLTEXT index on the
headerblk. This way, we can search for all kinds of headers with a
simple SQL statement, and actually not do that much coding
ourselves.
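
To make the delivery-side change concrete, here is a minimal Python sketch of splitting an incoming message so that the header block could go to a header table and the remainder to the messageblock table. The function name split_message is made up for the example, and the real delivery code would do this in C on the incoming stream.

```python
from typing import Tuple


def split_message(raw: str) -> Tuple[str, str]:
    """Split a raw message into (header_block, body).

    The header block ends at the first empty line. CRLF line endings
    are normalized to LF first so both forms are handled the same way.
    If there is no empty line, the whole input is treated as headers.
    """
    normalized = raw.replace("\r\n", "\n")
    header, _separator, body = normalized.partition("\n\n")
    return header, body
```

The first element would be stored in the proposed headerblk column and the second in the existing messageblk storage.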

This will also make it easier to only store exactly one instance
of each messageblock, as was discussed yesterday.

/Magnus

Re: adding option to support anti-spam filters

xndbmail at xerus

Nov 21, 2003, 10:35 AM

Post #12 of 34

On Fri, Nov 21, 2003 at 06:08:49PM +0100, Magnus Sundberg wrote:
> Christian G. Warden wrote:
>
> >
> >A database can perform well at easier tasks too. In my experience,
> >dbmail is already faster at updating a large folder than uw-imapd with
> >mbox. Ilja's recent changes make copying messages very fast. If we
> >index some of the headers, we can also get improved performance on
> >server-side searching.
> >
>
> Today, we deliver all the mail data to the messageblock table.
> What about delivering the headers to a "header" table and the
> rest of the message to the messageblock.
>
> Suppose we have something like
> CREATE TABLE headerblks (
> headerblk_idnr bigint(21) NOT NULL auto_increment,
> physmessage_id bigint(21) NOT NULL default '0',
> headerblk longtext NOT NULL,
> blocksize bigint(21) NOT NULL default '0',
> PRIMARY KEY (headerblk_idnr),
> KEY physmsg_index (physmessage_id),
> FOREIGN KEY (`physmessage_id`) REFERENCES
> `physmessage`(`id`) ON DELETE CASCADE,
> ===> FULLTEXT (headerblk)
> ) TYPE=InnoDB;
> The above is a copy of the messageblks structure, with the string
> "messageblk" changed to "headerblk" and a FULLTEXT index on the
> headerblk. This way, we can search for all kinds of headers with a
> simple SQL statement, and actually not do that much coding
> ourselves.

Headers are already stored in their own messageblk. Simply adding a
header flag to messageblks would allow selecting multiple headers in a
single select. We've talked about actually storing common headers,
such as To, From, Subject, Date, etc. in their own fields.
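
A rough Python/SQLite sketch of the header-flag idea: with a flag on each block, the header blocks of several messages can be fetched in one SELECT. The is_header column is the hypothetical addition; messageblk_idnr is inferred from the headerblks definition quoted above, and the table here is a simplified stand-in rather than the real schema.

```python
import sqlite3
from typing import Dict, Iterable

# Simplified stand-in for messageblks with the proposed flag added.
SCHEMA = """
CREATE TABLE messageblks (
    messageblk_idnr INTEGER PRIMARY KEY,
    physmessage_id INTEGER NOT NULL,
    is_header INTEGER NOT NULL DEFAULT 0,  -- hypothetical flag column
    messageblk TEXT NOT NULL
);
"""


def fetch_headers(db: sqlite3.Connection, physmessage_ids: Iterable[int]) -> Dict[int, str]:
    """Fetch the header block of several messages with a single SELECT."""
    ids = list(physmessage_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = db.execute(
        "SELECT physmessage_id, messageblk FROM messageblks "
        f"WHERE is_header = 1 AND physmessage_id IN ({placeholders})",
        ids,
    )
    return dict(rows)
```

Only the placeholder list is interpolated into the SQL text; the ids themselves are still passed as bound parameters.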

btw, fulltext indexes aren't supported in innodb tables yet :(

xn

Re: adding option to support anti-spam filters

Nov 21, 2003, 10:40 AM

Post #13 of 34

Hello,

One additional problem with this heretical idea ;) is it
doesn't scale nearly as well. We have one db server, tuned for
storage/retrieval of data, and we have 4 separate dbmail servers
all using it as a message store. As we add new spam filtering
techniques, etc., which require more processing power, it's easy
to just throw a couple more machines in the group of dbmail
servers. If we moved all that processing to the db server,
I don't think we could afford a machine with enough horsepower
to do the job. That's definitely a huge advantage of replacing a
filesystem message store with an rdbms one (though there are certainly
others, too).

The idea as a whole is not without merit, as some of the points
you've brought up indicate, but I think as a whole it is probably
not the right approach for dbmail. Possibly if you wanted to
write it (of course postgres-only for now) and post your db
schema/triggers/etc., you may get a small following of people
using it. As mentioned, there has been filtering support already
patched into dbmail (both a basic filter mechanism and sieve support),
which you may find useful (search the mailing list for the patches).

My $.02....
Jesse

---- Original Message ----
From: Bill Hacker <dbmail-dev@dbmail.org>
To: dbmail-dev@dbmail.org
Subject: Re: [Dbmail-dev] adding option to support anti-spam filters
Sent: Fri, 21 Nov 2003 13:19:56 +0800

> Christian G. Warden wrote:
>
> > On Fri, Nov 21, 2003 at 02:34:39AM +0000, Feargal Reilly wrote:
> >
> >>Hi, I'm looking for thoughts on ways that dbmail could interact with
> >>anti-spam software?
> >>
> >>I'm scanning users' mail with SpamAssassin which adds an 'X-Spam-Flag:
> >>Yes' header to spam prior to delivering to dbmail. I want our pop3
> >>users not to have to download the mail which has been marked as spam,
> >>but to retain it so they can review it in the web interface I'm
> >>writing. (Anybody else out there use neowebscript? Didn't think so...)
> >>
> >>My plan is to adjust dbmail-smtp to scan the header for the
> >>X-Spam-Flag, and if found, store the message in a 'Spam' mailbox. Then
> >>I'll alter dbmail-pop3d not to include mail from the Spam mailbox.
> >>
> >>I'll also adjust dbmail-maintenance so that it sends a summary of the
> >>spam mailbox.
> >
> >
> > I think you're going about this the wrong way. If you want to
> > quarantine spam (as opposed to rejecting it at SMTP, which I'd
> > recommend), have your MTA deliver it to a different dbmail mail
> > box, e.g.
> > if spam
> > deliver to dbmail-smtp -m spam -u $user
> > else
> > deliver to dbmail-smtp -u $user
> >
> > No modification to dbmail needed, just a minor configuration change to
> > your MTA.
> >
> > xn
>
> That is a whole lot easier to implement than the approach I just posted.
> But may have to be re-implemented as the external MTA or toolsets change..
>
> - It also tends to leave us with a system where all we have really
> done is replace Maildirs or Mboxes with an RDBMS for message and
> configuration storage.
>
> i.e. raw or structured file system storage vs RDBMS for storage doesn't
> really maximize the available RDBMS tools...
>
> ..and we are still utilizing conventional queues, pipelines, *whatever*
> of the legacy MTA world, with attendant delays in the I/O.
>
> Why not simply get the messages straight into the DB, then apply the
> filtering as a separate process before flagging the message as
> 'available' to the POP or IMAP client?
>
>
> Bill Hacker..
-- End Original Message --

--
Jesse Norell
jesse (at) kci.net

Re: adding option to support anti-spam filters

Magnus.Sundberg at dican

Nov 21, 2003, 11:14 AM

Post #14 of 34

Christian G. Warden wrote:

>
>
> Headers are already stored in their own messageblk. Simply adding a
> header flag to messageblks would allow selecting multiple headers in a
> single select. We've talked about actually storing common headers,
> such as To, From, Subject, Date, etc. in their own fields.
>
> btw, fulltext indexes aren't supported in innodb tables yet :(

Sorry,
I was unaware of the lack of fulltext index support in innodb. I
guess we have to wait for that, since it is on innodb longterm
todo list and foreign keys for MyISAM are scheduled for version
5.1. What is postgres status?

My very strong opinion is that you should try very hard to make
all your searches with as few SQL statements as possible. I
believe that the database itself is the best indexing algorithm.
You should therefore build your database structure for optimal
performance.
I do not believe in creating separate tables with indexes that
need separate INSERT statements every time you insert a record in
another table.
I very much believe that all node relationships should be
accessed straight one way, and that a spaghetti database structure
should be avoided, if possible. But this is not possible with
MySQL this year and probably not next year.

Please correct me if this belief is wrong.

I know that the headers are stored in a separate messageblk.
It was mentioned earlier on this list that searching common
headers without an index is actually not that expensive. Therefore
the actual gain of storing the common headers would not be that
huge. You would also need to determine which headers are
common, and you would also get a spaghetti database structure.
For performance, you should probably not add a FULLTEXT index to
all messageblocks. It is therefore more practical to move the
headers off to another table, where you can index the data that
you actually will search.

/Magnus

Re: adding option to support anti-spam filters

feargal at chrysalink

Nov 25, 2003, 9:20 AM

Post #15 of 34

Okay... didn't expect that much discussion, glad for it though.
Turns out most of this comes down to your philosophy of what dbmail should do.

Christian Warden suggested using the -m flag of dbmail-smtp to deliver it to a separate mailbox. I wasn't actually aware of that option as it's not mentioned in the man page, and it does the job pretty well (although it'll hurt my head to figure out the sendmail config...).
However, as Michael Häusler pointed out, it does have the limitation of being hard-coded in the MTA, whereas I believe it would be better to allow it to be optionally set by the end user. The potential for end-user configuration is one of the core strengths of dbmail in my eyes. Additionally, those who have separate servers dedicated to scanning incoming mail may not wish for the MTA on the dbmail host to spawn additional filtering mechanisms, and may prefer to use a lightweight MTA to hand it on to dbmail.

It certainly is a good alternative for people who are doing the scanning on the same server, and who don't care for end-user configurability, so I will add it to my 'things to document' list.

What didn't get discussed however, and I thought this would generate more controversy, was the idea of sending a summary to the user as part of the maintenance run. Do people think this should be part of dbmail-maintenance, or should it be a separate run?

Finally, as I mentioned, I have no experience with IMAP, so should a spam mailbox be called 'Spam', '/Spam', or something else?

--
Feargal Reilly,
Codeshifter,
Chrysalink Systems.

Re: adding option to support anti-spam filters

wbh at conducive

Nov 25, 2003, 9:34 AM

Post #16 of 34

Feargal Reilly wrote:

> Okay... didn't expect that much discussion, glad for it though. Turns
> out most of this comes down to your philosophy of what dbmail should
> do.
*SNIP*

>
> What didn't get discussed however, and I thought this would generate
> more controversy, was the idea of sending a summary to the user as
> part of the maintenance run. Do people think this should be part of
> dbmail-maintenance, or should it be a separate run?

Sounds reasonable to select *which* with a flag or config option of some
sort. Depending on the luserbase, it could create more questions than it
answers...
>
> Finally, as I mentioned, I have no experience with IMAP, so should a
> spam mailbox be called 'Spam', '/Spam', or something else?
>

So long as DBMail relies on external daemons for IMAP, it is likely to
be 'INBOX.<something>', but I would suggest a short name "other than
SPAM", on the grounds that:

- IF we were *certain* it was spam, we probably wouldn't deliver it at all.

- "Probable spam" seems appropriate if the luser wishes to review it in
case his filters are overzealous, but may be too long a name for
convenience...

- How about "Suspect" (INBOX.Suspect)

Bill Hacker

Re: adding option to support anti-spam filters [ In reply to ]

aaron at serendipity

Nov 25, 2003, 9:57 AM

Post #17 of 34 (5054 views)

I'm not sure what you mean by "DBMail relies on external daemons for IMAP"
because dbmail-imapd is most certainly part of DBMail and not external, ala
Courier or something like that. We have full control over how we want our IMAP
server to behave, and in this case, what the path delimiters are. Now if I
actually remember what the delimiter is...

I believe that the best way to handle spam is by using an MTA based spam
checker which adds identifying headers, prefixes the subject line, or
otherwise marks the incoming email without disrupting the MTA path towards
DBMail delivery.

At delivery time, use a Sieve script to put all of the spam into a folder, or
discard it, or keep it in INBOX, or bounce it back, or call your pager, etc.
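
To make the delivery-time decision concrete, here is a minimal sketch in Python rather than Sieve. It assumes the scanner has already added a flag header such as SpamAssassin's X-Spam-Flag. The function name choose_mailbox and the mailbox names are placeholders for illustration, not part of DBMail or libSieve.

```python
from email import message_from_string


def choose_mailbox(raw_message, spam_header="X-Spam-Flag",
                   spam_value="yes", spam_mailbox="Spam"):
    """Pick a destination mailbox from a header added by the spam scanner.

    All names and defaults here are illustrative assumptions.
    """
    msg = message_from_string(raw_message)
    # A missing header is treated as "not spam".
    flag = msg.get(spam_header, "")
    # Compare case-insensitively, since scanners differ on "Yes" vs "YES".
    if flag.strip().lower() == spam_value.lower():
        return spam_mailbox
    return "INBOX"
```

A real Sieve script would express the same test declaratively; the point is only that the sorting step needs nothing from the scanner beyond the header.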

It would also not be unreasonable to have a default Sieve script that does
this, although I'd have to add default scripts to my TODO list and it would
certainly be 3-6 months before I have the code sketched out.

BTW - As some may have seen my numerous freshmeat releases this month, I've
been busily working on libSieve and should have 2.2.0 stable and with a fully
frozen API in the sometime-between-now-and-1/1/2004 timeframe.

Aaron

Bill Hacker <wbh@conducive.org> said: [snip]
> Feargal Reilly wrote: [snip]
> >
> > Finally, as I mentioned, I have no experience with IMAP, so should a
> > spam mailbox be called 'Spam', '/Spam', or something else?
> >
>
> So long as DBMail relies on external daemons for IMAP, it is likely to
> be 'INBOX.<something>', but I would suggest a short name "other than
> SPAM", on the grounds that:
>
> - IF we were *certain* it was spam, we probably wouldn't deliver it at all.
>
> - "Probable spam" seems appropriate if the luser wishes to review it in
> case his filters are overzealous, but may be too long a name for
> convenience...
>
> - How about "Suspect" (INBOX.Suspect)
>
> Bill Hacker
--

Re: adding option to support anti-spam filters [ In reply to ]

xndbmail at xerus

Nov 25, 2003, 10:03 AM

Post #18 of 34 (5052 views)

On Tue, Nov 25, 2003 at 04:20:17PM +0000, Feargal Reilly wrote:
> Okay... didn't expect that much discussion, glad for it though.
> Turns out most of this comes down to your philosophy of what dbmail should do.
>
> Christian Warden suggested using the -m flag of dbmail-smtp to deliver
> it to a separate mailbox. I wasn't actually aware of that option as
> it's not mentioned in the man page, and it does the job pretty well
> (although it'll hurt my head to figure out the sendmail config...).

Ahh, there's the problem, your choice of MTA :)

> However, as Michael Häusler pointed out, it does have the limitation
> of being hard-coded in the MTA, whereas I believe it would be better
> to allow it to be optionally set by the end user. The potential for
> end-user configuration is one of the core strengths of dbmail in my
> eyes. Additionally, those who have separate servers dedicated to
> scanning incoming mail may not wish for the MTA on the dbmail
> host to spawn additional filtering mechanisms, and may prefer to use a
> lightweight MTA to hand it on to dbmail.

The destination mailbox doesn't have to be hard-coded into the MTA. For
example, I configured a transport (Exim) that takes addresses of the
form /username~mailbox/ and delivers them to the correct mailbox using
dbmail-smtp. The mailbox to deliver to is the result of a
user-configurable filter which is stored in the users table in the
database.
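
As a rough illustration of the /username~mailbox/ address form, the split could look like the Python sketch below. The function name split_recipient and the fallback to INBOX are assumptions made for the example; the actual Exim transport and the filter lookup in the users table are not shown.

```python
def split_recipient(local_part, separator="~", default_mailbox="INBOX"):
    """Split 'username~mailbox' into (username, mailbox).

    Falls back to default_mailbox when no mailbox is given
    (an assumption for this sketch, not documented behaviour).
    """
    user, sep, mailbox = local_part.partition(separator)
    if not sep or not mailbox:
        return user, default_mailbox
    return user, mailbox
```

The resulting pair is what the transport would hand to dbmail-smtp, with the mailbox part chosen by the user's own filter rather than fixed in the MTA config.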

> It certainly is a good alternative for people who are doing the
> scanning on the same server, and who don't care for end-user
> configurability, so I will add it to my 'things to document' list.
>
> What didn't get discussed however, and I thought this would generate
> more controversy, was the idea of sending a summary to the user as
> part of the maintenance run. Do people think this should be part of
> dbmail-maintenance, or should it be a separate run?

I think separate.

> Finally, as I mentioned, I have no experience with IMAP, so should a
> spam mailbox be called 'Spam', '/Spam', or something else?

'Spam' is fine, or 'Quarantine/Spam' or similar.

xn

Re: adding option to support anti-spam filters [ In reply to ]

xndbmail at xerus

Nov 25, 2003, 10:05 AM

Post #19 of 34 (5048 views)

On Wed, Nov 26, 2003 at 12:34:20AM +0800, Bill Hacker wrote:
> Feargal Reilly wrote:
> >Finally, as I mentioned, I have no experience with IMAP, so should a
> >spam mailbox be called 'Spam', '/Spam', or something else?
> >
>
> So long as DBMail relies on external daemons for IMAP, it is likely to
> be 'INBOX.<something>', but I would suggest a short name "other than
> SPAM", on the grounds that:
>
> - IF we were *certain* it was spam, we probably wouldn't deliver it at all.
>
> - "Probable spam" seems appropriate if the luser wishes to review it in
> case his filters are overzealous, but may be too long a name for
> convenience...
>
> - How about "Suspect" (INBOX.Suspect)

dbmail doesn't use the (cyrus?) convention of INBOX.mailbox, so just
'Suspect' or 'Suspect/Spam' are options.

xn

Re: adding option to support anti-spam filters [ In reply to ]

xndbmail at xerus

Nov 25, 2003, 10:25 AM

Post #20 of 34 (5056 views)

On Tue, Nov 25, 2003 at 04:57:59PM -0000, Aaron Stone wrote:
> I believe that the best way to handle spam is by using an MTA based spam
> checker which adds identifying headers, prefixes the subject line, or
> otherwise marks the incoming email without disrupting the MTA path towards
> DBMail delivery.
>
> At delivery time, use a Sieve script to put all of the spam into a folder, or
> discard it, or keep it in INBOX, or bounce it back, or call your pager, etc.

You should *not* bounce spam as the sender is always forged (or often
enough that it's safe to say always). Either keep it, discard it, or
reject it at SMTP time.
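
A small Python sketch of that rule, for illustration only: the function name disposition, the score thresholds and the action names are all made up for the example. The one thing it is meant to show is that 'bounce' never appears among the outcomes, and that 'reject' is only available while the SMTP connection is still open.

```python
def disposition(score, during_smtp, keep_below=5.0, reject_at=10.0):
    """Return 'deliver', 'quarantine', 'reject' or 'discard'; never 'bounce'.

    Thresholds are hypothetical placeholders, not SpamAssassin defaults.
    """
    if score < keep_below:
        return "deliver"       # looks legitimate: normal delivery
    if score < reject_at:
        return "quarantine"    # suspect: keep it for the user to review
    # Almost certainly spam. Refuse it only if the sending server is
    # still connected; afterwards the only safe option is to drop it.
    return "reject" if during_smtp else "discard"
```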

xn

Re: adding option to support anti-spam filters [ In reply to ]

wbh at conducive

Nov 25, 2003, 10:26 AM

Post #21 of 34 (5052 views)

Aaron Stone wrote:

> I'm not sure what you mean by "DBMail relies on external daemons for IMAP"
> because dbmail-imapd is most certainly part of DBMail and not external, ala
> Courier or something like that. We have full control over how we want our IMAP
> server to behave, and in this case, what the path delimiters are. Now if I
> actually remember what the delimiter is...

Perhaps I have been under a mistaken impression, but, IIRC the docs had
instructions on how to configure DBMail with other (required) MTAs. If
that applies only to the SMTPd (and POP3d) and the IMAPd is *not*
external, then there is a great deal more flexibility open to the devel
team...

>
> I believe that the best way to handle spam is by using an MTA based spam
> checker which adds identifying headers, prefixes the subject line, or
> otherwise marks the incoming email without disrupting the MTA path towards
> DBMail delivery.
>
Certainly the method that keeps junk out of the DB, but, IMNSHO, also
the method which potentially imposes the greatest delay in handling
inbound messages. And, per other MTA authors, effective spam filtering
is a 'non-trivial' load.

IOW - filtering-to-mark is nearly as cycle-consuming as
filtering-to-drop-on-the-floor or redirect, though perhaps less abusive
of an open connection than filtering-to-'reject'.

> At delivery time, use a Sieve script to put all of the spam into a folder, or
> discard it, or keep it in INBOX, or bounce it back, or call your pager, etc.
>
ACK.

> It would also not be unreasonable to have a default Sieve script that does
> this, although I'd have to add default scripts to my TODO list and it would
> certainly be 3-6 months before I have the code sketched out.

Open source means it should be 'borrowable', with due credit & license
etc. Not a new need.

>
> BTW - As some may have seen my numerous freshmeat releases this month, I've
> been busily working on libSieve and should have 2.2.0 stable and with a fully
> frozen API in the sometime-between-now-and-1/1/2004 timeframe.
>
> Aaron
>
>
> Bill Hacker <wbh@conducive.org> said: [snip]
>
>>Feargal Reilly wrote: [snip]
>>
>>>Finally, as I mentioned, I have no experience with IMAP, so should a
>>>spam mailbox be called 'Spam', '/Spam', or something else?
>>>
>>
>>So long as DBMail relies on external daemons for IMAP, it is likely to
>>be 'INBOX.<something>', but I would suggest a short name "other than
>>SPAM", on the grounds that:
>>
>>- IF we were *certain* it was spam, we probably wouldn't deliver it at all.
>>
>>- "Probable spam" seems appropriate if the luser wishes to review it in
>>case his filters are overzealous, but may be too long a name for
>>convenience...
>>
>>- How about "Suspect" (INBOX.Suspect)
>>
>>Bill Hacker

Regards,

Bill Hacker

Re: adding option to support anti-spam filters [ In reply to ]

wbh at conducive

Nov 25, 2003, 10:29 AM

Post #22 of 34 (5049 views)

Christian G. Warden wrote:

> On Tue, Nov 25, 2003 at 04:57:59PM -0000, Aaron Stone wrote:
>
>>I believe that the best way to handle spam is by using an MTA based spam
>>checker which adds identifying headers, prefixes the subject line, or
>>otherwise marks the incoming email without disrupting the MTA path towards
>>DBMail delivery.
>>
>>At delivery time, use a Sieve script to put all of the spam into a folder, or
>>discard it, or keep it in INBOX, or bounce it back, or call your pager, etc.
>
>
> You should *not* bounce spam as the sender is always forged (or often
> enough that it's safe to say always). Either keep it, discard it, or
> reject it at SMTP time.
>
> xn
> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
>
>
Second the 'not bounce' as too often the spam-bastards even have clever
forgery ways to utilize a mis-directed 'bounce' to propagate their garbage...

Bill Hacker

Re: adding option to support anti-spam filters [ In reply to ]

aaron at serendipity

Nov 25, 2003, 10:36 AM

Post #23 of 34 (5056 views)

It's something that end users, who are writing their Sieve scripts or using
interactive web happy Sieve editors, should take into account! It's not
something that I would want to ban programmatically, however, because of the
messy complexities and general klugey borkage it would involve.

Aaron

Bill Hacker <wbh@conducive.org> said:

> Christian G. Warden wrote:
>
> > On Tue, Nov 25, 2003 at 04:57:59PM -0000, Aaron Stone wrote:
> >
> >>I believe that the best way to handle spam is by using an MTA based spam
> >>checker which adds identifying headers, prefixes the subject line, or
> >>otherwise marks the incoming email without disrupting the MTA path towards
> >>DBMail delivery.
> >>
> >>At delivery time, use a Sieve script to put all of the spam into a folder, or
> > >>discard it, or keep it in INBOX, or bounce it back, or call your pager, etc.
> >
> >
> > You should *not* bounce spam as the sender is always forged (or often
> > enough that it's safe to say always). Either keep it, discard it, or
> > reject it at SMTP time.
> >
> > xn
> >
> >
> Second the 'not bounce' as too often the spam-bastards even have clever
> forgery ways to utilize a mis-directed 'bounce' to propagate their garbage...
>
> Bill Hacker
>
>

--

Re: adding option to support anti-spam filters [ In reply to ]

wbh at conducive

Nov 25, 2003, 10:53 AM

Post #24 of 34 (5052 views)

Aaron Stone wrote:

> It's something that end users, who are writing their Sieve scripts or using
> interactive web happy Sieve editors, should take into account! It's not
> something that I would want to ban programmatically, however, because of the
> messy complexities and general klugey borkage it would involve.
>
> Aaron
>

?? If you mean you would write an MTA that would *readily permit*
bouncing spam, then you should expect to be branded irresponsible.

This has a long history of causing serious grief - to the extent of
getting otherwise-good-guys blacklisted...

- I hope I misunderstood your stance. Or that you will research the
issue, and - hopefully, modify it.

....Else you may be in an IP battle with Micros**t, who own most of the
rights to software irresponsibility and abdication.. <G>

Bill

>
> Bill Hacker <wbh@conducive.org> said:
>
>
>>Christian G. Warden wrote:
>>
>>
>>>On Tue, Nov 25, 2003 at 04:57:59PM -0000, Aaron Stone wrote:
>>>
>>>
>>>>I believe that the best way to handle spam is by using an MTA based spam
>>>>checker which adds identifying headers, prefixes the subject line, or
>>>>otherwise marks the incoming email without disrupting the MTA path towards
>>>>DBMail delivery.
>>>>
>>>>At delivery time, use a Sieve script to put all of the spam into a folder, or
>>>>discard it, or keep it in INBOX, or bounce it back, or call your pager, etc.
>>>
>>>
>>>You should *not* bounce spam as the sender is always forged (or often
>>>enough that it's safe to say always). Either keep it, discard it, or
>>>reject it at SMTP time.
>>>
>>>xn
>>>
>>>
>>
>>Second the 'not bounce' as too often the spam-bastards even have clever
>>forgery ways to utilize a mis-directed 'bounce' to propagate their garbage...
>>
>>Bill Hacker
>>
>>
>
>
>
>

Re: adding option to support anti-spam filters [ In reply to ]

aaron at serendipity

Nov 25, 2003, 10:55 AM

Post #25 of 34 (5060 views)

Note to those not following the thread: if nothing else, an interesting
rationale for separating the concept of filtering and sorting as being part of
the MTA and MDA, respectively, is way down in an inlined reply. I think it's a
pretty good read ;-)

Aaron

Bill Hacker <wbh@conducive.org> said:

> Aaron Stone wrote:
>
> > I'm not sure what you mean by "DBMail relies on external daemons for IMAP"
> > because dbmail-imapd is most certainly part of DBMail and not external,
> > ala Courier or something like that. We have full control over how we want
> > our IMAP server to behave, and in this case, what the path delimiters are.
> > Now if I actually remember what the delimiter is...
>
> Perhaps I have been under a mistaken impression, but, IIRC the docs had
> instructions on how to configure DBMail with other (required) MTAs. If
> that applies only to the SMTPd (and POP3d) and the IMAPd is *not*
> external, then there is a great deal more flexibility open to the devel
> team...
>

DBMail manages everything about the mail store. That means it has a way to get
mail in through communications with the MTA and ways to get mail out, through
IMAP and POP3 protocols with the MUA.

Literally the only piece of the puzzle DBMail doesn't touch is SMTP. There are
dozens of good SMTP servers out there, each one bigger than all of DBMail!

> >
> > I believe that the best way to handle spam is by using an MTA based spam
> > checker which adds identifying headers, prefixes the subject line, or
> > otherwise marks the incoming email without disrupting the MTA path towards
> > DBMail delivery.
> >
> Certainly the method that keeps junk out of the DB, but, IMNSHO, also
> the method which potentially imposes the greatest delay in handling
> inbound messages. And, per other MTA authors, effective spam filtering
> is a 'non-trivial' load.
>
> IOW - filtering-to-mark is nearly as cycle-consuming as
> filtering-to-drop-on-the-floor or redirect, though perhaps less abusive
> of an open connection than filtering-to-'reject'.

There isn't anything that DBMail can do to speed up spam filtering. It is what
it is; get a better spam filter, or write your own! But how the delivery chain
is configured *is* a big deal, and is something that DBMail should be quite
concerned with. Part of my work on Sieve support has been rewriting the
message delivery process to be common among the various daemons and tools and
to include code to interface with libSieve and store Sieve scripts in the
database. I could certainly see a step along the path that involves running a
spam checker, and I could see a Bayesian type thing happening with per-user
spam keywords stored in the database. There are several libraries out there
that implement a Bayesian filter to be used in a manner such as this.
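
For anyone wondering what "per-user spam keywords" would amount to, here is a deliberately simplified naive Bayes sketch in Python. It only scores tokens that are present in the message and uses add-one smoothing. The class name UserSpamModel and its methods are invented for this example, and the in-memory counters stand in for what would be per-user rows in the database.

```python
import math
from collections import Counter


class UserSpamModel:
    """Per-user token counts (hypothetical; DBMail would keep these in tables)."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of good messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Start from the smoothed prior odds, then add per-token evidence.
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Convert log-odds back to a probability.
        return 1 / (1 + math.exp(-log_odds))
```

Training would happen whenever a user moves a message into or out of their spam folder, which is exactly the kind of per-user state that is easy to keep in a database.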

So should we do that? It's questionable. There is exactly one standardized
mail sorting language, Sieve, and there is a standardized network protocol for
managing Sieve scripts on a closed server (albeit an expired draft...) This
makes it easy to choose how to implement the general request for a sorting
mechanism. With spam, there are dozens of good quality spam checkers. Further,
some of them are at the MTA level and some at the MDA level. It would be a
shame if we said, "This is the one and only spam checker DBMail works with."

>
> > At delivery time, use a Sieve script to put all of the spam into a folder,
> > or discard it, or keep it in INBOX, or bounce it back, or call your pager,
> > etc.
> >
> ACK.
>
> > It would also not be unreasonable to have a default Sieve script that does
> > this, although I'd have to add default scripts to my TODO list and it
> > would certainly be 3-6 months before I have the code sketched out.
>
> Open source means it should be 'borrowable', with due credit & license
> etc. Not a new need.
>

I'm the one writing the Sieve patches. At this point in the process, it's a
one man job. Too many cooks in the kitchen... Once I have the basic Sieve
stuff together and it's been included into the mainline DBMail, anyone else
could, and should, look into writing the various auxiliary pieces. Just
noting that if nobody else does, I will, but at least 3-6 months from now.

> >
> > BTW - As some may have seen my numerous freshmeat releases this month, I've
> > been busily working on libSieve and should have 2.2.0 stable and with a fully
> > frozen API in the sometime-between-now-and-1/1/2004 timeframe.
> >
> > Aaron
> >
> >
> > Bill Hacker <wbh@conducive.org> said: [snip]
> >
> >>Feargal Reilly wrote: [snip]
> >>
> >>>Finally, as I mentioned, I have no experience with IMAP, so should a
> >>>spam mailbox be called 'Spam', '/Spam', or something else?
> >>>
> >>
> >>So long as DBMail relies on external daemons for IMAP, it is likely to
> >>be 'INBOX.<something>', but I would suggest a short name "other than
> >>SPAM", on the grounds that:
> >>
> >>- IF we were *certain* it was spam, we probably wouldn't deliver it at all.
> >>
> >>- "Probable spam" seems appropriate if the luser wishes to review it in
> >>case his filters are overzealous, but may be too long a name for
> >>convenience...
> >>
> >>- How about "Suspect" (INBOX.Suspect)
> >>
> >>Bill Hacker
>
> Regards,
>
> Bill Hacker
>
>
>

--