Mailing List Archive

NNTP
As I said before, I'm serious about writing the nntp connector for dbmail.
Have a look at the current project specifications at:
http://www.hp.uab.edu/~ed/dbmail-nntp/

Before I start: has anyone tried to tackle this before?
Hints are welcome. Comments are welcome.

ed

Security on the internet is impossible without strong, open,
and unhindered encryption.
RE: NNTP [ In reply to ]
Hello,

I've been thinking a little about mailing-list software that
would work with dbmail (basically just an SQL-backed list of
subscribers and preferences, plus probably a lookup in the dbmail
aliases table to see whether the recipient is handled locally, in
which case the message is direct-injected into their mailbox). This
is a slightly different objective from what you're trying to handle,
but it could well be related/complementary. Anyone want to take that
on? :) Throw in a web-based client that can handle the mail,
newsgroups, etc., and you could have a very nice little user
portal. :)
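A minimal sketch of the routing step described above, using Python's built-in sqlite3 as a stand-in database. The `subscribers(list_name, address)` and `aliases(alias, deliver_to)` tables are hypothetical, not dbmail's actual schema; an alias match is taken to mean "handled locally, direct-inject", and anything else is relayed.

```python
import sqlite3


def route_list_message(conn, list_name):
    """Split a list's subscribers into local (direct-inject) and remote (relay) recipients.

    Hypothetical tables: subscribers(list_name, address) and
    aliases(alias, deliver_to); an alias row means dbmail handles the address.
    """
    rows = conn.execute(
        """
        SELECT s.address, a.deliver_to
        FROM subscribers AS s
        LEFT JOIN aliases AS a ON a.alias = s.address
        WHERE s.list_name = ?
        ORDER BY s.address
        """,
        (list_name,),
    ).fetchall()
    local, remote = [], []
    for address, deliver_to in rows:
        if deliver_to is not None:
            local.append((address, deliver_to))  # direct-inject to this delivery target
        else:
            remote.append(address)  # not ours: hand to the MTA as usual
    return local, remote


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subscribers (list_name TEXT, address TEXT);
        CREATE TABLE aliases (alias TEXT, deliver_to TEXT);
        INSERT INTO subscribers VALUES ('dev', 'bob@example.org'), ('dev', 'joe@example.net');
        INSERT INTO aliases VALUES ('bob@example.org', '42');
        """
    )
    local, remote = route_list_message(conn, "dev")
    print("inject:", local)
    print("relay:", remote)
```

The LEFT JOIN keeps every subscriber in one query, so the local/remote decision needs no per-recipient lookups.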

As far as nntp is concerned: for performance, so you don't have
to parse every message in a mailbox (for thread lookups, etc.),
you'll need a table for some of the header data. It might be
a very opportune time to add the generic header caching that's
planned for dbmail; you can simply use that to also cache the
news headers you need.
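As a rough illustration of why a header cache helps with thread lookups, the sketch below threads articles using only cached Message-ID and References values and never touches a message body. The input shape (article number mapped to a (Message-ID, References) pair) is an assumption; a real server would read these values from whatever cache table dbmail ends up with.

```python
def build_threads(cached):
    """Map each article number to its parent article number (None for thread roots).

    `cached` maps article number -> (message_id, references_header), i.e. only
    the two header values a cache would need to hold; bodies are never parsed.
    """
    by_msgid = {msgid: num for num, (msgid, _refs) in cached.items()}
    parents = {}
    for num, (_msgid, refs) in cached.items():
        parent = None
        # References lists ancestors oldest-first; walk from the newest back
        # so we attach to the closest ancestor we actually have locally.
        for ref in reversed(refs.split()):
            if ref in by_msgid:
                parent = by_msgid[ref]
                break
        parents[num] = parent
    return parents
```

Falling back to older references means a reply still lands in the right thread when its immediate parent was never archived.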

Do you plan on making an nntp client component? Perhaps "client"
is the wrong term - I'm thinking of the ability for your nntp server
to connect to an upstream server and pull specific newsgroups
off the net to archive locally. Perhaps that's handled by the
commands you list (I'm not that familiar with nntp), but it didn't
appear to be (unless it's the IHAVE command).

You might consider calling your daemon component dbmail-nntpd.

Jn


---- Original Message ----
From: Ed K. <dbmail-dev@dbmail.org>
To: dbmail-dev@dbmail.org
Subject: [Dbmail-dev] NNTP
Sent: Tue, 16 Mar 2004 22:10:51 -0500 (EST)

-- End Original Message --


--
Jesse Norell

administrator@kci.net is not my email address;
change "administrator" to my first name.
--
RE: NNTP [ In reply to ]
jn,

Since dbmail is primarily a mail server, turning it into a
full-fledged nntp server may require many changes to dbmail. That is
not my intent. I just want to export the mailbox as a newsgroup.

You could always subscribe the mailbox to the mailing list and satisfy
one of the features you mention. Allowing the user to inject a news
posting into the mailbox, however, would be designed but not
implemented. Also, dbmail-nntpd could act as a client, with the help of
suck, and fetch news from an upstream nntp server. But this is
designed, and not planned for implementation. You are correct, that is
the IHAVE command.

Is there any progress on the cached_header_types? I see your email at:
http://mailman.fastxs.net/pipermail/dbmail/2003-December/003886.html

Right, the daemon should be called dbmail-nntpd.

ed



On Wed, 17 Mar 2004, Jesse Norell wrote:

RE: NNTP [ In reply to ]
Ed,

I read your spec; it seems like this will work fine, but make sure you
add and implement XOVER (section 2.8 of RFC 2980). It is the _most_
important command in an nntp server as far as today's clients are
concerned. Be careful to take into account that article numbers are
not unique across the servers out in the wild, so you will need to
implement a renumbering system for any postings. For ease of use I
would also suggest implementing XPAT.
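A minimal sketch of both points, assuming each exported mailbox keeps its own counter: local article numbers are handed out once, in arrival order, and never reused, and each XOVER line is built from cached headers in the order RFC 2980 gives (article number, then subject, author, date, message-id, references, byte count, line count, tab-separated). The `NewsgroupIndex` class and its methods are hypothetical; the 224 status line and the terminating "." line of the real response are left out.

```python
class NewsgroupIndex:
    """Hypothetical per-group index: local article numbers plus XOVER lines."""

    # Header fields emitted after the article number, in RFC 2980 order.
    OVERVIEW_HEADERS = ("Subject", "From", "Date", "Message-ID", "References")

    def __init__(self):
        self._next = 1       # next local article number to hand out
        self._by_msg = {}    # dbmail message id -> local article number
        self._overview = {}  # local article number -> (headers, bytes, lines)

    def add(self, dbmail_msg_id, headers, byte_count, line_count):
        """Number a message once, in arrival order; numbers are never reused."""
        if dbmail_msg_id in self._by_msg:
            return self._by_msg[dbmail_msg_id]
        num = self._next
        self._next += 1
        self._by_msg[dbmail_msg_id] = num
        self._overview[num] = (headers, byte_count, line_count)
        return num

    def remove(self, dbmail_msg_id):
        """Drop an article; its number leaves a gap instead of being recycled."""
        num = self._by_msg.pop(dbmail_msg_id, None)
        if num is not None:
            del self._overview[num]

    @staticmethod
    def _clean(value):
        # Unfold continuation lines (drop the line breaks, keep the following
        # whitespace) and turn tabs into spaces so each field stays on one
        # tab-separated line.
        value = value.replace("\r\n", "").replace("\n", "")
        return value.replace("\t", " ")

    def xover(self, first, last):
        """Overview lines for existing articles in first..last, inclusive."""
        lines = []
        for num in range(first, last + 1):
            if num not in self._overview:
                continue
            headers, byte_count, line_count = self._overview[num]
            fields = [self._clean(headers.get(h, "")) for h in self.OVERVIEW_HEADERS]
            lines.append("\t".join([str(num), *fields, str(byte_count), str(line_count)]))
        return lines
```

Keeping the numbering local to the server sidesteps the fact that upstream article numbers differ from server to server.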

-leif


RE: NNTP [ In reply to ]
> Is there any progress on the cached_header_types? I see you email at:
> http://mailman.fastxs.net/pipermail/dbmail/2003-December/003886.html

I've not heard of anyone working on it in dbmail. I've started
playing with it in weDBmail, but (in addition to lack of time) I've
kind of been waiting for dbmail's schema to emerge so I can have
weDBmail use dbmail's tables, not its own. (dbmail support for it
will probably be added in 2.2 or later, but I'll probably backport
it into 1.2 for weDBmail; we'll just have to do all the header
parsing/caching ourselves.)

--
Jesse Norell

RE: NNTP [ In reply to ]
I think Ilja kicked the cached headers to 2.1 development...

"Jesse Norell" <jesse@kci.net> said:

RE: NNTP [ In reply to ]
> I think Ilja kicked the cached headers to 2.1 development...

Really? That'd be great! Ilja: while you're in that part of the code,
could you stick a header flag in the messageblks table to mark which
blocks are the headers? It should be trivial, and would make full-header
searches much easier for clients like weDBmail. That's been discussed
in the past vs. moving headers to their own table... if the latter idea
seems preferable, then don't worry about it, but it seems you get almost
all the advantages just by adding a flag to messageblks.

Jn

--
Jesse Norell

Re: NNTP [ In reply to ]
Jesse Norell wrote:

>>I think Ilja kicked the cached headers to 2.1 development...
>
> Really? That'd be great! Ilja: while you're in that part of the code,
>think you could stick a header flag in the messageblks table, to mark
>which blocks are the headers? Should be trivial, and would make full-header
>searches much easier for clients like weDBmail. That's been discussed
>in the past vs. moving headers to their own table... if the latter idea
>seems preferable, then don't worry about it, but it seems you get almost
>all the advantages with just adding a flag to messageblks.

I think it would be cleaner to put the message headers into their own
table. Very searchable, and clearly distinct from the message body.
Re: NNTP [ In reply to ]
On Wed, 17 Mar 2004, Matthew T. O'Connor wrote:

> I think it would be cleaner to put the message headers into their own
> table. Very searchable, and clearly distinct from the message body.
>

I agree with a separate table, but the question is: would Jesse be happy
with a two-table relationship, one for header titles and the other for
header contents?



Re: NNTP [ In reply to ]
> > I think it would be cleaner to put the message headers into their own
> > table. Very searchable, and clearly distinct from the message body.
> >
>
> I agree a seperate tablem but the question is: would Jesse be happy with
> a two table relationship. one for header titles, and the other for header
> contents?

Heck, I'm pretty easy-going; either works for me. :) The real advantage
to just using a flag is that it could probably be done right now - it's
quite trivial, and would probably even make the job of header caching
easier. Restructuring the headers into another table is quite a bit more
intrusive, and I'd guess it would get put off till a later date (remember
how long it took the changes to use physmessageblks to stabilize?). If the
advantages of separating them (which is just less data to read in a
sequential scan, right?) are compelling, keep it on the todo list and
do it right (after 2.2 :).


--
Jesse Norell

Re: NNTP [ In reply to ]
Jesse Norell wrote:

> Heck, I'm pretty easy-going, either works for me. :) The real advantage
>to just using a flag is it could probably be done right now - it's quite
>trivial, and would probably even make the job of header caching easier.
>To restructure the headers to another table is quite a bit more intrusive,
>and I'd guess would probably get put off till a later date (remember how
>long it took the changes to use physmessageblks to stablize?). If the
>advantages to seperating it (which is just less data to read in a
>sequential scan, right?) are compelling, keep it on the todo list and
>do it right (after 2.2 :).
>
>

At least part of the reason I am advocating a separate headers table is
that I don't see how the header flag would work. I am guessing that you
are suggesting adding a flag to the message_blks table that says "this
row has headers in it". That is fine, but under the current design a
client such as webdbmail would still have to parse that message block to
figure out where the headers end and where the message body begins.
Perhaps a hybrid idea would be to add the header flag and then have
dbmail put only headers into any row that has that flag set. That way
you still get the clean delineation between headers and body without the
intrusive change to the table structure.

Matthew
Re: NNTP [ In reply to ]
> At least part of the reason I am advocating a separate headers table is
> that I don't see how the header flag would work. I am guessing that you
> are suggesting adding a flag to the message_blks table that says "this
> row has headers in it" that is fine but under the current design, a
> client such as webdbmail would still have to parse that message block to
> figure out where the headers begin and where the message body begins.
> Perhaps a hybrid idea would be to add the header flag and then have
> dbmail only put headers into any row that has that flag set. That way
> you still get the clean delineation between headers and body and not the
> intrusive change to the table structure.

This last part is already done: message headers go into the first
messageblks row and the message body is in all subsequent rows, so that's
exactly what the flag would mark. It would also allow you to have more
than one messageblks row containing headers, which, if I understand
correctly, cannot be done right now (so if you had a huge amount of
headers, and the RFCs place no limit on that, it would currently
break; with a flag for them, it could be made to work).
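A sketch of the flag idea, with SQLite standing in for the real backend. The table is only loosely modelled on messageblks: the `message_id` and `is_header` columns and the tiny block size are assumptions. The header section (everything up to the first empty line) is split over as many flagged rows as it needs, so oversized headers no longer have to fit in one row, and a client can fetch only the flagged rows.

```python
import sqlite3

BLOCK_SIZE = 64  # tiny on purpose so the demo splits the headers across rows


def create_schema(conn):
    # Loosely modelled on messageblks; message_id and is_header are assumptions.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messageblks (
               messageblk_idnr INTEGER PRIMARY KEY AUTOINCREMENT,
               message_id      INTEGER NOT NULL,
               messageblk      TEXT NOT NULL,
               is_header       INTEGER NOT NULL DEFAULT 0
           )"""
    )


def store_message(conn, message_id, raw):
    """Store header rows (flagged) then body rows, each at most BLOCK_SIZE chars."""
    # Headers end at the first empty line; CRLF line endings are assumed.
    sep = raw.find("\r\n\r\n")
    if sep == -1:
        header, body = raw, ""
    else:
        header, body = raw[: sep + 4], raw[sep + 4 :]
    rows = []
    for is_header, part in ((1, header), (0, body)):
        for start in range(0, len(part), BLOCK_SIZE):
            rows.append((message_id, part[start : start + BLOCK_SIZE], is_header))
    conn.executemany(
        "INSERT INTO messageblks (message_id, messageblk, is_header) VALUES (?, ?, ?)",
        rows,
    )


def fetch_headers(conn, message_id):
    """Reassemble just the header section, however many rows it spans."""
    rows = conn.execute(
        "SELECT messageblk FROM messageblks "
        "WHERE message_id = ? AND is_header = 1 ORDER BY messageblk_idnr",
        (message_id,),
    ).fetchall()
    return "".join(blk for (blk,) in rows)
```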


--
Jesse Norell

Re: NNTP [ In reply to ]
"Jesse Norell" <jesse@kci.net> said:

>
>
> > At least part of the reason I am advocating a separate headers table is
> > that I don't see how the header flag would work. I am guessing that you
> > are suggesting adding a flag to the message_blks table that says "this
> > row has headers in it" that is fine but under the current design, a
> > client such as webdbmail would still have to parse that message block to
> > figure out where the headers begin and where the message body begins.
> > Perhaps a hybrid idea would be to add the header flag and then have
> > dbmail only put headers into any row that has that flag set. That way
> > you still get the clean delineation between headers and body and not the
> > intrusive change to the table structure.
>
> This last part is already done - message headers go into the first
> messageblks row, the message body is in all subsequent rows - so that's
> exactly what the flag would do. It would also allow you to have more
> than one messageblks row that contained headers, which if I understand
> correctly, cannot be done right now (so if you had a huge amount of
> headers (and the rfc's place no limit on that), it would currently
> break, but with adding a flag for them, it could be made to work).

Being able to handle oversized headers is important to maintain compliance
with an implication of the RFC; there indeed is no limit on headers.

The header cache table would not be a replacement for the headers in the
messageblks table, though. I've suggested calling the table "fastheaders"
because its primary purpose would be holding the headers in heavily indexed
columns for fast searching.

Copying the headers into the fastheaders table would necessarily modify them,
stripping out the order in which they were received and possibly some of the
line breaks, tabs and spacing (although the whitespace changes should be
minimized, some are likely to occur).

When the message is viewed, the headers that are in the messageblks table
would be used because they are in entirely unmodified form.

Aaron


Re: NNTP [ In reply to ]
> Being able to handle oversized headers is important to maintain compliance
> with an implication of the RFC; there indeed is no limit on headers.
>
> The header cache table would not be a replacement for the headers in the
> messageblks table, though. I've suggested calling the table "fastheaders"
> because its primary purpose would be holding the headers in heavily indexed
> columns for fast searching.
>
> Copying the headers into the fastheaders table would necessarily modify them,
> stripping out the order in which they were received and possibly some of the
> line breaks, tabbing and spacing (although these last two should be minimized,
> they are likely to occur at some level).
>
> When the message is viewed, the headers that are in the messageblks table
> would be used because they are in entirely unmodified form.
>
> Aaron
>

What if a solution were proposed in which the order and formatting of
the headers could be preserved: a documented method to deconstruct and
reconstruct the headers, with working code to be included in db.c? Then a
flag in the messageblks row would indicate whether the header is only,
also, or not in the fastheaders table. I suggest we call the table pair
message_headers and header_labels.
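One way to read this proposal, sketched under assumptions: split the header section into (position, name, raw value) rows, keeping each value's folding and whitespace byte for byte, so the original block can be rebuilt exactly by ordering on position. The function names are hypothetical, CRLF line endings and well-formed "Name: value" lines are assumed, and the only/also/not flag is not modelled.

```python
def deconstruct_headers(header_block):
    """Split a raw header block (CRLF lines, no trailing blank line) into
    (position, name, raw_value) tuples.

    raw_value keeps everything after the colon verbatim, including leading
    spaces and folded continuation lines, so nothing is normalised away.
    """
    rows = []
    for line in header_block.split("\r\n"):
        if line == "":
            break  # an empty line ends the header section
        if line[0] in " \t" and rows:
            pos, name, value = rows[-1]
            rows[-1] = (pos, name, value + "\r\n" + line)  # folded continuation
        else:
            name, _, value = line.partition(":")
            rows.append((len(rows), name, value))
    return rows


def construct_headers(rows):
    """Rebuild the header block from rows, ordered by their stored position."""
    return "".join(f"{name}:{value}\r\n" for _pos, name, value in sorted(rows))
```

The stored position is what lets rows come back from the database in any order and still reassemble exactly.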

ed

Re: NNTP [ In reply to ]
Seems terribly complicated, and it would make the messages very difficult to
dump by hand. I'm also not sure why you want to have two tables; is this what
you're proposing:

header_labels (index on label)
 id  label
 --  -------
  1  to
  2  from
  3  subject

message_headers (indexes on messageid, labelid, header and on labelid, header)
 id  message_id  label_id  header
 --  ----------  --------  ------------------
  1           1         1  bob@dbmail
  2           1         2  joe@sender
  3           1         3  Hey Bob, it's Joe!


I think that unless your database has miraculously good JOINs, this is a
nightmare. Note that you cannot reassemble the headers by returning rows in
ascending order of the 'id' column; another column would be needed to keep
the order of the headers. This is probably the most compact way to store the
headers, though I think it comes at the cost of being severely slow and obtuse.
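To make the objection concrete, here is the two-table layout in SQLite with the extra ordering column that would be needed. The `position` column and the exact column names are assumptions. Reassembling one message's headers takes a JOIN plus an ORDER BY on that column, because the `id` order reflects insertion, not the original header order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE header_labels (
    id    INTEGER PRIMARY KEY,
    label TEXT UNIQUE NOT NULL          -- each header name stored once
);
CREATE TABLE message_headers (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    label_id   INTEGER NOT NULL REFERENCES header_labels(id),
    position   INTEGER NOT NULL,        -- original order within the message
    header     TEXT NOT NULL
);
CREATE INDEX mh_msg ON message_headers (message_id, label_id, header);
CREATE INDEX mh_label ON message_headers (label_id, header);
"""


def add_header(conn, message_id, position, label, value):
    """Insert one header, creating its label row on first use."""
    conn.execute("INSERT OR IGNORE INTO header_labels (label) VALUES (?)", (label,))
    (label_id,) = conn.execute(
        "SELECT id FROM header_labels WHERE label = ?", (label,)
    ).fetchone()
    conn.execute(
        "INSERT INTO message_headers (message_id, label_id, position, header) "
        "VALUES (?, ?, ?, ?)",
        (message_id, label_id, position, value),
    )


def reassemble(conn, message_id):
    """Return 'label: value' lines in original order via a JOIN."""
    rows = conn.execute(
        """SELECT l.label, h.header
           FROM message_headers AS h
           JOIN header_labels AS l ON l.id = h.label_id
           WHERE h.message_id = ?
           ORDER BY h.position""",
        (message_id,),
    ).fetchall()
    return [f"{label}: {value}" for label, value in rows]
```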

My proposed table looks like this:

fastheaders (indexes on messageid, header, contents and on header, contents)
 id  message_id  header  contents
 --  ----------  ------  ----------
  1           1  to      bob@dbmail
  2           1  from    joe@sender


It uses a lot more space, but would have faster search times.
Administrators would be able to query the table by hand more easily. As it is
intended to be only a cache, one could freely zap this table and rebuild it
from the original headers. Naturally, I'm strongly advocating my idea ;-)
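A sketch of the fastheaders cache as described: one row per header, indexed for search, and disposable because it can be rebuilt from the unmodified headers kept in messageblks. SQLite stands in for the real backend and the standard-library `email.parser.HeaderParser` does the parsing. Lowercasing header names, unfolding values, and the function names are assumptions.

```python
import re
import sqlite3
from email.parser import HeaderParser

SCHEMA = """
CREATE TABLE IF NOT EXISTS fastheaders (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    header     TEXT NOT NULL,
    contents   TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS fh_msg ON fastheaders (message_id, header, contents);
CREATE INDEX IF NOT EXISTS fh_hdr ON fastheaders (header, contents);
"""


def unfold(value):
    # Join folded continuation lines into one line for searching.
    return re.sub(r"\r?\n(?=[ \t])", "", value)


def rebuild_fastheaders(conn, raw_headers_by_message):
    """Zap the cache and refill it from the original header blocks."""
    conn.executescript(SCHEMA)
    conn.execute("DELETE FROM fastheaders")
    parser = HeaderParser()
    for message_id, raw in raw_headers_by_message.items():
        msg = parser.parsestr(raw)
        conn.executemany(
            "INSERT INTO fastheaders (message_id, header, contents) VALUES (?, ?, ?)",
            [(message_id, name.lower(), unfold(value)) for name, value in msg.items()],
        )
    conn.commit()
```

Because the cache is derived data, a rebuild is just a DELETE followed by a re-parse; the copies in messageblks stay authoritative.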

Aaron



"Ed K." <ed@hp.uab.edu> said:
[snip]
> What if a solution was proposed in which the order and the formatting of
> the headers could be preserved. A documented method to deconstruct and
> construct the headers, and working code to be included in db.c. Then a flag
> in the messageblks row that would indicate if the header is either only,
> also, or not in the fastheaders table. i suggest we call the table pair
> message_headers and header_labels.
>
> ed
RE: Cached Headers (was NNTP) [ In reply to ]
Surely for webmail applications, etc., you're after a join where you get the
headers as columns for displaying a message list, to avoid nested queries.

E.g. avoid "select * from messages" and then, for each message, "select * from
headers where messageid=X".

I know the goal is to have the headers table really flexible so that
additional headers can be added, etc., but maybe this could be done
differently: a run-time config file where you map the column
names/numbers to header names, etc.
That way, if a developer wanted to add a new column and start caching a new
header, they could alter the table and simply update the config files
one by one. Or disable the header cache altogether.

I could be missing something here: is there a query which can turn a
one-to-many join into a single line?
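On that closing question: standard SQL can fold a one-to-many join into one row per message with conditional aggregation (`MAX(CASE ...)` plus `GROUP BY`), so a message list needs one query rather than one per message. This is a general SQL technique, not something proposed in the thread; the sketch runs it in SQLite against a fastheaders-style table, and the chosen header names are illustrative.

```python
import sqlite3

MESSAGE_LIST_SQL = """
SELECT message_id,
       MAX(CASE WHEN header = 'from'    THEN contents END) AS sender,
       MAX(CASE WHEN header = 'subject' THEN contents END) AS subject,
       MAX(CASE WHEN header = 'date'    THEN contents END) AS sent
FROM fastheaders
WHERE header IN ('from', 'subject', 'date')
GROUP BY message_id
ORDER BY message_id
"""


def message_list(conn):
    """One row per message with selected cached headers pivoted into columns.

    A header missing from a message comes back as NULL (None); if a header
    repeats, MAX simply picks one of its values.
    """
    return conn.execute(MESSAGE_LIST_SQL).fetchall()
```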

/Mark

> -----Original Message-----
> From: dbmail-dev-bounces@dbmail.org
> [mailto:dbmail-dev-bounces@dbmail.org] On Behalf Of Aaron Stone
> Sent: Thursday, 18 March 2004 6:30 p.m.
> To: DBMAIL Developers Mailinglist
> Subject: Re: [Dbmail-dev] NNTP
RE: Cached Headers (was NNTP) [ In reply to ]
I'm thinking in terms of searches primarily, where a query might be:

select distinct(message_id) from fastheaders
where header = 'to' and contents like '%bob%'
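The search query above, run as a parameterised statement against a minimal fastheaders table in SQLite (a stand-in for the real backend; the function name is hypothetical). Binding the pattern as a parameter, and escaping LIKE wildcards in it, keeps user input from being spliced into the SQL.

```python
import sqlite3


def search_header(conn, header, needle):
    """Return ids of messages whose `header` contains `needle` as a substring."""
    # Escape LIKE wildcards so '%' and '_' in the needle match literally.
    escaped = needle.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    # A leading-wildcard LIKE cannot seek on contents; the (header, contents)
    # index only narrows the scan to rows for the requested header.
    rows = conn.execute(
        "SELECT DISTINCT message_id FROM fastheaders "
        "WHERE header = ? AND contents LIKE ? ESCAPE '\\' ORDER BY message_id",
        (header, f"%{escaped}%"),
    ).fetchall()
    return [message_id for (message_id,) in rows]
```

Note that LIKE case sensitivity differs between backends (SQLite folds ASCII case by default; PostgreSQL does not).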

You're looking at it from the display side, where the webmail application
probably doesn't want to read in the header blocks of the message and parse
them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To fields.

Originally I suggested using a table that has the header fields as its
columns, thinking that those were the only commonly requested ones, but either
Ilja or Roel watched a couple of IMAP queries at the time and saw all kinds
of headers being requested, both for searching and for listing messages. So
configurable columns would mean that the admin needs to watch which fields
his clients are requesting and add those, and it would mean quite a bit more
complexity in DBMail, which would need to either find the header in the header
cache table or parse the full headers.

For your webmail application, since you know exactly which header fields you
want and you're writing your own queries, you can just do this:

select header, contents from fastheaders
where message_id = 82 and (header = 'to' or header = 'from' or ...)

My instinct is that this would still be reasonably fast, and completely
flexible and generic. Even if the query took slightly longer, the much smaller
result set and the negligible parsing needed would offset the query time.
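The per-message display query, sketched against a fastheaders-style table in SQLite with an IN list instead of chained ORs. Header names are lowercased to match how the table example above stores them; that normalisation, the default header list, and the function name are assumptions.

```python
import sqlite3


def headers_for_display(conn, message_id, wanted=("date", "from", "subject", "to", "cc")):
    """Fetch only the listed header names for one message from fastheaders."""
    names = [w.lower() for w in wanted]
    placeholders = ", ".join("?" for _ in names)  # one bound parameter per name
    rows = conn.execute(
        "SELECT header, contents FROM fastheaders "
        f"WHERE message_id = ? AND header IN ({placeholders}) ORDER BY id",
        (message_id, *names),
    ).fetchall()
    result = {}
    for header, contents in rows:
        # Headers such as To or Cc may repeat; keep every value in stored order.
        result.setdefault(header, []).append(contents)
    return result
```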

Aaron


"Mark Mackay - Orcon" <mark@orcon.net.nz> said:

> Surely for webmail applications, etc you're after a join where you get the
> headers as columns for displaying a message list; to avoid having nested
> queries.
>
> E.g. avoid "select * from messages"; then for each message "select * from
> headers where messageid=X"
>
> I know the goal is to have the headers table really flexible so that
> additional headers can be added, etc -- but maybe this could be done
> differently. A run-time config file where you mapped the column
> names/numbers to header names, etc.
> That way if a developer wanted to add a new column and start caching a new
> header they could alter the table and simply update the config files,
> one-by-one. Or disable the headercache altogether.
>
> I could be missing something here -- is there a query which can turn a
> one-to-many join into a single line?
>
> /Mark
>
> > -----Original Message-----
> > From: dbmail-dev-bounces@dbmail.org
> > [mailto:dbmail-dev-bounces@dbmail.org] On Behalf Of Aaron Stone
> > Sent: Thursday, 18 March 2004 6:30 p.m.
> > To: DBMAIL Developers Mailinglist
> > Subject: Re: [Dbmail-dev] NNTP
> >
> > Seems terribly complicated, and it would make the messages very difficult
> > to dump by hand. I'm also not sure why you want to have two tables; is
> > that what you're proposing:
> >
> > header_labels (index on label)
> > id  label
> > ----------
> > 1   to
> > 2   from
> > 3   subject
> >
> > message_headers (indexes on messageid, labelid, header and on
> > labelid, header)
> > ------------------------------------------
> > id  message_id  label_id  header
> > 1   1           1         bob@dbmail
> > 2   1           2         joe@sender
> > 3   1           3         Hey Bob, it's Joe!
> >
> >
> > I think that unless your database has miraculously good JOINs, this is a
> > nightmare. Note that you cannot reassemble the headers by returning rows
> > in ascending order of the 'id' column; another column would be needed to
> > keep the order of the headers. This is probably the most compact way to
> > store the headers, but I think it comes at the cost of being severely
> > slow and obtuse.
> >
> > My proposed header table looks like this:
> >
> > fastheaders (indexes on messageid, header, contents and on
> > header, contents)
> > id  message_id  header  contents
> > 1   1           to      bob@dbmail
> > 2   1           from    joe@sender
> >
> >
> > It uses a lot more space, but would have faster search times.
> > Administrators would be able to query the table by hand more easily. As
> > it is intended to be only a cache, one could freely zap this table and
> > rebuild it from the original headers. Naturally, I'm strongly advocating
> > my idea ;-)
> >
> > Aaron
> >
> >
> >
> > ""Ed K."" <ed@hp.uab.edu> said:
> > [snip]
> > > What if a solution was proposed in which the order and the formatting
> > > of the headers could be preserved? A documented method to deconstruct
> > > and construct the headers, and working code to be included in db.c.
> > > Then a flag in the messageblks row would indicate whether the header is
> > > stored only in, also in, or not in the fastheaders table. I suggest we
> > > call the table pair message_headers and header_labels.
> > >
> > > ed
> > >
> > > Security on the internet is impossible without strong, open,
> > > and unhindered encryption.
> > >
> > > _______________________________________________
> > > Dbmail-dev mailing list
> > > Dbmail-dev@dbmail.org
> > > http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> > >
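Because fastheaders is meant to be only a cache, rebuilding it from the original header blocks is its key operation. Below is a minimal sketch of that idea in Python using the standard sqlite3 and email modules; it is not DBMail code. The table layout and the two indexes follow the fastheaders proposal quoted above, while the `position` column, the lower-casing of header names and the `rebuild_fastheaders` helper are illustrative assumptions. `position` is included because, as pointed out above, row ids alone should not be relied on to preserve header order.

```python
import sqlite3
from email.parser import HeaderParser

# Hypothetical schema modelled on the fastheaders proposal; "position" is an
# added assumption so the original header order can be recovered.
SCHEMA = """
CREATE TABLE fastheaders (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    position   INTEGER NOT NULL,
    header     TEXT    NOT NULL,
    contents   TEXT    NOT NULL
);
CREATE INDEX fh_msg    ON fastheaders (message_id, header, contents);
CREATE INDEX fh_header ON fastheaders (header, contents);
"""


def rebuild_fastheaders(conn, raw_headers):
    """Zap every cached row and repopulate from raw header blocks.

    raw_headers maps message_id -> that message's header block as text.
    """
    conn.execute("DELETE FROM fastheaders")
    parser = HeaderParser()
    for message_id, block in raw_headers.items():
        msg = parser.parsestr(block)
        # items() yields (name, value) pairs in their original order,
        # including repeated headers such as several Received: lines.
        for position, (name, value) in enumerate(msg.items()):
            conn.execute(
                "INSERT INTO fastheaders (message_id, position, header, contents)"
                " VALUES (?, ?, ?, ?)",
                (message_id, position, name.lower(), value),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rebuild_fastheaders(
    conn, {1: "To: bob@dbmail\nFrom: joe@sender\nSubject: Hey Bob, it's Joe!\n\n"}
)
rows = conn.execute(
    "SELECT header, contents FROM fastheaders WHERE message_id = 1 ORDER BY position"
).fetchall()
print(rows)
```

Since the cache can always be regenerated this way, losing or truncating the table costs only rebuild time, not data.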
Re: Cached Headers (was NNTP) [ In reply to ]
Hi,

As far as I understand it, this is a major problem with most 'redundant'
database-based systems. When I looked at SQL Relay, it didn't seem to do
anything natively with the database, and since I haven't used it much I
wouldn't be able to say yea or nay. The material in the white papers about
backplane seems rather advanced and seems to fix a lot of my personal
gripes with running a system such as DBMail. Redundancy in a mail system was
a really big thing for me, considering that the biggest part of the internet
from an ISP's or company's point of view is e-mail, and downtime is not
always an option.

Another feature that I think would be rather useful is snapshots (for
backup purposes). I currently back up a mail filesystem with xfsdump, which
is nice and allows me to make filesystem snapshots. But this is not the
ultimate solution in a large system because of the nature of e-mail in
those environments.

Would adding SQL Relay not add another possible point of failure? I know
that systems like portfwd and suchlike aren't built to handle high loads
of traffic or massive amounts of threading (as I say again, I haven't used
SQL Relay :) ).

Comments?

P
----- Original Message -----
From: "Aaron Stone" <aaron@serendipity.palo-alto.ca.us>
To: "DBMAIL Developers Mailinglist" <dbmail-dev@dbmail.org>
Sent: Thursday, March 18, 2004 8:23 AM
Subject: RE: [Dbmail-dev] Cached Headers (was NNTP)


| I'm thinking in terms of searches primarily, where a query might be:
|
| select distinct(message_id) from fastheaders
| where header = 'to' and contents like '%bob%'
|
| You're looking for displaying the message, where the webmail application
| probably doesn't want to read in the header blocks of the message and parse
| them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To fields.
|
| Originally I suggested using a table that has the header fields as the
| columns, thinking that those were the only commonly requested ones, but it was
| either Ilja or Roel at the time who watched for a couple of IMAP queries and
| saw all kinds of headers being requested both for searching and for listing
| messages. So the configurable columns thing means that the admin needs to be
| watching to see what fields his clients are requesting, and to add those, and
| it means quite a bit more complexity in DBMail, which would need to either
| find the header in the header cache table or parse the full headers.
|
| For your webmail application, since you know exactly which header fields you
| want and you're writing your own queries, you can just do this:
|
| select header, contents from fastheaders
| where message_id = 82 and (header = 'To' or header = 'From' or ...)
|
| My instinct is that this would still be reasonably fast, completely flexible
| and generic. Even if the query took slightly longer, the much smaller result
| set size and the negligible parsing needed would offset the query speed.
|
| Aaron
|
|
| ""Mark Mackay - Orcon"" <mark@orcon.net.nz> said:
|
| > Surely for webmail applications, etc you're after a join where you get the
| > headers as columns for displaying a message list; to avoid having nested
| > queries.
| >
| > E.g. avoid "select * from messages"; then for each message "select * from
| > headers where messageid=X"
| >
| > I know the goal is to have the headers table really flexible so that
| > additional headers can be added, etc -- but maybe this could be done
| > differently. A run-time config file where you mapped the column
| > names/numbers to header names, etc.
| > That way if a developer wanted to add a new column and start caching a new
| > header they could alter the table and simply update the config files,
| > one-by-one. Or disable the headercache altogether.
| >
| > I could be missing something here -- is there a query which can turn a
| > one-to-many join into a single line?
| >
| > /Mark
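Mark's question, whether a query can turn a one-to-many join into a single line, has a standard SQL answer: conditional aggregation (a "pivot"). GROUP BY collapses each message's header rows into one row, and one CASE expression per wanted header fills one column each. The sketch below runs it through Python's sqlite3 against a fastheaders-style table; the same CASE/GROUP BY statement should also work in MySQL and PostgreSQL. The sample rows, the lower-case header names and the `hdr_` column aliases are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fastheaders (id INTEGER PRIMARY KEY, message_id INTEGER,"
    " header TEXT, contents TEXT)"
)
conn.executemany(
    "INSERT INTO fastheaders (message_id, header, contents) VALUES (?, ?, ?)",
    [
        (1, "from", "joe@sender"),
        (1, "subject", "Hey Bob, it's Joe!"),
        (1, "to", "bob@dbmail"),
        (2, "from", "ann@example"),  # message 2 has no Subject header
    ],
)

# One row per message, one column per wanted header. MAX() simply picks the
# single non-NULL value the CASE yields within each group; if a header occurs
# more than once in a message, only one of its values survives. The "hdr_"
# aliases avoid clashing with the reserved word FROM.
PIVOT = """
SELECT message_id,
       MAX(CASE WHEN header = 'from'    THEN contents END) AS hdr_from,
       MAX(CASE WHEN header = 'subject' THEN contents END) AS hdr_subject
FROM fastheaders
WHERE header IN ('from', 'subject')
GROUP BY message_id
ORDER BY message_id
"""

message_list = conn.execute(PIVOT).fetchall()
for row in message_list:
    print(row)
```

The trade-off is that the set of columns now lives in the query text, so adding a header to a message list means editing the query rather than altering a table.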
Re: Cached Headers (was NNTP) [ In reply to ]
Well,
I believe a flag on the messageblock would be a good start, since
it is not that intrusive.
But I am very unsure about how much performance we would gain by
searching directly in heavily optimized fast-header tables. Is
the pain worth the gain?
Small installations do not need the extra performance.
Large installations can probably get away with more expensive
hardware.
Huge installations may run out of steam on the best available
hardware; here the fast-headers might be of some use.

My belief is that we should go for the easiest implementation,
the one that is least intrusive, and get a feeling for what we need.
I know that MySQL supports queries with regular expressions.
These queries will probably be quite fast.
I think we should see what happens when we implement server-side
sorting, based on clever SQL statements using the messageblocks.

The huge question is: how much do we gain in performance by using
special header tables compared to searching the messageblocks?
The gain might be quite small.

Well, that's my $0.02

Magnus

Aaron Stone wrote:
> I'm thinking in terms of searches primarily, where a query might be:
>
> select distinct(message_id) from fastheaders
> where header = 'to' and contents like '%bob%'
>
> You're looking for displaying the message, where the webmail application
> probably doesn't want to read in the header blocks of the message and parse
> them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To fields.
>
> Originally I suggested using a table that has the header fields as the
> columns, thinking that those were the only commonly requested ones, but it was
> either Ilja or Roel at the time who watched for a couple of IMAP queries and
> saw all kinds of headers being requested both for searching and for listing
> messages. So the configurable columns thing means that the admin needs to be
> watching to see what fields his clients are requesting, and to add those, and
> it means quite a bit more complexity in DBMail, which would need to either
> find the header in the header cache table or parse the full headers.
>
> For your webmail application, since you know exactly which header fields you
> want and you're writing your own queries, you can just do this:
>
> select header, contents from fastheaders
> where message_id = 82 and (header = 'To' or header = 'From' or ...)
>
> My instinct is that this would still be reasonably fast, completely flexible
> and generic. Even if the query took slightly longer, the much smaller result
> set size and the negligible parsing needed would offset the query speed.
>
> Aaron
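Magnus's question (how much a header cache gains over searching the header blocks directly) is an empirical one. The sketch below only puts the two query styles side by side on toy data so that both could be timed against a realistic data set; it claims no timings. It uses Python's sqlite3, which, unlike MySQL, has no built-in REGEXP implementation, so a user function is registered for it. The `messageblks` columns (including the `is_header` flag) and the regular expression are hypothetical simplifications, not DBMail's actual schema.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def regexp(pattern, value):
    """Back the SQL "X REGEXP Y" operator, which SQLite calls as regexp(Y, X)."""
    return value is not None and re.search(pattern, value) is not None


conn.create_function("REGEXP", 2, regexp)

# Hypothetical tables: messageblks holds each raw header block (is_header
# stands in for the proposed header flag); fastheaders is the cache.
conn.executescript("""
CREATE TABLE messageblks (message_id INTEGER, is_header INTEGER, block TEXT);
CREATE TABLE fastheaders (message_id INTEGER, header TEXT, contents TEXT);
""")

raw_blocks = {
    1: "To: bob@dbmail\nFrom: joe@sender\n",
    2: "To: ann@example\nCc: bob@dbmail\n",
}
for message_id, block in raw_blocks.items():
    conn.execute("INSERT INTO messageblks VALUES (?, 1, ?)", (message_id, block))
    # Naive split for the sketch; it ignores folded continuation lines.
    for line in block.splitlines():
        name, _, value = line.partition(":")
        conn.execute("INSERT INTO fastheaders VALUES (?, ?, ?)",
                     (message_id, name.lower(), value.strip()))

# 1) Search the raw header blocks with a regular expression: a line that
#    starts with "To:" (case-insensitive) and mentions bob.
from_blocks = conn.execute(
    "SELECT message_id FROM messageblks"
    " WHERE is_header = 1 AND block REGEXP ? ORDER BY message_id",
    (r"(?im)^to:[^\n]*bob",),
).fetchall()

# 2) The same search against the header cache.
from_cache = conn.execute(
    "SELECT DISTINCT message_id FROM fastheaders"
    " WHERE header = 'to' AND contents LIKE '%bob%' ORDER BY message_id"
).fetchall()

print(from_blocks, from_cache)
```

For a real comparison, each query could be wrapped in `time.perf_counter()` calls and run against a copy of production data. Note that the regex path must evaluate every header block, whereas the cache path can at least restrict the scan to `to` rows via the (header, contents) index; the leading `%` wildcard still prevents a direct index lookup on contents.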
Re: Cached Headers (was NNTP) [ In reply to ]
Finally I join the discussion :)

I'm all for optimizing headers, BUT:
I'd like to optimize for IMAP. Optimizing for POP3 is not an issue, since
POP just downloads the whole message.

I don't really like to optimize for native clients. DBMail's IMAP should
be fast enough to work with normal IMAP clients (web-based or running on
clients). One way of optimizing is to implement IMAP extensions that'll
help make things faster, like SORT and THREAD.

Currently, I'm not busy with this at all, but if anybody wants to do
some statistical research to find out which headers different clients
FETCH using IMAP, we can see which headers we should cache.

By the way, I guess that having one table with cached headers, as Aaron
proposed, would be the best. I think we should only cache headers that
are used frequently and/or that are always requested *together* by
clients (it's no use if, for a certain client program, we get all but one
of the headers from the cache and get the last header by parsing the message).

Ilja

Magnus Sundberg wrote:
> Well,
> I believe a flag on the messageblock would be a good start, since it is
> not that intrusive.
> But I am very unsure about how much performance we would gain by
> searching directly in heavily optimized fast-header tables. Is the pain
> worth the gain?
> Small installations do not need the extra performance.
> Large installations can probably get away with more expensive hardware.
> Huge installations may run out of steam on the best available hardware;
> here the fast-headers might be of some use.
>
> My belief is that we should go for the easiest implementation, the one
> that is least intrusive, and get a feeling for what we need.
> I know that MySQL supports queries with regular expressions. These
> queries will probably be quite fast.
> I think we should see what happens when we implement server-side sorting,
> based on clever SQL statements using the messageblocks.
>
> The huge question is: how much do we gain in performance by using
> special header tables compared to searching the messageblocks? The gain
> might be quite small.
>
> Well, that's my $0.02
>
> Magnus
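Ilja's point, that a cache only pays off when every header a client asks for is cached, can be sketched as an all-or-nothing lookup. The sketch below, in Python with the standard email parser, answers an IMAP BODY[HEADER.FIELDS (...)]-style request either entirely from a cache row or entirely by parsing the raw header block. `CACHED_FIELDS`, `fetch_header_fields` and the dictionary used as a cache row are hypothetical names chosen for illustration.

```python
from email.parser import HeaderParser

# Hypothetical cached field set; choosing its contents is exactly the
# per-client statistics question raised above.
CACHED_FIELDS = {"date", "from", "subject", "to", "cc"}


def fetch_header_fields(requested, cache_row, raw_header_block):
    """Answer an IMAP-style HEADER.FIELDS request for one message.

    requested: header names asked for by the client, in any case.
    cache_row: dict mapping lower-case cached header names to values.
    raw_header_block: the message's full header text (the fallback).
    Returns (values, source): values maps each lower-cased requested name
    to its value (None if absent); source is "cache" or "parse".
    """
    wanted = [name.lower() for name in requested]
    if all(name in CACHED_FIELDS for name in wanted):
        # Every field is cacheable: answer without touching the message.
        return {name: cache_row.get(name) for name in wanted}, "cache"
    # One uncached field forces a parse anyway, so take every field from
    # the parsed headers instead of mixing cache and parse results.
    msg = HeaderParser().parsestr(raw_header_block)
    return {name: msg.get(name) for name in wanted}, "parse"


raw = "From: joe@sender\nSubject: Hi\nX-Mailer: Example\n\n"
cache = {"from": "joe@sender", "subject": "Hi"}
print(fetch_header_fields(["From", "Subject"], cache, raw))
print(fetch_header_fields(["From", "X-Mailer"], cache, raw))
```

The second call shows the case Ilja warns about: a single uncached field means the message is parsed anyway, so the cache saves nothing for that request.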
Re: Cached Headers (was NNTP) [ In reply to ]
This is for digestion:

I like the separate tables, and to show my point, here are some SQL examples:

+---------------------+
| version() |
+---------------------+
| 4.0.12-standard-log |
+---------------------+
1 row in set (0.00 sec)

+----+----------+
| id | title |
+----+----------+
| 1 | From: |
| 2 | To: |
| 3 | |
| 4 | Subject: |
+----+----------+
4 rows in set (0.00 sec)

+----+----------+--------------------------------------------+
| id | title_id | header |
+----+----------+--------------------------------------------+
| 1 | 1 | a@com.com |
| 2 | 2 | b@com.com |
| 4 | 3 | unknown header1: this is an unknown header |
| 5 | 4 | test message |
| 6 | 3 | unknown header3: this is an unknown header |
+----+----------+--------------------------------------------+
5 rows in set (0.00 sec)

header.id;
+--------------------------------------------+
| header |
+--------------------------------------------+
| From: a@com.com |
| To: b@com.com |
| unknown header1: this is an unknown header |
| Subject: test message |
| unknown header3: this is an unknown header |
+--------------------------------------------+
5 rows in set (0.00 sec)


Order is guaranteed; format is guaranteed except for: the spacing between the header name and its value, the case of the header name, and any trailing spaces.

Who says joins are not fast? They're faster than searching strings, if the indexes are set up right.

ed

On Thu, 18 Mar 2004, Ilja Booij wrote:

> Finally I join the discussion :)
>
> I'm all for optimizing headers, BUT:
> I'd like to optimize for IMAP. Optimizing for POP3 is not an issue, since
> POP just downloads the whole message.
>
> I don't really like to optimize for native clients. DBMail's IMAP should
> be fast enough to work with normal IMAP clients (web-based or running on
> clients). One way of optimizing is to implement IMAP extensions that'll
> help make things faster, like SORT and THREAD.
>
> Currently, I'm not busy with this at all, but if anybody wants to do
> some statistical research to find out which headers different clients
> FETCH using IMAP, we can see which headers we should cache.
>
> By the way, I guess that having one table with cached headers, as Aaron
> proposed, would be the best. I think we should only cache headers that
> are used frequently and/or that are always requested *together* by
> clients (it's no use if, for a certain client program, we get all but one
> of the headers from the cache and get the last header by parsing the message).
>
> Ilja
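The result tables above show only the output; the query that produced the reconstructed header list is almost entirely missing from the archive. The sketch below is one plausible way to write such a join, not Ed's original statement: it rebuilds his sample data in Python's sqlite3 and joins the header rows to their titles, ordered by header.id. The table name `titles` is an assumption (only `header` is suggested by the surviving `header.id;` fragment), and a real table would also need a message id column. Ordering by header.id depends on rows having been inserted in header order, which is the caveat Aaron raised about relying on the id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE header (
    id       INTEGER PRIMARY KEY,
    title_id INTEGER NOT NULL REFERENCES titles (id),
    header   TEXT NOT NULL
);
CREATE INDEX header_by_title ON header (title_id);
""")

# Sample data copied from the result tables above; title id 3 is the
# empty title used for headers that are stored whole.
conn.executemany("INSERT INTO titles VALUES (?, ?)",
                 [(1, "From:"), (2, "To:"), (3, ""), (4, "Subject:")])
conn.executemany("INSERT INTO header VALUES (?, ?, ?)", [
    (1, 1, "a@com.com"),
    (2, 2, "b@com.com"),
    (4, 3, "unknown header1: this is an unknown header"),
    (5, 4, "test message"),
    (6, 3, "unknown header3: this is an unknown header"),
])

# Prefix each value with its title unless the title is empty; ordering by
# header.id assumes rows were inserted in their original header order.
RECONSTRUCT = """
SELECT CASE WHEN titles.title = '' THEN header.header
            ELSE titles.title || ' ' || header.header END
FROM header
JOIN titles ON titles.id = header.title_id
ORDER BY header.id
"""
lines = [row[0] for row in conn.execute(RECONSTRUCT)]
print("\n".join(lines))
```

In MySQL, `||` is a logical OR unless the PIPES_AS_CONCAT SQL mode is enabled, so the concatenation would be written with CONCAT() there.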
>
> Magnus Sundberg wrote:
> > Well,
> > I beleive a flag on the messageblock would be a good start, since it is
> > not that intrusive.
> > But I am very unsusre about how much performance we would gain by
> > searching directly in heavily optimized fast-header tables. Is the pain
> > worth the gain?
> > Small installations do not need the extra performance.
> > Large installations can probably get away with more expensive hardware
> > Huge installations may run out of steam on the best availible hardware,
> > here the fast-headers might be of some use.
> >
> > My beleif is that we should go for the easiest implementation, that is
> > the least intrusive and get a feeling for what we need.
> > I know that MySQL supports queries with regular expressions. These
> > queries will probably be quite fast.
> > I think we should se what happens when we implement server side sorting,
> > based on clever SQL statements using the messageblocks.
> >
> > The huge question is: How much do we gain in performance by using
> > special header tables compared to searching the messageblocks? The gain
> > might be quite small.
> >
> > well, that my $0.02
> >
> > Magnus
> >
> > Aaron Stone wrote:
> >
> >> I'm thinking in terms of searches primarily, where a query might be:
> >>
> >> select distinct(message_id) from fastheaders
> >> where header = 'to' and contents like '%bob%'
> >>
> >> You're looking for displaying the message, where the webmail application
> >> probably doesn't want to read in the header blocks of the message and
> >> parse
> >> them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To
> >> fields.
> >>
> >> Originally I suggested using a table that has the header fields as the
> >> columns, thinking that those were the only commonly requested ones,
> >> but it was
> >> either Ilja or Roel at the time who watched for a couple of IMAP
> >> queries and
> >> saw all kinds of headers being requested both for searching and for
> >> listing
> >> messages. So the configurable columns thing means that the admin needs
> >> to be
> >> watching to see what fields his clients are requesting, and to add
> >> those, and
> >> it means quite a bit more complexity in DBMail, which would need to
> >> either
> >> find the header in the header cache table or parse the full headers.
> >>
> >> For your webmail application, since you know exactly which header
> >> fields you
> >> want and you're writing your own queries, you can just do this:
> >>
> >> select header, contents from fastheaders
> >> where message_id = 82 and (header = 'To' or header = 'From' or ...)
> >>
> >> My instinct is that this would still be reasonably fast, completely
> >> flexible
> >> and generic. Even if the query took slightly longer, the much smaller
> >> result
> >> set size and the negligible parsing needed would offset the query speed.
> >>
> >> Aaron
> >>
> >>
> >> ""Mark Mackay - Orcon"" <mark@orcon.net.nz> said:
> >>
> >>
> >>> Surely for webmail applications, etc you're after a join where you
> >>> get the
> >>> headers as columns for displaying a message list; to avoid having nested
> >>> queries.
> >>>
> >>> E.g. avoid "select * from messages"; then for each message "select *
> >>> from
> >>> headers where messageid=X"
> >>>
> >>> I know the goal is to have the headers table really flexible so that
> >>> additional headers can be added, etc -- but maybe this could be done
> >>> differently. A run-time config file where you mapped the column
> >>> names/numbers to header names, etc.
> >>> That way if a developer wanted to add a new column and start caching
> >>> a new
> >>> header they could alter the table and simply update the config files,
> >>> one-by-one. Or disable the headercache altogether.
> >>> I could be missing something here -- is there a query which can turn a
> >>> one-to-many join into a single line?
> >>>
> >>> /Mark
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: dbmail-dev-bounces@dbmail.org
> >>>> [mailto:dbmail-dev-bounces@dbmail.org] On Behalf Of Aaron Stone
> >>>> Sent: Thursday, 18 March 2004 6:30 p.m.
> >>>> To: DBMAIL Developers Mailinglist
> >>>> Subject: Re: [Dbmail-dev] NNTP
> >>>>
> >>>> Seems terribly complicated, and it would make the messages very
> >>>> difficult to
> >>>> dump by hand. I'm also not sure why you want to have two tables; is
> >>>> that what
> >>>> you're proposing:
> >>>>
> >>>> header_labels (index on label)
> >>>> id label
> >>>> --------------
> >>>> 1 to
> >>>> 2 from
> >>>> 3 subject
> >>>>
> >>>> message_headers (index on messageid, labelid, header and labelid,
> >>>> header)
> >>>> -----------------------------------------------------
> >>>> id message_id label_id header
> >>>> 1 1 1 bob@dbmail
> >>>> 2 1 2 joe@sender
> >>>> 3 1 3 Hey Bob, it's Joe!
> >>>>
> >>>>
> >>>> I think that unless your database has miraculously good JOINs, this
> >>>> is a
> >>>> nightmare; note that you cannot reassemble by returning rows in
> >>>> ascending
> >>>> order of the 'id' column, another column would be needed to keep the
> >>>> order of
> >>>> the headers. This is probably the most compact was to store the
> >>>> headers,
> >>>> though, but I think it is at the cost of being severely slow and
> >>>> obtuse.
> >>>>
> >>>> My proposed header looks like this:
> >>>>
> >>>> fastheaders (index on messageid, header, contents and header,
> >>>> contents)
> >>>> id message_id header contents
> >>>> 1 1 to bob@dbmail
> >>>> 2 1 from joe@sender
> >>>>
> >>>>
> >>>> Suffers from using a lot more space, but would have faster search
> >>>> times.
> >>>> Administrators would be able to more easily query the table by hand.
> >>>> As it is
> >>>> intended to only be a cache, one could freely zap this table and
> >>>> rebuild it
> >>>> from the original headers. Naturally, I'm strongly advocating my
> >>>> idea ;-)
> >>>>
> >>>> Aaron
> >>>>
> >>>>
> >>>>
> >>>> ""Ed K."" <ed@hp.uab.edu> said:
> >>>> [snip]
> >>>>
> >>>>> What if a solution was proposed in which the order and the formatting
> >>>>> of the headers could be preserved: a documented method to deconstruct
> >>>>> and construct the headers, with working code to be included in db.c,
> >>>>> plus a flag in the messageblks row indicating whether the header is
> >>>>> stored only in, also in, or not in the fastheaders table? I suggest
> >>>>> we call the table pair message_headers and header_labels.
> >>>>>
> >>>>> ed
> >>>>>
> >>>>> Security on the internet is impossible without strong, open,
> >>>>> and unhindered encryption.
> >>>>>
> >>>>> _______________________________________________
> >>>>> Dbmail-dev mailing list
> >>>>> Dbmail-dev@dbmail.org
> >>>>> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> >>>>>
>
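A minimal sketch of the header_labels / message_headers layout Aaron describes, using Python's built-in sqlite3 module in place of MySQL. The `position` column is an assumption, added because row ids alone were judged insufficient to keep header order; the table names follow the thread, while the index names and helper functions are hypothetical.

```python
import sqlite3

# Hypothetical schema following the header_labels / message_headers proposal.
# "position" is an assumed extra column that records each header's original order.
SCHEMA = """
CREATE TABLE header_labels (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
CREATE TABLE message_headers (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    label_id   INTEGER NOT NULL REFERENCES header_labels(id),
    position   INTEGER NOT NULL,
    header     TEXT NOT NULL
);
CREATE INDEX mh_msg_label ON message_headers (message_id, label_id, header);
CREATE INDEX mh_label ON message_headers (label_id, header);
"""


def store_headers(conn, message_id, headers):
    """Store (label, value) pairs, reusing one header_labels row per label."""
    for position, (label, value) in enumerate(headers):
        conn.execute("INSERT OR IGNORE INTO header_labels (label) VALUES (?)", (label,))
        (label_id,) = conn.execute(
            "SELECT id FROM header_labels WHERE label = ?", (label,)
        ).fetchone()
        conn.execute(
            "INSERT INTO message_headers (message_id, label_id, position, header) "
            "VALUES (?, ?, ?, ?)",
            (message_id, label_id, position, value),
        )


def reassemble(conn, message_id):
    """Rebuilding the header block needs a JOIN plus ORDER BY position."""
    rows = conn.execute(
        "SELECT l.label, h.header FROM message_headers AS h "
        "JOIN header_labels AS l ON l.id = h.label_id "
        "WHERE h.message_id = ? ORDER BY h.position",
        (message_id,),
    )
    return [f"{label}: {value}" for label, value in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_headers(conn, 1, [("to", "bob@dbmail"), ("from", "joe@sender"),
                        ("subject", "Hey Bob, it's Joe!")])
store_headers(conn, 2, [("from", "bob@dbmail"), ("to", "joe@sender")])
print(reassemble(conn, 2))  # ['from: bob@dbmail', 'to: joe@sender']
```

The trade-off Aaron points out shows up directly: every read of a header block is a JOIN, in exchange for storing each label only once.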
Re: NNTP
> > This last part is already done: message headers go into the first
> > messageblks row and the message body goes into all subsequent rows, so
> > that's exactly what the flag would do. It would also allow more than one
> > messageblks row to contain headers, which, if I understand correctly,
> > cannot be done right now. So if you had a huge amount of headers (and
> > the RFCs place no limit on that), it would currently break, but with a
> > flag for them it could be made to work.
>
> Being able to handle oversized headers is important to maintain compliance
> with an implication of the RFC; there indeed is no limit on headers.

FYI, I did some brief testing, and it seems that if you have more than 512k
of headers (i.e. the max size of a messageblks entry), it silently drops
the message. It doesn't get delivered, bounced, or stuck in the mail queue. :(
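A minimal sketch of how a header flag could let the header section span several messageblks rows instead of being limited to the first one. It assumes a hypothetical per-row limit of 512 KiB (the figure reported above) and models rows as (is_header, chunk) pairs; it is not dbmail's actual insertion code, and the function names are made up for illustration.

```python
# Hypothetical per-row size limit, based on the ~512k figure reported above.
MAX_BLOCK = 512 * 1024


def split_into_blocks(headers: bytes, body: bytes, max_block: int = MAX_BLOCK):
    """Return (is_header, chunk) rows, in storage order.

    Headers may span several rows; the is_header flag (a hypothetical column)
    marks which rows hold header data instead of relying on "first row only".
    """
    if max_block <= 0:
        raise ValueError("max_block must be positive")
    rows = []
    for is_header, data in ((True, headers), (False, body)):
        # Slice each part into pieces of at most max_block bytes.
        for start in range(0, len(data), max_block):
            rows.append((is_header, data[start:start + max_block]))
    return rows


def headers_from_blocks(rows):
    """Reassemble the full header section from the flagged rows."""
    return b"".join(chunk for is_header, chunk in rows if is_header)


# Example: a single 600 KiB header line no longer fits in one row.
big_headers = b"X-Long: " + b"a" * (600 * 1024) + b"\r\n\r\n"
body = b"Hello\r\n"
rows = split_into_blocks(big_headers, body)
print([(flag, len(chunk)) for flag, chunk in rows])
# [(True, 524288), (True, 90124), (False, 7)]
```

With the flag, readers that only want headers fetch the flagged rows, and an oversized header section becomes two rows rather than a dropped message.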



--
Jesse Norell

administrator@kci.net is not my email address;
change "administrator" to my first name.
--
Re: NNTP
Jesse Norell wrote:
> FYI, I did some brief testing, and it seems that if you have more than 512k
> of headers (i.e. the max size of a messageblks entry), it silently drops
> the message. It doesn't get delivered, bounced, or stuck in the mail queue. :(
So this is definitely a bug.
I've added it to the BUGS file (as if that solves the problem...).

We should fix this ASAP.

Ilja
Re: Cached Headers (was NNTP)
Well,
I like your idea, but why not fill the title table automagically
as you find new headers? Then you don't need the title_id 3 in
your example.

But I am still not sure how much performance is gained by putting it
into separate tables compared to searching the message blocks.

Magnus

Ed K. wrote:
> This is for digestion:
>
> I like the separate tables, and to show my point, here are some SQL examples:
>
> mysql> select version();
> +---------------------+
> | version()           |
> +---------------------+
> | 4.0.12-standard-log |
> +---------------------+
> 1 row in set (0.00 sec)
>
> mysql> select * from title;
> +----+----------+
> | id | title    |
> +----+----------+
> |  1 | From:    |
> |  2 | To:      |
> |  3 |          |
> |  4 | Subject: |
> +----+----------+
> 4 rows in set (0.00 sec)
>
> mysql> select * from header;
> +----+----------+--------------------------------------------+
> | id | title_id | header                                     |
> +----+----------+--------------------------------------------+
> |  1 |        1 | a@com.com                                  |
> |  2 |        2 | b@com.com                                  |
> |  4 |        3 | unknown header1: this is an unknown header |
> |  5 |        4 | test message                               |
> |  6 |        3 | unknown header3: this is an unknown header |
> +----+----------+--------------------------------------------+
> 5 rows in set (0.00 sec)
>
> mysql> select IF(title.id=3, header.header, concat(title.title, " ", header.header)) as header
>     -> from header join title on title.id=header.title_id
>     -> order by header.id;
> +--------------------------------------------+
> | header                                     |
> +--------------------------------------------+
> | From: a@com.com                            |
> | To: b@com.com                              |
> | unknown header1: this is an unknown header |
> | Subject: test message                      |
> | unknown header3: this is an unknown header |
> +--------------------------------------------+
> 5 rows in set (0.00 sec)
>
>
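A minimal sketch of Magnus's suggestion, in Python with sqlite3 instead of MySQL: previously unseen header names are added to `title` as they are found, so the empty catch-all title row (id 3) is no longer needed. Like Ed's example it has no message_id column and relies on `header.id` for ordering. As a simplification it strips whitespace after the colon, so unlike Ed's stated goal it does not preserve the original formatting. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
CREATE TABLE header (
    id       INTEGER PRIMARY KEY,
    title_id INTEGER NOT NULL REFERENCES title(id),
    header   TEXT NOT NULL
);
""")


def add_header_line(conn, line):
    """Store one unfolded "Name: value" line, registering new names on the fly."""
    name, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"not a header line: {line!r}")
    title = name + ":"  # keep the colon, as in the title table above
    conn.execute("INSERT OR IGNORE INTO title (title) VALUES (?)", (title,))
    (title_id,) = conn.execute(
        "SELECT id FROM title WHERE title = ?", (title,)).fetchone()
    # Simplification: whitespace after the colon is not preserved.
    conn.execute("INSERT INTO header (title_id, header) VALUES (?, ?)",
                 (title_id, value.strip()))


def reassemble(conn):
    """Equivalent of the IF/concat query, minus the catch-all title row."""
    rows = conn.execute(
        "SELECT title.title || ' ' || header.header FROM header "
        "JOIN title ON title.id = header.title_id ORDER BY header.id")
    return [r[0] for r in rows]


for line in ["From: a@com.com", "To: b@com.com",
             "unknown header1: this is an unknown header",
             "Subject: test message",
             "unknown header3: this is an unknown header"]:
    add_header_line(conn, line)

print("\n".join(reassemble(conn)))
```

The output matches the five lines of Ed's query result, with "unknown header1:" and "unknown header3:" now getting their own title rows.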
RE: Cached Headers (was NNTP)
> I'm thinking in terms of searches primarily, where a query might be:
>
> select distinct(message_id) from fastheaders
> where header = 'to' and contents like '%bob%'
>
> You're looking at displaying the message, where the webmail application
> probably doesn't want to read in the header blocks of the message and parse
> them just to show the Date, From, Subject, To, CC, Bcc, Fcc, Reply-To fields.

Yep, that's exactly what we want (searches and fast lookups of common
headers). :) But there are more uses too, I think.
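A minimal sketch of the fastheaders search quoted above, in Python with sqlite3. Lower-casing header names on insert is an assumption so that `header = 'to'` matches "To:" as well; the helper names are hypothetical. A leading `%` in LIKE means an index cannot narrow on `contents`, so at best the (header, contents) index narrows by header name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fastheaders (
    id         INTEGER PRIMARY KEY,
    message_id INTEGER NOT NULL,
    header     TEXT NOT NULL,  -- header name, lower-cased (assumption)
    contents   TEXT NOT NULL
);
CREATE INDEX fh_msg ON fastheaders (message_id, header, contents);
CREATE INDEX fh_hdr ON fastheaders (header, contents);
""")


def cache_headers(conn, message_id, headers):
    """Cache (name, value) pairs; this table can be dropped and rebuilt."""
    conn.executemany(
        "INSERT INTO fastheaders (message_id, header, contents) VALUES (?, ?, ?)",
        [(message_id, name.lower(), value) for name, value in headers])


def find_messages(conn, header, needle):
    """Same shape as the select distinct(message_id) ... like '%bob%' query.

    needle is not escaped, so % and _ inside it act as LIKE wildcards.
    """
    rows = conn.execute(
        "SELECT DISTINCT message_id FROM fastheaders "
        "WHERE header = ? AND contents LIKE ? ORDER BY message_id",
        (header.lower(), f"%{needle}%"))
    return [r[0] for r in rows]


cache_headers(conn, 1, [("To", "bob@dbmail"), ("From", "joe@sender")])
cache_headers(conn, 2, [("To", "alice@example.org"), ("Cc", "bob@dbmail")])
cache_headers(conn, 3, [("To", "Bob <bob@example.org>"),
                        ("To", "carol@example.org")])
print(find_messages(conn, "to", "bob"))  # [1, 3]
```

Message 2 is not returned because "bob" only appears in its Cc header, and message 3 appears once despite having two To rows, thanks to DISTINCT.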


<interesting comments on imap usage snipped>

> My instinct is that this would still be reasonably fast, completely flexible
> and generic. Even if the query took slightly longer, the much smaller result
> set size and the negligible parsing needed would offset the query speed.

In addition to being very flexible/generic for headers, this could even
be a place for general per-message metadata, not just headers. Any time you
have to parse a message, you could store the result if it's cost-effective
and will be used again (there was one very specific and useful example in
dbmail that I'd mentioned on the mailing list, but I can't find it now).
But you could also have local uses. E.g., in weDBmail, when you download a
message from a pop3 server, I'd like to save an id for which pop3 account
it came from, so if you forward or reply to it you don't have to parse the
headers to try to guess what the sender profile should be (and if you were
Bcc:'d you won't ever know). Of course we can just have our own table to
track that, but if dbmail had a per-message info table like that, we'd try
to use it.
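A minimal sketch of the per-message metadata idea, as a key/value table keyed on (message_id, name), with a helper that caches the result of an expensive parse the first time it is needed. The table name `message_metadata`, the key names, and the helpers are all hypothetical; this is not an existing dbmail table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical key/value table; not an existing dbmail table.
conn.execute("""
CREATE TABLE message_metadata (
    message_id INTEGER NOT NULL,
    name       TEXT NOT NULL,
    value      TEXT,
    PRIMARY KEY (message_id, name)
)""")


def set_meta(conn, message_id, name, value):
    # INSERT OR REPLACE overwrites any existing value for this key.
    conn.execute("INSERT OR REPLACE INTO message_metadata VALUES (?, ?, ?)",
                 (message_id, name, value))


def get_meta(conn, message_id, name, compute=None):
    """Return a cached value; on a miss, run compute() once and cache it."""
    row = conn.execute(
        "SELECT value FROM message_metadata WHERE message_id = ? AND name = ?",
        (message_id, name)).fetchone()
    if row is not None:
        return row[0]
    if compute is None:
        return None
    value = compute()  # stands in for an expensive parse of the message
    set_meta(conn, message_id, name, value)
    return value


# Local use: remember which pop3 account a downloaded message came from.
set_meta(conn, 42, "pop3_account", "work")

parse_calls = []


def parse_content_type():
    parse_calls.append(1)
    return "text/plain"


get_meta(conn, 42, "content_type", parse_content_type)
get_meta(conn, 42, "content_type", parse_content_type)  # served from cache
print(get_meta(conn, 42, "pop3_account"), len(parse_calls))  # work 1
```

Because the key is just a name, dbmail and a client like weDBmail could each use their own names in the same table without schema changes.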



--
Jesse Norell

administrator@kci.net is not my email address;
change "administrator" to my first name.
--
