Mailing List Archive

using transactions
Hello everyone,

On mysql transactional tables, a commit is implied on each insert
operation unless you turn off autocommit.

Most operations that involves multiple writes to the database can be
made faster by grouping them inside a transaction, since it will mean
fewer commits overall.

For example I got a 40% performance boost during dbmail-smtp insertion
on a few hundred messages by putting a begin/commit pair at the start
and end of insert_messages (dbmail 1.2 cvs).

Are there plans for this kind of thing in the 2.0 tree? A quick look at
the code doesn't find much. It will also increase consistency after a
crash since each message will be fully created or not created at all.

-chris
Re: using transactions [ In reply to ]
On Wed, 2003-10-15 at 13:37, Chris Mason wrote:
> Hello everyone,
>
> On mysql transactional tables, a commit is implied on each insert
> operation unless you turn off autocommit.
>
> Most operations that involves multiple writes to the database can be
> made faster by grouping them inside a transaction, since it will mean
> fewer commits overall.

Ok, here's a patch for dbmail 1.2 cvs that implements the transaction
begin/commit for mysql during pipe.c:insert_messages, and the imap
copy/append message commands. This will only improve performance for
innodb/bdb table types, it should change nothing for the
non-transactional tables.

It also changes the mysql quota funcs to skip computing the sum of all
message sizes when the user has no quota. I'm not sure why there are
two different funcs that do the same quota calculations, but I think
I've got them both right.

It does not touch postgres, so applying this for postgres won't even
compile. Use at your own risk, I haven't tried huge attachments with it
yet.

-chris
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 09:57, Chris Mason wrote:
> Ok, here's a patch for dbmail 1.2 cvs that implements the transaction
> begin/commit for mysql during pipe.c:insert_messages, and the imap
> copy/append message commands.

Great, dbmail should be using transactions, why use a database if you
don't use transactions.

> It does not touch postgres, so applying this for postgres won't even
> compile. Use at your own risk, I haven't tried huge attachments with it
> yet.

Would really love to see this in postgres also, maybe I'll get motivated
at some point...

Couple of minor comments about the code,

You have created two new database functions called:
db_start_transaction() and
db_stop_transaction()

I would recommend changing them to something like:
db_begin_transaction() (begin is standard word)
db_commit_transaction() (stop is ambiguous)

also I would add:
db_rollback_transaction() There will probably be times we want to
rollback our work if we find an error.

Great work, glad to see dbmail getting more robust. In general I have
always thought we could benefit from using the database to do more for
us, rather than having to implement things ourselves.

Foreign keys are a good example, when we delete the last entry in the
message table, the database should automatically delete all entries in
the phs_message table and in the msg_blocks table for us.

Matthew
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 10:15, Matthew T. O'Connor wrote:
> On Thu, 2003-10-16 at 09:57, Chris Mason wrote:
> > Ok, here's a patch for dbmail 1.2 cvs that implements the transaction
> > begin/commit for mysql during pipe.c:insert_messages, and the imap
> > copy/append message commands.
>
> Great, dbmail should be using transactions, why use a database if you
> don't use transactions.
>

indexing ;-) Before dbmail can really use transactions, it need to be
able to detect if a given table/database supports them. I'm guessing
mysql has some method for this, but haven't checked.

Mostly the patch needs to be fit into dbmail-2.0. It's a proof of
concept for benchmarking and discussion, since I don't have much time to
experiment with the unstable tree. dbmail is pretty good, but there's
still lots of room for performance improvement. Reading through the
archives, people know this already, it's just a matter of time as they
get through the feature list ;-)

-chris
Re: using transactions [ In reply to ]
Chris Mason wrote:
> On Thu, 2003-10-16 at 10:15, Matthew T. O'Connor wrote:
>
>>On Thu, 2003-10-16 at 09:57, Chris Mason wrote:
>>
>>>Ok, here's a patch for dbmail 1.2 cvs that implements the transaction
>>>begin/commit for mysql during pipe.c:insert_messages, and the imap
>>>copy/append message commands.
>>
>>Great, dbmail should be using transactions, why use a database if you
>>don't use transactions.
>>
>

I beleive this is a non issue, suppose the following function

start_transaction(){
if ( transactions_supported) {
/* Issue SQL start transaction code */

};
};

/Magnus
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 11:03, Magnus Sundberg wrote:
> Chris Mason wrote:
> > On Thu, 2003-10-16 at 10:15, Matthew T. O'Connor wrote:
> >
> >>On Thu, 2003-10-16 at 09:57, Chris Mason wrote:
> >>
> >>>Ok, here's a patch for dbmail 1.2 cvs that implements the transaction
> >>>begin/commit for mysql during pipe.c:insert_messages, and the imap
> >>>copy/append message commands.
> >>
> >>Great, dbmail should be using transactions, why use a database if you
> >>don't use transactions.
> >>
> >
>
> I beleive this is a non issue, suppose the following function
>
> start_transaction(){
> if ( transactions_supported) {
> /* Issue SQL start transaction code */
>
> };
> };

There's a lot more to using transactions than BEGIN/COMMIT. The current
code orders operations such that transactions are not purely required.
When inserting messages it sets the unique_id last (as a kind of
informal commit), does rollback manually when some part of the insert
fails, etc.

A real transactional system would let the database do all of that. My
patch only addresses a performance aspect with mysql transactional
tables...every write triggers a commit. Enclosing them in a transaction
does one commit for the whole group instead.

Inserting a huge message as an atomic unit might not be the best idea,
I'm not sure how well mysql will cope with it.

-chris
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 11:55, Chris Mason wrote:

> Inserting a huge message as an atomic unit might not be the best idea,
> I'm not sure how well mysql will cope with it.

Hmmm, a quick test shows it dealt with a 40MB attachment fine, which is
larger than my innodb log buffer.

-chris
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 10:51, Chris Mason wrote:
> On Thu, 2003-10-16 at 10:15, Matthew T. O'Connor wrote:
> > On Thu, 2003-10-16 at 09:57, Chris Mason wrote:
> > > Ok, here's a patch for dbmail 1.2 cvs that implements the transaction
> > > begin/commit for mysql during pipe.c:insert_messages, and the imap
> > > copy/append message commands.
> >
> > Great, dbmail should be using transactions, why use a database if you
> > don't use transactions.
> >
>
> indexing ;-) Before dbmail can really use transactions, it need to be
> able to detect if a given table/database supports them. I'm guessing
> mysql has some method for this, but haven't checked.

I don't think we should play down to the lowest common denominator
here. I think that if a database doesn't support transactions, or some
other feature that will make dbmail better than that database shouldn't
be supported. Obviously a line has to be drawn somewhere and it's open
to debate, but transactions are a simple one.

I know dbmail 2.0 is trying to be more database independent, but most
any database you would want to use at least supports transactions mysql,
postgresql, oracle, sql server.
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 11:55, Chris Mason wrote:
> There's a lot more to using transactions than BEGIN/COMMIT. The current
> code orders operations such that transactions are not purely required.
> When inserting messages it sets the unique_id last (as a kind of
> informal commit), does rollback manually when some part of the insert
> fails, etc.
>
> A real transactional system would let the database do all of that. My
> patch only addresses a performance aspect with mysql transactional
> tables...every write triggers a commit. Enclosing them in a transaction
> does one commit for the whole group instead.

I agree with all the above 100%, my biggest gripe with the dbmail design
is that it does a lot of work to deal with mysql limitations. I think
that's a bad design decision. Use the database to do what it does best,
which is maintain data consistency for you.

> Inserting a huge message as an atomic unit might not be the best idea,
> I'm not sure how well mysql will cope with it.

Can't speak to that, but I wouldn't think it's a problem, I don't expect
to recieve isos for CD via email. I would imagine that mysql can handle
several MB emails in one transaction, especially since the msg_blocks
table splits it up into several small inserts.
Re: using transactions [ In reply to ]
On Thu, Oct 16, 2003 at 10:15:20AM -0400, Matthew T. O'Connor wrote:
> Foreign keys are a good example, when we delete the last entry in the
> message table, the database should automatically delete all entries in
> the phs_message table and in the msg_blocks table for us.

I think this would require triggers, which MySQL doesn't support,
because there can't be a one-to-many relationship between a foreign key
and field it references. Messages reference physmessages, not the
other way around.

xn
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 12:50, Christian G. Warden wrote:
> On Thu, Oct 16, 2003 at 10:15:20AM -0400, Matthew T. O'Connor wrote:
> > Foreign keys are a good example, when we delete the last entry in the
> > message table, the database should automatically delete all entries in
> > the phs_message table and in the msg_blocks table for us.
>
> I think this would require triggers, which MySQL doesn't support,
> because there can't be a one-to-many relationship between a foreign key
> and field it references. Messages reference physmessages, not the
> other way around.

true... so without triggers, we may have to do some manual work to make
sure we are the only message left, if so then delete phsmessage, which
will automatically delete the msg_blocks entries. BTW, this could
probably be abstracted away with the different db drivers, the pg driver
would assume that this is being taken care of by a trigger, and the
mysql driver would have to do the work.

Another important reason to have the database maintain consistency
itself as much as possible is that you can't that dbmail is the only
client app that touches the database. I know there is at least one
webmail app that bypasses IMAP / POP3 and goes directly to the database
for performance reasons, now if that client doesn't correctly do all the
same things that dbmail does the database may become inconsistent.
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 12:11, Matthew T. O'Connor wrote:
> On Thu, 2003-10-16 at 11:55, Chris Mason wrote:
> > There's a lot more to using transactions than BEGIN/COMMIT. The current
> > code orders operations such that transactions are not purely required.
> > When inserting messages it sets the unique_id last (as a kind of
> > informal commit), does rollback manually when some part of the insert
> > fails, etc.
> >
> > A real transactional system would let the database do all of that. My
> > patch only addresses a performance aspect with mysql transactional
> > tables...every write triggers a commit. Enclosing them in a transaction
> > does one commit for the whole group instead.
>
> I agree with all the above 100%, my biggest gripe with the dbmail design
> is that it does a lot of work to deal with mysql limitations. I think
> that's a bad design decision. Use the database to do what it does best,
> which is maintain data consistency for you.
>

It's important to remember that non-transactional tables have their
place in some workloads. They are faster and use less resources
overall.

So, it's not about the lowest common denominator, its about deciding
which workloads you want to support. For my own personal use, I want
transactional tables, but others might not.

-chris
Re: using transactions [ In reply to ]
On Thu, Oct 16, 2003 at 01:06:59PM -0400, Matthew T. O'Connor wrote:
> On Thu, 2003-10-16 at 12:50, Christian G. Warden wrote:
> > On Thu, Oct 16, 2003 at 10:15:20AM -0400, Matthew T. O'Connor wrote:
> > > Foreign keys are a good example, when we delete the last entry in the
> > > message table, the database should automatically delete all entries in
> > > the phs_message table and in the msg_blocks table for us.
> >
> > I think this would require triggers, which MySQL doesn't support,
> > because there can't be a one-to-many relationship between a foreign key
> > and field it references. Messages reference physmessages, not the
> > other way around.
>
> true... so without triggers, we may have to do some manual work to make
> sure we are the only message left, if so then delete phsmessage, which
> will automatically delete the msg_blocks entries. BTW, this could
> probably be abstracted away with the different db drivers, the pg driver
> would assume that this is being taken care of by a trigger, and the
> mysql driver would have to do the work.

Even without triggers, much of the logic could be moved to the database.
Currently, I believe it does something like (mixing sql and application
pseudocode):
SELECT message_idnr, physmessage_id WHERE status = 3;
foreach message_idnr {
DELETE FROM
messages
WHERE
message_idnr = $message_idnr;
SELECT
message_idnr
FROM
messages
WHERE
physmessage_id = $physmessage_id;
if (no results) {
DELETE FROM
physmessages
WHERE
id = $physmessage_id;
DELETE FROM
messageblks
WHERE
physmessage_id = $physmessage_id;
}
}

This could be simpilified (in MySQL syntax; don't know similar
PostgreSQL (or SQL92?) syntax):
DELETE FROM
messages
WHERE
status = 3;
DELETE FROM
physmessage
USING
physmessage p
LEFT JOIN
messages m ON m.physmessage_id = p.id
WHERE
m.message_idnr IS NULL;

With a foreign key constraint using ON DELETE CASCADE on messageblks,
that's it.

> Another important reason to have the database maintain consistency
> itself as much as possible is that you can't that dbmail is the only
> client app that touches the database. I know there is at least one
> webmail app that bypasses IMAP / POP3 and goes directly to the database
> for performance reasons, now if that client doesn't correctly do all the
> same things that dbmail does the database may become inconsistent.

Right. If the purging happens within an isolated transaction, it'll
always be consistent. I'm looking forward to transactions in 2.0.

xn
Re: using transactions [ In reply to ]
On Thu, 2003-10-16 at 13:07, Chris Mason wrote:
> It's important to remember that non-transactional tables have their
> place in some workloads. They are faster and use less resources
> overall.
>
> So, it's not about the lowest common denominator, its about deciding
> which workloads you want to support. For my own personal use, I want
> transactional tables, but others might not.

True enough I suppose, but I view my email as critical data, hence I
value data integrity over performance, and IMHO think that should be the
focus of the dbmail project. Keep in mind, that integrity and
performance are not always orthogonal, in fact at times they are
complimentary.