Mailing List Archive

max mails per folder
Hello,

I'm doing some performance testing of our dbmail instance. I noticed that if I copy a large number of mails into a single folder, append starts to get really slow. For example, I have a spam collection of more than 50k mails sitting comfortably in a single folder on a cyrus imap server, with each append to it taking about 0.03s. When I try to copy it to a folder on dbmail imapd, initial appends to the empty folder take less than a second, but this quickly grows to tens of seconds, and at around 7k mails copied each append takes around 100 seconds.

Is this expected behaviour? There are no obvious slow queries in the mysql slow query log.

Can anything be done to improve this?


--

Jure Pečar
https://jure.pecar.org
http://f5j.eu
_______________________________________________
DBmail mailing list
DBmail@dbmail.org
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail
Re: max mails per folder
My blind guess is that it doesn't depend on dbmail itself, but on MySQL
"filling up".

Things probably get slow when either:

* your controller cache fills up and the controller starts writing
for real to disk
* your InnoDB redo log gets full and the DB has to start flushing
pending changes to the real InnoDB data files.

There is not much you can do about the first one, but for the second one
you can tweak innodb_log_file_size and innodb_log_files_in_group.

The purpose of those files is to be written sequentially, thus avoiding
the bottleneck of the disk having to seek for inodes and free inodes
as the table grows.

I suggest you "watch" those files while the DB is appending the
messages and see how fast they fill up; a complete wrap-around implies
a flush to the real tables.

Just to give you a vague idea, on our system we have 6 * 512MB files.

---

Andrea Brancatelli
Schema31 S.p.a.
Head of IT

ROMA - BO - FI - PA
ITALY
Tel: +39. 06.98.358.472
Cell: +39 331.2488468
Fax: +39. 055.71.880.466
A company of the SC31 ITALIA Group

On 2015-11-09 10:17, Jure Pečar wrote:

> Hello,
>
> I'm doing some performance testing of our dbmail instance. I noticed that if I copy a large number of mails into a single folder, append starts to get really slow. For example, I have a spam collection of more than 50k mails sitting comfortably in a single folder on a cyrus imap server, with each append to it taking about 0.03s. When I try to copy it to a folder on dbmail imapd, initial appends to the empty folder take less than a second, but this quickly grows to tens of seconds, and at around 7k mails copied each append takes around 100 seconds.
>
> Is this expected behaviour? There are no obvious slow queries in the mysql slow query log.
>
> Can anything be done to improve this?
Re: max mails per folder
On Mon, 09 Nov 2015 10:54:48 +0100
Andrea Brancatelli <abrancatelli@schema31.it> wrote:

>
>
> My blind guess is that it doesn't depend on dbmail itself, but on MySQL
> "filling up".
>
> Things probably get slow when either:
>
> * your controller cache fills up and the controller starts writing
> for real to disk
> * your InnoDB redo log gets full and the DB has to start flushing
> pending changes to the real InnoDB data files.
>
> There is not much you can do about the first one, but for the second one
> you can tweak innodb_log_file_size and innodb_log_files_in_group.
>
> The purpose of those files is to be written sequentially, thus avoiding
> the bottleneck of the disk having to seek for inodes and free inodes
> as the table grows.
>
> I suggest you "watch" those files while the DB is appending the
> messages and see how fast they fill up; a complete wrap-around implies
> a flush to the real tables.
>
> Just to give you a vague idea, on our system we have 6 * 512MB files.

I don't think that's a smart thing to do. See
https://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/

We have about 2MB/min going into innodb logs and already have 2*256MB innodb log files, which should be plenty.
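
As a rough sanity check of those figures: the time needed to write through the whole log once is just its total size divided by the write rate. A minimal sketch using only the numbers quoted above (the function name is made up for illustration):

```python
def log_cycle_minutes(file_size_mb: float, files_in_group: int,
                      write_rate_mb_per_min: float) -> float:
    """Minutes needed to write through the whole InnoDB redo log once."""
    total_mb = file_size_mb * files_in_group
    return total_mb / write_rate_mb_per_min


# 2 * 256MB log files at about 2MB/min of log writes
print(log_cycle_minutes(256, 2, 2.0))  # 256.0 minutes, i.e. more than four hours
```

At that write rate the log is nowhere near wrapping around during a test run, which is consistent with the log size not being the limiting factor here.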

What kind of append performance are you seeing on your setup?

What is your crash recovery time with such large log files?

Anyway, IO is not the bottleneck, as our db server is capable of more than 4GB/s (nvme is amazing). It must be something else...


--

Jure Pečar
https://jure.pecar.org
http://f5j.eu
Re: max mails per folder
Well, obviously it depends on the amount of mail.

We have about 1000 mailboxes with about 350GB of online mail. At peak
hours the logs fully cycle in about 10 minutes, mainly because
circular replication increases the server activity.

Restart after a crash takes more or less 2 minutes.

---

Andrea Brancatelli
Schema31 S.p.a.

On 2015-11-09 12:26, Jure Pečar wrote:

> On Mon, 09 Nov 2015 10:54:48 +0100
> Andrea Brancatelli <abrancatelli@schema31.it> wrote:
>
>> My blind guess is that it doesn't depend on dbmail itself, but on MySQL
>> "filling up".
>>
>> Things probably get slow when either:
>>
>> * your controller cache fills up and the controller starts writing
>> for real to disk
>> * your InnoDB redo log gets full and the DB has to start flushing
>> pending changes to the real InnoDB data files.
>>
>> There is not much you can do about the first one, but for the second one
>> you can tweak innodb_log_file_size and innodb_log_files_in_group.
>>
>> The purpose of those files is to be written sequentially, thus avoiding
>> the bottleneck of the disk having to seek for inodes and free inodes
>> as the table grows.
>>
>> I suggest you "watch" those files while the DB is appending the
>> messages and see how fast they fill up; a complete wrap-around implies
>> a flush to the real tables.
>>
>> Just to give you a vague idea, on our system we have 6 * 512MB files.
>
> I don't think that's a smart thing to do. See
> https://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/
>
> We have about 2MB/min going into innodb logs and already have 2*256MB innodb log files, which should be plenty.
>
> What kind of append performance are you seeing on your setup?
>
> What is your crash recovery time with such large log files?
>
> Anyway, IO is not the bottleneck, as our db server is capable of more than 4GB/s (nvme is amazing). It must be something else...
Re: max mails per folder
On Mon, 09 Nov 2015 12:38:06 +0100
Andrea Brancatelli <abrancatelli@schema31.it> wrote:

>
>
> Well, obviously it depends on the amount of mail.
>
> We have about 1000 mailboxes with about 350GB of online mail. At peak
> hours the logs fully cycle in about 10 minutes, mainly because
> circular replication increases the server activity.

I ran a separate dbmail imapd instance in order to capture what's happening at append time. This is the point where I found lots of time being spent:

Nov 09 17:24:54 email-prod-01.a dbmail-imapd[16154]: [0x7f398a864e30] Database:[db] db_stmt_prepare(+419): [0x7f398a82cb00] [.SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, DATE_FORMAT(internal_date, GET_FORMAT(DATETIME,'ISO')), rfcsize, message_idnr FROM dbmail_messages m LEFT JOIN dbmail_physmessage p ON p.id = m.physmessage_id WHERE m.mailbox_idnr = ? AND m.status IN (0,1) ORDER BY message_idnr ASC]
Nov 09 17:24:54 email-prod-01.a dbmail-imapd[16154]: [0x7f398a864e30] Database:[db] db_stmt_set_u64(+439): [0x7f396400ad70] 1:[27520]
Nov 09 17:25:06 email-prod-01.a dbmail-imapd[16154]: [0x7f398a864e30] Database:[db] db_con_clear(+298): [0x7f398a82cb00] connection cleared

It's 12 seconds in this case.
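
For anyone who wants to pull such gaps out of a longer debug log, the elapsed time can be computed from the syslog-style timestamps at the start of each line. This is only a sketch: the helper name is made up, the two sample lines are shortened versions of the ones above, and the year (which syslog omits) is assumed to be 2015.

```python
from datetime import datetime


def elapsed_seconds(first_line: str, last_line: str, year: int = 2015) -> float:
    """Seconds between two log lines that start with 'Mon DD HH:MM:SS'."""
    fmt = "%Y %b %d %H:%M:%S"
    # The timestamp is the first 15 characters of each line. Syslog leaves
    # out the year, so it is supplied by the caller (an assumption).
    start = datetime.strptime(f"{year} {first_line[:15]}", fmt)
    end = datetime.strptime(f"{year} {last_line[:15]}", fmt)
    return (end - start).total_seconds()


# Shortened copies of the first and last log lines shown above
prepare = "Nov 09 17:24:54 email-prod-01.a dbmail-imapd[16154]: db_stmt_prepare(+419)"
cleared = "Nov 09 17:25:06 email-prod-01.a dbmail-imapd[16154]: db_con_clear(+298)"
print(elapsed_seconds(prepare, cleared))  # 12.0
```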

Looking at the code, I find this statement in dm_mailboxstate.c, in the function state_load_messages. Immediately after the query is done, the msginfo tree is filled with the data from the query result.

It appears this tree insert is the cause of the slowness we're observing. Since this is in GLib code, I'll dig further there. I'll also have to dust off some CS theory on trees, insertion and sorting ;)


--

Jure Pečar
https://jure.pecar.org
http://f5j.eu
Re: max mails per folder
On 09/11/2015 16:44, Jure Pečar wrote:
> On Mon, 09 Nov 2015 12:38:06 +0100
> Andrea Brancatelli <abrancatelli@schema31.it> wrote:
>
>>
>>
>> Well, obviously it depends on the amount of mail.
>>
>> We have about 1000 mailboxes with about 350GB of online mail. At peak
>> hours the logs fully cycle in about 10 minutes, mainly because
>> circular replication increases the server activity.
>
> I ran a separate dbmail imapd instance in order to capture what's happening at append time. This is the point where I found lots of time being spent:
>
> Nov 09 17:24:54 email-prod-01.a dbmail-imapd[16154]: [0x7f398a864e30] Database:[db] db_stmt_prepare(+419): [0x7f398a82cb00] [.SELECT seen_flag, answered_flag, deleted_flag, flagged_flag, draft_flag, recent_flag, DATE_FORMAT(internal_date, GET_FORMAT(DATETIME,'ISO')), rfcsize, message_idnr FROM dbmail_messages m LEFT JOIN dbmail_physmessage p ON p.id = m.physmessage_id WHERE m.mailbox_idnr = ? AND m.status IN (0,1) ORDER BY message_idnr ASC]
> Nov 09 17:24:54 email-prod-01.a dbmail-imapd[16154]: [0x7f398a864e30] Database:[db] db_stmt_set_u64(+439): [0x7f396400ad70] 1:[27520]
> Nov 09 17:25:06 email-prod-01.a dbmail-imapd[16154]: [0x7f398a864e30] Database:[db] db_con_clear(+298): [0x7f398a82cb00] connection cleared
>
> It's 12 seconds in this case.
>
> Looking at the code, I find this statement in dm_mailboxstate.c, in the function state_load_messages. Immediately after the query is done, the msginfo tree is filled with the data from the query result.
>
> It appears this tree insert is the cause of the slowness we're observing. Since this is in GLib code, I'll dig further there. I'll also have to dust off some CS theory on trees, insertion and sorting ;)
>
>

Hi,

It might be possible to refactor this slowdown away, as I'm unsure why
a tree of messages is needed when inserting a message. I haven't looked
at the code recently, but I didn't think dbmail retrieved the messages
before or after an insert.

It's normal for a database to prefer a table scan over an index when
it estimates that the number of rows to be read outweighs the benefit
of using the index.

Alan

--
Persistent Objects Ltd
128 Lilleshall Road
London SM4 6DR

Lifting brand value by using all means at my disposal
including technological, motivational and best practice.

Proud sponsor of TEDx Wandsworth 2015

Registered in England and Wales 03538717

+44/0 79 3030 5004
+44/0 20 8544 5292
http://p-o.co.uk
https://plus.google.com/+AlanHicksLondon
Re: max mails per folder
On Tue, 10 Nov 2015 10:44:37 +0000
Alan Hicks <ahicks@p-o.co.uk> wrote:

> Hi,
>
> It might be possible to refactor this slowdown away, as I'm unsure why
> a tree of messages is needed when inserting a message. I haven't looked
> at the code recently, but I didn't think dbmail retrieved the messages
> before or after an insert.
>
> It's normal for a database to prefer a table scan over an index when
> it estimates that the number of rows to be read outweighs the benefit
> of using the index.
>
> Alan

I did some more "printf style" profiling on that while loop with some interesting results.

If I put calls to clock_gettime just outside the loop and display the difference at the end, I get, say, 14.something seconds. If I put these calls inside the loop, print out the duration of each pass through the loop and then sum them up, I get about 0.04 seconds.

So I'm either doing something wrong or something interesting is going on inside libzdb, to which db_result_next(r) translates.
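
A gap like this is what one would expect if the time goes into fetching the next row (the loop condition) rather than into the loop body, assuming the inner timers only bracket the body. The following sketch reproduces the effect with a deliberately slow stand-in for the row fetch; `slow_rows` and its delay are invented for the demonstration and have nothing to do with the real libzdb code.

```python
import time


def slow_rows(count, delay):
    """Stand-in for a result set whose fetch step is slow (hypothetical)."""
    for i in range(count):
        time.sleep(delay)  # time spent fetching the next row
        yield i


total_start = time.perf_counter()
body_time = 0.0
for row in slow_rows(20, 0.01):
    body_start = time.perf_counter()
    _ = row * 2  # the loop body itself is cheap
    body_time += time.perf_counter() - body_start
total_time = time.perf_counter() - total_start

# The outer timer sees the fetch delays, the per-pass timers do not.
print(f"whole loop: {total_time:.3f}s, sum of loop bodies: {body_time:.3f}s")
```

The outer measurement includes every fetch, while the summed per-pass measurements only cover the work done after each row has already arrived.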

Digging further ...


--

Jure Pečar
https://jure.pecar.org
http://f5j.eu
Re: max mails per folder
On Tue, 10 Nov 2015 15:59:18 +0100
Jure Pečar <pegasus@nerv.eu.org> wrote:

> So I'm either doing something wrong or something interesting is going on inside libzdb, to which db_result_next(r) translates.

And that boils down to mysql_stmt_fetch(), which has an interesting sentence in its documentation:

"By default, result sets are fetched unbuffered a row at a time from the server."

That's exactly what I'm observing with strace: lots of poll/read/write lines, operating on one result line at a time.

I guess I'll have to look into how to stick mysql_stmt_store_result() somewhere into libzdb ...
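
To get a feeling for why row-at-a-time fetching hurts over a network, here is a back-of-the-envelope model. It assumes each fetch costs one full round trip to the server and ignores server and client CPU time; the row count and the round-trip time are hypothetical values, not measurements from this setup.

```python
def fetch_time_seconds(rows: int, round_trip_ms: float,
                       rows_per_round_trip: int = 1) -> float:
    """Network wait for fetching `rows` rows, ignoring CPU time on both ends."""
    round_trips = -(-rows // rows_per_round_trip)  # ceiling division
    return round_trips * round_trip_ms / 1000.0


# Hypothetical numbers: 7,000 rows and a 1.5 ms round trip between hosts
print(fetch_time_seconds(7000, 1.5))        # one row per round trip: 10.5
print(fetch_time_seconds(7000, 1.5, 7000))  # whole result set at once: 0.0015
```

Under these assumptions the wait grows linearly with the number of messages in the folder, which would match appends getting slower as the folder fills up.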


--

Jure Pečar
https://jure.pecar.org
http://f5j.eu
Re: max mails per folder
On Tue, 10 Nov 2015 17:04:56 +0100
Jure Pečar <pegasus@nerv.eu.org> wrote:

> I guess I'll have to look into how to stick mysql_stmt_store_result() somewhere into libzdb ...

Or alternatively, run dbmail processes on the same host as mysql. For my test case it's at least 20x faster ...


--

Jure Pečar
https://jure.pecar.org
http://f5j.eu
Re: max mails per folder
Hi Jure,

Did you manage to come to a resolution on this problem? It is causing
me untold issues as well and putting the dbmail processes on the same
server as mysql is not feasible as I am using Amazon RDS at the moment.

All the best

Carl Taylor

On 10/11/15 16:27, Jure Pečar wrote:
> On Tue, 10 Nov 2015 17:04:56 +0100
> Jure Pečar <pegasus@nerv.eu.org> wrote:
>
>> I guess I'll have to look into how to stick mysql_stmt_store_result() somewhere into libzdb ...
> Or alternatively, run dbmail processes on the same host as mysql. For my test case it's at least 20x faster ...
>
>
