Hello,
I had a look at the query used in pgsql/dbpgsql.c to calculate the used quota
in db_get_quotum_used(). I executed the current query (with "messagesize" instead
of "m.messagesize")
SELECT SUM(messagesize)
FROM messages
WHERE mailbox_idnr IN
(SELECT mailbox_idnr
FROM mailboxes
WHERE owner_idnr = %llu::bigint)
AND status < 2;
and the equivalent query (which is used in mysql/dbmysql.c)
SELECT SUM(m.messagesize)
FROM messages m, mailboxes mb
WHERE m.mailbox_idnr = mb.mailbox_idnr
AND mb.owner_idnr = %llu::bigint
AND m.status < 2;
on an account which had 2155 mails in 52 mailboxes (the whole database
contains 26000 mails in 400 mailboxes). The server is a Celeron 1200MHz with
256 MB RAM.
I asked PostgreSQL 7.3 to EXPLAIN ANALYZE the queries (see details below):
PostgreSQL uses two nested sequential scans (on messages and mailboxes) for
the first query and the total execution time is 12144.25 msec.
The second query is executed as a hash join and takes only 154.69 msec.
These results are repeatable and the interactive behaviour of the mail server
is _much_ better with the second query when, e.g., moving mails between
folders.
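For reference, here is a quick standalone sketch showing that the two
formulations compute the same total. It uses Python's sqlite3 module with a
made-up toy schema (the column names come from the queries above; the data is
invented for illustration, it is not the real dbmail data):

```python
import sqlite3

# Toy schema mirroring the tables used in the quota query; data is invented.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mailboxes (mailbox_idnr INTEGER PRIMARY KEY, owner_idnr INTEGER);
CREATE TABLE messages  (mailbox_idnr INTEGER, messagesize INTEGER, status INTEGER);
INSERT INTO mailboxes VALUES (1, 10), (2, 10), (3, 99);
INSERT INTO messages  VALUES (1, 100, 0), (2, 200, 1), (2, 50, 2), (3, 999, 0);
""")

# Current pgsql/dbpgsql.c formulation: IN with a sub-select.
in_form = con.execute("""
    SELECT SUM(messagesize) FROM messages
    WHERE mailbox_idnr IN
          (SELECT mailbox_idnr FROM mailboxes WHERE owner_idnr = 10)
      AND status < 2
""").fetchone()[0]

# mysql/dbmysql.c formulation: plain join.
join_form = con.execute("""
    SELECT SUM(m.messagesize) FROM messages m, mailboxes mb
    WHERE m.mailbox_idnr = mb.mailbox_idnr
      AND mb.owner_idnr = 10
      AND m.status < 2
""").fetchone()[0]

print(in_form, join_form)  # both 300
```

Note that the join rewrite is only safe because mailbox_idnr is unique in
mailboxes; otherwise the join could count a message more than once.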
I've attached a patch that changes the query in pgsql/dbpgsql.c to match the one
used in mysql/dbmysql.c.
Is there a reason why the query in pgsql/dbpgsql.c has been formulated
differently, with a sub-select?
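For what it's worth, another common workaround for slow IN sub-selects in old
PostgreSQL releases was a rewrite using EXISTS. A sketch with the same made-up
toy schema as above (column names from the queries, data invented):

```python
import sqlite3

# Same invented toy schema; only the query shape differs.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE mailboxes (mailbox_idnr INTEGER PRIMARY KEY, owner_idnr INTEGER);
CREATE TABLE messages  (mailbox_idnr INTEGER, messagesize INTEGER, status INTEGER);
INSERT INTO mailboxes VALUES (1, 10), (2, 99);
INSERT INTO messages  VALUES (1, 100, 0), (1, 200, 1), (2, 999, 0);
""")

# EXISTS rewrite: a correlated sub-select that planners of that era
# handled much better than IN (sub-select).
total = con.execute("""
    SELECT SUM(m.messagesize) FROM messages m
    WHERE EXISTS (SELECT 1 FROM mailboxes mb
                  WHERE mb.mailbox_idnr = m.mailbox_idnr
                    AND mb.owner_idnr = 10)
      AND m.status < 2
""").fetchone()[0]
print(total)  # 300
```

The join form in the attached patch is still the simpler fix here, since it
keeps both backends on the same query.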
Regards,
Armin
Detailed output from "EXPLAIN ANALYZE":
dbmail=# explain analyze SELECT SUM(messagesize) FROM messages WHERE
mailbox_idnr IN (SELECT mailbox_idnr FROM mailboxes WHERE owner_idnr =
10::bigint) AND status < 2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=140954.36..140954.36 rows=1 width=8) (actual
time=12144.10..12144.10 rows=1 loops=1)
-> Seq Scan on messages (cost=0.00..140922.85 rows=12601 width=8) (actual
time=11.92..12130.91 rows=2056 loops=1)
Filter: ((status < 2) AND (subplan))
SubPlan
-> Seq Scan on mailboxes (cost=0.00..11.03 rows=52 width=8)
(actual time=0.02..0.43 rows=50 loops=25105)
Filter: (owner_idnr = 10::bigint)
Total runtime: 12144.25 msec
dbmail=# explain analyze SELECT SUM(m.messagesize) FROM messages m, mailboxes
mb WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr = 10::bigint AND
m.status < 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=856.95..856.95 rows=1 width=24) (actual time=154.51..154.51
rows=1 loops=1)
-> Hash Join (cost=11.16..848.80 rows=3260 width=24) (actual
time=1.35..142.72 rows=2056 loops=1)
Hash Cond: ("outer".mailbox_idnr = "inner".mailbox_idnr)
-> Seq Scan on messages m (cost=0.00..670.89 rows=25202 width=16)
(actual time=0.10..111.64 rows=25105 loops=1)
Filter: (status < 2)
-> Hash (cost=11.03..11.03 rows=52 width=8) (actual time=1.11..1.11
rows=0 loops=1)
-> Seq Scan on mailboxes mb (cost=0.00..11.03 rows=52
width=8) (actual time=0.18..1.05 rows=52 loops=1)
Filter: (owner_idnr = 10::bigint)
Total runtime: 154.69 msec