Mailing List Archive

Quota query in pgsql/dbpgsql.c
Hello,

I had a look at the query used in pgsql/dbpgsql.c to calculate the used quota
in db_get_quotum_used(). I executed the current query (with "size" instead of
"m.size")

SELECT SUM(messagesize)
FROM messages
WHERE mailbox_idnr IN
(SELECT mailbox_idnr
FROM mailboxes
WHERE owner_idnr = %llu::bigint)
AND status < 2;

and the equivalent query (which is used in mysql/dbmysql.c)

SELECT SUM(m.messagesize)
FROM messages m, mailboxes mb
WHERE m.mailbox_idnr = mb.mailbox_idnr
AND mb.owner_idnr = %llu::bigint
AND m.status < 2;

on an account which had 2155 mails in 52 mailboxes (the whole database
contains 26000 mails in 400 mailboxes). The server is a Celeron 1200MHz with
256 MB RAM.

I asked PostgreSQL 7.3 to EXPLAIN ANALYZE the queries (details see below):
PostgreSQL uses two nested sequential scans (on messages and mailboxes) for
the first query and the total execution time is 12144.25 msec.
The second query is executed as a hash join and takes only 154.69 msec.

These results are repeatable and the interactive behaviour of the mail server
is _much_ better with the second query when, e.g., moving mails between
folders.

I've attached a patch to change the query in pgsql/dbpgsql.c to be the same as
the query used in mysql/dbmysql.c .

Is there a reason why the query in pgsql/dbpgsql.c has been formulated
differently, with a sub-select?

Regards,
Armin


Detailed output from "EXPLAIN ANALYZE":

dbmail=# explain analyze SELECT SUM(messagesize) FROM messages WHERE
mailbox_idnr IN (SELECT mailbox_idnr FROM mailboxes WHERE owner_idnr =
10::bigint) AND status < 2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Aggregate (cost=140954.36..140954.36 rows=1 width=8) (actual
time=12144.10..12144.10 rows=1 loops=1)
-> Seq Scan on messages (cost=0.00..140922.85 rows=12601 width=8) (actual
time=11.92..12130.91 rows=2056 loops=1)
Filter: ((status < 2) AND (subplan))
SubPlan
-> Seq Scan on mailboxes (cost=0.00..11.03 rows=52 width=8)
(actual time=0.02..0.43 rows=50 loops=25105)
Filter: (owner_idnr = 10::bigint)
Total runtime: 12144.25 msec


dbmail=# explain analyze SELECT SUM(m.messagesize) FROM messages m, mailboxes
mb WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr = 10::bigint AND
m.status < 2;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=856.95..856.95 rows=1 width=24) (actual time=154.51..154.51
rows=1 loops=1)
-> Hash Join (cost=11.16..848.80 rows=3260 width=24) (actual
time=1.35..142.72 rows=2056 loops=1)
Hash Cond: ("outer".mailbox_idnr = "inner".mailbox_idnr)
-> Seq Scan on messages m (cost=0.00..670.89 rows=25202 width=16)
(actual time=0.10..111.64 rows=25105 loops=1)
Filter: (status < 2)
-> Hash (cost=11.03..11.03 rows=52 width=8) (actual time=1.11..1.11
rows=0 loops=1)
-> Seq Scan on mailboxes mb (cost=0.00..11.03 rows=52
width=8) (actual time=0.18..1.05 rows=52 loops=1)
Filter: (owner_idnr = 10::bigint)
Total runtime: 154.69 msec
Re: Quota query in pgsql/dbpgsql.c [ In reply to ]
cvs is updated :-)

Armin Groesslinger heeft op dinsdag, 20 mei 2003 om 21:01
(Europe/Amsterdam) het volgende geschreven:

> Hello,
>
> I had a look at the query used in pgsql/dbpgsql.c to calculate the
> used quota
> in db_get_quotum_used(). I executed the current query (with "size"
> instead of
> "m.size")
>
> SELECT SUM(messagesize)
> FROM messages
> WHERE mailbox_idnr IN
> (SELECT mailbox_idnr
> FROM mailboxes
> WHERE owner_idnr = %llu::bigint)
> AND status < 2;
>
> and the equivalent query (which is used in mysql/dbmysql.c)
>
> SELECT SUM(m.messagesize)
> FROM messages m, mailboxes mb
> WHERE m.mailbox_idnr = mb.mailbox_idnr
> AND mb.owner_idnr = %llu::bigint
> AND m.status < 2;
>
> on an account which had 2155 mails in 52 mailboxes (the whole database
> contains 26000 mails in 400 mailboxes). The server is a Celeron
> 1200MHz with
> 256 MB RAM.
>
> I asked PostgreSQL 7.3 to EXPLAIN ANALYZE the queries (details see
> below):
> PostgreSQL uses two nested sequential scans (on messages and
> mailboxes) for
> the first query and the total execution time is 12144.25 msec.
> The second query is executed as a hash join and takes only 154.69 msec.
>
> These results are repeatable and the interactive behaviour of the mail
> server
> is _much_ better with the second query when, e.g., moving mails between
> folders.
>
> I've attached a patch to change the query in pgsql/dbpgsql.c to be the
> same as
> the query used in mysql/dbmysql.c .
>
> Is there a reason why the query in pgsql/dbpgsql.c has been formulated
> differently, with a sub-select?
>
> Regards,
> Armin
>
>
> Detailed output from "EXPLAIN ANALYZE":
>
> dbmail=# explain analyze SELECT SUM(messagesize) FROM messages WHERE
> mailbox_idnr IN (SELECT mailbox_idnr FROM mailboxes WHERE owner_idnr =
> 10::bigint) AND status < 2;
> QUERY PLAN
> -----------------------------------------------------------------------
> ------------------------------------------------
> Aggregate (cost=140954.36..140954.36 rows=1 width=8) (actual
> time=12144.10..12144.10 rows=1 loops=1)
> -> Seq Scan on messages (cost=0.00..140922.85 rows=12601 width=8)
> (actual
> time=11.92..12130.91 rows=2056 loops=1)
> Filter: ((status < 2) AND (subplan))
> SubPlan
> -> Seq Scan on mailboxes (cost=0.00..11.03 rows=52
> width=8)
> (actual time=0.02..0.43 rows=50 loops=25105)
> Filter: (owner_idnr = 10::bigint)
> Total runtime: 12144.25 msec
>
>
> dbmail=# explain analyze SELECT SUM(m.messagesize) FROM messages m,
> mailboxes
> mb WHERE m.mailbox_idnr = mb.mailbox_idnr AND mb.owner_idnr =
> 10::bigint AND
> m.status < 2;
> QUERY PLAN
> -----------------------------------------------------------------------
> ----------------------------------------------------
> Aggregate (cost=856.95..856.95 rows=1 width=24) (actual
> time=154.51..154.51
> rows=1 loops=1)
> -> Hash Join (cost=11.16..848.80 rows=3260 width=24) (actual
> time=1.35..142.72 rows=2056 loops=1)
> Hash Cond: ("outer".mailbox_idnr = "inner".mailbox_idnr)
> -> Seq Scan on messages m (cost=0.00..670.89 rows=25202
> width=16)
> (actual time=0.10..111.64 rows=25105 loops=1)
> Filter: (status < 2)
> -> Hash (cost=11.03..11.03 rows=52 width=8) (actual
> time=1.11..1.11
> rows=0 loops=1)
> -> Seq Scan on mailboxes mb (cost=0.00..11.03 rows=52
> width=8) (actual time=0.18..1.05 rows=52 loops=1)
> Filter: (owner_idnr = 10::bigint)
> Total runtime: 154.69 msec
> <pg-quota-query.diff>

_________________________
R.A. Rozendaal
IC&S
T: +31 30 2322878
F: +31 30 2322305
www.ic-s.nl