Mailing List Archive

Postgresql equivalent of Python's timeit?
Hi,
This is more related to Postgresql than to Python, I hope this is ok.
I want to measure Postgres queries N times, much like Python timeit
(https://docs.python.org/3/library/timeit.html). I know about EXPLAIN
ANALYZE and psql \timing, but there's quite a bit of variation in the
times. Is there a timeit-like function in Postgresql?
Thanks!
Albert-Jan
--
https://mail.python.org/mailman/listinfo/python-list
Re: Postgresql equivalent of Python's timeit? [ In reply to ]
On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
> This is more related to Postgresql than to Python, I hope this is ok.
> I want to measure Postgres queries N times, much like Python timeit
> (https://docs.python.org/3/library/timeit.html). I know about EXPLAIN
> ANALYZE and psql \timing, but there's quite a bit of variation in the
> times. Is there a timeit-like function in Postgresql?

Why not simply call it n times from Python?

(But be aware that calling the same query n times in a row is likely to be
unrealistically fast because most of the data will already be in
memory.)

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
Re: Postgresql equivalent of Python's timeit? [ In reply to ]
On Sep 15, 2023 19:45, "Peter J. Holzer via Python-list"
<python-list@python.org> wrote:

On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
>    This is more related to Postgresql than to Python, I hope this is
ok.
>    I want to measure Postgres queries N times, much like Python timeit
>    (https://docs.python.org/3/library/timeit.html). I know about
EXPLAIN
>    ANALYZE and psql \timing, but there's quite a bit of variation in
the
>    times. Is there a timeit-like function in Postgresql?

Why not simply call it n times from Python?

(But be aware that calling the same query n times in a row is likely to
be
unrealistically fast because most of the data will already be in
memory.)

=====
Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
don't re-use the same database connection.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Postgresql equivalent of Python's timeit? [ In reply to ]
On 9/17/2023 5:01 AM, Albert-Jan Roskam via Python-list wrote:
> On Sep 15, 2023 19:45, "Peter J. Holzer via Python-list"
> <python-list@python.org> wrote:
>
> On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
> >    This is more related to Postgresql than to Python, I hope this is
> ok.
> >    I want to measure Postgres queries N times, much like Python timeit
> >    (https://docs.python.org/3/library/timeit.html). I know about
> EXPLAIN
> >    ANALYZE and psql \timing, but there's quite a bit of variation in
> the
> >    times. Is there a timeit-like function in Postgresql?
>
> Why not simply call it n times from Python?
>
> (But be aware that calling the same query n times in a row is likely to
> be
> unrealistically fast because most of the data will already be in
> memory.)
>
> =====
> Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
> don't re-use the same database connection.

Here is a stack overflow thread that gives a good rundown of ways to get
timings from Postgres:

https://stackoverflow.com/questions/9063402/get-execution-time-of-postgresql-query

--
https://mail.python.org/mailman/listinfo/python-list
RE: Postgresql equivalent of Python's timeit? [ In reply to ]
Timing things that are fairly simple is hard enough to do repeatedly, but when it involves access to slower media and especially to network connections to servers, the number of things that can change are enormous. There are all kinds of caching at various levels depending on your hardware and resource contention with other programs running here and there as well as on various network-like structures and busses or just hard disks. Asking for anything to be repeated multiple times in a row as a general rule can make your results seem slower or faster depending on too many factors including what else is running on your machine.

I am wondering if an approach to running something N times that may average things out a bit is to simply put in a pause. Have your program wait a few minutes between attempts and perhaps even do other things within your loop that make it likely some of the resources you want not to be in a queue have a chance to be flushed as other things take their place. Obviously, a machine or system with lots of resources may take more effort to use enough new data that replaces the old.

Good luck. Getting reliable numbers is no easy feat as someone else may have trouble duplicating the results with a somewhat different setup.

-----Original Message-----
From: Python-list <python-list-bounces+avi.e.gross=gmail.com@python.org> On Behalf Of Albert-Jan Roskam via Python-list
Sent: Sunday, September 17, 2023 5:02 AM
To: Peter J. Holzer <hjp-python@hjp.at>
Cc: python-list@python.org
Subject: Re: Postgresql equivalent of Python's timeit?

On Sep 15, 2023 19:45, "Peter J. Holzer via Python-list"
<python-list@python.org> wrote:

On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
> This is more related to Postgresql than to Python, I hope this is
ok.
> I want to measure Postgres queries N times, much like Python timeit
> (https://docs.python.org/3/library/timeit.html). I know about
EXPLAIN
> ANALYZE and psql \timing, but there's quite a bit of variation in
the
> times. Is there a timeit-like function in Postgresql?

Why not simply call it n times from Python?

(But be aware that calling the same query n times in a row is likely to
be
unrealistically fast because most of the data will already be in
memory.)

=====
Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
don't re-use the same database connection.
--
https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
Re: Postgresql equivalent of Python's timeit? [ In reply to ]
On 9/17/2023 11:48 AM, AVI GROSS via Python-list wrote:
> Timing things that are fairly simple is hard enough to do repeatedly, but when it involves access to slower media and especially to network connections to servers, the number of things that can change are enormous. There are all kinds of caching at various levels depending on your hardware and resource contention with other programs running here and there as well as on various network-like structures and busses or just hard disks. Asking for anything to be repeated multiple times in a row as a general rule can make your results seem slower or faster depending on too many factors including what else is running on your machine.
>
> I am wondering if an approach to running something N times that may average things out a bit is to simply put in a pause. Have your program wait a few minutes between attempts and perhaps even do other things within your loop that make it likely some of the resources you want not to be in a queue have a chance to be flushed as other things take their place.

One thing I have done for timing queries is to construct a series of
test queries with the query parameters drawn randomly from a large set
of values. The hope is that random combinations will defeat caching and
provide a reasonably realistic view of the times.


--
https://mail.python.org/mailman/listinfo/python-list
Re: Postgresql equivalent of Python's timeit? [ In reply to ]
On 2023-09-17 11:01:43 +0200, Albert-Jan Roskam via Python-list wrote:
> On Sep 15, 2023 19:45, "Peter J. Holzer via Python-list"
> <python-list@python.org> wrote:
>
> On 2023-09-15 17:42:06 +0200, Albert-Jan Roskam via Python-list wrote:
> >??? This is more related to Postgresql than to Python, I hope this is
> ok.
> >??? I want to measure Postgres queries N times, much like Python timeit
> >??? (https://docs.python.org/3/library/timeit.html). I know about
> EXPLAIN
> >??? ANALYZE and psql \timing, but there's quite a bit of variation in
> the
> >??? times. Is there a timeit-like function in Postgresql?
>
> Why not simply call it n times from Python?
>
> (But be aware that calling the same query n times in a row is likely to
> be
> unrealistically fast because most of the data will already be in
> memory.)
>
> =====
> Thanks, I'll give this a shot. Hopefully the caching is not an issue if I
> don't re-use the same database connection.

There is some per-session caching, but the bulk of it is shared between
sessions or even in the operating system. And you wouldn't want to get
rid of these caches either (which you could do by rebooting or - a bit
faster - restarting postgres and dropping the caches
(/proc/sys/vm/drop_caches on Linux), because that would make the
benchmark unrealistically slow (unless you want to establish some
worst-case baseline). During normal operations some data will be cached,
but probably not all of it and it will change depending on workload and
possibly other factors.

I think Avi's advice to wait for a few minutes between repetitions is
good. Of course that means that you can't just time the whole thing but
have to time each query separately and then compute the average. (On the
bright side that also gives you the opportunity to compute standard
deviation, min, max, quantiles, etc.)

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"