Re: mysterious difference in speed when combining two queries with OR

From: PFC <lists(at)peufeu(dot)com>
To: "Hans Ekbrand" <hans(dot)ekbrand(at)sociology(dot)gu(dot)se>, pgsql-performance(at)postgresql(dot)org
Subject: Re: mysterious difference in speed when combining two queries with OR
Date: 2008-04-23 12:56:56
Message-ID: op.t92d86hhcigqcu@apollo13.peufeu.com
Lists: pgsql-performance

> I should say that this is on postgresql 7.4.16 (debian stable).

Whoa.

> I cannot understand why the following two queries differ so much in
> execution time (almost ten times)

Post the EXPLAIN ANALYZE output for both queries, and also post the table
definitions (with indexes); use \d tablename in psql to get them. This will
allow people to help you.

> $ time psql -o /dev/null -f query-a.sql fektest
>
> real 0m2.016s
> user 0m1.532s
> sys 0m0.140s

You are measuring the time it takes the server to perform the query, plus
the time it takes:
- for the client (psql) to launch itself,
- to read its configuration file,
- to connect to the server and send the query,
- to transfer the results back to the client (is this over the network or
local? what amount of data is transferred?),
- to process the results, format them as text and display them,
- to close the connection,
- to exit cleanly.

As you can see from the above numbers, of the 2.016 seconds that elapsed on
your wall clock:
- 1.532 s (76%) was user CPU time in the client, and another 0.140 s (7%)
was system CPU time, also in the client (therefore of absolutely no
relevance to PostgreSQL server performance),
- and only the remaining 0.344 s (17%) was distributed in unknown proportion
between server CPU time spent processing your query, network round trips,
data transfer, server I/O wait, etcetera.
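The same breakdown written out as a small Python calculation, using only the figures from the time output quoted above. What it illustrates is that user and sys are both CPU time charged to the psql process itself, so only what is left of real can involve the server at all.

```python
# Figures copied from the `time psql ...` output quoted above (seconds).
real, user, sys_ = 2.016, 1.532, 0.140

client_cpu = user + sys_   # both counters are charged to the psql process
other = real - client_cpu  # server work, network, I/O waits: not separable here

for label, seconds in [("client user CPU", user),
                       ("client system CPU", sys_),
                       ("everything else", other)]:
    # Print each share as seconds and as a percentage of wall-clock time.
    print(f"{label:18s} {seconds:6.3f} s  {seconds / real:6.1%}")
```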

In order to properly benchmark your query, you should:

1- Ensure the server is not loaded and processing any other query (unless
you explicitly intend to test behaviour under load).
If you don't do that, your timings will be random, depending on how much
load you have, whether someone holds a lock you have to wait on, etc.

2- ssh to your server and use a psql session local to the server, to
avoid network roundtrips.

3- enable statement timing with \timing

Then, for the query itself:

1- EXPLAIN your query.

Check the plan.
Check how long the EXPLAIN took; this tells you roughly how much time it
takes to parse and plan your query.

2- EXPLAIN ANALYZE your query.

Do it several times, note the different timings and understand the query
plans.
If the data was not cached, the first timing will be much longer than the
subsequent ones. This gives you useful information about the behaviour of
this query: if it takes 1 second when cached and 5 minutes when not cached,
you might not want to run it at the same time as that huge scheduled backup
job. Those timings will also provide hints on whether you should CLUSTER the
table, etc.
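A minimal sketch for comparing the repeated EXPLAIN ANALYZE runs: paste each run's output into a Python string, pull out the runtime line, and compare the first (possibly uncached) run against the median of the others. The function names are made up for this example, and the regular expression is an assumption about the output format: "Total runtime: ... ms" as printed by older releases such as 7.4, or "Execution Time: ... ms" as printed by newer ones. Check it against what your server actually prints.

```python
import re
from statistics import median

# Runtime line at the bottom of EXPLAIN ANALYZE output. Assumed wording:
# "Total runtime: N ms" (older releases such as 7.4) or
# "Execution Time: N ms" (newer releases); adjust if yours differs.
RUNTIME_RE = re.compile(
    r"(?:Total runtime|Execution time):\s*([0-9.]+)\s*ms", re.IGNORECASE
)


def runtime_ms(explain_output: str) -> float:
    """Return the runtime in milliseconds found in one EXPLAIN ANALYZE output."""
    match = RUNTIME_RE.search(explain_output)
    if match is None:
        raise ValueError("no runtime line found in EXPLAIN ANALYZE output")
    return float(match.group(1))


def cold_vs_warm(runtimes: list[float]) -> dict[str, float]:
    """Compare the first run (possibly uncached) with the median of later runs."""
    if len(runtimes) < 2:
        raise ValueError("run the query at least twice")
    cold = runtimes[0]
    warm = median(runtimes[1:])
    return {"cold_ms": cold, "warm_ms": warm, "ratio": cold / warm}
```

Map runtime_ms over the pasted outputs and pass the list to cold_vs_warm. A ratio well above 1 is consistent with the first run reading from disk, though other load on the server can produce the same effect, which is why step 1 asks for an idle server.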

3- EXPLAIN SELECT count(*) FROM (your query) AS foo
Check that the plan is the same.

4- SELECT count(*) FROM (your query) AS foo
Because of the count(*), only a single row is sent back to the client, so
transferring and displaying the results doesn't distort the timing.
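If you run many of these experiments, a tiny helper that builds the wrapped statements avoids copy-and-paste mistakes. This is only a sketch: count_wrapper, explain and strip_terminator are hypothetical helper names, and the alias foo is arbitrary, but some alias is required on a subquery in FROM in older PostgreSQL releases such as 7.4.

```python
def strip_terminator(query: str) -> str:
    """Drop surrounding whitespace and a trailing semicolon so the query can be nested."""
    return query.strip().rstrip(";").rstrip()


def count_wrapper(query: str, alias: str = "foo") -> str:
    """Wrap the query so only one row (the count) comes back to psql."""
    # Older releases such as 7.4 require an alias on a subquery in FROM.
    return f"SELECT count(*) FROM ({strip_terminator(query)}) AS {alias};"


def explain(query: str, analyze: bool = False) -> str:
    """Prefix the statement with EXPLAIN or EXPLAIN ANALYZE."""
    prefix = "EXPLAIN ANALYZE" if analyze else "EXPLAIN"
    return f"{prefix} {strip_terminator(query)};"
```

For example, explain(count_wrapper(q)) gives the statement for step 3 and count_wrapper(q) the one for step 4, where q is your own query text.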

Now, compare:

The timings displayed by psql (\timing) include parsing and planning,
execution, the round trip to the server, and result processing (hence the
count(*), to reduce that last part).
The timings displayed by EXPLAIN ANALYZE include only query execution
time, but EXPLAIN ANALYZE is slower than just executing the query, because
instrumenting the query and measuring its performance takes time. For
instance, on a very simple query that computes an aggregate over lots of
rows, more time can be spent measuring than actually executing the query.
Hence steps 3 and 4 above.

Knowing this, you can deduce the time it takes to parse and plan your query
(should you then use PREPAREd statements? Up to you) and the time it takes
to execute it.
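The deduction can be written down as simple subtractions. This sketch assumes three measurements you take yourself: the psql \timing figure for plain EXPLAIN, the psql \timing figure for the count(*) query, and the runtime reported by EXPLAIN ANALYZE for the count(*) query. The names are hypothetical and the results are rough estimates: single runs are noisy, so feed it medians from several runs, and a negative value means the noise is larger than the effect you are trying to measure.

```python
from dataclasses import dataclass


@dataclass
class QueryTimeEstimate:
    parse_plan_ms: float       # EXPLAIN alone: parse + plan (+ one round trip)
    execute_ms: float          # count(*) query minus EXPLAIN: roughly execution
    instrumentation_ms: float  # extra time EXPLAIN ANALYZE spends measuring


def estimate(explain_ms: float, count_ms: float, analyze_runtime_ms: float) -> QueryTimeEstimate:
    """Rough split of query time; inputs should be medians of several runs."""
    execute = count_ms - explain_ms
    return QueryTimeEstimate(
        parse_plan_ms=explain_ms,
        execute_ms=execute,
        instrumentation_ms=analyze_runtime_ms - execute,
    )
```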

5- EXPLAIN ANALYZE again while changing the parameters (try some very
selective and some less selective values) to check for plan changes, play
with the enable_* parameters to see different plans, and try rewriting the
query (DISTINCT vs GROUP BY, OR vs UNION, JOIN vs IN (subquery), etc.).
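A sketch of a generator for a psql script that runs a baseline EXPLAIN ANALYZE and then repeats it with one planner switch turned off at a time. The switch names listed exist in 7.4 and later releases; plan_experiments is a hypothetical helper. Setting an enable_* parameter to off does not strictly forbid that plan type, it only makes the planner avoid it when another method is available, and SET/RESET only affect the current session.

```python
# Planner switches present in PostgreSQL 7.4 and later releases.
PLANNER_SWITCHES = [
    "enable_seqscan",
    "enable_indexscan",
    "enable_nestloop",
    "enable_hashjoin",
    "enable_mergejoin",
]


def plan_experiments(query: str, switches: list[str] = PLANNER_SWITCHES) -> str:
    """Build a psql script: one baseline EXPLAIN ANALYZE, then one per disabled switch."""
    body = query.strip().rstrip(";").rstrip()
    lines = ["-- baseline", f"EXPLAIN ANALYZE {body};"]
    for name in switches:
        lines += [
            f"-- with {name} off",
            f"SET {name} = off;",       # discourages (does not forbid) this plan type
            f"EXPLAIN ANALYZE {body};",
            f"RESET {name};",           # back to the session default
        ]
    return "\n".join(lines)
```

Save the result to a file and run it with psql -f in the local session from step 2, then compare the plans side by side.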
