Query runtime dependent on ANALYZE run

From: Viktor Rosenfeld <listuser36(at)googlemail(dot)com>
To: pgus-general(at)postgresql(dot)org
Subject: Query runtime dependent on ANALYZE run
Date: 2012-06-06 17:59:43
Message-ID: 20120606175943.GA6182@client199-58.wlan.hu-berlin.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgus-general

Hi,

I've noticed that the selection of the executed query plan (and
therefore query runtime) is dependent on the statistics generated by
an ANALYZE run. As an demonstration, I chose the best runtime of 5
consecutive runs of the query linked below, regenerated the statistics
for the column node_annotation.value and re-ran the query. This
experiment was repeated a hundred times each for the statistics
targets 10, 100 (default), 1000, 10000. I've used PostgreSQL 9.1.3.

Query: http://www.informatik.hu-berlin.de/~rosenfel/analyze/query.sql
Schema: http://www.informatik.hu-berlin.de/~rosenfel/analyze/schema.pdf

The next table shows for each statistics target the number of distinct
plans generated in the experiment, how often the most common plan was
generated, the runtime (ms) of the best and worst non-unique plan, and
the number of timeouts where the query did not finish within 60 seconds.

| Statistics | # plans | most common plan | Best | Worst | Timeouts |
|------------+---------+------------------+------+-------+----------|
| 10 | 37 | 60 | 1876 | 2180 | 0 |
| 100 | 90 | 4 | 2225 | 7927 | 14 |
| 1000 | 75 | 6 | 2214 | 6329 | 22 |
| 10000 | 6 | 85 | 2195 | 2900 | 3 |

The distribution for each statistics target is linked below:

Distribution: http://www.informatik.hu-berlin.de/~rosenfel/analyze/histogram.pdf

As one can see, using the default value of 100 (and also 1000) there is
a considerable spread in the runtime of the query. The best plan is
chosen most often, but only about a quarter of the time and there are
also many timeouts. The best results can be achieved with a statistics
target of 10: The most stable query plan is the second best and there
are no timeouts. Using a statistics target of 10000 generates the most
stable plan selection, i.e. the same plan is chosen most often, but it
is almost a second slower than the best plan.

I would like to know how I can mitigate against these random results.

Cheers,
Viktor

Responses

Browse pgus-general by date

  From Date Subject
Next Message Josh Berkus 2012-06-07 00:18:56 Re: Query runtime dependent on ANALYZE run
Previous Message Gayathri Theenthayalan 2011-01-19 22:27:00 Looking for PostgreSQL DBA Architect/Administrator