Re: Simple postgresql.conf wizard

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <jd(at)commandprompt(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Simple postgresql.conf wizard
Date: 2008-12-05 15:01:51
Message-ID: 87vdtyvf6o.fsf@oxford.xeocode.com
Lists: pgsql-hackers


"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:

> One more data point to try to help.
>
> While the jump from a default_statistics_target from 10 to 1000
> resulted in a plan time increase for a common query from 50 ms to 310
> ms, at a target of 50 the plan time was 53 ms. Analyze time was 7.2
> minutes and 18.5 minutes for targets of 10 and 50. This is an 842 GB
> database on an 8 processor (3.5 GHz Xeon) machine with 64 GB RAM
> running (soon to be updated) PostgreSQL 8.2.7.

So my half-finished DBT3 tests showed hardly any change in planning time at a
default_statistics_target of 1000, and even Kevin Grittner's query, which is the
worst real example posted so far, only went from 50ms to 310ms.

So I started thinking that substantially larger values might not have much effect
on planning time at all. (Actual data is a lot more convincing than simple
assertions!)

Kevin's query was an OLTP query, so 300ms is still way too much -- 300ms is at the
high end of acceptable OLTP response times for query *execution*, let alone
planning. But seeing the intermediate values would be interesting (a quick way to
collect them is sketched after the example below).

So I wondered what the worst case would be for a synthetic example designed to
exercise the planner severely. This would also be useful for optimizing the planner
under gprof, though I suspect the hot spots are pretty obvious even without
empirical data.

So anyway, here's a script to create a table with a 75kB pg_statistic record, and a
worst-case query whose plan time goes from 34ms to 1.2s for histogram sizes between
10 and 1,000.

Looking at eqjoinsel, I think it could be improved algorithmically if we kept the
MCV list in sorted order, even if it's only binary sort order. But I'm not sure
what else uses those values and whether the current ordering is significant. I'm
also not sure it's the only O(n^2) algorithm in there, and there's no algorithmic
gain unless they're all knocked down.
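
For a sense of scale, here is roughly how to see how many MCV entries eqjoinsel has
to cross-compare, using the tk table from the script below. With n entries on each
side of a join clause, the pairwise comparison is up to n*n equality-operator
calls, and the query below has five join clauses. (This sketch assumes the MCV
frequencies land in statistics slot 1, which is the usual layout after ANALYZE.)

postgres=# select array_upper(stanumbers1, 1) as n_mcv
postgres-# from pg_statistic where starelid = 'tk'::regclass and stakind1 = 1;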

Incidentally, this timing is with the 75kB toasted arrays already in shared
buffers, because the table has just been analyzed. On a busy system, just planning
the query could involve 75kB of I/O, which is what I believe was happening way back
when I last observed super-long plan times.

postgres=# create table tk as select random()::text||random()::text||random()::text||random()::text||random()::text||random()::text as r from generate_series(1,1000);
postgres=# insert into tk (select * from tk);
postgres=# insert into tk (select * from tk);
postgres=# insert into tk (select random()::text||random()::text||random()::text||random()::text||random()::text||random()::text as r from generate_series(1,2000));
postgres=# alter table tk alter r set statistics 1000;
postgres=# analyze tk;

postgres=# select pg_column_size(stavalues1) from pg_statistic where starelid = 'tk'::regclass;
 pg_column_size
----------------
          75484
(1 row)
postgres=# explain select count(*) from (select * from tk as k, tk as l,tk as m,tk as n,tk as o,tk as p where k.r=l.r and k.r=m.r and k.r=n.r and k.r=o.r and k.r=p.r) as x;
                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Aggregate  (cost=41358.14..41358.15 rows=1 width=0)
   ->  Merge Join  (cost=3213.13..37713.13 rows=1458000 width=0)
         Merge Cond: (k.r = l.r)
         ->  Merge Join  (cost=2677.61..14092.61 rows=486000 width=510)
               Merge Cond: (k.r = m.r)
               ->  Merge Join  (cost=2142.09..5862.09 rows=162000 width=408)
                     Merge Cond: (k.r = n.r)
                     ->  Merge Join  (cost=1606.57..2761.57 rows=54000 width=306)
                           Merge Cond: (k.r = o.r)
                           ->  Merge Join  (cost=1071.04..1371.04 rows=18000 width=204)
                                 Merge Cond: (k.r = p.r)
                                 ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                                       Sort Key: k.r
                                       ->  Seq Scan on tk k  (cost=0.00..159.00 rows=6000 width=102)
                                 ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                                       Sort Key: p.r
                                       ->  Seq Scan on tk p  (cost=0.00..159.00 rows=6000 width=102)
                           ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                                 Sort Key: o.r
                                 ->  Seq Scan on tk o  (cost=0.00..159.00 rows=6000 width=102)
                     ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                           Sort Key: n.r
                           ->  Seq Scan on tk n  (cost=0.00..159.00 rows=6000 width=102)
               ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
                     Sort Key: m.r
                     ->  Seq Scan on tk m  (cost=0.00..159.00 rows=6000 width=102)
         ->  Sort  (cost=535.52..550.52 rows=6000 width=102)
               Sort Key: l.r
               ->  Seq Scan on tk l  (cost=0.00..159.00 rows=6000 width=102)
(29 rows)

Time: 1186.315 ms
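
To fill in the intermediate points mentioned earlier, the same table can simply be
re-analyzed at a few targets in between and the identical EXPLAIN re-timed. A quick
sketch -- with \timing on, the time reported for a bare EXPLAIN is essentially all
planning:

postgres=# \timing on
postgres=# alter table tk alter r set statistics 100;  -- also try 50, 200, 500
postgres=# analyze tk;
postgres=# explain select count(*) from (select * from tk as k, tk as l,tk as m,tk as n,tk as o,tk as p where k.r=l.r and k.r=m.r and k.r=n.r and k.r=o.r and k.r=p.r) as x;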

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
