Re: query has huge variance in execution times

From: Brian Cox <brian(dot)cox(at)ca(dot)com>
To: "David Wilson [david(dot)t(dot)wilson(at)gmail(dot)com]" <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: query has huge variance in execution times
Date: 2010-03-31 18:10:53
Message-ID: 4BB3902D.7010205@ca.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 03/31/2010 12:37 AM, David Wilson [david(dot)t(dot)wilson(at)gmail(dot)com] wrote:
> These won't necessarily get the same plan. If you want to see what plan
> the prepared query is getting, you'll need to prepare it ("prepare foo
> as <query>") and then explain *that* via "explain execute foo".
>
> The prepared version likely has a much more generic plan, whereas the
> regular query gets optimized for the actual values provided.

I didn't know this. Thanks. The plans are indeed different:

cemdb=# prepare sq as select b.ts_id from ts_stats_tranunit_user_daily
b, ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id =
c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and
b.ts_user_incarnation_id = c.ts_user_incarnation_id and
c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and
b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4;
cemdb=# explain execute sq('2010-3-29 01:00', '2010-3-29 02:00',
'2010-3-29', '2010-3-30');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=7885.37..8085.91 rows=30 width=8)
Merge Cond: ((b.ts_transet_incarnation_id =
c.ts_transet_incarnation_id) AND (b.ts_tranunit_id = c.ts_tranunit_id)
AND (b.ts_user_incarnation_id = c.ts_user_incarnation_id))
-> Sort (cost=1711.82..1716.81 rows=3994 width=32)
Sort Key: b.ts_transet_incarnation_id, b.ts_tranunit_id,
b.ts_user_incarnation_id
-> Index Scan using ts_stats_tranunit_user_daily_starttime on
ts_stats_tranunit_user_daily b (cost=0.00..1592.36 rows=3994 width=32)
Index Cond: ((ts_interval_start_time >= $3) AND
(ts_interval_start_time < $4))

cemdb=# explain select b.ts_id from ts_stats_tranunit_user_daily b,
ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id =
c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and
b.ts_user_incarnation_id = c.ts_user_incarnation_id and
c.ts_interval_start_time >= '2010-3-29 01:00' and
c.ts_interval_start_time < '2010-3-29 02:00' and
b.ts_interval_start_time >= '2010-3-29' and b.ts_interval_start_time <
'2010-3-30';

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=291965.90..335021.46 rows=13146 width=8)
Hash Cond: ((c.ts_transet_incarnation_id =
b.ts_transet_incarnation_id) AND (c.ts_tranunit_id = b.ts_tranunit_id)
AND (c.ts_user_incarnation_id = b.ts_user_incarnation_id))
-> Index Scan using ts_stats_tranunit_user_interval_starttime on
ts_stats_tranunit_user_interval c (cost=0.00..11783.36 rows=88529 width=24)
Index Cond: ((ts_interval_start_time >= '2010-03-29
01:00:00-07'::timestamp with time zone) AND (ts_interval_start_time <
'2010-03-29 02:00:00-07'::timestamp with time zone))
-> Hash (cost=285681.32..285681.32 rows=718238 width=32)
-> Index Scan using ts_stats_tranunit_user_daily_starttime on
ts_stats_tranunit_user_daily b (cost=0.00..285681.32 rows=718238 width=32)
Index Cond: ((ts_interval_start_time >= '2010-03-29
00:00:00-07'::timestamp with time zone) AND (ts_interval_start_time <
'2010-03-30 00:00:00-07'::timestamp with time zone))
(7 rows)

-> Sort (cost=6173.55..6218.65 rows=36085 width=24)
Sort Key: c.ts_transet_incarnation_id, c.ts_tranunit_id,
c.ts_user_incarnation_id
-> Index Scan using ts_stats_tranunit_user_interval_starttime
on ts_stats_tranunit_user_interval c (cost=0.00..4807.81 rows=36085
width=24)
Index Cond: ((ts_interval_start_time >= $1) AND
(ts_interval_start_time < $2))
(10 rows)

I notice that the row estimates are substantially different; this is due
to the lack of actual values?

But, this prepared query runs in ~4 secs:

[root(at)rdl64xeoserv01 log]# cat /tmp/select.sql
prepare sq as select b.ts_id from ts_stats_tranunit_user_daily b,
ts_stats_tranunit_user_interval c where b.ts_transet_incarnation_id =
c.ts_transet_incarnation_id and b.ts_tranunit_id = c.ts_tranunit_id and
b.ts_user_incarnation_id = c.ts_user_incarnation_id and
c.ts_interval_start_time >= $1 and c.ts_interval_start_time < $2 and
b.ts_interval_start_time >= $3 and b.ts_interval_start_time < $4;
execute sq('2010-3-29 01:00', '2010-3-29 02:00', '2010-3-29', '2010-3-30

[root(at)rdl64xeoserv01 log]# time PGPASSWORD=quality psql -U postgres -d
cemdb -f /tmp/select.sql > /tmp/select1.txt 2>&1
real 0m4.131s
user 0m0.119s
sys 0m0.007s

so the question still remains: why did it take > 20 mins? To see if it
was due to autovacuum running ANALYZE, I turned off autovacuum, created
a table using SELECT * INTO temp FROM ts_stats_tranunit_user_daily,
added the index on ts_interval_start_time and ran the prepared query
with temp, but the query completed in a few secs.

Brian

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message David Wilson 2010-03-31 18:44:44 Re: query has huge variance in execution times
Previous Message Francisco Reyes 2010-03-31 16:59:52 Re: 3ware vs. MegaRAID